"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 26, 2012

NOSQL Basics

A deep dive is very important for understanding the basics and fundamentals of a product's design. I have explored a couple of NOSQL database products, and based on readings from blogs and papers, I have tried to document the fundamentals underlying NOSQL databases.
Tip #1 - NOSQL Stands for “Not Only SQL”


Tip #2 - ACID properties - What is it all about?
A short recap of ACID properties, from an earlier post:
  • Atomicity - A transaction is one unit of work: all or none. Either all of its data modifications are performed, or none of them are.
  • Consistency - A transaction must leave the database in a consistent state and maintain data integrity. The governing data structures for indexes and storage must be in a correct state. If a transaction violates a constraint, it must fail.
  • Isolation - Transactions are kept separate from each other. Concurrency is governed by isolation levels.
  • Durability - Committed changes persist through system failures and can be recovered after an abnormal termination.
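As a minimal sketch of Atomicity and Consistency, here is a funds transfer using Python's built-in sqlite3 module. SQLite stands in for any RDBMS, and the `account` table and `transfer` helper are hypothetical. The CHECK constraint is the consistency rule: if the debit would make a balance negative, that statement fails, and the whole transaction, including the credit already applied, is rolled back.

```python
import sqlite3

# In-memory database; the CHECK constraint acts as the consistency rule
# "no account may go negative".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (all or none)."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit applied by the
        # first UPDATE was rolled back as well.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling `transfer(conn, "bob", "alice", 80)` returns False and leaves both balances unchanged, while `transfer(conn, "alice", "bob", 30)` commits both updates together.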
Tip #3 - What is the CAP theorem? Every NOSQL white paper refers to the CAP theorem.

CAP stands for Consistency, Availability and Partition tolerance.

A short and easy summary, from a link I found:
  • Consistency - Querying any server in the distributed environment returns the latest written data
  • Availability - Every request to a working server gets a response, even if that response is not the latest updated data
  • Partition Tolerance - The system keeps working even when network failures split the nodes into groups that cannot communicate with each other
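As per the CAP theorem, a distributed system can fully guarantee at most two of the three properties at the same time. To summarize:
  • RDBMS favour Consistency: a distributed RDBMS typically targets Consistency & Partition Tolerance
  • Many NOSQL databases (for example Cassandra and CouchDB) target Availability & Partition Tolerance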
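To make the trade-off concrete, here is a toy two-replica store. It is a hypothetical sketch, not how any real database is built. During a partition, the `CP` mode refuses requests rather than risk returning stale data, while the `AP` mode keeps answering and may return an old value.

```python
class TwoNodeStore:
    """Toy store holding one value on two replicas, to illustrate CAP.

    mode="CP": during a partition, refuse requests (gives up Availability).
    mode="AP": during a partition, keep answering (gives up Consistency).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one copy of the value per node
        self.partitioned = False    # True = the two nodes cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            self.values = [value, value]  # replicate to both nodes
        elif self.mode == "CP":
            raise RuntimeError("unavailable: cannot replicate during partition")
        else:
            self.values[node] = value     # AP: accept locally; the peer is now stale

    def read(self, node):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm the latest value")
        return self.values[node]          # in AP mode this may be stale


ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
print(ap.read(0), ap.read(1))  # the two replicas now disagree
```

In CP mode the same reads and writes raise `RuntimeError` until `partitioned` is set back to False. Real AP systems also reconcile diverged replicas once the partition heals (for example with vector clocks or last-writer-wins timestamps); this toy does not.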
Tip #4 - What is MVCC? While working with NOSQL databases, I noticed MVCC being used for versioning and managing locks.

MVCC stands for Multiversion Concurrency Control. MVCC gives read transactions the latest committed data by keeping versions of each row: a reader reads a committed version instead of waiting for a writer's lock, so reads do not block writes. From MSSQL 2005 onwards, there is a Snapshot Isolation feature, which is also based on versioning. Reposting my notes on how snapshot isolation is achieved in MSSQL:
READ COMMITTED SNAPSHOT using row versioning in Microsoft SQL Server 2005 onwards (also applicable to 2008, 2012)
a. How it works - For each statement, a snapshot of the data is taken and remains consistent until the statement finishes execution. Row versions are kept in a version store, and reads are served from the version store.
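The versioning idea can be sketched with a toy in-memory store. The class and method names are hypothetical, and there is no real concurrency or locking here. Each commit appends a new timestamped version instead of overwriting; a reader takes a snapshot timestamp and always sees the newest version committed at or before it, so a later writer never has to wait for the reader, and uncommitted writes are never visible.

```python
class MVCCStore:
    """Toy multiversion key-value store (single-threaded, for illustration only)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._last_ts = 0    # timestamp of the most recent commit

    def snapshot(self):
        """Start a read: the reader will see everything committed so far."""
        return self._last_ts

    def commit(self, writes):
        """Install a dict of key -> value as new versions under one timestamp."""
        self._last_ts += 1
        for key, value in writes.items():
            # Append, never overwrite: older readers can still find old versions.
            self._versions.setdefault(key, []).append((self._last_ts, value))
        return self._last_ts

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit({"x": "v1"})
snap = store.snapshot()                   # a reader starts here
store.commit({"x": "v2"})                 # a later writer does not wait for the reader
print(store.read("x", snap))              # the reader still sees its snapshot
print(store.read("x", store.snapshot()))  # a new reader sees the latest commit
```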

b. How it solves the concurrency issues  

  • SELECT statements do not lock data during a read operation (readers do not block writers, and vice versa).
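In SQL Server this behaviour is switched on per database with `ALTER DATABASE <db> SET READ_COMMITTED_SNAPSHOT ON`. To show the same reader behaviour in a self-contained way, the sketch below swaps in SQLite's write-ahead-log (WAL) mode, which also serves readers from a snapshot of committed data. It is an analogy, not SQL Server's implementation, and the table `t` is hypothetical. While the writer holds an open, uncommitted transaction, the reader's SELECT neither waits nor sees the uncommitted value.

```python
import os
import sqlite3
import tempfile

# WAL mode needs a real file; an in-memory database would not do.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: autocommit, so BEGIN/COMMIT are issued explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
writer.execute("INSERT INTO t VALUES (1, 'v1')")

# timeout=0: if the reader ever had to wait for a lock, it would raise
# sqlite3.OperationalError immediately instead of blocking.
reader = sqlite3.connect(path, isolation_level=None, timeout=0)


def read_val(conn):
    # fetchall() runs the SELECT to completion, so no read snapshot stays open.
    return conn.execute("SELECT val FROM t WHERE id = 1").fetchall()[0][0]


writer.execute("BEGIN")
writer.execute("UPDATE t SET val = 'v2' WHERE id = 1")  # not committed yet

before = read_val(reader)  # not blocked; sees the last committed value
writer.execute("COMMIT")
after = read_val(reader)   # a new statement sees the newly committed value

print(before, after)
```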
c. Performance Advantages 
  • SELECT statements can read the last committed value of a row without getting blocked, even while other transactions are updating that row.
  • Reduces lock contention and the locking resources needed. Because readers do not block writers, there are no more deadlocks between readers and writers.
d. Resource usage and overhead
  • Row versioning increases resource usage during data modification, because row versions are maintained in tempdb. This leads to tempdb growth and contention, plus additional memory usage.
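The version-store growth comes from the fact that an old version can only be discarded once no running reader might still need it. A toy version chain for a single row shows the effect: a long-running reader pins old versions, and they are dropped once it finishes. The names are hypothetical, and SQL Server's actual version-store cleanup is more involved.

```python
class VersionChain:
    """Toy version chain for one row, with cleanup of unreachable versions."""

    def __init__(self, value):
        self.ts = 0                   # last commit timestamp
        self.versions = [(0, value)]  # (commit_ts, value), oldest first
        self.readers = []             # snapshot timestamps of running readers

    def begin_read(self):
        self.readers.append(self.ts)
        return self.ts

    def end_read(self, snap):
        self.readers.remove(snap)
        self.cleanup()

    def update(self, value):
        self.ts += 1
        self.versions.append((self.ts, value))
        self.cleanup()

    def cleanup(self):
        # The oldest running reader needs the newest version committed at or
        # before its snapshot; anything older than that is unreachable.
        oldest = min(self.readers, default=self.ts)
        keep_from = max(i for i, (ts, _) in enumerate(self.versions) if ts <= oldest)
        del self.versions[:keep_from]


row = VersionChain("v0")
snap = row.begin_read()   # a long-running reader pins version 0
row.update("v1")
row.update("v2")
print(len(row.versions))  # all versions are retained while the reader runs
row.end_read(snap)
print(len(row.versions))  # only the latest version is left
```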
Tip #5 - Below is a list of common features implemented by NOSQL databases, and their advantages
  • No schema required - Data types need not be defined up front
  • MVCC - CouchDB, for example, also uses MVCC to manage versions of data (good read: link)
  • Auto Sharding - Spread data across servers to scale out
  • Support for Replication
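Schema-less storage and auto-sharding can be illustrated together with a toy store. The names are hypothetical; real systems such as MongoDB or Cassandra use range-based or consistent-hash partitioning and move data between servers automatically. Each document is an arbitrary dict, and a stable hash of its key picks the shard.

```python
import hashlib


class ShardedStore:
    """Toy schema-less store that spreads documents over shards by key hash."""

    def __init__(self, num_shards):
        # Each shard stands in for a separate server.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash: Python's built-in hash() of a str changes between runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, doc):
        # No schema: doc is any dict, and documents may have different fields.
        self.shards[self._shard_for(key)][key] = doc

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedStore(num_shards=3)
store.put("user:1", {"name": "Ann"})
store.put("user:2", {"name": "Bo", "tags": ["admin"]})  # extra field, no migration
print(store.get("user:2"))
print([len(shard) for shard in store.shards])  # how the documents were spread
```

Plain modulo hashing is the simplest choice, but changing `num_shards` remaps most keys, which is why production systems prefer schemes that move less data when servers are added.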
I still have a long way to go in understanding NOSQL. I am planning to explore NOSQL database architecture in detail in coming posts.
More Reads - NOSQL Patterns

Happy Learning!!!!
