Saturday, 8 November 2014

NoSQL Introduction

Introduction

A large share of stored data is handled by relational database management systems (RDBMS). The relational model originated with E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks", which made data modeling and application programming much easier.

Traditional relational databases follow the ACID rules

A database transaction must be atomic, consistent, isolated and durable. These four properties are described below.
  • Atomic: A transaction is a logical unit of work in which either all of its data modifications are performed or none of them is.
  • Consistent: At the end of the transaction, all data must be left in a consistent state.
  • Isolated: Modifications made by one transaction must be independent of any other concurrent transaction. Otherwise, the outcome of a transaction may be erroneous.
  • Durable: Once the transaction is completed, the effects of its modifications must be permanent in the system.
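
The atomic property can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are assumptions made up for this example and are not part of the original post. The credit is applied first on purpose, so that a failed debit shows the earlier change being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one atomic transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every change in the block if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit
        # made just before it was rolled back as well.
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # bob cannot cover this
print(ok, failed, balances(conn))
```

After the failed transfer, neither account has changed: the credit to the destination account is undone together with the rejected debit.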



Distributed Systems

A distributed system consists of multiple computers and software components that communicate through a computer network (a local area network or a wide area network). A distributed system can take any number of configurations, combining mainframes, workstations, personal computers, and so on. The computers interact with each other and share the resources of the system to achieve a common goal.

Advantages of Distributed Computing

  • Reliability (fault tolerance)
  • Scalability
  • Sharing of Resources
  • Flexibility
  • Speed
  • Open system
  • Performance

Disadvantages of Distributed Computing

  • Troubleshooting
  • Software
  • Networking
  • Security

What is NoSQL?

NoSQL refers to non-relational database management systems that differ from traditional relational database management systems in some significant ways. They are designed for distributed data stores with very large-scale storage needs (for example, Google or Twitter, which collect terabytes of data every day for their users). Such data stores may not require a fixed schema, avoid join operations, and typically scale horizontally.
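
To make "no fixed schema, no joins, horizontal scaling" more concrete, here is a minimal sketch of hash-based partitioning (sharding) in plain Python. The `ShardedStore` class and its keys are hypothetical and exist only for illustration; each in-memory dictionary stands in for a separate server.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads records across several shards."""

    def __init__(self, shard_count):
        # Each dict plays the role of one server in the cluster.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the record.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, record):
        self._shard_for(key)[key] = record

    def get(self, key):
        # Lookup by key touches exactly one shard and needs no join.
        return self._shard_for(key).get(key)

store = ShardedStore(shard_count=3)
# Records are not bound to a fixed schema: the fields differ per record.
store.put("user:1", {"name": "Ada", "followers": 120})
store.put("user:2", {"name": "Linus", "email": "linus@example.org"})
print(store.get("user:1"))
print(store.get("user:2"))
```

This sketch uses simple modulo placement, which would move most keys if the number of shards changed. Production systems generally use schemes such as consistent hashing to limit that movement.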

Brief history of NoSQL


The term NoSQL was coined by Carlo Strozzi in 1998. He used it as the name of his open-source, lightweight database, which did not have an SQL interface.


In early 2009, when Last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational, distributed, and do not conform to atomicity, consistency, isolation, and durability, the four defining features of traditional relational database systems.


CAP Theorem 


You must understand the CAP theorem when you talk about NoSQL databases. It concerns three properties of a distributed system:




  • Consistency - This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
  • Availability - This means that the system is always on (guaranteed service availability), with no downtime.
  • Partition Tolerance - This means that the system continues to function even when communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.


In theory it is impossible to fulfill all three requirements at once. According to the CAP theorem, a distributed system can guarantee at most two of the three. Current NoSQL databases therefore follow different combinations of C, A, and P. Here is a brief description of the three combinations CA, CP, and AP:

  • CA - Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
  • CP - Some data may not be accessible, but the rest is still consistent/accurate.
  • AP - System is still available under partitioning, but some of the data returned may be inaccurate.
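
The difference between CP and AP behaviour can be sketched with a toy simulation. The `Cluster` and `Replica` classes below are hypothetical and greatly simplified: two replicas, a manual `partitioned` flag, and no recovery logic. It is meant only to show the trade-off, not how any real database implements it.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}

class Cluster:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse writes that cannot be replicated.
                raise RuntimeError("unavailable: other replica unreachable")
            # Availability first: accept the write locally; replicas diverge.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)

# CP: the write during the partition is rejected, both replicas still agree.
cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

# AP: the write is accepted, but the other replica returns stale data.
ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)
print(cp_rejected, ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

In the CP cluster the update is unavailable while the partition lasts, whereas in the AP cluster a client reading from replica `b` sees the old value until the replicas can synchronize again.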

NoSQL Categories

There are four general types (most common categories) of NoSQL databases. Each of these categories has its own specific attributes and limitations.
  • Key-value stores  ex : Redis
  • Column-oriented  ex : Cassandra, HBase
  • Graph  ex : Neo4j
  • Document oriented  ex : MongoDB
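
The four data models can be approximated with ordinary Python data structures. This is only a rough sketch of the shape of the data in each category; the sample keys, fields, and the helper functions `find_documents` and `reachable` are made up for illustration and do not correspond to any product's API.

```python
from collections import deque

# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": "alice", "session:43": "bob"}

# Document: self-describing records whose fields may differ and nest.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Oslo"}},
]

# Column-oriented: row key -> column family -> column -> value.
wide_column = {
    "user:1": {
        "profile": {"name": "Alice"},
        "activity": {"last_login": "2014-11-08"},
    },
}

# Graph: nodes connected by edges, stored here as adjacency lists.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}

def find_documents(docs, field, value):
    """Return the documents whose `field` equals `value`."""
    return [doc for doc in docs if doc.get(field) == value]

def reachable(edges, start):
    """Return every node reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen

print(key_value["session:42"])
print(find_documents(documents, "name", "Bob"))
print(wide_column["user:1"]["profile"]["name"])
print(reachable(graph, "alice"))
```

Each category is optimized for a different access pattern: direct lookup by key, queries on document fields, reads of selected columns across many rows, and traversal of relationships.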

NoSQL vs. SQL Summary


Types
  • SQL databases: One type (SQL database) with minor variations.
  • NoSQL databases: Many different types, including key-value stores, document databases, wide-column stores, and graph databases.

Development History
  • SQL databases: Developed in the 1970s to deal with the first wave of data storage applications.
  • NoSQL databases: Developed in the 2000s to deal with limitations of SQL databases, particularly concerning scale, replication, and unstructured data storage.

Examples
  • SQL databases: MySQL, Postgres, Oracle Database.
  • NoSQL databases: MongoDB, Cassandra, HBase, Neo4j.

Schemas
  • SQL databases: Structure and data types are fixed in advance. To store information about a new data item, the table schema must be altered, which in some systems requires locking the table or taking the database offline.
  • NoSQL databases: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.

Scaling
  • SQL databases: Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required.
  • NoSQL databases: Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.

Development Model
  • SQL databases: Mix of open-source (e.g., Postgres, MySQL) and closed-source (e.g., Oracle Database).
  • NoSQL databases: Open-source.

Supports Transactions
  • SQL databases: Yes, updates can be configured to complete entirely or not at all.
  • NoSQL databases: In certain circumstances and at certain levels (e.g., document level vs. database level).

Data Manipulation
  • SQL databases: Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE ...
  • NoSQL databases: Through object-oriented APIs.

Consistency
  • SQL databases: Can be configured for strong consistency.
  • NoSQL databases: Depends on product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra).
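
The Schemas and Data Manipulation entries of the comparison above can be illustrated side by side. The SQL half uses Python's built-in sqlite3 module; the `Collection` class in the second half is a hypothetical stand-in for a document database's object API, written only for this sketch and not taken from any real product.

```python
import sqlite3

# SQL: the structure is declared up front and queried with SELECT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")
# Storing a new attribute requires changing the table schema first.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.org')"
)
sql_rows = conn.execute(
    "SELECT name, email FROM users WHERE email IS NOT NULL"
).fetchall()

class Collection:
    """Hypothetical in-memory document collection with an object API."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Any dictionary is accepted; no schema has to be declared.
        self._docs.append(dict(doc))

    def find(self, **criteria):
        # Return documents whose fields match all the given criteria.
        return [
            doc for doc in self._docs
            if all(doc.get(k) == v for k, v in criteria.items())
        ]

users = Collection()
users.insert({"name": "Alice"})
users.insert({"name": "Bob", "email": "bob@example.org"})  # new field, no ALTER
doc_rows = users.find(name="Bob")

print(sql_rows)
print(doc_rows)
```

In the relational half, the `email` column has to be added to the table before any row can carry it; in the document half, the second record simply includes the extra field.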