Introduction
A large share of data is managed by relational database management systems (RDBMS). The relational model originated with E. F. Codd's 1970 paper "A Relational Model of Data for Large Shared Data Banks", which made data modeling and application programming much easier.
Traditional relational databases follow the ACID rules
A database transaction must be atomic, consistent, isolated, and durable. These four properties are described below.
- Atomic : A transaction is a logical unit of work: either all of its data modifications are performed, or none of them are.
- Consistent : At the end of the transaction, all data must be left in a consistent state.
- Isolated : Modifications made by one transaction must be independent of those made by any concurrent transaction. Otherwise, the outcome of a transaction may be erroneous.
- Durable : Once the transaction has completed, the effects of its modifications must be permanent in the system.
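As a rough illustration of atomicity, the sketch below uses Python's built-in sqlite3 module. The accounts table, the account names, and the helpers make_bank, transfer, and balances are hypothetical and exist only for this example. The transfer credits the destination first and then debits the source, so a failed debit shows the earlier credit being rolled back.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction.

    Returns True if the transaction committed, False if it rolled back.
    """
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src has insufficient funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With these starting balances, transferring 500 from alice to bob fails on the debit, and the credit already applied to bob is undone, so both balances are left exactly as they were before the call.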
Distributed Systems
A distributed system consists of multiple computers and software components that communicate through a computer network (a local area network or a wide area network). A distributed system can take any number of configurations, combining mainframes, workstations, personal computers, and so on. The computers interact with each other and share the resources of the system to achieve a common goal.
Advantages of Distributed Computing
- Reliability (fault tolerance)
- Scalability
- Sharing of Resources
- Flexibility
- Speed
- Open system
- Performance
Disadvantages of Distributed Computing
- Troubleshooting
- Software
- Networking
- Security
What is NoSQL?
NoSQL refers to non-relational database management systems that differ from traditional relational database management systems in some significant ways. They are designed for distributed data stores with very large-scale storage needs (for example, Google or Twitter, which collect terabits of data every day for their users). Such data stores may not require a fixed schema, avoid join operations, and typically scale horizontally.
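Horizontal scaling can be pictured as spreading keys across several servers. The following is a minimal sketch that uses simple hash-modulo partitioning, with in-memory dictionaries standing in for servers. The names shard_for and ShardedStore are made up for this illustration and do not correspond to any particular product's API.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one server."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", i)
```

Note that lookups never need a join: a key alone determines which shard to ask. A plain modulo scheme like this one moves most keys when the number of shards changes, which is why real systems usually rely on more elaborate partitioning schemes.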
Brief history of NoSQL
The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name his open-source, lightweight database, which did not have an SQL interface.
In early 2009, when last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational, distributed, and do not conform to atomicity, consistency, isolation, and durability, the four defining features of traditional relational database systems.
CAP Theorem
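You must understand the CAP theorem when you talk about NoSQL databases. The theorem concerns three requirements of a distributed system: consistency, availability, and partition tolerance.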
- Consistency - This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
- Availability - This means that the system is always on (guaranteed service availability), with no downtime.
- Partition Tolerance - This means that the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
In theory it is impossible to fulfill all three requirements at once. According to CAP, a distributed system can satisfy at most two of the three. Therefore current NoSQL databases follow different combinations of C, A, and P from the CAP theorem. Here is a brief description of the three combinations CA, CP, and AP:
CA - Single-site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.
CP - Some data may not be accessible, but the rest is still consistent/accurate.
AP - System is still available under partitioning, but some of the data returned may be inaccurate.
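The difference between CP and AP behaviour can be sketched with a toy two-replica store. This is only an illustration under simplifying assumptions: TwoNodeStore, Unavailable, and the partitioned flag are hypothetical names, replication is instantaneous while the network is healthy, and a CP system is modelled as one that refuses writes during a partition.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TwoNodeStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, node: int, key, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for replica in self.replicas:
                replica[key] = value
            return
        if self.mode == "CP":
            # Give up availability to keep the replicas identical.
            raise Unavailable("peer unreachable; write refused")
        # AP: accept the write locally; the other replica is now stale.
        self.replicas[node][key] = value

    def read(self, node: int, key):
        return self.replicas[node].get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)  # accepted, but only node 0 sees it
```

After the last write, node 0 returns the new value while node 1 still returns the old one, which is the "some of the data returned may be inaccurate" case. The same sequence on a store created with "CP" raises Unavailable instead.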
NoSQL Categories
There are four general types (most common categories) of NoSQL databases. Each of these categories has its own specific attributes and limitations.
- Key-value stores, e.g. Redis
- Column-oriented stores, e.g. Cassandra, HBase
- Graph databases, e.g. Neo4j
- Document-oriented databases, e.g. MongoDB
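To make the four categories more concrete, here is a loose sketch of how the same small piece of information might be shaped under each model, using plain Python structures rather than any real database client. All names and sample values (key_value, wide_column, documents, nodes, edges, and the helper functions) are invented for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Column-oriented: row key -> column family -> column -> value.
wide_column = {"42": {"profile": {"name": "Ada"}, "location": {"city": "London"}}}

# Document: a self-describing, possibly nested record per id.
documents = {42: {"name": "Ada", "address": {"city": "London"}}}

# Graph: nodes plus labelled, directed edges between them.
nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
edges = [(42, "FOLLOWS", 7)]


def city_from_key_value(user_id):
    # The store cannot look inside the value; the application decodes it.
    return json.loads(key_value[f"user:{user_id}"])["city"]


def city_from_wide_column(user_id):
    # Only the needed column family and column are touched.
    return wide_column[str(user_id)]["location"]["city"]


def city_from_document(user_id):
    # Nested fields are addressed by path inside the document.
    return documents[user_id]["address"]["city"]


def followed_by(user_id):
    # Relationships are first-class: follow edges instead of joining tables.
    return [nodes[dst]["name"] for src, label, dst in edges
            if src == user_id and label == "FOLLOWS"]
```

The point of the comparison is where the structure lives: in the application for key-value stores, in column families for column-oriented stores, inside each record for document stores, and in the edges for graph databases.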
NoSQL vs. SQL Summary
| | SQL databases | NoSQL databases |
|---|---|---|
| Types | One type (SQL database) with minor variations | Many different types including key-value stores, document databases, wide-column stores, and graph databases |
| Development History | Developed in 1970s to deal with first wave of data storage applications | Developed in 2000s to deal with limitations of SQL databases, particularly concerning scale, replication and unstructured data storage |
| Examples | MySQL, Postgres, Oracle Database | MongoDB, Cassandra, HBase, Neo4j |
| Schemas | Structure and data types are fixed in advance. To store information about a new data item, the schema must be altered, which in some systems requires taking the database offline. | Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically. |
| Scaling | Vertically, meaning a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required. | Horizontally, meaning that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary. |
| Development Model | Mix of open-source (e.g., Postgres, MySQL) and closed source (e.g., Oracle Database) | Open-source |
| Supports Transactions | Yes, updates can be configured to complete entirely or not at all | In certain circumstances and at certain levels (e.g., document level vs. database level) |
| Data Manipulation | Specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE | Through object-oriented APIs |
| Consistency | Can be configured for strong consistency | Depends on product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra) |