Seminar Topic Nosql

July 11th, 2010
Apples, Oranges and NOSQL

Roi Aldaag, Architect & Consultant
Nadav Wiener, Architect & Consultant
Agenda
Introduction
  What is NoSQL?
  What's wrong with RDBMS?
  Why now?
RDBMS vs. NoSQL
  Scaling
  CAP Theorem
  ACID vs. BASE
NoSQL Taxonomy
  Key / Value
  Column
  Document
  Graph
How to choose?
  Comparing Apples to Oranges
  Polyglot Persistence
Introduction
Question: What do they all have in common?
Before we answer, some facts:
Daily Page Views    Daily Visitors    Data size
7.8 x 10^9          620 x 10^6        Petabytes
7.1 x 10^9          500 x 10^6        Petabytes
550 x 10^6          56 x 10^6         Petabytes
350 x 10^6          37 x 10^6         Terabytes
82 x 10^6           12 x 10^6         Terabytes
July, 2010: http://www.alexa.com
Answer: They use NoSQL data stores
Why!?
Relational DBs Have Scaling Limitations
ACID doesn't scale well horizontally
  Sharding breaks relations
  Joins are inefficient
  Transaction overhead
Schema is not flexible
  Predefined
  Hard to evolve
What is NoSQL?
NO SQL / Not Only SQL
A collective description of open source, non-relational data stores
  Highly distributed
  Highly scalable
  Not ACID
  and... doesn't use SQL
Term popularized in 2009 as the name of a meetup on open source distributed data stores (name suggested by Eric Evans)
Started a movement that is gaining momentum
Why now?
NoSQL data stores predate RDBMS (1970)
  But remained a niche
  RDBMS was the most popular and generic option
Web 2.0 introduced new requirements:
  Exponential increase in data
  Information connectivity
  Semi-structured data
NoSQL data stores had answers
  When the time was right
  When RDBMSs didn't
It's theory time:
Scaling
Scaling Up
Adding resources to a single node in a system
  Add more CPUs or memory
  Move system to a larger machine
Pros:
  Quick and simple
Cons:
  Outgrowing the capacity of the largest system available (Moore's law)
  Expensive
  Creates vendor lock-in
Scaling Out
Add more nodes to a system
Functional Scaling (vertical)
  Grouping data by function and spreading functional groups across databases
Sharding (horizontal)
  Splitting same functional data across multiple databases
Pros: More flexible
Cons: More complex
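Sharding can be sketched in a few lines. This is a minimal illustration, not how any particular product routes keys: the three in-memory dicts stand in for separate databases, and the names shard_for, put and get are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Assumption: each dict stands in for one database holding the same kind of data.
shards = [dict() for _ in range(3)]

def put(key: str, value) -> None:
    """Store the value on the shard that owns the key."""
    shards[shard_for(key, len(shards))][key] = value

def get(key: str):
    """Look the key up on the one shard that can hold it."""
    return shards[shard_for(key, len(shards))].get(key)

for user in ("alice", "bob", "carol", "dave"):
    put(user, {"name": user})
```

Every lookup touches exactly one shard, which is what makes this scale. A query that needs rows from several shards (a join, for instance) has to be assembled by the application, which is the "more complex" part.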
Distributed Databases
Many nodes, same database
(Diagram: Node 1, Node 2, Node 3)
What are the requirements from distributed databases?
Consistency
  All clients can see the same data
Availability
  All clients can always access data
Partition tolerance
  The ability to continue working when the network topology is broken
  The ability to recover once the network is healed
CAP Theorem (E. Brewer, N. Lynch)
You can fully satisfy at most 2 out of 3
  Compromise on the 3rd
Not all or nothing
  Choose various levels of consistency, availability or partition tolerance
  Recognize which of the CAP rules your business needs for the task
CA: Consistency & Availability
  Partition tolerance is compromised
  Single site clusters (easier to ensure all nodes are always in contact)
  When a network partition occurs, the system blocks
  e.g. Two Phase Commit (2PC)
CP: Consistency & Partition Tolerance
  Availability is compromised
  Access to some data may be temporarily limited
  The rest is still consistent/accurate
  e.g. Sharded database
  TBD sample
AP: Availability & Partition Tolerance
  Consistency is compromised
  System is still available under partitioning
  Some data returned may be temporarily out of date
  Requires a conflict resolution strategy
  e.g. DNS, caches, Master/Slave replication
  TBD sample
ACID vs. BASE

ACID: a quick recap
Atomicity
  When a part of the transaction fails, the entire transaction fails; database state is left unchanged
Consistency
  A transaction takes the database from one consistent state to another
Isolation
  A transaction can't see dirty state from other transactions
Durability
  Commit means commit.
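Atomicity is easy to see on a small scale. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the accounts table, the CHECK constraint and the transfer helper are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a', 100)")
conn.execute("INSERT INTO accounts VALUES ('b', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint when src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a name -> balance dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer larger than the source balance fails on the second update, and the credit already applied by the first update is rolled back with it, so the database state is left unchanged.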
BASE
The CAP complement of ACID
  Just had to be called BASE
Backronym:
  Basically Available
  Soft State
  Eventually Consistent

RDBMS & ACID / NoSQL & BASE
RDBMSs strive to provide ACID guarantees
  ACID forces consistency
NoSQL solutions often scale through BASE
  BASE accepts that conflicts will happen
Taxonomy
  Key / Value
  Column
  Document (XML, TXT, BIN)
  Graph
Key / Value Databases
Key/Value Stores
Simple Key / Value lookups (DHT)
Value is opaque
Focus on scaling to huge amounts of data
Designed to handle massive load
E.g. Riak, Project Voldemort, Redis
Many are based on Amazon's Dynamo paper (e.g. Riak, Project Voldemort)
Key/Value e.g.: Riak
No single point of failure
No machines are special or central
MapReduce queries (Erlang / JavaScript)
HTTP/JSON API
Ring cluster with automatic replication
Elastic / partition rebalancing
Written in: Erlang, C, JavaScript
Developed by: Basho Technologies
Java client: (jonjlee / riak-java-client)

Data Model
Key / Value pairs are stored in a Bucket
  A Bucket ~ a namespace
Versioning
  Each update is tracked by a Vector Clock
  An algorithm for determining ordering and detecting conflicts
When in conflict
  Last write wins / manual resolution
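The ordering and conflict detection that a vector clock provides can be sketched as follows. This is a generic illustration of the algorithm, not Riak's implementation; clocks are plain dicts from node name to update counter, and the function names are hypothetical.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with the given node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    """Classify two versions as equal, ordered, or concurrent."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"

# Two clients start from the same version and update via different nodes.
base = increment({}, "node1")
version_a = increment(base, "node1")
version_b = increment(base, "node2")
```

Here version_a and version_b both descend from base, but neither descends from the other, so compare reports a conflict. That is the case where the store has to fall back to last write wins or hand both versions to the application.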

Example: REST API
Read an object
  GET /riak/bucket/key
Store a new object
  POST /riak/bucket
Store an object with existing key (update)
  PUT /riak/bucket/key
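The three calls above map directly onto HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them; the base URL, the JSON content type and the helper names are assumptions for the illustration.

```python
from urllib.request import Request

# Assumption: a node reachable at this address; adjust to your setup.
BASE_URL = "http://localhost:8098"
JSON_HEADERS = {"Content-Type": "application/json"}

def read_object(bucket: str, key: str) -> Request:
    """GET /riak/bucket/key"""
    return Request(f"{BASE_URL}/riak/{bucket}/{key}", method="GET")

def store_new_object(bucket: str, body: bytes) -> Request:
    """POST /riak/bucket (the server picks the key)"""
    return Request(f"{BASE_URL}/riak/{bucket}", data=body,
                   headers=JSON_HEADERS, method="POST")

def update_object(bucket: str, key: str, body: bytes) -> Request:
    """PUT /riak/bucket/key (store under an existing key)"""
    return Request(f"{BASE_URL}/riak/{bucket}/{key}", data=body,
                   headers=JSON_HEADERS, method="PUT")
```

Passing any of these Request objects to urllib.request.urlopen would send it to a running node.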

MapReduce
A framework supporting distributed computing on large data sets on clusters of machines
Leverages parallel processing power
Introduced by Google
Inspired by map / reduce functions in functional programming
  Map step
  Reduce step

MapReduce example: Inverted Index
Map
  Parse each document
  Emit a sequence of <word, doc_id> pairs
Input <doc_id, doc_text>, one document per node:
  Node 1: <100, TXT1>
  Node 2: <200, TXT2>
  Node 3: <300, TXT3>
Map output <word, doc_id>:
  <word1, 100>, <word2, 100>, <word2, 200>, <word3, 300>
Reduce
  Accept all pairs for a given word
  Sort the corresponding document IDs
  Emit a <word, list(document_id)> pair
Reduce output <word, list(document_id)>:
  <word1, (100)>, <word2, (100, 200)>, <word3, (300)>
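The inverted index example can be written as a single-process sketch. It runs the map and reduce steps in sequence rather than across nodes, and the sample documents and function names are assumptions chosen to mirror the pairs above.

```python
from collections import defaultdict

def map_phase(doc_id, doc_text):
    """Parse one document and emit a (word, doc_id) pair per distinct word."""
    return [(word, doc_id) for word in sorted(set(doc_text.lower().split()))]

def shuffle(pairs):
    """Group emitted pairs by word, as the framework does between the steps."""
    groups = defaultdict(list)
    for word, doc_id in pairs:
        groups[word].append(doc_id)
    return groups

def reduce_phase(word, doc_ids):
    """Sort the document IDs for one word and emit (word, list of IDs)."""
    return word, sorted(set(doc_ids))

def inverted_index(documents):
    """Run map over every document, then reduce over every word."""
    pairs = [pair
             for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

documents = {100: "word1 word2", 200: "word2", 300: "word3"}
index = inverted_index(documents)
```

In a real cluster each node would run map_phase on its own documents and the grouped pairs would be routed to reducers by word; the logic of the two steps stays the same.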
BigTable and Column Oriented Databases
Column Stores: BigTable derivatives
Conceptually a single, infinitely large table
Each row can have a different number of columns
Table is sparse: |rows| * |columns| > |values|
Based on Google's BigTable paper
E.g. Cassandra, HBase, Hypertable
Use Case: Manage products with diverse attributes
RDBMS:
  Create a central table with common attributes
  Create a table per product with unique attributes
  Use a join query
  Alternatively, create a table that holds metadata on products
NoSQL: Column oriented database
  Use arbitrary columns
Column Store e.g.: Cassandra
Data model: Google's BigTable
Infrastructure: Amazon Dynamo
Incremental scalability
Flexible schema
No single point of failure (Distributed P2P)
Optimistic replication (Gossip protocol)
Written in: Java
Developed by: Facebook
Java client: e.g. Hector / Thrift

Data Model
Column
Smallest increment of data: tuple of name, value, timestamp
{
  name: "emailAddress",
  value: "nosql@alphacsp.com",
  timestamp: 123456789
}

SuperColumn
A sorted, associative, unbounded array of columns
{ // this is a SuperColumn
  name: "homeAddress",
  // with an unbounded array of Columns
  value: {
    // the key is the name of the Column
    street: {name: "street", value: "s", timestamp: ...},
    city: {name: "city", value: "c", timestamp: ...},
    zip: {name: "zip", value: "z", timestamp: ...}
  }
}

ColumnFamily
A container (~Table) for columns sorted by their names
Column Families are referenced and sorted by row keys
Users = { // ColumnFamily
  john: { // key to row in CF
    "role" : "admin",
    "status" : "offline",
    "nick" : "dude1934"
  }, // end row
  fred: { // another row
    "nick" : "freddy",
    "email" : "fred@example.com",
    "age" : "25",
    "gender" : "male"
  },
  // more rows
}

Keyspace
The outermost grouping of data (~DB Schema)
Contains ColumnFamilies
There is no imposed relationship between ColumnFamilies

Example: a Keyspace containing a Tweets CF and a Timeline CF
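The nesting of Keyspace, ColumnFamily, row and Column can be modelled with plain dictionaries. This is an in-memory sketch of the data model only, not a Cassandra client; the helper names are hypothetical, and keeping the column with the latest timestamp on write is an assumption about how the timestamp field is used.

```python
import time

def column(name, value, timestamp=None):
    """Build a column: the (name, value, timestamp) tuple from the slides."""
    if timestamp is None:
        timestamp = int(time.time() * 1_000_000)
    return {"name": name, "value": value, "timestamp": timestamp}

# Keyspace -> ColumnFamily -> row key -> column name -> column
keyspace = {"Users": {}, "Tweets": {}}

def insert(keyspace, column_family, row_key, col):
    """Write a column into a row, keeping the one with the latest timestamp."""
    row = keyspace[column_family].setdefault(row_key, {})
    existing = row.get(col["name"])
    if existing is None or col["timestamp"] >= existing["timestamp"]:
        row[col["name"]] = col

def get_row(keyspace, column_family, row_key):
    """Return a row as name -> value, with columns sorted by name."""
    row = keyspace[column_family].get(row_key, {})
    return {name: row[name]["value"] for name in sorted(row)}

insert(keyspace, "Users", "john", column("role", "admin", timestamp=2))
insert(keyspace, "Users", "john", column("status", "offline", timestamp=2))
insert(keyspace, "Users", "fred", column("email", "fred@example.com", timestamp=2))
```

Rows in the same column family carry different columns (john has role and status, fred has only email), which is the sparse, flexible schema described above.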
Document Oriented Databases (XML, TXT, BIN)
Document Store
Store semi-structured documents (think JSON)
Document versioning
Map/Reduce based queries, sorting, aggregation, etc.
DB is aware of internal structure
E.g. MongoDB, CouchDB, Jackrabbit (JCR / JSR 170)
Use Case: Blog with tagged posts and comments
RDBMS:
  Table for each: posts, comments, tags
  Foreign relations
NoSQL: Document storage
  Store post + tags + comments as a document
Document Store e.g.: MongoDB
MongoDB (from "humongous")
Manages collections of JSON-like documents (BSON)
Queries can return specific fields of documents
Supports secondary indexes
Atomic operations on single documents
Developed by: 10gen
Written in: C++
Clients: Java, Scala and more

Example: Blog posts
Suppose you host a blog, where each post is tagged:
db.posts.save({
  _id: 3,
  author: "john",
  title: "Apples, Oranges and NOSQL",
  text: "This article will",
  tags: ["database", "nosql"]
});
Notice how posts have an array of tags

MongoDB supports secondary indexes and a query optimizer
Compound indexes are also supported
// ensureIndex is the 2010-era shell call; later releases use createIndex
db.posts.ensureIndex({ tags: 1 });
db.posts.ensureIndex({ author: 1 });
db.posts.find({ author: "john", tags: "nosql" });
// Result:
{
  "_id": 3,
  "author": "john",
  "title": "Apples, Oranges and NOSQL",
  "text": "This article will",
  "tags": ["database", "nosql"]
}

Let's update our posts to include some comments:
db.posts.update({ _id: 3 }, {
  $inc: { comments_count: 4 },
  // $pushAll is the 2010-era operator; later releases use $push with $each
  $pushAll: { comments: [
    { text: "Comment 1" },
    { text: "Comment 2", author: "Mr. T" },
    { text: "Comment 3" },
    { text: "Comment 4" }
  ] }
});
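The query and update semantics used in the blog example can be imitated with plain Python dicts. This is a simplified in-memory sketch, not a MongoDB client: matches, find and add_comments are hypothetical helpers, and only equality matching and array membership are covered.

```python
def matches(document: dict, query: dict) -> bool:
    """A field matches if it equals the wanted value, or, for an array
    field, if the array contains the wanted value."""
    for field, wanted in query.items():
        actual = document.get(field)
        if isinstance(actual, list):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True

def find(collection: list, query: dict) -> list:
    """Return every document in the collection that satisfies the query."""
    return [doc for doc in collection if matches(doc, query)]

def add_comments(post: dict, comments: list) -> None:
    """Mimic the update: bump comments_count and append to the comments array."""
    post["comments_count"] = post.get("comments_count", 0) + len(comments)
    post.setdefault("comments", []).extend(comments)

posts = [{
    "_id": 3,
    "author": "john",
    "title": "Apples, Oranges and NOSQL",
    "text": "This article will",
    "tags": ["database", "nosql"],
}]
```

Because the post, its tags and its comments live in one document, a single lookup returns everything needed to render the page, with no join across posts, tags and comments tables.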
Graph Databases
Graph databases
Inspired by mathematical graph theory, G=(V,E)
Models the structure of data
Navigational data model
Scalability / data complexity
Data model: Key-Value pairs on Edges / Nodes
Relationships: Edges between Nodes
E.g. Neo4j, Pregel (Google's PageRank), AllegroGraph
Use Case: Connected data - deep relationship links between users in a social network
RDBMS:
  Complex recursive algorithm
  Multiple self joins
  Round trips to DB / bulk read and resolve in RAM
NoSQL: Graph storage
  Network traversal
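A network traversal over a social graph can be sketched with a breadth-first search. This is a generic in-memory illustration, not the Neo4j API; the friends graph and the within_depth helper are made up for the example.

```python
from collections import deque

# Assumption: an adjacency list standing in for friendship edges.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

def within_depth(graph: dict, start: str, max_depth: int) -> dict:
    """Return {person: distance} for everyone reachable from start in at
    most max_depth hops, using a breadth-first traversal."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, []):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]
    return distances
```

Each extra level of depth is one more hop along stored edges, whereas the relational version needs one more self join (or one more round trip) per level.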
Graph e.g.: Neo4j
High-performance graph engine
Embedded / disk based
Work with OO model: nodes, relationships, properties
ACID Transactions
  JTA support: participate in 2PC with your RDBMS
Developed by: Neo Technology
Written in: Java
Clients: Java, client libraries in other platforms
http://neo4j.org/
Comparing Apples to Oranges

Comparing Data Structures
RDBMS
  Databases contain tables, columns and rows
  All rows have the same structure
  Inherent ORM mismatch
NoSQL
  Choose your data structure
  Data is stored in its natural structure (e.g. Documents, Graphs, Objects)

Comparing Schema Flexibility
RDBMS
  Strict schema, difficult to evolve
  Maintains relations and forces data integrity
NoSQL
  Structure of data can be changed dynamically
    e.g. Column stores: Cassandra
  Data can sometimes be completely opaque
    e.g. Key/Value: Project Voldemort

Comparing Normalization & Relations
RDBMS
  The data model is normalized to remove data duplication
  Normalization establishes table relations
NoSQL
  Denormalization is not a dirty word
  Relations are not explicitly defined
  Related data is usually grouped and stored as one unit
    E.g. document, column

Comparing Data Access
RDBMS
  CRUD operations using SQL
  Access data from multiple tables using SQL joins
  Generic API such as JDBC
NoSQL
  Proprietary APIs and DSLs (e.g. Pig / Hive / Gremlin)
  MapReduce, graph traversals
  REST APIs, portable serialization formats
    BSON, JSON, Apache Thrift, Memcached

Comparing Reporting Capabilities
RDBMS
  Slice and dice data, then reassemble any way you like
NoSQL
  Hard to repurpose data for ad-hoc usage
  Plan ahead: think in advance about
    How and what you store
    Data access patterns
Summary
Why NOSQL / BASE
ACID ruled exclusively for the last 40 years
  Doesn't compromise on consistency
The database industry neglected distributed DBs with availability
The vacuum was filled by NoSQL / BASE architectures
  Strict A and P, minimize the compromise on C
Relational databases are now trying to catch up
NoSQL Limitations
Missing some query capabilities
  Joins / composite transactions
Eventual consistency: not for every problem
Not a drop-in replacement for RDBMS on ACID
No standardization -> product lock-in
Relatively immature (support, bugs, community)
Choose the right tool for the job
Relational databases and NoSQL databases are designed to meet different needs
RDBMS-only should not be a default
NOSQL databases outperform RDBMSs in their particular niche
No one size fits all / no silver bullet
...but you don't have to choose just one
Polyglot Persistence
Poly: many; Glot: language
Meshing up persistence mechanisms to best meet requirements
Good integration stories:
  E.g. Neo4j + JDBC using JTA