NOSQL
Agenda
• Introduction to NOSQL
• Objective
• Examples of NOSQL databases
• NOSQL vs SQL
• Conclusion
Basic Concepts

• Database: an organized collection of data.
• Database Management System (DBMS): a software package that controls the creation, maintenance and use of a database.
    • Users interact with a DBMS through a structured query language.
    • Ex. Oracle, IBM DB2, MS Access, MySQL, FoxPro, etc.
• Relational DBMS: a relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model, and the software used to manage it is called a relational database management system (RDBMS).
SQL

• Structured Query Language
• Special-purpose programming language designed for managing data in an RDBMS.
• Originally based upon relational algebra and tuple relational calculus.
• SQL's scope includes data insert, update and delete, schema creation and modification, and data access control.
• It relies on a static, strongly typed schema: column types are fixed when a table is created.
• The most widely used database language.
• Querying is the most important operation in SQL.
• Ex.
      SELECT *
      FROM Book
      WHERE price > 100.00
      ORDER BY title;
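You can try the example query without installing a database server by running it against an in-memory SQLite database through Python's standard sqlite3 module. This is a minimal sketch. The Book table's columns (title, price) and its rows are hypothetical, chosen only so the WHERE and ORDER BY clauses have something to act on.

```python
import sqlite3

# In-memory database; the schema and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT NOT NULL, price REAL NOT NULL)")
conn.executemany(
    "INSERT INTO Book (title, price) VALUES (?, ?)",
    [("Databases", 120.0), ("Algorithms", 95.5), ("Compilers", 150.0)],
)

# The query from the slide: books priced above 100.00, sorted by title.
rows = conn.execute(
    "SELECT * FROM Book WHERE price > 100.00 ORDER BY title"
).fetchall()

for title, price in rows:
    print(title, price)
```

Only Compilers and Databases pass the filter, and they come back in alphabetical order.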
NOSQL

• Stands for Not Only SQL
• Class of non-relational data storage systems
• Usually do not require a fixed table schema, nor do they use the concept of joins
• Most NOSQL offerings relax one or more of the ACID properties
    • Atomicity, Consistency, Isolation, Durability (ACID)
• “NOSQL” = “Not Only SQL” = not only using a traditional relational DBMS
NOSQL

• Alternative to traditional relational DBMS
    • Flexible schema
    • Quicker/cheaper to set up
    • Massive scalability
    • Relaxed consistency → higher performance & availability

* No declarative query language → more programming
* Relaxed consistency → fewer guarantees
Why NOSQL?


• Not every problem can be solved by a traditional relational database system alone.
• Handles huge databases.
• Redundancy: data stays pretty safe even on commodity hardware
• Super flexible queries using map/reduce
• Rapid development (no fixed schema, yeah!)
• Very fast for common use cases
Contd..


• Inspired by distributed data storage problems
• Scale easily by adding servers
• Not suited to all problem types, but super-suited to certain large problem types
• High-write situations (e.g. activity tracking or timeline rendering for millions of users)
• Many relational uses are really dumbed down (e.g. fetch by primary key, then update)
Architecture
How does it work?

• Clients know how to:
    • Send items to servers (consistent hashing)
    • Handle a server failure
    • Fetch keys from servers
    • "Weigh" servers according to their capacities

• Servers know how to:
    • Store the items they receive
    • Expire items from the cache
    • No inter-server communication: each server is unaware of the others
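The client-side logic above can be sketched as a consistent-hash ring. This minimal illustration assumes MD5 as the hash function and uses "virtual points" per server as the way to weigh servers by capacity. The class, method and server names are hypothetical and do not correspond to any particular client library.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring (MD5, not for security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Client-side consistent-hash ring; a server's weight sets its number of points."""

    def __init__(self, points_per_weight: int = 100) -> None:
        self.points_per_weight = points_per_weight
        self._ring: list[tuple[int, str]] = []  # sorted (position, server) pairs

    def add_server(self, server: str, weight: int = 1) -> None:
        # A server with weight 2 gets twice as many points, so roughly twice the keys.
        for i in range(weight * self.points_per_weight):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def remove_server(self, server: str) -> None:
        # On failure, drop the server's points; its keys fall through to the next
        # server clockwise, while every other key stays where it was.
        self._ring = [point for point in self._ring if point[1] != server]

    def server_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no servers available")
        # First point clockwise from the key's position, wrapping past the end.
        idx = bisect.bisect(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing()
ring.add_server("cache-a")
ring.add_server("cache-b")
ring.add_server("cache-c", weight=2)
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.remove_server("cache-a")  # simulate a failed server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Removing a server deletes only that server's points, so the only keys that move are the ones it owned. With plain `hash(key) % number_of_servers`, most keys would be remapped instead.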
Performance

• An RDBMS uses buffers and logs to guarantee the ACID properties
• NoSQL systems do not guarantee full ACID and can therefore be much faster
• We don't need ACID everywhere!
• Ex. In one reported case, data processing (run every minute) was 4x faster with MongoDB, despite storing much more detailed data (thanks to much simpler development)
Why is NOSQL faster than SQL? - Scaling

• Simple web application with not much traffic
    • Application server and database server all on one machine
Scaling contd..

• More traffic comes in
    • Application server
    • Database server
• Even more traffic comes in
    • Load balancer
    • Application server x2
    • Database server
Scaling contd..

• Even more traffic comes in
    • Load balancer x N: easy
    • Application server x N: easy
    • Database server x N: hard for SQL databases
SQL Slowdown

• Not linear!
Scaling contd..

• NoSQL scaling:
    • Need more storage? Add more servers!
    • Need higher performance? Add more servers!
    • Need better reliability? Add more servers!
Scaling Summary

• You can scale SQL databases (Oracle, MySQL, SQL Server…)
    • This will cost you dearly
    • If you don't have a lot of money, you will reach the limits quickly
• You can scale NoSQL databases
    • Very easy horizontal scaling
    • Lots of open-source solutions
    • Scaling was one of the basic design goals, so it is handled well
    • The trade-offs made for scaling are also what force you to use map/reduce
Characteristics

• Almost infinite horizontal scaling
• Very fast
• Performance doesn't deteriorate (much) with growth
• No fixed table schemas
• No join operations
• Ad-hoc queries difficult or impossible
• Structured storage
• Almost everything happens in RAM
NOSQL Types


• Wide Column Store / Column Families
• Document Store
• Key Value / Tuple Store
• Graph Databases
• Object Databases
• XML Databases
• Multivalue Databases
Main types

• Key-Value Stores
• Map Reduce Framework
• Document Databases
• Graph Databases
Key Value Stores

• Lineage: Amazon's Dynamo paper and distributed hash tables
• Data model: a global collection of key-value pairs
• Example systems
    • Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …
• Implementation: efficiency, scalability, fault tolerance
    • Records distributed to nodes based on key
    • Replication
    • Single-record transactions, “eventual consistency”
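The implementation points above (key-based placement, replication, single-record operations and eventual consistency) can be illustrated with a toy in-memory store. The sketch makes three assumptions: each key is replicated to the N nodes that follow its position on a hash ring, an integer counter stands in for timestamps, and conflicting copies are resolved by last-write-wins with read repair. Dynamo itself uses vector clocks for versioning, so this is a simplification. All names here are hypothetical.

```python
import bisect
import hashlib
import itertools


def ring_position(key: str) -> int:
    """Stable 64-bit ring position derived from MD5 (not for security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ReplicatedKV:
    """Toy key-value store: each record lives on N nodes chosen by its key."""

    def __init__(self, node_names: list[str], n_replicas: int = 3) -> None:
        if n_replicas > len(node_names):
            raise ValueError("n_replicas cannot exceed the number of nodes")
        self.n = n_replicas
        self.nodes = {name: {} for name in node_names}  # node -> {key: (version, value)}
        self._ring = sorted((ring_position(name), name) for name in node_names)
        self._clock = itertools.count(1)  # stand-in for write timestamps

    def replicas(self, key: str) -> list[str]:
        # The N nodes clockwise from the key's ring position hold its replicas.
        start = bisect.bisect(self._ring, (ring_position(key), ""))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.n)]

    def put(self, key: str, value: str, down: frozenset = frozenset()) -> int:
        """Single-record write; replicas listed in `down` miss it. Returns the ack count."""
        version = next(self._clock)
        acks = 0
        for node in self.replicas(key):
            if node not in down:
                self.nodes[node][key] = (version, value)
                acks += 1
        return acks

    def get(self, key: str, down: frozenset = frozenset()):
        """Read all live replicas, return the newest value and repair stale copies."""
        live = [n for n in self.replicas(key) if n not in down]
        found = [self.nodes[n][key] for n in live if key in self.nodes[n]]
        if not found:
            return None
        newest = max(found)  # highest version wins (last-write-wins)
        for n in live:
            self.nodes[n][key] = newest  # read repair
        return newest[1]


kv = ReplicatedKV(["n1", "n2", "n3", "n4"], n_replicas=3)
reps = kv.replicas("user:42")
kv.put("user:42", "v1")
acks = kv.put("user:42", "v2", down=frozenset({reps[0]}))  # one replica misses it
stale = kv.nodes[reps[0]]["user:42"][1]  # still "v1": eventually, not yet, consistent
value = kv.get("user:42")                # "v2", and reps[0] is repaired
```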
Document Databases

• Lineage: inspired by Lotus Notes
• Data model: collections of documents, each of which contains a collection of key-value pairs
• Example: CouchDB, MongoDB, Riak
Graph Database

• Lineage: draws from Euler and graph theory
• Data model: nodes and relationships, both of which can hold key-value pairs
• Example: AllegroGraph, InfoGrid, Neo4j
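The sketch below makes the node-and-relationship model concrete. It stores a tiny property graph in plain Python structures and follows relationships of one type with a breadth-first traversal. The data and function names are hypothetical, and this is not the API of Neo4j or any other graph database. It only shows the kind of traversal such systems are built to answer quickly.

```python
from collections import deque

# Nodes and relationships both carry key-value properties (illustrative data).
nodes = {
    "alice": {"label": "Person", "age": 31},
    "bob": {"label": "Person", "age": 27},
    "carol": {"label": "Person", "age": 45},
    "neo4j": {"label": "Database"},
}
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2019}),
    ("carol", "USES", "neo4j", {}),
]


def neighbours(node, rel_type):
    """Targets of outgoing relationships of the given type."""
    return [dst for src, typ, dst, _ in relationships if src == node and typ == rel_type]


def reachable(start, rel_type):
    """Breadth-first traversal following relationships of one type."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(reachable("alice", "KNOWS"))  # ['bob', 'carol']
```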
Map Reduce Framework

• Google's framework for processing highly distributable problems across huge datasets using a large number of computers
• Let's define "large number of computers":
    • Cluster, if all of them have the same hardware
    • Grid, otherwise (if !Cluster, for old-style programmers)
• The process is split into two phases
    • Map
        • Take the input, partition it and delegate the parts to other machines
        • Other machines can repeat the process, leading to a tree structure
        • Each machine returns its results to the machine that gave it the task
Map Reduce Framework contd..

    • Reduce
        • Collect the results from the machines you gave the tasks to
        • Combine the results and return them to the requester

• Less efficient than sequential processing per unit of work, but massively parallel
• Sort a petabyte of data in a few hours
• Pipeline: Input → Map → Shuffle → Reduce → Output
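The Input → Map → Shuffle → Reduce → Output pipeline can be illustrated on a single machine by building an inverted index, which maps each word to the documents that contain it. This is a sequential sketch. In a real framework the map and reduce calls would run in parallel on different machines, and the shuffle would move data over the network. Function names and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Map: emit (word, doc_id) once for each distinct word in one document.
    return [(word, doc_id) for word in set(text.lower().split())]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    # Reduce: combine one key's values into its final result.
    return word, sorted(doc_ids)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(i, t) for i, t in documents.items())
    grouped = shuffle(mapped)
    # Output: one reduced result per key.
    return dict(reduce_phase(w, ids) for w, ids in sorted(grouped.items()))


index = map_reduce({1: "NoSQL scales out", 2: "SQL scales up", 3: "NoSQL and SQL"})
print(index["nosql"], index["scales"])  # [1, 3] [1, 2]
```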
Popular NoSQL


• Hadoop / HBase          • MemcacheDB
• Cassandra               • Voldemort
• Amazon SimpleDB         • Hypertable
• MongoDB                 • Cloudata
• CouchDB                 • IBM Lotus/Domino
• Redis
Real World Use

• Cassandra
    • Facebook (original developer, used it until late 2010)
    • Twitter
    • Digg
    • Reddit
    • Rackspace
    • Cisco

• BigTable
    • Google (HBase is an open-source counterpart)

• MongoDB
    • Foursquare
    • Craigslist
    • Bit.ly
    • SourceForge
    • GitHub
MONGODB

• Document store
• Basic support for dynamic (ad hoc) queries
• Query by example (nice!)
• Conditional operators
    • $lt, $lte, $gt, $gte (<, <=, >, >=)
    • $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $size, $type
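Query by example means describing the documents you want with a document-shaped filter. The sketch below evaluates such filters over an in-memory list to show the semantics of a few of the listed operators ($lt, $lte, $gt, $gte, $ne, $in, $nin, $exists). It is an illustration, not MongoDB's implementation, and it ignores details such as matching inside arrays. In the mongo shell, the first query would be written `db.books.find({ price: { $gt: 100 } })`, assuming a collection named `books`.

```python
_MISSING = object()  # marks a field that is absent from a document


def _check(op, value, arg):
    """Evaluate one conditional operator against a field value."""
    if op == "$exists":
        return (value is not _MISSING) == arg
    if op == "$ne":
        return value is _MISSING or value != arg
    if op == "$nin":
        return value is _MISSING or value not in arg
    if value is _MISSING:
        return False  # the remaining operators never match a missing field
    if op == "$in":
        return value in arg
    if op == "$lt":
        return value < arg
    if op == "$lte":
        return value <= arg
    if op == "$gt":
        return value > arg
    if op == "$gte":
        return value >= arg
    raise ValueError(f"operator not supported in this sketch: {op}")


def matches(doc, query):
    for field, cond in query.items():
        value = doc.get(field, _MISSING)
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            if not all(_check(op, value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:  # plain value: query by example (equality)
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


books = [
    {"title": "Databases", "price": 120.0, "tags": ["db"]},
    {"title": "Algorithms", "price": 95.5},
    {"title": "Compilers", "price": 150.0, "tags": ["pl"]},
]
print([b["title"] for b in find(books, {"price": {"$gt": 100}})])  # ['Databases', 'Compilers']
print([b["title"] for b in find(books, {"tags": {"$exists": False}})])  # ['Algorithms']
```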
MONGODB

• Data is stored as BSON (binary JSON)
    • Makes it very well suited for languages with native JSON support
• Map/Reduce written in JavaScript
    • Slow! There is a single thread of execution in JavaScript
• Master/slave replication (automatic failover with replica sets)
• Sharding built-in
• Uses memory-mapped files for data storage
• Performance over features
• On 32-bit systems, limited to ~2.5 GB
• An empty database takes up 192 MB
• GridFS to store big data + metadata (not actually a file system)
CASSANDRA

• Written in: Java
• Protocol: custom, binary (Thrift)
• Tunable trade-offs for distribution and replication (N replicas, R replicas read, W replicas written)
• Querying by column, range of keys
• BigTable-like features: columns, column families
• Writes are much faster than reads (!)
    • Constant write time regardless of database size
• Map/reduce possible with Apache Hadoop
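The (N, R, W) trade-off comes down to a little arithmetic. If a write waits for W of the N replicas and a read consults R of them, every read overlaps the latest write whenever R + W > N. The sketch below states that rule and confirms it by brute force over all possible replica sets. In Cassandra, N corresponds to the replication factor, and R and W correspond to per-request consistency levels such as ONE, QUORUM or ALL (QUORUM is 2 when N = 3). The function names are hypothetical.

```python
from itertools import combinations


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Every read quorum overlaps every write quorum iff R + W > N."""
    return r + w > n


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    # Brute force: try every set of W written replicas against every set of R read ones.
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1)]:
    print(n, r, w, is_strongly_consistent(n, r, w), quorums_always_overlap(n, r, w))
```

With N = 3, quorum reads and writes (2 + 2) always overlap, as do ONE-reads with ALL-writes (1 + 3). ONE/ONE (1 + 1) trades that guarantee for latency and gives only eventual consistency.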
Some more info about Cassandra at Facebook

• Cassandra is an open-source DBMS from the Apache Software Foundation.
• Cassandra provides a structured key-value store with tunable consistency.
• Cassandra is a distributed storage system for managing structured data, designed to scale to a very large size across many commodity servers with no single point of failure.
• It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010.
HBASE

• Written in: Java
• Main point: billions of rows x millions of columns
• Modeled after BigTable
• Map/reduce with Hadoop
• Query predicate push-down via server-side scan and get filters
• Optimizations for real-time queries
• A high-performance Thrift gateway
• HTTP interface supports XML, Protobuf and binary
• Cascading, Hive and Pig source and sink modules
• No single point of failure
• While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs; HBase is a column-oriented key/value store that allows low-latency reads and writes.
• Random access performance is comparable to MySQL
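BigTable's data model, which HBase copies, is essentially a sorted map from (row key, column family:qualifier, timestamp) to a value. The toy class below sketches that model, including versioned cells and a row-range scan whose filter is applied before rows are returned. That filter is a stand-in for server-side scan filters. This is a hypothetical illustration only. The real HBase client is a Java API built around Put, Get, Scan and Filter objects.

```python
import bisect


class Table:
    """Toy BigTable/HBase-style table: sorted row keys, column families, versioned cells."""

    def __init__(self, families):
        self.families = set(families)
        self._row_keys = []  # kept sorted, so row ranges can be scanned
        self._rows = {}      # row -> {"family:qualifier": [(ts, value), ...] newest first}

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        cells = self._rows[row].setdefault(column, [])
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest version first

    def get(self, row, column):
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start, stop, row_filter=None):
        """Yield (row, latest cells) for start <= row < stop that pass row_filter.

        Rows rejected by the filter are never handed back to the caller,
        mimicking a filter evaluated on the server side.
        """
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        for row in self._row_keys[lo:hi]:
            latest = {col: cells[0][1] for col, cells in self._rows[row].items()}
            if row_filter is None or row_filter(row, latest):
                yield row, latest


t = Table(["info"])
t.put("user#001", "info:name", "Ann", ts=1)
t.put("user#002", "info:name", "Bo", ts=1)
t.put("user#002", "info:name", "Bob", ts=2)  # newer version of the same cell
t.put("user#010", "info:name", "Cy", ts=1)
print(t.get("user#002", "info:name"))  # Bob
print([row for row, _ in t.scan("user#000", "user#010")])  # ['user#001', 'user#002']
```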
COUCHDB

• Written in: Erlang
• Main point: DB consistency, ease of use
• Bi-directional (!) replication, continuous or ad hoc, with conflict detection, thus master-master replication (!)
• MVCC: write operations do not block reads
• Previous versions of documents are available
• Crash-only (reliable) design
• Needs compacting from time to time
• Views: embedded map/reduce
• Formatting views: lists & shows
• Server-side document validation possible
• Authentication possible
• Real-time updates via _changes (!)
• Attachment handling
• CouchApps (standalone JS apps)
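CouchDB's MVCC shows up to clients as revisions. Each document carries a revision id, and an update must name the revision it is based on. A stale update is rejected as a conflict rather than silently overwriting newer data. The single-threaded sketch below imitates that behaviour, plus compaction of old revisions. Real CouchDB revision ids also have the form generation-hash. Computing the hash as MD5 of the JSON body is an assumption of this sketch, and the class names are hypothetical.

```python
import hashlib
import json


class Conflict(Exception):
    """Update based on an out-of-date revision (CouchDB answers such requests with HTTP 409)."""


class DocStore:
    """Toy MVCC document store: every update creates a new revision."""

    def __init__(self):
        self._history = {}  # doc id -> [(rev, body), ...], oldest first

    def put(self, doc_id, body, rev=None):
        history = self._history.get(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            raise Conflict(f"{doc_id}: current revision is {current}, got {rev}")
        generation = int(current.split("-", 1)[0]) + 1 if current else 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        new_rev = f"{generation}-{digest}"
        history.append((new_rev, dict(body)))  # old revisions are kept, not overwritten
        self._history[doc_id] = history
        return new_rev

    def get(self, doc_id, rev=None):
        history = self._history[doc_id]
        if rev is None:
            return dict(history[-1][1])  # latest revision
        return dict(dict(history)[rev])  # a specific older revision

    def compact(self, doc_id):
        # Compaction drops old revisions, keeping only the latest one.
        self._history[doc_id] = self._history[doc_id][-1:]


db = DocStore()
r1 = db.put("book:1", {"title": "NoSQL", "price": 10})
r2 = db.put("book:1", {"title": "NoSQL", "price": 12}, rev=r1)
try:
    db.put("book:1", {"title": "NoSQL", "price": 9}, rev=r1)  # based on a stale revision
except Conflict as exc:
    print("conflict:", exc)
print(db.get("book:1")["price"], db.get("book:1", rev=r1)["price"])  # 12 10
```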
HADOOP

• Apache project
• A framework that allows for the distributed processing of large data sets across clusters of computers
• Designed to scale up from single servers to thousands of machines
• Designed to detect and handle failures at the application layer, instead of relying on hardware for it
• Created by Doug Cutting, who named it after his son's toy elephant
• Hadoop-related Apache projects
    • Cassandra
    • HBase
    • Pig
• Hive was a Hadoop subproject, but is now a top-level Apache project
HADOOP contd..

• Scales to hundreds or thousands of computers, each with several processor cores
• Designed to efficiently distribute large amounts of work across a set of machines
• Hundreds of gigabytes of data constitute the low end of Hadoop scale
• Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
• Uses Java, but allows streaming so that other languages can easily send and accept data items to/from Hadoop
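Hadoop Streaming passes records to external programs as lines of text on standard input and reads their output from standard output, with key and value separated by a tab. Between the two stages, the framework sorts the mapper output by key. A minimal word-count pair in Python might look like the sketch below. The file name `wc.py` and the `map`/`reduce` command-line arguments are hypothetical conventions of this sketch.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word, one output line per record."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out_line in stage(sys.stdin):
        print(out_line)
```

Locally, you can emulate the pipeline with `cat input.txt | python wc.py map | sort | python wc.py reduce`. On a cluster, the same commands are passed to the hadoop-streaming jar through its `-mapper` and `-reducer` options. The jar's path depends on the installation.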
HADOOP contd..

• Uses a distributed file system (HDFS)
    • Designed to hold very large amounts of data (terabytes or even petabytes)
    • Files are stored redundantly across multiple machines to ensure durability in the face of failures and high availability to highly parallel applications
    • Data organized into directories and files
    • Files are divided into blocks (64 MB by default in early releases; 128 MB since Hadoop 2) and distributed across nodes
• Design of HDFS is based on the design of the Google File System
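The block arithmetic is simple. A file is cut into fixed-size blocks, only the last of which may be partial, and each block is stored on several DataNodes. The sketch below computes the blocks for a 200 MB file at a 64 MB block size and assigns three replicas per block; three is HDFS's default replication factor. The round-robin placement is a deliberate simplification, since real HDFS lets the NameNode choose DataNodes with a rack-aware policy. Function and node names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the early default block size
REPLICATION = 3                # HDFS's default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Sizes of the blocks a file occupies; only the last block may be partial."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Naive round-robin placement: each block's replicas go to distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    return [
        [datanodes[(b + i) % len(datanodes)] for i in range(replication)]
        for b in range(num_blocks)
    ]


MB = 1024 * 1024
blocks = split_into_blocks(200 * MB)  # a 200 MB file
print(len(blocks), blocks[-1] // MB)  # 4 8
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

A 200 MB file therefore needs three full 64 MB blocks plus one 8 MB block. With three replicas each, the cluster holds twelve block copies.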
HIVE

• A petabyte-scale data warehouse system for Hadoop
• Easy data summarization, ad hoc queries
• Query the data using a SQL-like language called HiveQL
• The Hive compiler generates map/reduce jobs for most queries
Conclusion

• NoSQL is a great problem solver if you need it
• Choose your NoSQL platform carefully, as each is designed for a specific purpose
• Get used to map/reduce
• It's not a sin to use NoSQL alongside a (yes)SQL database
THANK YOU!

Nosql seminar

  • 2. Agenda  Introduction to NOSQL  Objective  Examples of NOSQL databases  NOSQL vs SQL  Conclusion
  • 3. Basic Concepts  Database – is a organized collection of data.  Data base Management System (DBMS)- is a software package with computer program that controls the creation , maintainance & use of a database.  for DBMS , we use structured language to interact with it  Ex. Oracle , IBM DB2 , Ms Access , MySQL , FoxPro etc.  Relational DBMS - A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used in a relational database is called a relational database management system (RDBMS).
  • 4. SQL  Stuctured Query Language  Special purpose programming language designed for managing data in RDBMS.  Origininally based upon relational algebra & tuple relation calculas.  SQl’s scope include data insert,upadte & delete, schema creation and modification , data access control.  It is static and strong used in database.  Most used widely used database language.  Query is the most important operation in SQL.  Ex. SELECT * FROM Book WHERE price > 100.00 ORDER BY title;
  • 5. NOSQL  Stands for Not Only SQL  Class of non-relational data storage systems  Usually do not require a fixed table schema nor do they use the concept of joins  All NOSQL offerings relax one or more of the ACID properties .  Atomicity , Consistancy , Isolation , Durability ( ACID )  “NOSQL” = “Not Only SQL” = Not Only using traditional relational DBMS
  • 6. NOSQL • Alternative to traditional relational DBMS • Flexible schema • Quicker/cheaper to set up • Massive scalability • Relaxed consistency higher performance & availability * No declarative query language more programming * Relaxed consistency fewer guarantees
  • 7. Why NOSQL?  Every problem cannot be solved by traditional relational database system exclusively.  Handles huge databases.  Redundancy, data is pretty safe on commodity hardware  Super flexible queries using map/reduce  Rapid development (no fixed schema, yeah!)  Very fast for common use cases
  • 8. Contd..  Inspired by Distributed Data Storage problems  Scale easily by adding servers  Not suited to all problem types, but super-suited to certain large problem types  High-write situations (eg activity tracking or timeline rendering for millions of users)  A lot of relational uses are really dumbed down (eg fetch by PK with update)
  • 10. How does it work?  Clients know how to: Send items to servers (consistent hashing) What to do when a server fails How to fetch keys from servers Can “weigh” to server capacities  Servers know how to: Store items they receive Expire them from the cache No inter-server comms – everything is unaware
  • 11. Performance  RDBMS uses buffer to ensure ACID properties  NoSQL does not guarantee ACID and is therefore much faster  We don’t need ACID everywhere!  Ex. Data processing (every minute) is 4x faster with MongoDB, despite being a lot more detailed (due to much simple development)
  • 12. Why NOSQL is faster than SQL ? - Scalling  Simple web application with not much traffic  Application server, database server all on one machine
  • 13. Scalling contd..  More traffic comes in  Application server  Database server  Even more traffic comes in  Load balancer  Application server x2  Database server
  • 14. Scalling contd..  Even more traffic comes in  Load balancer x N  easy  Application server x N  easy  Database server xN  hard for SQL databases
  • 16. Scalling contd..  NoSQL Scalling -  Need more storage?  Add more servers!  Need higher performance?  Add more servers!  Need better reliability?  Add more servers!
  • 17. Scalling Summary  You can scale SQL databases (Oracle, MySQL, SQL Server…)  This will cost you dearly  If you don’t have a lot of money, you will reach limits quickly  You can scale NoSQL databases  Very easy horizontal scaling  Lots of open-source solutions  Scaling is one of the basic incentives for design, so it is well handled  Scaling is the cause of trade-offs causing you to have to use map/reduce
  • 18. Characterstics  Almost infinite horizontal scaling  Very fast  Performance doesn’t deteriorate with growth (much)  No fixed table schemas  No join operations  Ad-hoc queries difficult or impossible  Structured storage  Almost everything happens in RAM
  • 19. NOSQL Types  Wide Column Store / Column Families  Document Store  Key Value / Tuple Store  Graph Databases  Object Databases  XML Databases  Multivalue Databases
  • 20. Main types -  Key-Value Stores  Map Reduce Framework  Document Databases  Graph Databases
  • 21. Key Value Stores  Lineage: Amazon's Dynamo paper and Distributed HashTables.  Data model: A global collection of key-value pairs  Example systems  Google BigTable , Amazon Dynamo, Cassandra, Voldemort , Hbase , …  Implementation: efficiency, scalability, fault-tolerance  Records distributed to nodes based on key  Replication  Single-record transactions, “eventual consistency”
  • 22. Documented Databases  Lineage: Inspired by Lotus Notes.  Data model: Collections of documents, which contain key-value collections (called "documents").  Example: CouchDB, MongoDB, Riak
  • 23. Graph Database  Lineage: Draws from Euler and graph theory.  Data model: Nodes & relationships, both which can hold key-value pairs  Example: AllegroGraph, InfoGrid, Neo4j
  • 24. Map Reduce Framework  Google’s framework for processing highly distributable problems across huge datasets using a large number of computers  Let’s define large number of computers  Cluster if all of them have same hardware  Grid unless Cluster (if !Cluster for old-style programmers)  Process split into two phases  Map  Take the input, partition it delegate to other machines  Other machines can repeat the process, leading to tree structure  Each machine returns results to the machine who gave it the task
  • 25. Map Reduce Framework contd..  Reduce  collect results from machines you gave the tasks  combine results and return it to requester  Slower than sequential data processing, but massively parallel  Sort petabyte of data in a few hours  Input, Map, Shuffle, Reduce, Output
  • 26. Popular NoSQL  Hadoop / Hbase  MemcacheDB  Cassandra  Voldemort  Amazon  Hypertable SimpleDB  Cloudata  MongoDB  IBM  CouchDB Lotus/Domino  Redis
  • 27. Real World Use  Cassandra  Facebook (original developer, used it till late 2010)  Twitter  Digg  Reddit  Rackspace  Cisco  BigTable  Google (open-source version is HBase)  MongoDB  Foursquare  Craigslist  Bit.ly  SourceForge  GitHub
  • 28. MONGODB  Document store  Basic support for dynamic (ad hoc) queries  Query by example (nice!)  Conditional Operators  <, <=, >, >=  $all, $exists, $mod, $ne, $in, $nin, $nor, $or, $and, $si ze, $type
  • 29. MONGODB  Data is stored as BSON (binary JSON)  Makes it very well suited for languages with native JSON support  Map/Reduce written in Javascript  Slow! There is one single thread of execution in Javascript  Master/slave replication (auto failover with replica sets)  Sharding built-in  Uses memory mapped files for data storage  Performance over features  On 32bit systems, limited to ~2.5Gb  An empty database takes up 192Mb  GridFS to store big data + metadata (not actually an FS)
  • 30. CASSANDRA
    - Written in: Java
    - Protocol: custom, binary (Thrift)
    - Tunable trade-offs for distribution and replication (N, R, W)
    - Querying by column, range of keys
    - BigTable-like features: columns, column families
    - Writes are much faster than reads (!)
    - Constant write time regardless of database size
    - Map/reduce possible with Apache Hadoop
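The N, R, W trade-off is usually reasoned about with the quorum rule: with N replicas, if a read consults R of them and a write is acknowledged by W of them, then R + W > N guarantees the two sets share at least one replica holding the latest acknowledged write. The sketch below checks that arithmetic and confirms it by brute force; it is not Cassandra code. In Cassandra itself, N is the replication factor and R and W are chosen per request through consistency levels such as ONE, QUORUM and ALL.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Quorum rule: every read set of size R meets every write set of size W
    exactly when R + W > N (pigeonhole principle)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


def overlap_by_brute_force(n, r, w):
    """Check the same property by trying every possible read set and write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(f"N=3, R={r}, W={w}: overlap guaranteed = {quorums_overlap(3, r, w)}")
```

With N = 3, reading and writing at QUORUM (2 replicas each) satisfies 2 + 2 > 3, while R = W = 1 is faster but may return a stale value.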
  • 31. Some More Info About Cassandra at Facebook
    - Cassandra is an open-source DBMS from the Apache Software Foundation
    - Cassandra provides a structured key-value store with tunable consistency
    - Cassandra is a distributed storage system for managing structured data, designed to scale to a very large size across many commodity servers with no single point of failure
    - It is a NoSQL solution initially developed by Facebook, where it powered the Inbox Search feature until late 2010
  • 32. HBASE
    - Written in: Java
    - Main point: billions of rows × millions of columns
    - Modeled after BigTable
    - Map/reduce with Hadoop
    - Query predicate pushdown via server-side scan and get filters
    - Optimizations for real-time queries
    - A high-performance Thrift gateway
    - HTTP interface supports XML, Protobuf, and binary
    - Cascading, Hive, and Pig source and sink modules
    - No single point of failure
    - While Hadoop streams data efficiently, it has overhead for starting map/reduce jobs; HBase is a column-oriented key/value store that allows low-latency reads and writes
    - Random-access performance is comparable to MySQL
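The BigTable-style data model (rows sorted by key, columns named `family:qualifier`, timestamped cell versions) and the idea of pushing a filter into a scan can be sketched in memory. The `SortedTable` class and its methods are hypothetical and do not correspond to the HBase client API; the sample rows are made up.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy model of a BigTable/HBase-style table (hypothetical, not the HBase client API).
    Rows are kept sorted by row key; each row maps 'family:qualifier' to timestamped versions."""

    def __init__(self):
        self.row_keys = []  # sorted list of row keys, enabling range scans
        self.rows = {}      # row key -> {column: [(timestamp, value), ...], newest first}

    def put(self, row_key, column, value, timestamp):
        if row_key not in self.rows:
            insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)  # newest version first

    def get(self, row_key, column):
        """Return the newest value of one cell, or None if it does not exist."""
        versions = self.rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, stop_row, row_filter=None):
        """Yield (row key, latest cells) for start_row <= key < stop_row.
        The filter runs inside the table before rows are returned,
        which is the idea behind server-side filter pushdown."""
        i = bisect_left(self.row_keys, start_row)
        while i < len(self.row_keys) and self.row_keys[i] < stop_row:
            key = self.row_keys[i]
            latest = {col: versions[0][1] for col, versions in self.rows[key].items()}
            if row_filter is None or row_filter(latest):
                yield key, latest
            i += 1


t = SortedTable()
t.put("user#001", "info:name", "Ann", timestamp=1)
t.put("user#002", "info:name", "Ben", timestamp=1)
t.put("user#002", "info:city", "Oslo", timestamp=2)
t.put("user#003", "info:name", "Cy", timestamp=1)
t.put("user#001", "info:name", "Anna", timestamp=5)
print(t.get("user#001", "info:name"))  # Anna
print(list(t.scan("user#001", "user#003", lambda row: "info:city" in row)))
# [('user#002', {'info:name': 'Ben', 'info:city': 'Oslo'})]
```

Filtering inside `scan` means rows that fail the predicate are never sent back to the caller, which is what saves network traffic when the real filter runs on the region server.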
  • 33. COUCHDB
    - Written in: Erlang
    - Main point: DB consistency, ease of use
    - Bi-directional (!) replication, continuous or ad hoc, with conflict detection, enabling master-master replication (!)
    - MVCC: write operations do not block reads
    - Previous versions of documents are available
    - Crash-only (reliable) design
    - Needs compacting from time to time
    - Views: embedded map/reduce
    - Formatting views: lists & shows
    - Server-side document validation possible
    - Authentication possible
    - Real-time updates via _changes (!)
    - Attachment handling
    - CouchApps (standalone JS apps)
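MVCC with revision-based conflict detection, retained old versions and compaction can be sketched as follows. This is a simplified model, not CouchDB's implementation: revisions here are plain integers (real CouchDB revisions look like `2-<hash>`), and `MVCCStore` and its methods are hypothetical. In CouchDB a stale update is rejected with HTTP 409 Conflict, which `RevisionConflict` stands in for.

```python
class RevisionConflict(Exception):
    """Raised when an update is based on a stale revision (CouchDB answers HTTP 409 here)."""


class MVCCStore:
    """Toy document store illustrating CouchDB-style optimistic concurrency."""

    def __init__(self):
        self.history = {}  # doc id -> list of (revision, body), oldest first

    def get(self, doc_id, rev=None):
        """Reads never wait for writers: they see an immutable (revision, body) snapshot."""
        versions = self.history[doc_id]
        if rev is None:
            return versions[-1]
        for r, body in versions:
            if r == rev:
                return r, body
        raise KeyError(f"revision {rev} of {doc_id} is gone (compacted?)")

    def put(self, doc_id, body, rev=None):
        """A write must name the revision it is based on; stale writes are rejected."""
        versions = self.history.get(doc_id, [])
        current = versions[-1][0] if versions else None
        if rev != current:
            raise RevisionConflict(f"{doc_id}: expected rev {current}, got {rev}")
        new_rev = 1 if current is None else current + 1
        self.history.setdefault(doc_id, []).append((new_rev, dict(body)))
        return new_rev

    def compact(self):
        """Compaction discards old revisions, keeping only the latest of each document."""
        for doc_id, versions in self.history.items():
            self.history[doc_id] = versions[-1:]


db = MVCCStore()
r1 = db.put("doc1", {"title": "draft"})
r2 = db.put("doc1", {"title": "final"}, rev=r1)
try:
    db.put("doc1", {"title": "stale edit"}, rev=r1)  # based on an outdated revision
    conflict_detected = False
except RevisionConflict:
    conflict_detected = True
print(conflict_detected, db.get("doc1"), db.get("doc1", rev=1))
```

Because writers append new revisions instead of overwriting in place, readers never block; the cost is that old revisions pile up until compaction removes them.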
  • 34. HADOOP
    - Apache project
    - A framework that allows for the distributed processing of large data sets across clusters of computers
    - Designed to scale up from single servers to thousands of machines
    - Designed to detect and handle failures at the application layer, instead of relying on hardware for it
    - Created by Doug Cutting, who named it after his son's toy elephant
    - Hadoop-related projects
      - Cassandra
      - HBase
      - Pig
    - Hive was a Hadoop subproject, but is now a top-level Apache project
  • 35. HADOOP contd.
    - Scales to hundreds or thousands of computers, each with several processor cores
    - Designed to efficiently distribute large amounts of work across a set of machines
    - Hundreds of gigabytes of data constitute the low end of Hadoop scale
    - Built to process "web-scale" data on the order of hundreds of gigabytes to terabytes or petabytes
    - Uses Java, but allows streaming so other languages can easily send and accept data items to/from Hadoop
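Hadoop Streaming lets any executable act as mapper or reducer: records travel over stdin/stdout as text lines, and by default the text before the first tab is the key. The sketch below shows that contract for a per-key maximum; the input records are made up, and `sorted()` stands in for the sort Hadoop performs between the phases. In a real job, `mapper` and `reducer` would live in separate scripts reading `sys.stdin` and printing their output.

```python
from itertools import groupby


def mapper(lines):
    """Turn made-up input records 'year,temperature' into 'key<TAB>value' lines."""
    for line in lines:
        year, temperature = line.strip().split(",")
        yield f"{year}\t{temperature}"


def reducer(sorted_lines):
    """Input arrives sorted by key, so equal keys are adjacent and can be grouped."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(value) for _, value in group)}"


records = ["1950,0", "1950,22", "1949,111", "1950,-11", "1949,78"]
shuffled = sorted(mapper(records))  # stand-in for Hadoop's sort-by-key between phases
print(list(reducer(shuffled)))  # ['1949\t111', '1950\t22']
```

The reducer never holds more than one key's values at a time; it relies entirely on the framework delivering its input sorted by key.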
  • 36. HADOOP contd.
    - Uses a distributed file system (HDFS)
      - Designed to hold very large amounts of data (terabytes or even petabytes)
      - Files are stored redundantly across multiple machines to ensure durability against failures and high availability to highly parallel applications
      - Data organized into directories and files
      - Files are divided into blocks (64 MB by default in Hadoop 1.x) and distributed across nodes
      - Design of HDFS is based on the design of the Google File System
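The block arithmetic and the redundancy idea can be sketched as follows: a file is cut into fixed-size blocks (only the last one may be partial) and each block is copied to several distinct nodes. The 64 MB block size comes from the slide and 3 is the HDFS default replication factor; the round-robin placement and the node names are assumptions for illustration only, since real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 64       # default block size quoted on the slide (Hadoop 1.x)
REPLICATION_FACTOR = 3   # HDFS default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return block sizes; every block is full except possibly the last."""
    count = math.ceil(file_size_mb / block_size_mb)
    return [min(block_size_mb, file_size_mb - i * block_size_mb) for i in range(count)]


def place_replicas(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Naive round-robin placement: each block's replicas land on distinct nodes.
    Real HDFS is rack-aware; this only illustrates the redundancy idea."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200)
print(blocks)  # [64, 64, 64, 8]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Losing any single node here still leaves at least two copies of every block, which is what lets parallel readers keep working through machine failures.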
  • 37. HIVE
    - A petabyte-scale data warehouse system for Hadoop
    - Easy data summarization, ad hoc queries
    - Query the data using a SQL-like language called HiveQL
    - The Hive compiler generates map-reduce jobs for most queries
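To show how a SQL-like aggregation can become a map-reduce job, here is a hand-written translation of one GROUP BY query. The table, columns and rows are hypothetical, and this illustrates the general idea only; it is not the plan Hive's compiler actually emits, which may add map-side partial aggregation and other optimizations.

```python
from collections import defaultdict

# Conceptual translation of a HiveQL query over a hypothetical table:
#   SELECT dept, AVG(salary) FROM employees GROUP BY dept;


def map_row(row):
    """Map: emit the GROUP BY column as key and the aggregated column as value."""
    return row["dept"], row["salary"]


def reduce_group(dept, salaries):
    """Reduce: compute AVG over all values that share a key."""
    return dept, sum(salaries) / len(salaries)


def run_query(rows):
    groups = defaultdict(list)  # shuffle: group mapped values by key
    for key, value in map(map_row, rows):
        groups[key].append(value)
    return sorted(reduce_group(k, v) for k, v in groups.items())


employees = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "ops", "salary": 90},
]
print(run_query(employees))  # [('eng', 110.0), ('ops', 90.0)]
```

The GROUP BY column becomes the shuffle key and the aggregate function becomes the reducer, which is why many HiveQL queries map naturally onto a single map-reduce pass.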
  • 38. Conclusion
    - NoSQL is a great problem solver if you need it
    - Choose your NoSQL platform carefully, as each is designed for a specific purpose
    - Get used to Map/Reduce
    - It's not a sin to use NoSQL alongside a (yes)SQL database