Big Data Schema Design

               Deepak
Overview
• Schema design is vital for performance.
• Keywords: non-relational, NoSQL, distributed
• Underlying file systems: GFS, HDFS
• Examples: Hadoop, GFS, HBase, Bigtable, etc.
• Example deployments: Facebook, Walmart, etc.
When to use
• Typically for systems with hundreds of millions or billions of rows
• Data volumes on the order of hundreds or thousands of terabytes
• No advanced query language is needed
• Typed columns and other RDBMS features are not needed
Hadoop Architecture
Hadoop Ecosystem
HBase Architecture
Overview
• HBase runs on top of HDFS.
• HDFS was chosen for its fault tolerance, checksumming, and failover properties.
• Clients use the native Java client or the REST API.
• The Master manages the cluster; Region Servers manage the data.
HBase Data Model
• Table: design-time namespace; has many rows
• Row: atomic key/value container with one row key
• Column Family: divides columns into physical files
• Column: a key in the key/value container inside a row
• Timestamp: a long (milliseconds); versions are sorted in descending order
• Value: a time-versioned value in the key/value container
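The data model above can be pictured as a nested map: row key, then column family, then column, then timestamp. The following is a minimal in-memory sketch of that idea, not HBase's API; the class name `ToyTable` and its methods are hypothetical and exist only for illustration.

```python
class ToyTable:
    """Toy model: row key -> column family -> column -> {timestamp: value}."""

    def __init__(self, families):
        # Column families are fixed when the table is designed.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, family, column, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cell = (self.rows.setdefault(row_key, {})
                         .setdefault(family, {})
                         .setdefault(column, {}))
        cell[ts] = value  # each timestamp keeps its own version

    def get(self, row_key, family, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        cell = self.rows.get(row_key, {}).get(family, {}).get(column, {})
        return sorted(cell.items(), reverse=True)[:versions]

    def scan(self, start_row, stop_row):
        """Return row keys in sorted order; start inclusive, stop exclusive."""
        return [key for key in sorted(self.rows) if start_row <= key < stop_row]


table = ToyTable(families=["info"])
table.put("user1", "info", "email", "old@example.com", ts=100)
table.put("user1", "info", "email", "new@example.com", ts=200)
newest = table.get("user1", "info", "email")
```

A plain `get` returns only the newest version because versions are ordered by descending timestamp; older versions remain available by asking for more than one.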
Distribution
More distribution
Thoughts on the logical view
• The unit of scalability is the region.
• Rows are not tied to a server; their regions may be moved around for load balancing.
• Add nodes so that no node holds too many regions.
• Too many regions per node works against distribution.
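As a rough illustration of why rows are not tied to a server, the sketch below treats each region as a contiguous range of sorted row keys and keeps the region-to-server assignment in a separate map. The split keys and server names are invented for the example, and the lookup is a simplification rather than HBase's actual mechanism.

```python
import bisect

# Hypothetical split points: region 0 holds keys below "g", region 1 holds
# keys from "g" up to (not including) "n", and so on.
SPLIT_KEYS = ["g", "n", "t"]
REGION_TO_SERVER = {0: "rs1", 1: "rs2", 2: "rs1", 3: "rs3"}


def region_for(row_key, split_keys=SPLIT_KEYS):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(split_keys, row_key)


def server_for(row_key, assignment=REGION_TO_SERVER):
    """Return the server currently hosting the region of row_key."""
    return assignment[region_for(row_key)]


# Moving a region changes only the assignment, never the row's region.
rebalanced = dict(REGION_TO_SERVER)
rebalanced[1] = "rs3"
```

Because the row-to-region mapping depends only on the key, load balancing can reassign a region to another server without touching the rows themselves.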
Column Family
• Each column family is a physical storage unit (a directory).
• Data that is queried together should be stored together.
• Features such as compression can be enabled per column family.
Bloom Filter
• Generated automatically when an HFile is written to disk
• Held in main memory
• Contains row keys
• The column key (CK) can be stored together with the row key (RK), but that may overload memory.
• Can only filter on what it stores.
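To make the idea concrete, here is a minimal Bloom filter sketch over row keys. It is a generic textbook construction, not HBase's implementation; the class name, bit count, and hash scheme are assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a seed prefix.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


row_filter = BloomFilter()
for row_key in ["row-1", "row-2"]:
    row_filter.add(row_key)
```

A negative answer lets a read skip the file entirely, while a positive answer still requires looking in the file. Adding row-plus-column entries instead of row keys alone means many more entries, which is why that option costs more memory.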
Physical View
Key Cardinality
Tall vs Fat Tables
• Fat tables: large amounts of data in each column.
• Tall tables: a large number of rows.
• Tall is good for searches or scans.
• Fat is good for fetches or gets.
• Rows don’t split: a single row is never divided across regions.
• Atomicity is guaranteed only at the row level; when an entity is spread over several rows through compound keys, updates across those rows are not atomic.
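The two layouts can be compared on the same data. The sketch below assumes that "fat" means few rows with many cells each and "tall" means many rows keyed by a compound key; the message data, the `#` separator, and the function names are hypothetical.

```python
def fat_layout(messages):
    """One row per user; each message becomes a column in that row."""
    rows = {}
    for user, msg_id, body in messages:
        rows.setdefault(user, {})[msg_id] = body
    return rows


def tall_layout(messages):
    """One row per message; user and message id form a compound row key."""
    return {f"{user}#{msg_id}": {"body": body} for user, msg_id, body in messages}


def prefix_scan(rows, prefix):
    """Return the sorted row keys that start with prefix."""
    return [key for key in sorted(rows) if key.startswith(prefix)]


messages = [
    ("alice", "0001", "hi"),
    ("alice", "0002", "bye"),
    ("bob", "0001", "yo"),
]
fat = fat_layout(messages)    # a single get on "alice" returns all her messages
tall = tall_layout(messages)  # a prefix scan on "alice#" returns her rows
```

In the fat layout all of a user's messages live in one row, so they are fetched with one get and updated atomically. In the tall layout they are separate rows, so they are read with a scan and an update touching several of them is not atomic.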
Key Design
• Sequential keys, e.g. a timestamp as the key
• Sequential keys keep hotspotting a single region.
• Salting: prefix the key to distribute the records
• Field promotion
• Random keys
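A minimal sketch of two of these options follows. It assumes salting means prepending a bucket prefix derived from a hash of the key, and field promotion means placing another field ahead of the timestamp; the bucket count, key format, and function names are illustrative, not prescribed by the slides.

```python
import hashlib

NUM_BUCKETS = 4  # assumed number of salt buckets


def salted_key(key, num_buckets=NUM_BUCKETS):
    """Prefix the key with a bucket number derived from its hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}-{key}"


def promoted_key(device_id, timestamp):
    """Field promotion: put another field in front of the timestamp."""
    return f"{device_id}-{timestamp}"


# Sequential timestamps no longer sort next to each other once salted.
timestamps = [str(1700000000000 + i) for i in range(1000)]
prefixes = {salted_key(ts)[:2] for ts in timestamps}
```

Because the prefix is computed from the key itself, a single get can recompute it. The trade-off is that a scan over a time range has to be issued once per bucket.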
Key Design Performance
Summary
• Think twice before you decide on NoSQL technologies.
• Avoid hotspots.
• Store values in appropriate places.
• Choose the right keys.
• Store inferences in an RDBMS if necessary.
Visit us:

Facebook: http://www.facebook.com/QBurst
Twitter: http://twitter.com/qburst
Google+: https://plus.google.com/+qburst/posts
LinkedIn: http://www.linkedin.com/company/qburst
YouTube: http://www.youtube.com/QBurstVideos

www.qburst.com
