SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Needle Meet HaystackAdapting your data models for Cassandra Gary Dusbabek  •  Rackspace•  ICOODB 2010
Outline First Things First Column Families Trade Offs Procedures & Best Practices Internals
It’s all about scalability
We can all be friends
Column Families
2.TradeOffs
No Transactions
No  Adhoc Queries
No Joins
No Flexible Indexes
Don’t Panic!
Scalability Availability Replication & Backup
3. Procedures & Practices
Relational Way Define entities Normalize Identify Many-to-many Query any way you want
How Come? Scarcity Efficiency
Cassandra Way Know your app Queries first Denormalize
Know Your App
Queries First
Nobody is Normal
Relational Example
Column Family Example
Column Family Example
Column Family Example
Column Family Example
Does it feel strange?
4. Internals
Sequential Writes Always
Consistency Level
Partitioning
Slices Data Locality
Summary ,[object Object]
ColumnFamilies != Relational tables
Trade-offs: you win some, you lose some

Weitere ähnliche Inhalte

Andere mochten auch (6)

Docker and CloudStack
Docker and CloudStackDocker and CloudStack
Docker and CloudStack
 
CloudStack Architecture
CloudStack ArchitectureCloudStack Architecture
CloudStack Architecture
 
Advanced excel 2010 & 2013 updated Terrabiz
Advanced excel 2010 & 2013 updated TerrabizAdvanced excel 2010 & 2013 updated Terrabiz
Advanced excel 2010 & 2013 updated Terrabiz
 
Key-Value Stores: a practical overview
Key-Value Stores: a practical overviewKey-Value Stores: a practical overview
Key-Value Stores: a practical overview
 
Hbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databasesHbase: Introduction to column oriented databases
Hbase: Introduction to column oriented databases
 
Sql queries with answers
Sql queries with answersSql queries with answers
Sql queries with answers
 

Ähnlich wie Data Modeling with Cassandra Column Families

Growing Up MongoDB
Growing Up MongoDBGrowing Up MongoDB
Growing Up MongoDB
MongoDB
 
Finding Love with MongoDB
Finding Love with MongoDBFinding Love with MongoDB
Finding Love with MongoDB
MongoDB
 

Ähnlich wie Data Modeling with Cassandra Column Families (20)

mongo db EMERSON EDUARDO RODRIGUES
mongo db EMERSON EDUARDO RODRIGUESmongo db EMERSON EDUARDO RODRIGUES
mongo db EMERSON EDUARDO RODRIGUES
 
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
Apache Cassandra Interview Questions and Answers | Cassandra Tutorial | Cassa...
 
Growing Up MongoDB
Growing Up MongoDBGrowing Up MongoDB
Growing Up MongoDB
 
Scaling Databases On The Cloud
Scaling Databases On The CloudScaling Databases On The Cloud
Scaling Databases On The Cloud
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
 
Neo4j Presentation
Neo4j PresentationNeo4j Presentation
Neo4j Presentation
 
Thinking about graphs
Thinking about graphsThinking about graphs
Thinking about graphs
 
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_WilkinsMongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
 
Cassandra via-docker
Cassandra via-dockerCassandra via-docker
Cassandra via-docker
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
Performance By Design
Performance By DesignPerformance By Design
Performance By Design
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
The Key to Keys - Database Design
The Key to Keys - Database DesignThe Key to Keys - Database Design
The Key to Keys - Database Design
 
Finding Love with MongoDB
Finding Love with MongoDBFinding Love with MongoDB
Finding Love with MongoDB
 
Scaling AWS With Scalr
Scaling AWS With ScalrScaling AWS With Scalr
Scaling AWS With Scalr
 
Enterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison PillEnterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison Pill
 
NATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platformsNATS - A new nervous system for distributed cloud platforms
NATS - A new nervous system for distributed cloud platforms
 
PandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled EnsemblesPandasUDFs: One Weird Trick to Scaled Ensembles
PandasUDFs: One Weird Trick to Scaled Ensembles
 
Scaling Cloud Web & Data Technologies
Scaling Cloud Web & Data TechnologiesScaling Cloud Web & Data Technologies
Scaling Cloud Web & Data Technologies
 

Mehr von gdusbabek

Rackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYCRackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYC
gdusbabek
 
Cassandra Codebase 2011
Cassandra Codebase 2011Cassandra Codebase 2011
Cassandra Codebase 2011
gdusbabek
 

Mehr von gdusbabek (15)

My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
 
How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014
 
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014
 
Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013
 
Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013
 
Rackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYCRackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYC
 
Austin cassandra meetup
Austin cassandra meetupAustin cassandra meetup
Austin cassandra meetup
 
How Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses CassandraHow Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses Cassandra
 
Breaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL DatastoresBreaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL Datastores
 
Building Rackspace Cloud Monitoring
Building Rackspace Cloud MonitoringBuilding Rackspace Cloud Monitoring
Building Rackspace Cloud Monitoring
 
Cassandra Codebase 2011
Cassandra Codebase 2011Cassandra Codebase 2011
Cassandra Codebase 2011
 
Getting to Know the Cassandra Codebase
Getting to Know the Cassandra CodebaseGetting to Know the Cassandra Codebase
Getting to Know the Cassandra Codebase
 
Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)
 
Cassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUGCassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUG
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Data Modeling with Cassandra Column Families

Hinweis der Redaktion

  1. It’s not about one data model vs. another.It’s not about one storage engine vs. another.Cassandra excels at replicating data and achieving high sustained write throughput.
  2. The right tool for the right job
  3. Shaped by distribution model
  4. Shaped by distribution model
  5. Shaped by distribution model
  6. Sparse – do not have to exist in every row.
  7. Flexible column namingYou define the sort orderNot required to have a specific column just because another row does
  8. Look familiar?
  9. Arise because of distribution model, not CF model.
  10. * Atomic @ CF row. Not isolated.* Large trans apps push down to node (shared nothing)* Guaranteeing ACID constraints across nodes is a hard problem.
  11. OTOH, you do get a lot of things:Data redundancyVery fast writes, fast reads
  12. Relational>formally defined>correctQuery first>not formally defined>somehow incorrectYou get some things in exchange:ScalabilityAvailabilityReplication
  13. Relational>formally defined>correctQuery first>not formally defined>somehow incorrectYou get some things in exchange:ScalabilityAvailabilityReplication
  14. Focus on query & analysis.B+treesUpdate once*Cassandra typically becomes IO bound before becoming CPU bound.
  15. Not set in stone.Your application may require a different approach.
  16. Recognize non-starters: Is my dataset going to become Very Large? Will I need to sustain high write throughput?Also, what are the common operations? Optimize CFs for those operations.
  17. *columns sorted. Choose keys and columns.you need to think about how you plan to slice your data.Related data is close to reduce io
  18. DenormalizeUse the disk.Don’t be afraid to create another CF that duplicates some data.
  19. Composite column namesPainful updates of denormalized partsFast reads & insertions
  20. Key
  21. Normal attributes
  22. Composite column names.Pulling in relationshipsPainful updates. Denormalization is best when data doesn’t change.
  23. Commit log – separate diskMemtableSstable