Cassandra



Tuesday, February 22, 2011               1
Operational Data Store
                                          Initial Requirements
                                                 (Late 2007)




                    • One big data security aggregator from
                             multiple sources using Morningstar global
                             security identifier
                    • Highly scalable both horizontally and
                             vertically
                    • Easy to distribute computation processing
                    • Easy to store various types of data
MySQL
                                         Initial Implementation
                                                  (2008)



                    •        One database on one big database server

                    •        Very simple data model - one table per source
                             with a simple key (Morningstar ID and date)

                    •        Tables were manually replicated with complicated
                             logic

                    •        Tables stored data as binary blobs

                    •        No indexing on the tables other than the primary
                             key(s)
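
The 2008 data model above can be sketched roughly as follows. This is a minimal illustration, not the actual Morningstar schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, the column names, and the JSON encoding of the blob are all assumptions.

```python
import json
import sqlite3


def _check_source(source):
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not source.isidentifier():
        raise ValueError(f"unsafe table name: {source!r}")
    return source


def create_source_table(conn, source):
    """Create one table per source, keyed by (Morningstar ID, date)."""
    table = _check_source(source)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "morningstar_id TEXT NOT NULL, "
        "as_of_date TEXT NOT NULL, "
        "payload BLOB NOT NULL, "  # opaque binary blob, not queryable by field
        "PRIMARY KEY (morningstar_id, as_of_date))"  # the only index
    )


def put(conn, source, morningstar_id, as_of_date, record):
    """Serialize a record to bytes and store it under the composite key."""
    table = _check_source(source)
    blob = json.dumps(record).encode("utf-8")
    conn.execute(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?)",
        (morningstar_id, as_of_date, blob),
    )


def get(conn, source, morningstar_id, as_of_date):
    """Fetch and decode one record by primary key, or return None."""
    table = _check_source(source)
    row = conn.execute(
        f"SELECT payload FROM {table} "
        "WHERE morningstar_id = ? AND as_of_date = ?",
        (morningstar_id, as_of_date),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))
```

Because the payload is opaque to the database, every lookup has to go through the primary key, which matches the "no indexing other than the primary key(s)" point above.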


MySQL Tables




What worked?

                    • Great interface to query the data
                    • Very stable system
                    • Simple data model meant high
                             efficiency for queries
                    • Great memory usage
What did not work
                    •         Hard to implement Map-Reduce

                    •         Hard to increase capacity with data growth

                    •         Multi-site replication slow and somewhat
                              complicated

                    •         Limited number of columns and rows per table
                             - Did manual table partitioning to keep under 2 million records per table
                             - Table per source to keep column count down, and to not have sparsely
                               populated rows
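
The slides do not say how records were assigned to the manually partitioned tables. One plausible scheme, shown here purely as an assumption, is to hash the Morningstar ID into a fixed number of tables per source. The function names and the `_pNNN` table suffix are hypothetical.

```python
import zlib

# Per-table ceiling quoted on the slide.
MAX_ROWS_PER_TABLE = 2_000_000


def partitions_needed(expected_rows, max_rows=MAX_ROWS_PER_TABLE):
    """Smallest number of tables that keeps the average table under max_rows."""
    return max(1, -(-expected_rows // max_rows))  # ceiling division


def partition_table(source, morningstar_id, num_partitions):
    """Map a key to one of num_partitions tables using a stable hash."""
    bucket = zlib.crc32(morningstar_id.encode("utf-8")) % num_partitions
    return f"{source}_p{bucket:03d}"
```

Hashing only balances the tables on average, and changing `num_partitions` moves most keys to a different table, which is one way to see why growing capacity was hard under this design.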




Cassandra
                                         Current Implementation
                                                  (2010)




                    • 5-machine cluster
                             •   In-house VMs on blade farm

                             •   4 cores, 8 GB RAM per node

                    • Column families based on access type not
                             source
                    • Manual indexing of data unit type to key(s)
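
The manual index from data unit type to key(s) can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not Cassandra client code: `ColumnFamily`, `write_data`, and `keys_with_unit` are hypothetical names, and a real deployment would issue the same two writes through a Cassandra client.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy stand-in for a column family: row key -> {column name -> value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def insert(self, row_key, columns):
        # Rows are sparse: only the columns actually supplied are stored.
        self.rows[row_key].update(columns)

    def get(self, row_key):
        # Return a copy so callers cannot mutate stored state.
        return dict(self.rows.get(row_key, {}))


def write_data(store, index, morningstar_id, units):
    """Write the data row, then record each data unit type in the index."""
    store.insert(morningstar_id, units)
    for unit_type in units:
        # Index row key = data unit type; column name = data row key.
        index.insert(unit_type, {morningstar_id: ""})


def keys_with_unit(index, unit_type):
    """Return all data row keys known to carry the given data unit type."""
    return sorted(index.get(unit_type))
```

The index is just another column family whose column names are the row keys of the data, so every write to the data has to be paired with a write to the index by the application.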

Cassandra Column Families
                             Data




Cassandra Column Families
                             Time Series Data




What works?
                    •        Very easy to query when the keys are known (normal use)

                    •        Very scalable: just add more nodes, even at a later point in
                             time

                    •        Multi-site replication is easy

                    •        Basically unlimited number of columns per column family

                    •        Unlimited number of rows per column family

                    •        Sparse rows don’t waste space

                    •        Disaster recovery automatically taken care of by multi-site
                             redundancy



What is hard
                    •        Arbitrary queries are difficult.

                             •   Had to create our own indexes to go from data
                                 unit type back to key (can’t select where != NULL)

                             •   Need to add extra indexes and/or de-normalized
                                 column families when we think of a new way that
                                 we want to query the data

                    •        Monitoring a cluster is harder than one server

                    •        Getting memory usage settings correct so that nodes
                             don’t die with OOM errors
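
The cost of a new query pattern can be illustrated with a small sketch. Without a server-side "where column != NULL", the filter has to run client-side over every row once, and the result is stored as a new index. The `build_index` helper and the plain-dict row layout are assumptions for illustration only.

```python
def build_index(rows, column_name):
    """Scan every row once and group row keys by the value of column_name.

    rows maps row key -> {column name -> value}. Rows that lack the column
    are skipped, which is the client-side version of "where column != NULL".
    """
    index = {}
    for row_key, columns in rows.items():
        if column_name in columns:
            index.setdefault(columns[column_name], set()).add(row_key)
    return index
```

After the one-time scan, lookups by that column go through the new index, but every later write must also update it, which is the ongoing price of each extra index or de-normalized column family.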


Future Plans


                    • Upgrade to Cassandra 0.7
                    • Expand cluster to multiple data centers
                             around the globe





Cassandra at Morningstar (Feb 2011)
