SlideShare ist ein Scribd-Unternehmen logo
1 von 12
Downloaden Sie, um offline zu lesen
Cassandra



Tuesday, February 22, 2011               1
Operational Data Store
                                          Initial Requirements
                                                 (Late 2007)




                    • On big data security aggregator from
                             multiple sources using Morningstar global
                             security identifier
                    • Highly scalable both horizontally and
                             vertically
                    • Easy to distribute computation processing
                    • Easy to store various types of data
Tuesday, February 22, 2011                                               2
MySQL
                                         Initial Implementation
                                                  (2008)



                    •        One database on one big database server

                    •        Very simple data model - one table per source
                             with a simple key (Morningstar ID and date)

                    •        Tables were manually replicated with complicated
                             logic

                    •        Tables stored data as binary blobs

                    •        No indexing on the tables other than the primary
                             key(s)


Tuesday, February 22, 2011                                                      3
MySQL Tables




Tuesday, February 22, 2011                  4
What worked?

                    • Great interface to query the data
                    • Very stable system
                    • Simple data model meant high
                             efficiency for queries
                    • Great memory usage
Tuesday, February 22, 2011                                5
What did not work
                    •         Hard to implement Map-Reduce

                    •         Hard to increase capacity with data growth

                    •         Multi-site replication slow and somewhat
                              complicated

                    •         Limited number of columns and rows per table
                             - Did manual table partitioning to keep under 2 million records per table
                             - Table per source to keep column count down, and to not have sparsely
                               populated rows




Tuesday, February 22, 2011                                                                               6
Cassandra
                                         Current Implementation
                                                  (2010)




                    • 5 Machine Cluster
                             •   In house VMs on blade farm

                             •   4 cores, 8 GB ram per node

                    • Column families based on access type not
                             source
                    • Manual indexing of data unit type to key(s)

Tuesday, February 22, 2011                                          7
Cassandra Column Families
                             Data




Tuesday, February 22, 2011          8
Cassandra Column Families
                             Time Series Data




Tuesday, February 22, 2011                      9
What works?
                    •        Very easy to query when the keys are known (normal use)

                    •        Very scalable, just add more nodes, even at a later point in
                             time.

                    •        Multi-site replication is easy

                    •        Basically unlimited number of columns per column family

                    •        Unlimited number of rows per column family

                    •        Sparse rows don’t waste space

                    •        Disaster recovery automatically taken care of by multi-site
                             redundancy



Tuesday, February 22, 2011                                                                  10
What is hard
                    •        Arbitrary queries are dificult.

                             •   Had to create our own indexes to go from data
                                 unit type back to key (can’t select where != NULL)

                             •   Need to add extra indexes and/or de-normalized
                                 column families when we think of a new way that
                                 we want to query the data

                    •        Monitoring a cluster is harder than one server

                    •        Getting memory usage settings correct so that nodes
                             don’t die with OOM errors


Tuesday, February 22, 2011                                                            11
Future Plans


                    • Upgrade to 0.7
                    • Expand cluster to multiple data centers
                             around the globe




Tuesday, February 22, 2011                                      12

Weitere ähnliche Inhalte

Ähnlich wie Cassandra at Morningstar (Feb 2011)

Membase Meetup - San Diego
Membase Meetup - San DiegoMembase Meetup - San Diego
Membase Meetup - San DiegoMembase
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterMat Keep
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLDaniel Austin
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011jbellis
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessNick Barkas
 
Evan Ellis "Tumblr. Massively Sharded MySQL"
Evan Ellis "Tumblr. Massively Sharded MySQL"Evan Ellis "Tumblr. Massively Sharded MySQL"
Evan Ellis "Tumblr. Massively Sharded MySQL"Alexey Mahotkin
 
Severalnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines
 
MySQL DW Breakfast
MySQL DW BreakfastMySQL DW Breakfast
MySQL DW BreakfastIvan Zoratti
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydbDaniel Austin
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talkSatish Mehta
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012Eonblast
 

Ähnlich wie Cassandra at Morningstar (Feb 2011) (20)

Membase Meetup - San Diego
Membase Meetup - San DiegoMembase Meetup - San Diego
Membase Meetup - San Diego
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
My sql tutorial-oscon-2012
My sql tutorial-oscon-2012My sql tutorial-oscon-2012
My sql tutorial-oscon-2012
 
No sql findings
No sql findingsNo sql findings
No sql findings
 
PayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL ClusterPayPal Big Data and MySQL Cluster
PayPal Big Data and MySQL Cluster
 
A Global In-memory Data System for MySQL
A Global In-memory Data System for MySQLA Global In-memory Data System for MySQL
A Global In-memory Data System for MySQL
 
1 Unix basics. Part 1
1 Unix basics. Part 11 Unix basics. Part 1
1 Unix basics. Part 1
 
Hpts 2011 flexible_oltp
Hpts 2011 flexible_oltpHpts 2011 flexible_oltp
Hpts 2011 flexible_oltp
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Evan Ellis "Tumblr. Massively Sharded MySQL"
Evan Ellis "Tumblr. Massively Sharded MySQL"Evan Ellis "Tumblr. Massively Sharded MySQL"
Evan Ellis "Tumblr. Massively Sharded MySQL"
 
Iwmn architecture
Iwmn architectureIwmn architecture
Iwmn architecture
 
Severalnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines Self-Training: MySQL® Cluster - Part V
Severalnines Self-Training: MySQL® Cluster - Part V
 
MySQL DW Breakfast
MySQL DW BreakfastMySQL DW Breakfast
MySQL DW Breakfast
 
SortaSQL
SortaSQLSortaSQL
SortaSQL
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Yes sql08 inmemorydb
Yes sql08 inmemorydbYes sql08 inmemorydb
Yes sql08 inmemorydb
 
Coding Potpourri: MySQL
Coding Potpourri: MySQLCoding Potpourri: MySQL
Coding Potpourri: MySQL
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talk
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 

Kürzlich hochgeladen

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Cassandra at Morningstar (Feb 2011)

  • 2. Operational Data Store Initial Requirements (Late 2007) • On big data security aggregator from multiple sources using Morningstar global security identifier • Highly scalable both horizontally and vertically • Easy to distribute computation processing • Easy to store various types of data Tuesday, February 22, 2011 2
  • 3. MySQL Initial Implementation (2008) • One database on one big database server • Very simple data model - one table per source with a simple key (Morningstar ID and date) • Tables were manually replicated with complicated logic • Tables stored data as binary blobs • No indexing on the tables other than the primary key(s) Tuesday, February 22, 2011 3
  • 5. What worked? • Great interface to query the data • Very stable system • Simple data model meant high efficiency for queries • Great memory usage Tuesday, February 22, 2011 5
  • 6. What did not work • Hard to implement Map-Reduce • Hard to increase capacity with data growth • Multi-site replication slow and somewhat complicated • Limited number of columns and rows per table - Did manual table partitioning to keep under 2 million records per table - Table per source to keep column count down, and to not have sparsely populated rows Tuesday, February 22, 2011 6
  • 7. Cassandra Current Implementation (2010) • 5 Machine Cluster • In house VMs on blade farm • 4 cores, 8 GB ram per node • Column families based on access type not source • Manual indexing of data unit type to key(s) Tuesday, February 22, 2011 7
  • 8. Cassandra Column Families Data Tuesday, February 22, 2011 8
  • 9. Cassandra Column Families Time Series Data Tuesday, February 22, 2011 9
  • 10. What works? • Very easy to query when the keys are known (normal use) • Very scalable, just add more nodes, even at a later point in time. • Multi-site replication is easy • Basically unlimited number of columns per column family • Unlimited number of rows per column family • Sparse rows don’t waste space • Disaster recovery automatically taken care of by multi-site redundancy Tuesday, February 22, 2011 10
  • 11. What is hard • Arbitrary queries are dificult. • Had to create our own indexes to go from data unit type back to key (can’t select where != NULL) • Need to add extra indexes and/or de-normalized column families when we think of a new way that we want to query the data • Monitoring a cluster is harder than one server • Getting memory usage settings correct so that nodes don’t die with OOM errors Tuesday, February 22, 2011 11
  • 12. Future Plans • Upgrade to 0.7 • Expand cluster to multiple data centers around the globe Tuesday, February 22, 2011 12