Cassandra at Morningstar (Feb 2011)

•

1 gefällt mir•544 views

jeremiahdjordan

Presenation from Cassandra Chicago Meetup on 2/23/2011. http://www.meetup.com/Cassandra-Chicago/events/15981897/

Technologie

Operational Data Store
Initial Requirements
(Late 2007)

• On big data security aggregator from
multiple sources using Morningstar global
security identiﬁer
• Highly scalable both horizontally and
vertically
• Easy to distribute computation processing
• Easy to store various types of data
Tuesday, February 22, 2011 2

MySQL
Initial Implementation
(2008)

• One database on one big database server

• Very simple data model - one table per source
with a simple key (Morningstar ID and date)

• Tables were manually replicated with complicated
logic

• Tables stored data as binary blobs

• No indexing on the tables other than the primary
key(s)

Tuesday, February 22, 2011 3

MySQL Tables

Tuesday, February 22, 2011 4

What worked?

• Great interface to query the data
• Very stable system
• Simple data model meant high
efﬁciency for queries
• Great memory usage
Tuesday, February 22, 2011 5

What did not work
• Hard to implement Map-Reduce

• Hard to increase capacity with data growth

• Multi-site replication slow and somewhat
complicated

• Limited number of columns and rows per table
- Did manual table partitioning to keep under 2 million records per table
- Table per source to keep column count down, and to not have sparsely
populated rows

Tuesday, February 22, 2011 6

Cassandra
Current Implementation
(2010)

• 5 Machine Cluster
• In house VMs on blade farm

• 4 cores, 8 GB ram per node

• Column families based on access type not
source
• Manual indexing of data unit type to key(s)

Tuesday, February 22, 2011 7

Cassandra Column Families
Data

Tuesday, February 22, 2011 8

Cassandra Column Families
Time Series Data

Tuesday, February 22, 2011 9

What works?
• Very easy to query when the keys are known (normal use)

• Very scalable, just add more nodes, even at a later point in
time.

• Multi-site replication is easy

• Basically unlimited number of columns per column family

• Unlimited number of rows per column family

• Sparse rows don’t waste space

• Disaster recovery automatically taken care of by multi-site
redundancy

Tuesday, February 22, 2011 10

What is hard
• Arbitrary queries are diﬁcult.

• Had to create our own indexes to go from data
unit type back to key (can’t select where != NULL)

• Need to add extra indexes and/or de-normalized
column families when we think of a new way that
we want to query the data

• Monitoring a cluster is harder than one server

• Getting memory usage settings correct so that nodes
don’t die with OOM errors

Tuesday, February 22, 2011 11

Future Plans

• Upgrade to 0.7
• Expand cluster to multiple data centers
around the globe

Tuesday, February 22, 2011 12

Weitere ähnliche Inhalte

Ähnlich wie Cassandra at Morningstar (Feb 2011)

Membase Meetup - San DiegoMembase

Data Modeling for NoSQLTony Tam

My sql tutorial-oscon-2012John David Duncan

No sql findingsChristian van der Leeden

PayPal Big Data and MySQL ClusterMat Keep

A Global In-memory Data System for MySQLDaniel Austin

1 Unix basics. Part 1Roman Prykhodchenko

Hpts 2011 flexible_oltpJags Ramnarayan

State of Cassandra, 2011jbellis

Spotify: Horizontal Scalability for Great SuccessNick Barkas

Evan Ellis "Tumblr. Massively Sharded MySQL"Alexey Mahotkin

Iwmn architectureLenz Gschwendtner

Severalnines Self-Training: MySQL® Cluster - Part VSeveralnines

MySQL DW BreakfastIvan Zoratti

SortaSQLCloudflare

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn

Yes sql08 inmemorydbDaniel Austin

Coding Potpourri: MySQLNicole C. Engard

Cassandra tech talkSatish Mehta

VoltDB and Erlang - Tech planet 2012Eonblast

Ähnlich wie Cassandra at Morningstar (Feb 2011) (20)

Membase Meetup - San Diego

Data Modeling for NoSQL

My sql tutorial-oscon-2012

No sql findings

PayPal Big Data and MySQL Cluster

A Global In-memory Data System for MySQL

1 Unix basics. Part 1

Hpts 2011 flexible_oltp

State of Cassandra, 2011

Spotify: Horizontal Scalability for Great Success

Evan Ellis "Tumblr. Massively Sharded MySQL"

Iwmn architecture

Severalnines Self-Training: MySQL® Cluster - Part V

MySQL DW Breakfast

SortaSQL

Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn

Yes sql08 inmemorydb

Coding Potpourri: MySQL

Cassandra tech talk

VoltDB and Erlang - Tech planet 2012

Kürzlich hochgeladen

Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA

A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada

Time Series Foundation Models - current state and future directionsNathaniel Shimoni

React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3

The State of Passkeys with FIDO Alliance.pptxLoriGlavin3

How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe

Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica

QCon London: Mastering long-running processes in modern architecturesBernd Ruecker

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3

A Framework for Development in the AI AgeCprime

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3

Scale your database traffic with Read & Write split using MySQL RouterMydbops

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3

Kürzlich hochgeladen (20)

Long journey of Ruby standard library at RubyConf AU 2024

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

Time Series Foundation Models - current state and future directions

React Native vs Ionic - The Best Mobile App Framework

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx

The State of Passkeys with FIDO Alliance.pptx

How AI, OpenAI, and ChatGPT impact business and software.

Moving Beyond Passwords: FIDO Paris Seminar.pdf

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...

QCon London: Mastering long-running processes in modern architectures

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx

A Framework for Development in the AI Age

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx

Scale your database traffic with Read & Write split using MySQL Router

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx

Cassandra at Morningstar (Feb 2011)

1. Cassandra Tuesday, February 22, 2011 1

2. Operational Data Store Initial Requirements (Late 2007) • On big data security aggregator from multiple sources using Morningstar global security identiﬁer • Highly scalable both horizontally and vertically • Easy to distribute computation processing • Easy to store various types of data Tuesday, February 22, 2011 2

3. MySQL Initial Implementation (2008) • One database on one big database server • Very simple data model - one table per source with a simple key (Morningstar ID and date) • Tables were manually replicated with complicated logic • Tables stored data as binary blobs • No indexing on the tables other than the primary key(s) Tuesday, February 22, 2011 3

4. MySQL Tables Tuesday, February 22, 2011 4

5. What worked? • Great interface to query the data • Very stable system • Simple data model meant high efﬁciency for queries • Great memory usage Tuesday, February 22, 2011 5

6. What did not work • Hard to implement Map-Reduce • Hard to increase capacity with data growth • Multi-site replication slow and somewhat complicated • Limited number of columns and rows per table - Did manual table partitioning to keep under 2 million records per table - Table per source to keep column count down, and to not have sparsely populated rows Tuesday, February 22, 2011 6

7. Cassandra Current Implementation (2010) • 5 Machine Cluster • In house VMs on blade farm • 4 cores, 8 GB ram per node • Column families based on access type not source • Manual indexing of data unit type to key(s) Tuesday, February 22, 2011 7

8. Cassandra Column Families Data Tuesday, February 22, 2011 8

9. Cassandra Column Families Time Series Data Tuesday, February 22, 2011 9

10. What works? • Very easy to query when the keys are known (normal use) • Very scalable, just add more nodes, even at a later point in time. • Multi-site replication is easy • Basically unlimited number of columns per column family • Unlimited number of rows per column family • Sparse rows don’t waste space • Disaster recovery automatically taken care of by multi-site redundancy Tuesday, February 22, 2011 10

11. What is hard • Arbitrary queries are diﬁcult. • Had to create our own indexes to go from data unit type back to key (can’t select where != NULL) • Need to add extra indexes and/or de-normalized column families when we think of a new way that we want to query the data • Monitoring a cluster is harder than one server • Getting memory usage settings correct so that nodes don’t die with OOM errors Tuesday, February 22, 2011 11

12. Future Plans • Upgrade to 0.7 • Expand cluster to multiple data centers around the globe Tuesday, February 22, 2011 12

Cassandra at Morningstar (Feb 2011)

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Ähnlich wie Cassandra at Morningstar (Feb 2011)

Ähnlich wie Cassandra at Morningstar (Feb 2011) (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Cassandra at Morningstar (Feb 2011)