Eric Lubow
@elubow
elubow@simplereach.com
Big Architectures for Big Data
Big Architectures for Big
Data
Eric Lubow @elubow
#Cassandra13
Overview
• SimpleReach
• Goals
• Tools
• Architecture Implementation
The 2 Truths
The Real Truth
Even with the right tools, 80% of the work of building a big data system is acquiring and refining
SimpleReach
• Millions of URLs per day
• Over 1.25 billion page views per month
• 500 million events per day (~6k events/second)
• Auto-scale between 125 and 160 machines depending on traffic
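The per-second figure follows directly from the daily volume. A quick check, assuming events were spread evenly across the day (real traffic is bursty, which is presumably why the fleet auto-scales):

```python
# Back-of-the-envelope check of the event rate quoted on the slide.
EVENTS_PER_DAY = 500_000_000     # "500m events per day" from the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events arrive evenly over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(EVENTS_PER_DAY)
print(f"{rate:,.0f} events/second on average")  # prints "5,787 events/second on average"
```

That average of roughly 5.8k per second is consistent with the slide's rounded "~6k"; peak rates would be higher.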
And It Goes Like This...
[Diagram labels: C*; Vertica]
Goals
• Consistent non-data storage layer access patterns
• Data accuracy across storage engines
• Minimize downtime/Minimize cost of downtime
• High availability
• Allow access to many toolsets (for all languages, DBs, Engines)
• Clients should have minimal architecture knowledge
Consistent Access Patterns
realtime_score
('score', 'realtime')
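The slide shows the name realtime_score next to the pair ('score', 'realtime') without saying how the two relate. One possible reading is that every call is named `<timeframe>_<metric>` and can be resolved mechanically into a (metric, timeframe) pair. The sketch below illustrates that reading only; `split_endpoint_name` is a hypothetical helper, not something from the talk.

```python
from typing import Tuple


def split_endpoint_name(name: str) -> Tuple[str, str]:
    """Split a '<timeframe>_<metric>' name into a (metric, timeframe) pair.

    The naming scheme is an assumption made for illustration.
    """
    timeframe, separator, metric = name.partition("_")
    if not separator or not timeframe or not metric:
        raise ValueError(f"expected '<timeframe>_<metric>', got {name!r}")
    return (metric, timeframe)


print(split_endpoint_name("realtime_score"))  # prints ('score', 'realtime')
```

With a convention like this, a client only needs to know the name of the call, not which storage engine answers it.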
Authentication, Tracking,
• Per service access keys
• Track call volume by access key
• Prevent internal denial of service
• Monitor availability and performance
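A minimal sketch of how per-service access keys can support both call tracking and protection against an internal denial of service. It uses a sliding-window limit per key; the class name, parameters, and limits are all hypothetical, and the talk does not say how SimpleReach implemented this.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class AccessKeyTracker:
    """Counts accepted calls per access key and rejects keys over a rate limit."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.total_calls: Dict[str, int] = defaultdict(int)
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, access_key: str) -> bool:
        """Return True and record the call if the key is under its limit."""
        now = self.clock()
        recent = self._recent[access_key]
        # Forget calls that have aged out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False  # this service is calling too fast; shed the request
        recent.append(now)
        self.total_calls[access_key] += 1
        return True


tracker = AccessKeyTracker(max_calls=2, window_seconds=60.0)
print([tracker.allow("reporting-service") for _ in range(3)])
```

Because each internal service has its own key, one misbehaving caller is throttled without affecting the others, and `total_calls` doubles as a per-key volume metric.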
Controlled Data Flow
[Diagram labels: Social Event Collector; Social Data; Batch & Write Processed Data; Batch & Write Raw Data; Calculate Score; Write; NSQ Multicast; NSQ; NSQ]
NSQ by Bit.ly
• Distributed and decentralized topology
• At-least-once delivery guaranteed
• Multicast-style message routing
• Runtime discovery for consumers to find producers
• Allow for maintenance windows with no downtime
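To make the multicast and at-least-once points concrete, here is a small in-memory model of NSQ's topic and channel semantics: every channel on a topic receives its own copy of each message, and a message that is not acknowledged is requeued and delivered again. This is a simulation for illustration, not the NSQ client API; the class names, channel names, and retry limit are assumptions.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], bool]  # return True to acknowledge, False to requeue


class Channel:
    """One named queue on a topic; its consumers share the messages."""

    def __init__(self) -> None:
        self.consumers: List[Handler] = []
        self._next = 0

    def deliver(self, message: str, max_attempts: int) -> bool:
        for _ in range(max_attempts):
            consumer = self.consumers[self._next % len(self.consumers)]
            self._next += 1
            if consumer(message):
                return True  # acknowledged
            # Not acknowledged: requeue, so a consumer will see it again.
        return False


class Topic:
    """Copies every published message to each of its channels (multicast)."""

    def __init__(self) -> None:
        self.channels: Dict[str, Channel] = {}

    def subscribe(self, channel: str, consumer: Handler) -> None:
        self.channels.setdefault(channel, Channel()).consumers.append(consumer)

    def publish(self, message: str, max_attempts: int = 3) -> None:
        for channel in self.channels.values():
            channel.deliver(message, max_attempts)


raw, scores, attempts = [], [], []


def write_raw(message: str) -> bool:
    raw.append(message)
    return True


def calculate_score(message: str) -> bool:
    attempts.append(message)
    if len(attempts) == 1:
        return False  # simulate a failure on the first delivery
    scores.append(message)
    return True


events = Topic()
events.subscribe("write_raw", write_raw)
events.subscribe("calculate_score", calculate_score)
events.publish("page_view:1")
print(raw, scores, len(attempts))
```

Both channels end up with the message, and the scoring consumer sees it twice. That is the practical consequence of at-least-once delivery: consumers need to tolerate duplicates, for example by making writes idempotent.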
Path of a Packet
[Diagram labels: Internet; EC; InternalAPI; Solr; C*; Mongo; Redis; Vertica; API; Fire Hose; SC; Consumers; Queue]
Evolution Takes Work
• Know your access patterns
• Service Oriented Architecture (Internal API)
• Data accuracy checks: visual and programmatic
• Built framework for testing out engines (Storage, Queueing, etc.)
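A programmatic data accuracy check can be as simple as comparing the same counts as reported by each storage engine. The sketch below assumes the counts have already been fetched into plain dictionaries; the function name, engine names, and tolerance parameter are hypothetical, not taken from the talk.

```python
from typing import Dict, List


def find_count_mismatches(counts_by_engine: Dict[str, Dict[str, int]],
                          tolerance: float = 0.0) -> List[str]:
    """Return the keys whose counts disagree across engines.

    `tolerance` is the allowed relative gap between the highest and lowest
    count for a key (0.0 means the engines must agree exactly). A key missing
    from an engine is treated as a count of zero.
    """
    keys = set()
    for counts in counts_by_engine.values():
        keys.update(counts)

    mismatched = []
    for key in sorted(keys):
        values = [counts.get(key, 0) for counts in counts_by_engine.values()]
        high, low = max(values), min(values)
        if high and (high - low) / high > tolerance:
            mismatched.append(key)
    return mismatched


snapshot = {
    "cassandra": {"article-1": 100, "article-2": 50},
    "vertica": {"article-1": 100, "article-2": 45},
}
print(find_count_mismatches(snapshot))  # prints ['article-2']
```

A small tolerance is useful when engines are written at different times, for example one in real time and one in batches, so that in-flight data does not raise false alarms.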
Homogeneous Machines at Base
• Base Image Layout: Application, Organizational Base, Base AMI
• Producer: Event Collection, NSQ, Mongos, App Config, Users, Monitoring, Amazon Linux
• Consumer: Consumer, NSQ, Mongos, App Config, Users, Monitoring, Amazon Linux
• Application Group
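The Producer and Consumer stacks on this slide list the same components except for the application itself, which is one way to read "homogeneous at base". The sketch below computes the layers two stacks share. The bottom-up layer order is an assumption based on the order in which the labels appear, and `shared_base` is a hypothetical helper.

```python
from typing import List

# Layers listed bottom-up; the ordering is an assumption.
PRODUCER = ["Amazon Linux", "Monitoring", "Users", "App Config",
            "Mongos", "NSQ", "Event Collection"]
CONSUMER = ["Amazon Linux", "Monitoring", "Users", "App Config",
            "Mongos", "NSQ", "Consumer"]


def shared_base(first: List[str], second: List[str]) -> List[str]:
    """Return the layers both stacks have in common, starting from the bottom."""
    shared = []
    for layer_a, layer_b in zip(first, second):
        if layer_a != layer_b:
            break
        shared.append(layer_a)
    return shared


print(shared_base(PRODUCER, CONSUMER))
```

Everything in the shared prefix can be baked into one image and maintained once; only the top layer needs to vary per role.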
DevOps Wizardry
• Extensive use of AWS
• Monitor: Nagios, StatsD, and Graphite
• Manage: Chef, OpsWorks, csshX, Vagrant
• Deployments
Evolving Amazon Tools
• Full Featured API
• OpsWorks
• CloudFormation
• S3 / CloudFront
• Elastic Beanstalk
• Elastic MapReduce
Service
Internal API
Solr
Real-time
C*
C*
Vertica
Service Architecture Machines
• Base Image Layout: Application, Organizational Base, Base AMI
• Proxy Machines: iAPI Front End, nginx, App Config, Users, Monitoring, Amazon Linux
• Storage Machines: Data Store, App Config, Users, Monitoring, Amazon Linux
• Application Group
Anatomy of an Endpoint
• hourly content: Mongo, Mongo, Vertica, C*, C*
• tenminute content: Mongo, Mongo, Vertica, C*, C*
• Querying Machines: Helen, Helen, PyVertic, PyMon, PyMon, PyVertic
Endpoint Breakout
• Availability
• Consistent Access Patterns
• Minimal downtime changes
• Smaller code deploys
• Non-monolithic code base
Architecture Distribution
US-EAST-1a
MONGO-SHARD-0001-B
MONGO-SHARD-0000-A
CASSANDRA-0001
CASSANDRA-0010
REDIS-0001A
VERTICA-0001
iAPI-0001
US-EAST-1b
MONGO-SHARD-0002-B
MONGO-SHARD-0001-A
CASSANDRA-0002
CASSANDRA-0011
REDIS-0001B
iAPI-0002
US-EAST-1e
MONGO-SHARD-0002-A
MONGO-SHARD-0000-B
CASSANDRA-0003
CASSANDRA-0012
VERTICA-0003
iAPI-0003
VERTICA-0002
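The Mongo host names on this slide suggest that the two members of each shard (suffixes -A and -B) sit in different availability zones. The sketch below checks that property using the Mongo hosts exactly as listed. Treating -A and -B as members of the same shard is an assumption, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Mongo hosts per availability zone, as listed on the slide.
PLACEMENT = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B"],
}


def zones_per_shard(placement: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each shard name to the set of zones that host one of its members."""
    zones: Dict[str, Set[str]] = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            shard, _, _member = host.rpartition("-")  # drop the -A / -B suffix
            zones[shard].add(zone)
    return dict(zones)


def colocated_shards(placement: Dict[str, List[str]],
                     members_per_shard: int = 2) -> List[str]:
    """Return shards whose members are not all in distinct zones."""
    return sorted(shard for shard, zones in zones_per_shard(placement).items()
                  if len(zones) < members_per_shard)


print(colocated_shards(PLACEMENT))  # prints []
```

An empty result means no single zone outage takes out both members of any shard. A check like this can run whenever machines are added or replaced.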
Problems?
The Schrute of the Problem
New Service Questions
• Can its host be completely homogeneous?
• Can it accept downtime (and what should downtime look like)?
• Does it fit into an existing service?
• Does it require datacenter distribution?
Summary
• Solutions Require Evolution
• Build, Use, and Integrate Tools
• Abstraction
• Homogeneous Distribution
• Monitoring & Automation
We’re
(Ask about Food Coma Fridays)
Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
Thank you.

C* Summit 2013: Big Architectures for Big Data by Eric Lubow
