MongoDB & EC2: A Love Story?




        Eytan Daniyalzade
            @daniyalzade
http://bit.ly/cb_mongodb_meetup
Contents


● Chartbeat
● Architecture
● MongoDB & EC2 Challenges
● Happy Ending: (MongoDB ♥ EC2)
● Takeaways
chartbeat
Chartbeat: real-time analytics service

 ● 18 person startup in New York
 ● part of Betaworks
 ● peaking at just under 5M concurrents daily
    ○ up from 1M in July/2010
What chartbeat Provides

● real-time view of site performance

   ○ top pages

   ○ new/returning visitors

   ○ traffic flow
       ■ where are people coming from
       ■ where are people going to

● historic replay for the last 30 days
the architecture
Architecture, Browser

Part 1:
<head>
<script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>
...

Part 2:
...
function loadChartbeat() {
  // insert script tag
}
window.onload = loadChartbeat;
</body>
(highly simplified)

Ping is standard beacon logic, i.e. loading a 1x1 image.
Architecture, Backend

● custom libevent-based C backend
   ○ real-time collection and aggregation

● real-time system in-memory only

● background queue jobs snapshot every x minutes
   ○ Gearman

● historical data
    ○ mostly in MongoDB
Why Chartbeat uses MongoDB

● Pure JSON all along
   ○ Live API
   ○ Historical data
   ○ No mapping back and forth

● Fast Inserts (fire and forget)

● Flexible Schema
Why Chartbeat uses EC2

● Elastic Capacity

● No trips to datacenter

● EBS snapshots
Chartbeat & MongoDB & EC2 (1)

● 3 Clusters
    ○ 1 for each product
    ○ 1 as a caching layer
    ○ 2-4 instances per cluster

● m2.2xlarge
   ○ 34.2 GB memory
   ○ Ubuntu 10.04
   ○ RAID0 x 4 - 1 TB volumes

● Dedicated Snapshot Server
   ○ Shared among clusters
   ○ Serves as an arbiter as well
Chartbeat & MongoDB & EC2 (2)
           Cluster View
MongoDB & EC2 Challenges

● Instances disappear
    ○ MongoDB can have long recovery operations
    ○ MongoDB is (was) not ACID compliant. Unclean
      shutdown could corrupt your data.

● Poor IO performance on EBS
   ○ MongoDB has global read/write lock

● Variable IO performance on EBS
   ○ Could cause replication issues
Question:


                ??
            ?
Disappearing Instances
Instances Disappearing - Master/Slave

● Down-time :(

● Slave-promotion = headache
   ○ New instance
   ○ Copy oplog
   ○ Code change
   ○ Long/manual/error prone
Instances Disappearing - Replica Sets

● No down-time :) yay!

● Automatic failover on writes

● Eventual failover on reads

● No code change
Instances Disappearing - Replica Sets
(caveats)
● pymongo driver reads/writes from primary
   ○ pymongo 2.1 will fix this

● chartbeat pymongo driver
   ○ based on MasterSlaveConnection
   ○ writes to primary
   ○ distribute reads among secondaries
   ○ automatic failover
   ○ eventual read re-distribution
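
The routing behaviour listed above (writes to the primary, reads spread over secondaries, fallback on failure) could look roughly like the sketch below. It is not the actual Chartbeat driver and uses no pymongo calls; `ReplicaRouter` and its method names are hypothetical, and nodes are plain strings.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over healthy secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.healthy = set(self.secondaries)
        self._counter = itertools.count()

    def node_for_write(self):
        return self.primary

    def node_for_read(self):
        # Rotate over the secondaries currently marked healthy.
        candidates = [s for s in self.secondaries if s in self.healthy]
        if not candidates:
            # Fall back to the primary when no secondary is usable.
            return self.primary
        return candidates[next(self._counter) % len(candidates)]

    def mark_down(self, node):
        # Failover: stop sending reads to a node that stopped responding.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Re-distribution: a recovered secondary rejoins the read rotation.
        if node in self.secondaries:
            self.healthy.add(node)
```

Reads only reach the primary when every secondary is marked down, which keeps read load off the node that takes all the writes.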
Instances Disappearing - Fact of Life
 ● Accept this fact of life

 ● Always snapshot
    ○ Dedicated snapshot server
    ○ Hidden, i.e. no reads

 ● Automate everything
    ○ puppet
       ■ New instance from scratch within a minute
    ○ python-boto
       ■ Script all EC2 interaction
       ■ new_instance.py
       ■ mount_volumes_from_snap.py -o iid -n iid
       ■ snapshot_mongo.py
Instances Disappearing - Caveats

● New volumes - slow!!!
   ○ EBS loads blocks lazily

● Warm up EBS & File Cache before use
   ○ Options
      ■ Slowly direct the reads (app by app)
      ■ Run cache warm-up scripts
   ○ Not automated currently
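
A cache warm-up script can be as simple as reading every data file front to back, so that EBS fetches each block and the OS file cache fills before the node takes traffic. A minimal sketch under that assumption; `warm_file` is a hypothetical helper, not one of the Chartbeat scripts.

```python
def warm_file(path, chunk_size=1024 * 1024):
    """Read a file sequentially so every block is fetched at least once.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Running it over each file in the MongoDB data directory (the location depends on the deployment) touches every block once. How much stays in the file cache afterwards depends on available memory.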
Poor IO Performance on EBS

 ● XFS & RAIDing Helps

but,

 ● Disk IO varies over time

 ● MongoDB holds global lock on writes

 ● Query of death
    ○ Everything grinds to a halt if not careful
Case Study: Historical Data
  ● For historical data, we store time series.
{
key: <key>,
ts: <timestamp>,
values: {metric1: int1, metric2: int2},
meta: {}
}
   ● High Insert Rate vs Fast Historical Read
       ○ Optimize reads or writes?
   ● Fast inserts: ~1 MB/sec (through append only)
       ○ No disk-seek
   ● Historical reads: painfully slow
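
To see why this layout favours writes over reads, here is a small in-memory sketch (a plain Python list stands in for the collection; `make_sample`, `read_day` and the key names are hypothetical). Inserts only ever append, but one key's history ends up interleaved with every other key's samples.

```python
def make_sample(key, ts, values):
    """One document per (key, timestamp), mirroring the shape on the slide."""
    return {"key": key, "ts": ts, "values": dict(values), "meta": {}}


def read_day(collection, key, day_start, day_end):
    """Historical read: every matching sample is a separate document."""
    return [d for d in collection
            if d["key"] == key and day_start <= d["ts"] < day_end]


# Append-only inserts: samples arrive in time order, interleaved across keys.
collection = []
for minute in range(3):
    for key in ("site-a", "site-b"):
        collection.append(make_sample(key, minute * 60, {"metric1": minute}))

docs = read_day(collection, "site-a", 0, 86400)
print(len(docs))  # 3 documents, each separated by a site-b sample
```

On disk the same interleaving means a read for one key over a day touches many scattered locations.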
Faster Reads Through Cache DB
  ● Avoid reading from disk
  ● Favor reads over writes
  ● Aim for disk & memory locality
 {
 day_tskey: <key>,
 values: {metric1: list(int), metric2: list(int)}
 }

  ● Data for historical reads resides together

  ● .append() to list could cause disk fragmentation
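
A rough sketch of the per-day layout, again with an in-memory dict standing in for the cache DB. `append_sample` is a hypothetical helper, and representing `day_tskey` as a `(day_ts, key)` tuple is an assumption.

```python
def append_sample(cache, key, day_ts, values):
    """Append each metric to that day's list: one document per key per day."""
    doc = cache.setdefault(
        (day_ts, key), {"day_tskey": (day_ts, key), "values": {}}
    )
    for metric, value in values.items():
        doc["values"].setdefault(metric, []).append(value)
    return doc
```

A day's history for one key is now a single document, so a historical read fetches one contiguous record. The cost is that each append makes the document larger.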
Avoid Fragmentation w/ Preallocation
 ● Fragmentation causes:
     ○ Inefficient disk usage
     ○ Slower writes (due to block allocation)
 ● Preallocate daily arrays instead
     ○ Pros:
          ■ No fragmentation
          ■ Write causes no change in data size
     ○ Cons:
          ■ Wasteful (we don't know keys ahead of time)
          ■ Requires heavy disk IO: ~7 MB/sec (~60 Mbit/sec on EBS)

 ● Conclusion: spread preallocation over 1 hour
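
Preallocation and the one-hour spread might be sketched as follows. The slot granularity (288 five-minute slots per day), the zero fill value and the CRC-based offset are all assumptions for illustration; the slides do not say how Chartbeat sized the arrays or scheduled the work.

```python
import zlib

SLOTS_PER_DAY = 288  # assumption: one slot per 5 minutes


def preallocate(metrics, slots=SLOTS_PER_DAY):
    """Build a full-size daily document up front so later writes never grow it."""
    return {m: [0] * slots for m in metrics}


def write_sample(doc, second_of_day, values, slots=SLOTS_PER_DAY):
    """Overwrite one slot in place; list lengths stay constant."""
    slot = second_of_day * slots // 86400
    for metric, value in values.items():
        doc[metric][slot] = value


def prealloc_offset(key, window_seconds=3600):
    """Deterministic per-key delay that spreads preallocation over the window."""
    return zlib.crc32(key.encode("utf-8")) % window_seconds
```

Each key's daily document would be created `prealloc_offset(key)` seconds into the hour rather than all at once, which flattens the burst of disk IO that preallocation causes.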
EC2 Performance is Unpredictable
EC2 Unpredictability - Challenges

● Resource contention in virtualized environment

● EBS and Network IO performance varies drastically

● RAID0 over 4 disks = 4 x risk
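
The "4 x risk" figure is the small-probability approximation of the chance that at least one volume in the stripe fails. A quick check, assuming volumes fail independently and with the same probability (both are simplifying assumptions):

```python
def raid0_failure_probability(p_disk, n_disks):
    """Chance that at least one of n independent volumes fails.

    RAID0 has no redundancy, so one failed volume loses the whole array.
    """
    return 1.0 - (1.0 - p_disk) ** n_disks
```

With a 1% chance of failure per volume, four volumes give about 3.94%, close to 4 x 1%.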
Heavy Monitoring (1)
● Track individual disk performance over time




● Create a new instance if disk not getting better
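
One way to turn "disk not getting better" into a rule is to keep a short window of latency samples per volume and flag the volume only when every sample in a full window is slow. The class name, window size and threshold below are hypothetical.

```python
from collections import deque


class DiskLatencyTracker:
    """Hypothetical tracker: flags a volume whose recent latency stays above a threshold."""

    def __init__(self, window=5, threshold_ms=50.0):
        self.samples = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def add(self, latency_ms):
        self.samples.append(latency_ms)

    def needs_replacement(self):
        # Decide only once the window is full, and only if every sample is slow.
        full = len(self.samples) == self.samples.maxlen
        return full and min(self.samples) > self.threshold_ms
```

Requiring the whole window to be slow avoids replacing an instance over a single bad sample.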
Heavy Monitoring (2)
● Monitor replication lag




● Remove from read mix if lag gets too high
   ○ Incorrect data
   ○ Strain on primary
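
The read-mix rule can be sketched as a filter over each member's last-applied operation time. Expressing optimes as plain seconds and the 60-second limit are assumptions; `readable_secondaries` is a hypothetical helper.

```python
def readable_secondaries(primary_optime, secondary_optimes, max_lag_seconds=60):
    """Return the secondaries whose lag behind the primary is within the limit."""
    return sorted(
        name
        for name, optime in secondary_optimes.items()
        if primary_optime - optime <= max_lag_seconds
    )
```

For example, with the primary at 1000 and secondaries at 995 and 900, only the first stays in the read mix under the default limit.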
Heavy Monitoring (3)
● Track slow queries / opcounts / page faults / IO volume
   ○ Tweak indexes accordingly
   ○ Limit requested data size if you can
Open Issues

● More granular page-fault / memory usage information
   ○ Difficult due to mmap

● Multi-datacenter usage

● Burn-in scripts

● Sharding
   ○ Tipping point will be insert volume
   ○ Or inefficient read memory usage

● Better understand replication failures
Take-aways (1)

● Automate everything
   ○ Instance creation, snapshotting, mount/unmount
● Strive for high locality & low fragmentation
● Repeatedly revise schema/index
● Heavily monitor
   ○ Server: IO/mem/disk
   ○ MongoDB: Opcounts/Index Hits/Slow queries
   ○ Cluster: Replication lag
   ○ Application: CRUD times
Take-aways (2)
Questions?



     Slides: http://bit.ly/cb_mongodb_meetup

  • 4. Chartbeat: real-time analytics service ● 18 person startup in New York ● part of Betaworks ● peaking at just under 5M concurrents daily ○ up from 1M in July/2010
  • 5. What chartbeat Provides ● real-time view of site performance ○ top pages ○ new/returning visitors ○ traffic flow ■ where are people coming from ■ where are people going to ● historic replay for the last 30 days
  • 7. Architecture, Browser Part 1: <head> <script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script> ... Part 2: ... function loadChartbeat() { // insert script tag } window.onload = loadChartbeat; </body> (highly simplified) Ping is standard beacon logic, i.e. loading a 1x1 image.
  • 8. Architecture, Backend ● custom libevent-based C backend ○ real-time collection and aggregation ● real-time system in-memory only ● background queue jobs snapshot every x minutes ○ Gearman ● historical data ○ mostly in MongoDB
  • 9. Why Chartbeat uses MongoDB ● Pure JSON all along ○ Live API ○ Historical data ○ No mapping back and forth ● Fast Inserts (fire and forget) ● Flexible Schema
  • 10. Why Chartbeat uses EC2 ● Elastic Capacity ● No trips to datacenter ● EBS snapshots
  • 11. Chartbeat & MongoDB & EC2 (1) ● 3 Clusters ○ 1 for each product ○ 1 as a caching layer ○ 2-4 instances/cluster ● m2-2xlarge ○ 34.2 GB memory ○ Ubuntu 10.04 ○ RAID0 x 4 - 1 TB volumes ● Dedicated Snapshot Server ○ Shared among clusters ○ Serves as an arbiter as well
  • 12. Chartbeat & MongoDB & EC2 (2) Cluster View
  • 13. MongoDB & EC2 Challenges ● Instances disappear ○ MongoDB can have long recovery operations ○ MongoDB is (was) not ACID compliant. Unclean shutdown could corrupt your data. ● Poor IO performance on EBS ○ MongoDB has global read/write lock ● Variable IO performance on EBS ○ Could cause replication issues
  • 14. Question: ?? ?
  • 16. Instances Disappearing - Master/Slave ● Down-time :( ● Slave-promotion = headache ○ New instance ○ Copy oplog ○ Code change ○ Long/manual/error prone
  • 17. Instances Disappearing - Replica Sets ● No down-time :) yay! ● Automatic failover on writes ● Eventual failover on reads ● No code change
  • 18. Instances Disappearing - Replica Sets (caveats) ● pymongo driver reads/writes from primary ○ pymongo 2.1 will fix this ● chartbeat pymongo driver ○ based on MasterSlaveConnection ○ writes to primary ○ distribute reads among secondaries ○ automatic failover ○ eventual read re-distribution
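The read/write split described on this slide can be sketched in plain Python. This is only an illustration of the idea (writes to the primary, reads spread over secondaries, fall back when a secondary is down), not Chartbeat's driver; the class name ReadDistributor and its methods are hypothetical, and no pymongo calls are used.

```python
import itertools


class ReadDistributor:
    """Round-robin reads over secondaries, falling back to the primary."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self.healthy = set(self.secondaries)
        self._cycle = itertools.cycle(self.secondaries)

    def mark_down(self, host):
        # Take a failed secondary out of the read mix.
        self.healthy.discard(host)

    def mark_up(self, host):
        # Put a recovered secondary back (eventual read re-distribution).
        if host in self.secondaries:
            self.healthy.add(host)

    def write_host(self):
        # Writes always go to the primary.
        return self.primary

    def read_host(self):
        # Try each secondary at most once per call, then use the primary.
        for _ in range(len(self.secondaries)):
            host = next(self._cycle)
            if host in self.healthy:
                return host
        return self.primary
```

A caller would ask read_host() before each query and call mark_down() when a connection to that host fails.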
  • 19. Instances Disappearing - Fact of Life ● Accept this fact of life ● Always snapshot ○ Dedicated snapshot server ○ Hidden, i.e. no reads ● Automate everything ○ puppet ■ New instance from scratch within a minute ○ python-boto ■ Script all EC2 interaction ■ new_instance.py ■ mount_volumes_from_snap.py -o iid -n iid ■ snapshot_mongo.py
  • 20. Instances Disappearing - Caveats ● New volumes - slow!!! ○ EBS loads blocks lazily ● Warm up EBS & File Cache before use ○ Options ■ Slowly direct the reads (app by app) ■ Run cache warm-up scripts ○ Not automated currently
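A cache warm-up script of the kind mentioned above could be as simple as reading every block once, so that EBS fetches it and the OS file cache holds it. The sketch below is an assumption about what such a script might look like, not the one used at Chartbeat; the function name warm_up and the 1 MB chunk size are made up for illustration.

```python
def warm_up(path, chunk_size=1024 * 1024):
    """Read a file or block device sequentially and discard the data.

    Touching every block forces lazily loaded EBS blocks to be fetched.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Running it over the MongoDB data files (or the raw device) before directing reads at a new instance trades a one-time sequential scan for fewer slow first reads later.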
  • 22. Poor IO Performance on EBS ● XFS & RAID help, but: ● Disk IO varies over time ● MongoDB holds a global lock on writes ● Query of death ○ Grinds to a halt if not careful
  • 23. Case Study: Historical Data ● For historical data, we store time series. { key:<key> ts:<key> values: {metric1: int1, metric2: int2} meta:{} } ● High Insert Rate vs Fast Historical Read ○ Optimize reads or writes? ● Fast inserts: ~1 MB/sec (through append only) ○ No disk-seek ● Historical reads: painfully slow
  • 24. Faster Reads Through Cache DB ● Avoid reading from disk ● Favor reads over writes ● Aim for disk & memory locality { day_tskey:<key> values: {metric1: list(int), metric2: list(int)} } ● Data for historical reads resides together ● .append() to list could cause disk fragmentation
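To make the cache layout concrete, here is a small sketch that regroups per-timestamp samples into one document per key and day, with each metric stored as a list. The field names day_ts and key are an assumed reading of the flattened schema on the slide, and to_daily_docs is a hypothetical helper; it works on plain dicts and does not talk to MongoDB.

```python
SECONDS_PER_DAY = 86400


def to_daily_docs(samples):
    """Group per-timestamp samples into one document per (key, day)."""
    daily = {}
    for s in sorted(samples, key=lambda s: s["ts"]):
        # Truncate the timestamp to the start of its (UTC) day.
        day_ts = s["ts"] - s["ts"] % SECONDS_PER_DAY
        doc = daily.setdefault(
            (s["key"], day_ts),
            {"day_ts": day_ts, "key": s["key"], "values": {}},
        )
        for metric, value in s["values"].items():
            # All of a day's values for one metric end up side by side.
            doc["values"].setdefault(metric, []).append(value)
    return list(daily.values())
```

A historical read for one key and day then touches a single document instead of one document per timestamp.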
  • 25. Avoid Fragmentation w/ Preallocation ● Fragmentation causes: ○ Inefficient disk usage ○ Slower writes (due to block allocation) ● Preallocate daily arrays instead ○ Pros: ■ No fragmentation ■ Write causes no change in data size ○ Cons: ■ Wasteful (we don't know keys ahead of time) ■ Requires heavy disk IO, ~7 MB/sec (~60 Mbits/sec on EBS) ● Conclusion: spread preallocation over 1 hour
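The preallocation idea can be sketched as follows: build each daily document with arrays already at full length, overwrite slots in place, and give every key its own delay so the preallocation writes are spread across the hour. Everything here is illustrative. The 288 slots (one per 5 minutes) are an assumption, since the deck only says snapshots run every x minutes, and the helper names and the day_ts/key field names are hypothetical.

```python
import zlib

SLOTS_PER_DAY = 288  # assumed: one slot per 5-minute snapshot


def preallocated_doc(key, day_ts, metrics, slots=SLOTS_PER_DAY):
    """Build a daily document whose arrays already have their final length."""
    return {
        "day_ts": day_ts,
        "key": key,
        "values": {m: [0] * slots for m in metrics},
    }


def record(doc, metric, ts, value, slots=SLOTS_PER_DAY):
    """Overwrite the slot for ts in place; the arrays never grow."""
    slot = (ts - doc["day_ts"]) * slots // 86400
    doc["values"][metric][slot] = value


def prealloc_offset(key, window_seconds=3600):
    """Deterministic per-key delay that spreads preallocation over the window."""
    return zlib.crc32(key.encode("utf-8")) % window_seconds
```

Hashing the key gives each document a stable position within the hour, so the preallocation load is smeared out rather than arriving in one burst.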
  • 26. EC2 Performance is Unpredictable
  • 27. EC2 Unpredictability - Challenges ● Resource contention in virtualized environment ● EBS and Network IO performance varies drastically ● RAID0 over 4 disks = 4 x risk
  • 28. Heavy Monitoring (1) ● Track individual disk performance over time ● Create a new instance if disk not getting better
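One way to decide that a disk is "not getting better" is to keep a short window of latency samples per disk and flag a disk only when every sample in the window is above a threshold. This is a minimal sketch under assumed numbers (window of 5 samples, 50 ms threshold); DiskTracker is a hypothetical name and the samples would come from whatever IO metrics are already being collected.

```python
from collections import defaultdict, deque


class DiskTracker:
    """Keep recent latency samples per disk and flag persistently slow ones."""

    def __init__(self, window=5, threshold_ms=50.0):
        self.window = window
        self.threshold_ms = threshold_ms
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def add(self, disk, latency_ms):
        # Older samples fall out of the window automatically.
        self.samples[disk].append(latency_ms)

    def persistently_slow(self):
        # A disk counts as slow only if the window is full and every
        # sample in it exceeds the threshold.
        return sorted(
            disk
            for disk, s in self.samples.items()
            if len(s) == self.window and min(s) > self.threshold_ms
        )
```

A single slow sample or a brief spike does not trigger a replacement; only a full window of bad readings does.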
  • 29. Heavy Monitoring (2) ● Monitor replication lag ● Remove from read mix if lag gets too high ○ Incorrect data ○ Strain on primary
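The lag check can be sketched as a pure function over replica set member status. The dict shape used below (name, stateStr, optimeDate) mirrors fields reported by MongoDB's replSetGetStatus command, but the function name readable_secondaries and the 30-second threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def readable_secondaries(members, max_lag=timedelta(seconds=30)):
    """Return names of secondaries whose lag behind the primary is acceptable."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    ok = []
    for m in members:
        if m["stateStr"] != "SECONDARY":
            continue  # skip the primary, arbiters, recovering members
        lag = primary["optimeDate"] - m["optimeDate"]
        if lag <= max_lag:
            ok.append(m["name"])
    return ok
```

Secondaries missing from the returned list would be dropped from the read mix until their lag recovers.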
  • 30. Heavy Monitoring (3) ● Track slow queries / opcounts / track page faults / IO volume ○ Tweak indexes accordingly ○ Limit requested data size if you can
  • 31. Open Issues ● More granular page-fault / memory usage information ○ Difficult due to mmap ● Multi-datacenter usage ● Burn-in scripts ● Sharding ○ Tipping point will be insert volume ○ Or inefficient read memory usage ● Better understand replication failures
  • 32. Take-aways (1) ● Automate everything ○ Instance creation, snapshotting, mount/unmount ● Strive for high locality & low fragmentation ● Repeatedly revise schema/index ● Heavily monitor ○ Server: IO/mem/disk ○ MongoDB: Opcounts/Index Hits/Slow queries ○ Cluster: Replication lag ○ Application: CRUD times
  • 34. Questions? Slides: http://bit.ly/cb_mongodb_meetup