Building OpenDNS Stats

          Richard Crowley
      richard@opendns.com
Then: 8 billion DNS queries per day
@400000004a381ba80c294ddc   q1   69.64.43.245 normal 558867 alt2.gmail-smtp-in.l.google.com. 1 0
@400000004a381ba80dd39e94   q1   163.192.13.30 normal 894966 dns.hitachi-koki.co.jp. 1 0
@400000004a381ba80dd3a664   q1   63.84.243.25 normal 0 photos-d.ak.fbcdn.net. 1 0
@400000004a381ba80dd3ae34   q1   24.155.125.240 normal 1045953 my-iqquiz.com. 1 0
@400000004a381ba80dd3b604   q1   64.253.103.18 normal 788290 6.164.133.166.in-addr.arpa. 12 2
@400000004a381ba80dd3bdd4   q1   70.246.80.10 normal 0 googleads.g.doubleclick.net. 1 0
@400000004a381ba80dd3c5a4   q1   98.108.66.45 normal 0 _ldap._tcp.nj-bloomfield._sites.dc._msdcs.mrii.c
@400000004a381ba80dd41b94   q1   98.144.16.195 normal 0 js.casalemedia.com. 1 0
@400000004a381ba80dd42364   q1   68.165.29.60 normal 0 img-cdn.mediaplex.com. 1 0
@400000004a381ba80dd42b34   q1   12.233.75.219 normal 0 zsmseno.clnet.cz. 1 0
@400000004a381ba80dd43304   q1   174.37.58.88 normal 0 70.96.118.85.bl.spamcop.net. 16 0
@400000004a381ba80dd43ad4   q1   208.76.86.13 normal 519070 252.76.75.208.bl.spamcop.net. 1 3
@400000004a381ba80dd442a4   q1   201.138.19.196 normal 0 isatap.domain.local. 1 3
@400000004a381ba80dd465cc   q1   24.192.98.53 normal 0 208.85.224.82.in-addr.arpa. 12 0
@400000004a381ba80dd46d9c   q1   64.91.71.57 normal 0 liveupdate.symantecliveupdate.com. 1 0
@400000004a381ba80dd4756c   q1   69.64.43.245 normal 558867 alt4.gmail-smtp-in.l.google.com. 1 0
@400000004a381ba80dd47d3c   q1   69.64.43.245 normal 558867 alt4.gmail-smtp-in.l.google.com. 1 0
@400000004a381ba80dd4850c   q1   72.10.191.11 normal 812477 iprep1.t.ctmail.com. 1 0
@400000004a381ba80dd49c7c   q1   12.233.75.219 normal 0 zsmseno.clnet.cz. 1 0
@400000004a381ba80dd4a44c   q1   69.157.60.79 normal 0 img-cdn.mediaplex.com. 1 0
@400000004a381ba80dd4ac1c   q1   208.43.52.205 nxdomain 0 haghway.com.br. 1 0
@400000004a381ba80dd4b3ec   q1   204.145.0.242 normal 488877 105.12.90.201.asetnhap5duax9a26l24rda5g3gv
@400000004a381ba80dd4bbbc   q1   206.246.157.1 normal 0 penninegas.co.uk. 15 2
@400000004a381ba80dd4c38c   q1   69.21.243.131 normal 0 svn.atomicobject.com. 28 0
@400000004a381ba80dd4dafc   q1   163.192.13.65 normal 894966 dns.hitachi-koki.co.jp. 1 0
@400000004a381ba80dd4e2cc   q1   76.65.199.42 nxdomain 0 cs16.msg.dcn.yahoo.com. 1 0
@400000004a381ba80dd4ea9c   q1   189.169.97.227 normal 0 impaktosoo.gateway.2wire.net. 1 3
@400000004a381ba80dd4f26c   q1   69.64.43.245 normal 558867 gmail.com. 15 0
@400000004a381ba80dd4f654   q1   189.168.174.182 normal 0 wpad.2wire.net. 1 3
@400000004a381ba80dd4fe24   q1   69.64.43.245 normal 558867 alt3.gmail-smtp-in.l.google.com. 1 0
@400000004a381ba80dd51594   q1   189.133.170.67 normal 0 v13.lscache5.googlevideo.com. 1 0
@400000004a381ba80dd538bc   q1   12.186.60.189 nxdomain 0 carolyn5.ktemca.com. 1 0
@400000004a381ba80dd5408c   q1   72.249.148.132 normal 384918 mailin-04.mx.aol.com. 1 0
@400000004a381ba80dd5485c   q1   76.65.199.42 nxdomain 0 csa.yahoo.com. 1 0
@400000004a381ba80dd5502c   q1   208.73.228.5 normal 119716 3.0.0.172.in-addr.arpa. 12 3
Now: 14 billion DNS queries per day
@400000004a381ba80dd55414   q1   72.249.26.8 normal 0 schnurr.de. 1 0
@400000004a381ba80dd55be4   q1   96.61.141.172 servfail 0 bc2.gamingsquared.com. 1 0
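The leading `@4000...` field on each log line looks like a TAI64N label. That is an assumption: the slides never name the format. If it holds, the first 16 hex digits are 2^62 plus a count of TAI seconds since 1970, and the last 8 hex digits are nanoseconds. A minimal C++ sketch of decoding one label follows; `Tai64n` and `parse_tai64n` are hypothetical names, not part of the OpenDNS code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical holder for a decoded label: TAI seconds since 1970 plus nanoseconds.
struct Tai64n {
    std::uint64_t seconds;
    std::uint32_t nanos;
};

// Parses "@" followed by 24 hex digits. Returns false on malformed input.
bool parse_tai64n(const std::string& label, Tai64n& out) {
    if (label.size() != 25 || label[0] != '@') return false;
    std::uint64_t hi = 0;  // first 16 hex digits: 2^62 + seconds
    std::uint64_t lo = 0;  // last 8 hex digits: nanoseconds
    for (std::size_t i = 1; i < label.size(); ++i) {
        const char c = label[i];
        std::uint64_t digit;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint64_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint64_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint64_t>(c - 'A' + 10);
        else return false;
        if (i <= 16) hi = (hi << 4) | digit;
        else lo = (lo << 4) | digit;
    }
    const std::uint64_t epoch = std::uint64_t(1) << 62;
    if (hi < epoch) return false;  // labels before 1970 are not handled in this sketch
    out.seconds = hi - epoch;
    out.nanos = static_cast<std::uint32_t>(lo);
    return true;
}
```

TAI runs ahead of UTC by a number of leap seconds, so turning `seconds` into a Unix timestamp would need that offset subtracted.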
Logs are silly, let’s make graphs
High level design from my OpenDNS interview


    map/reduce/ish


    Stage 1 buckets data by network
    Stage 2 aggregates and stores


    Prefers to duplicate data rather than omit data


    Give each network a separate table (keeps each table
    small(er) and keeps the primary key small(er))
False starts
False start #1: storing domains

    auto_increment is bad (table lock)


    Use the SHA1 of the domain as primary key


    Currently we have 2 machines storing domains
    About 48 GB in each domains.ibd
    28 GB memcached across 8 machines
    effectively makes this database write-only
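A hash-derived key means any writer can compute a domain's primary key locally instead of asking the database for the next auto_increment value. The sketch below shows that idea with a compact SHA-1 so it compiles against the standard library alone; production code would more likely call a crypto library. `sha1_hex` is a hypothetical name, and the slides do not say whether the key was stored as hex or as raw bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static std::uint32_t rotl32(std::uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// Returns the SHA-1 digest of `msg` as 40 lowercase hex characters.
std::string sha1_hex(const std::string& msg) {
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                          0x10325476u, 0xC3D2E1F0u};
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as 8 big-endian bytes.
    std::vector<std::uint8_t> data(msg.begin(), msg.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(msg.size()) * 8;
    data.push_back(0x80);
    while (data.size() % 64 != 56) data.push_back(0);
    for (int i = 7; i >= 0; --i)
        data.push_back(static_cast<std::uint8_t>(bit_len >> (8 * i)));

    for (std::size_t off = 0; off < data.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i) {
            w[i] = (std::uint32_t(data[off + 4 * i]) << 24) |
                   (std::uint32_t(data[off + 4 * i + 1]) << 16) |
                   (std::uint32_t(data[off + 4 * i + 2]) << 8) |
                   std::uint32_t(data[off + 4 * i + 3]);
        }
        for (int i = 16; i < 80; ++i)
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (int i = 0; i < 5; ++i)
        for (int shift = 28; shift >= 0; shift -= 4)
            out.push_back(digits[(h[i] >> shift) & 0xF]);
    return out;
}
```

Calling `sha1_hex("xkcd.com")` on any machine yields the same 40-character key, so inserts need no coordination through a shared counter.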
False start #2: std::bad_alloc

  Stage 2 aggregated too much data and ran out of memory


  Bad idea: improve the heuristic used to guess
  memory usage and prevent std::bad_alloc


  Good idea: catch std::bad_alloc, clean up and restart
  Pre-allocating buffers that will be reused makes this easy


  Protip: Run two programs (memcached and Stage 2, for
  example) compiled 32-bit on a 64-bit CPU with 8 GB RAM
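The "catch, clean up and restart" idea can be sketched as follows. Everything here is hypothetical: `Aggregator`, `run`, and especially `soft_limit`, which stands in for the real allocator running dry so the failure path is deterministic. The buffer is allocated once in the constructor and survives every reset.

```cpp
#include <cstddef>
#include <map>
#include <new>
#include <string>
#include <vector>

// Hypothetical aggregator: counts domains in memory until "allocation" fails.
class Aggregator {
public:
    // soft_limit is an assumption of this sketch: it simulates memory exhaustion
    // by capping the number of distinct domains held at once (minimum 1).
    Aggregator(std::size_t soft_limit, std::size_t buffer_bytes)
        : soft_limit_(soft_limit == 0 ? 1 : soft_limit),
          line_buffer_(buffer_bytes) {}

    void add(const std::string& domain) {
        if (counts_.size() >= soft_limit_ && counts_.find(domain) == counts_.end())
            throw std::bad_alloc();  // simulated out-of-memory
        ++counts_[domain];
    }

    // Clean up: drop aggregated state but keep the pre-allocated buffer.
    void reset() { counts_.clear(); }

    std::size_t distinct() const { return counts_.size(); }
    std::size_t buffer_size() const { return line_buffer_.size(); }

private:
    std::size_t soft_limit_;
    std::vector<char> line_buffer_;  // allocated once, reused across restarts
    std::map<std::string, unsigned> counts_;
};

// Feeds every domain to the aggregator. On bad_alloc it cleans up and retries
// the item that failed. Returns the number of restarts.
std::size_t run(Aggregator& agg, const std::vector<std::string>& domains) {
    std::size_t restarts = 0;
    for (std::size_t i = 0; i < domains.size(); ) {
        try {
            agg.add(domains[i]);
            ++i;
        } catch (const std::bad_alloc&) {
            agg.reset();  // real code would flush or re-read its input here
            ++restarts;
        }
    }
    return restarts;
}
```

Because the reusable buffer is never freed, the recovery path itself needs no new allocation, which is what makes restarting after `std::bad_alloc` practical.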
False start #3: open tables

  80+ %iowait from opening and closing tables


  strace showed lots of calls to open() and close()
  strace crashed MySQL


  Altered mysqld_safe to set ulimit -n 600000
False start #4: MyISAM


    Didn’t mind table locks, so I used MyISAM


    12 MB/sec total across 4 nodes


    Migration to InnoDB is in progress
    Expect a 2x improvement from InnoDB
    innodb_flush_log_at_trx_commit=2
Architecture
Bird’s eye view
[Diagram. Components shown: Resolvers (worldwide); Stage 1, Stage 2 and Stats DBs (San Francisco); Web servers (Palo Alto); Proxy; Domains DB; User DB]
Stage 1 (“map”)

   rsync log files from our DNS servers to
   3 servers in San Francisco


   Looking up a network in memcached (or $GLOBALS)
   gives the preferred Stage 2


   Write log lines back to local disk,
   one bucket for each Stage 2 machine


   Future work: automated rebalancing and failover
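The bucketing step above can be sketched in C++ as below. Several things are assumptions: that the fifth whitespace-separated field of a log line is the network ID (inferred from the sample lines, not stated in the slides), that a `std::map` can stand in for the memcached lookup, that in-memory vectors can stand in for the bucket files on disk, and that unknown networks fall back to a default host. All names are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumption: the fifth whitespace-separated field is the network ID.
// Returns false if the line has fewer than five fields.
bool network_id_of(const std::string& line, std::string& network_id) {
    std::istringstream in(line);
    std::string field;
    for (int i = 0; i < 5; ++i) {
        if (!(in >> field)) return false;
    }
    network_id = field;
    return true;
}

typedef std::map<std::string, std::string> Routing;               // network ID -> Stage 2 host
typedef std::map<std::string, std::vector<std::string> > Buckets; // Stage 2 host -> log lines

// Appends the line to the bucket of its network's preferred Stage 2 machine.
// Unknown networks go to default_host, a fallback invented for this sketch.
bool bucket_line(const std::string& line, const Routing& routing,
                 const std::string& default_host, Buckets& buckets) {
    std::string network_id;
    if (!network_id_of(line, network_id)) return false;  // malformed line
    Routing::const_iterator it = routing.find(network_id);
    const std::string& host = (it != routing.end()) ? it->second : default_host;
    buckets[host].push_back(line);
    return true;
}
```

Each bucket then holds only the lines one Stage 2 machine needs, so that machine can fetch a single file per Stage 1 server.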
Stage 2 data structures
Stats aggregation (pseudocode)
{
    "db1": {
      "123456": {
        "2009-06-17": {
          "last_updated": 1234567890,
          "file_ptrs": [0xDEADBEEF, 0xDECAFBAD],
          "topdomains": {
            "xkcd.com": [12,3,5,47,0,0,6,10,1,9,2,3,0,4,2,0,5,12,19,35,32,2,4,0],
          },
          "requesttypes": { "A": [ /* 24 hours */ ], "MX": [ /* 24 hours */ ] },
          "uniqueips": { "1.2.3.4": [ /* 24 hours */ ] }
        }
      }
    }
}


File reference counting (C++)
__gnu_cxx::hash_map<
  char *, // Filename
  std::pair<
    unsigned int, // Reference count
    pthread_t // Owning thread or NULL
  >,
  hash_ptr // Hashes a pointer as if it were an integer
>
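A simplified sketch of the reference-counting idea is below. It swaps in `std::unordered_map` keyed by the filename string for the `__gnu_cxx::hash_map` keyed by pointer, and it leaves out the owning-thread field, so it is single-threaded only. `FileRefs`, `acquire` and `release` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical reference-count table: filename -> rows still held only in memory.
class FileRefs {
public:
    // Called when a row from `file` is read into memory.
    void acquire(const std::string& file) { ++refs_[file]; }

    // Called when a row from `file` has been written out as SQL.
    // Returns true when no rows remain, meaning the file may now be deleted.
    bool release(const std::string& file) {
        std::unordered_map<std::string, unsigned>::iterator it = refs_.find(file);
        if (it == refs_.end()) return false;  // unknown file: nothing to release
        if (--it->second > 0) return false;   // other rows still reference it
        refs_.erase(it);
        return true;
    }

    std::size_t tracked() const { return refs_.size(); }

private:
    std::unordered_map<std::string, unsigned> refs_;
};
```

A multi-threaded version would need a lock around the map, or per-entry ownership as the original type suggests.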
Stage 2 (“reduce”)
 rsync intermediate files from all Stage 1 servers

 8 aggregator threads read intermediate files into memory

 8 pruning threads write SQL statements to disk
 They decide what to prune based on the last_updated time
 They prefer to prune data that allows many files to be deleted


 Files are reference counted and only deleted
 when all of their rows are on disk as SQL
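The aggregate-then-prune cycle might look roughly like this in C++. The sketch keeps 24 hourly counters per domain, as in the pseudocode, and picks the entry with the oldest last_updated time as the pruning candidate. It leaves out the second heuristic (preferring entries that free many files), threading, and the SQL output. All type and function names are hypothetical.

```cpp
#include <array>
#include <ctime>
#include <map>
#include <string>

// One day of stats for one domain: a counter per hour.
typedef std::array<unsigned, 24> HourlyCounts;

// Hypothetical in-memory entry for one (network, date) pair.
struct DayStats {
    std::time_t last_updated;
    std::map<std::string, HourlyCounts> topdomains;
    DayStats() : last_updated(0) {}
};

typedef std::map<std::string, DayStats> StatsByKey;  // key: "network/date"

// Aggregation step: bump the counter for `domain` in the given hour (0-23).
// Out-of-range hours are ignored.
void record(StatsByKey& stats, const std::string& key, const std::string& domain,
            unsigned hour, std::time_t now) {
    if (hour >= 24) return;
    DayStats& day = stats[key];
    HourlyCounts& counts = day.topdomains[domain];  // new entries start at all zeros
    ++counts[hour];
    day.last_updated = now;
}

// Pruning step: returns the key with the oldest last_updated, or "" if empty.
std::string pick_prune_candidate(const StatsByKey& stats) {
    std::string oldest_key;
    std::time_t oldest = 0;
    for (StatsByKey::const_iterator it = stats.begin(); it != stats.end(); ++it) {
        if (oldest_key.empty() || it->second.last_updated < oldest) {
            oldest_key = it->first;
            oldest = it->second.last_updated;
        }
    }
    return oldest_key;
}
```

Pruning the least recently updated entry first favors data that is unlikely to receive more rows, so it is less likely to be written twice.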
Stats Databases (“satan”)
  MySQL 5.0.77-percona
  12 disks
  16 GB RAM


  table_cache=300000


  innodb_dict_size_limit=2G
  innodb_flush_log_at_trx_commit=2
Website

  opendns.com is in Palo Alto
  DNS Stats are in San Francisco


  (Private) JSON API proxies small chunks
  of stats data to the website as needed


  Queries are done with no LIMIT clause
  Results are paginated in memcached (TTL = 1 hour)
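One way to read "no LIMIT, paginate in memcached" is: run the query once, split the full result into pages, and store each page under its own key with a one-hour TTL. The C++ sketch below assumes that reading. A `std::map` stands in for memcached, and the key format and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Rows;

// Stand-in for memcached (an assumption of this sketch): key -> page plus expiry.
struct CachedPage {
    Rows rows;
    std::time_t expires_at;
};
typedef std::map<std::string, CachedPage> Cache;

// Hypothetical key format: "<prefix>:<page number>".
std::string page_key(const std::string& prefix, std::size_t page) {
    std::ostringstream key;
    key << prefix << ':' << page;
    return key.str();
}

// Splits the full result set into pages and stores each with the same TTL.
// Returns the number of pages written.
std::size_t cache_pages(Cache& cache, const std::string& prefix, const Rows& all_rows,
                        std::size_t page_size, std::time_t now, std::time_t ttl_seconds) {
    if (page_size == 0) return 0;
    std::size_t pages = 0;
    for (std::size_t start = 0; start < all_rows.size(); start += page_size, ++pages) {
        const std::size_t end = std::min(start + page_size, all_rows.size());
        CachedPage page;
        page.rows.assign(all_rows.begin() + start, all_rows.begin() + end);
        page.expires_at = now + ttl_seconds;
        cache[page_key(prefix, pages)] = page;
    }
    return pages;
}

// Returns the page if present and not expired, otherwise NULL.
const Rows* get_page(const Cache& cache, const std::string& prefix, std::size_t page,
                     std::time_t now) {
    Cache::const_iterator it = cache.find(page_key(prefix, page));
    if (it == cache.end() || it->second.expires_at <= now) return NULL;
    return &it->second.rows;
}
```

With this layout the database sees one query per hour per result set, and every page view after the first is served from the cache.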
Questions?

   http://opendns.com/dashboard/stats


   http://rcrowley.org/talks/opendns_stats.pdf


   richard@opendns.com

   Photo credits: http://flic.kr/p/4Szofb, http://flic.kr/p/4aH3YK,
   http://flic.kr/p/RUfEt, http://flic.kr/p/4Zng8Y, http://flic.kr/p/2MRnuq,
   http://flic.kr/p/9T4HX, http://flic.kr/p/41eEvH, http://flic.kr/p/5Rhxbq,
   http://flic.kr/p/68RgCp, http://flic.kr/p/oEVp, http://flic.kr/p/tfpXk,
   http://flic.kr/p/4Twpd4

Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Building OpenDNS Stats

  • 1. Building OpenDNS Stats. Richard Crowley, richard@opendns.com
  • 2. Then: 8 billion DNS queries per day. Now: 14 billion DNS queries per day. Slide background (raw DNS query log; some lines are cut off at the slide edge):
      @400000004a381ba80c294ddc q1 69.64.43.245 normal 558867 alt2.gmail-smtp-in.l.google.com. 1 0
      @400000004a381ba80dd39e94 q1 163.192.13.30 normal 894966 dns.hitachi-koki.co.jp. 1 0
      @400000004a381ba80dd3a664 q1 63.84.243.25 normal 0 photos-d.ak.fbcdn.net. 1 0
      @400000004a381ba80dd3ae34 q1 24.155.125.240 normal 1045953 my-iqquiz.com. 1 0
      @400000004a381ba80dd3b604 q1 64.253.103.18 normal 788290 6.164.133.166.in-addr.arpa. 12 2
      @400000004a381ba80dd3bdd4 q1 70.246.80.10 normal 0 googleads.g.doubleclick.net. 1 0
      @400000004a381ba80dd3c5a4 q1 98.108.66.45 normal 0 _ldap._tcp.nj-bloomfield._sites.dc._msdcs.mrii.c
      @400000004a381ba80dd41b94 q1 98.144.16.195 normal 0 js.casalemedia.com. 1 0
      @400000004a381ba80dd42364 q1 68.165.29.60 normal 0 img-cdn.mediaplex.com. 1 0
      @400000004a381ba80dd42b34 q1 12.233.75.219 normal 0 zsmseno.clnet.cz. 1 0
      @400000004a381ba80dd43304 q1 174.37.58.88 normal 0 70.96.118.85.bl.spamcop.net. 16 0
      @400000004a381ba80dd43ad4 q1 208.76.86.13 normal 519070 252.76.75.208.bl.spamcop.net. 1 3
      @400000004a381ba80dd442a4 q1 201.138.19.196 normal 0 isatap.domain.local. 1 3
      @400000004a381ba80dd465cc q1 24.192.98.53 normal 0 208.85.224.82.in-addr.arpa. 12 0
      @400000004a381ba80dd46d9c q1 64.91.71.57 normal 0 liveupdate.symantecliveupdate.com. 1 0
      @400000004a381ba80dd4756c q1 69.64.43.245 normal 558867 alt4.gmail-smtp-in.l.google.com. 1 0
      @400000004a381ba80dd47d3c q1 69.64.43.245 normal 558867 alt4.gmail-smtp-in.l.google.com. 1 0
      @400000004a381ba80dd4850c q1 72.10.191.11 normal 812477 iprep1.t.ctmail.com. 1 0
      @400000004a381ba80dd49c7c q1 12.233.75.219 normal 0 zsmseno.clnet.cz. 1 0
      @400000004a381ba80dd4a44c q1 69.157.60.79 normal 0 img-cdn.mediaplex.com. 1 0
      @400000004a381ba80dd4ac1c q1 208.43.52.205 nxdomain 0 haghway.com.br. 1 0
      @400000004a381ba80dd4b3ec q1 204.145.0.242 normal 488877 105.12.90.201.asetnhap5duax9a26l24rda5g3gv
      @400000004a381ba80dd4bbbc q1 206.246.157.1 normal 0 penninegas.co.uk. 15 2
      @400000004a381ba80dd4c38c q1 69.21.243.131 normal 0 svn.atomicobject.com. 28 0
      @400000004a381ba80dd4dafc q1 163.192.13.65 normal 894966 dns.hitachi-koki.co.jp. 1 0
      @400000004a381ba80dd4e2cc q1 76.65.199.42 nxdomain 0 cs16.msg.dcn.yahoo.com. 1 0
      @400000004a381ba80dd4ea9c q1 189.169.97.227 normal 0 impaktosoo.gateway.2wire.net. 1 3
      @400000004a381ba80dd4f26c q1 69.64.43.245 normal 558867 gmail.com. 15 0
      @400000004a381ba80dd4f654 q1 189.168.174.182 normal 0 wpad.2wire.net. 1 3
      @400000004a381ba80dd4fe24 q1 69.64.43.245 normal 558867 alt3.gmail-smtp-in.l.google.com. 1 0
      @400000004a381ba80dd51594 q1 189.133.170.67 normal 0 v13.lscache5.googlevideo.com. 1 0
      @400000004a381ba80dd538bc q1 12.186.60.189 nxdomain 0 carolyn5.ktemca.com. 1 0
      @400000004a381ba80dd5408c q1 72.249.148.132 normal 384918 mailin-04.mx.aol.com. 1 0
      @400000004a381ba80dd5485c q1 76.65.199.42 nxdomain 0 csa.yahoo.com. 1 0
      @400000004a381ba80dd5502c q1 208.73.228.5 normal 119716 3.0.0.172.in-addr.arpa. 12 3
      @400000004a381ba80dd55414 q1 72.249.26.8 normal 0 schnurr.de. 1 0
      @400000004a381ba80dd55be4 q1 96.61.141.172 servfail 0 bc2.gamingsquared.com. 1 0
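The log lines on slide 2 are whitespace-separated, so a small parser is enough to get them into a usable form. The sketch below is not the talk's code. It assumes the leading token is a TAI64N label (`@`, 16 hex digits of seconds offset by 2^62, 8 hex digits of nanoseconds), which is what the samples look like. The struct and field names are hypothetical: the slides do not say what the fifth and last columns mean, and the seventh column only looks like a DNS record type code (1, 12, 15, 16, 28).

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// One parsed query-log line. Field names are illustrative guesses.
struct QueryLogLine {
    std::uint64_t tai_seconds = 0;  // seconds from the label, 2^62 offset removed
    std::uint32_t nanoseconds = 0;  // nanosecond part of the label
    std::string tag;                // "q1" in every sample line
    std::string client_ip;
    std::string result;             // "normal", "nxdomain" or "servfail"
    std::uint64_t id_field = 0;     // meaning not stated on the slide
    std::string domain;
    unsigned qtype = 0;             // looks like a DNS record type code (assumption)
    unsigned last_field = 0;        // meaning not stated on the slide
};

// Returns true and fills `out` only if all eight columns are present.
// Lines that were cut off at the slide edge are rejected.
bool parse_line(const std::string& line, QueryLogLine& out) {
    std::istringstream in(line);
    std::string stamp;
    QueryLogLine q;
    if (!(in >> stamp >> q.tag >> q.client_ip >> q.result >> q.id_field
             >> q.domain >> q.qtype >> q.last_field)) {
        return false;
    }
    // '@' + 16 hex digits (seconds) + 8 hex digits (nanoseconds) = 25 chars.
    if (stamp.size() != 25 || stamp[0] != '@') return false;
    try {
        const std::uint64_t label = std::stoull(stamp.substr(1, 16), nullptr, 16);
        const std::uint64_t offset = 1ULL << 62;
        if (label < offset) return false;
        q.tai_seconds = label - offset;
        q.nanoseconds = static_cast<std::uint32_t>(
            std::stoul(stamp.substr(17, 8), nullptr, 16));
    } catch (const std::exception&) {
        return false;  // non-hex timestamp
    }
    out = q;
    return true;
}
```

For the first sample line, the seconds value decodes to 1245191080, which falls in mid-June 2009 and matches the date used in the Stage 2 pseudocode.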
  • 3. Logs are silly, let’s make graphs
  • 4. High level design from my OpenDNS interview: map/reduce/ish. Stage 1 buckets data by network. Stage 2 aggregates and stores. Prefers to duplicate data rather than omit data. Give each network a separate table (keeps each table small(er) and keeps the primary key small(er)).
  • 6. False start #1: storing domains. auto_increment is bad (table lock). Use the SHA1 of the domain as primary key. Currently we have 2 machines storing domains. About 48 GB in each domains.ibd. 28 GB memcached across 8 machines effectively makes this database write-only.
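The point of using the SHA1 of the domain as the primary key is that any writer can compute the key from the domain alone, so nobody waits on a shared auto_increment counter. The sketch below illustrates only that property. It is a self-contained SHA-1 written for this example, and the function name `sha1_hex` is hypothetical. Real code would more likely call an existing crypto library, and the slides do not say whether the key was stored as hex or raw bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

static std::uint32_t rotl32(std::uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// Returns the SHA-1 digest of `msg` as 40 lowercase hex characters.
std::string sha1_hex(const std::string& msg) {
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                          0x10325476u, 0xC3D2E1F0u};
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    std::vector<std::uint8_t> data(msg.begin(), msg.end());
    const std::uint64_t bit_len = static_cast<std::uint64_t>(msg.size()) * 8;
    data.push_back(0x80);
    while (data.size() % 64 != 56) data.push_back(0);
    for (int i = 7; i >= 0; --i) {
        data.push_back(static_cast<std::uint8_t>(bit_len >> (8 * i)));
    }
    for (std::size_t off = 0; off < data.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i) {
            const std::size_t p = off + 4 * static_cast<std::size_t>(i);
            w[i] = (std::uint32_t(data[p]) << 24) | (std::uint32_t(data[p + 1]) << 16) |
                   (std::uint32_t(data[p + 2]) << 8) | std::uint32_t(data[p + 3]);
        }
        for (int i = 16; i < 80; ++i) {
            w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
        }
        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t t = rotl32(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl32(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    char out[41];
    std::snprintf(out, sizeof out, "%08x%08x%08x%08x%08x",
                  static_cast<unsigned>(h[0]), static_cast<unsigned>(h[1]),
                  static_cast<unsigned>(h[2]), static_cast<unsigned>(h[3]),
                  static_cast<unsigned>(h[4]));
    return std::string(out);
}
```

Because the key is a pure function of the domain, the same domain always maps to the same row no matter which machine inserts it.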
  • 7. False start #2: std::bad_alloc. Stage 2 aggregated too much data and ran out of memory. Bad idea: improve the heuristic used to guess memory usage and prevent std::bad_alloc. Good idea: catch std::bad_alloc, clean up and restart. Pre-allocating buffers that will be reused makes this easy. Protip: run two programs (memcached and Stage 2, for example) compiled 32-bit on a 64-bit CPU with 8 GB RAM.
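The "catch std::bad_alloc, clean up and restart" idea can be sketched as follows. This is an illustration under stated assumptions, not the Stage 2 code: the `Aggregator` class, its `max_entries` cap and `aggregate_all` are hypothetical. The cap stands in for real memory exhaustion so that the recovery path can be exercised deterministically.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <unordered_map>
#include <vector>

// Counts queries per domain in memory until it "runs out of memory".
class Aggregator {
public:
    explicit Aggregator(std::size_t max_entries) : max_entries_(max_entries) {
        assert(max_entries_ > 0);
        counts_.reserve(max_entries_);  // pre-allocate once, reuse after each flush
    }
    void add(const std::string& domain) {
        // Simulated exhaustion: a real allocator would throw from inside the insert.
        if (counts_.count(domain) == 0 && counts_.size() >= max_entries_) {
            throw std::bad_alloc();
        }
        ++counts_[domain];
    }
    // Clean-up step: hand the aggregated rows to storage (omitted) and start over.
    void flush() {
        rows_flushed_ += counts_.size();
        counts_.clear();
    }
    std::size_t pending() const { return counts_.size(); }
    std::size_t rows_flushed() const { return rows_flushed_; }

private:
    std::size_t max_entries_;
    std::size_t rows_flushed_ = 0;
    std::unordered_map<std::string, unsigned> counts_;
};

// Feeds every domain to the aggregator; on bad_alloc it flushes and retries once.
// Returns how many times it had to recover.
std::size_t aggregate_all(Aggregator& agg, const std::vector<std::string>& domains) {
    std::size_t restarts = 0;
    for (const std::string& d : domains) {
        try {
            agg.add(d);
        } catch (const std::bad_alloc&) {
            agg.flush();
            ++restarts;
            agg.add(d);  // succeeds because the aggregator is now empty
        }
    }
    return restarts;
}
```

The design choice mirrors the slide: rather than predicting memory use up front, the loop lets the failure happen and treats it as a signal to write out what it has.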
  • 8. False start #3: open tables. 80+ %iowait from opening and closing tables. strace showed lots of calls to open() and close(). strace crashed MySQL. Altered mysqld_safe to set ulimit -n 600000.
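With one table per network, the per-process limit on open file descriptors becomes a hard ceiling, which is what `ulimit -n` raises. As a minimal sketch, a process can check its own soft limit with the POSIX `getrlimit` call. The helper names below are hypothetical, and nothing in the slides says the talk's code performed such a check.

```cpp
#include <sys/resource.h>

// Returns the soft limit on open file descriptors, or 0 if it cannot be read.
rlim_t open_file_soft_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return lim.rlim_cur;
}

// True if the current soft limit allows `needed` descriptors to be open at once.
bool can_hold_open(rlim_t needed) {
    const rlim_t limit = open_file_soft_limit();
    return limit == RLIM_INFINITY || limit >= needed;
}
```

A caller could log a warning at startup when `can_hold_open` is false for the number of tables it expects to keep open.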
  • 9. False start #4: MyISAM. Didn’t mind table locks, so I used MyISAM. 12 MB/sec total across 4 nodes. Migration to InnoDB is in progress. Expect a 2x improvement from InnoDB. innodb_flush_log_at_trx_commit=2.
  • 11. Bird’s eye view [architecture diagram; labels: Resolvers, Domains DB, User DB, (worldwide), Proxy, Web servers, (Palo Alto), Stage 1, Stats DBs, Stage 2, San Francisco]
  • 12. Stage 1 (“map”). rsync log files from our DNS servers to 3 servers in San Francisco. Looking up a network in memcached (or $GLOBALS) gives the preferred Stage 2. Write log lines back to local disk, one bucket for each Stage 2 machine. Future work: automated rebalancing and failover.
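The bucketing step of Stage 1 can be sketched as a lookup from network to preferred Stage 2 machine, followed by grouping lines per machine. Everything named here is hypothetical: `Stage2Router`, `bucket_lines`, the integer network ID, and the in-memory map that stands in for memcached. The modulo fallback for networks with no recorded preference is an assumption of this sketch, since the slides do not say what happens in that case.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Maps a network to the Stage 2 machine that should aggregate its data.
class Stage2Router {
public:
    explicit Stage2Router(std::vector<std::string> hosts) : hosts_(std::move(hosts)) {
        assert(!hosts_.empty());
    }
    void set_preferred(std::uint64_t network_id, const std::string& host) {
        preferred_[network_id] = host;
    }
    const std::string& route(std::uint64_t network_id) const {
        auto it = preferred_.find(network_id);
        if (it != preferred_.end()) return it->second;
        // Assumed fallback: spread unknown networks evenly across hosts.
        return hosts_[network_id % hosts_.size()];
    }

private:
    std::vector<std::string> hosts_;
    std::unordered_map<std::uint64_t, std::string> preferred_;
};

// Groups (network ID, log line) pairs into one bucket per Stage 2 host.
// A real Stage 1 would append each bucket to a file on local disk instead.
std::unordered_map<std::string, std::vector<std::string>> bucket_lines(
    const Stage2Router& router,
    const std::vector<std::pair<std::uint64_t, std::string>>& lines) {
    std::unordered_map<std::string, std::vector<std::string>> buckets;
    for (const auto& entry : lines) {
        buckets[router.route(entry.first)].push_back(entry.second);
    }
    return buckets;
}
```

Keeping all of a network's lines in one bucket is what lets Stage 2 aggregate a network without talking to other machines.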
  • 13. Stage 2 data structures.
      Stats aggregation (pseudocode):
        {
          "db1": {
            "123456": {
              "2009-06-17": {
                "last_updated": 1234567890,
                "file_ptrs": [0xDEADBEEF, 0xDECAFBAD],
                "topdomains": {
                  "xkcd.com": [12,3,5,47,0,0,6,10,1,9,2,3,0,4,2,0,5,12,19,35,32,2,4,0]
                },
                "requesttypes": {
                  "A": [ /* 24 hours */ ],
                  "MX": [ /* 24 hours */ ]
                },
                "uniqueips": {
                  "1.2.3.4": [ /* 24 hours */ ]
                }
              }
            }
          }
        }
      File reference counting (C++):
        __gnu_cxx::hash_map<
          char *,            // Filename
          std::pair<
            unsigned int,    // Reference count
            pthread_t        // Owning thread or NULL
          >,
          hash_ptr           // Hashes a pointer as if it were an integer
        >
  • 14. Stage 2 (“reduce”). rsync intermediate files from all Stage 1 servers. 8 aggregator threads read intermediate files into memory. 8 pruning threads write SQL statements to disk. They decide what to prune based on the last_updated time. They prefer to prune data that allows many files to be deleted. Files are reference counted and only deleted when all of their rows are on disk as SQL.
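The file lifetime rule on slide 14 (delete an intermediate file only once every row read from it is on disk as SQL) is plain reference counting. The sketch below is a simplified, single-threaded illustration with hypothetical names (`FileRefs`, `oldest_entry`). It uses `std::unordered_map` keyed by file name rather than the `__gnu_cxx::hash_map` with owning-thread tracking shown on slide 13, and a real multi-threaded version would need locking. `oldest_entry` covers only the `last_updated` part of the pruning choice, not the preference for freeing many files.

```cpp
#include <ctime>
#include <map>
#include <string>
#include <unordered_map>

// Tracks how many in-memory rows still depend on each intermediate file.
class FileRefs {
public:
    // Call once for every row loaded from `file`.
    void acquire(const std::string& file) { ++counts_[file]; }

    // Call once a row from `file` has been written out as SQL.
    // Returns true when no rows reference the file any more, i.e. it may be deleted.
    bool release(const std::string& file) {
        auto it = counts_.find(file);
        if (it == counts_.end()) return false;
        if (--it->second > 0) return false;
        counts_.erase(it);
        return true;
    }

    unsigned count(const std::string& file) const {
        auto it = counts_.find(file);
        return it == counts_.end() ? 0u : it->second;
    }

private:
    std::unordered_map<std::string, unsigned> counts_;
};

// Picks the key with the smallest last_updated time, or "" if the map is empty.
std::string oldest_entry(const std::map<std::string, std::time_t>& last_updated) {
    std::string best;
    std::time_t best_time = 0;
    bool found = false;
    for (const auto& kv : last_updated) {
        if (!found || kv.second < best_time) {
            best = kv.first;
            best_time = kv.second;
            found = true;
        }
    }
    return best;
}
```

The caller deletes the file when `release` returns true, so a crash before that point leaves the file in place to be read again, which fits the deck's stated preference for duplicating data over omitting it.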
  • 15. Stats Databases (“satan”). MySQL 5.0.77-percona. 12 disks. 16 GB RAM. table_cache=300000. innodb_dict_size_limit=2G. innodb_flush_log_at_trx_commit=2.
  • 16. Website. opendns.com is in Palo Alto. DNS Stats are in San Francisco. (Private) JSON API proxies small chunks of stats data to the website as needed. Queries are done with no LIMIT clause. Results are paginated in memcached (TTL = 1 hour).
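Running the query once without LIMIT and paginating the result in a cache can be sketched like this. The `PageCache` class is a hypothetical in-memory stand-in for memcached, and the `key:page` naming scheme is an assumption. Only the one-hour TTL comes from the slide.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <unordered_map>
#include <vector>

const std::time_t kTtlSeconds = 3600;  // TTL = 1 hour, as on the slide

class PageCache {
public:
    // Splits the full result set into pages and caches each page separately.
    // Returns the number of pages stored.
    std::size_t store(const std::string& key, const std::vector<std::string>& rows,
                      std::size_t page_size, std::time_t now) {
        if (page_size == 0) return 0;
        std::size_t pages = 0;
        for (std::size_t start = 0; start < rows.size(); start += page_size) {
            Entry e;
            e.expires = now + kTtlSeconds;
            for (std::size_t i = start; i < rows.size() && i < start + page_size; ++i) {
                e.rows.push_back(rows[i]);
            }
            entries_[page_key(key, pages)] = e;
            ++pages;
        }
        return pages;
    }

    // Fetches one page; returns false if it is missing or has expired.
    bool get(const std::string& key, std::size_t page, std::time_t now,
             std::vector<std::string>& out) const {
        auto it = entries_.find(page_key(key, page));
        if (it == entries_.end() || now >= it->second.expires) return false;
        out = it->second.rows;
        return true;
    }

private:
    struct Entry {
        std::vector<std::string> rows;
        std::time_t expires = 0;
    };
    static std::string page_key(const std::string& key, std::size_t page) {
        return key + ":" + std::to_string(page);
    }
    std::unordered_map<std::string, Entry> entries_;
};
```

On a cache miss the caller would rerun the query and call `store` again, so later page views within the hour never touch the database.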
  • 17. Questions? http://opendns.com/dashboard/stats http://rcrowley.org/talks/opendns_stats.pdf richard@opendns.com Photo credits: http://flic.kr/p/4Szofb, http://flic.kr/p/4aH3YK, http://flic.kr/p/RUfEt, http://flic.kr/p/4Zng8Y, http://flic.kr/p/2MRnuq, http://flic.kr/p/9T4HX, http://flic.kr/p/41eEvH, http://flic.kr/p/5Rhxbq, http://flic.kr/p/68RgCp, http://flic.kr/p/oEVp, http://flic.kr/p/tfpXk, http://flic.kr/p/4Twpd4