Scaling Instagram
iammutex
Instagram scalability in practice
Scaling Instagram
1.
Scaling Instagram
AirBnB Tech Talk 2012, Mike Krieger, Instagram
2.
me
- Co-founder, Instagram
- Previously: UX & Front-end @ Meebo
- Stanford HCI BS/MS
- @mikeyk on everything
3.
4.
5.
6.
communicating and sharing in
the real world
7.
30+ million users
in less than 2 years
8.
the story of
how we scaled it
9.
a brief tangent
10.
the beginning
11.
12.
2 product guys
13.
no real back-end
experience
14.
analytics & python
@ meebo
15.
CouchDB
16.
CrimeDesk SF
17.
18.
let’s get hacking
19.
good components in
place early on
20.
...but were hosted
on a single machine somewhere in LA
21.
22.
less powerful than
my MacBook Pro
23.
okay, we launched.
now what?
24.
25k signups in
the first day
25.
everything is on
fire!
26.
best & worst
day of our lives so far
27.
load was through
the roof
28.
first culprit?
29.
30.
favicon.ico
31.
404-ing on Django, causing
tons of errors
32.
lesson #1: don’t
forget your favicon
33.
real lesson #1:
most of your initial scaling problems won’t be glamorous
34.
favicon
35.
ulimit -n
36.
memcached -t 4
37.
prefork/postfork
38.
friday rolls around
39.
not slowing down
40.
let’s move to
EC2.
41.
42.
43.
scaling = replacing
all components of a car while driving it at 100mph
44.
since...
45.
“canonical [architecture] of an early stage startup in this era.” (HighScalability.com)
46.
Nginx & Redis & Postgres
& Django.
47.
Nginx & HAProxy
& Redis & Memcached & Postgres & Gearman & Django.
48.
24h Ops
49.
50.
51.
our philosophy
52.
1 simplicity
53.
2 optimize for minimal
operational burden
54.
3 instrument everything
55.
walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android
56.
1 scaling the
db
57.
early days
58.
django ORM, postgresql
59.
why pg? postgis.
60.
moved db to
its own machine
61.
but photos kept
growing and growing...
62.
...and only 68GB
of RAM on biggest machine in EC2
63.
so what now?
64.
vertical partitioning
65.
django db routers
make it pretty easy
66.
def db_for_read(self, model, **hints):
    if model._meta.app_label == 'photos':
        return 'photodb'
67.
...once you untangle
all your foreign key relationships
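The router fragment above could be filled out roughly as follows. This is a minimal sketch rather than Instagram's actual code: the class name `PhotoRouter` is hypothetical, the `'photodb'` alias is taken from the slide and is assumed to be defined in Django's `DATABASES` setting, and the two `SimpleNamespace` objects are stand-ins for model classes so the routing logic can be exercised without a Django project.

```python
from types import SimpleNamespace


class PhotoRouter:
    """Route every model in the 'photos' app to the 'photodb' alias."""

    def db_for_read(self, model, **hints):
        if model._meta.app_label == 'photos':
            return 'photodb'
        return None  # no opinion: Django falls back to the 'default' alias

    def db_for_write(self, model, **hints):
        # Reads and writes for an app go to the same database.
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Only allow relations between objects stored in the same database;
        # this is why foreign keys have to be untangled first.
        in_photos_1 = obj1._meta.app_label == 'photos'
        in_photos_2 = obj2._meta.app_label == 'photos'
        return in_photos_1 == in_photos_2


# Stand-ins for Django model classes (assumption, for illustration only).
fake_photo = SimpleNamespace(_meta=SimpleNamespace(app_label='photos'))
fake_user = SimpleNamespace(_meta=SimpleNamespace(app_label='auth'))
router = PhotoRouter()
```

In a real project the router would be listed in the `DATABASE_ROUTERS` setting so that the ORM consults it on every query.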
68.
a few months
later...
69.
photosdb > 60GB
70.
what now?
71.
horizontal partitioning!
72.
aka: sharding
73.
“surely we’ll have
hired someone experienced before we actually need to shard”
74.
you don’t get
to choose when scaling challenges come up
75.
evaluated solutions
76.
at the time, none were up to the task of being our primary DB
77.
did it in Postgres itself
78.
what’s painful about
sharding?
79.
1 data retrieval
80.
hard to know
what your primary access patterns will be w/out any usage
81.
in most cases,
user ID
82.
2 what happens
if one of your shards gets too big?
83.
in range-based schemes
(like MongoDB), you split
84.
A-H: shard0
I-Z: shard1
85.
A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard2
86.
downsides (especially on
EC2): disk IO
87.
instead, we pre-split
88.
many many many (thousands)
of logical shards
89.
that map to
fewer physical ones
90.
// 8 logical shards on 2 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: A, 3: A,
  4: B, 5: B, 6: B, 7: B }
91.
// 8 logical shards on 2 -> 4 machines
user_id % 8 = logical shard
logical shards -> physical shard map
{ 0: A, 1: A, 2: C, 3: C,
  4: B, 5: B, 6: D, 7: D }
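The two maps above can be expressed as a small lookup. This is a sketch under the slide's numbers (8 logical shards, machines named A to D); the function names are hypothetical. The point it illustrates is that growing from two to four machines changes only the map, never the `user_id % 8` rule, so no user is re-hashed.

```python
NUM_LOGICAL_SHARDS = 8

# logical shard -> physical machine, as on the two-machine slide
shard_map = {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'}


def logical_shard(user_id):
    """A user's logical shard is fixed for life."""
    return user_id % NUM_LOGICAL_SHARDS


def machine_for(user_id, mapping):
    """Resolve the physical machine through the current map."""
    return mapping[logical_shard(user_id)]


def move_shards(mapping, moves):
    """Return a new map with some logical shards reassigned."""
    new_map = dict(mapping)
    new_map.update(moves)
    return new_map


# Growing from 2 to 4 machines: shards 2, 3 move to C and 6, 7 move to D.
four_machine_map = move_shards(shard_map, {2: 'C', 3: 'C', 6: 'D', 7: 'D'})
```

For example, user 10 lives in logical shard 2 in both configurations; only the machine that hosts shard 2 changes.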
92.
little known but
awesome PG feature: schemas
93.
not “columns” schema
94.
- database:
  - schema:
    - table:
      - columns
95.
machineA:
  shard0 photos_by_user
  shard1 photos_by_user
  shard2 photos_by_user
  shard3 photos_by_user
96.
machineA:                  machineA’:
  shard0 photos_by_user      shard0 photos_by_user
  shard1 photos_by_user      shard1 photos_by_user
  shard2 photos_by_user      shard2 photos_by_user
  shard3 photos_by_user      shard3 photos_by_user
97.
machineA:                  machineC:
  shard0 photos_by_user      shard0 photos_by_user
  shard1 photos_by_user      shard1 photos_by_user
  shard2 photos_by_user      shard2 photos_by_user
  shard3 photos_by_user      shard3 photos_by_user
98.
can do this
as long as you have more logical shards than physical ones
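Because each logical shard is a Postgres schema, the shard can be addressed by a schema-qualified table name that stays the same no matter which machine hosts it. The sketch below only builds the query text; the `user_id` column, the default of four logical shards, and the function names are assumptions for illustration. The `%s` placeholder is the parameter style used by DB-API drivers such as psycopg2.

```python
def qualified_table(user_id, table='photos_by_user', num_logical_shards=4):
    """Return the schema-qualified table name for a user's logical shard."""
    return 'shard{}.{}'.format(user_id % num_logical_shards, table)


def photos_query(user_id):
    """Build a parameterized query against the user's shard schema."""
    sql = 'SELECT * FROM {} WHERE user_id = %s'.format(qualified_table(user_id))
    return sql, (user_id,)
```

Only the schema name is interpolated into the SQL string, and it is derived from an integer; the user-supplied value still travels as a bound parameter.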
99.
lesson: take tech/tools you
know and try first to adapt them into a simple solution
100.
2 which tools
where?
101.
where to cache
/ otherwise denormalize data
102.
we <3 redis
103.
what happens when
a user posts a photo?
104.
1 user uploads
photo with (optional) caption and location
105.
2 synchronous write
to the media database for that user
106.
3 queues!
107.
3a if geotagged,
async worker POSTs to Solr
108.
3b follower delivery
109.
can’t have every user who loads their timeline look up everyone they follow and then those users’ photos
110.
instead, everyone gets
their own list in Redis
111.
media ID is
pushed onto a list for every person who’s following this user
112.
Redis is awesome
for this; rapid insert, rapid subsets
113.
when time to
render a feed, we take small # of IDs, go look up info in memcached
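The delivery steps above can be sketched as follows. To keep the example runnable without a server, `InMemoryLists` is a tiny stand-in that mimics the Redis list commands LPUSH, LTRIM and LRANGE for non-negative indices only; the `feed:<user_id>` key format, the `FEED_LENGTH` cap and the function names are assumptions, not details from the talk.

```python
from collections import defaultdict

FEED_LENGTH = 100  # hypothetical cap that keeps each list bounded


class InMemoryLists:
    """Stand-in for Redis LPUSH / LTRIM / LRANGE (non-negative indices only)."""

    def __init__(self):
        self.lists = defaultdict(list)

    def lpush(self, key, value):
        self.lists[key].insert(0, value)

    def ltrim(self, key, start, stop):
        # Redis ranges are inclusive on both ends.
        self.lists[key] = self.lists[key][start:stop + 1]

    def lrange(self, key, start, stop):
        return self.lists[key][start:stop + 1]


def deliver(store, media_id, follower_ids):
    """Push a new media ID onto the feed list of every follower."""
    for follower_id in follower_ids:
        key = 'feed:{}'.format(follower_id)
        store.lpush(key, media_id)
        store.ltrim(key, 0, FEED_LENGTH - 1)


def feed_page(store, user_id, count=10):
    """Return the newest media IDs; details would be fetched from a cache."""
    return store.lrange('feed:{}'.format(user_id), 0, count - 1)


store = InMemoryLists()
deliver(store, 101, follower_ids=[1, 2])
deliver(store, 102, follower_ids=[2])
```

Here user 2 follows both posters and sees the newer photo first, while user 1 sees only the first photo.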
114.
Redis is great
for...
115.
data structures that
are relatively bounded
116.
(don’t tie yourself
to a solution where your in- memory DB is your main data store)
117.
caching complex objects where you want to do more than GET
118.
ex: counting, sub-
ranges, testing membership
119.
especially when Taylor Swift
posts live from the CMAs
120.
follow graph
121.
v1: simple DB
table (source_id, target_id, status)
122.
who do I
follow? who follows me? do I follow X? does X follow me?
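The four questions above map directly onto queries against the v1 table. The sketch uses an in-memory SQLite database purely so it runs anywhere (the talk's database was Postgres); the table name `follows`, the `'active'` status value, the index and the function names are assumptions.

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute(
    'CREATE TABLE follows (source_id INTEGER, target_id INTEGER, status TEXT, '
    'PRIMARY KEY (source_id, target_id))'
)
# The primary key serves "who do I follow?"; this index serves "who follows me?".
db.execute('CREATE INDEX follows_by_target ON follows (target_id)')
db.executemany(
    'INSERT INTO follows VALUES (?, ?, ?)',
    [(1, 2, 'active'), (2, 1, 'active'), (3, 1, 'active')],
)


def following(user_id):
    """Who do I follow?"""
    rows = db.execute(
        "SELECT target_id FROM follows "
        "WHERE source_id = ? AND status = 'active' ORDER BY target_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers(user_id):
    """Who follows me?"""
    rows = db.execute(
        "SELECT source_id FROM follows "
        "WHERE target_id = ? AND status = 'active' ORDER BY source_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def follows(source_id, target_id):
    """Do I follow X? Swap the arguments to ask: does X follow me?"""
    row = db.execute(
        "SELECT 1 FROM follows "
        "WHERE source_id = ? AND target_id = ? AND status = 'active'",
        (source_id, target_id),
    ).fetchone()
    return row is not None
```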
123.
DB was busy,
so we started storing parallel version in Redis
124.
follow_all(300 item list)
125.
inconsistency
126.
extra logic
127.
so much extra
logic
128.
exposing your support
team to the idea of cache invalidation
129.
130.
redesign took a
page from twitter’s book
131.
PG can handle
tens of thousands of requests, very light memcached caching
132.
two takeaways
133.
1 have a
versatile complement to your core data storage (like Redis)
134.
2 try not
to have two tools trying to do the same job
135.
3 staying nimble
136.
2010: 2 engineers
137.
2011: 3 engineers
138.
2012: 5 engineers
139.
scarcity -> focus
140.
engineer solutions that
you’re not constantly returning to because they broke
141.
1 extensive unit-tests
and functional tests
142.
2 keep it
DRY
143.
3 loose coupling
using notifications / signals
144.
4 do most
of our work in Python, drop to C when necessary
145.
5 frequent code
reviews, pull requests to keep things in the ‘shared brain’
146.
6 extensive monitoring
147.
munin
148.
statsd
149.
150.
“how is the
system right now?”
151.
“how does this
compare to historical trends?”
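Instrumenting code for statsd amounts to sending small UDP datagrams of the form `name:value|type`, where `c` marks a counter and `ms` a timer. The sketch below is a hand-rolled sender for illustration only; the metric names are hypothetical, and 8125 is statsd's conventional default port. Because UDP is fire-and-forget, a slow or missing statsd server does not block the request being measured.

```python
import socket


def format_metric(name, value, metric_type):
    """Build a statsd line such as 'photos.uploaded:1|c'."""
    return '{}:{}|{}'.format(name, value, metric_type)


def send_metric(name, value, metric_type, host='127.0.0.1', port=8125):
    """Send one metric over UDP without waiting for any reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        payload = format_metric(name, value, metric_type).encode('ascii')
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

A counter answers "how is the system right now?", and the same series graphed over weeks answers the historical question.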
152.
scaling for android
153.
1 million new
users in 12 hours
154.
great tools that
enable easy read scalability
155.
redis: slaveof <host> <port>
156.
our Redis framework assumes
0+ readslaves
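One way to read "assumes 0+ readslaves" is a client wrapper that always accepts a list of read replicas, possibly empty, so adding a replica is a configuration change rather than a code change. This is a guess at the shape of such a wrapper, not Instagram's framework; the class and method names are hypothetical, and plain strings stand in for connection objects.

```python
import random


class ReplicatedClient:
    """Route writes to the master and reads to any replica, if one exists."""

    def __init__(self, master, replicas=()):
        self.master = master
        self.replicas = list(replicas)

    def for_write(self):
        return self.master

    def for_read(self):
        # With zero replicas, reads fall back to the master.
        if self.replicas:
            return random.choice(self.replicas)
        return self.master


solo = ReplicatedClient('master')
scaled = ReplicatedClient('master', ['replica1', 'replica2'])
```

Replication is asynchronous, so reads served this way may briefly lag behind the most recent writes.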
157.
tight iteration loops
158.
statsd & pgfouine
159.
know where you
can shed load if needed
160.
(e.g. shorter feeds)
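Shedding load with shorter feeds can be as simple as a single switch on the feed length. The sketch below assumes a load figure between 0 and 1 from some monitoring source; the threshold, the two lengths and the function name are all hypothetical.

```python
def feed_length(load, normal=30, reduced=10, threshold=0.8):
    """Return how many feed items to render at the given load (0.0 to 1.0)."""
    if load >= threshold:
        return reduced  # under pressure: serve a shorter feed
    return normal
```

The value of deciding this ahead of time is that the knob already exists when traffic spikes.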
161.
if you’re tempted
to reinvent the wheel...
162.
don’t.
163.
“our app servers sometimes
kernel panic under load”
164.
...
165.
“what if we
write a monitoring daemon...”
166.
wait! this is
exactly what HAProxy is great at
167.
surround yourself with
awesome advisors
168.
culture of openness around
engineering
169.
give back; e.g.
node2dm
170.
focus on making
what you have better
171.
“fast, beautiful photo
sharing”
172.
“can we make all of our requests take 50% of the time?”
173.
staying nimble =
remind yourself of what’s important
174.
your users around
the world don’t care that you wrote your own DB
175.
wrapping up
176.
unprecedented times
177.
2 backend engineers can
scale a system to 30+ million users
178.
key word =
simplicity
179.
cleanest solution with as few moving parts as possible
180.
don’t over-optimize or expect
to know ahead of time how site will scale
181.
don’t think “someone else
will join & take care of this”
182.
will happen sooner
than you think; surround yourself with great advisors
183.
when adding software
to stack: only if you have to, optimizing for operational simplicity
184.
few, if any,
unsolvable scaling challenges for a social startup
185.
have fun