SlideShare a Scribd company logo
1 of 20
Building a Scalable Online Backup System in Python Joe Drumgoole http://twitter/jdrumgoole
Scaling You probably shouldn’t care Throughput vs response time Scaling is a fractal problem The database is what will get ya! Amazing what a well tuned DB will support http://twitter.com/jdrumgoole 2
PutPlace Architecture http://twitter.com/jdrumgoole 3
Online Backup Not really a Web 2.0 play More like client server Larger vision of PutPlace Map of your Digital World http://twitter.com/jdrumgoole 4
Online Backup : Client Installation and support of Windows 20** Mac Support Open file/locked file handling Bandwidth throttling CPU Throttling Upload restarts Feedback http://twitter.com/jdrumgoole 5
Online Backup : Server Don’t loose any files De-duplication Thumbnail generation for images Flickr Backup Client Feedback Bulk download File relationships http://twitter.com/jdrumgoole 6
Online Backup - Secrets People Don’t backup Compute dominates Restores represent 0.01% of bandwidth and load Writing web clones of Windows Explorer is hard The browser sucks as a client side app container (for now) http://twitter.com/jdrumgoole 7
Scaling For online backup the challenge is to receive shed loads of data from lots of clients Clients upload in 1MB chunks Chunks must be stored coalesced and push to stable backup (S3) Clients must get acknowledgement Web page must update Quota management http://twitter.com/jdrumgoole 8
Load Balancer Load Balancer : Perlbal http://www.danga.com/perlbal/ Can handle 100 x millions of requests per day Event based (sshhh : Don’t tell anyone, but its Perl!) It does fall over occasionally Otherwise works perfectly http://twitter.com/jdrumgoole 9
App Server Our app servers: Handle login Deliver web pages Handle uploads from clients Hand off heavy duty processing to task servers Thumbnail generation File coalescing Checksum generation Hand off is via a database queue http://twitter.com/jdrumgoole 10
App Server Just Django Instances Templates deliver web pages Views handle chunks/login etc. Models update the database Task Servers do the heavy lifting http://twitter.com/jdrumgoole 11
Task Server Run off a database queue (table) Four main task servers: Assemble completed file uploads Create thumbnails Remove deleted files Generate user statistics Servers are multi-threaded http://twitter.com/jdrumgoole 12
Refactoring Originally N blacknight servers writing to NFS Then N blacknight servers writing to S3 Then N EC2 servers writing to S3 The N EC2 servers writing to MogileFS/S3 Lots of uploading optimisations along the way http://twitter.com/jdrumgoole 13
Results System has successfully uploaded over 100k files in a single day Regularily does 50k files a day Have about 2k registered users Continues to get registrations Runs in lights out mode (no daily/weekly/monthly housekeeping) http://twitter.com/jdrumgoole 14
What worked Python proved extremely flexible Standard library saved us lots of work Django provided a lot of glue Easy to  migrate from dedicated host on NFS to Cloud Hosting and S3 storage Nagios/Monitis monitoring http://twitter.com/jdrumgoole 15
What Didn’t Work Would use MySQL rather than Postgres Easier to cluster, more knowledge available Native Windows Client Unecessary, Python client was good enough Would use an off the shelf queueing system RabbitMQ, ActiveMQ, SQS Kludgey client side API Threading The Client http://twitter.com/jdrumgoole 16
Tool Chain Wush.net 		: Subversion and Trac DynDNS: Dynamic DNS Python/Django: Dev Stack Postgres: Database Hudson		: Build Server Perlbal: Load Balancing MogileFS		: Distributed File System Memcached 		: Caching Nagios, Monitis: Monitoring Hamachi		: VPN through Firewall Google Apps	: Email, Calendar, Docs, Wiki AuthSMTP		: Validated SMTP Zendesk: Support Desk Amazon		: Storage, Compute, Bandwidth Paypal		: Billing http://twitter.com/jdrumgoole 17
Costs Capital Expenditure One server 5k euro One laptop per developer 2.5k (7 devs) One Linksys WIFI/Firewall (won at Raffle) Two 24 port switches 1.6k Total: ~24k Running Costs for Grid and Storage ~1800 euro a month (8 instances) http://twitter.com/jdrumgoole 18
If I Were Doing it Again Stick with native python client Look at eventing ala Node.js for server Use MySQL Use Google App Engine as Front End/Load Balancer Use a commercial queueing package http://twitter.com/jdrumgoole 19
Thanks Q&A http://twitter.com/jdrumgoole 20

More Related Content

What's hot

Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeFastly
 
4.2. Web analyst fiddler
4.2. Web analyst fiddler4.2. Web analyst fiddler
4.2. Web analyst fiddlerdefconmoscow
 
HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016Daniel Stenberg
 
I got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneI got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneAdrian Cole
 
Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Matthew Wise
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP ApisAdrian Cole
 
Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Kok Hoor Chew
 
Debugging with Fiddler
Debugging with FiddlerDebugging with Fiddler
Debugging with FiddlerIdo Flatow
 
Websockets at tossug
Websockets at tossugWebsockets at tossug
Websockets at tossugclkao
 
HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?Alessandro Nadalin
 
HTTP/2 for Developers
HTTP/2 for DevelopersHTTP/2 for Developers
HTTP/2 for DevelopersSvetlin Nakov
 
Introduction to HTTP/2
Introduction to HTTP/2Introduction to HTTP/2
Introduction to HTTP/2Ido Flatow
 
HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? Sigma Software
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017ElifTech
 
HTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenHTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenCodemotion
 
HTTP/2 Introduction
HTTP/2 IntroductionHTTP/2 Introduction
HTTP/2 IntroductionWalter Liu
 
Search in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itSearch in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itOtto Kekäläinen
 
Getting the most out of WebPageTest
Getting the most out of WebPageTestGetting the most out of WebPageTest
Getting the most out of WebPageTestPatrick Meenan
 
Scaling my sql_in_3d
Scaling my sql_in_3dScaling my sql_in_3d
Scaling my sql_in_3dsarahnovotny
 

What's hot (20)

Altitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the EdgeAltitude San Francisco 2018: Programming the Edge
Altitude San Francisco 2018: Programming the Edge
 
4.2. Web analyst fiddler
4.2. Web analyst fiddler4.2. Web analyst fiddler
4.2. Web analyst fiddler
 
HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016HTTP/2 Update - FOSDEM 2016
HTTP/2 Update - FOSDEM 2016
 
I got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't oneI got 99 problems, but ReST ain't one
I got 99 problems, but ReST ain't one
 
Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!Service workers: what and why UmbUKFest 2018!
Service workers: what and why UmbUKFest 2018!
 
Efficient HTTP Apis
Efficient HTTP ApisEfficient HTTP Apis
Efficient HTTP Apis
 
Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015Background Processing - PyCon MY 2015
Background Processing - PyCon MY 2015
 
Debugging with Fiddler
Debugging with FiddlerDebugging with Fiddler
Debugging with Fiddler
 
Websockets at tossug
Websockets at tossugWebsockets at tossug
Websockets at tossug
 
HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?HTTP colon slash slash: the end of the road?
HTTP colon slash slash: the end of the road?
 
HTTP/2 for Developers
HTTP/2 for DevelopersHTTP/2 for Developers
HTTP/2 for Developers
 
Introduction to HTTP/2
Introduction to HTTP/2Introduction to HTTP/2
Introduction to HTTP/2
 
HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know? HTTP 2.0 – What do I need to know?
HTTP 2.0 – What do I need to know?
 
Web sockets in Java
Web sockets in JavaWeb sockets in Java
Web sockets in Java
 
JS digest. Decemebr 2017
JS digest. Decemebr 2017JS digest. Decemebr 2017
JS digest. Decemebr 2017
 
HTTP 2.0 Why, How and When
HTTP 2.0 Why, How and WhenHTTP 2.0 Why, How and When
HTTP 2.0 Why, How and When
 
HTTP/2 Introduction
HTTP/2 IntroductionHTTP/2 Introduction
HTTP/2 Introduction
 
Search in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize itSearch in WordPress - how it works and howto customize it
Search in WordPress - how it works and howto customize it
 
Getting the most out of WebPageTest
Getting the most out of WebPageTestGetting the most out of WebPageTest
Getting the most out of WebPageTest
 
Scaling my sql_in_3d
Scaling my sql_in_3dScaling my sql_in_3d
Scaling my sql_in_3d
 

Similar to Building a scalable online backup system in python

Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowAaron Peters
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?Andy Davies
 
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
The Case for HTTP/2  - Internetdagarna 2015 - StockholmThe Case for HTTP/2  - Internetdagarna 2015 - Stockholm
The Case for HTTP/2 - Internetdagarna 2015 - StockholmAndy Davies
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzAlessandro Nadalin
 
Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Binary Studio
 
The case for HTTP/2
The case for HTTP/2The case for HTTP/2
The case for HTTP/2GreeceJS
 
The Case for HTTP/2 - GreeceJS - June 2016
The Case for HTTP/2 -  GreeceJS - June 2016The Case for HTTP/2 -  GreeceJS - June 2016
The Case for HTTP/2 - GreeceJS - June 2016Andy Davies
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2jeperkins4
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Nati Shalom
 
Optimising Web Application Frontend
Optimising Web Application FrontendOptimising Web Application Frontend
Optimising Web Application Frontendtkramar
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedAndy Kucharski
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upJonathan Lee
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt DowleSri Ambati
 
Meetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceMeetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceJean Tunis
 
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Holger Bartel
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedPromet Source
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
My other computer is a datacentre
My other computer is a datacentreMy other computer is a datacentre
My other computer is a datacentreSteve Loughran
 

Similar to Building a scalable online backup system in python (20)

Getting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and HowGetting a Grip on CDN Performance - Why and How
Getting a Grip on CDN Performance - Why and How
 
Http/2 - What's it all about?
Http/2  - What's it all about?Http/2  - What's it all about?
Http/2 - What's it all about?
 
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
The Case for HTTP/2  - Internetdagarna 2015 - StockholmThe Case for HTTP/2  - Internetdagarna 2015 - Stockholm
The Case for HTTP/2 - Internetdagarna 2015 - Stockholm
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
REST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in MainzREST in peace @ IPC 2012 in Mainz
REST in peace @ IPC 2012 in Mainz
 
Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"Web performance Part 1 "Networking theoretical basis"
Web performance Part 1 "Networking theoretical basis"
 
The case for HTTP/2
The case for HTTP/2The case for HTTP/2
The case for HTTP/2
 
The Case for HTTP/2 - GreeceJS - June 2016
The Case for HTTP/2 -  GreeceJS - June 2016The Case for HTTP/2 -  GreeceJS - June 2016
The Case for HTTP/2 - GreeceJS - June 2016
 
EQR Reporting: Rails + Amazon EC2
EQR Reporting:  Rails + Amazon EC2EQR Reporting:  Rails + Amazon EC2
EQR Reporting: Rails + Amazon EC2
 
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
Designing a Scalable Twitter - Patterns for Designing Scalable Real-Time Web ...
 
Optimising Web Application Frontend
Optimising Web Application FrontendOptimising Web Application Frontend
Optimising Web Application Frontend
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speed
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-up
 
data.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowledata.table and H2O at LondonR with Matt Dowle
data.table and H2O at LondonR with Matt Dowle
 
Meetup Tech Talk on Web Performance
Meetup Tech Talk on Web PerformanceMeetup Tech Talk on Web Performance
Meetup Tech Talk on Web Performance
 
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
Web Performance in the Age of HTTP/2 - FEDay Conference, Guangzhou, China 19/...
 
Make Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speedMake Drupal Run Fast - increase page load speed
Make Drupal Run Fast - increase page load speed
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
My other computer is a datacentre
My other computer is a datacentreMy other computer is a datacentre
My other computer is a datacentre
 

More from Joe Drumgoole

MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema DesignJoe Drumgoole
 
The Rise of Microservices
The Rise of MicroservicesThe Rise of Microservices
The Rise of MicroservicesJoe Drumgoole
 
Back to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationBack to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationJoe Drumgoole
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLJoe Drumgoole
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopJoe Drumgoole
 
Server discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBServer discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBJoe Drumgoole
 
Event sourcing the best ubiquitous pattern you have never heard off
Event sourcing   the best ubiquitous pattern you have never heard offEvent sourcing   the best ubiquitous pattern you have never heard off
Event sourcing the best ubiquitous pattern you have never heard offJoe Drumgoole
 
EuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverEuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverJoe Drumgoole
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationJoe Drumgoole
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLJoe Drumgoole
 
Introduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingIntroduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingJoe Drumgoole
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsJoe Drumgoole
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB ApplicationJoe Drumgoole
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLJoe Drumgoole
 
Cloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionCloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionJoe Drumgoole
 
Enterprise mobility for fun and profit
Enterprise mobility for fun and profitEnterprise mobility for fun and profit
Enterprise mobility for fun and profitJoe Drumgoole
 
How to run a company for 2k a year
How to run a company for 2k a yearHow to run a company for 2k a year
How to run a company for 2k a yearJoe Drumgoole
 
Mobile monday mhealth
Mobile monday mhealthMobile monday mhealth
Mobile monday mhealthJoe Drumgoole
 
Harness the web and grow your business
Harness the web and grow your businessHarness the web and grow your business
Harness the web and grow your businessJoe Drumgoole
 

More from Joe Drumgoole (20)

MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
The Rise of Microservices
The Rise of MicroservicesThe Rise of Microservices
The Rise of Microservices
 
Back to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB ApplicationBack to Basics 2017 - Your First MongoDB Application
Back to Basics 2017 - Your First MongoDB Application
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
 
Server discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDBServer discovery and monitoring with MongoDB
Server discovery and monitoring with MongoDB
 
Event sourcing the best ubiquitous pattern you have never heard off
Event sourcing   the best ubiquitous pattern you have never heard offEvent sourcing   the best ubiquitous pattern you have never heard off
Event sourcing the best ubiquitous pattern you have never heard off
 
EuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo DriverEuroPython 2016 : A Deep Dive into the Pymongo Driver
EuroPython 2016 : A Deep Dive into the Pymongo Driver
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Introduction to CQRS and Event Sourcing
Introduction to CQRS and Event SourcingIntroduction to CQRS and Event Sourcing
Introduction to CQRS and Event Sourcing
 
Back to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in DocumentsBack to Basics Webinar 3 - Thinking in Documents
Back to Basics Webinar 3 - Thinking in Documents
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB Application
 
Back to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQLBack to Basics Webinar 1 - Introduction to NoSQL
Back to Basics Webinar 1 - Introduction to NoSQL
 
Cloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolutionCloud Computing - Halfway through the revolution
Cloud Computing - Halfway through the revolution
 
Enterprise mobility for fun and profit
Enterprise mobility for fun and profitEnterprise mobility for fun and profit
Enterprise mobility for fun and profit
 
How to run a company for 2k a year
How to run a company for 2k a yearHow to run a company for 2k a year
How to run a company for 2k a year
 
Mobile monday mhealth
Mobile monday mhealthMobile monday mhealth
Mobile monday mhealth
 
Cloudsplit original
Cloudsplit originalCloudsplit original
Cloudsplit original
 
Harness the web and grow your business
Harness the web and grow your businessHarness the web and grow your business
Harness the web and grow your business
 

Recently uploaded

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Building a scalable online backup system in python

  • 1. Building a Scalable Online Backup System in Python Joe Drumgoole http://twitter/jdrumgoole
  • 2. Scaling You probably shouldn’t care Throughput vs response time Scaling is a fractal problem The database is what will get ya! Amazing what a well tuned DB will support http://twitter.com/jdrumgoole 2
  • 4. Online Backup Not really a Web 2.0 play More like client server Larger vision of PutPlace Map of your Digital World http://twitter.com/jdrumgoole 4
  • 5. Online Backup : Client Installation and support of Windows 20** Mac Support Open file/locked file handling Bandwidth throttling CPU Throttling Upload restarts Feedback http://twitter.com/jdrumgoole 5
  • 6. Online Backup : Server Don’t loose any files De-duplication Thumbnail generation for images Flickr Backup Client Feedback Bulk download File relationships http://twitter.com/jdrumgoole 6
  • 7. Online Backup - Secrets People Don’t backup Compute dominates Restores represent 0.01% of bandwidth and load Writing web clones of Windows Explorer is hard The browser sucks as a client side app container (for now) http://twitter.com/jdrumgoole 7
  • 8. Scaling For online backup the challenge is to receive shed loads of data from lots of clients Clients upload in 1MB chunks Chunks must be stored coalesced and push to stable backup (S3) Clients must get acknowledgement Web page must update Quota management http://twitter.com/jdrumgoole 8
  • 9. Load Balancer Load Balancer : Perlbal http://www.danga.com/perlbal/ Can handle 100 x millions of requests per day Event based (sshhh : Don’t tell anyone, but its Perl!) It does fall over occasionally Otherwise works perfectly http://twitter.com/jdrumgoole 9
  • 10. App Server Our app servers: Handle login Deliver web pages Handle uploads from clients Hand off heavy duty processing to task servers Thumbnail generation File coalescing Checksum generation Hand off is via a database queue http://twitter.com/jdrumgoole 10
  • 11. App Server Just Django Instances Templates deliver web pages Views handle chunks/login etc. Models update the database Task Servers do the heavy lifting http://twitter.com/jdrumgoole 11
  • 12. Task Server Run off a database queue (table) Four main task servers: Assemble completed file uploads Create thumbnails Remove deleted files Generate user statistics Servers are multi-threaded http://twitter.com/jdrumgoole 12
  • 13. Refactoring Originally N blacknight servers writing to NFS Then N blacknight servers writing to S3 Then N EC2 servers writing to S3 The N EC2 servers writing to MogileFS/S3 Lots of uploading optimisations along the way http://twitter.com/jdrumgoole 13
  • 14. Results System has successfully uploaded over 100k files in a single day Regularily does 50k files a day Have about 2k registered users Continues to get registrations Runs in lights out mode (no daily/weekly/monthly housekeeping) http://twitter.com/jdrumgoole 14
  • 15. What worked Python proved extremely flexible Standard library saved us lots of work Django provided a lot of glue Easy to migrate from dedicated host on NFS to Cloud Hosting and S3 storage Nagios/Monitis monitoring http://twitter.com/jdrumgoole 15
  • 16. What Didn’t Work Would use MySQL rather than Postgres Easier to cluster, more knowledge available Native Windows Client Unecessary, Python client was good enough Would use an off the shelf queueing system RabbitMQ, ActiveMQ, SQS Kludgey client side API Threading The Client http://twitter.com/jdrumgoole 16
  • 17. Tool Chain Wush.net : Subversion and Trac DynDNS: Dynamic DNS Python/Django: Dev Stack Postgres: Database Hudson : Build Server Perlbal: Load Balancing MogileFS : Distributed File System Memcached : Caching Nagios, Monitis: Monitoring Hamachi : VPN through Firewall Google Apps : Email, Calendar, Docs, Wiki AuthSMTP : Validated SMTP Zendesk: Support Desk Amazon : Storage, Compute, Bandwidth Paypal : Billing http://twitter.com/jdrumgoole 17
  • 18. Costs Capital Expenditure One server 5k euro One laptop per developer 2.5k (7 devs) One Linksys WIFI/Firewall (won at Raffle) Two 24 port switches 1.6k Total: ~24k Running Costs for Grid and Storage ~1800 euro a month (8 instances) http://twitter.com/jdrumgoole 18
  • 19. If I Were Doing it Again Stick with native python client Look at eventing ala Node.js for server Use MySQL Use Google App Engine as Front End/Load Balancer Use a commercial queueing package http://twitter.com/jdrumgoole 19