Scalability
                                Set Amazon’s Servers on Fire, Not Yours




                                        Parks Hall Fire, July 3, 2002 - http://www.acadweb.wwu.edu/dbrunner/

                                                                                                                                                                      1
Managing hardware in our datacenter is a pain. We do it because it’s a necessary evil. Amazon’s starting to abstract some of that away via APIs - and I’m thrilled.
Why trust us?
              Bootstrapped.
              Profitable.
              No debt.
              140M photos.
              192TB at S3.
              Doubling yearly.




                                                                                                                    2
SmugMug’s a bunch of green-haired freaks, right? Yes, but we also know a lot about storage, and doing it cheaply.
Why trust us?
               Bootstrapped.
               Profitable.
               No debt.
               140M photos.
               192TB at S3.
               Doubling yearly.
               Super Heroes.


                                           3
Oh, yes, and we’re also Super Heroes. :)
Biz Stuff




                                                               SmugMug’s Founders

                                                                                    4
The presentation is broken into two parts - Biz & Geek. This is the Biz section.
Our Love Affair with S3




                                                                                                 5
It’s no secret that we love S3. But, like all good love affairs, it has its ups and downs. :)
Our Love Affair with S3

                  Always on, global, infinite storage.

                   Inexpensive. $0.15/GB/month w/replicas.

                   Easy. REST API. (SOAP too, but...)

                   Fast. Not 15K-SCSI fast, but Internet fast.

                   Game changer.




                                                                                                                                                                                            6
S3, or Simple Storage Service, solves a huge chunk of our storage problems. The CEO in me thinks I should keep my mouth shut, since it really levels the playing field, but the geek in me
just thinks it’s too cool. :)
Amazon? Infrastructure?




                                            Photo by Bob Knight - http://bobknight.smugmug.com/
                                                                                                  7
Amazon’s just a book store, right? Wrong.
Amazon? Infrastructure?

                   Started with books.

                   Soon added CDs & DVDs.

                  Toys R Us, Borders, Target.

                   zShops, Marketplace, E-Commerce API

                   People building their businesses on Amazon is cool.

                  What else do we have lurking in the corners?


                                                                                                                                                                                        8
I’m not totally sure how Amazon came up with AWS, but I’ll bet it went something like this. It sure makes sense that they began to like having businesses building on top of them and
their expertise. And I don’t buy the argument that this is silly because Amazon’s a bookseller. What a dumb argument. In reality, Amazon’s finding ways to monetize other things they
do well. More businesses should do this.
Why use them?

                  Not a lot of web-scale expertise on Planet Earth.

                  Reputation for systems.

                  Once competed with Amazon - fatbrain.

                 They eat their own dogfood. Dozens of products.

                  Focus on the app, not the muck.




                                                                                                                                                                                    9
You can count the # of companies who do this on one, possibly two, hands. My father (SmugMug’s co-founder) competed directly with Amazon with his last company, fatbrain, so we
know just how talented they are at their business and their infrastructure. Amazon does use S3 and the other services themselves (and yes, when S3 has had problems, Amazon’s had
problems. I watched.)
Show me the money!




                                                                             Photo by Kirk Tanner - http://kirktanner.smugmug.com/
                                                                                                                                     10
Money doesn’t grow on trees, everyone knows that. But in this case, it’s pretty dang close.
Show me the money!

                   Guesstimate: ~$500K saved per year.
                   Actual:
                           Growth: 64M → 140M photos.
                           Disks would cost: $40K - $100K/month.
                           $922K would have been spent.
                           $230K spent instead.
                    $692K in cold, hard savings.
                   Nasty taxes! $295K ‘saved’ in cash flow. Bonus!
                   Reselling disks - recouping sunk cost.
                                                                                                                                                                                              11
Early on with S3, I estimated we’d save $500K. Here are the latest hard numbers. We’ve been using S3 since April of 2006, so it’s nearly a year. Total saved? $692K. Plus we don’t have to
pre-pay some stupid taxes on the order of $295K. (Technically, not savings, because the gov’t would give it back to us over 5 years - but still, I’d like to keep that $295K, thanks). Plus
we’re actually thinking of re-selling some of the disks we had bought in the past, recouping some of our sunk costs.
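The savings figures on the slide come down to simple subtraction. A minimal sketch of the arithmetic, using only the numbers quoted above (in thousands of dollars); the variable names are illustrative.

```python
# Figures from the slide, in thousands of US dollars.
disk_spend_avoided_k = 922  # what local disks would have cost over the period
s3_spend_k = 230            # what was actually paid to Amazon for S3
deferred_tax_k = 295        # up-front tax not paid; it would be returned over ~5 years anyway

# Hard savings: the disk purchases avoided, minus the S3 bill.
hard_savings_k = disk_spend_avoided_k - s3_spend_k

# The tax is a cash-flow (timing) benefit, not a true saving,
# so it is reported separately instead of being added to the hard savings.
print(f"Hard savings: ${hard_savings_k}K")       # Hard savings: $692K
print(f"Cash-flow benefit: ${deferred_tax_k}K")  # Cash-flow benefit: $295K
```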
$ sweet spots

                   Perfect for startups & small companies.

                   Ideal for ‘store lots, serve little’ businesses of all sizes.

               Not so great (yet?) for serving lots if you’re a medium
              or large sized business. Transfer costs high if you can
              buy bandwidth in 1 Gbps+ chunks.

                  We’re a ‘store lots, serve lots’ company. What to do?




                                                                                                                                                                                           12
S3 is great if you’re a small company that can’t or won’t buy lots of bandwidth. It’s also great if you’re gonna just store a lot, but not read or write it often. Why? Because Amazon’s
storage rate ($0.15/GB/month) is fantastic, but the transfer rate ($0.20/GB) is merely competitive, rather than being fantastic. If you can buy bandwidth in 1 Gbps chunks, you can
probably save a few pennies doing it yourself.
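The storage-versus-transfer trade-off is easy to see with a two-term cost model. A minimal sketch, assuming the two rates quoted above and ignoring request fees and any volume discounts; the two workloads are hypothetical.

```python
# Simplified S3 cost model using the 2007-era rates quoted in the text.
# Assumption: request fees, tiered pricing and in/out transfer differences
# are ignored; amounts are kept in whole cents to avoid float rounding error.
STORAGE_CENTS_PER_GB_MONTH = 15  # $0.15/GB/month
TRANSFER_CENTS_PER_GB = 20       # $0.20/GB transferred


def monthly_cost_cents(stored_gb: int, served_gb: int) -> tuple[int, int]:
    """Return (storage_cents, transfer_cents) for one month."""
    return stored_gb * STORAGE_CENTS_PER_GB_MONTH, served_gb * TRANSFER_CENTS_PER_GB


def describe(label: str, stored_gb: int, served_gb: int) -> None:
    storage, transfer = monthly_cost_cents(stored_gb, served_gb)
    share = transfer / (storage + transfer)
    print(f"{label}: storage ${storage / 100:,.0f}, "
          f"transfer ${transfer / 100:,.0f} ({share:.0%} of the bill)")


# Hypothetical workloads: same amount stored, very different read volumes.
describe("store lots, serve little", stored_gb=10_000, served_gb=1_000)
describe("store lots, serve lots", stored_gb=10_000, served_gb=50_000)
```

With these assumed volumes, transfer is about 12% of the first bill and about 87% of the second, which is exactly where buying your own bandwidth in bulk starts to pay off.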
Geek Stuff

    5 of my employees.                                                                                     Me with my NeXT gear on.




                                                                                                                                                             13
I’ve been a geek for a long time. Here’s the photographic proof. I was probably 10 here. Oh, yes, and now we’re onto the geek half of the presentation. :)
Like SmugFS

                   Architecture remarkably similar to SmugFS.

                    Similar to lots of startups.

                    Stupid we’re all building the same thing.

                    Easy to drop-in.

                    Started on Monday, live in production on Friday.



                                                                                                                                                                                            14
We had our own redundant, replicated, reliable internal storage system, SmugFS. Lots of recent startups probably have similar architectures, and they’ve all likely just built it themselves.
It’s stupid we’re all building the same thing over and over. Amazon S3 saves everyone that step. It was super-easy to drop into our code because it was so similar to SmugFS already.
When I started writing the code on a Monday, we were live and in production the Friday of that week.
Our S3 evolution

                   Started just doing secondary storage. Too cold!

                  Tried out as Primary. Too hot!

                   Finally, hot & cold model = Just right!

                  Amazon gets 100% of the data.

                   SmugMug keeps “hot” data local.

                   95% reduction in # of disks bought.


                                                                                                                                                                                      15
We’ve played around with a few different models with S3. At first, they were just backup. They worked so well, we wanted to do more (and save more money), so we tried them as stand-alone
storage. That didn’t work quite as well when they had one of their hiccups, so we next tried a hot/cold model, which works really, really well. Amazon is our primary storage, and
we use SmugFS as our local hot cache. We end up storing 100% of the data at Amazon, and 10% locally. In the end, we need 95% fewer disks in our datacenter than we did before.
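The hot/cold split can be sketched as a write-through store with a bounded local cache. This is a minimal illustration, assuming in-memory dicts in place of S3 and SmugFS and a least-recently-used eviction policy, which the talk does not specify.

```python
from collections import OrderedDict


class HotColdStore:
    """Every object goes to the remote (cold) store; a bounded local (hot)
    cache keeps recently used objects.

    Assumptions: dicts stand in for S3 and SmugFS, and the local tier
    evicts least-recently-used objects (eviction policy is hypothetical).
    """

    def __init__(self, local_capacity: int) -> None:
        self.remote: dict[str, bytes] = {}                   # stand-in for S3: 100% of data
        self.local: OrderedDict[str, bytes] = OrderedDict()  # stand-in for the hot cache
        self.local_capacity = local_capacity

    def _cache(self, key: str, data: bytes) -> None:
        self.local[key] = data
        self.local.move_to_end(key)            # mark as most recently used
        while len(self.local) > self.local_capacity:
            self.local.popitem(last=False)     # evict the least recently used object

    def put(self, key: str, data: bytes) -> None:
        self.remote[key] = data                # the remote store always gets a copy
        self._cache(key, data)                 # new uploads start out hot

    def get(self, key: str) -> bytes:
        if key in self.local:
            self.local.move_to_end(key)
            return self.local[key]
        data = self.remote[key]                # cold read; KeyError if it is missing
        self._cache(key, data)                 # promote back into the hot tier
        return data
```

Because writes always land remotely, losing the local tier only costs performance, never data.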
Sample Request

                 Client ‘Smuggy’ → www.smugmug.com
                  “Hey, gimme photo 31337”
                 www.smugmug.com → SmugFS
                  “Hey, you got photo 31337?”
                  If YES, send to Smuggy.
                  If NO:
                 Log that it wasn’t in SmugFS for analysis.
                 www.smugmug.com → Amazon S3
                  “Hey, you got photo 31337?”
                  If YES, send to Smuggy.
                  If NO:
                 PANIC! :)
                                                                                                                     16
Here’s a sample request for a SmugMug photo. We rarely, if ever, get to PANIC stage, but I’m sure it could happen.
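The request flow above maps onto a short fallback function. A minimal sketch, assuming dict-like stand-ins for the SmugFS and S3 clients; `fetch_photo` and `PhotoNotFound` are hypothetical names.

```python
import logging

log = logging.getLogger("photo-read")


class PhotoNotFound(Exception):
    """Raised when neither storage tier has the photo (the PANIC case)."""


def fetch_photo(photo_id: int, smugfs: dict, s3: dict) -> bytes:
    """Try the local tier, log a miss, then try S3.

    Assumption: `smugfs` and `s3` are dict-like stand-ins for real storage clients.
    """
    data = smugfs.get(photo_id)
    if data is not None:
        return data                               # hot hit: serve from SmugFS
    log.info("photo %s not in SmugFS", photo_id)  # record the miss for analysis
    data = s3.get(photo_id)
    if data is not None:
        return data                               # cold hit: serve from S3
    raise PhotoNotFound(photo_id)                 # PANIC! :)
```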
Proxy vs Redirect vs Direct Links

                   Built SmugMug-S3 with multiple modes.

                   Can flip a switch to change.

                   Nearly 100% served are proxy reads.

                   Sometimes HTTP redirects.

                   Rarely direct S3 links.




                                                                                                                                                                                       17
We have three modes in SmugMug’s codebase, and can switch between them at will on-the-fly. We can proxy read from S3 and then serve it to the customer, we can send an HTTP
redirect to the S3 object, or we can embed real S3 URLs (CNAME’d to smugmug.com) in our HTML. Almost 100% of our stuff is served via proxy read so we can try hitting our cache first
(saving on transfer costs to Amazon), make sure we have the permissions right, etc.
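A switchable serving layer along these lines could support all three modes. This is a hypothetical sketch: the `ServeMode` names, the `/photos/` path, and the `read_bytes` callable (which would try the local cache before S3) are illustrative, and the permission check that motivates proxying is left out.

```python
from enum import Enum
from typing import Callable


class ServeMode(Enum):
    PROXY = "proxy"        # app fetches the bytes (cache first, then S3) and relays them
    REDIRECT = "redirect"  # app answers with an HTTP redirect to the S3 object
    DIRECT = "direct"      # page embeds the S3 URL, so the app is bypassed entirely


def img_src(key: str, mode: ServeMode, app_base: str, s3_base: str) -> str:
    """URL to put in the page's <img> tag for this photo."""
    if mode is ServeMode.DIRECT:
        return f"{s3_base}/{key}"
    return f"{app_base}/photos/{key}"  # PROXY and REDIRECT both go through the app


def handle_photo_request(key: str, mode: ServeMode, s3_base: str,
                         read_bytes: Callable[[str], bytes]) -> tuple[int, dict, bytes]:
    """(status, headers, body) for a photo request that reached the app."""
    if mode is ServeMode.REDIRECT:
        return 302, {"Location": f"{s3_base}/{key}"}, b""
    return 200, {"Content-Type": "image/jpeg"}, read_bytes(key)
```

Keeping the mode as a parameter (or a global setting) is what makes the "flip a switch" behaviour possible without touching the rest of the code.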
Permissions

                 We have complicated permissions.

                  Passwords, privacy, external links, oh my!

                  Proxying allows strong protection.




                                                                                                                                                                                  18
We have a rich permissions model at SmugMug, and need to make sure all the permissions are intact when someone tries to view a photo. Proxying allows the strongest protection,
though HTTP redirects are also quite strong with time-expiring S3 URLs.
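Time-expiring S3 URLs are ordinary URLs with an expiry timestamp and a signature in the query string. The sketch below follows S3's legacy Signature Version 2 query-string scheme, which was current when this talk was given and has since been deprecated in favor of SigV4 (today an AWS SDK would presign the URL for you). The bucket, key and credentials are placeholders, and only a plain GET with simple keys is handled.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote


def presigned_get_url(bucket: str, key: str, access_key: str, secret_key: str,
                      ttl_seconds: int = 300, now: Optional[int] = None) -> str:
    """Build a legacy (SigV2-style) expiring GET URL for one S3 object."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    path = quote(key, safe="/")
    # GET, empty Content-MD5, empty Content-Type, Expires, canonical resource.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{path}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return (f"https://{bucket}.s3.amazonaws.com/{path}"
            f"?AWSAccessKeyId={quote(access_key, safe='')}"
            f"&Expires={expires}&Signature={quote(signature, safe='')}")
```

A redirect to such a URL stops working after `ttl_seconds`, so a leaked link has a limited lifetime even though the bytes bypass the app.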
REST vs SOAP

                   Love REST, hate SOAP.

                   Lightweight.

                   Nothing useful added with SOAP’s complexity.




                                                                                                                                                                                              19
REST is so simple, easy to develop for, human readable. I love it. I’m not a fan of SOAP, and in this case, SOAP adds nothing but complexity. Use it if SOAP is your thing, otherwise start
with REST.
Reliability

                   Not 100%. Close, though.

                   More reliable than SmugFS which is quite reliable.

                   Lots of failure points:
                     SmugMug’s datacenter
                     Internet backbones
                     Amazon’s datacenter

               No other software, hardware, or service we use is
              100%, either.


                                                                                                                                                                                  20
Everything fails, and Amazon’s no exception. There are lots of pieces that could fail outside of Amazon’s control, too. In our experience, they’ve been quite reliable overall.
Handling failure

                   Build from day one with failure in mind.

                   Stuff breaks - try again.

                  Writes fail? Write locally, sync later.

                   Reads fail? Handle intelligently. Alerts?




                                                                                                                                                                                             21
Failure happens. Even if you’re not using Amazon, your gear will fail. Write your app to handle failure. In Amazon’s case, the easiest thing to do on a failed read or write is simply try
again a few times. If the write continues to fail, write it somewhere locally then asynchronously replicate it back up later. With reads, have a proactive failure plan in place.
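The retry-then-spool pattern for writes can be sketched in a few lines. This is an illustration, not SmugMug's code: `upload` is a hypothetical callable that raises on failure, a dict stands in for local disk, and the retry count and exponential backoff are assumed values.

```python
import time
from typing import Callable


def put_with_fallback(key: str, data: bytes,
                      upload: Callable[[str, bytes], None],
                      spool: dict,
                      attempts: int = 3,
                      base_delay: float = 0.1) -> bool:
    """Try the remote write a few times; on persistent failure, keep it locally.

    Returns True if the upload succeeded, False if the object was spooled.
    """
    for attempt in range(attempts):
        try:
            upload(key, data)
            return True
        except Exception:  # any failure counts; a real client would be more selective
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    spool[key] = data  # write locally...
    return False


def drain_spool(spool: dict, upload: Callable[[str, bytes], None]) -> int:
    """...and sync later: retry everything spooled. Returns the number synced."""
    synced = 0
    for key in list(spool):
        try:
            upload(key, spool[key])
        except Exception:
            continue  # still failing; leave it for the next pass
        del spool[key]
        synced += 1
    return synced
```

Running `drain_spool` from a periodic background job gives the asynchronous "replicate it back up later" step.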
Performance

                   Fast for reads and writes. (Tens of Mbps)

                   Mostly speed-of-light limited. (20-80ms)

                   Parallel i/o for massive throughput. (Hundreds of Mbps)

                   Machine measurable, human indistinguishable.




                                                                                                                                                                                         22
S3 has been really fast for us. On single reads/writes, we get tens of megabits per second. It would likely be even faster except that our datacenters aren’t close to Amazon’s, so we have
to deal with internet latency. We do use lots of simultaneous reads and writes to get hundreds of megabits per second at any given time of the day. We did some blind taste tests with
customers in the US, on both coasts, who couldn’t tell the difference if they were viewing photos from SmugMug or directly from S3 - so the speed was measurable on a machine, but
humans couldn’t tell. It’s quite fast.
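Because each S3 request spends most of its time waiting on the 20-80 ms round trip, running many requests at once raises aggregate throughput without making any single request faster. A minimal sketch with a thread pool, assuming a simulated 50 ms GET in place of a real S3 client; threads are adequate here because the work is I/O-bound.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_s3_get(key: str, latency: float = 0.05) -> bytes:
    """Hypothetical stand-in for a network GET: mostly waiting on latency."""
    time.sleep(latency)
    return key.encode()


def fetch_all(keys: list[str], workers: int = 16) -> list[bytes]:
    """Issue many GETs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_s3_get, keys))


keys = [f"photo{i}" for i in range(16)]
start = time.perf_counter()
results = fetch_all(keys)
elapsed = time.perf_counter() - start
# Serially this would take about 16 x 50 ms = 0.8 s; in parallel it is close to one round trip.
print(f"fetched {len(results)} objects in {elapsed:.2f}s")
```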
CDN?

                   S3 isn’t a Content Delivery Network.

                   It’s storage.

                   No global locations (yet?).

                   Limited edge caching.

                   Future Amazon Web Service?




                                                                                                                                                                                           23
I get asked a lot if we use S3 as a CDN. We don’t, because it’s not a CDN. That’s not to say that Amazon’s not good for serving - it is. But if you really want edge caching with lots of
endpoints all over the world, well, that’s not what S3 was designed for. They don’t have global locations, they do limited edge caching, etc. It’s for storage and serving that storage. Treat
it like a single web cluster rather than a CDN. I would imagine this may be a future Web Service that Amazon would offer.
Store-and-forward vs Stream

                  Two ways to serve your content.

                   Store-and-forward
                     Great resiliency.
                     Poor performance (TTFB).

                   Stream
                     Poor resiliency.
                     Great performance (TTFB).
                     Do a quick HEAD first to verify.


                                                                                                                                                                                    24
When proxy reading, you can read the entire file, then re-serve to the customer, or you can stream the bytes through to the customer as they arrive from S3. Each has pros and cons.
With store-and-forward, you can re-read the bytes again if the first request fails. But you have a slower time-to-first-byte response. With streaming, you have no idea if all the bytes
safely made it to the customer, but you get a great time-to-first-byte response. We tend to issue a fast HEAD request first to SmugFS and/or S3 before doing the streamed GET so we can
verify the file is there, intact, and the right size & hash.
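The two proxy-read strategies, plus the HEAD check, might look like this. A sketch under stated assumptions: `get`, `head` and `get_chunks` are hypothetical callables wrapping real S3 or SmugFS requests, and the HEAD response is assumed to expose a size and an MD5 (for simple, non-multipart uploads S3 reports the MD5 as the ETag).

```python
from typing import Callable, Iterator


def store_and_forward(get: Callable[[], bytes], attempts: int = 2) -> bytes:
    """Buffer the whole object first. A failed read can be retried before the
    client has seen a single byte, at the cost of a slower time-to-first-byte."""
    for attempt in range(attempts):
        try:
            return get()  # whole object held in memory
        except OSError:
            if attempt == attempts - 1:
                raise     # out of retries; client has received nothing yet
    raise ValueError("attempts must be at least 1")


def stream_verified(head: Callable[[], dict],
                    get_chunks: Callable[[], Iterator[bytes]],
                    expected_size: int, expected_md5: str) -> Iterator[bytes]:
    """Cheap HEAD first; only start streaming if size and hash match.
    Chunks then go to the client as they arrive (fast TTFB, no retry)."""
    meta = head()
    if meta.get("size") != expected_size or meta.get("md5") != expected_md5:
        raise ValueError("object missing or does not match the stored size/hash")
    return get_chunks()
```

The check runs before any bytes are sent, so a missing or truncated object can still fail over to another source instead of half-delivering a photo.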
The Speed of Light Problem

                Amazon hasn’t solved faster-than-light data
               transmission. Yet.

                    Unavoidable - make sure your app can deal.

                    Parallelized i/o can mask problem.

                    Caching can help.

                    Streaming can help.


                                                                                                                                                                                             25
Latency associated with the speed of light can’t be avoided. Write your app with it in mind. Try to parallelize reads/writes, try to cache, and try to stream reads to clients if you can.
Outages & Problems

                   Not perfect. 5 major issues.

               3 outages (15-30 mins). 2 core switch failures and
              one DNS problem. Amazon.com affected.

               2 performance degradations. Our customers noticed
              one; they didn’t notice the other.

                   Not a big deal - everything fails. Expect it.




                                                                                                                                                                                       26
Amazon’s had 5 major issues in the last year. Not a bad track record for a new service. We expect them to fail, as we expect everything in our own datacenter to fail, so we handled most of
these fairly well.
SLA, Service, & Support

                  We don’t care about SLA, but you may.

                  Service & Support: One area where Amazon is weak.
                    This is a utility.
                    They need a service status dashboard.
                    Pro-active customer notifications.
                    Ability to get ahold of a human.

               Amazon.com’s customer service is good, AWS will
             likely catch up.



                                                                                                                                                                                     27
They don’t have an SLA yet. We don’t care, but medium and large businesses probably do. Until then, you may be out of luck. They do need to do a better job at handling the service-as-
a-utility situation. With our bandwidth and datacenter providers, we get status updates and pre-announcements of software updates, possible service outages, etc. Amazon needs to do
a better job notifying their customers about these sorts of things. On the bright side, Amazon.com’s customer service is quite good, so AWS will likely catch up.
Saving our butts

                   Knocked power out of ~70TB of storage. Oops!

               Moved datacenters during normal business hours,
             customers not affected.

                   Stupid bugs.




                                                                                                                                                                                     28
S3 has saved our butts a few times. My brother accidentally knocked out power to 70TB of storage once - no customers noticed, since it failed over to S3 automatically. We also managed
to move everything from one datacenter to another during normal business hours without any service interruptions. And finally, I’ve had some software bugs that we were able to repair
thanks to Amazon.
Misc Tips

                   Use cURL
                     Faster.
                     More reliable.
                     Storing vs Streaming is simple.

                   Make stuff as async as possible
                     Hides speed-of-light issue
                     Hides or masks problems
                     Fast customer response




                                                                                                                                                                                           29
If you can, use cURL to do your transfers. We tested a number of different built-in functions and libraries, and cURL is super-fast and reliable at setting up the HTTP connection. Also, in
your app, hide the S3 latency as much as you can by doing asynchronous background transfers. Don’t make your customers wait.
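Making transfers asynchronous means the customer-facing request only enqueues work. A minimal sketch with the standard library's `queue` and `threading`; `upload` is a hypothetical blocking call to remote storage, and a production system would pair this with the "write locally, sync later" handling for failed writes.

```python
import logging
import queue
import threading
from typing import Callable


class BackgroundUploader:
    """Accept uploads immediately and push them to remote storage on a worker thread."""

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        self._upload = upload
        self._jobs: "queue.Queue[tuple[str, bytes]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, key: str, data: bytes) -> None:
        self._jobs.put((key, data))  # returns right away; the customer doesn't wait

    def _run(self) -> None:
        while True:
            key, data = self._jobs.get()
            try:
                self._upload(key, data)  # slow, latency-bound work happens here
            except Exception:
                logging.exception("upload of %s failed", key)  # real code would spool and retry
            finally:
                self._jobs.task_done()

    def wait(self) -> None:
        self._jobs.join()  # block until every queued job has been processed
```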
Flirting with the other services.




                                                                                   30
The other Amazon services are exciting, too, so we’re playing with them as well.
Elastic Compute Cloud (EC2)

                 Like S3, only for compute.
                  Scale up or down via API.
                  Web servers, processing boxes, development & test
                 beds, build servers, etc. You name it.

                   Launching large EC2 implementation “soon”
                    Image processing.
                    500K-1M photos/day.
                    10-20 Terapixels/day processed
                    Peaky traffic on weekends, holidays
                    Ridiculously parallel


                                                                                                                                                                                      31
I planned to have our EC2 cluster up and running in production for this presentation, but one of our hardware vendors (Sun) gave us some hardware that’s underperforming, so we’re in a
holding pattern. Ironic that physical hardware limitations are preventing me from using virtualized hardware, but that’s the case. (We need to make some DB schema changes, and Sun’s
storage arrays aren’t keeping up). When launched, though, EC2 will handle lots of our image processing needs. Great because we can turn it up during busy times (Sunday nights,
holidays) and down during low points. I will be blogging about the Sun situation at some point, once I have a solution and all the facts, so check out my blog at
http://blogs.smugmug.com/onethumb for updates.
Simple Queue Service (SQS)

                   Simple, reliable queuing.
                   Mates well with EC2 & S3
                    Stick jobs in SQS
                    Retrieve jobs with EC2 instances using S3 data
                    Run jobs, report status to SQS.

                   $0.10/1000 items
                    Priced well for small projects.
                    Gets costly for huge ones (millions+).




                                                                                                                                                                                         32
We don’t currently use SQS because we already have our own queuing system and SQS doesn’t price well for people needing hundreds of thousands or millions of items per day, like we
do. But that may change if Amazon introduces bulk pricing or a sliding scale. There are a few places (like S3’s cost per GB to serve) where a sliding scale or bulk pricing might make
things more attractive for larger companies.
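The pricing point is plain arithmetic. A minimal sketch at the flat $0.10 per 1,000 items rate from the slide, assuming a 30-day month and ignoring data-transfer charges; the daily volumes are illustrative, with the larger two in the range of the daily photo volumes mentioned for EC2 image processing.

```python
from decimal import Decimal

PRICE_PER_1000_ITEMS = Decimal("0.10")  # SQS rate quoted on the slide (2007 pricing)


def sqs_cost(items: int) -> Decimal:
    """Dollar cost for `items` queue items at a flat per-1,000 rate."""
    return Decimal(items) / 1000 * PRICE_PER_1000_ITEMS


for per_day in (10_000, 500_000, 1_000_000):
    daily = sqs_cost(per_day)
    print(f"{per_day:>9,} items/day: ${daily:,.2f}/day, ~${daily * 30:,.0f}/month")
```

At a million items a day that is about $100/day, or roughly $3,000/month, which is why a sliding scale matters at this volume.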
Missing Pieces

                   Database API or DB grade EC2 instances.
                    Fast (lots of local spindles, lots of RAM)
                    Persistent.

                  Load balancer API.
                   Single IP in front of lots of EC2 instances.
                   Programmable to add/remove/change clusters.
                   Can be done with software on an EC2 instance, but
                  painful.

                   CDN


                                                                                                                                                                                          33
To truly get rid of our entire datacenter, Amazon’s still missing a few pieces. DB boxes require lots more spindles and RAM than EC2 currently provides. Even cooler, and more difficult,
would be some high-performance DB API that abstracted the machines. A load balancer API to provide programmatic addition and subtraction of EC2 instances would be fantastic, too,
and easier to use than a custom load-balancer on an EC2 instance. And finally, of course, a true CDN layered on top of S3 might be interesting.
Questions?
             Blog: http://blogs.smugmug.com/onethumb

             Slides: See the blog. Posting soon.

             Email: don AT smugmug

             Twitter: http://twitter.com/DonMacAskill

             Photo sharing: http://www.smugmug.com/

             Thanks!




                                                        34

The Urban Water SystemThe Urban Water System
The Urban Water Systemmoto
 
Imagini Cu Munti Din China
Imagini Cu Munti Din ChinaImagini Cu Munti Din China
Imagini Cu Munti Din Chinamoto
 
Office Furnishings
Office FurnishingsOffice Furnishings
Office Furnishingsmoto
 

Mehr von moto (20)

Science Illustration
Science IllustrationScience Illustration
Science Illustration
 
Euia Medicaments 2007 2008
Euia Medicaments 2007 2008Euia Medicaments 2007 2008
Euia Medicaments 2007 2008
 
Funny8
Funny8Funny8
Funny8
 
Tintin And The Red Sea Sharks
Tintin And The Red Sea SharksTintin And The Red Sea Sharks
Tintin And The Red Sea Sharks
 
The Mullet
The MulletThe Mullet
The Mullet
 
Snappy Baby
Snappy  BabySnappy  Baby
Snappy Baby
 
Redes Peea
Redes PeeaRedes Peea
Redes Peea
 
Funny Cartoons
Funny CartoonsFunny Cartoons
Funny Cartoons
 
Flowers
FlowersFlowers
Flowers
 
Ravel Bolero
Ravel BoleroRavel Bolero
Ravel Bolero
 
Where In The World
Where In The WorldWhere In The World
Where In The World
 
Dream House
Dream HouseDream House
Dream House
 
Best Websites List
Best Websites ListBest Websites List
Best Websites List
 
[Awesome] Military Photos Of The Twin Towers
[Awesome] Military Photos Of The Twin Towers[Awesome] Military Photos Of The Twin Towers
[Awesome] Military Photos Of The Twin Towers
 
Audrey 20 Portraits Of Audrey Hepburn
Audrey  20 Portraits Of Audrey HepburnAudrey  20 Portraits Of Audrey Hepburn
Audrey 20 Portraits Of Audrey Hepburn
 
Presenting Your Code
Presenting Your CodePresenting Your Code
Presenting Your Code
 
What Is Love
What Is LoveWhat Is Love
What Is Love
 
The Urban Water System
The Urban Water SystemThe Urban Water System
The Urban Water System
 
Imagini Cu Munti Din China
Imagini Cu Munti Din ChinaImagini Cu Munti Din China
Imagini Cu Munti Din China
 
Office Furnishings
Office FurnishingsOffice Furnishings
Office Furnishings
 

Kürzlich hochgeladen

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Funny Humor

  • 1. Scalability Set Amazon’s Servers on Fire, Not Yours Parks Hall Fire, July 3, 2002 - http://www.acadweb.wwu.edu/dbrunner/ 1 Managing hardware in our datacenter is a pain. We do it because it’s a necessary evil. Amazon’s starting to abstract some of that away via APIs - and I’m thrilled.
  • 2. Why trust us? Bootstrapped. Profitable. No debt. 140M photos. 192TB at S3. Doubling yearly. 2 SmugMug’s a bunch of green-haired freaks, right? Yes, but we also know a lot about storage, and doing it cheaply.
  • 3. Why trust us? Bootstrapped. Profitable. No debt. 140M photos. 192TB at S3. Doubling yearly. Super Heroes. 3 Oh, yes, and we’re also Super Heroes. :)
  • 4. Biz Stuff SmugMug’s Founders 4 The presentation is broken into two parts - Biz & Geek. This is the Biz section.
  • 5. Our Love Affair with S3 5 It’s no secret that we love S3. But, like all good love affairs, it has its ups and downs. :)
  • 6. Our Love Affair with S3 Always on, global, infinite storage. Inexpensive. $0.15/GB/month w/replicas. Easy. REST API. (SOAP too, but...) Fast. Not 15K-SCSI fast, but Internet fast. Game changer. 6 S3, or Simple Storage Service, solves a huge chunk of our storage problems. The CEO in me thinks I should keep my mouth shut, since it really levels the playing field, but the geek in me just thinks it’s too cool. :)
  • 7. Amazon? Infrastructure? Photo by Bob Knight - http://bobknight.smugmug.com/ 7 Amazon’s just a book store, right? Wrong.
  • 8. Amazon? Infrastructure? Started with books. Soon added CDs & DVDs. Toys R Us, Borders, Target. zShops, Marketplace, E-Commerce API People building their businesses on Amazon is cool. What else do we have lurking in the corners? 8 I’m not totally sure how Amazon came up with AWS, but I’ll bet it went something like this. It sure makes sense that they began to like having businesses building on top of them and their expertise. And I don’t buy the argument that this is silly because Amazon’s a bookseller. What a dumb argument. In reality, Amazon’s finding ways to monetize other things they do well. More businesses should do this.
  • 9. Why use them? Not a lot of web-scale expertise on Planet Earth. Reputation for systems. Once competed with Amazon - fatbrain {*} They eat their own dogfood. Dozens of products. Focus on the app, not the muck. 9 You can count the # of companies who do this on one, possibly two, hands. My father (SmugMug’s co-founder) competed directly with Amazon with his last company, fatbrain, so we know just how talented they are at their business and their infrastructure. Amazon does use S3 and the other services themselves (and yes, when S3 has had problems, Amazon’s had problems. I watched.)
  • 10. Show me the money! Photo by Kirk Tanner - http://kirktanner.smugmug.com/ 10 Money doesn’t grow on trees, everyone knows that. But in this case, it’s pretty dang close.
  • 11. Show me the money! Guesstimate: ~$500K saved per year. Actual: Growth: 64M photos - 140M photos Disks would cost: $40K - $100K/month. $922K would have been spent. $230K spent instead. $692K in cold, hard savings. Nasty taxes! $295K ‘saved’ in cash flow. Bonus! Reselling disks - recouping sunk cost. 11 Early on with S3, I estimated we’d save $500K. Here are the latest hard numbers. We’ve been using S3 since April of 2006, so it’s nearly a year. Total saved? $692K. Plus we don’t have to pre-pay some stupid taxes on the order of $295K. (Technically, not savings, because the gov’t would give it back to us over 5 years - but still, I’d like to keep that $295K, thanks.) Plus we’re actually thinking of re-selling some of the disks we had bought in the past, recouping some of our sunk costs.
  • 12. $ sweet spots Perfect for startups & small companies. Ideal for ‘store lots, serve little’ businesses of all sizes. Not so great (yet?) for serving lots if you’re a medium- or large-sized business. Transfer costs high if you can buy bandwidth in 1 Gbps+ chunks. We’re a ‘store lots, serve lots’ company. What to do? 12 S3 is great if you’re a small company that can’t or won’t buy lots of bandwidth. It’s also great if you’re gonna just store a lot, but not read or write it often. Why? Because Amazon’s storage rate ($0.15/GB/month) is fantastic, but the transfer rate ($0.20/GB) is merely competitive, rather than being fantastic. If you can buy bandwidth in 1 Gbps chunks, you can probably save a few pennies doing it yourself.
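
The sweet-spot argument boils down to two flat rates from the notes above. Here's a small Python sketch, assuming the 2007 prices quoted here ($0.15/GB/month stored, $0.20/GB transferred out) and ignoring per-request fees and any volume tiers. It shows how transfer takes over the bill once you serve a lot. The two workloads are made up for illustration.

```python
# Flat 2007-era S3 rates from the slide notes (USD). Current AWS pricing differs and
# also charges per request; this sketch ignores both.
STORAGE_PER_GB_MONTH = 0.15
TRANSFER_PER_GB = 0.20


def monthly_s3_cost(stored_gb: float, served_gb: float) -> float:
    """Estimated monthly bill: storage charge plus outbound transfer charge."""
    return stored_gb * STORAGE_PER_GB_MONTH + served_gb * TRANSFER_PER_GB


def transfer_share(stored_gb: float, served_gb: float) -> float:
    """Fraction of the monthly bill that comes from transfer rather than storage."""
    total = monthly_s3_cost(stored_gb, served_gb)
    return served_gb * TRANSFER_PER_GB / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical workloads: both store 10 TB, but read very different amounts.
    for label, served in (("store lots, serve little", 500),
                          ("store lots, serve lots", 50_000)):
        cost = monthly_s3_cost(10_000, served)
        share = transfer_share(10_000, served)
        print(f"{label}: ${cost:,.2f}/month, {share:.0%} of it transfer")
```

In the 'serve lots' case transfer is most of the bill. That's exactly the line item a company buying bandwidth in 1 Gbps+ chunks can undercut.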
  • 13. Geek Stuff 5 of my employees. Me with my NeXT gear on. 13 I’ve been a geek for a long time. Here’s the photographic proof. I was probably 10 here. Oh, yes, and now we’re onto the geek half of the presentation. :)
  • 14. Like SmugFS Architecture remarkably similar to SmugFS. Similar to lots of startups. Stupid we’re all building the same thing. Easy to drop-in. Started on Monday, live in production on Friday. 14 We had our own redundant, replicated, reliable internal storage system, SmugFS. Lots of recent startups probably have similar architectures, and they’ve all likely just built it themselves. It’s stupid we’re all building the same thing over and over. Amazon S3 saves everyone that step. It was super-easy to drop into our code because it was so similar to SmugFS already. When I started writing the code on a Monday, we were live and in production the Friday of that week.
  • 15. Our S3 evolution Started just doing secondary storage. Too cold! Tried out as Primary. Too hot! Finally, hot & cold model = Just right! Amazon gets 100% of the data. SmugMug keeps “hot” data local. 95% reduction in # of disks bought. 15 We’ve played around with a few different models with S3. At first, it was just backup. It worked so well, we wanted to do more (and save more money), so we tried it as stand-alone storage. That didn’t work quite as well when S3 had one of its hiccups, so we next tried a hot/cold model, which works really, really well. Amazon is our primary storage, and we use SmugFS as our local hot cache. We end up storing 100% of the data at Amazon, and 10% locally. In the end, we need 95% fewer disks in our datacenter than we did before.
  • 16. Sample Request Client ‘Smuggy’ - www.smugmug.com “Hey, gimme photo 31337” www.smugmug.com - SmugFS “Hey, you got photo 31337?” If YES, send to Smuggy. If NO: Log that it wasn’t in SmugFS for analysis. www.smugmug.com - Amazon S3 “Hey, you got photo 31337?” If YES, send to Smuggy. If NO: PANIC! :) 16 Here’s a sample request for a SmugMug photo. We rarely, if ever, get to PANIC stage, but I’m sure it could happen.
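
The request flow above is a cache-then-origin read. Here's a minimal Python sketch of the hot/cold model from slides 15 and 16. It assumes dict-backed stand-ins for SmugFS (hot tier) and S3 (cold tier). The class and method names are hypothetical, not SmugMug's code. The talk doesn't say whether a miss also warms the local cache, so this sketch leaves that out.

```python
import logging
from typing import Dict, List

log = logging.getLogger("photo_reads")


class PhotoNotFound(Exception):
    """Neither tier has the photo: the slide's 'PANIC!' case."""


class TieredPhotoStore:
    """Hot/cold storage: every photo lives in the cold tier (S3's role), and a
    subset also lives in the hot local tier (SmugFS's role).

    Both tiers are plain dicts standing in for real storage clients; all names
    here are hypothetical.
    """

    def __init__(self, hot: Dict[int, bytes], cold: Dict[int, bytes]):
        self.hot = hot
        self.cold = cold
        self.hot_misses: List[int] = []  # kept for later analysis

    def put(self, photo_id: int, data: bytes) -> None:
        self.cold[photo_id] = data  # the cold tier always gets 100% of the data
        self.hot[photo_id] = data   # assumption: new uploads start out "hot"

    def get(self, photo_id: int) -> bytes:
        data = self.hot.get(photo_id)
        if data is not None:
            return data
        # Hot-tier miss: record it, then ask the cold tier.
        self.hot_misses.append(photo_id)
        log.info("photo %d not in hot tier, falling back to cold tier", photo_id)
        data = self.cold.get(photo_id)
        if data is None:
            log.error("photo %d missing from both tiers", photo_id)
            raise PhotoNotFound(photo_id)
        return data


if __name__ == "__main__":
    store = TieredPhotoStore(hot={}, cold={31337: b"...jpeg bytes..."})
    print(store.get(31337), store.hot_misses)  # served from the cold tier
```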
  • 17. Proxy vs Redirect vs Direct Links Built SmugMug-S3 with multiple modes. Can flip a switch to change. Nearly 100% served are proxy reads. Sometimes HTTP redirects. Rarely direct S3 links. 17 We have three modes in SmugMug’s codebase, and can switch between them at will on-the-fly. We can proxy read from S3 and then serve it to the customer, we can send an HTTP redirect to the S3 object, or we can embed real S3 URLs (CNAME’d to smugmug.com) in our HTML. Almost 100% of our stuff is served via proxy read so we can try hitting our cache first (saving on transfer costs to Amazon), make sure we have the permissions right, etc.
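
One way to make the three modes a flip-a-switch choice is to route every photo request through a single function keyed on the mode. This Python sketch is a guess at the shape, not SmugMug's codebase. The hostname `photos.example.com`, the `/photos/<id>.jpg` path, and the `fetch` callable are all hypothetical. Per the notes, the real S3 host is CNAME'd to smugmug.com.

```python
from enum import Enum
from typing import Callable, Dict, Tuple


class ServeMode(Enum):
    PROXY = "proxy"        # fetch bytes (cache first, then S3) and send them ourselves
    REDIRECT = "redirect"  # answer with an HTTP 302 pointing at the S3 object
    DIRECT = "direct"      # put the S3 URL straight into the page's HTML


# Hypothetical CNAME'd bucket host.
S3_HOST = "https://photos.example.com"

Response = Tuple[int, Dict[str, str], bytes]


def s3_url(photo_id: int) -> str:
    return f"{S3_HOST}/{photo_id}.jpg"


def img_src(photo_id: int, mode: ServeMode) -> str:
    """URL to embed in the HTML: ours for PROXY/REDIRECT, S3's for DIRECT."""
    return s3_url(photo_id) if mode is ServeMode.DIRECT else f"/photos/{photo_id}.jpg"


def serve(photo_id: int, mode: ServeMode, fetch: Callable[[int], bytes]) -> Response:
    """Return (status, headers, body) for a request that reached our servers."""
    if mode is ServeMode.PROXY:
        return 200, {"Content-Type": "image/jpeg"}, fetch(photo_id)
    if mode is ServeMode.REDIRECT:
        return 302, {"Location": s3_url(photo_id)}, b""
    raise ValueError("in DIRECT mode the browser goes to S3, not to us")
```

Because both the HTML and the handler read the same mode value, changing that one setting switches how every photo is served.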
  • 18. Permissions We have complicated permissions. Passwords, privacy, external links, oh my! Proxying allows strong protection. 18 We have a rich permissions model at SmugMug, and need to make sure all the permissions are intact when someone tries to view a photo. Proxying allows the strongest protection, though HTTP redirects are also quite strong with time-expiring S3 URLs.
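
The time-expiring S3 URLs used for redirects come from S3's query-string authentication. Below is a standard-library Python sketch of the original Signature Version 2 form S3 offered at the time, based on my reading of that legacy scheme. Bucket, key, and credentials are placeholders. AWS has since moved to Signature Version 4, so for new code an SDK's presigning helper is the safer choice. The permission check itself still happens in the app, before it hands out the URL.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote


def presigned_get_url(bucket: str, key: str, access_key: str, secret_key: str,
                      ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Build an expiring GET URL with S3's legacy (Signature Version 2)
    query-string authentication."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)  # Unix time after which S3 rejects the URL
    path = quote(key)                    # '/' inside the key is kept as-is
    # Verb, Content-MD5 (empty), Content-Type (empty), Expires, then the
    # canonical resource; there are no x-amz-* headers to include.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{path}"
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return (f"https://{bucket}.s3.amazonaws.com/{path}"
            f"?AWSAccessKeyId={access_key}&Expires={expires}&Signature={signature}")
```

A short TTL keeps a copied link from outliving the permission decision by much.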
  • 19. REST vs SOAP Love REST, hate SOAP. Lightweight. Nothing useful added with SOAP’s complexity. 19 REST is so simple, easy to develop for, human readable. I love it. I’m not a fan of SOAP, and in this case, SOAP adds nothing but complexity. Use it if SOAP is your thing, otherwise start with REST.
  • 20. Reliability Not 100%. Close, though. More reliable than SmugFS which is quite reliable. Lots of failure points: SmugMug’s datacenter Internet backbones Amazon’s datacenter No other software, hardware, or service we use is 100%, either. 20 Everything fails, and Amazon’s no exception. There are lots of pieces that could fail outside of Amazon’s control, too. In our experience, they’ve been quite reliable overall.
  • 21. Handling failure Build from day one with failure in mind. Stuff breaks - try again. Writes fail? Write locally, sync later. Reads fail? Handle intelligently. Alerts? 21 Failure happens. Even if you’re not using Amazon, your gear will fail. Write your app to handle failure. In Amazon’s case, the easiest thing to do on a failed read or write is simply try again a few times. If the write continues to fail, write it somewhere locally then asynchronously replicate it back up later. With reads, have a proactive failure plan in place.
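
The write-side advice (try again a few times, then write locally and sync later) fits in two small functions. In this Python sketch the `upload` callable is a hypothetical S3 client call, assumed to raise `OSError` (e.g. `ConnectionError`) on failure. The in-memory `spool` list stands in for local disk. The exponential backoff between tries is my addition; the slide only says "try again".

```python
import time
from typing import Callable, List, Tuple

Upload = Callable[[str, bytes], None]
Spool = List[Tuple[str, bytes]]


def put_with_fallback(key: str, data: bytes, upload: Upload, spool: Spool,
                      attempts: int = 3, base_delay: float = 0.5) -> bool:
    """Try the remote write a few times; if it keeps failing, keep it locally.

    Returns True if the upload succeeded, False if the write was spooled.
    """
    for attempt in range(attempts):
        try:
            upload(key, data)
            return True
        except OSError:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # back off between tries
    spool.append((key, data))
    return False


def drain_spool(spool: Spool, upload: Upload) -> int:
    """Background job: replay spooled writes, keeping any that still fail."""
    remaining: Spool = []
    sent = 0
    for key, data in spool:
        try:
            upload(key, data)
            sent += 1
        except OSError:
            remaining.append((key, data))
    spool[:] = remaining
    return sent
```

Run `drain_spool` on a timer (or from a queue worker) so spooled writes reach S3 asynchronously, without the customer waiting on them.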
  • 22. Performance Fast for reads and writes. (XX Mbps) Mostly speed-of-light limited. (20-80ms) Parallel i/o for massive throughput. (XXX Mbps) Machine measurable, human indistinguishable. 22 S3 has been really fast for us. On single reads/writes, we get tens of megabits per second. It would likely be even faster except that our datacenters aren’t close to Amazon’s, so we have to deal with Internet latency. We use lots of simultaneous reads & writes to get hundreds of megabits per second at any given time of the day. We did some blind taste tests with customers in the US, on both coasts, who couldn’t tell the difference between viewing photos from SmugMug and directly from S3 - so the speed difference was measurable on a machine, but humans couldn’t tell. It’s quite fast.
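
"Parallel i/o for massive throughput" works because each request mostly waits on round-trip latency, so many requests can wait at once. Here's a Python sketch using a thread pool. The `fetch` callable and the worker count of 16 are hypothetical. The demo fakes a 50 ms round trip with `time.sleep` rather than calling S3.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(keys: Iterable[str], fetch: Callable[[str], bytes],
              max_workers: int = 16) -> Dict[str, bytes]:
    """Run many fetches at once so their round-trip latencies overlap.

    Threads suit this because each fetch spends most of its time waiting on
    the network, not on the CPU.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `keys`.
        return dict(zip(keys, pool.map(fetch, keys)))


if __name__ == "__main__":
    def fake_s3_get(key: str) -> bytes:
        time.sleep(0.05)  # stand-in for a ~50 ms round trip to S3
        return key.encode()

    start = time.perf_counter()
    photos = fetch_all([f"photo-{i}" for i in range(32)], fake_s3_get)
    print(f"{len(photos)} fetches in {time.perf_counter() - start:.2f}s "
          f"(~{32 * 0.05:.1f}s if done one at a time)")
```

Each single request is no faster; the aggregate throughput is what grows with the number of requests in flight.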
  • 23. CDN? S3 isn’t a Content Delivery Network. It’s storage. No global locations (yet?). Limited edge caching. Future Amazon Web Service? 23 I get asked a lot if we use S3 as a CDN. We don’t, because it’s not a CDN. That’s not to say that Amazon’s not good for serving - it is. But if you really want edge caching with lots of endpoints all over the world, well, that’s not what S3 was designed for. They don’t have global locations, they do limited edge caching, etc. It’s for storage and serving that storage. Treat it like a single web cluster rather than a CDN. I would imagine this may be a future Web Service that Amazon would offer.
  • 24. Store-and-forward vs Stream Two ways to serve your content. Store-and-forward Great resiliency. Poor performance (TTFB). Stream Poor resiliency. Great performance (TTFB). Do a quick HEAD first to verify. 24 When proxy reading, you can read the entire file, then re-serve it to the customer, or you can stream the bytes through to the customer as they arrive from S3. Each has pros and cons. With store-and-forward, you can re-read the bytes if the first request fails, but you get a slower time-to-first-byte response. With streaming, you have no idea if all the bytes safely made it to the customer, but you get a great time-to-first-byte response. We tend to issue a fast HEAD request to SmugFS and/or S3 before doing the streamed GET so we can verify the file is there, intact, and the right size & hash.
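
The two proxy-read styles differ only in when the bytes leave your server. The Python sketch below shows both, with the HEAD-first check. `head(key)` is assumed to return `(size, md5_hex)` or `None`, and `get(key)` to yield byte chunks. Both are hypothetical stand-ins for SmugFS/S3 clients. Using MD5 as the stored hash is also an assumption.

```python
import hashlib
from typing import Callable, Iterator, Optional, Tuple

Head = Callable[[str], Optional[Tuple[int, str]]]  # -> (size, md5_hex) or None
Get = Callable[[str], Iterator[bytes]]             # -> iterator of byte chunks


def store_and_forward(key: str, head: Head, get: Get, attempts: int = 2) -> bytes:
    """Read the whole object and check it before sending anything.
    Slow time-to-first-byte, but a bad read can simply be retried."""
    meta = head(key)
    if meta is None:
        raise FileNotFoundError(key)
    size, md5_hex = meta
    for _ in range(attempts):
        body = b"".join(get(key))
        if len(body) == size and hashlib.md5(body).hexdigest() == md5_hex:
            return body
    raise OSError(f"{key}: corrupt or truncated after {attempts} attempts")


def stream(key: str, head: Head, get: Get) -> Iterator[bytes]:
    """HEAD first (fail before sending a status line), then pass chunks through.
    Fast time-to-first-byte, but a short read is only detected at the end."""
    meta = head(key)
    if meta is None:
        raise FileNotFoundError(key)
    size = meta[0]

    def chunks() -> Iterator[bytes]:
        sent = 0
        for chunk in get(key):
            sent += len(chunk)
            yield chunk
        if sent != size:
            raise OSError(f"{key}: expected {size} bytes, streamed {sent}")

    return chunks()
```

`stream` runs the HEAD eagerly and only defers the GET, so a missing file surfaces before any response bytes go out.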
  • 25. The Speed of Light Problem Amazon hasn’t solved faster-than-light data transmission. Yet. Unavoidable - make sure your app can deal. Parallelized i/o can mask problem. Caching can help. Streaming can help. 25 Latency associated with the speed of light can’t be avoided. Write your app with it in mind. Try to parallelize reads/writes, try to cache, and try to stream reads to clients if you can.
  • 26. Outages & Problems Not perfect. 5 major issues. 3 outages (15-30 mins). 2 core switch failures and one DNS problem. Amazon.com affected. 2 performance degradations. One, our customers noticed. Second, they didn’t. Not a big deal - everything fails. Expect it. 26 Amazon’s had 5 major issues in the last year. Not a bad track record for a new service. We expect them to fail, just as we expect everything in our own datacenter to fail, so we handled most of these fairly well.
  • 27. SLA, Service, Support We don’t care about SLA, but you may. Service & Support: One area where Amazon is weak. This is a utility. They need a service status dashboard. Pro-active customer notifications. Ability to get ahold of a human. Amazon.com’s customer service is good; AWS will likely catch up. 27 They don’t have an SLA yet. We don’t care, but medium and large businesses probably do. Until they offer one, you may be out of luck. They do need to do a better job at handling the service-as-a-utility situation. With our bandwidth and datacenter providers, we get status updates and pre-announcements of software updates, possible service outages, etc. Amazon needs to do a better job notifying their customers about these sorts of things. On the bright side, Amazon.com’s customer service is quite good, so AWS will likely catch up.
  • 28. Saving our butts Knocked power out of ~70TB of storage. Oops! Moved datacenters during normal business hours, customers not affected. Stupid bugs. 28 S3 has saved our butts a few times. My brother accidentally knocked out power to 70TB of storage once - no customers noticed, since it failed over to S3 automatically. We also managed to move everything from one datacenter to another during normal business hours without any service interruptions. And finally, I’ve had some software bugs that we were able to repair thanks to Amazon.
  • 29. Misc Tips Use cURL Faster. More reliable. Storing vs Streaming is simple. Make stuff as async as possible Hides speed-of-light issue Hides or masks problems Fast customer response 29 If you can, use cURL to do your transfers. We tested a number of different built-in functions and libraries, and cURL is super-fast and reliable at setting up the HTTP connection. Also, in your app, hide the S3 latency as much as you can by doing asynchronous background transfers. Don’t make your customers wait.
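
The "make stuff as async as possible" tip usually means acknowledging the customer immediately and doing the S3 transfer on a background worker. This Python sketch covers only that part, using a standard-library queue and thread. It doesn't use cURL. `upload` is a hypothetical client call. Failed jobs are collected in `failed` so they can be handed to a retry or spool path.

```python
import queue
import threading
from typing import Callable, List, Tuple


class BackgroundUploader:
    """Accept uploads instantly and push them to remote storage on a worker
    thread, so the customer never waits on Internet latency."""

    def __init__(self, upload: Callable[[str, bytes], None]):
        self._upload = upload
        self._jobs: "queue.Queue[Tuple[str, bytes]]" = queue.Queue()
        self.failed: List[Tuple[str, bytes]] = []  # hand these to a retry path
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, key: str, data: bytes) -> None:
        """Enqueue a transfer and return immediately."""
        self._jobs.put((key, data))

    def wait(self) -> None:
        """Block until every submitted job has been attempted."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            key, data = self._jobs.get()
            try:
                self._upload(key, data)
            except Exception:
                self.failed.append((key, data))
            finally:
                self._jobs.task_done()
```

A single worker keeps ordering simple. Several workers pulling from the same queue would also give you the parallel i/o from slide 22.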
  • 30. Flirting with the other services. 30 The other Amazon services are exciting, too, so we’re playing with them as well.
  • 31. Elastic Compute Cloud (EC2) Like S3, only for compute. Scale up or down via API. Web servers, processing boxes, development & test beds, build servers, etc. You name it. Launching large EC2 implementation “soon” Image processing. 500K-1M photos/day. 10-20 Terapixels/day processed Peaky traffic on weekends, holidays Ridiculously parallel 31 I planned to have our EC2 cluster up and running in production for this presentation, but one of our hardware vendors (Sun) gave us some hardware that’s underperforming, so we’re in a holding pattern. It’s ironic that physical hardware limitations are preventing me from using virtualized hardware, but that’s the case. (We need to make some DB schema changes, and Sun’s storage arrays aren’t keeping up.) When launched, though, EC2 will handle lots of our image processing needs. That’s great because we can turn it up during busy times (Sunday nights, holidays) and down during low points. I will be blogging about the Sun situation at some point, once I have a solution and all the facts, so check out my blog at http://blogs.smugmug.com/onethumb for updates.
  • 32. Simple Queue Service (SQS) Simple, reliable queuing. Mates well with EC2 & S3 Stick jobs in SQS Retrieve jobs with EC2 instances using S3 data Run jobs, report status to SQS. $0.10/1000 items Priced well for small projects. Gets costly for huge ones (millions+). 32 We don’t currently use SQS because we already have our own queuing system and SQS doesn’t price well for people needing hundreds of thousands or millions of items per day, like we do. But that may change if Amazon introduces bulk pricing or a sliding scale. There are a few places (like S3’s cost per GB to serve) where a sliding scale or bulk pricing might make things more attractive for larger companies.
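
The "gets costly for huge ones" point is easy to check with the flat rate on the slide. This Python sketch assumes the quoted 2007 price of $0.10 per 1,000 items, with no volume discount. The daily volumes are illustrative, not SmugMug's actual numbers.

```python
# SQS price quoted on the slide (2007, USD); today's pricing is different.
SQS_PRICE_PER_1000_ITEMS = 0.10


def sqs_cost(items: int, price_per_1000: float = SQS_PRICE_PER_1000_ITEMS) -> float:
    """Cost of pushing `items` through the queue at a flat per-1,000 rate."""
    return items / 1000 * price_per_1000


if __name__ == "__main__":
    for per_day in (10_000, 500_000, 1_000_000):
        daily = sqs_cost(per_day)
        print(f"{per_day:>9,} items/day -> ${daily:,.2f}/day, "
              f"${daily * 30:,.2f} per 30 days")
```

At a million items a day the flat rate works out to $100/day, or about $3,000 per 30 days. Cost grows linearly with volume, which is why bulk pricing or a sliding scale would matter to larger users.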
  • 33. Missing Pieces Database API or DB-grade EC2 instances. Fast (lots of local spindles, lots of RAM) Persistent. Load balancer API. Single IP in front of lots of EC2 instances. Programmable to add/remove/change clusters. Can be done with software on an EC2 instance, but painful. CDN 33 To truly get rid of our entire datacenter, Amazon’s still missing a few pieces. DB boxes require lots more spindles and RAM than EC2 currently provides. Even cooler, and more difficult, would be some high-performance DB API that abstracted the machines. A load balancer API to provide programmatic addition and removal of EC2 instances would be fantastic, too, and easier to use than a custom load-balancer on an EC2 instance. And finally, of course, a true CDN layered on top of S3 might be interesting.
  • 34. Questions? Blog: http://blogs.smugmug.com/onethumb Slides: See the blog. Posting soon. Email: don AT smugmug Twitter: http://twitter.com/DonMacAskill Photo sharing: http://www.smugmug.com/ Thanks! 34