Learn how AWS customers save money, time, and effort by using AWS backup and archive services. Organizations of all sizes rely on AWS services to durably safeguard their data off-premises at a surprisingly low cost. This session will illustrate backup and archive architectures that AWS customers are benefiting from today.
2. Agenda
• Why AWS for Backup and Archive?
• AWS Global Infrastructure
• Traditional vs. Cloud Approach
• Cloud Backup and Archive Architecture
• Cloud Integrated Backup and Archive Gateways
• TCO
3. Why AWS for Backup and Archive?
• Metered usage: pay as you go, with no capital investment, no commitment, and no risky capacity planning
• Avoid the OPEX and risks of physical media handling
• Control your geographic locality for performance and compliance
4. Gartner Magic Quadrant for Public Cloud Storage Services
2014
Gartner, Magic Quadrant for Cloud Storage Services, Gene Ruth, Arun Chandrasekaran et al., July 9, 2014. This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the
entire document. The Gartner document is available at http://www.gartner.com/technology/reprints.do?id=1-1WWKTQ3&ct=140709&st=sb. Gartner does not endorse any vendor, product or service depicted in its research publications, and
does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner
disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
6. AWS Regions and Availability Zones
You decide where your data resides
7. Backup and Archive Defined
Archive: data retained for the long term, for compliance or research
Backup: data retained to support near-term business continuity
9. Traditional Backup and Archive
• Time: long, slow recovery times (days or weeks)
• Cost: capital intensive with ongoing upgrades
• Effort: complex to manage
• Quality: low durability, error prone
10. Traditional Backup and Archive
• Backup software
– Catalogs backup sets
– Application agents
– Media servers
• Connectivity
– LAN/WAN
– SAN: Fibre Channel
• Targets
– Tape libraries
– Virtual tape libraries
– Tape out / vaulting
12. Cloud Backup and Archive Topologies
1. Branch office backup to cloud
2. Core data center backup to cloud
3. Cloud backup to cloud
4. Hybrid cloud backup
13. Branch office backup to cloud
Considerations:
- Backup Software
- Storage / Caching Gateway
- WAN or Internet
- Deduplication
- Compression
- Encryption
- WAN Acceleration
14. Core data center backup to cloud
Considerations:
- Backup Software
- Storage / Caching gateway
- Direct Connect or Internet
- Deduplication
- Compression
- Encryption
- WAN Acceleration
16. Cloud backup to cloud
Applications running on EC2 backing up to S3 / Glacier
Considerations:
- Backup software
- Encryption
- Deduplication
- Compression
- Native S3 and Glacier integration
- EBS snapshots / scripting
17. AWS Storage and Archive Options
Amazon Simple Storage Service (S3)
Highly scalable object storage
1 byte to 5 TB in size
99.999999999% durability
Amazon Elastic Block Store (EBS)
High-performance block storage device
1 GB to 1 TB in size
Mount as drives to instances, with snapshot/cloning functionality
Amazon Glacier
Long-term object archive
Extremely low cost per gigabyte
99.999999999% durability
18. AWS Storage and Archive Options
Amazon Elastic Block Store (EBS)
• High I/O block storage for Amazon EC2
• Point-in-time snapshots to Amazon S3
• 99.999999999% Durability
• Snapshot software is FREE
• Point-in-time snapshots across regions
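The cross-region snapshot bullet above can be scripted. A minimal sketch, assuming the current boto3 Python SDK (the deck predates it; boto2 equivalents existed at the time); the volume ID and region names are illustrative placeholders:

```python
# Hedged sketch: snapshot an EBS volume, then copy the snapshot to a
# second region for off-site retention. Placeholders throughout.
import datetime

def snapshot_description(volume_id, when=None):
    """Build a descriptive label so snapshots are easy to audit later."""
    when = when or datetime.datetime.utcnow()
    return "backup-%s-%s" % (volume_id, when.strftime("%Y-%m-%d"))

def backup_volume(volume_id, source_region="us-east-1", dr_region="us-west-2"):
    """Create a snapshot, wait for it, then copy it to the DR region."""
    import boto3  # assumes AWS credentials are configured in the environment
    ec2 = boto3.client("ec2", region_name=source_region)
    snap = ec2.create_snapshot(VolumeId=volume_id,
                               Description=snapshot_description(volume_id))
    # The waiter blocks until the snapshot completes before the copy starts.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])
    dr = boto3.client("ec2", region_name=dr_region)
    return dr.copy_snapshot(SourceRegion=source_region,
                            SourceSnapshotId=snap["SnapshotId"],
                            Description=snapshot_description(volume_id))
```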
19. AWS Storage and Archive Options
Amazon Simple Storage Service (S3)
• Durable and low cost
• Unlimited number of objects and total volume
• Back up to Amazon S3 buckets via HTTP/HTTPS
– Create scripts using PowerShell, Perl, Python…
– Numerous solutions for data backup
• Authentication mechanisms ensure data is kept secure
• Reduced redundancy storage (RRS) option
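The scripting bullet above can be made concrete. A minimal sketch using the boto3 Python SDK; the bucket name, key prefix, and file path are placeholder assumptions, not names from the deck:

```python
# Hedged sketch: push one local backup file to S3 over HTTPS with
# server-side encryption enabled.
import os

def backup_key(prefix, path):
    """Derive the object key from the local file name, e.g. nightly/db.dump."""
    return "%s/%s" % (prefix.rstrip("/"), os.path.basename(path))

def upload_backup(bucket, path, prefix="nightly"):
    """Upload the file with AES-256 server-side encryption."""
    import boto3  # assumes credentials via environment or an IAM role
    s3 = boto3.client("s3")
    s3.upload_file(path, bucket, backup_key(prefix, path),
                   ExtraArgs={"ServerSideEncryption": "AES256"})
```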
20. AWS Storage and Archive Options
Amazon Glacier
• $0.01 per GB/mo, $120 per TB/yr
• 3-5 hour data retrieval latency
• Archives: single file or zipped files
• Vaults: collection of archives
• Infinite archival storage
• 99.999999999% durability
• Immutable, encrypted by default
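Archives and vaults translate directly into API calls. A hedged sketch with boto3; because Glacier exposes no on-demand index (see the "Do I need an index" note later in this deck), the returned archive ID must be recorded locally. The vault name and inventory structure are illustrative assumptions:

```python
# Hedged sketch: upload a file as a Glacier archive and track its ID
# in a local inventory dict so it can be retrieved later.
def record_archive(inventory, vault, archive_id, description):
    """Append an entry to a local inventory dict keyed by vault name."""
    inventory.setdefault(vault, []).append(
        {"archiveId": archive_id, "description": description})
    return inventory

def archive_file(vault, path):
    """Upload one file to the vault and return its archive ID."""
    import boto3  # assumes credentials are configured
    glacier = boto3.client("glacier")
    with open(path, "rb") as body:
        resp = glacier.upload_archive(vaultName=vault,
                                      archiveDescription=path, body=body)
    return resp["archiveId"]
```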
21. AWS Storage and Archive Options
Object Lifecycle Management: Amazon S3 → Amazon Glacier
• Seamlessly move data from Amazon S3 → Amazon Glacier
• 3-5 hour asynchronous retrieval
• Data lifecycle policies
• $0.01 per GB for Amazon Glacier costs
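The lifecycle policies mentioned above are plain bucket configuration. A hedged sketch with boto3; the prefix and the 30-day / 10-year thresholds are illustrative assumptions, not values from the deck:

```python
# Hedged sketch: an S3 lifecycle rule that transitions objects under a
# prefix to Glacier after 30 days and expires them after ten years.
def glacier_lifecycle_rule(prefix, to_glacier_days=30, expire_days=3650):
    """Build one lifecycle rule dict in the shape the S3 API expects."""
    return {
        "ID": "archive-" + prefix.strip("/"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": to_glacier_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_days},
    }

def apply_lifecycle(bucket, prefix):
    """Attach the rule to a bucket (replaces any existing configuration)."""
    import boto3  # assumes credentials are configured
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [glacier_lifecycle_rule(prefix)]})
```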
22. Data Ingestion Options
AWS Direct Connect: dedicated bandwidth between your site and AWS
Internet: transfer data in a secure SSL tunnel over the public Internet
AWS Import/Export: physical transfer of media into and out of AWS
23. AWS Ingest Options
AWS Direct Connect
• Private connectivity to AWS
– Physical connection: 1 Gbps or 10 Gbps port
• Consistent network performance
• Consider burst models on ingest
• Reduces costs for bandwidth-heavy outbound workloads
Locations
• CoreSite 32 Avenue of the Americas, NY
• CoreSite One Wilshire & 900 North Alameda, LA
• Equinix DC1 – DC6 & DC10 - DC11, Ashburn, VA
• Equinix SV1 & SV5, San Jose, CA
• Equinix SE2 & SE3, Seattle, WA
• Equinix SG2, Singapore
• Equinix SY3, Sydney
• Equinix TY2, Tokyo
• Eircom, Clonshaugh, Dublin
• TelecityGroup Docklands, London
• Terremark NAP do Brasil, Sao Paulo
24. AWS Ingest Options
AWS Import/Export
• Rapidly move data into and out of AWS
• Ship portable storage devices to AWS
• Supports
– Amazon EBS
– Amazon S3
– Amazon Glacier
• Use cases
– Initial data migration
– Content distribution via portable devices
– Disaster recovery
28. Riverbed SteelStore
• Local caching appliance
• Presents NAS protocols
– CIFS / NFS
• Up to 30x deduplication
• Compression
• Encryption
• Key Management
• WAN Acceleration
• S3 and Glacier support
• AMI Available
30. Commvault
• Unified platform integrating backup, archive, replication, analysis and search, alerting, reporting, and tracking of all data via a single common code base
• Integrated with Amazon S3 and Glacier, with deduplication and encryption support
• Single console management
31. TCO: On-Premises Cost Considerations
1. Primary storage hardware (primary / remote site)
2. DR / Remote site storage hardware
3. Raw to utilized storage (both primary and DR)
4. Storage growth (cost of upgrades)
5. Storage management software and 3rd party tools
6. Professional services
7. Hardware maintenance
8. Software maintenance
9. Backup software
10. Backup hardware (primary / remote site)
11. Offsite tape storage / vault
12. Archive software
13. Archive hardware
14. Power
15. Cooling
16. Space
17. Labor
18. Cost of capital
19. Training
20. Asset depreciation
21. Migration
22. Decommission / removal
23. Recycling
32. AWS – Your Global Data Center for Backup and Archive
• Choose the region that fits your business and compliance needs
• 10 regions worldwide – set up with a few clicks
• Broad range of backup/archive tools that are AWS integrated
• Low cost, reliable AWS Transport and Storage options
• Enhance Security Posture
• Increase Scalability
• Significantly Higher Data Durability
• All at a lower TCO
34. AWS Storage Gateway
• On-premises, virtual iSCSI
storage appliance
• $125 / Month
• Local cache enables low
latency access to data
• Server Side Encryption (SSE)
• 5 TB of throughput per day
• Recover to Amazon EBS
36. AWS Ingest Options
Internet: one common theme is parallel uploads
1. Multipart upload
2. Request rate optimization
3. TCP window scaling
4. TCP selective acknowledgement
AWS has customers that ingest roughly 1 PB per day
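The multipart-upload technique in the list above can be sketched with boto3's transfer manager, which splits a large file into parts and uploads them on parallel threads. The 64 MiB part size and concurrency of 10 are illustrative assumptions:

```python
# Hedged sketch: parallel multipart upload of a large backup image.
def part_count(size_bytes, part_size):
    """Number of parts a multipart upload will use (ceiling division)."""
    return -(-size_bytes // part_size)

def upload_large(bucket, path, key):
    """Upload a large file; parts go up on 10 concurrent threads."""
    import boto3  # assumes credentials are configured
    from boto3.s3.transfer import TransferConfig
    cfg = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                         multipart_chunksize=64 * 1024 * 1024,
                         max_concurrency=10)
    boto3.client("s3").upload_file(path, bucket, key, Config=cfg)
```

A 5 GiB image at a 64 MiB part size uploads as 80 parts, each retried independently on failure.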
37. Customer Stories
AWS Storage Gateway is used in a variety of ways
Jollibee (JFC) is using the AWS Storage Gateway to back up and mirror their Oracle SQL server database from their on-premises data center to AWS. JFC is the largest fast food chain in the Philippines, with revenues well over 2 billion USD. "The Storage Gateway also provides us access to the same database snapshots for use in Amazon EC2, providing a cost-effective in-cloud DR solution."
"Amazon Web Services and AWS Storage Gateway are great assets that help us scale fast, store data in an ultra-secure environment, and spend more time on product development rather than on disaster recovery and backup…
By using AWS Storage Gateway, we went from days to just hours to restore from backup."
"AWS Storage Gateway provided us the most cost-effective way to back up our SAP workloads to AWS. It has helped us perform SAP system 'refreshes' much faster and more conveniently; backing up to S3 has also helped us prepare for DR and run SAP dev/QA restores easily on EC2."
A large Japanese retail chain uses AWS Storage Gateway to share and store files in S3 and drastically cut its spend on its on-premises NAS footprint.
Editor's notes
Amazon Web Services give you reliable, durable backup storage without the up-front capital expenditures and complex capacity-planning burden of on-premises storage. Amazon storage services remove the need for complex and time-consuming capacity planning, ongoing negotiations with multiple hardware and software vendors, specialized training, and maintenance of offsite facilities or transportation of storage media to third party offsite locations.
Our data center footprint is global, spanning 5 continents with highly redundant clusters of data centers in each region. Our footprint is expanding continuously as we increase capacity and redundancy and add locations to meet the needs of our customers around the world.
You can choose to deploy and run your applications in multiple physical locations within the AWS cloud.
Amazon Web Services is available in geographic Regions that are independent and separated, as much as possible, for data sovereignty, and that offer, as much as possible, the same services.
When you use AWS, you can specify the Region in which your data will be stored, instances run, queues started, and databases instantiated.
Within each Region are Availability Zones (AZs).
Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from a failure (unlikely as it might be) that affects an entire zone. Regions consist of one or more Availability Zones, are geographically dispersed, and are in separate geographic areas or countries. The Amazon EC2 service level agreement commitment is 99.95% availability for each Amazon EC2 Region.
AWS maintains Regions, which are major geographic areas, and Availability Zones (AZs), which are individual data centers or clusters of data centers that make up a Region. Regions are independent and separate, offering the same services as much as possible while providing isolation for data sovereignty.
Today, AWS operates 9 Regions around the world. Each Region has a minimum of 2 AZs (with separate power, flood plains, etc.) to allow customers to set up highly available architectures and data redundancy. An AZ is an abstraction of a data center with fault isolation, yet close enough to the other AZs in its Region to build highly available architectures.
In addition to Regions, AWS maintains edge locations that support Route 53 DNS and Amazon CloudFront (CDN) points of presence.
Backup
Point in time snapshot of primary data set
Immediate access, fast restores
Backup frequency and data loss tolerance
Retention and clean up policies
Example: Email server backup
- 5 min incrementals, daily, weekly, monthly
- Retain 6 months of backups, with a roll-up schedule
Archive
Store all data for long term retention
Access infrequently
High access latency acceptable
Meet compliance and regulatory requirements
Example: Email archival
- Corporate Governance: 10 year retention
- Accessed for subpoena with 48 hour turnaround
Time: New hardware needs to be set up before it can be used
Money: Tape libraries and tech refreshes require additional cost and effort
Call out that this is a cover/intro slide for the use-cases
Define Hybrid in this fashion as on-prem to cloud AND in-cloud to cloud
Media examples
This is what you will be using for a lot of your processing. Make sure you discuss details on how to get started. Work with an SA to stand up some EC2 instances and EBS volumes.
EBS is a service. Think about how you leverage and manage a SAN in the enterprise: a dedicated team manages this service on demand. This is tremendous value. We offer replicated disk in the background, all bundled into the price: replication and snapshots (the snapshot service is free).
PIOPS: we don't overprovision; you pay for what you use. With EBS you have Standard Volumes (100 IOPS), typically boot volumes, and Provisioned IOPS Volumes for large-scale DBs, Lustre FS, Gluster FS, and the like; use these to scale out and get the performance you want. Ask for 1,000 IOPS or 4,000 IOPS and you get it; we guarantee this performance. At 4,000 IOPS / 64 MB per second per volume, you can then stripe volumes as long as you don't exceed available bandwidth.
EBS-optimized EC2 instances give you this guarantee: network traffic to storage gets dedicated bandwidth from the instance to the block store, at 1 Gb per second.
Cite some apps on EBS: high-performance DBs
Use this for each topic you cover:
1. Here it is and why it works
Why would you want to use this?
How do you do this / next steps?
Multipart Upload
Request Rate Optimization
TCP Window Scaling
TCP Selective Acknowledgement
The question to expect: since most transcoders / render farms are looking for a file system, how do you use S3? What is the granular workflow of S3 in the bigger picture of a media workflow?
Can I use my existing tools and point them at S3? For those that are not natively integrated: S3FS, Panzura, Maginatics, Lustre?, etc.
Use this for each topic you cover:
1. Here it is and why it works
Why would you want to use this?
How do you get started?
Use-cases:
Secure and low-cost offsite archiving
Tape replacement
Raw media assets
Digital preservation
Big dataset – Genomics, analytics, logs
Might want to discuss 3-5 hour wait time vs. tape time (several days)
Amazon Glacier vs. S3 Lifecycle Archival
Where is my data?
Do I need an index (list objects?)
Should I aggregate my files/objects?
Where do I process my restored data?
Re:Invent – what does someone need to do to actually use this? Walk through the order in which a customer would need to do things to light it up. Talk through the steps to turn it up.
How to actually do this: what are the steps to engage? Talk in more concrete terms about the steps.
Virtual Tape Library (VTL) is a new configuration for the Storage Gateway. Customers will be able to present iSCSI targets representing virtual tape drives to their backup software (such as Symantec NetBackup), which then write archival data to the virtual tapes.
There are two major components to this: the VTL and the VTS. First, the VTL represents the collection of tapes that are accessible to the on-premises Storage Gateway. The VTL is analogous to the collection of tapes sitting at the customer data center, including blank tapes, written tapes that are still on-premises, and tapes retrieved from the VTS. These tapes can be presented to the backup software application, and the data written to them is stored in Amazon S3.
Once the customer has “ejected” the tape from the backup software, we move the tape from the VTL to the VTS (Virtual Tape Shelf). The VTS is analogous to the collection of tapes that sit off-site in a tape holding facility, such as Iron Mountain. On eject, we write an S3 to Glacier tiering policy to transition the tape to Glacier. These tapes will then be stored in Glacier and are not associated with any specific on-premises gateway. Customers can retrieve these tapes and present them to a gateway. Similar to S3-Glacier, we will present the retrieved tape as a copy of the Glacier data in RRS.
Frame in terms of what customers are doing
“Here’s what some of our customers are doing”
Just say the facts – not that it is “compelling”
Maintain the tone that we love all of our customers
We are BIG. We manage this level of durability across over 1 trillion unique customer objects. This does not include all the objects that AWS itself stores, like snapshots.
Re:Invent – get more technical here…….enlist an SA to go deeper