We are excited to announce Amazon Glacier, a fully managed archive service in the cloud that lets customers store data in 'cold storage' at an extremely competitive price point. Glacier is built to support the same 11 9s durability as S3. We'll take you through Glacier, how it works, where it sits within the storage spectrum, and our planned integration with S3.
2. Getting to Glacier…
Why AWS for storage & archive?
AWS fundamental services
Storage & archive – examples & patterns
Amazon Glacier
3. Storage & Archive
AWS is used in a variety of ways…
Powers applications that allow customers to access historical stock price information
Stores its vast repository of music to feed to over 15 million active users
Estimates it has saved $500,000 in storage expenditures and cut its disk storage array costs in half
Digital assets and usage data behind publication sites and mobile applications
4. Business & technical drivers
You might be able to:
Reduce costs: Slash storage & archive budgets
Reduce on-premise: Eliminate on-premise equipment to manage archives
Change processes: Remove the need to do capacity planning
Remove aging technologies: Eliminate tape for backup and archive
8. Business & technical drivers
You might be able to:
Reduce costs: Slash storage & archive budgets by up to 50%; Reduce CAPEX while dramatically increasing scalability; Eliminate the need for secondary sites
Reduce on-premise: Eliminate on-premise equipment to manage archives; Eliminate 30%+ of your storage footprint; Consolidate on-premise and augment with cloud
Change processes: Remove the need to do capacity planning; Eliminate capacity planning; Eliminate provisioning for peak demand
Remove aging technologies: Eliminate tape for backup and archive; Remove tape archives; Cycle out aging disk arrays
10. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store: High performance block storage device; 1GB to 1TB in size; Mount as drives to instances with snapshot/cloning functionalities
Simple Storage Service: Highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability
Glacier: Long term object archive; Extremely low cost per gigabyte; 99.999999999% durability
11. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Elastic Block Store: High performance block storage device; 1GB to 1TB in size; Mount as drives to instances with snapshot/cloning functionalities; Very fast 'instance' disks
Simple Storage Service: Highly scalable object storage; 1 byte to 5TB in size; 99.999999999% durability; Fast web object storage
Glacier: Long term object archive; Extremely low cost per gigabyte; 99.999999999% durability; Slow, rare access
13. Fundamental Storage Options
Elastic Block Store, S3 and Glacier
Use cases: Archive, Backup, DR
Amazon S3 (11 9s durability)
  Archive: Data accessed ~>10% / month
  Backup: Shorter term data backup with rapid RTO; S3 Expiration policies
  DR: Snapshots; Rapid RTO
Amazon S3 RRS
  Archive: Lower cost when 11 9s not required
  Backup: Lower cost
  DR: Lower cost
Amazon Glacier
  Archive: Long term archiving; Infrequent data access (~<10% data/month)
  Backup: Use policies to move cold backup data for long term retention
  DR: Retain "write once - read never" copy in case of worst case scenario
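The access-frequency guidance on this slide can be read as a simple decision rule. The sketch below is one possible reading, not an AWS API: the class, enum and method names are hypothetical, the 10% cut-off is the approximate figure from the slide treated as a hard threshold, and the rule ignores cost, retrieval time and compliance factors that a real decision would weigh.

```java
class StorageTierSketch {
    // Hypothetical labels for the three options compared on the slide.
    enum Tier { S3_STANDARD, S3_RRS, GLACIER }

    // fractionAccessedPerMonth: share of the data set read in a typical month (0.0 to 1.0).
    // needsElevenNines: whether the data must keep 99.999999999% durability.
    static Tier choose(double fractionAccessedPerMonth, boolean needsElevenNines) {
        // Assumption: the slide's "~10% per month" is used as a strict boundary.
        if (fractionAccessedPerMonth < 0.10) {
            return Tier.GLACIER;
        }
        // Frequently accessed data stays in S3; RRS only when 11 9s are not required.
        return needsElevenNines ? Tier.S3_STANDARD : Tier.S3_RRS;
    }

    public static void main(String[] args) {
        System.out.println(choose(0.02, true));   // rarely read archive
        System.out.println(choose(0.40, true));   // active data that must be durable
        System.out.println(choose(0.40, false));  // active data that can be regenerated
    }
}
```

In practice the boundary is a guideline rather than a rule, so the threshold would normally be a tunable parameter.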
14. Use case journey
Stages: On-premise, On-instance, Object level, Long term
On-premise: Locally accessible file systems; Workloads with local data
15. Use case journey
Stages: On-premise, then on AWS: On-instance, Object level, Long term
On-premise: Locally accessible file systems; Workloads with local data
16. Use case journey
On-premise: Locally accessible file systems; Workloads with local data
On-instance: EC2 based applications; DR deployments
Object level: Data distribution; Durable media storage
Long term: System images; Database backups; Data archives
17. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
19. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
Step 1: Getting data into the cloud
20. Getting data into the cloud
Direct connect, import/export and storage gateway
AWS Direct Connect: Dedicated bandwidth between your site and AWS
AWS Import/Export: Physical transfer of media into and out of AWS
AWS Storage Gateway: Shrink-wrapped gateway for volume synchronization
22. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
Step 1: Getting data into the cloud
Step 2: Disks and data
24. Curiosity
The mars.jpl.nasa.gov website
is based on the open-source
Content Management System
(CMS) Railo, running on
Amazon EC2
Shared storage for Railo is
provided by Amazon EC2
instances running Gluster on a
pool of Amazon Elastic Block
Store (EBS) volumes for
consistently high performance
disk I/O.
26. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
Step 1: Getting data into the cloud
Step 2: Disks and data
Step 3: Database as a service
29. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
Step 1: Getting data into the cloud
Step 2: Disks and data
Step 3: Database as a service
Step 4: Object serving and storage
31. You put it in S3
AWS stores it with 99.999999999% durability
32. Highly scalable web access to objects
You put it in S3
AWS stores it with 99.999999999% durability
Multiple redundant copies in a region
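To give the 99.999999999% figure some intuition, the following sketch turns it into an expected loss rate. It assumes the figure is an annual, per-object durability (as the Glacier slides later phrase it) and that object losses are independent. The class and method names are illustrative only.

```java
class DurabilitySketch {
    // 99.999999999% annual durability leaves a 1e-11 chance of losing a given object in a year.
    static final double ANNUAL_LOSS_PROBABILITY = 1e-11;

    // Expected number of objects lost per year for a given object count.
    static double expectedLossesPerYear(long objectCount) {
        return objectCount * ANNUAL_LOSS_PROBABILITY;
    }

    // Average number of years between single-object losses.
    static double yearsPerExpectedLoss(long objectCount) {
        return 1.0 / expectedLossesPerYear(objectCount);
    }

    public static void main(String[] args) {
        long objects = 10_000L;
        System.out.printf("Expected losses per year: %.1e%n", expectedLossesPerYear(objects));
        System.out.printf("Years per expected loss: %.0f%n", yearsPerExpectedLoss(objects));
    }
}
```

Under these assumptions, storing 10,000 objects corresponds to an expected loss of roughly one object every 10,000,000 years.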
34. “Spotify needed a storage solution that
could scale very quickly without incurring
long lead times for upgrades. This led us to
cloud storage, and in that market, Amazon
Simple Storage Service (Amazon S3) is the
most mature large-scale product.
Amazon S3 gives us confidence in our
ability to expand storage quickly while also
providing high data durability.”
Emil Fredriksson, Operations Director
36. Use case journey
On-premise: Locally accessible file systems; Workloads with local data; High IO performance; High network performance
On-instance: EC2 based applications; DR deployments; High IO performance; Provisioned IOPS; Backup & Restore
Object level: Data distribution; Durable media storage; Good performance; High durability; Scalability
Long term: System images; Database backups; Data archives; Very low price; High durability; Slow access
Step 1: Getting data into the cloud
Step 2: Disks and data
Step 3: Database as a service
Step 4: Object serving and storage
Step 5: Cold storage & archiving
37. What we heard from you
You love Amazon S3
for its simplicity,
security, durability,
and performance.
38. What we heard from you
You love Amazon S3 for its simplicity, security, durability, and performance.
You wanted a highly secure, extremely durable, and extremely cost effective option for archiving data for years
39. The need…
Reliable and cheap storage of data
Data with long retention periods
Multi-PB, infrequently accessed data sets
42. Our goals with Glacier…
Redefine data archiving and backup: no upfront payments; a very low price for storage; ability to scale up and down as needed
Replace physical media for archiving: an easy to use storage service that is infinitely scalable; a secure service for important data assets
Designed for an annual average 99.999999999% durability per saved object
For as little as $0.01 per gigabyte per month
48. Offsite archive: Glacier allows you to cost-effectively and securely store enterprise data offsite, making it simple, inexpensive and safe to retain archived data for as long as desired. Common use cases include enterprise data, media assets, and research and scientific data
50. Offsite archive: Glacier allows you to cost-effectively and securely store enterprise data offsite, making it simple, inexpensive and safe to retain archived data for as long as desired. Common use cases include enterprise data, media assets, and research and scientific data
Digital preservation: Libraries, historical societies, non-profit organizations and governments are increasing their efforts to preserve valuable but aging digital content such as websites, software source code, video games, user-generated content and other digital artifacts
Tape replacement: Amazon Glacier is cost competitive, even at scale, and eliminates pain points like capacity planning, capital budgeting and investments, media formats, hardware refreshes, and off-site storage costs, shipping and retrieving
51. Good reasons to replace off-site tape archives
100% restore success rate – no broken or missing tapes
No lost tapes and improved security posture
No device or media admin or handling
No capacity planning
Pay as you go
No need for recurrent and risky data migrations
56. What is an archive?
Any object, such as a photo, video, document or compressed collection
It is a base unit of storage in Amazon Glacier
Upload an archive in a single request
For large archives, use the multipart upload API
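As a rough illustration of the choice between a single request and a multipart upload, the sketch below plans part sizes for a large archive. The limits it uses are assumptions and do not appear in the slides: a 4 GiB ceiling for a single upload request, at most 10,000 parts, and part sizes of 1 MiB multiplied by a power of two up to 4 GiB. Check them against the current Glacier documentation before relying on them. All names are hypothetical.

```java
class UploadPlanSketch {
    static final long MIB = 1024L * 1024L;
    // Assumed service limits (not stated in the slides).
    static final long MAX_SINGLE_REQUEST_BYTES = 4096L * MIB;
    static final long MAX_PART_BYTES = 4096L * MIB;
    static final long MAX_PARTS = 10_000L;

    // True when the archive cannot go up in a single upload request.
    static boolean needsMultipart(long archiveBytes) {
        return archiveBytes > MAX_SINGLE_REQUEST_BYTES;
    }

    // Number of parts needed for a given part size (ceiling division).
    static long partCount(long archiveBytes, long partSize) {
        return (archiveBytes + partSize - 1) / partSize;
    }

    // Smallest part size of the form 1 MiB * 2^k that keeps the part count within MAX_PARTS.
    static long partSizeFor(long archiveBytes) {
        long partSize = MIB;
        while (partCount(archiveBytes, partSize) > MAX_PARTS) {
            partSize *= 2;
            if (partSize > MAX_PART_BYTES) {
                throw new IllegalArgumentException("archive too large for the assumed limits");
            }
        }
        return partSize;
    }

    public static void main(String[] args) {
        long archive = 100L * 1024L * MIB; // a 100 GiB archive
        long partSize = partSizeFor(archive);
        System.out.println("multipart needed: " + needsMultipart(archive));
        System.out.println("part size (MiB): " + partSize / MIB);
        System.out.println("parts: " + partCount(archive, partSize));
    }
}
```

Smaller parts make retries cheaper after a network failure, so a real client may prefer multipart well below the single-request ceiling.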
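57. Upload an archive (Java)
import java.io.File;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;
import com.amazonaws.services.glacier.transfer.UploadResult;

// Glacier client, built from your API credentials (keys)
AmazonGlacierClient client = new AmazonGlacierClient(credentials);
// Region endpoint
client.setEndpoint("https://glacier.us-east-1.amazonaws.com/");
// Transfer manager
ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
// Vault name, archive name, and the file to upload
UploadResult result = atm.upload(vaultName, "MyArc", new File(archiveToUpload));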
58. Upload an archive (.NET)
// Transfer manager for the region endpoint
var manager = new ArchiveTransferManager(Amazon.RegionEndpoint.USEast1);
// Vault name, archive name, and the file to upload
string archiveId = manager.Upload(vaultName, "MyArchive", archiveToUpload).ArchiveId;
62. Initiate job (Java)
// Describe the archive retrieval job
JobParameters jobParameters = new JobParameters()
    .withArchiveId("*** provide an archive id ***")
    .withDescription("archive retrieval")
    .withType("archive-retrieval");
// Submit the job through the Glacier client
InitiateJobResult initiateJobResult =
    client.initiateJob(new InitiateJobRequest()
        .withJobParameters(jobParameters)
        .withVaultName(vaultName));
// JobID to track
String jobId = initiateJobResult.getJobId();
63. Track job
After 3-5 hours:
1. SNS topic notification
2. Call describeJob using the JobID
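Where an SNS notification is not set up, the second option amounts to a polling loop. The sketch below shows only the loop shape and does not call the AWS SDK: the completion check is passed in as a function, and the class and method names are hypothetical. In a real client the check would wrap a describeJob call for the JobID, and the sleep interval would be minutes rather than milliseconds, given the 3-5 hour job time.

```java
import java.util.function.BooleanSupplier;

class JobPollingSketch {
    // Polls isComplete until it returns true or maxAttempts checks have been made.
    // Returns true if the job completed, false on timeout or interruption.
    static boolean waitForJob(BooleanSupplier isComplete, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isComplete.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a describeJob check: reports completion on the third poll.
        int[] calls = {0};
        boolean done = waitForJob(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("completed=" + done + " after " + calls[0] + " polls");
    }
}
```

Passing the check in as a function keeps the loop testable without network access. A notification-driven design avoids polling altogether.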
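64. Download job (Java)
import java.io.File;
import com.amazonaws.services.glacier.AmazonGlacierClient;
import com.amazonaws.services.glacier.transfer.ArchiveTransferManager;

// Glacier client, built from your API credentials (keys)
AmazonGlacierClient client = new AmazonGlacierClient(credentials);
// Region endpoint
client.setEndpoint("https://glacier.us-east-1.amazonaws.com/");
// Transfer manager
ArchiveTransferManager atm = new ArchiveTransferManager(client, credentials);
// Vault name, archive id, and the download path
atm.download(vaultName, archiveId, new File(downloadFilePath));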
65. Download job (.NET)
var manager = new ArchiveTransferManager(Amazon.RegionEndpoint.USEast1);
var options = new DownloadOptions();
// Report progress while the archive is being downloaded
options.StreamTransferProgress += ArchiveDownloadHighLevel.progress;
manager.Download(vaultName, archiveId, downloadFilePath, options);

static int currentPercentage = -1;
static void progress(object sender, StreamTransferProgressArgs args)
{
    if (args.PercentDone != currentPercentage)
    {
        currentPercentage = args.PercentDone;
        Console.WriteLine("Downloaded {0}%", args.PercentDone);
    }
}
66. “Every day our genome sequencers produce
terabytes of data. As our company moves into
the clinical space, we face a legal
requirement to archive patient data for years
that would drastically raise the cost of
storage.
Thanks to Amazon Glacier’s secure and
scalable solution, we will be able to provide
cost-effective, long-term storage and thereby
eliminate a barrier to providing whole genome
sequencing for medical treatment of cancer
and other genetic diseases.”
67. “An organization like ours thinks in centuries
when it comes to content retention, and long
term preservation of our Master Archives is a
critical part of our mission here at NYPR.
Storing these core assets on traditional media
such as local disk and off-site tape exposes us to
corruption and even outright-loss of data. We
are excited to move our archives to Amazon
Glacier, which will be a better long-term
solution.”
Steve Shultis, CTO, New York Public Radio
71. Storage, Retrievals, Data In, Data Out
Storage: From $0.01 per GB per month
Retrievals: Free up to 5% of average monthly storage, then tiered fees
Data In: Free
Data Out: Tiered (1st GB free)
The expectation is that archives will be accessed infrequently
Storage is cheap; the trade-off is retrieval pricing
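A small worked example may make the trade-off concrete. The sketch below uses the $0.01 per GB per month storage figure quoted elsewhere in the deck and the 5% free retrieval allowance from this slide. It deliberately stops at the billable retrieval volume, because the tiered retrieval and data-out rates are not given here. The names are illustrative, and the model ignores any proration the real billing may apply.

```java
class GlacierCostSketch {
    // "As little as" storage price quoted in the deck, in USD per GB per month.
    static final double STORAGE_PRICE_PER_GB_MONTH = 0.01;
    // Share of average monthly storage that can be retrieved free of charge.
    static final double FREE_RETRIEVAL_FRACTION = 0.05;

    static double monthlyStorageCost(double storedGb) {
        return storedGb * STORAGE_PRICE_PER_GB_MONTH;
    }

    static double freeRetrievalGbPerMonth(double averageStoredGb) {
        return averageStoredGb * FREE_RETRIEVAL_FRACTION;
    }

    // Retrieval volume that would fall into the tiered fees (rates not modeled here).
    static double billableRetrievalGb(double averageStoredGb, double retrievedGb) {
        return Math.max(0.0, retrievedGb - freeRetrievalGbPerMonth(averageStoredGb));
    }

    public static void main(String[] args) {
        double storedGb = 100.0 * 1024.0; // 100 TB expressed in GB
        System.out.printf("Storage cost per month: $%.2f%n", monthlyStorageCost(storedGb));
        System.out.printf("Free retrieval per month: %.0f GB%n", freeRetrievalGbPerMonth(storedGb));
        System.out.printf("Billable retrieval for 6000 GB: %.0f GB%n", billableRetrievalGb(storedGb, 6000.0));
    }
}
```

For a 100 TB archive this works out to about $1,024 per month for storage, with roughly 5,120 GB of retrieval inside the free allowance.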
72. Benefits of Amazon Glacier
Low cost: As little as $0.01/GB/month with no up-front capital commitments.
Secure: Secure and durable technology platform with industry-recognized certifications and audits.
Durable: Average annual durability of 99.999999999% per archive.
Simple: Eliminate hardware, software, and capacity planning.
Flexible: Add any amount of data, quickly. Easily expire and delete without handling media.
Use multiple services: Easily leverage other AWS services once your data is in the AWS cloud.