AWS Overview: Cloud Computing With AWS
1. Cloud Computing With AWS
An Overview
Tim Bixler
Federal Solutions Architecture Manager & Principal Solutions Architect
Worldwide Public Sector
October 11, 2012
4. Consumer Business, Seller Business, IT Infrastructure Business
Consumer Business
• Tens of millions of active customer accounts
• Eight countries: US, UK, Germany, Japan, France, Canada, China, Italy
Seller Business
• Sell on Amazon websites
• Use Amazon technology for your own retail website
• Leverage Amazon’s massive fulfillment center network
IT Infrastructure Business
• Cloud computing infrastructure for hosting web-scale solutions
• Hundreds of thousands of registered customers in over 190 countries
5. Over 10 years in the making
Enablement of sellers on Amazon
Internal need for scalable deployment environment
Early forays proved developers were hungry for more
6. AWS Mission
Enable businesses and developers to
use web services* to build scalable,
sophisticated applications.
*What people now call “the cloud”
9. Utility computing
On demand, Pay as you go, Uniform, Available
Compute, Scaling, Security, CDN, Backup, DNS, Database, Storage, Load Balancing, Workflow, Monitoring, Networking, Messaging
10. No Up-Front Capital Expense
On-Premise: Up-Front On-Premise Costs
• Physical Space
• Cabling
• Power
• Cooling
• Networking
• Racks
• Servers
• Storage
• Certification
• Labor
Cloud Computing: Variable Cloud Computing Costs
• $0 to Get Started
• no long-term contracts
11. Elastic capacity
[Chart: Capacity over Time, comparing Traditional IT capacity and Elastic capacity against Your IT needs]
12. Elastic capacity
[Charts: On and Off, Fast Growth, Variable peaks, Predictable peaks]
13. Elastic capacity
[Charts: On and Off, Fast Growth, Variable peaks, Predictable peaks, with regions marked WASTE and CUSTOMER DISSATISFACTION]
14. Elastic capacity
[Charts: On and Off, Fast Growth, Variable peaks, Predictable peaks]
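The WASTE and CUSTOMER DISSATISFACTION regions on these slides can be made concrete with a little arithmetic. The sketch below is illustrative only: the demand figures, the fixed capacity of 50 servers, and the function name capacity_gap are hypothetical and do not come from the presentation.

```python
def capacity_gap(demand, capacity):
    """Return (wasted, unmet) server-periods for a fixed capacity."""
    # Capacity above demand sits idle: the WASTE region.
    wasted = sum(max(capacity - d, 0) for d in demand)
    # Demand above capacity cannot be served: CUSTOMER DISSATISFACTION.
    unmet = sum(max(d - capacity, 0) for d in demand)
    return wasted, unmet


# Hypothetical demand per period, in servers, with one variable peak.
demand = [10, 12, 40, 90, 35, 12]

wasted, unmet = capacity_gap(demand, capacity=50)
print("fixed capacity of 50: wasted =", wasted, "unmet =", unmet)

# If capacity could follow demand exactly, both totals would be zero.
print("capacity tracking demand:", [capacity_gap([d], d) for d in demand])
```

With a fixed capacity, lowering it reduces waste but increases unmet demand, and raising it does the opposite; capacity that follows demand avoids the trade-off.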
17. 40 servers to 5000 in 3 days
[Chart: Number of EC2 Instances, Day 1 to Day 9. Annotations: Steady state of ~40 instances; Launch of Facebook modification; “Techcrunched”; EC2 scaled to peak of 5000 instances]
18. Each day AWS adds the equivalent server capacity
to power Amazon when it was a global, $2.76B
enterprise
(circa 2000)
19. Number of released features, sample services described
• 2007 (9): Amazon FPS, Red Hat EC2
• 2008 (24): SimpleDB, CloudFront, EBS, Availability Zones, Elastic IPs
• 2009 (48): Relational Database Service, Virtual Private Cloud, Elastic Map Reduce, Auto Scaling, Reserved Instances, Elastic Load Balancer
• 2010 (61): Simple Notification Service, Route 53, RDS Multi-AZ, Singapore Region, Identity Access Management, Cluster Instances
• 2011 (82): Elastic Beanstalk, Simple Email Service, CloudFormation, RDS for Oracle, ElastiCache
• 2012 (63): DynamoDB, Simple Workflow, CloudSearch, Storage Gateway, Route 53 Latency Based Routing
24. AWS Platform
Your Applications
Management & Administration
• Identity & Access: AWS IAM, Identity Federation, Consolidated Billing
• Web Interface: Management Console
• Monitoring: Amazon CloudWatch
• Deployment & Automation: AWS Elastic Beanstalk, AWS CloudFormation
Application Platform Services
• Content Distribution: Amazon CloudFront
• Application Svcs: Simple Workflow Service, CloudSearch, Amazon SNS, SQS, SES
• Parallel Processing: Elastic MapReduce
• Libraries & SDKs: Java, PHP, Python, Ruby, .NET
Foundation Services
• Compute: Amazon EC2, Auto Scale
• Storage: Amazon S3, Amazon EBS, Amazon StorageGateway
• Database: Amazon RDS, Amazon SimpleDB, Amazon ElastiCache, Amazon DynamoDB
• Networking: Amazon VPC, Elastic Load Balancing, Amazon Route 53, AWS Direct Connect
AWS Global Infrastructure
• Regions, Availability Zones, Edge Locations
26. AWS Global Infrastructure
AWS Regions (8)
• GovCloud (US ITAR Region)
• US West (Northern California)
• US West (Oregon)
• US East (Northern Virginia)
• South America (Sao Paulo)
• EU (Ireland)
• Asia Pacific (Singapore)
• Asia Pacific (Tokyo)
AWS Edge Locations (33)
27. AWS Regions & Availability Zones
Customer Decides Where Applications and Data Reside
Note: Conceptual drawing only. The number of Availability Zones may vary.
28. Built to Enterprise & Gov Standards
Physical
• Datacenters in nondescript facilities
• Physical access strictly controlled
• Must pass two-factor authentication at least twice for floor access
• Physical access logged and audited
Certifications and Accreditations
• ISO 27001
• SSAE 16 / ISAE 3402 / SOC1 (formerly U.S. standard SAS-70 Type II)
• FISMA Moderate & DIACAP Controls; ITAR region
• HIPAA applications certified on AWS
• Payment Card Industry (PCI) Data Security Standard (DSS) Level 1
Hardware, Software & Network
• Systematic change management
• Phased updates deployment
• Safe storage decommission
• Automated monitoring and self-audit
• Advanced network protection systems
Security & Compliance Resources
• Security & Compliance Center: http://aws.amazon.com/security
• Security Overview & Best Practices
• AWS Risk & Compliance Whitepaper
• Creating HIPAA Compliant Applications
29. Foundation Services
31. Compute
Amazon Elastic Compute Cloud (Amazon EC2)
EC2 Instances = Virtual Servers
• Resizable compute capacity in 14 instance types
• Reduces the time required to obtain and boot new server instances to minutes or seconds
• Scale capacity as your computing requirements change
• Pay only for capacity that you actually use
• Choose Linux or Windows
• Deploy across Regions and Availability Zones for reliability
• Flexible networking (NAT/classic, VPC, Elastic IPs)
• Support for virtual network interfaces that can be attached to EC2 instances in your VPC
32. Compute
Amazon Elastic Compute Cloud (Amazon EC2)
[Chart: instance types plotted by Memory (GB) against EC2 Compute Units (HP), both axes running from 1 to 128]
• Micro: 613 MB; up to 2 ECUs (for short bursts); $0.02/0.03
• Small: 1.7 GB, 32-Bit; 1 EC2 Compute Unit; 1 virtual core; $0.08/0.115
• Medium: 3.75 GB; 2 EC2 Compute Units; 1 virtual core; $0.16/0.23
• Large: 7.5 GB; 4 EC2 Compute Units; 2 virtual cores; $0.32/0.46
• Extra Large: 15 GB; 8 EC2 Compute Units; 4 virtual cores; $0.64/0.92
• High-CPU Med: 1.7 GB; 5 EC2 Compute Units; 2 virtual cores; $0.165/0.285
• High-CPU XL: 7 GB; 20 EC2 Compute Units; 8 virtual cores; $0.66/1.14
• Hi-Mem XL: 17.1 GB; 6.5 EC2 Compute Units; 2 virtual cores; $0.45/0.57
• Hi-Mem 2XL: 34.2 GB; 13 EC2 Compute Units; 4 virtual cores; $0.90/1.14
• Hi-Mem 4XL: 68.4 GB; 26 EC2 Compute Units; 8 virtual cores; $1.80/2.28
• High I/O 4XL: 60.5 GB; 35 EC2 Compute Units; $3.10/3.58
• Cluster Compute 4XL: 23 GB; 33.5 EC2 Compute Units; $1.30/1.61
• Cluster Compute 8XL: 60.5 GB; 88 EC2 Compute Units; $2.40/2.97
• Cluster GPU 4XL: 22 GB; 33.5 EC2 Compute Units; 2 x NVIDIA Tesla “Fermi” M2050 GPUs; $2.10/2.60
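Choosing among these instance types is a small search problem: find the cheapest type that meets a memory and compute requirement. The sketch below is a hedged illustration using a subset of the figures on this slide. It uses the first figure of each price pair (the slide does not say what the two figures denote), and the function name cheapest_instance is hypothetical.

```python
# (name, memory in GB, EC2 Compute Units, first listed price) from the slide.
INSTANCE_TYPES = [
    ("Small", 1.7, 1, 0.08),
    ("Medium", 3.75, 2, 0.16),
    ("Large", 7.5, 4, 0.32),
    ("Extra Large", 15, 8, 0.64),
    ("High-CPU Med", 1.7, 5, 0.165),
    ("High-CPU XL", 7, 20, 0.66),
    ("Hi-Mem XL", 17.1, 6.5, 0.45),
    ("Hi-Mem 2XL", 34.2, 13, 0.90),
    ("Hi-Mem 4XL", 68.4, 26, 1.80),
]


def cheapest_instance(min_memory_gb, min_ecu):
    """Return the name of the lowest-priced type meeting both minimums, or None."""
    candidates = [
        t for t in INSTANCE_TYPES
        if t[1] >= min_memory_gb and t[2] >= min_ecu
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t[3])[0]


print(cheapest_instance(min_memory_gb=7, min_ecu=4))
print(cheapest_instance(min_memory_gb=16, min_ecu=10))
```

A real sizing exercise would also weigh virtual cores, I/O performance, and operating system, which this sketch ignores.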
33. Compute
Auto Scaling
• Client Defined Business Rules
• Scale your Amazon EC2 capacity automatically once you define the conditions (may be thousands of servers)
• Can scale up just a little; it doesn’t need to be a massive number of servers (may be as few as 2 servers)
• Well suited for applications that experience variability in usage
• Set minimum and maximum scaling policies
• Alternate use is for fault tolerance
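A client-defined scaling rule with minimum and maximum bounds might look like the following sketch. It is not the Auto Scaling service's actual algorithm: the function name, the CPU thresholds, and the step size are hypothetical, and only the idea of bounded, condition-driven scaling comes from the slide.

```python
def desired_capacity(current, cpu_percent, min_size=2, max_size=200,
                     scale_out_above=70.0, scale_in_below=30.0, step=1):
    """Apply one scaling decision and clamp it to the configured bounds."""
    if cpu_percent > scale_out_above:
        target = current + step      # busy: add capacity
    elif cpu_percent < scale_in_below:
        target = current - step      # idle: remove capacity
    else:
        target = current             # within the comfortable band
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, target))


# Walk a small group through a hypothetical series of CPU readings.
size = 2
for cpu in [85, 90, 75, 50, 20, 10, 10]:
    size = desired_capacity(size, cpu)
    print(f"cpu={cpu:>3}% -> group size {size}")
```

Keeping the minimum above one server is what makes the same mechanism useful for fault tolerance: a failed instance is replaced to restore the minimum.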
34. Storage
S3
Import/Export
EBS
Storage Gateway
Glacier
so new we don’t have an icon!
35. Storage
Simple Storage Service (S3)
Web-scale Internet Storage
• A “Bucket” is equivalent to a “folder”
• Able to store unlimited number of Objects in a Bucket
• Objects from 1 byte to 5 TB; no bucket size limit
• Highly available storage for the Internet (object store)
• HTTP/S endpoint to store and retrieve any amount of data, at any time, from anywhere on the web
• Highly scalable, reliable, fast, and inexpensive
• Over 1 trillion objects stored
• Peak requests 750,000+ per second
• Ideal Use Cases:
• Static web content – often used with CloudFront CDN
• Source and output storage for large-scale “Big Data” analytics
• Backup, archival, and DR storage that is always “live”
36. Objects in S3
[Chart: objects stored in S3, reaching 1 Trillion]
750,000+ peak transactions per second
37. Storage
Elastic Block Store (EBS)
EBS Volumes = Virtual Disks
• Use for persistent storage
• Can use to create RAID configuration for a server
• Off-instance block storage that persists independently
• Storage volumes for use with Amazon EC2 instances – create, attach, backup, restore and delete
• Can be attached to a running Amazon EC2 instance and exposed as a block device for raw or
formatted (filesystem) access
• Volumes behave like unformatted block devices for Linux or Windows instances
• Ideal use cases:
• OS Boot device / root file system; secondary volumes/filesystems
• Typical basis for database storage
• Raw block devices for RAID, some databases
38. Storage
Amazon Glacier
• A low-cost storage service for data archiving and backup
• $0.01 per GB / Month
• Optimized for data that is infrequently accessed
• Retrieval times measured in hours not days or weeks
• Annual durability of 99.999999999% for an archive
• AES 256 data at rest encryption
• Data stored as archives within a vault. Vaults are located within a specific AWS region
• Archives can be up to 40 TB in size
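The price and durability figures on this slide lend themselves to quick arithmetic. The sketch below assumes the $0.01 per GB per month rate applies uniformly and ignores request and retrieval charges, which the slide does not cover. It also reads the durability figure as an average annual loss rate of 1 in 10^11 per archive. The function names are hypothetical.

```python
PRICE_PER_GB_MONTH = 0.01        # from the slide
ANNUAL_LOSS_RATE = 1e-11         # 100% minus 99.999999999% durability


def archive_cost(size_gb, months):
    """Storage cost in dollars for holding size_gb for the given months."""
    return size_gb * PRICE_PER_GB_MONTH * months


def expected_annual_losses(num_archives):
    """Expected number of archives lost per year at the stated durability."""
    return num_archives * ANNUAL_LOSS_RATE


# 1,000 GB kept for one year, and a hypothetical billion-archive vault.
print(f"1,000 GB for 12 months: ${archive_cost(1000, 12):.2f}")
print(f"expected losses per year across 1e9 archives: "
      f"{expected_annual_losses(10**9):.2f}")
```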
39. Storage
AWS Import/Export
• Accelerates moving large amounts of data into and out of S3 or EBS
• Transfers your data directly onto and off of USB or SATA storage devices shipped to AWS with
manifest file
• Final copy uses high-speed datacenter network
40. Storage
AWS Storage Gateway
• Storage gateway service connects an on-premise software appliance with cloud-based storage
• On-premises software appliance solution to store data on Amazon S3’s storage infrastructure
• Exposes standard iSCSI interface to on-premises applications, while maintaining low-latency data
access
• Data in Amazon S3 stored as Amazon EBS snapshots for local & EC2-based recovery
• Use Cases
• Backup/Restore on-premise data
• Set up a test/dev environment with production data
• Migrating applications to the cloud
• On-premise DR/COOP to AWS
42. Database
DynamoDB
• Fully managed NoSQL database.
• Eliminates the administrative burden of data modeling, index maintenance, and performance
tuning.
• Durability and high-availability - stores data on Solid State Drives (SSDs) and replicates it
synchronously across multiple AWS Availability Zones in an AWS Region.
• Scalability - With AWS Console, you can grow your DynamoDB table from 10 to 100,000 writes per
sec.
• See video: http://www.youtube.com/watch?v=oz-7wJJ9HZ0
43. Database
Amazon Relational Database Service (RDS)
• Fully-managed, tuned MySQL, Oracle 11g, or MS SQL databases
• Cost-efficient and resizable capacity
• Manages time-consuming database admin tasks
• Code, applications, and tools you already use today work seamlessly
• Automatically patches the database software and backs up your database
• Flexible Licensing: BYOL or License Included
44. Database
Amazon ElastiCache
• Fully-managed, distributed, in-memory cache
• Memcached compliant cache cluster on-demand
• Manages patching, cache node failure detection and recovery
• Simple API calls to grow and shrink the cache cluster
• Seamlessly caches in front of SimpleDB or RDS instances
• Integrated with CloudWatch and SNS for monitoring and alerts
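One common way an application puts a cache in front of a database is the cache-aside pattern, sketched below. This is an illustration of that pattern, not of ElastiCache internals: a plain dict stands in for the Memcached-compatible cluster, and the SlowDatabase class and cached_get function are hypothetical names.

```python
class SlowDatabase:
    """Stand-in for a database such as RDS or SimpleDB; counts its reads."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


def cached_get(cache, db, key):
    """Cache-aside read: try the cache first, fall back to the database."""
    if key in cache:
        return cache[key]            # cache hit: no database read
    value = db.get(key)              # cache miss: read from the database
    if value is not None:
        cache[key] = value           # populate the cache for next time
    return value


db = SlowDatabase({"user:1": "profile for user 1"})
cache = {}
for _ in range(3):
    cached_get(cache, db, "user:1")
print("database reads after 3 lookups:", db.reads)
```

Only the first lookup reaches the database; a production cache would also need expiry and invalidation when the underlying row changes.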
45. Database
Amazon SimpleDB
• Core database functions of data indexing and querying of text data
• No schema, automatic indexing
• Eliminates the administrative burden of data modeling, index maintenance, and performance tuning
• Real-time lookup and simple querying of structured data
• Use cases:
• Metadata storage -- often used in conjunction with S3
• Structured, fine-grained data needing query
• Data needing flexible schema
47. Networking
Amazon Elastic Load Balancing
• Supports the routing and load balancing of HTTP, HTTPS and generic TCP traffic to EC2 instances
• Supports health checks to detect and remove failing instances
• Dynamically grows and shrinks required resources based on traffic
• Seamlessly integrates with Auto-scaling to add and remove instances based on scaling activities
• Single CNAME provides stable entry point for DNS configuration
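Health checking and request distribution can be sketched in a few lines. The round-robin policy, the instance names, and the function names below are assumptions for illustration; the slide does not describe the algorithm Elastic Load Balancing uses.

```python
from itertools import cycle


def healthy_instances(instances, health_check):
    """Keep only the instances that pass the health check."""
    return [i for i in instances if health_check(i)]


def route(requests, instances, health_check):
    """Assign each request to a healthy instance in round-robin order."""
    pool = healthy_instances(instances, health_check)
    if not pool:
        raise RuntimeError("no healthy instances available")
    targets = cycle(pool)
    return [(request, next(targets)) for request in requests]


# Hypothetical fleet in which one instance is failing its health check.
fleet = ["i-a", "i-b", "i-c"]
failing = {"i-b"}
for request, target in route(["r1", "r2", "r3"], fleet,
                             lambda i: i not in failing):
    print(request, "->", target)
```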
48. Networking
Amazon Route 53
• Route end users to Internet applications
• Answers DNS queries with low latency by using a global network of DNS servers
• Latency based routing to closest AWS endpoint (e.g. EC2 instances, Elastic IPs or ELBs)
• Deep integration with other AWS services (ELB, EC2 NAT/EIP, etc.)
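The idea behind latency based routing is simply to answer with the endpoint that has the lowest measured latency for the user. The sketch below shows that selection step only. The latency figures are hypothetical, and how Route 53 actually measures latency is not covered here.

```python
def lowest_latency_endpoint(latencies_ms):
    """Return the endpoint name with the smallest latency in milliseconds."""
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical measurements for one user, keyed by AWS region.
measured = {
    "US East (Northern Virginia)": 95,
    "EU (Ireland)": 20,
    "Asia Pacific (Tokyo)": 240,
}
print(lowest_latency_endpoint(measured))
```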
49. Networking
Amazon Virtual Private Cloud (VPC)
• Secure and seamless bridge between a company’s existing private network and the AWS cloud
• Connect existing infrastructure to a set of isolated AWS compute resources via a Virtual Private
Network (VPN) connection
• Bring your own address space and extend existing management capabilities
51. Application Platform Services
52. Content Delivery
Amazon CloudFront
• Web service for content delivery
• Distribute content to end users with low latency, high data transfer speeds, and no commitments
• Delivers your content using a global network of 33 edge locations
• Supports download, streaming, live streaming, and dynamic content
• Key features: RTMP Streaming, HTTPS Delivery, Private Content for HTTP &
Streaming, Programmatic Invalidation, Detailed Logs for HTTP & Streaming, Default Root
Object
• Use Cases: Video and Rich Media, Online Gaming, Interactive Agencies, Software Downloads, Static
Websites
• Static web content that must be delivered to global user base at Highest bandwidth /
Lowest latency / Lowest cost
54. Application Services
Amazon Simple Notification Service (SNS)
• Set up, operate, and send notifications
• Publish messages from an application and immediately deliver them to subscribers or other
applications
55. Application Services
Amazon Simple Queue Service (SQS)
• Hosted queue for storing messages as they travel between computers
• Move data between distributed components of your applications
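The decoupling a queue provides can be shown with Python's in-process queue standing in for the hosted service. This is a sketch of the producer/consumer pattern only: it does not call the SQS API, and the function name run_pipeline and the sentinel convention are assumptions.

```python
import queue
import threading


def run_pipeline(tasks):
    """Send tasks through a queue to a worker and collect its results."""
    messages = queue.Queue()
    results = []

    def consumer():
        while True:
            body = messages.get()
            if body is None:          # sentinel: producer has finished
                break
            results.append(body.upper())   # stand-in for real processing

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer only knows about the queue, not about the worker.
    for task in tasks:
        messages.put(task)
    messages.put(None)

    worker.join()
    return results


print(run_pipeline(["resize image", "send receipt"]))
```

Because the producer and consumer share only the queue, either side can be scaled or replaced without changing the other.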
56. Application Services
Amazon Simple Email Service (SES, beta)
• Bulk and transactional email-sending service
• Eliminates the hassle of email server management, network configuration, and meeting rigorous
Internet Service Provider (ISP) standards
• Provides a built-in feedback loop, which includes notifications of bounce backs, failed and successful
delivery attempts, and spam complaints
57. Application Services
Amazon Simple Workflow Service (SWF)
• Easily manage workflows, including state, decisions, executions, tasks and logging
• Coordinate processing steps across distributed systems
• Ensure tasks are executed reliably, in order, and without duplication
• Simple API calls that can be executed from code written in any language and run on your EC2
instances, or any of your machines located anywhere in the world that can access the Internet
58. Application Services
Amazon CloudSearch (beta)
• Fully-managed search service
• Integrate fast and highly scalable search functionality into applications
• Scales automatically: with increases in searchable data or as query rate changes
• AWS manages hardware provisioning, data partitioning, and software patches
59. Parallel Processing
Amazon Elastic MapReduce (EMR)
• Managed Hadoop 0.20.205 infrastructure
• Reduces complexity of Hadoop management
• Handles node provisioning, customization, and shutdown
• Tunes Hadoop to your hardware and network
• Provides tools to debug and monitor your Hadoop clusters
• Provides tight integration with AWS services
• Optimized for Amazon Simple Storage Service (S3)
• EC2 integration with automatic re-provisioning on node failure
• Cluster monitoring/alarming through CloudWatch
• Leverages significant operational experience
• Monitor thousands of clusters per day
• Use cases span from University students to Fortune 50
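For readers new to the programming model that Hadoop implements, the sketch below runs the three MapReduce stages (map, shuffle, reduce) in a single Python process on a word-count job. It illustrates the model only; it does not use Hadoop or the Elastic MapReduce API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the cloud", "The Cloud scales"]))
```

On a real cluster the map and reduce calls run in parallel across nodes, which is where the managed provisioning described above matters.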
60. Libraries & SDKs
• Your choice of programming language (Java, PHP, Python, Ruby, .NET) and mobile platform
(Android, iOS)
• The Developer Centers contain sample code, documentation, tools, and additional resources to
help you build applications on Amazon Web Services.
• http://aws.amazon.com/java/
• http://aws.amazon.com/mobile/
• http://aws.amazon.com/php/
• http://aws.amazon.com/python/
• http://aws.amazon.com/ruby/
• http://aws.amazon.com/net/
61. Management & Administration
62. Web Console
On-demand, Self Service
Management Access
63. Identity & Access Management
• IAM enables customers to create and manage users in AWS’s identity
system
• Identity Federation with local directory is an option for
enterprises
• Very familiar security model
• Users, groups, permissions
• Allows customers to
• Create users
• Assign individual passwords, access keys, multi-factor
authentication devices
• Grant fine-grained permissions
• Optionally grant them access to the AWS Console
• Organize users in groups
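The users, groups, and permissions model can be sketched as two lookups. This is a simplified illustration, not IAM policy evaluation: the user names, group names, action names, and the is_allowed function are all hypothetical.

```python
# Which groups each user belongs to (hypothetical users).
USER_GROUPS = {
    "alice": {"Administrators"},
    "bob": {"Developers"},
}

# Which actions each group is permitted to perform (hypothetical actions).
GROUP_PERMISSIONS = {
    "Administrators": {"create_user", "launch_instance", "read_bucket"},
    "Developers": {"launch_instance", "read_bucket"},
}


def is_allowed(user, action):
    """A user may perform an action if any of their groups grants it."""
    return any(
        action in GROUP_PERMISSIONS.get(group, set())
        for group in USER_GROUPS.get(user, set())
    )


for user, action in [("alice", "create_user"), ("bob", "create_user"),
                     ("bob", "read_bucket"), ("carol", "read_bucket")]:
    print(user, action, "->", "allow" if is_allowed(user, action) else "deny")
```

Unknown users and ungranted actions are denied by default, which mirrors the familiar least-privilege starting point.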
64. Consolidated Billing with IAM
• Allows you to get one bill for multiple accounts
• You can easily track each account's costs and download the cost data in
CSV format
• You may be able to reduce costs by combining usage from all the
accounts to qualify for volume pricing discounts
65. Deployment and Management
AWS Elastic Beanstalk (beta)
• Simply upload your application (Java, .NET, and PHP)
• Automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring
• Retain full control over the AWS resources powering your application
67. Deployment and Management
Amazon CloudWatch
• Visibility into resource utilization, operational performance, and overall demand patterns
• Metrics such as CPU utilization, disk reads and writes, and network traffic
• Accessible via the AWS Management Console, web service APIs or Command Line Tools
• Add custom metrics of your own
• Alarms (which tie into auto-scaling, SNS, SQS, etc.)
• Billing Alerts to help manage charges on AWS bill
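An alarm is essentially a threshold test over recent metric datapoints. The sketch below is a simplified model under stated assumptions: the rule (every one of the last N datapoints must exceed the threshold) and the function name alarm_state are illustrative, and real CloudWatch alarms offer more configuration than shown here.

```python
def alarm_state(datapoints, threshold, periods):
    """Evaluate a simple alarm over the most recent datapoints."""
    if len(datapoints) < periods:
        return "INSUFFICIENT_DATA"   # not enough history to decide
    recent = datapoints[-periods:]
    # Alarm only when every recent datapoint breaches the threshold.
    return "ALARM" if all(v > threshold for v in recent) else "OK"


# Hypothetical CPU utilization samples, in percent.
cpu = [40, 85, 90, 95]
print(alarm_state(cpu, threshold=80, periods=3))
```

Requiring several consecutive breaches avoids reacting to a single spike, which matters when the alarm triggers an auto-scaling action.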
68. Your Applications
70. A scalable compute platform
• Researchers and scientists want:
– A platform that can scale
– Offers choice at run time
– Can be automated to run complex workflows
– Don’t want to be bothered about the muck of
managing infrastructure
• AWS provides Just-in-Time infrastructure
75. GPUs for Molecular Dynamics
Cluster GPU compute instances
• Intel® Xeon® X5570 processors
• 2 x NVIDIA Tesla “Fermi” M2050 GPUs (>400 cores each)
• 33.5 EC2 Compute Units
• 20GB RAM
• I/O Performance: Very High (10 Gigabit Ethernet)
76. CC2 Instance Cluster
240 TFLOPS
Making it the 72nd fastest
supercomputer in the world
Yours for $2554/hr – on demand
77. A cluster that you can automate, control, auto-scale…
CLI, API and Console
Scripted configurations

as-create-auto-scaling-group MyGroup \
    --launch-configuration MyConfig \
    --availability-zones eu-west-1a \
    --min-size 2 \
    --max-size 200

ec2-run-instances ami-b232d0db \
    --instance-count 3 \
    --availability-zone eu-west-1a \
    --instance-type m1.small
78. …and coordinate workloads and task clusters in
Handle long running processes across many nodes and task steps
with Simple Workflow
[Diagram: workflow steps 1, 2, 3 through Task A, Task B (Auto-scaling), and Task C]
80. Harvard Medical School
The Laboratory for Personalized Medicine
Run EC2 clusters to analyze entire genomes
“The AWS solution is stable, robust, flexible, and low cost. It
has everything to recommend it.”
Dr. Peter Tonellato, LPM, Center for Biomedical Informatics, Harvard Medical School
Leverage Spot instances in workflows
1 day’s worth of effort resulted in 50% savings in cost
83. Data Ingestion
• AWS Import/Export
– Move large amounts of data into and outside AWS
– Data Migration, Content Distribution, DR, etc.
• AWS Direct Connect
– Secure private link to AWS
– 1Gbps, 10Gbps connectivity
– You can also co-locate hardware in AWS DX locations
• Bandwidth Optimization Solutions
– Commercial providers – Aspera, Riverbed, Attunity, etc.
– Open Source – Tsunami UDP, Globus Online
84. Data Collection
Fully managed SQL, NoSQL and object storage
• Relational Database Service: Fully managed database (MySQL, Oracle, MSSQL)
• DynamoDB: NoSQL, Schemaless, Provisioned throughput database
• S3: Object datastore, up to 5TB per object, 99.999999999% durability
85. Data Archival
• Announcing Amazon Glacier
– Meet your regulatory requirements
– Long term archival
– 11 9’s of durability as S3 standard
– All data encrypted using Server Side Encryption
– Starting at $0.01/GB/month
“Every day our genome sequencers produce terabytes of data. As our company
moves into the clinical space, we face a legal requirement to archive patient data
for years that would drastically raise the cost of storage. Thanks to Amazon
Glacier’s secure and scalable solution, we will be able to provide cost-
effective, long-term storage and thereby eliminate a barrier to providing whole
genome sequencing for medical treatment of cancer and other genetic diseases.”
- Keith Raffel, Senior Vice President and Chief Commercial Officer, Complete
Genomics
86. Share your data
• Share Amazon Machine Images (AMIs)
– Share installations of your software packages and tools with collaborators
so that they can duplicate your set up using EBS snapshots
– Collaborate by sharing your images with partners and customers
• Share architecture templates
– Share the collection of resources required to run your pipeline with
collaborators by using CloudFormation templates
• Share data
– Decouple your compute from data and share storage buckets with
collaborators
– Create Requester Pays buckets so the charges associated with accessing
data are paid by the requesters
87. AWS Public Data Sets
• A centralized repository of public datasets
• Seamless integration with cloud based applications
• No charge to the community
• Some of the datasets available today:
– 1000 Genomes Project
– Ensembl
– GenBank
– Illumina – Jay Flatley Human Genome Dataset
– YRI Trio Dataset
– The Cannabis Sativa Genome
– UniGene
– Influenza Virus
– PubChem
• Tell us what else you’d like for us to host …
96. Architect to use cloud strengths
Elastic Load Balancing
• Use at regional level: Combined with autoscaling will balance requests and resource capacity across availability zones
• Within VPC: Use to loadbalance between application tiers within an availability zone
• Instance migrations: Easily move instances from dev environments to test environments by moving between ELBs
Route 53
• Leverage SLA: Improve application reliability with Route 53’s SLA on requests served
• Weighted routing: Perform A/B analysis, and staged application roll-outs by moving a portion of traffic to new infrastructure
• Control TTLs and updates: Take absolute control of DNS updates for more decisive system updates
RDS
• Scale databases without admin overhead: Choose instance size for databases and scale up over time
• Add high availability from management console: Create master-slave configurations and read-replicas. AWS takes care of the failover and recreation of a new slave in event of master DB loss
Auto-scaling
• Dynamically scale resources & control costs: Only provision the resources that are required with scale up and cool down policies that match demand
97. Services not software
Use AWS services + Your technology skills
=
Less time managing and installing software
More time focused on mission applications
let AWS do the heavy lifting
98. Services not software
Use RDS for databases
Relational Database Service: Database-as-a-Service
• No need to install or manage database instances
• Scalable and fault tolerant configurations
Use DynamoDB for high performance key-value DB
DynamoDB: Provisioned throughput NoSQL database
• Fast, predictable performance
• Fully distributed, fault tolerant architecture
99. Services not software
Reliable message queuing without additional software
Amazon SQS: Reliable, highly scalable, queue service for storing messages as they travel between instances
[Diagram: Processing task/processing trigger, Amazon SQS, Processing results]
Push inter-process workflows into the cloud with SWF
Simple Workflow
• Reliably coordinate processing steps across applications
• Integrate AWS and non-AWS resources
• Manage distributed state in complex systems
[Diagram: steps 1, 2, 3 through Task A, Task B (Auto-scaling), and Task C]
100. Services not software
Don’t install search software, use CloudSearch
Cloud Search: Elastic search engine based upon Amazon A9 search engine
• Fully managed service with sophisticated feature set
• Scales automatically
[Diagram: Document Server, Search Server, Results]
Process large volumes of data cost effectively with EMR
Elastic MapReduce: Elastic Hadoop cluster
• Integrates with S3 & DynamoDB
• Leverage Hive & Pig analytics scripts
• Integrates with instance types such as spot
102. NASA - Mission Data Processing
Challenge
Because of the latency of data transmission to and from Mars, mission
planners worked within a 2-hour window: it took 90 minutes to process
telemetry data from the Mars Rover, 20 minutes to decide where to move
the Rover, and 10 minutes to upload the data.
Solution
NASA-JPL, loading their custom software application on EC2, was
able to horizontally scale the number of virtual machines
supporting the data processing.
Benefit
Reduced data processing time from 90 minutes to 15 minutes using
parallel processing
Increased mission planning time, resulting in high quality scientific
observations
103. NASA - Mission Data Processing
Daily Mars Rover Data Processing Window
Pre-cloud:
Process Plan Upload
Cloud:
Process Plan Upload
Increase available mission planning time from
15 minutes to 105 minutes!
104. “We were able to reduce our DNS
costs by ninety-three
percent, which in tandem allowed
us to shorten our time-to-live
(TTLs) for easier, timelier
management of DNS records.”
Nathan Butler
The Newsweek/Daily Beast Company
105. RECOVERY.GOV – Website/App Hosting
Challenge
Recovery and Transparency Board needed a platform for their
website that was scalable, secure, could be quickly deployed, and
saved taxpayer money
Solution
RATB chose a FISMA-compliant cloud computing solution based on
Amazon Web Services
Deployed applications:
Microsoft SharePoint for web Content Management
SAP BusinessObjects for BI
Benefit
• Avoided capital expense, and added capacity to scale up and
down based on demand
• Saved $750k per year in first year and additional dollars from
existing solution
“By migrating to the public cloud, the Recovery Board is in position to
leverage many advantages including the ability to keep the site up as
millions of Americans help report potential fraud, waste, and abuse. The
Board expects savings of about $750,000 during its current budget cycle
and significantly more savings in the long-term.”
- Vivek Kundra, CIO, United States
109. Examples of Customer Responsibilities
• Apply Your Information Management Program - that integrates Information Assurance
• Standardize Machine Images – create gold copy images for production deployment/to launch new instances
• Build and test in a sandbox environment – work out the bugs, figure out how to break it, architect to be resilient
• Do the same stuff you do in-house – quarterly patch management, IDS/IPS, logging, tripwire, etc.
• Conduct a Risk Assessment - to determine level of security controls you require
• Role Based Access Controls – restrict access to system components based upon need to know
• Use Encryption – for data in transit, for data at rest, filesystem
• Key Management – rotate keys used to access your resources (AWS does not hold these…you do)
• Setup Monitoring/Alerting – collect metrics and enable alerting for when events occur
• Vulnerability Scans – allowed via a permission process (else we’ll kill/block the source of scans)
• Prepare for Failure – create backups, store data in more than one location, test backups, have a contingency system ready
110. Build upon AWS features
Dedicated Instances
• Single Tenant Physical Nodes: Run your virtualized operating systems and apps in a “single tenant per physical node” model within the AWS infrastructure
Security Groups
• Instance firewalls: Firewall control on instances via Security Groups
VPC
• Subnet control: Create low level networking constraints for resource access, such as public and private subnets, internet gateways and NATs
Direct Connect & VPN
• Private connections to VPC: Secured access to resources in AWS over software or hardware VPN and dedicated network links
CLIs and APIs
• Instantly audit your entire AWS infrastructure from scriptable APIs – generate an on-demand IT inventory enabled by programmatic nature of AWS
Bastion hosts
• Only allow access for management of production resources from a bastion host. Turn off when not needed
111. AWS Multi-Factor Authentication
• Helps prevent access based on unauthorized knowledge of your e-mail address and password
• Additional protection for account information
• Works with master account and IAM users
• Integrated into
  • AWS Management Console
  • Key pages on the AWS Portal
  • S3 (Secure Delete)
• Virtual MFA (using OATH standard)
Multi-factor authentication
AWS system entitlements (diagram): Account, Groups, Roles
Administrators Developers Applications
Jim Brad Reporting
Bob Mark Console
Susan Tomcat
Kevin
112. Account Management/Isolation
Consolidated Billing
Payor Account
Linked Account
Linked Account Linked Account Linked Account
Reseller Internal
Customer 1 Customer 2 Customer 3 Use
Identity & Access Management
End User 1 End User 1 End User Group Reseller User 1
End User 2 End User 2 End User 1 Reseller User 2
End User 3 End User 3 End User 2 Reseller User 3
End User 4 End User 3 Reseller User 4
End User 5 End User 4
113. AWS GovCloud – Who can Use?
• US Government/State/Local Clients & organizations conducting
work on their behalf
• AWS will screen customers prior to providing access to the AWS
GovCloud (US). Customers must be:
• U.S. Persons
• Not subject to export restrictions
• Agree to comply with U.S. export control laws and
regulations, including the International Traffic In Arms
Regulations
We are often asked the question: how did Amazon get into cloud computing? Amazon is really good at providing an immense selection of products, and at shipping those products to customers efficiently. But behind that online capability lie years of experience in providing technical services to the business, services that ensure our online stores are secure, fast, always available and capable of meeting huge seasonal demand.
Amazon Web Services is part of Amazon.com. Most of us have at some point used the Amazon online retail store to buy books, CDs and gifts for friends and family. There are three parts to the Amazon business: our retail consumer business, where Amazon stocks and ships many thousands of different products; our seller business, which enables retailers to sell through the same world-class online store as Amazon; and finally Amazon Web Services, our IT infrastructure business.
Over ten years ago, the technical teams supporting Amazon were moving from providing software and hardware capabilities to a service-oriented approach - that is, packaging things in an easy-to-consume way so that deployments by parts of the business were easier, faster and more scalable. As Amazon opened up its internal services to third-party sellers, and we published simple web services such as our catalog search, it became apparent very quickly that developers were hungry for more, and that Amazon had developed significant technical know-how that could be packaged for others to use. We asked ourselves, 'What if we could package everything we do and offer it to others over the web?' and 'What if other businesses could leverage the scale and reach of Amazon.com?'
So in 2006 Amazon Web Services was born. Its mission was clear: to enable businesses and developers to use web services to build scalable, sophisticated applications. It's interesting to note that what we called Web Services has now morphed into a common term, 'the Cloud'. Amazon Web Services is and always has been a distinct and individual Amazon organization.
To help understand why Amazon Web Services and Cloud Computing are changing IT delivery, a nice comparison to make is that of a utility like electricity. When electricity was discovered, businesses would generate their own, using steam generators to power factories. When electricity was brought together under a national system of supply, it was no longer necessary for everyone to buy and maintain generators and produce their own power. You could simply tap into the grid and use what you needed, paying only for what you did use, and be assured that the electricity you consumed was consistent and always available.
Utility computing brings those same benefits to the delivery of IT - the factories of many businesses.
Services that are normally expensive to manage or difficult to use become available on demand, in a uniform and available way, and are only paid for when used. Just like electricity. This is what AWS does. It takes away the hard work of providing IT infrastructure services and makes them available to anyone on a pay-as-you-go basis.
Traditional IT capacity planning, by the very nature of the logistics of acquiring, installing, configuring and networking hardware, has to take a forward-looking view. Complex estimates of resource utilization are made in order to handle the peaks you anticipate. Shown here in red is the level of resources a business needs to install in order to handle the peak needs of a service. Demand on that service might vary by the time of day, week, month or year, or spike because of promotions or seasonal events.
There are many patterns of usage that make capacity planning a complex science: on-and-off usage, where capacity is needed only at fixed times and not at others; fast growth, where an online service becomes so successful that traditional capacity has to be added in step changes; variable peaks, where you just don't know what demand will be or when, and a best guess applies; and predictable peaks, such as commute times when customers use mobile devices to access your service.
Each of these examples is typified by wasted IT resources. Even where you planned correctly, the IT resources will be over-provisioned so that services are not impacted and customers are not lost during high demand. In the worst cases, that capacity will not be enough, and customer dissatisfaction will result. Most businesses have a mix of differing patterns at play, and much time and resource is dedicated to planning and management to ensure services are always available. And when a new online service is really successful, you often can't ship in new capacity fast enough. Some say that's a nice problem to have, but those that have lived through it will tell you otherwise!
You control how and when your service scales, so you can closely match increasing load in small increments, scale up fast when needed, and cool off and reduce the resources being used at any time of day. Even the most variable and complex demand patterns can be matched with the right amount of capacity - all automatically handled by AWS.
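As a rough illustration of matching capacity to demand, here is a minimal sketch of a scaling rule. It is not how AWS Auto Scaling is implemented; the function name, the per-instance capacity of 100 requests per second, and the demand samples are all assumptions made up for this example.

```python
import math

def desired_instances(load, capacity_per_instance, min_instances=1, max_instances=5000):
    """Return the instance count that covers `load`, clamped to the allowed range."""
    needed = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))

# Hypothetical demand samples (requests per second) and per-instance capacity.
demand = [120, 80, 4000, 250000, 90000, 300]
plan = [desired_instances(load, capacity_per_instance=100) for load in demand]
print(plan)
```

The plan follows demand up and back down again, which is the point of elasticity: capacity is never fixed at the anticipated peak.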
Elasticity works from just 1 EC2 instance to many thousands. Just dial up and down as required.
Back in 2008, they launched a Facebook application that lets people tell their friends when they've uploaded a video that includes that friend. When people saw the music videos their friends had created and shared through the application, they wanted to go out and create their own videos. Shortly after launching their social networking modification, they were featured on Techcrunch. As you can imagine, this brought them a lot of unexpected traffic. In the course of 3 days, they went from running on 40 instances to 5,000 instances. Because they were using Amazon Web Services, they were able to handle all of this incoming traffic without having to do a thing. AWS managed it all for them.
To give you an idea of the scale that AWS operates at: even though AWS is only 6 years old, each day AWS adds the equivalent of the server capacity needed to run Amazon.com when it was a 2.7 billion dollar business.
And over time the pace of innovation has been intense. Since the very early days of 2007, year on year, we have added more and more services to help customers deliver world-class applications. From Relational Database Service to DynamoDB, each service is delivered with the same focus on reliability, scale and ease of use. AWS is a technical tool box of sophisticated building blocks, all available at the end of a web service call.
Essential to our success. Both help reduce the friction to move to the cloud.
ESRI… CycleComputing
• Microsoft Windows
• Multiple flavors of Linux, including our own Amazon Linux flavor
• Platform agnostic. If it runs on x86, most likely it will run on AWS. Offers flexibility/mobility
• Rich SDK library
• Enterprise certified applications
We take care of the muck:
• Global Infrastructure – Cloud Regions, Availability Zones and Edge Locations
• Foundation Services – Core infrastructure as a service. Compute, Storage, Database, and Networking.
• Application Platform Services – Kind of the glue that ties everything together
• Management and Administration – Makes it easy to deploy and administer.
You deliver value to the end user:
• Application – Your application lives at the top and leverages each layer of the stack.
The commitment to Amazon Web Services is also shown through the products offered.
This is one of the most important slides to remember, specifically the difference between cloud regions and Availability Zones.
We have 8 cloud regions… when you put your data in a region it stays there. We will never copy your data into another cloud region.
Now, within a single region we have 2 or more Availability Zones. An Availability Zone is a collection of one or more datacenters that is on separate power, flood plain and ISPs. This allows you to create highly available applications across multiple AZs in a single region. Having 2 or more Availability Zones also allows us to create highly durable services, such as our object storage, which has 11 9s of durability.
Do not automatically create a DR/COOP site… customer responsibility… we provide the infrastructure.
While many of our services have various SLAs, the SLA of your application and business needs... Leverage our infrastructure and define your SLA.
Forcing function – keep a server up at all times, or make sure there is at least 1 server up in separate AZs.
A “policy” into which you launch instances.
Storage for the Internet. Natively online, HTTP/HTTPS access.
Store and retrieve any amount of data, any time, from anywhere on the web.
Highly scalable, reliable, fast and durable.
11 nines of durability - designed to provide 99.999999999% durability and 99.99% availability of objects over a given year. Designed to sustain the concurrent loss of data in two facilities.
NOT POSIX… u
And scale is something AWS is used to dealing with. The Amazon Simple Storage Service, S3, recently passed 1 trillion objects in storage, with a peak transaction rate of 750 thousand per second. That's a lot of objects, all stored with 11 9's of durability.
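To put the quoted S3 design targets (99.999999999% durability, 99.99% availability) into everyday numbers, here is a small arithmetic sketch. It assumes the durability figure can be read as an independent per-object, per-year survival probability, which is a simplification made for illustration; the function names and the 10,000-object example are likewise assumptions, not AWS figures.

```python
from fractions import Fraction

# Design targets quoted in the notes; exact fractions avoid float rounding.
DURABILITY = Fraction("99.999999999") / 100
AVAILABILITY = Fraction("99.99") / 100

def expected_annual_losses(num_objects):
    """Average number of objects lost per year at the quoted durability."""
    return num_objects * (1 - DURABILITY)

def downtime_minutes_per_year(days=365):
    """Minutes per year outside the quoted availability target."""
    return (1 - AVAILABILITY) * days * 24 * 60

losses = expected_annual_losses(10_000)   # hypothetical: 10,000 stored objects
print("expected objects lost per year:", float(losses))
print("average years per single lost object:", 1 / losses)
print("minutes per year outside availability target:", float(downtime_minutes_per_year()))
```

Under that simplified reading, someone storing 10,000 objects would on average expect to lose a single object about once every ten million years, while the availability target allows a little under an hour per year.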
• Block storage volumes for use with Amazon EC2 instances
• Attach to a running instance and expose as a block device
• Off-instance storage that persists independently of EC2 instances
• Snapshots stored durably in S3
• Block-level device – like a SAN. Format or encrypt however you like.
• Accelerates moving large amounts of data into and out of S3 or EBS
• Transfers your data directly onto and off of storage devices
• Connect an on-premises software appliance with cloud-based storage.
• Securely upload data to the AWS cloud for cost-effective backup and rapid disaster recovery
• Back up point-in-time snapshots of your on-prem application data to S3 for future recovery
• Mirror your on-prem data to EC2 instances
Talking Points
• Hosted and managed MySQL or Oracle or MS SQL
• Manages time-consuming database admin tasks
  • Backups/snapshots
  • Upgrades
  • Replication
• Code/apps/tools integrate seamlessly: just change connection string
• Multi-AZ and Read-Replica options for MySQL
Narrative
Amazon RDS is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS gives you access to the full capabilities of a familiar MySQL database. This means the code, applications, and tools you already use today with your existing MySQL databases work seamlessly with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period. You also benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance via a single API call.
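The "just change connection string" point can be sketched as below. This is only an illustration: it assumes the application keeps its database location in a URL-style connection string, and the host names, credentials and the `retarget` helper are hypothetical. Real drivers and frameworks each have their own connection-string formats.

```python
from urllib.parse import urlsplit, urlunsplit

def retarget(dsn, new_host):
    """Return `dsn` with only the host swapped; user, port, and database are kept."""
    parts = urlsplit(dsn)
    netloc = new_host
    if parts.port:
        netloc += f":{parts.port}"
    if parts.username:
        credentials = parts.username
        if parts.password:
            credentials += f":{parts.password}"
        netloc = f"{credentials}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))

# Hypothetical on-premises connection string and hypothetical managed endpoint.
on_prem = "mysql://app:secret@db01.internal:3306/orders"
managed = retarget(on_prem, "orders-db.rds.example.com")
print(managed)
```

Everything else in the application (queries, tools, schema) stays as it was; only the endpoint the client connects to changes.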
Latency and Weighted round robin routing (“DNS load balancing”)
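A minimal sketch of the weighted half of this idea: each endpoint is returned with a probability proportional to its weight. This illustrates the general technique only, not the DNS service's actual implementation, and it does not cover latency-based routing. The endpoint names, weights and function names are assumptions for the example.

```python
import random
from collections import Counter

def expected_share(weights):
    """Fraction of answers each endpoint should receive, given its weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

def pick_endpoint(weights, rng=random):
    """Pick one endpoint at random, with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Hypothetical endpoints: send roughly three quarters of lookups to the first.
weights = {"us-east.example.com": 3, "eu-west.example.com": 1}
rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(pick_endpoint(weights, rng) for _ in range(10_000))
print(expected_share(weights))
print(counts)
```

Changing a weight shifts traffic gradually, which is useful for trying out a new deployment on a small share of users.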
Talking Points
• Messaging is very important for developing scalable apps
• Effective scaling – especially horizontally – requires decoupling
  • Separating components into simplest forms
• Use messages to communicate between components
  • State, tasks, etc.
Narrative
Messaging is a very important concept when developing applications that scale well in the cloud. For applications to scale effectively – especially in the horizontal direction – they should be decoupled (i.e., broken into their simplest components). Messages are used by an application’s decoupled components to communicate things like state, tasks, etc. Probably no surprise at this point is the fact that AWS offers several different services that address unique messaging requirements: Amazon Simple Notification Service (SNS) for delivering messages to HTTP or e-mail endpoints; Amazon Simple E-mail Service Beta (SES) for delivering more traditional e-mail content exclusively to e-mail addresses; and Amazon Simple Queue Service (SQS) for passing messages between computers in a highly available and distributed messaging queue.
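The decoupling idea can be sketched in a few lines. Note the assumption: Python's in-process `queue.Queue` stands in for a distributed queue here, so this shows the pattern only and is not the SQS API. The message fields and function names are made up for illustration.

```python
import queue
import threading

def producer(q, files):
    """Front-end component: describes work as messages and moves on."""
    for name in files:
        q.put({"task": "resize", "file": name})
    q.put(None)  # sentinel: no more work

def consumer(q, results):
    """Back-end component: knows nothing about the producer, only the messages."""
    while True:
        message = q.get()
        if message is None:
            break
        results.append(f"{message['task']}:{message['file']}")

def run_pipeline(files):
    q = queue.Queue()
    results = []
    worker = threading.Thread(target=consumer, args=(q, results))
    worker.start()
    producer(q, files)
    worker.join()
    return results

print(run_pipeline(["a.jpg", "b.jpg"]))
```

Because the two components share only the queue, either side can be scaled or replaced without touching the other.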
NASA JPL uses this for their Mars Exploration Rover program to process stereo images that come down from the rover.
SWF gives you the ability to build and run distributed, fault-tolerant applications that span multiple systems (cloud-based, on-premise, or both). Amazon Simple Workflow coordinates the flow of synchronous or asynchronous tasks (logical application steps) so that you can focus on your business and your application instead of having to worry about the infrastructure. We provide the piping for async/decoupled orchestration of both cloud and on-prem infrastructure.
• A Workflow is the automation of a business process.
• A Domain is a collection of related Workflows.
• Actions are the individual tasks undertaken to carry out a Workflow.
• Activity Workers are the pieces of code that actually implement the tasks. Each kind of Worker has its own Activity Type.
• A Decider implements a Workflow's coordination logic.
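To make the Decider and Activity Worker roles concrete, here is a toy, single-process sketch. It does not use the Simple Workflow API; the three activity types, the state dictionary and all function names are hypothetical and exist only to show how coordination logic is kept separate from the code that does the work.

```python
# Activity workers: each function implements one (hypothetical) activity type.
def download(state):
    return {**state, "raw": f"raw({state['image_id']})"}

def process(state):
    return {**state, "processed": f"processed({state['raw']})"}

def publish(state):
    return {**state, "published": True}

ACTIVITY_WORKERS = {"download": download, "process": process, "publish": publish}
WORKFLOW_STEPS = ["download", "process", "publish"]

def decider(history):
    """Coordination logic: look at what has completed and choose the next activity."""
    for step in WORKFLOW_STEPS:
        if step not in history:
            return step
    return None  # workflow complete

def run_workflow(state):
    history = []
    while (next_activity := decider(history)) is not None:
        state = ACTIVITY_WORKERS[next_activity](state)
        history.append(next_activity)
    return state, history

final_state, history = run_workflow({"image_id": "img-001"})
print(history)
print(final_state)
```

In a real deployment the workers and the decider would run on separate machines, cloud or on-prem, with the workflow service tracking the history between them.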
• Create and configure a Search Domain. This is a data container and a related set of services. It exists within a particular Availability Zone of a single AWS Region (initially US East).
• Upload your documents. Documents can be uploaded as JSON or XML that conforms to our Search Document Format (SDF). Uploaded documents will typically be searchable within seconds. You can, if you'd like, send data over an HTTPS connection to protect it while it is in transit.
• Perform searches.
Talking Points
• Quickly deploy Java, .NET or PHP web app to AWS cloud
• Simply upload the file
  • Stores in S3
  • Creates EC2 instance w/ Tomcat 6 or 7 and deploys app
  • CloudWatch alarms and Autoscaling
  • SNS to track progress and events
• After stack creation, you have full control of environment
  • SSH into instance
  • Change AutoScaling parameters
Narrative
“Easy to begin, Impossible to outgrow”
AWS Elastic Beanstalk is an even easier way for you to quickly deploy and manage applications in the AWS cloud. You simply upload your application, and Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing, auto-scaling, and application health monitoring. At the same time, with Elastic Beanstalk, you retain full control over the AWS resources powering your application and can access the underlying resources at any time. Elastic Beanstalk leverages AWS services such as Amazon EC2, Amazon S3, Amazon Simple Notification Service, Elastic Load Balancing, and Auto-Scaling to deliver the same highly reliable, scalable, and cost-effective infrastructure that hundreds of thousands of businesses depend on today.
AWS Elastic Beanstalk is easy to begin and impossible to outgrow. Most existing application containers or platform-as-a-service solutions, while reducing the amount of programming required, significantly diminish developers' flexibility and control. Developers are forced to live with all the decisions pre-determined by the vendor - with little to no opportunity to take back control over various parts of their application's infrastructure. However, with Elastic Beanstalk, you retain full control over the AWS resources powering your application. If you decide you want to take over some (or all) of the elements of your infrastructure, you can do so seamlessly by using Elastic Beanstalk's management capabilities.
The first release of Elastic Beanstalk is built for Java developers using the familiar Apache Tomcat (6 or 7) software stack, which ensures easy portability for your application. There is no additional charge for Elastic Beanstalk - you only pay for the AWS resources needed to store and run your applications.
• Supports Tomcat 6 and 7 containers
• Quercus by Caucho – interprets PHP and bundles
• PHP Fog and Cloud…
Talking Points
• Template language (JSON)
• Define resources (e.g., EC2 w/ EBS vols, S3 buckets, SimpleDB domains)
• Define input/runtime parameters (e.g., # instances, name of war file to deploy)
• Resources provisioned in correct order: CloudFormation calculates dependency tree
• Very powerful way to create stacks for Dev, Test, Stage, etc.
Narrative
AWS CloudFormation gives developers and systems administrators an easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion. Developers can use AWS CloudFormation’s sample templates or create their own templates to describe the AWS resources, and any associated dependencies or runtime parameters, required to run their application. You don’t need to figure out the order in which AWS services need to be provisioned or the subtleties of how to make those dependencies work. CloudFormation takes care of this for you.
Different approaches… baking AMIs vs. bootstrapping – trade-offs
DevOps course coming in a few months… and that gets into details – 1 day dedicated to config mgmt
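The "provisioned in correct order" point is, at heart, a topological sort over resource dependencies. The sketch below shows that general technique with Python's standard `graphlib` module (Python 3.9+). It is not CloudFormation's own code or template format: the resource names and the simplified dependency mapping are assumptions for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical stack: each resource maps to the resources it depends on.
stack = {
    "AppBucket": [],
    "DataVolume": [],
    "WebServer": ["DataVolume", "AppBucket"],
    "Alarm": ["WebServer"],
}

def provisioning_order(dependencies):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

order = provisioning_order(stack)
print(order)
```

The same description can be replayed to stand up identical Dev, Test and Stage stacks, since the ordering is derived from the description rather than remembered by an operator.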