SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Steffen Krause
Amazon Redshift
Technology Evangelist
@AWS_Aktuell
skrause@amazon.de
Data warehousing done the AWS way
• No upfront costs, pay as you go
• Really fast performance at a really low price
• Open and flexible with support for popular tools
• Easy to provision and scale up massively
We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:
Delivered as a managed service
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift
Amazon Redshift dramatically reduces I/O
ID Age State
123 20 CA
345 25 WA
678 40 FL
Row storage Column storage
Scan
Direction
Amazon Redshift uses SQL
• Industry standard SQL
• ODBC and JDBC driver to access data (using Postgres 8.x drivers)
– Most PostgreSQL features supported
– See documentation for differences
• INSERT/UPDATE/DELETE are supported
– But loading data using COPY from S3 or DynamoDB is significantly faster
– Use VACUUM after significant number of deletes or update
Amazon Redshift loads data from S3 or DynamoDB
• Direct load from S3 or DynamoDB is supported
copy customer from 's3://mybucket/customer.txt’
credentials 'aws_access_key_id=<your-access-key-id>;
aws_secret_access_key=<your-secret-access-key>’
gzip delimiter '|’;
• Load Data in Parallel
– Split data into multiple files for parallel load
– Name files with a common prefix
• customer.txt.1, customer.txt.2, …
– Compress large datasets with gzip
• Load data in sort key order if possible
Amazon Redshift automatically compresses your data
• Compress saves space and reduces disk I/O
• COPY automatically analyzes and compresses
your data
– Samples data; selects best compression encoding
– Supports: byte dictionary, delta, mostly n, run
length, text
• Customers see 4-8x space savings with real data
– 20x and higher possible based on data set
• ANALYZE COMPRESSION to see details
analyze compression listing;
Table | Column | Encoding
---------+----------------+----------
listing | listid | delta
listing | sellerid | delta32k
listing | eventid | delta32k
listing | dateid | bytedict
listing | numtickets | bytedict
listing | priceperticket | delta32k
listing | totalprice | mostly32
listing | listtime | raw
Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
jdbc:postgresql://mycluster.c7lp0qs37f41.us-east-1.redshift.amazonaws.com:8192/mydb
Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup
• Restore
• Resize
10 GigE
(HPC)
Ingestion
Backup
Restore
JDBC/ODBC
Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL)
3 spindles, 2 TB, 16 GB RAM, 2 cores
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL)
24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
Amazon Redshift is priced to let you analyze all your data
Price Per Hour for HS1.XL
Single Node
Effective Hourly Price Per
TB
Effective Annual Price
per TB
On-Demand $ 0.850 $ 0.425 $ 3,723
1 Year Reservation $ 0.500 $ 0.250 $ 2,190
3 Year Reservation $ 0.228 $ 0.114 $ 999
Simple Pricing
Number of Nodes x Cost per Hour
No charge for Leader Node
No upfront costs
Pay as you go
Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
Provision a data warehouse in minutes
Provision a data warehouse in minutes
Provision a data warehouse in minutes
Provision a data warehouse in minutes
Monitor query performance
Monitor query performance
Point and click resize
Resize your cluster while remaining online
• New target provisioned in the background
• Only charged for source cluster
Resize your cluster while remaining online
• Fully automated
– Data automatically redistributed
• Read only mode during resize
• Parallel node-to-node data copy
• Automatic DNS-based endpoint cutover
• Only charged for one cluster
Amazon Redshift has security built-in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3
encrypted
• No direct access to compute nodes
• Amazon VPC support
10 GigE
(HPC)
Ingestion
Backup
Restore
Customer VPC
Internal
VPC
JDBC/ODBC
Amazon Redshift continuously backs up your data and
recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of
data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
Amazon Redshift integrates with multiple data sources
Amazon
DynamoDB
Amazon Elastic
MapReduce
Amazon Simple
Storage Service (S3)
Amazon Elastic
Compute Cloud (EC2)
AWS Storage
Gateway Service
Corporate
Data Center
Amazon Relational
Database Service
(RDS)
Amazon
Redshift
More coming soon…
Amazon Redshift provides multiple data loading options
• Upload to Amazon S3
• AWS Import/Export
• AWS Direct Connect
• Work with a partner
Data Integration Systems Integrators
More coming soon…
Amazon Redshift works with your existing analysis tools
JDBC/ODBC
Amazon Redshift
More coming soon…
SkillPages
Customer Use Case
Everyone Needs
Skilled People
At Home
At Work
In Life
Repeatedly
Data Architecture
Data Analyst
Raw Data
Get Data
Join via Facebook
Add a Skill Page
Invite Friends
Web Servers Amazon S3
User Action Trace Events
EMR
Hive Scripts Process Content
• Process log files with regular
expressions to parse out the
info we need.
• Processes cookies into useful
searchable data such as
Session, UserId, API Security
token.
• Filters surplus info like internal
varnish logging.
Amazon S3
Aggregated Data
Raw Events
Internal Web
Excel Tableau
Amazon Redshift
Resources & Questions
• Steffen Krause | skrause@amazon.de | @AWS_Aktuell
• http://aws.amazon.com/redshift
• Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html
• Setting Up SQL Workbench/J:
http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html
• SQL Reference:
http://docs.aws.amazon.com/redshift/latest/dg/cm_chap_SQLCommandRef.html
• Client Tools:
• https://aws.amazon.com/marketplace/redshift/
• https://www.jaspersoft.com/webinar-AWS-Agile-Reporting-and-Analytics-in-the-Cloud
Thank you!

Weitere ähnliche Inhalte

Mehr von AWS Germany

Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureHotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureAWS Germany
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopAWS Germany
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWSAWS Germany
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS AWS Germany
 
AWS Programme fĂĽr Nonprofits
AWS Programme fĂĽr NonprofitsAWS Programme fĂĽr Nonprofits
AWS Programme fĂĽr NonprofitsAWS Germany
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data DesignAWS Germany
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crashAWS Germany
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceAWS Germany
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultAWS Germany
 
EKS Workshop
 EKS Workshop EKS Workshop
EKS WorkshopAWS Germany
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECSAWS Germany
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the UnionAWS Germany
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailAWS Germany
 
Building Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to ProductBuilding Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to ProductAWS Germany
 
Introduction to AWS Amplify and the Amplify CLI Toolchain
Introduction to AWS Amplify and the Amplify CLI ToolchainIntroduction to AWS Amplify and the Amplify CLI Toolchain
Introduction to AWS Amplify and the Amplify CLI ToolchainAWS Germany
 
Savings on scale - Spot Instances with Autospotter
Savings on scale - Spot Instances with AutospotterSavings on scale - Spot Instances with Autospotter
Savings on scale - Spot Instances with AutospotterAWS Germany
 
Is Platform Engineering the new Ops?
Is Platform Engineering the new Ops?Is Platform Engineering the new Ops?
Is Platform Engineering the new Ops?AWS Germany
 
Managing AWS Accounts at Scale
Managing AWS Accounts at ScaleManaging AWS Accounts at Scale
Managing AWS Accounts at ScaleAWS Germany
 
IoT: Detect abnormal device behavior and disconnect devices automatically
IoT: Detect abnormal  device behavior  and disconnect  devices automaticallyIoT: Detect abnormal  device behavior  and disconnect  devices automatically
IoT: Detect abnormal device behavior and disconnect devices automaticallyAWS Germany
 
Introduction to AWS IoT
Introduction to  AWS IoTIntroduction to  AWS IoT
Introduction to AWS IoTAWS Germany
 

Mehr von AWS Germany (20)

Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureHotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS
 
AWS Programme fĂĽr Nonprofits
AWS Programme fĂĽr NonprofitsAWS Programme fĂĽr Nonprofits
AWS Programme fĂĽr Nonprofits
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data Design
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crash
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s Vault
 
EKS Workshop
 EKS Workshop EKS Workshop
EKS Workshop
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECS
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the Union
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
 
Building Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to ProductBuilding Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to Product
 
Introduction to AWS Amplify and the Amplify CLI Toolchain
Introduction to AWS Amplify and the Amplify CLI ToolchainIntroduction to AWS Amplify and the Amplify CLI Toolchain
Introduction to AWS Amplify and the Amplify CLI Toolchain
 
Savings on scale - Spot Instances with Autospotter
Savings on scale - Spot Instances with AutospotterSavings on scale - Spot Instances with Autospotter
Savings on scale - Spot Instances with Autospotter
 
Is Platform Engineering the new Ops?
Is Platform Engineering the new Ops?Is Platform Engineering the new Ops?
Is Platform Engineering the new Ops?
 
Managing AWS Accounts at Scale
Managing AWS Accounts at ScaleManaging AWS Accounts at Scale
Managing AWS Accounts at Scale
 
IoT: Detect abnormal device behavior and disconnect devices automatically
IoT: Detect abnormal  device behavior  and disconnect  devices automaticallyIoT: Detect abnormal  device behavior  and disconnect  devices automatically
IoT: Detect abnormal device behavior and disconnect devices automatically
 
Introduction to AWS IoT
Introduction to  AWS IoTIntroduction to  AWS IoT
Introduction to AWS IoT
 

KĂĽrzlich hochgeladen

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

KĂĽrzlich hochgeladen (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

AWS Summit Berlin 2013 - Amazon Redshift

  • 1. Steffen Krause Amazon Redshift Technology Evangelist @AWS_Aktuell skrause@amazon.de
  • 2. Data warehousing done the AWS way • No upfront costs, pay as you go • Really fast performance at a really low price • Open and flexible with support for popular tools • Easy to provision and scale up massively
  • 3. We set out to build… A fast and powerful, petabyte-scale data warehouse that is: Delivered as a managed service A Lot Faster A Lot Cheaper A Lot Simpler Amazon Redshift
  • 4. Amazon Redshift dramatically reduces I/O ID Age State 123 20 CA 345 25 WA 678 40 FL Row storage Column storage Scan Direction
  • 5. Amazon Redshift uses SQL • Industry standard SQL • ODBC and JDBC driver to access data (using Postgres 8.x drivers) – Most PostgreSQL features supported – See documentation for differences • INSERT/UPDATE/DELETE are supported – But loading data using COPY from S3 or DynamoDB is significantly faster – Use VACUUM after significant number of deletes or update
  • 6. Amazon Redshift loads data from S3 or DynamoDB • Direct load from S3 or DynamoDB is supported copy customer from 's3://mybucket/customer.txt’ credentials 'aws_access_key_id=<your-access-key-id>; aws_secret_access_key=<your-secret-access-key>’ gzip delimiter '|’; • Load Data in Parallel – Split data into multiple files for parallel load – Name files with a common prefix • customer.txt.1, customer.txt.2, … – Compress large datasets with gzip • Load data in sort key order if possible
  • 7. Amazon Redshift automatically compresses your data • Compress saves space and reduces disk I/O • COPY automatically analyzes and compresses your data – Samples data; selects best compression encoding – Supports: byte dictionary, delta, mostly n, run length, text • Customers see 4-8x space savings with real data – 20x and higher possible based on data set • ANALYZE COMPRESSION to see details analyze compression listing; Table | Column | Encoding ---------+----------------+---------- listing | listid | delta listing | sellerid | delta32k listing | eventid | delta32k listing | dateid | bytedict listing | numtickets | bytedict listing | priceperticket | delta32k listing | totalprice | mostly32 listing | listtime | raw
  • 8. Amazon Redshift architecture • Leader Node – SQL endpoint – Stores metadata – Coordinates query execution • Compute Nodes – Local, columnar storage – Execute queries in parallel – Load, backup, restore via Amazon S3 – Parallel load from Amazon DynamoDB • Single node version available 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC jdbc:postgresql://mycluster.c7lp0qs37f41.us-east-1.redshift.amazonaws.com:8192/mydb
  • 9. Amazon Redshift runs on optimized hardware HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage • Optimized for I/O intensive workloads • High disk density • Runs in HPC - fast network • HS1.8XL available on Amazon EC2
  • 10. Amazon Redshift parallelizes and distributes everything • Query • Load • Backup • Restore • Resize 10 GigE (HPC) Ingestion Backup Restore JDBC/ODBC
  • 11. Amazon Redshift lets you start small and grow big Extra Large Node (HS1.XL) 3 spindles, 2 TB, 16 GB RAM, 2 cores Single Node (2 TB) Cluster 2-32 Nodes (4 TB – 64 TB) Eight Extra Large Node (HS1.8XL) 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE Cluster 2-100 Nodes (32 TB – 1.6 PB) Note: Nodes not to scale
  • 12. Amazon Redshift is priced to let you analyze all your data Price Per Hour for HS1.XL Single Node Effective Hourly Price Per TB Effective Annual Price per TB On-Demand $ 0.850 $ 0.425 $ 3,723 1 Year Reservation $ 0.500 $ 0.250 $ 2,190 3 Year Reservation $ 0.228 $ 0.114 $ 999 Simple Pricing Number of Nodes x Cost per Hour No charge for Leader Node No upfront costs Pay as you go
  • 13. Amazon Redshift is easy to use • Provision in minutes • Monitor query performance • Point and click resize • Built in security • Automatic backups
  • 14. Provision a data warehouse in minutes
  • 15. Provision a data warehouse in minutes
  • 16. Provision a data warehouse in minutes
  • 17. Provision a data warehouse in minutes
  • 20. Point and click resize
  • 21. Resize your cluster while remaining online • New target provisioned in the background • Only charged for source cluster
  • 22. Resize your cluster while remaining online • Fully automated – Data automatically redistributed • Read only mode during resize • Parallel node-to-node data copy • Automatic DNS-based endpoint cutover • Only charged for one cluster
  • 23. Amazon Redshift has security built-in • SSL to secure data in transit • Encryption to secure data at rest – AES-256; hardware accelerated – All blocks on disks and in Amazon S3 encrypted • No direct access to compute nodes • Amazon VPC support 10 GigE (HPC) Ingestion Backup Restore Customer VPC Internal VPC JDBC/ODBC
  • 24. Amazon Redshift continuously backs up your data and recovers from failures • Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times • Backups to Amazon S3 are continuous, automatic, and incremental – Designed for eleven nines of durability • Continuous monitoring and automated recovery from failures of drives and nodes • Able to restore snapshots to any Availability Zone within a region
  • 25. Amazon Redshift integrates with multiple data sources Amazon DynamoDB Amazon Elastic MapReduce Amazon Simple Storage Service (S3) Amazon Elastic Compute Cloud (EC2) AWS Storage Gateway Service Corporate Data Center Amazon Relational Database Service (RDS) Amazon Redshift More coming soon…
  • 26. Amazon Redshift provides multiple data loading options • Upload to Amazon S3 • AWS Import/Export • AWS Direct Connect • Work with a partner Data Integration Systems Integrators More coming soon…
  • 27. Amazon Redshift works with your existing analysis tools JDBC/ODBC Amazon Redshift More coming soon…
  • 28.
  • 29. SkillPages Customer Use Case Everyone Needs Skilled People At Home At Work In Life Repeatedly
  • 30.
  • 31. Data Architecture Data Analyst Raw Data Get Data Join via Facebook Add a Skill Page Invite Friends Web Servers Amazon S3 User Action Trace Events EMR Hive Scripts Process Content • Process log files with regular expressions to parse out the info we need. • Processes cookies into useful searchable data such as Session, UserId, API Security token. • Filters surplus info like internal varnish logging. Amazon S3 Aggregated Data Raw Events Internal Web Excel Tableau Amazon Redshift
  • 32. Resources & Questions • Steffen Krause | skrause@amazon.de | @AWS_Aktuell • http://aws.amazon.com/redshift • Getting Started Guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html • Setting Up SQL Workbench/J: http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html • SQL Reference: http://docs.aws.amazon.com/redshift/latest/dg/cm_chap_SQLCommandRef.html • Client Tools: • https://aws.amazon.com/marketplace/redshift/ • https://www.jaspersoft.com/webinar-AWS-Agile-Reporting-and-Analytics-in-the-Cloud