SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Welcome to Big Data Day @ AWS NY Loft
Today’s Presentations
10:00 AM – 10:50 AM : Big Data Architectural Patterns and
Best Practices on AWS
11:00AM – 11:50 AM : Spark and the Hadoop Ecosystem
12:00 PM – 01:00 PM : Lunch Break
01:00 PM – 01:50 PM : Data Warehousing in the Era of Big
Data
02:00 PM – 02:50 PM : Introduction to Amazon Athena
03:00 PM – 05:30 PM : Workshop: Querying and Analyzing
Data in S3
Introduction to Amazon Athena
Liam Morrison
Solutions Architect, Amazon Web Services
What to Expect from the Session
Introduction to Serverless
Big data challenges
Introduction to Athena
Demo
Customer Use Cases
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
S3
Data Catalog
AthenaEMR Redshift
Spectrum
Amazon ML / MXNet
RDS
QuickSight
Kinesis
Database
Migration
Service
Glue
Amazon Analytics End to End Architecture
IAM
Other
Sources
Cloud Services Evolution
Virtual
machines
Managed
services
Serverless
Serverless
No servers to provision
or manage
Scales with usage
Never pay for idle Availability and fault
tolerance built in
Serverless characteristics
1990 2000 2010 2020
Generated Data
Available for Analysis
Sources:
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Data Volume
Year
The Dark Data Problem
Most generated data is unavailable for analysis
Amazon Athena
Amazon Athena is an interactive query service
that makes it easy to analyze data directly from
Amazon S3 using Standard SQL
Value Proposition
• Decouple storage from compute
• Serverless – No infrastructure or resources to manage
• Pay only for data scanned
• Schema on read – Same data, many views
• Secure – IAM for authentication; Encryption at rest
• Standard compliant and open storage formats
• Built on powerful community supported OSS solutions
Athena is Serverless
• No Infrastructure or
administration
• Zero Spin up time
• Transparent upgrades
Easy To Use
• Log into the Console
• Create a table
• Type in a Hive DDL Statement
• Use Glue Data Catalog
• Use the console Add Table wizard
• Start querying
Highly Available
• You connect to a service endpoint or log into the console
• Athena uses warm compute pools across multiple
Availability Zones
• Your data is in Amazon S3, which is also highly available
and designed for 99.999999999% durability
Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
• Text, CSV, JSON, weblogs, AVRO, AWS service logs
• Convert to an optimized form like ORC or Parquet for the best
performance and lowest cost
• No ETL required (but it can help!)
• Stream data from directly from Amazon S3
• Take advantage of Amazon S3 durability and availability
Simple Pricing
• DDL operations – FREE
• SQL operations – FREE
• Query concurrency – FREE
• Data scanned - $5 / TB
• Standard S3 rates for storage, requests, and data transfer apply
Familiar Technologies Under the Covers
Used for SQL Queries
In-memory distributed query engine
ANSI-SQL compatible with extensions
Used for DDL functionality
Complex data types
Multitude of formats
Supports data partitioning
Presto SQL
• ANSI SQL compliant
• Complex joins, nested queries &
window functions
• Complex data types (arrays,
structs, maps)
• Partitioning of data by any key
• date, time, custom keys
• Presto built-in functions
Simple Pricing - $5/TB Scanned
• Pay by the amount of data scanned per query
• Ways to save costs
• Compress
• Convert to Columnar format
• Use partitioning
• Free: DDL Queries, Failed Queries
Dataset Size on Amazon S3 Query Run time Data Scanned Cost
Logs stored as Text
files
1 TB 237 seconds 1.15TB $5.75
Logs stored in
Apache Parquet
format*
130 GB 5.13 seconds 2.69 GB $0.013
Savings 87% less with Parquet 34x faster 99% less data scanned 99.7% cheaper
CloudTrail
• CloudTrail tracks all Athena APIs
• Can be queried by Athena
• Includes IAM user and account ID of whom executed queries
• Helpful in tracking query usage
Demo
https://pages.awscloud.com/rs/112-TZM-766/images/EV_athena-
cookbook_Aug-2017.pdf
Athena or Redshift Spectrum (or both)?
• Athena
• Separation of storage and compute
• Good query performance
• Text and optimized file formats – help reduce cost and improve
performance
• Serverless
• Redshift Spectrum
• Best cost/query – Best query performance
• Data Warehouse capabilities
• Join Redshift and S3 data
• Cluster management overhead
AWS Glue Data Catalog
Discover and organize your data
How to connect
Glue data catalog
Manage table metadata through a Hive metastore API or Hive SQL.
Supported by tools like Hive, Presto, Spark etc.
We added a few extensions:
 Search over metadata for data discovery
 Connection info – JDBC URLs, credentials
 Classification for identifying and parsing files
 Versioning of table metadata as schemas evolve and other metadata are
updated
Populate using Hive DDL, bulk import, or automatically through Crawlers.
Data Catalog: Crawlers
Automatically discover new data, extracts schema definitions
• Detect schema changes and version tables
• Detect Hive style partitions on Amazon S3
Built-in classifiers for popular types; custom classifiers using Grok expressions
Run ad hoc or on a schedule; serverless – only pay when crawler runs
Crawlers automatically build your Data Catalog and keep it in sync
Amazon QuickSight
Amazon QuickSight is a Business Analytics Service that lets business users
quickly and easily visualize, explore, and share insights from their data.
Who Is QuickSight For?
DATA
PROFESSIONALS
BUSINESS
PROFESSIONALS
DATA CONSUMERS
Deep Integration with
AWS Data Sources
Amazon RDS,
Aurora
Amazon
Redshift
Amazon
Athena
Amazon S3
Flat Files
Discussing Today
Popular Customer Use Cases
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Log Aggregation
AWS Service Logs
Web Application Logs
Server Logs
S3
Athena
Data Catalog
New File
Trigger
Update table partition
Create partition
on S3
Copy to new
partition
Query data
S3
Lambda
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Log Aggregation with ETL
AWS Service Logs
Web Application Logs
Server Logs
S3
Athena
Data Catalog
Glue
Crawler
Update table partition
Create partition
on S3
Query data
S3
Glue ETL
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Real-Time Data Collection
S3
Athena
Data Catalog
Real-time events Store partitioned in S3
Trigger Lambda
Update table partition
Query data
Lambda
Kinesis
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Export
S3
Athena
Data Catalog
Database Migration Exported tables in S3
Trigger Lambda
Update table partition
Query data
Lambda
Database Migration
Service
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
SaaS Model
S3
Athena
Data Catalog
Query data
Hot data
Warm & cold dataApplication request
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Analytics Reporting*
Athena
Data Catalog
Redshift
Spectrum
EMR
QuickSight
API

Weitere ähnliche Inhalte

Was ist angesagt?

ABCs of AWS: S3
ABCs of AWS: S3ABCs of AWS: S3
ABCs of AWS: S3Mark Cohen
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanAmazon Web Services
 
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...Simplilearn
 
AWS Monitoring & Logging
AWS Monitoring & LoggingAWS Monitoring & Logging
AWS Monitoring & LoggingJason Poley
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveCobus Bernard
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Amazon Web Services
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAmazon Web Services
 
Getting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessGetting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessAmazon Web Services
 

Was ist angesagt? (20)

ABCs of AWS: S3
ABCs of AWS: S3ABCs of AWS: S3
ABCs of AWS: S3
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
AWS Cloud Watch
AWS Cloud WatchAWS Cloud Watch
AWS Cloud Watch
 
AWS Lambda Features and Uses
AWS Lambda Features and UsesAWS Lambda Features and Uses
AWS Lambda Features and Uses
 
Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
AWS S3 | Tutorial For Beginners | AWS S3 Bucket Tutorial | AWS Tutorial For B...
 
AWS Monitoring & Logging
AWS Monitoring & LoggingAWS Monitoring & Logging
AWS Monitoring & Logging
 
Amazon RDS Deep Dive
Amazon RDS Deep DiveAmazon RDS Deep Dive
Amazon RDS Deep Dive
 
Intro to AWS Lambda
Intro to AWS Lambda Intro to AWS Lambda
Intro to AWS Lambda
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
AWS CloudWatch
AWS CloudWatchAWS CloudWatch
AWS CloudWatch
 
AWS Lake Formation Deep Dive
AWS Lake Formation Deep DiveAWS Lake Formation Deep Dive
AWS Lake Formation Deep Dive
 
Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)Deep Dive on Amazon RDS (Relational Database Service)
Deep Dive on Amazon RDS (Relational Database Service)
 
Deep Dive on AWS Lambda
Deep Dive on AWS LambdaDeep Dive on AWS Lambda
Deep Dive on AWS Lambda
 
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS CloudAWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
AWS Data Transfer Services: Data Ingest Strategies Into the AWS Cloud
 
Getting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and ServerlessGetting Started with AWS Lambda and Serverless
Getting Started with AWS Lambda and Serverless
 

Ähnlich wie AWS Big Data Day Agenda and Introduction to Amazon Athena

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
 
Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017David McDaniel
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAmazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.Amazon Web Services
 
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...Amazon Web Services
 
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...Amazon Web Services
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Amazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 Amazon Web Services Korea
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAmazon Web Services
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...Amazon Web Services
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)Amazon Web Services Korea
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveKevin Epstein
 

Ähnlich wie AWS Big Data Day Agenda and Introduction to Amazon Athena (20)

使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
 
Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017Denver AWS Users' Group meeting - September 2017
Denver AWS Users' Group meeting - September 2017
 
The Best of re:invent 2016
The Best of re:invent 2016The Best of re:invent 2016
The Best of re:invent 2016
 
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQLAnnouncing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
Announcing Amazon Athena - Instantly Analyze Your Data in S3 Using SQL
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
NEW LAUNCH! Intro to Amazon Athena. Easily analyze data in S3, using SQL.
 
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...
Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight - May...
 
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...
Serverless Big Data Analytics with Amazon Athena and Amazon Quicksight - May ...
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301
 
Best of re:Invent
Best of re:InventBest of re:Invent
Best of re:Invent
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개 2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
2017 AWS DB Day | Amazon Athena 서비스 최신 기능 소개
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...
Replicate & Manage Data Using Managed Databases & Serverless Technologies (DA...
 
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
AWS CLOUD 2017 - Amazon Athena 및 Glue를 통한 빠른 데이터 질의 및 처리 기능 소개 (김상필 솔루션즈 아키텍트)
 
Los Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep DiveLos Angeles AWS Users Group - Athena Deep Dive
Los Angeles AWS Users Group - Athena Deep Dive
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

AWS Big Data Day Agenda and Introduction to Amazon Athena

  • 1. Welcome to Big Data Day @ AWS NY Loft
  • 2. Today’s Presentations 10:00 AM – 10:50 AM : Big Data Architectural Patterns and Best Practices on AWS 11:00AM – 11:50 AM : Spark and the Hadoop Ecosystem 12:00 PM – 01:00 PM : Lunch Break 01:00 PM – 01:50 PM : Data Warehousing in the Era of Big Data 02:00 PM – 02:50 PM : Introduction to Amazon Athena 03:00 PM – 05:30 PM : Workshop: Querying and Analyzing Data in S3
  • 3. Introduction to Amazon Athena Liam Morrison Solutions Architect, Amazon Web Services
  • 4. What to Expect from the Session Introduction to Serverless Big data challenges Introduction to Athena Demo Customer Use Cases
  • 5. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. S3 Data Catalog AthenaEMR Redshift Spectrum Amazon ML / MXNet RDS QuickSight Kinesis Database Migration Service Glue Amazon Analytics End to End Architecture IAM Other Sources
  • 8. No servers to provision or manage Scales with usage Never pay for idle Availability and fault tolerance built in Serverless characteristics
  • 9. 1990 2000 2010 2020 Generated Data Available for Analysis Sources: Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares Data Volume Year The Dark Data Problem Most generated data is unavailable for analysis
  • 11. Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using Standard SQL
  • 12. Value Proposition • Decouple storage from compute • Serverless – No infrastructure or resources to manage • Pay only for data scanned • Schema on read – Same data, many views • Secure – IAM for authentication; Encryption at rest • Standard compliant and open storage formats • Built on powerful community supported OSS solutions
  • 13. Athena is Serverless • No Infrastructure or administration • Zero Spin up time • Transparent upgrades
  • 14. Easy To Use • Log into the Console • Create a table • Type in a Hive DDL Statement • Use Glue Data Catalog • Use the console Add Table wizard • Start querying
  • 15. Highly Available • You connect to a service endpoint or log into the console • Athena uses warm compute pools across multiple Availability Zones • Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability
  • 16. Query Data Directly from Amazon S3 • No loading of data • Query data in its raw format • Text, CSV, JSON, weblogs, AVRO, AWS service logs • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost • No ETL required (but it can help!) • Stream data from directly from Amazon S3 • Take advantage of Amazon S3 durability and availability
  • 17. Simple Pricing • DDL operations – FREE • SQL operations – FREE • Query concurrency – FREE • Data scanned - $5 / TB • Standard S3 rates for storage, requests, and data transfer apply
  • 18. Familiar Technologies Under the Covers Used for SQL Queries In-memory distributed query engine ANSI-SQL compatible with extensions Used for DDL functionality Complex data types Multitude of formats Supports data partitioning
  • 19. Presto SQL • ANSI SQL compliant • Complex joins, nested queries & window functions • Complex data types (arrays, structs, maps) • Partitioning of data by any key • date, time, custom keys • Presto built-in functions
  • 20. Simple Pricing - $5/TB Scanned • Pay by the amount of data scanned per query • Ways to save costs • Compress • Convert to Columnar format • Use partitioning • Free: DDL Queries, Failed Queries Dataset Size on Amazon S3 Query Run time Data Scanned Cost Logs stored as Text files 1 TB 237 seconds 1.15TB $5.75 Logs stored in Apache Parquet format* 130 GB 5.13 seconds 2.69 GB $0.013 Savings 87% less with Parquet 34x faster 99% less data scanned 99.7% cheaper
  • 21. CloudTrail • CloudTrail tracks all Athena APIs • Can be queried by Athena • Includes IAM user and account ID of whom executed queries • Helpful in tracking query usage
  • 23. Athena or Redshift Spectrum (or both)? • Athena • Separation of storage and compute • Good query performance • Text and optimized file formats – help reduce cost and improve performance • Serverless • Redshift Spectrum • Best cost/query – Best query performance • Data Warehouse capabilities • Join Redshift and S3 data • Cluster management overhead
  • 24. AWS Glue Data Catalog Discover and organize your data How to connect
  • 25. Glue data catalog Manage table metadata through a Hive metastore API or Hive SQL. Supported by tools like Hive, Presto, Spark etc. We added a few extensions:  Search over metadata for data discovery  Connection info – JDBC URLs, credentials  Classification for identifying and parsing files  Versioning of table metadata as schemas evolve and other metadata are updated Populate using Hive DDL, bulk import, or automatically through Crawlers.
  • 26. Data Catalog: Crawlers Automatically discover new data, extracts schema definitions • Detect schema changes and version tables • Detect Hive style partitions on Amazon S3 Built-in classifiers for popular types; custom classifiers using Grok expressions Run ad hoc or on a schedule; serverless – only pay when crawler runs Crawlers automatically build your Data Catalog and keep it in sync
  • 28. Amazon QuickSight is a Business Analytics Service that lets business users quickly and easily visualize, explore, and share insights from their data.
  • 29. Who Is QuickSight For? DATA PROFESSIONALS BUSINESS PROFESSIONALS DATA CONSUMERS
  • 30. Deep Integration with AWS Data Sources Amazon RDS, Aurora Amazon Redshift Amazon Athena Amazon S3 Flat Files Discussing Today
  • 32. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Log Aggregation AWS Service Logs Web Application Logs Server Logs S3 Athena Data Catalog New File Trigger Update table partition Create partition on S3 Copy to new partition Query data S3 Lambda
  • 33. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Log Aggregation with ETL AWS Service Logs Web Application Logs Server Logs S3 Athena Data Catalog Glue Crawler Update table partition Create partition on S3 Query data S3 Glue ETL
  • 34. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Real-Time Data Collection S3 Athena Data Catalog Real-time events Store partitioned in S3 Trigger Lambda Update table partition Query data Lambda Kinesis
  • 35. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Export S3 Athena Data Catalog Database Migration Exported tables in S3 Trigger Lambda Update table partition Query data Lambda Database Migration Service
  • 36. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. SaaS Model S3 Athena Data Catalog Query data Hot data Warm & cold dataApplication request
  • 37. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Analytics Reporting* Athena Data Catalog Redshift Spectrum EMR QuickSight API