Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Enterprises today are building data lakes that store large amounts of structured and unstructured data for analysis. But building the required data models and infrastructure takes a lot of time. How to run quick data queries without servers or databases is the next big question for every enterprise.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use a serverless architecture to visualize your data using Amazon Athena and Amazon QuickSight.
By combining these two services, you can easily query and visualize the data in your S3 buckets and get business insights. You can also build business reports with other tools such as AWS IoT and Amazon Kinesis Firehose.
Reasons to Attend:
Learn how to quickly query large volumes of data on S3 with the serverless Amazon Athena
Learn how to use Amazon QuickSight to retrieve information from your data sources quickly and create detailed reports
1. Level 200 Hands-on Workshop
Visualize your data in Data Lake with AWS Athena and AWS QuickSight
Jeff Ng, Solutions Architect
eCloudvalley 27 July, 2017
2. Agenda
About eCloudvalley
Overview of Amazon Athena
Overview of Amazon QuickSight
Athena+QuickSight vs ELK
Demo
Lab
4. Create a New Cloud Standard by becoming TRUE CLOUD EXPERT.
10 X All 5-Cert Engineers
90+ AWS Certifications
AWS Authorized Instructor
Microsoft Certified Trainer
100% Focus on the Largest Cloud - AWS
Born-in-the-cloud System Integrator
6. The 1st and the Only Premier Partner in GCR
AWS China Region Consulting Partner
AWS Premier Consulting Partner (1st and the Only Premier Partner in GCR)
Marketing & Commerce Competency
AWS Audited Managed Services Provider
Mobile Competency
11. Challenges Customers Faced
Significant amount of work required to analyze data in Amazon S3
Users often only have access to aggregated data sets
Managing a Hadoop cluster or data warehouse requires expertise
12. Introducing Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL
13. Athena is Serverless
No infrastructure or administration
Zero spin-up time
Transparent upgrades
14. Amazon Athena is Easy To Use
Log into the Console
Create a table
Type in a Hive DDL Statement
Use the console Add Table wizard
Start querying
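The "Type in a Hive DDL statement" step above can be sketched as follows. This is a minimal illustration that only assembles the statement text in Python; the table name `web_logs`, its columns, and the bucket `s3://example-bucket/logs/` are hypothetical and do not come from the workshop.

```python
def build_create_table_ddl(table, columns, s3_location, delimiter=","):
    """Return a Hive-style CREATE EXTERNAL TABLE statement for delimited text files.

    columns is a list of (name, type) pairs. All names here are illustrative.
    """
    column_defs = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical table over CSV log files stored under an S3 prefix.
ddl = build_create_table_ddl(
    table="web_logs",
    columns=[("request_time", "string"), ("url", "string"), ("status", "int")],
    s3_location="s3://example-bucket/logs/",
)
print(ddl)
```

The printed statement could then be pasted into the Athena console query editor. The table is external, so the data stays in S3 and nothing is loaded.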
15. Amazon Athena is Highly Available
You connect to a service endpoint or log into the console
Athena uses warm compute pools across multiple Availability Zones
Your data is in Amazon S3, which is also highly available and designed for 99.999999999% durability
16. Query Data Directly from Amazon S3
No loading of data
Query data in its raw format
Text, CSV, JSON, weblogs, AWS service logs
Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
No ETL required
Stream data directly from Amazon S3
Take advantage of Amazon S3 durability and availability
17. Use ANSI SQL
Start writing ANSI SQL
Support for complex joins, nested queries & window functions
Support for complex data types (arrays, structs)
Support for partitioning of data by any key (date, time, custom keys)
e.g., Year, Month, Day, Hour or Customer Key, Date
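One common way to partition by Year, Month, Day, and Hour is to write objects under Hive-style `key=value` prefixes in S3. The sketch below assumes that layout; the helper name `partition_prefix`, the bucket name, and the `customer_key` partition are hypothetical.

```python
from datetime import datetime


def partition_prefix(base, ts, customer_key=None):
    """Build a Hive-style partition prefix such as .../year=2017/month=07/...

    base is the table's S3 location; ts is the event timestamp.
    An optional customer key is placed ahead of the date partitions.
    """
    parts = [f"year={ts:%Y}", f"month={ts:%m}", f"day={ts:%d}", f"hour={ts:%H}"]
    if customer_key is not None:
        parts.insert(0, f"customer_key={customer_key}")
    return base.rstrip("/") + "/" + "/".join(parts) + "/"


prefix = partition_prefix("s3://example-bucket/logs", datetime(2017, 7, 27, 9))
print(prefix)
```

A query that filters on the partition columns only needs to read the matching prefixes, which reduces the amount of data scanned.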
18. Familiar Technologies Under the Covers
Presto: used for SQL queries
In-memory distributed query engine
ANSI-SQL compatible with extensions
Apache Hive: used for DDL functionality
Complex data types
Multitude of formats
Supports data partitioning
19. Amazon Athena Supports Multiple Data Formats
Text files, e.g., CSV, raw logs
Apache Web Logs, TSV files
JSON (simple, nested)
Compressed files
Columnar formats such as Apache Parquet & Apache ORC
Avro support
20. Amazon Athena is Fast
Tuned for performance
Automatically parallelizes queries
Results are streamed to console
Results also stored in S3
Improve Query performance
Compress your data
Use columnar formats
21. Amazon Athena is Cost Effective
Pay per query
$5 per TB scanned from S3
DDL Queries and failed queries are free
Save by using compression, columnar formats, partitions
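The pay-per-query model above can be worked through with a small calculation. This sketch uses only the $5 per TB rate from the slide; it treats a TB as 1024^4 bytes and ignores any minimum or rounding rules, both of which are assumptions. The 10% figure for data scanned after compression and columnar conversion is a made-up example, not a measured result.

```python
PRICE_PER_TB_USD = 5.0      # rate quoted on the slide
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def query_cost_usd(bytes_scanned):
    """Estimate the cost of one query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


raw_scan = 1 * BYTES_PER_TB          # query scans 1 TB of raw text
optimized_scan = raw_scan * 0.10     # hypothetical: only 10% scanned after optimization

raw_cost = query_cost_usd(raw_scan)
optimized_cost = query_cost_usd(optimized_scan)
print(f"raw: ${raw_cost:.2f}, optimized: ${optimized_cost:.2f}")
```

Because the charge depends only on bytes scanned, anything that lets a query read less data (compression, columnar formats, partitions) lowers the bill directly.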
25. A Sample Pipeline
Ad-hoc access to raw data using SQL
Athena can query aggregated datasets as well
26. Summary
No ETL required. No loading of data. Query data where it lives
Query data at whatever latitude and longitude you want
No infrastructure to manage
33. Using Amazon Athena with Amazon QuickSight
Amazon S3
Amazon RDS
Amazon Redshift
Amazon Athena
QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources, including Amazon Athena
35. What are the big data challenges our customers face?
36. Who are my top customers and what are they buying?
Which devices are showing time for maintenance?
What is my product profitability by region?
Why is my most profitable region not growing?
How much inventory do I have?
Has my fraud account expense increased?
How is my marketing campaign performing?
How is my employee satisfaction trending?
Lots of data
Lots and lots of questions
Few insights
37. Old-guard BI
Costs too much
Pay $ million before seeing first analysis
3 year TCO $150 to $250 per user per month
Takes too long
Spend 6 to 12 months of consulting and software implementation time
42. QuickSight API
[Architecture diagram. Consumers: business users on the QuickSight UI (mobile devices, web browsers) and partner BI products. QuickSight layers: data prep, metadata, suggestions, connectors, SPICE. Data sources: Amazon S3, Amazon Kinesis, Amazon DynamoDB, Amazon EMR, Amazon Redshift, Amazon RDS, Athena, files, apps, and on-premises data via direct connect or JDBC/ODBC.]
43. I have multiple datasets both on-premises and on AWS from different sources, and I need to make data available and enable access by using Amazon QuickSight.
How do I do this?
44.
1. Data made available in "data lakes" using Amazon S3 or Amazon Redshift
2. Data access managed with bucket- or schema-level policies
3. Data enabled by using Amazon QuickSight
45. [Diagram. Sources: Amazon EMR or Apache Hadoop; log files and application API extracts; on-premises data; Amazon Redshift; Amazon DynamoDB or EC2-based MongoDB and Cassandra; Amazon S3. Stage 1: data made available in data lakes. Stage 2: data access managed at the data lake, with bucket- or schema-level permissions by user and data access needs. Stage 3: data enabled by user in data marts and consumed through QuickSight on mobile devices and web browsers.]
46. Easy exploration of AWS data
Securely discover and connect to AWS data
Quickly explore AWS data sources
• Relational databases
• NoSQL databases
• Amazon EMR, Amazon S3, files
• Streaming data sources
Easily import data from any table or file
Automatic detection of data types
47. Intuitive visualizations with AutoGraph
• Automatic detection of data types
• Optimal query generation
• Appropriate graph type selection
• Ability to customize the graph type
• Very fast response
48. Native mobile experience
• iOS, Android
• Full experience on tablets
• Consumption experience on smart phones
• Very fast response
49. Tell a story with your data
• Capture the critical snapshot of analysis
• Build a sequence of analysis
• Share it securely
• Enable interactive exploration
• Very fast response
50. Advantage of Amazon QuickSight
Fast to get started
Fast insights with SPICE
Easily explore any AWS data
Easy to use and share
Effortless scale
Low cost
51. Amazon QuickSight pricing
                              Standard edition      Enterprise edition
Subscription                  Annual    Monthly     Annual    Monthly
Price per user per month      $9        $12         $18       $24
SPICE capacity (GB)*          10        10          10        10
Additional SPICE GB-month     $0.25                 $0.38
* Per-user SPICE capacity is pooled across all users in an account. As an example, a customer with 100 user subscriptions will get 1,000 GB of SPICE capacity for the account.
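The pooled-capacity footnote and the price list can be checked with a short calculation. This sketch hard-codes the figures from the slide; reading $0.25 as the Standard rate and $0.38 as the Enterprise rate for additional SPICE is inferred from the column order, and the function names are illustrative.

```python
SPICE_GB_PER_USER = 10

# Price per user per month in USD, keyed by (edition, subscription).
USER_PRICE = {
    ("standard", "annual"): 9,
    ("standard", "monthly"): 12,
    ("enterprise", "annual"): 18,
    ("enterprise", "monthly"): 24,
}

# Additional SPICE price per GB-month (assumed mapping to editions).
EXTRA_SPICE_PRICE = {"standard": 0.25, "enterprise": 0.38}


def pooled_spice_gb(users):
    """SPICE capacity included with the subscriptions, pooled across the account."""
    return users * SPICE_GB_PER_USER


def monthly_cost_usd(users, edition, subscription, extra_spice_gb=0):
    """Monthly bill: user subscriptions plus any additional SPICE capacity."""
    return users * USER_PRICE[(edition, subscription)] + extra_spice_gb * EXTRA_SPICE_PRICE[edition]


print(pooled_spice_gb(100))                          # capacity for the footnote's 100-user example
print(monthly_cost_usd(100, "standard", "annual"))   # subscriptions only, no extra SPICE
```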
60. Athena+QuickSight vs. ELK stack
                    Athena (Serverless)                         ELK Stack
Response time       Interactive, partial results                Few seconds
Pre-process time    Almost 0                                    Few seconds to minutes
Query string        SQL                                         Structured queries
Infrastructure      No infra                                    Logstash & Kibana: EC2; Elasticsearch: managed service
Management effort   Low                                         Medium
Input format        CSV, JSON, ORC, Apache Parquet, and Avro    JSON or XML
Price model         Charge by scanned data                      Charge by infrastructure, index doc
Front end           QuickSight                                  Kibana