The Power of Big Data - AWS Summit Bahrain 2017

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Pratim Das – Specialist, Solutions Architect
Analytics, EMEA
25th September 2017
The Power of Big Data

Big Data on AWS
Immediate Availability. Deploy instantly. No hardware to
procure, no infrastructure to maintain & scale.
Trusted & Secure. Designed to meet the strictest
requirements. Continuously audited, including certifications
such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.
Broad & Deep Capabilities. Over 100 services and 100s
of features to support virtually any big data application &
workload.
Hundreds of Partners & Solutions. Get help from a
consulting partner or choose from a multitude of tools and
applications across the entire data management stack.

Storage & Streams
Catalogue & Search
Entitlements
API & UI
Attributes of a Modern
Data Architecture
Key Pillars of a
Data Lake
Key Components of a Successful Data Strategy

Building a Data Strategy on AWS
[Diagram: ten numbered steps (1-10); labelled components include Kinesis Firehose, Athena (Query Service), Glue, and Batch]

Processing Data for Analytics
on your data lake

Broad Set of Analytics Capabilities

Processing & Analytics
Transactional & RDBMS
DynamoDB (NoSQL DB)
Aurora (Relational Database)
BI & Data Visualization
Kinesis Streams
& Firehose
Batch
EMR
Hadoop, Spark,
Presto
Redshift
Data Warehouse
Athena
Query Service
AWS Batch
Predictive
Real-time
AWS Lambda
Apache Storm
on EMR
Apache Flink
on EMR
Spark Streaming
on EMR
Elasticsearch
Service
Kinesis Analytics,
Kinesis Streams
ElastiCache, DAX

How to be successful in serving your
customers/citizens
E*
BI
RT
ML
Amazon EC2
Amazon ECS
AWS Elastic Beanstalk
Amazon Redshift
Amazon EMR
Amazon QuickSight
Amazon Kinesis
Amazon Elasticsearch Service
Amazon AI
Spark ML (on Amazon EMR)

Amazon AI
“Inside AWS, we’re excited to lower the costs
and barriers to machine learning and AI so
organizations of all sizes can take advantage of
these advanced techniques”

Model
Training
Inference
in the Cloud
Amazon AI: Building Intelligent Systems, End To End

https://aws.amazon.com/iot-platform/how-it-works/
Internet of Things

Start Querying Instantly
Serverless. No ETL.
Pay Per Query
Only pay for data scanned.
Open. Powerful. Standard
Built on Presto. Runs standard SQL.
Fast. Really Fast
Interactive performance
even for large datasets
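
As a rough illustration of "only pay for data scanned", the sketch below estimates a per-query charge from the number of bytes scanned. The rate, the decimal-terabyte conversion, and the function name are assumptions made for this example, not figures from the slides; it also ignores any minimum charge or rounding a real bill might apply.

```python
TB = 10**12  # assumption: a decimal terabyte; actual billing units may differ


def estimated_scan_cost(bytes_scanned: int, usd_per_tb: float) -> float:
    """Return an approximate query charge when billing is proportional to bytes scanned."""
    return bytes_scanned / TB * usd_per_tb


# Hypothetical rate for illustration only; check the current pricing page.
ASSUMED_USD_PER_TB = 5.0

# Scanning a whole 2 TB dataset versus scanning a tenth of it
# (for example, because the query reads only the columns or partitions it needs).
full_scan = estimated_scan_cost(2 * TB, ASSUMED_USD_PER_TB)
pruned_scan = estimated_scan_cost(2 * TB // 10, ASSUMED_USD_PER_TB)

print(f"full scan:   ${full_scan:.2f}")
print(f"pruned scan: ${pruned_scan:.2f}")
```

Under this model the charge falls in direct proportion to the data a query avoids reading.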

Demo:
Running an analytic query
over an exabyte in S3

Let’s build an analytic query - #1
An author is releasing the 8th book in her popular series. How
many should we order for Seattle? What were the first few days
of sales for her prior books?
Let’s get the prior books she’s written.
1 Table
2 Filters
SELECT
P.ASIN,
P.TITLE
FROM
products P
WHERE
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling';

Let’s compute the sales of the prior books she’s written in this
series and return the top 20 values
2 Tables (1 S3, 1 local)
2 Filters
1 Join
2 Group By columns
1 Order By
1 Limit
1 Aggregation
SELECT
P.ASIN,
P.TITLE,
SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
s3.d_customer_order_item_details D,
products P
WHERE
D.ASIN = P.ASIN AND
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling'
GROUP BY P.ASIN, P.TITLE
ORDER BY SALES_sum DESC
LIMIT 20;
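
The join-and-aggregate pattern above can be tried locally on toy data. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module rather than Amazon Redshift, attaches a second in-memory database under the name `s3` to stand in for the external schema, and fills both tables with made-up rows (the ASINs, titles, and prices are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the external schema: a second in-memory database named "s3".
conn.execute("ATTACH DATABASE ':memory:' AS s3")

# Toy rows invented for this example; they are not real sales data.
conn.executescript("""
CREATE TABLE products (asin TEXT, title TEXT, author TEXT);
CREATE TABLE s3.d_customer_order_item_details (
    asin TEXT, quantity INTEGER, our_price REAL
);
INSERT INTO products VALUES
    ('A1', 'Potter Book One', 'J. K. Rowling'),
    ('A2', 'Potter Book Two', 'J. K. Rowling'),
    ('A3', 'Unrelated Title', 'Someone Else');
INSERT INTO s3.d_customer_order_item_details VALUES
    ('A1', 2, 10.0), ('A1', 1, 10.0), ('A2', 5, 12.0), ('A3', 7, 9.0);
""")

# Same shape as the slide's query: 2 tables, 1 join, 2 filters,
# 1 aggregation, 2 group-by columns, 1 order by, 1 limit.
query = """
SELECT P.ASIN, P.TITLE, SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM s3.d_customer_order_item_details D, products P
WHERE D.ASIN = P.ASIN
  AND P.TITLE LIKE '%Potter%'
  AND P.AUTHOR = 'J. K. Rowling'
GROUP BY P.ASIN, P.TITLE
ORDER BY SALES_sum DESC
LIMIT 20;
"""
rows = conn.execute(query).fetchall()
for asin, title, sales in rows:
    print(asin, title, sales)
```

Each order line contributes quantity times price to its book's total, and the unrelated title is filtered out before aggregation.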

Let’s compute the sales of the prior books she’s written in this
series and return the top 20 values, just for the first three days
of sales of first editions
5 Filters
2 Joins
3 Group By columns
1 Order By
1 Limit
1 Aggregation
1 Function
2 Casts
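SELECT
P.ASIN,
P.TITLE,
P.RELEASE_DATE,
SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
s3.d_customer_order_item_details D,
asin_attributes A,
products P
WHERE
D.ASIN = P.ASIN AND
P.ASIN = A.ASIN AND
A.EDITION LIKE '%FIRST%' AND
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling' AND
D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;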

Let’s compute the sales of the prior books she’s written in this
series and return the top 20 values, just for the first three days
of sales of first editions in the city of Seattle, WA, USA
8 Filters
3 Joins
4 Group By columns
1 Order By
1 Limit
1 Aggregation
1 Function
2 Casts
SELECT
P.ASIN,
P.TITLE,
R.POSTAL_CODE,
P.RELEASE_DATE,
SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
s3.d_customer_order_item_details D,
asin_attributes A,
products P,
regions R
WHERE
D.ASIN = P.ASIN AND
P.ASIN = A.ASIN AND
D.REGION_ID = R.REGION_ID AND
A.EDITION LIKE '%FIRST%' AND
P.TITLE LIKE '%Potter%' AND
P.AUTHOR = 'J. K. Rowling' AND
R.COUNTRY_CODE = 'US' AND
R.CITY = 'Seattle' AND
R.STATE = 'WA' AND
D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;

Now let’s run that query over an exabyte of data in S3
Roughly 140 TB of customer item order detail
records for each day over the past 20 years.
190 million files across 15,000 partitions in S3.
One partition per day for USA and rest of world.
Need a billion-fold reduction in data processed.
Running this query using a 1000 node Hive cluster
would take over 5 years.*
• Compression: 5X
• Columnar file format: 10X
• Scanning with 2500 nodes: 2500X
• Static partition elimination: 2X
• Dynamic partition elimination: 350X
• Redshift’s query optimizer: 40X
Total reduction: 3.5B X
* Estimated using a 20 node Hive cluster & 1.4TB, assuming linear scaling
* Query used a 20 node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data - generated for this demo based on data
format used by Amazon Retail.
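
The figures on this slide can be checked with a few lines of arithmetic. The sketch below multiplies the listed reduction factors and recomputes the approximate data volume and partition count; it assumes 365-day years and decimal units (1 EB = 1,000,000 TB), neither of which the slide states.

```python
from math import prod

# Reduction factors as listed on the slide.
factors = {
    "compression": 5,
    "columnar file format": 10,
    "scanning with 2500 nodes": 2500,
    "static partition elimination": 2,
    "dynamic partition elimination": 350,
    "query optimizer": 40,
}
total_reduction = prod(factors.values())

# Data volume: roughly 140 TB per day over 20 years (assuming 365-day years).
days = 365 * 20
total_tb = 140 * days
total_eb = total_tb / 1_000_000  # assumption: decimal exabytes

# Partitions: one per day for each of two regions (USA, rest of world).
partitions = days * 2

print(f"total reduction: {total_reduction:,}x")
print(f"data volume:     {total_tb:,} TB (about {total_eb:.2f} EB)")
print(f"partitions:      {partitions:,}")
```

The product of the factors is 3.5 billion, matching the stated total, and the volume and partition counts come out close to the slide's rounded "exabyte" and "15,000 partitions".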

Is Amazon Redshift Spectrum useful if I don’t have an exabyte?
Your data will get bigger
• On average, data warehousing volumes grow 10x every 5 years
• The average Amazon Redshift customer doubles data each year
Amazon Redshift Spectrum makes data analysis simpler
• Access your data without ETL pipelines
• Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake
Amazon Redshift Spectrum improves availability and concurrency
• Run multiple Amazon Redshift clusters against common data
• Isolate jobs with tight SLAs from ad hoc analysis
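
To compare the two growth figures above on a common footing, the sketch below converts each to both an annual factor and a five-year factor. It assumes steady compound growth, which the slide does not state.

```python
# Assumption: both figures describe steady compound growth.
warehouse_5yr = 10.0                          # "10x every 5 years"
warehouse_annual = warehouse_5yr ** (1 / 5)   # equivalent yearly factor

redshift_annual = 2.0                         # "doubles data each year"
redshift_5yr = redshift_annual ** 5           # equivalent five-year factor

print(f"typical warehouse: {warehouse_annual:.2f}x per year, {warehouse_5yr:.0f}x in 5 years")
print(f"Redshift customer: {redshift_annual:.2f}x per year, {redshift_5yr:.0f}x in 5 years")
```

On these assumptions, yearly doubling compounds to 32x over five years, roughly three times the 10x industry average quoted alongside it.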

Alexa, Lex, Rekognition, Polly

Object and Scene Detection
Maple
Villa
Plant
Garden
Water
Swimming Pool
Tree
Potted Plant
Backyard

Rekognition: Object & Scene Detection

Rekognition: Facial Comparison

Digital Asset Management
Media and Entertainment
Travel and Hospitality
Influencer Marketing
Systems Integration
Digital Advertising
Consumer Storage
Law Enforcement
Public Safety
eCommerce
Education
Rekognition: Use Cases

QuickSight + Amazon S3, Athena
Amazon Athena

What does the customer say?
https://aws.amazon.com/solutions/case-studies/analytics/
https://aws.amazon.com/solutions/case-studies/big-data/

JustGiving Creates a Big Data Platform on AWS
“Before AWS, [we were]
basing decisions on a
single high-level data
source. Now we can extract
much more granular data
based on millions of
donations…and use that
information to provide a
better platform for our
visitors.”
-Richard Atkinson, CIO

UMUC Improves Student Outcomes with Big Data
“Nobody can match
AWS’ product set, scale
and innovation. From an
analytics perspective,
Amazon Redshift is very
disruptive.”
-Darren Catalano, VP of Analytics

FINRA Analyzes Billions of Transactions Daily
To respond to rapidly changing market dynamics,
FINRA moved 75% of its operations to Amazon Web Services,
using AWS to analyze 75 billion records a day.

Fraud Detection
FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion
trading events per day and securely store over 5 petabytes of data,
attaining savings of $10-20 million per year.

AWS is Positioned as a Leader in the Gartner Magic
Quadrant for Data Management Solutions for
Analytics
Gartner, Magic Quadrant for Data Management Solutions for Analytics, February 2017
This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS :
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research
publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of

AWS Named as a Leader in The Forrester
Wave™: Big Data Warehouse, Q2 2017
http://bit.ly/2w1TAEy
On June 15, Forrester published the Big Data
Warehouse, Q2 2017, in which AWS is
positioned as a Leader. According to Forrester,
“With more than 5,000 deployments, Amazon
Redshift has the largest data warehouse
deployments in the cloud.” AWS received the
highest score possible, 5/5, for customer base,
market awareness, ability to execute, road map,
support, and partners. “AWS’s key strengths lie
in its dynamic scale, automated administration,
flexibility of database offerings, good security,
and high availability (HA) capabilities, which
make it a preferred choice for customers.”

Vibrant partner community
Data integration
Systems integrators
Business intelligence

• AWS enables you to build sophisticated data strategies and related
analytics applications
• Retrospective, Real-time, Predictive
• You can build incrementally, adding use cases and increasing scale
as you go
• AWS provides a broad range of security and auditing features to
enable you to meet your security requirements
https://aws.amazon.com/big-data/

• Prescriptive guidance and rapidly deployable solutions.
• Derive Insights from IoT in Minutes using AWS IoT, Amazon
Kinesis Firehose, Amazon Athena, and Amazon QuickSight
• Deploying a Data Lake on AWS
• Harmonize, Search, and Analyze Loosely Coupled Datasets on
AWS with Glue, Athena and QuickSight
• From Data Lake to Data Warehouse: Enhancing Customer 360
with Amazon Redshift Spectrum
• Implement Continuous Integration and Delivery of Apache
Spark Applications using AWS
http://amzn.to/2vHIwBq
http://amzn.to/2i9gqZn
http://bit.ly/2qipA8h
http://amzn.to/2qpiFaK
http://amzn.to/2lpbc8p
Takeaways
https://aws.amazon.com/blogs/big-data/
https://aws.amazon.com/answers/big-data/
http://amzn.to/2gIJcj8
