© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Craig Stires
Head of Analytics, Big Data, AI
Asia-Pacific
Modern Data Architectures for
Business Insights at Scale
Today's workshop
2:00pm - 2:15pm Overview on using modern data architectures on AWS
2:15pm - 3:40pm Modern data architectures for business insights at scale
(Includes Live Demos)
3:40pm – 4:00pm Break
4:00pm - 5:15pm Modern data architectures for real-time analytics and engagement
(Includes Live Demos)
Overview on using modern data architectures on AWS
What is driving the requests for information?
- What information is needed?
- Where does the source data live?
- Freshness - how real-time?
What kind of persona are you serving?
- Measurable business outcome?
- Speed to access / urgency
- UI - interactive vs file vs embedded
- On-demand vs published
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
[Chart: generated data versus data available for analysis, 1990 to 2020; the difference between the two is the data volume gap]
Should we collect "all the data" and see what's in it?
Starting by amassing "all your data" and dumping
into a large repository for the data gurus to start
finding "insights" is like trying to win the lottery
Three big indicators of individual behavior
Purchases Movement Influence
A platform to build business outcomes from data
Purchases
Movement
Influence
[Diagram: platform stages are Ingest/Collect, Store, Process/analyze, and Consume/visualize]
Revenue Lift
Market acquisition
Customer delight
Brand advocacy
Inventory optimization
Supply chain efficiency
...
The AWS Cloud helps remove constraints
Big Data:
• Potentially massive datasets
• Iterative, experimental style of data manipulation and analysis
• Frequently not a steady-state workload; peaks and valleys
• Data is a combination of structured and unstructured data in many formats
AWS Cloud:
• Virtually unlimited capacity
• Iterative, experimental usage cost through on-demand infrastructure
• Fully scalable infrastructure for highly variable workloads
• Tools & Services for managing structured, unstructured and stream data
Starting small is powerful when you can scale up fast
Scaling up your analytics systems | With AWS | Traditional IT *
Get a new BI server | 20 minutes | 3 months
Upgrade your analytics server to the newest Intel processors and add 16GB memory | 15 minutes | 2 months
Add 500TB of storage | instant | 2 months
Grow a DWH cluster from 8GB to 1PB | 1 hour | 8 months
Build a 1024-node Hadoop cluster | 30 minutes | unlikely
Roll out multi-region production environment | hours | months
* actual provisioning times in a well-organized IT division
Let’s talk business outcomes of data analytics!
Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and create new digital services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven automation, fraud detection
Outcome 4 : Automate for expansive reach
• Automation of business processes and physical infrastructure
Driving Business Outcomes via Data Analytics
Modern data architectures for business insights at scale
Insights to enhance business applications, new digital services
Technology: Backend system integration, on-prem data center extension, business application integration, BI provisioning, data lakes, external APIs, access control and logging
Common initiatives
Insights: 360 view of the business
• Legacy data systems migration to enable self-service for business analysts
• Integration of all customer data, from orders, payments, interactions
• Supplier performance for inventory and vendor management
Digitization: Web-service that gives on-demand insights
• Delivery of digital content, with behavior tracking, and upsell (or ads)
• Ordering system for enterprise customers or consumers
Data monetization: Enrich, aggregate, and sell business data
• External data enrichment API, including digital marketing platforms
• Purchasable data sets of anonymized, domain-enriched insights
Outcome 1 : Modernize and Consolidate
Suncorp is moving "all-in" on cloud.
Project Ignite will extract benefits of $170 million
- Group CEO Patrick Snowball
Insurance Policy Insurance Claim Core Banking Life Admin
Modernize and consolidate
Insights to enhance business applications, new digital services
Enhancing business applications and creating new digital services takes a few
steps. The business goals are usually to be an agile, well-run organization
and to stop missing opportunities because people are making decisions
without accurate insights. These initiatives focus on giving important
personas fast and secure access to business-relevant insights.
[Diagram: Data sources, Ingest, and Serving stages with Speed (Real-time) and Scale (Batch) layers]
1. Define personas and use case requirements (including UI)
[Diagram: Data sources, Ingest, and Serving stages with Speed (Real-time) and Scale (Batch) layers. Personas: Data analysts, Business users, External buyers]
2. Locate the data sources that have the information to extract
[Diagram: data sources shown are Transactions, Web logs / cookies, and ERP]
Fluentd: Open Source Log Collection
https://github.com/fluent/fluentd/
• Fluentd is an open source data collector to unify data collection and consumption
• Integration into many data sources (App Logs, Syslogs, Twitter etc.)
• Direct integration into AWS
# Tail the Apache access log and tag each event
<source>
  @type tail
  format apache2
  path /var/log/apache2/access_log
  tag s3.apache.access
</source>
# Ship every event whose tag matches s3.*.* to the S3 bucket
<match s3.*.*>
  @type s3
  s3_bucket myweblogs
  path logs/
</match>
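The routing in the configuration above relies on tag matching: in a Fluentd match pattern, `*` stands for exactly one dot-separated tag part, so `s3.*.*` matches `s3.apache.access`. A minimal sketch of that rule follows, assuming only the single `*` wildcard; the function name `tag_matches` is hypothetical, and Fluentd's other pattern forms (such as `**`) are deliberately left out.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a dot-separated tag matches a pattern in which
    '*' stands for exactly one tag part (simplified sketch)."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    # '*' never spans several parts, so the part counts must be equal.
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))


if __name__ == "__main__":
    print(tag_matches("s3.*.*", "s3.apache.access"))  # three parts, prefix s3
    print(tag_matches("s3.*.*", "s3.apache"))         # only two parts
```

Tagging events by destination and source, as the slide does, lets one match block handle every log type bound for the same bucket.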
3. Ingest data through incremental or full loads, across secure connections
[Diagram: ingest layer shows AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, and Changed Data]
A single, large system may perform a single task well, but it is often too
difficult to adapt and scale
A decoupled system can adapt to a fast-moving business, and can scale up
and down with significantly lower barriers
Decouple Storage and Compute
Traditionally, analytical workloads required large databases or data warehouses, with storage and compute close to each other
Big Data often benefits from decoupling storage and compute
Amazon S3 offers virtually unlimited storage at a per GB/month rate
Amazon
S3
Highly available object storage
99.999999999% data durability
Replicated across 3 facilities
Virtually unlimited scale
Pay only for usage, no pre-provisioning
Event notifications to trigger actions
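To put the durability figure in perspective, here is a small arithmetic sketch. It assumes the 99.999999999% figure is read as the probability that a single object survives one year, and it uses a hypothetical inventory of 10,000,000 objects; both the function names and the object count are illustrative, not part of the original slide.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the stated assumption."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical inventory size
    print(expected_annual_losses(objects))  # expected losses per year
    print(years_per_lost_object(objects))   # years per lost object
```

Under these assumptions, ten million stored objects correspond to an expected loss of one object every 10,000 years.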
Amazon
EMR
Fully managed Hadoop
Optimized with S3
Autoscaling for elasticity
Transient and long running clusters
Integration with AWS Spot Market
1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
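The instance-hour equivalence above can be checked with simple arithmetic. The sketch below assumes a perfectly parallel workload and per-hour billing; the prices (25 cents on-demand, 8 cents Spot per instance-hour) are hypothetical placeholders, not actual AWS rates.

```python
def cluster_cost_cents(instances: int, hours: int, cents_per_instance_hour: int) -> int:
    """Total cost in cents for a cluster billed per instance-hour."""
    return instances * hours * cents_per_instance_hour


ON_DEMAND_CENTS = 25  # hypothetical on-demand price per instance-hour
SPOT_CENTS = 8        # hypothetical Spot price per instance-hour

if __name__ == "__main__":
    slow = cluster_cost_cents(1, 100, ON_DEMAND_CENTS)       # 1 instance for 100 hours
    fast = cluster_cost_cents(100, 1, ON_DEMAND_CENTS)       # 100 instances for 1 hour
    fast_spot = cluster_cost_cents(100, 1, SPOT_CENTS)       # same, at the Spot price
    print(slow, fast, fast_spot)
```

With these placeholder prices, both on-demand options cost the same while the 100-instance run finishes in one hundredth of the time, and the Spot run is cheaper as well.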
Amazon EMR
• Amazon EMR supports all common Hadoop frameworks, such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct integration with DynamoDB and S3
[Diagram: Amazon EMR with Amazon S3 and Amazon DynamoDB]
4. Use Hadoop for large scale ETL, data quality, and preparation [*EMRFS]
[Diagram: processing components shown are Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, and Amazon S3 (Clean Data)]
AWS
Glue
Managed Transform Engine
Job Scheduler
Data Catalog
Built on Apache Spark
Integrated with S3, RDS, Redshift & any
JDBC-compliant data store
5. Stage all data into centralized, highly available, durable storage for further access
[Diagram: Amazon S3 (Staged Data / Data Lake) is shown alongside Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, and Amazon S3 (Clean Data)]
Fully managed
MPP SQL database - fully relational
Optimised for analytics
Gigabytes to Petabytes
Less than 1/10th the cost of traditional
Amazon
Redshift
6. Load semi-structured into Hadoop, structured into the DWH, and application data into managed legacy application databases
[Diagram: serving layer shows Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), and Amazon RDS (Legacy Apps)]
7. Data is protected through identity and access management and logging
[Diagram: security and monitoring services shown are AWS CloudTrail, AWS IAM, Amazon CloudWatch, and AWS KMS]
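As an illustration of protecting data through identity and access management, the sketch below builds a read-only IAM policy document for one prefix of a data lake bucket. The bucket name `example-data-lake`, the prefix `clean`, and the helper name `read_only_policy` are hypothetical; the policy keys (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) follow the standard IAM JSON policy grammar.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only keys under the given prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under the prefix; no write actions.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, used for illustration only.
    print(json.dumps(read_only_policy("example-data-lake", "clean"), indent=2))
```

A policy like this could be attached to an analyst role so that analysts read cleaned data without being able to modify raw or staged data; access attempts would then appear in the audit logs.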
Fast, cloud-powered, BI service
Visualizations and ad-hoc analysis
Connectors for AWS and 3rd party sources
In-memory calculation engine (SPICE)
$9 per user per month
Amazon
QuickSight
AWS Marketplace
• Pre-Configured machine images ready to be launched into virtual server instances
• Launch applications with 1-Click
• Pay software licenses by the hour or bring your own license (BYOL)
8. Data analysts use BI tools of choice to access all serving services
[Diagram: Amazon QuickSight is added for data analysts]
9. Business users have enterprise applications enhanced by analytics
[Diagram: serving layer shows Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), and Amazon QuickSight]
10. External parties can buy services or data in a governed, secure way
[Diagram: Amazon API Gateway is added for external buyers]
[Diagram: complete Modernize and consolidate architecture]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Processing and storage: Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, Amazon S3 (Clean Data), Amazon S3 (Staged Data / Data Lake)
Serving: Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), Amazon QuickSight, Amazon API Gateway
Personas: Data analysts, Business users, External buyers
Security and monitoring: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
Personalization, demand forecasting, risk analysis
Technology: Advanced analytics, customer segmentations, high volume transactional data, un/semi-structured data, design of experiment, A/B & hypothesis testing, machine learning
Common initiatives
Personalization: Refine market approaches based on optimal segments
• Offer products to new customers based on clusters of similar individuals
• Launch share of wallet initiatives, understanding likely total spend
• Targeted marketing to capture interests and increase conversion rates
Predict demand: Guide business owners to select the best scenarios
• Launch items or promotions at the optimal time to maximize response
• Modeling for store assortment, product selection, and merchandizing
• New product design, based on known market propensities
Risk measurement: Create freedom to act by quantifying exposures
• Scenario simulation to encourage investments and new offerings
• Supply chain analytics allows for faster confirmation of goods to customers
Outcome 2 : Innovate for new revenues
Innovate for new revenues
Personalization, demand forecasting, risk analysis
Net new revenues come from business teams that have access to skilled
analysts and to platforms that can scale up and out without IT
bottlenecks. Organizations start operating based on what they know about
their customers, and can approach new ventures in terms of confidence
levels. Product launches, campaigns, supply chain management, packaged
services, and customized offerings are designed and executed based on
predictive models.
[Diagram: Data sources, Ingest, and Serving stages with Speed (Real-time) and Scale (Batch) layers]
1. Personas involved in generating new revenues are data scientists, data analysts (often embedded), business users, and customers/suppliers
[Diagram: personas shown are Data analysts, Data scientists, Business users, and Engagement platforms; security and monitoring services are AWS CloudTrail, AWS IAM, Amazon CloudWatch, and AWS KMS]
2. Advanced analytics are built from a base of traditional data processing
[Diagram: data sources are Transactions and ERP; ingest is AWS Direct Connect; processing and storage are Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, Amazon S3 (Clean Data), and Amazon S3 (Staged Data / Data Lake); serving is Amazon EMR, Amazon Redshift, and Amazon RDS]
3. On-premise storage and databases are connected and converted
[Diagram: AWS Database Migration Service and AWS Storage Gateway are added to the ingest layer]
4. Internet-native data sources, like web and mobile, are captured
[Diagram: Web logs / cookies are added as a data source, and Internet Interfaces are added to the ingest layer]
Stream in Real Time: Amazon Kinesis
• Real-time data processing over large distributed streams
• Elastic capacity that scales to millions of events per second
• React in real time to incoming stream events
• Reliable stream storage replicated across 3 facilities
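One way to picture how a stream spreads load is the partition key: Kinesis hashes each record's partition key with MD5 into a 128-bit integer, and that value selects the shard whose hash key range contains it. The sketch below assumes the shards split the hash key space evenly, which need not hold after resharding; the function name `shard_for_key` and the sample keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit integers


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the remainder at the top of the space falls into the last shard.
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    for key in ("device-001", "device-002", "user-42"):  # hypothetical keys
        print(key, "->", shard_for_key(key, shard_count=4))
```

Because the mapping is deterministic, all events with the same key (for example, one device) land on the same shard and keep their order, while different keys spread across shards.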
5. Streaming un/semi-structured data feeds, like social and devices, are captured
[Diagram: Connected devices and Social media are added as data sources, and Amazon Kinesis is added to the ingest layer]
6. Log files and other schemaless data converted to Parquet and staged
[Diagram: Amazon S3 (Schemaless) is added]
Interactive query service to analyze data
in Amazon S3 directly using standard SQL
No need to move data
No infrastructure to set up & manage
Fast -- results within seconds
Pay for only the queries you run
Amazon
Athena
7. Data analysts explore and visualize un/semi-structured data
[Diagram: Amazon Elasticsearch and Amazon Athena are shown alongside Amazon EMR, Amazon Redshift, and Amazon RDS]
Amazon Machine Learning
• Easy to use, managed machine learning service built for developers
• Machine learning technology based on Amazon’s internal systems
• Create models using data stored in Amazon S3, Amazon RDS or Amazon Redshift
• Request predictions in batch or real time
8. Simple analytical models are built against Amazon Machine Learning
[Diagram: Amazon Machine Learning is added]
Apache Spark
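• In-memory analytics cluster using RDD (Resilient Distributed Dataset) for fast processing
• Spark MLlib offers machine learning out of the box
• Apache Spark can read directly from Amazon S3
# Assumes an existing SparkContext named sc, as in the pyspark shell
from numpy import array
from pyspark.mllib.clustering import KMeans, KMeansModel
data = sc.textFile("s3://...")  # one space-separated numeric vector per line
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
# Train k-means with k=2 clusters, then save and reload the model
model = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")
model.save(sc, "MyModel")
sameModel = KMeansModel.load(sc, "MyModel")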
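To show what the clustering step computes, here is a minimal pure-Python sketch of the k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not Spark's implementation: it runs on a single machine, and it takes fixed starting centroids instead of the random initialization used above so that the result is reproducible. The function name `kmeans` and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, initial_centroids, max_iterations=10):
    """Run k-means from the given starting centroids and return the final centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged, nothing moved
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of points.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 9.0)]))
```

On a cluster, the assignment step is the part that parallelizes across partitions of the data, which is why the approach scales with the number of nodes.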
Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of transactions that are independent to accelerate throughput
P state & C state control – Provides granular performance tuning for cores and sleep states to improve overall application performance
New X1 Instance - Tons of Memory
• Designed for large-scale, in-memory applications in the cloud
• Ideal for in-memory databases like SAP HANA and big data processing apps like Spark and Presto
• Powered by Intel® Xeon® E7 8880 v3 Haswell processors
• Features up to 2TB of memory and up to 128 vCPUs per instance
• 8X the memory offered by any other Amazon EC2 instance
Machine Learning Algorithms
• Classification
• Sentiment analysis – Do people like my new product?
• Linear Regression
• Trend prediction – How much revenue next month?
• Clustering
• Recommendation - Other people bought this!
• Association
• Market basket analysis – Bundled products
• Neural Networks
• Pattern recognition - Speech recognition
Amazon Machine Learning
Amazon EMR + Spark MLlib
GPU Optimized EC2 Instance
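As a concrete instance of the trend prediction question ("How much revenue next month?"), the sketch below fits an ordinary least-squares line to monthly revenue and extrapolates one month ahead. The revenue figures are hypothetical and perfectly linear to keep the arithmetic easy to follow; the function name `fit_line` is also an illustration, not an API from any of the services listed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    months = [1, 2, 3, 4, 5]
    revenue = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical revenue figures
    slope, intercept = fit_line(months, revenue)
    forecast = slope * 6 + intercept  # extrapolate to month 6
    print(slope, intercept, forecast)
```

With this data the fitted slope is 10 per month and the intercept is 90, giving a forecast of 150 for month 6. Real revenue series would need seasonality and error estimates, which this sketch leaves out.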
9. Complex analytical models are built against EMR (Spark) clusters
[Diagram: Amazon EMR (MLlib) is added next to Amazon ML]
10. Deep learning models are built against MXNet clusters
[Diagram: Deep Learning is added next to Amazon EMR (MLlib) and Amazon ML]
11. Predictive models and scored datasets are published to data staging
[Diagram: modeling components shown are Amazon EMR (MLlib), Deep Learning, and Amazon ML, alongside Amazon S3 (Staged Data / Data Lake)]
12. Analysts use DWH, EMR, ES to find patterns & measure performance
[Diagram: serving layer shows Amazon EMR, Amazon Elasticsearch, Amazon Redshift, Amazon RDS, and Amazon Athena]
13. Risk models evaluated to create new products and assess customers
[Diagram: full architecture as in the preceding steps]
14. Demand forecasts loaded into supply chain management systems
[Diagram: full architecture as in the preceding steps]
Amazon SNS & Amazon Pinpoint
• Amazon SNS is a fully managed, cross-platform mobile push intermediary service
• Fully scalable to millions of devices
• Amazon Pinpoint lets you create targeted campaigns and measure engagement and results
Amazon SNS delivers through each platform's push service:
Apple APNS: Apple iPhones and iPads (iOS)
Google GCM: Android Phones and Tablets
Amazon ADM: Kindle Fire Devices
Windows WNS and MPNS: Windows Phone Devices
Baidu CP: Android Phones and Tablets in China
15. Personalized offers are broadcast out over notification channels
[Diagram: Amazon SNS and Amazon Pinpoint are added for engagement platforms]
[Diagram: complete Innovate for new revenues architecture]
Data sources: Transactions, ERP, Web logs / cookies, Social media, Connected devices
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Amazon Kinesis
Processing and storage: Amazon S3 (Raw Data), Amazon EMR (ETL), AWS Glue, Amazon S3 (Clean Data), Amazon S3 (Schemaless), Amazon S3 (Staged Data / Data Lake)
Analytics and serving: Amazon EMR, Amazon Elasticsearch, Amazon Redshift, Amazon RDS, Amazon Athena, Amazon EMR (MLlib), Deep Learning, Amazon ML, Amazon SNS, Amazon Pinpoint
Personas: Data analysts, Data scientists, Business users, Engagement platforms
Security and monitoring: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
See it live in action!
BREAK
Next up: Real-Time Analytics and Engagement
Modern data architectures for real-time analytics and engagement
Interactive customer experience, event-driven automation, fraud detection
Technology: Clickstream/mobile apps/sensor/video (computer vision)/audio (intent comprehension), event
detection and pipelining, in-line scoring, serverless compute, computer vision, deep learning
Common initiatives
Interactive CX: Natural customer journeys with adaptive interfaces
• Behavior-based recommendations, improving personalization along the journey
• Seamless session transfer across UI, from browser to mobile to physical location
• Voice-driven commands, and use of gestures and other natural interfaces
Event-driven automation: Full execution of business process driven by an action
• Order fulfillment, with real-time update notifications to customer
• Fast response to customer complaints/comments over direct or social channels
Fraud detection: Protect customer and business with real-time anomaly detection
• Purchase and payment verification, using behavioral models and location assessment
• Application and account opening validation
Outcome 3 : Real-time Engagement
Personalized content
- Account access
- Track spending
- Check balances
- Pay bills
- Prevent fraud
The Power of Speech: Alexa
Alexa, the voice service that powers Echo, provides capabilities, or skills, that enable customers to interact with devices using voice
Alexa Skills Kit (ASK) allows everyone to build and publish their own skills
Skills can be powered by AWS Lambda
Automated Speech Recognition (ASR)
Natural Language Processing (NLP)
Alexa Skills Kit (ASK)
Over 80 services, including Core, Security, Database,
Artificial Intelligence, Analytics, Mobile Development
Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
Provide superior customer service by responding to opportunities in real
time. Fulfill requests for products or services in an automated fashion to
create a strong competitive advantage over those that cannot.
Assurance becomes a different challenge when speeds increase: fraud
prevention must be adaptive and fast. Adding another layer of opportunity and
complexity is the use of vast streams of data from devices that are
measuring location, video, behaviors, environmental conditions, and more.
[Diagram: Data sources, Ingest, and Serving stages with Speed (Real-time) and Scale (Batch) layers]
1. Real-time engagement requires personas that develop the analytics, and platforms for engaging and automating processes
[Diagram: personas and platforms shown are Data analysts, Data scientists, Business users, Engagement platforms, and Automation / events; security and monitoring services are AWS CloudTrail, AWS IAM, Amazon CloudWatch, and AWS KMS]
2. Real-time systems are built from a base of advanced data processing
[Diagram: the Innovate for new revenues architecture, with Automation / events added alongside Engagement platforms]
3. Events are pipelined through Kinesis, into multiple streams, at scale
[Architecture diagram repeated, with a second Amazon Kinesis component added.]
Also possible with Spark Streaming!
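[Diagram: Amazon Kinesis, EMR with Spark Streaming]
Counting tweets on a sliding window:
    // Simplified for the slide: the real KinesisUtils.createStream also takes the
    // StreamingContext, endpoint, region and storage level, and countByWindow
    // takes both a window duration and a slide duration.
    KinesisUtils.createStream("twitter-stream")
      .filter(_.getText.contains("Big Data"))
      .countByWindow(Seconds(5))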
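The sliding-window count can be sketched without a cluster. The following is a minimal, plain-Python illustration of the idea, not Spark's implementation: the function name `count_by_window`, the sample tweets, and the per-event emission are all assumptions made for the example (Spark Streaming emits one count per slide interval instead).

```python
from collections import deque


def count_by_window(events, window_seconds, keyword):
    """Yield (timestamp, count) after each event, where count is the number
    of matching events whose timestamp lies in the trailing window."""
    window = deque()  # timestamps of matching events still inside the window
    for ts, text in events:
        if keyword in text:
            window.append(ts)
        # Evict timestamps that have fallen out of the trailing window.
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)


# Hypothetical (timestamp_seconds, tweet_text) pairs, ordered by time.
tweets = [
    (0, "Big Data on AWS"),
    (1, "lunch time"),
    (3, "Big Data is everywhere"),
    (6, "more Big Data"),
    (12, "unrelated"),
]
counts = list(count_by_window(tweets, window_seconds=5, keyword="Big Data"))
```

Keeping only the timestamps that are still inside the window means each event is appended once and evicted once, so the cost per event stays constant on average.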
4. Event data is given context and structure in EMR and pushed for batch
[Architecture diagram repeated, with Amazon S3 (Stream Data) and an additional Amazon EMR component added.]
Amazon Kinesis Firehose
• Fully managed data streaming service to ingest and capture data into your storage or data warehouse
• Ability to batch load, compress, or encrypt streaming data
• Elastic to scale to any throughput (no more sharding)
• Charged only per GB processed ($0.035 per GB)
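To make the "batch load, compress" and per-GB pricing bullets concrete, here is a toy, in-memory sketch. It does not call any AWS API: `BufferedDelivery`, `max_bytes`, and the sample records are hypothetical stand-ins, and the price is simply the figure quoted on the slide, which may no longer be current.

```python
import gzip


class BufferedDelivery:
    """Toy stand-in for a delivery stream: buffer records, then flush a
    gzip-compressed batch once the buffer reaches max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.buffer = []
        self.buffered_bytes = 0
        self.delivered = []  # stand-in for objects written to storage

    def put_record(self, record: bytes):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        batch = b"\n".join(self.buffer) + b"\n"
        self.delivered.append(gzip.compress(batch))
        self.buffer = []
        self.buffered_bytes = 0


def ingestion_cost_usd(gigabytes, price_per_gb=0.035):
    """Cost at the per-GB price quoted on the slide (assumed, may be dated)."""
    return gigabytes * price_per_gb


stream = BufferedDelivery(max_bytes=32)
for i in range(5):
    stream.put_record(f'{{"event_id": {i}}}'.encode())
stream.flush()  # deliver whatever is left in the buffer
```

A managed service would typically also flush on a time interval, which this sketch leaves out for brevity.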
5. Kinesis Firehose pumps events into a DWH for near real-time analysis
[Architecture diagram repeated, with Amazon Kinesis Firehose added.]
AWS Lambda
• Use AWS Lambda to clean and massage incoming data
• Write code that automatically loads data from sources (S3, DynamoDB) into your data warehouse (e.g. Amazon Redshift)
• React in real time to incoming events in Amazon Kinesis
[Diagram: Amazon Kinesis, AWS Lambda, Amazon Redshift]
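A rough sketch of the "clean and massage incoming data" bullet as a Lambda-style handler. Kinesis event sources deliver each record's payload base64-encoded under `kinesis.data`; everything else here (the `clean` rule, the field names, the sample event) is invented for illustration, and the warehouse load is left as a comment.

```python
import base64
import json


def clean(record: dict) -> dict:
    """Hypothetical cleaning rule: trim strings and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def handler(event, context):
    """Entry point in the shape Lambda uses for Kinesis event sources."""
    rows = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(clean(json.loads(payload)))
    # A real function would load `rows` into the data warehouse here.
    return {"cleaned": rows}


def _encode(obj):
    """Build a base64 payload the way it appears in a Kinesis event."""
    return base64.b64encode(json.dumps(obj).encode()).decode()


sample_event = {"Records": [
    {"kinesis": {"data": _encode({"user": " alice ", "note": ""})}},
    {"kinesis": {"data": _encode({"user": "bob", "amount": 12.5})}},
]}
result = handler(sample_event, None)
```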
6. The event is streamed to a scoring server for processing
[Architecture diagram repeated, with Event Scoring and AWS Lambda added.]
Artificial Intelligence
Amazon Polly
Turn text into lifelike speech using deep learning technologies to synthesize
speech that sounds like a human voice
• Unlimited replays
• Returns an MP3 or audio stream
• Lightning-fast response
• Fully managed and low cost
Amazon Polly: Text In, Life-like Speech Out
Input text: “The temperature in WA is 75°F”
Spoken as: “The temperature in Washington is 75 degrees Fahrenheit”
Amazon Lex
Conversational interfaces for your
applications, powered by the same
Natural Language Understanding
(NLU) & Automatic Speech Recognition
(ASR) models as Alexa
• Integrated development in the AWS console
• Trigger AWS Lambda functions
• Multi-step conversations
• Continually improving ASR & NLU models
• Enterprise connectors
• Fully managed
Intents: A particular goal that the user wants to achieve
Utterances: Spoken or typed phrases that invoke your intent
Slots: Data the user must provide to fulfill the intent
Prompts: Questions that ask the user to input data
Fulfillment: The business logic required to fulfill the user’s intent
Example: BookHotel
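The relationship between intents, utterances, slots, prompts, and fulfillment can be sketched as plain data structures. This is only an illustration of the concepts, not the Amazon Lex API: the `Slot` and `Intent` classes, the slot names, and the sample utterances are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    prompt: str  # question asked while the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: list
    slots: list
    values: dict = field(default_factory=dict)

    def next_prompt(self):
        """Return the prompt for the first unfilled slot, or None."""
        for slot in self.slots:
            if slot.name not in self.values:
                return slot.prompt
        return None

    def fulfill(self):
        """Stand-in for the business logic (for example a Lambda function)."""
        if self.next_prompt() is not None:
            raise ValueError("intent still has unfilled slots")
        return f"{self.name} confirmed: {self.values}"


book_hotel = Intent(
    name="BookHotel",
    utterances=["book a hotel", "I need a room"],
    slots=[
        Slot("Location", "Which city?"),
        Slot("CheckInDate", "What day do you want to check in?"),
        Slot("Nights", "How many nights?"),
    ],
)
book_hotel.values["Location"] = "Singapore"
```

A multi-step conversation is then a loop: ask `next_prompt()` until it returns `None`, then call `fulfill()`.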
Amazon Rekognition
Image recognition and analysis powered by deep learning, which lets you
search, verify, and organize millions of images
• Easy to use
• Batch analysis
• Real-time analysis
• Continually improving
• Low cost
Example labels detected in an image: Maple, Villa, Plant, Garden, Water, Swimming Pool, Tree, Potted Plant, Backyard
Facial analysis returns: Demographic Data, Facial Landmarks, Sentiment Expressed, Image Quality (Brightness: 25.84, Sharpness: 160), General Attributes
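Label detection returns each label with a confidence score, so downstream code usually filters on a threshold before acting. A small sketch follows, assuming a response shaped like a label-detection result (`Labels` entries with `Name` and `Confidence`); the confidence values and the 80.0 threshold are invented for the example.

```python
def labels_above(response: dict, min_confidence: float) -> list:
    """Return label names at or above min_confidence, highest first."""
    kept = [
        label for label in response["Labels"]
        if label["Confidence"] >= min_confidence
    ]
    kept.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"] for label in kept]


# Response shaped like a label-detection result; confidences are made up.
sample_response = {"Labels": [
    {"Name": "Villa", "Confidence": 97.2},
    {"Name": "Swimming Pool", "Confidence": 91.5},
    {"Name": "Maple", "Confidence": 62.0},
    {"Name": "Potted Plant", "Confidence": 88.3},
]}
confident = labels_above(sample_response, min_confidence=80.0)
```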
7. Language, intent, and image processing are run and sent for scoring
[Architecture diagram repeated, with Amazon AI added.]
8. Simple analytical models are checked on-demand against Amazon ML
[Architecture diagram repeated.]
9. Complex analytical models are scored against coded models (PMML)
[Architecture diagram repeated.]
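Scoring an event against a coded model amounts to applying exported parameters to the event's features. The sketch below uses a logistic regression held in a plain dictionary, similar in spirit to what a regression model stored as PMML would carry; the feature names, weights, and intercept are hypothetical, and no PMML parsing is shown.

```python
import math

# Hypothetical exported model: an intercept plus one weight per feature.
MODEL = {
    "intercept": -3.0,
    "weights": {"amount_usd": 0.004, "new_device": 1.5, "night_time": 0.8},
}


def score(event: dict, model: dict = MODEL) -> float:
    """Return a probability-like score from a logistic regression."""
    z = model["intercept"]
    for feature, weight in model["weights"].items():
        # Missing features contribute nothing to the score.
        z += weight * float(event.get(feature, 0.0))
    return 1.0 / (1.0 + math.exp(-z))


low_risk = score({"amount_usd": 20, "new_device": 0, "night_time": 0})
high_risk = score({"amount_usd": 900, "new_device": 1, "night_time": 1})
```

Because the model is just data, it can be retrained in the batch layer and swapped into the scoring function without redeploying code.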
10. Deep learning models are scored against imported models (e.g. JSON)
[Architecture diagram repeated.]
11. Scored response to the event is processed to be pushed for action
[Architecture diagram repeated, with a second AWS Lambda component added.]
12. Recommendations are pushed to DynamoDB for low latency serving
[Architecture diagram repeated, with Amazon DynamoDB added.]
13. Actions are pushed to RDS and SQS for business process automation
[Architecture diagram repeated, with Amazon SQS added alongside Amazon DynamoDB.]
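The final steps split a scored response two ways: recommendations go to a low-latency store for serving, and actions go to a queue for business process automation. A minimal in-memory sketch of that routing is shown here; the dictionary and deque are stand-ins for a key-value table and a message queue, and the threshold, field names, and "review" action are assumptions.

```python
from collections import deque

recommendations = {}    # stand-in for a low-latency key-value table
action_queue = deque()  # stand-in for a message queue


def route(scored_event: dict, action_threshold: float = 0.8):
    """Store the recommendation for serving, and enqueue an action when the
    score crosses the (hypothetical) threshold."""
    customer_id = scored_event["customer_id"]
    recommendations[customer_id] = scored_event["recommendation"]
    if scored_event["score"] >= action_threshold:
        action_queue.append({"customer_id": customer_id, "action": "review"})


route({"customer_id": "c-1", "score": 0.35, "recommendation": "offer-a"})
route({"customer_id": "c-2", "score": 0.92, "recommendation": "offer-b"})
```

Decoupling the two paths lets the serving store answer reads immediately while queued actions are processed at their own pace.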
[Architecture diagram repeated in full, without a step caption.]
See it live in action!
Outcome 4: Automate for expansive reach
Automation of self-service, deployment, policy, and quality assurance
Technology: Self-service, on-demand provisioning, DevOps, Spot pricing, AWS CloudFormation, security
automation, performance monitoring (Amazon CloudWatch and AWS X-Ray), global rollouts
Common initiatives
Self-service:
• Application catalog or portal for all employees, availability determined by role
• Service provisioning backed by automation of policy and governance
Agile development: Use of DevOps so that a small team can deploy globally
• CI/CD for software release, build/test, and deployment automation
• Templated infrastructure provisioning and configuration management
• Business rules and policies are "gold coded" to be used for all deployments
• Use of Security by Design (SbD) to codify network, O/S, and encryption
Comprehensive monitoring: Assurance of SLA and issue remediation
• Logging and monitoring of all API calls and executions to ensure SLAs are met
• Analysis of performance variance for faster root cause analysis
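The monitoring bullets (checking SLAs and analyzing performance variance) can be illustrated with a few lines of standard-library Python. This is a simplified sketch, not a monitoring product: the `sla_report` function, the 300 ms SLA, the latencies, and the "more than k standard deviations from the mean" rule are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def sla_report(latencies_ms, sla_ms, k=2.0):
    """Summarize call latencies: SLA breaches, plus calls that deviate from
    the mean by more than k standard deviations."""
    avg = mean(latencies_ms)
    spread = pstdev(latencies_ms)
    return {
        "breaches": [x for x in latencies_ms if x > sla_ms],
        "outliers": [x for x in latencies_ms if abs(x - avg) > k * spread],
    }


# Hypothetical API call latencies in milliseconds.
report = sla_report([100, 110, 95, 105, 102, 98, 500], sla_ms=300)
```

In practice the breach list would drive alerting, while the outlier list is a starting point for root cause analysis.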
[Architecture diagram repeated under the heading "Automate for expansive reach", with AWS DevOps added.]
Next steps for you to be the Big Data hero
Sharpen your skills (Singapore)
Attend the official AWS Training courses organized by the AWS Authorized local
training partner, Bespoke Training Services (www.bespoketraining.com).
Join the AWS Jumpstart (2 hr) session and hear from our customers and
partners on how they enabled their teams and successfully deployed on
AWS. You also stand a chance to win a free seat in these courses.
Point of contact: Gilbert Cheo (gilbert@bespoketraining.com)
Course                      Date
Architecting on AWS         28 Feb-2 Mar / 14-16 March
System Operations on AWS    22-24 Feb
Developing on AWS           4-6 April
Big Data on AWS             4-6 April
Date Venue
AWS Singapore, Church Street, Capital Square,
#10-01, Singapore 049481
http://bit.ly/summitsg
April 11 – Marina Bay Sands - Singapore
Register Now!
You, the Big Data hero!
Start with the persona
De-couple to scale
Experiment and iterate
Deploy with automation
Thank You!
Next up: Q&A

 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Modern Data Architectures for Business Outcomes

  • 1. Modern Data Architectures for Business Insights at Scale. Craig Stires, Head of Analytics, Big Data, AI, Asia-Pacific. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. Today's workshop
      2:00pm - 2:15pm: Overview on using modern data architectures on AWS
      2:15pm - 3:40pm: Modern data architectures for business insights at scale (includes live demos)
      3:40pm - 4:00pm: Break
      4:00pm - 5:15pm: Modern data architectures for real-time analytics and engagement (includes live demos)
  • 3. Overview on using modern data architectures on AWS
  • 4. What is driving the requests for information?
      - What information is needed?
      - Where does the source data live?
      - Freshness: how real-time?
      What kind of persona are you serving?
      - Measurable business outcome?
      - Speed to access / urgency
      - UI: interactive vs file vs embedded
      - On-demand vs published
  • 5. Should we collect "all the data" and see what's in it? [Chart of data volume, 1990 to 2020: generated data vs. data available for analysis, with the gap between them.] Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares.
  • 6. Starting by amassing "all your data" and dumping it into a large repository for the data gurus to start finding "insights" is like trying to win the lottery.
  • 7. Three big indicators of individual behavior: purchases, movement, influence
  • 8. A platform to build business outcomes from data. [Diagram: purchases, movement, and influence data pass through four stages (ingest/collect, store, process/analyze, consume/visualize) to drive outcomes: revenue lift, market acquisition, customer delight, brand advocacy, inventory optimization, supply chain efficiency, ...]
  • 9. The AWS Cloud helps remove constraints
  • 10. Big Data:
      - Potentially massive datasets
      - Iterative, experimental style of data manipulation and analysis
      - Frequently not a steady-state workload; peaks and valleys
      - Data is a combination of structured and unstructured data in many formats
      AWS Cloud:
      - Virtually unlimited capacity
      - Iterative, experimental usage cost through on-demand infrastructure
      - Fully scalable infrastructure for highly variable workloads
      - Tools & services for managing structured, unstructured and stream data
  • 11. Starting small is powerful, when you can scale up fast
      Scaling up your analytics systems (With AWS vs. Traditional IT*):
      - Get a new BI server: 20 minutes vs. 3 months
      - Upgrade your analytics server to the newest Intel processors and add 16GB memory: 15 minutes vs. 2 months
      - Add 500TB of storage: instant vs. 2 months
      - Grow a DWH cluster from 8GB to 1PB: 1 hour vs. 8 months
      - Build a 1024-node Hadoop cluster: 30 minutes vs. unlikely
      - Roll out multi-region production environment: hours vs. months
      * Actual provisioning times in a well-organized IT division
  • 12. Let’s talk business outcomes of data analytics!
  • 13. Driving Business Outcomes via Data Analytics
      - Outcome 1: Modernize and consolidate. Insights to enhance business applications and create new digital services
      - Outcome 2: Innovate for new revenues. Personalization, demand forecasting, risk analysis
      - Outcome 3: Real-time engagement. Interactive customer experience, event-driven automation, fraud detection
      - Outcome 4: Automate for expansive reach. Automation of business processes and physical infrastructure
  • 14. Modern data architectures for business insights at scale
  • 15. Outcome 1: Modernize and Consolidate
      Insights to enhance business applications, new digital services
      Technology: backend system integration, on-prem data center extension, business application integration, BI provisioning, data lakes, external APIs, access control and logging
      Common initiatives
      - Insights: 360 view of the business
          - Legacy data systems migration to enable self-service for business analysts
          - Integration of all customer data, from orders, payments, interactions
          - Supplier performance for inventory and vendor management
      - Digitization: web service that gives on-demand insights
          - Delivery of digital content, with behavior tracking, and upsell (or ads)
          - Ordering system for enterprise customers or consumers
      - Data monetization: enrich, aggregate, and sell business data
          - External data enrichment API, including digital marketing platforms
          - Purchasable data sets of anonymized, domain-enriched insights
  • 16. Suncorp is moving "all-in" on cloud. Project Ignite will extract benefits of $170 million. (Group CEO Patrick Snowball) [Systems shown: Insurance Policy, Insurance Claim, Core Banking, Life Admin]
  • 17. Modernize and consolidate: insights to enhance business applications, new digital services. Enhancing business applications and creating new digital services takes a few steps. Business goals often consist of being an agile, well-run organization and no longer missing opportunities because people are making decisions without accurate insights. These initiatives focus on giving important personas fast and secure access to business-relevant insights. [Diagram frame: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving]
  • 18. Modernize and consolidate, step 1: Define personas and use case requirements (including UI). [Diagram adds personas: business users, external buyers, data analysts]
  • 19. Modernize and consolidate, step 2: Locate the data sources that have the information to extract. [Diagram adds data sources: transactions, web logs / cookies, ERP]
  • 20. Fluentd: Open Source Log Collection (https://github.com/fluent/fluentd/)
      - Fluentd is an open source data collector to unify data collection and consumption
      - Integration into many data sources (app logs, syslogs, Twitter, etc.)
      - Direct integration into AWS

        <source>
          type tail
          format apache2
          path /var/log/apache2/access_log
          tag s3.apache.access
        </source>
        <match s3.*.*>
          type s3
          s3_bucket myweblogs
          path logs/
        </match>
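To make the collection step more concrete, here is a minimal Python sketch of turning one Apache access-log line into a structured record, roughly the kind of parsing a log collector performs before shipping events. The regular expression, the field names, and the sample line are illustrative assumptions, not Fluentd's actual implementation.

```python
import re
from datetime import datetime

# Pattern for the Apache "combined" log format; group names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_access_line(line):
    """Turn one access-log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["code"] = int(record["code"])
    # Apache writes "-" when no body bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up sample line for illustration only.
sample = ('203.0.113.7 - - [10/Oct/2017:13:55:36 +0800] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/start" "Mozilla/5.0"')
record = parse_access_line(sample)
```

Once each line is a record with typed fields, downstream steps can filter, aggregate, or convert it to a columnar format without re-parsing text.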
  • 21. Modernize and consolidate, step 3: Ingest data through incremental or full loads, across secure connections. [Diagram adds the ingest layer: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet interfaces; label: Changed Data]
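The difference between a full load and an incremental load can be sketched with a simple "high-water mark" on a last-modified column. This is a generic illustration under stated assumptions: the `updated_at` field, the sample orders, and the function name are hypothetical, and this is not how AWS Database Migration Service captures changes internally.

```python
from datetime import datetime


def incremental_extract(rows, last_watermark):
    """Return the rows changed after last_watermark, plus the new watermark."""
    changed = [row for row in rows if row["updated_at"] > last_watermark]
    new_watermark = max(
        (row["updated_at"] for row in changed), default=last_watermark
    )
    return changed, new_watermark


# Hypothetical source table.
orders = [
    {"order_id": 1, "updated_at": datetime(2017, 5, 1, 9, 0)},
    {"order_id": 2, "updated_at": datetime(2017, 5, 2, 14, 30)},
    {"order_id": 3, "updated_at": datetime(2017, 5, 3, 8, 15)},
]

# Full load: start from the earliest possible watermark, so every row qualifies.
full_batch, watermark = incremental_extract(orders, datetime.min)

# Later, one order is updated and one is new; only those two are extracted.
orders[0]["updated_at"] = datetime(2017, 5, 4, 10, 0)
orders.append({"order_id": 4, "updated_at": datetime(2017, 5, 4, 11, 0)})
delta_batch, watermark = incremental_extract(orders, watermark)
```

The full load moves every row once; each later run moves only what changed since the stored watermark, which keeps transfer volumes small on recurring loads.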
  • 22. A single, large system may perform a single task well, but is often too difficult to adapt and scale
  • 23. A system that is decoupled can adapt to a fast-moving business, and can scale up and down with significantly lower barriers
  • 24. Decouple Storage and Compute
      - Traditionally, analytical workloads required large databases or data warehouses, with storage and compute close to each other
      - Big Data often benefits from decoupling storage and compute
      - Amazon S3 offers virtually unlimited storage at a per GB/month rate
  • 25. Amazon S3
      - Highly available object storage
      - 99.999999999% data durability
      - Replicated across 3 facilities
      - Virtually unlimited scale
      - Pay only for usage, no pre-provisioning
      - Event notifications to trigger actions
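As a back-of-the-envelope reading of the durability figure, the sketch below treats 99.999999999% as an average annual per-object durability and computes the expected number of objects lost per year. The object count is an arbitrary example, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# 99.999999999% ("eleven nines"), read as average annual durability per object.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_RATE = 1 - DURABILITY  # one in 10^11 per object per year


def expected_annual_losses(object_count):
    """Expected number of objects lost in one year for a given object count."""
    return object_count * ANNUAL_LOSS_RATE


objects = 10_000_000  # example: ten million stored objects
losses_per_year = expected_annual_losses(objects)
years_per_lost_object = 1 / losses_per_year
```

Under this reading, ten million objects give an expected loss of 1/10,000 of an object per year, or about one object every 10,000 years.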
  • 26. Amazon EMR
      - Fully managed Hadoop
      - Optimized with S3
      - Autoscaling for elasticity
      - Transient and long running clusters
      - Integration with AWS Spot Market
  • 27. 1 instance x 100 hours = 100 instances x 1 hour (and with Spot Pricing not only faster but also cheaper)
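The arithmetic behind this slide can be written out as a small cost model. The hourly prices below are hypothetical placeholders (in cents, to keep the arithmetic exact), and the model assumes the job parallelizes perfectly across instances, which real workloads only approximate.

```python
def job_cost_cents(instances, hours_each, price_cents_per_hour):
    """Total cost when every instance is billed for hours_each hours."""
    return instances * hours_each * price_cents_per_hour


ON_DEMAND_CENTS = 40  # hypothetical on-demand price per instance-hour
SPOT_CENTS = 10       # hypothetical Spot price per instance-hour

# Same 100 instance-hours of work, arranged two ways.
serial_cost = job_cost_cents(instances=1, hours_each=100,
                             price_cents_per_hour=ON_DEMAND_CENTS)
parallel_cost = job_cost_cents(instances=100, hours_each=1,
                               price_cents_per_hour=ON_DEMAND_CENTS)
parallel_spot_cost = job_cost_cents(instances=100, hours_each=1,
                                    price_cents_per_hour=SPOT_CENTS)

serial_hours_elapsed = 100
parallel_hours_elapsed = 1
```

With these placeholder numbers the serial and parallel runs cost the same, the parallel run finishes in 1 hour instead of 100, and the Spot-priced run is also cheaper.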
  • 28. Amazon EMR
      - Supports all common Hadoop frameworks, such as Spark, Pig, Hive, Hue, Oozie, HBase, Presto, Impala, ...
      - Decouples storage from compute
      - Allows independent scaling
      - Direct integration with DynamoDB and S3
      [Diagram: Amazon S3, Amazon DynamoDB, Amazon EMR]
  • 29. Modernize and consolidate, step 4: Use Hadoop for large scale ETL, data quality, and preparation. [Diagram adds the batch path: Amazon S3 (raw data), Amazon EMR (ETL), AWS Glue, Amazon S3 (clean data); footnote: EMRFS]
  • 30. AWS Glue
      - Managed transform engine
      - Job scheduler
      - Data catalog
      - Built on Apache Spark
      - Integrated with S3, RDS, Redshift & any JDBC-compliant data store
  • 31.
  • 32. Modernize and consolidate, step 5: Stage all data into centralized, highly available, durable storage for further access. [Diagram adds: Amazon S3 (staged data, the data lake)]
  • 33. Amazon Redshift
      - Fully managed MPP SQL database, fully relational
      - Optimised for analytics
      - Gigabytes to petabytes
      - Less than 1/10th the cost of traditional data warehouses
  • 34. Modernize and consolidate, step 6: Load semi-structured data into Hadoop, structured data into the DWH, and application data into managed legacy application databases. [Diagram adds the serving layer: Amazon EMR (semi-structured), Amazon Redshift (data warehouse), Amazon RDS (legacy apps)]
  • 35. Modernize and consolidate, step 7: Data is protected through identity and access management and logging. [Diagram adds: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS]
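As one illustration of the access-management step, the sketch below builds an IAM policy document that grants read-only access to a single prefix of a data lake bucket. The bucket name, prefix, and statement IDs are hypothetical; the policy grammar (`Version`, `Statement`, `Effect`, `Action`, `Resource`, `Condition`) is the standard IAM JSON policy format.

```python
import json


def read_only_policy(bucket, prefix):
    """Build an IAM policy document allowing read-only access to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only keys under the given prefix.
                "Sid": "ListPrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under the prefix; no write or delete.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix for the clean-data zone.
policy_json = json.dumps(read_only_policy("example-data-lake", "clean"), indent=2)
```

A policy like this could be attached to an analyst role so that BI tools can read curated data while raw and staged zones stay out of reach.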
  • 36. Amazon QuickSight
      - Fast, cloud-powered BI service
      - Visualizations and ad-hoc analysis
      - Connectors for AWS and 3rd party sources
      - In-memory calculation engine (SPICE)
      - $9 per user per month
  • 37.
  • 38. AWS Marketplace
      - Pre-configured machine images ready to be launched into virtual server instances
      - Launch applications with 1-Click
      - Pay software licenses by the hour or bring your own license (BYOL)
  • 39. Modernize and consolidate, step 8: Data analysts use BI tools of choice to access all serving services. [Diagram adds: Amazon QuickSight]
  • 40. Modernize and consolidate, step 9: Business users have enterprise applications enhanced by analytics. [Diagram components unchanged from step 8]
  • 41. Modernize and consolidate, step 10: External parties can buy services or data in a governed, secure way. [Diagram adds: Amazon API Gateway]
  • 42. Modernize and consolidate: insights to enhance business applications, new digital services. [Diagram with all components]
      - Data sources: transactions, web logs / cookies, ERP
      - Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet interfaces (changed data)
      - Storage and processing: AWS Glue, Amazon S3 (raw data), Amazon EMR (ETL), Amazon S3 (clean data), Amazon S3 (staged data, the data lake)
      - Serving: Amazon EMR (semi-structured), Amazon Redshift (data warehouse), Amazon RDS (legacy apps), Amazon QuickSight, Amazon API Gateway
      - Personas: data analysts, business users, external buyers
      - Security and monitoring: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
  • 43. Outcome 2: Innovate for new revenues
      Personalization, demand forecasting, risk analysis
      Technology: advanced analytics, customer segmentations, high volume transactional data, un/semi-structured data, design of experiment, A/B & hypothesis testing, machine learning
      Common initiatives
      - Personalization: refine market approaches based on optimal segments
          - Offer products to new customers based on clusters of similar individuals
          - Launch share of wallet initiatives, understanding likely total spend
          - Targeted marketing to capture interests and increase conversion rates
      - Predict demand: guide business owners to select the best scenarios
          - Launch items or promotions at the optimal time to maximize response
          - Modeling for store assortment, product selection, and merchandizing
          - New product design, based on known market propensities
      - Risk measurement: create freedom to act by quantifying exposures
          - Scenario simulation to encourage investments and new offerings
          - Supply chain analytics allows for faster confirmation of goods to customers
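The "A/B & hypothesis testing" item can be illustrated with a two-proportion z-test on conversion rates, a common way to check whether a targeted campaign really converts better than a control. This is a minimal sketch using only the standard library; the conversion counts are made-up numbers, and the pooled z-test is one standard choice among several.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up example: control converts 200 of 5,000; variant converts 260 of 5,000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
significant_at_1_percent = p_value < 0.01
```

A small p-value suggests the difference in conversion rate is unlikely to be noise, which is the kind of confidence level the business can then act on.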
  • 44.
  • 45.
  • 46. Innovate for new revenues: personalization, demand forecasting, risk analysis. Driving net new revenues is realized by business teams that have access to skilled analysts, using platforms that can scale up and out, without IT bottlenecks. Organizations start operating based on what they know about their customers, and can approach new ventures in terms of confidence levels. Product launches, campaigns, supply chain management, packaged services, and customized offerings are designed and executed based on predictive models. [Diagram frame: Data sources, Ingest, Speed (Real-time), Scale (Batch), Serving]
  • 47. Innovate for new revenues, step 1: Personas involved in generating new revenues are data scientists, data analysts (often embedded), business users, and customers/suppliers. [Diagram shows personas (data analysts, data scientists, business users, engagement platforms) and security services (AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS)]
  • 48. Innovate for new revenues, step 2: Advanced analytics are built from a base of traditional data processing. [Diagram adds: transactions and ERP sources; AWS Direct Connect; Amazon S3 (raw data), Amazon EMR (ETL), AWS Glue, Amazon S3 (clean data), Amazon S3 (staged data, the data lake); Amazon EMR, Amazon Redshift, Amazon RDS]
  • 49. Innovate for new revenues, step 3: On-premises storage and databases are connected and converted. [Diagram adds: AWS Database Migration Service, AWS Storage Gateway]
  • 50. Innovate for new revenues, step 4: Internet-native data sources, like web and mobile, are captured. [Diagram adds: web logs / cookies, Internet interfaces]
  • 51. Stream in Real Time: Amazon Kinesis
      - Real-time data processing over large distributed streams
      - Elastic capacity that scales to millions of events per second
      - React in real time to incoming stream events
      - Reliable stream storage replicated across 3 facilities
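One way to picture how a stream spreads load across shards: Kinesis maps each record's partition key to a 128-bit integer with MD5, and each shard owns a range of that hash space. The sketch below assumes the hash space is split evenly across shards, which is a simplification (real shards carry explicit hash key ranges), and the device keys are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def shard_for(partition_key, shard_count):
    """Pick a shard index, assuming shards split the hash space evenly."""
    range_size = HASH_SPACE // shard_count
    return min(hash_key(partition_key) // range_size, shard_count - 1)


# Hypothetical device IDs used as partition keys.
keys = [f"device-{i}" for i in range(1000)]
assignments = [shard_for(key, shard_count=4) for key in keys]
records_per_shard = [assignments.count(shard) for shard in range(4)]
```

Because the same key always hashes to the same shard, events for one device stay ordered within a shard, while many distinct keys spread across all shards.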
  • 52. Innovate for new revenues, step 5: Streaming un/semi-structured data feeds, like social and devices, are captured. [Diagram adds: connected devices and social media sources; Amazon Kinesis]
  • 53. Innovate for new revenues, step 6: Log files and other schemaless data are converted to Parquet and staged. [Diagram adds: Amazon S3 (schemaless)]
  • 54. Amazon Athena
      - Interactive query service to analyze data in Amazon S3 directly using standard SQL
      - No need to move data
      - No infrastructure to set up and manage
      - Fast: results within seconds
      - Pay only for the queries you run
  • 55. Innovate for new revenues, step 7: Data analysts explore and visualize un/semi-structured data. [Diagram adds: Amazon Elasticsearch Service, Amazon Athena]
  • 56. Amazon Machine Learning
      - Easy to use, managed machine learning service built for developers
      - Machine learning technology based on Amazon’s internal systems
      - Create models using data stored in Amazon S3, Amazon RDS or Amazon Redshift
      - Request predictions in batch or in real time
  • 57. Innovate for new revenues, step 8: Simple analytical models are built against Amazon Machine Learning. [Diagram adds: Amazon Machine Learning]
  • 58. Apache Spark
      - In-memory analytics cluster using RDD (Resilient Distributed Dataset) for fast processing
      - Spark MLlib offers machine learning out of the box
      - Apache Spark can read directly from Amazon S3

        from numpy import array
        from pyspark import SparkContext
        from pyspark.mllib.clustering import KMeans, KMeansModel

        sc = SparkContext.getOrCreate()
        # Parse space-separated numeric rows, train k-means (k=2), then save and reload the model
        data = sc.textFile("s3://...")
        parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
        model = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")
        model.save(sc, "MyModel")
        sameModel = KMeansModel.load(sc, "MyModel")
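To show what a k-means training call computes, here is a small pure-Python sketch of Lloyd's algorithm, which alternates between assigning points to the nearest center and moving each center to the mean of its points. It is an illustration only, not Spark's implementation: it runs on a single machine, and it initializes centers from the first k points (an assumption made for reproducibility) rather than at random as in the slide. The sample points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, max_iterations=10):
    """Lloyd's algorithm with deterministic initialization from the first k points."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: group points by nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point, centers)].append(point)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers


# Made-up 2-D points forming two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centers = kmeans(points, k=2)
labels = [nearest(p, centers) for p in points]
```

On a cluster, Spark distributes the assignment and update steps across partitions of the RDD, but the per-iteration logic follows the same two steps.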
  • 59. Intel® Processor Technologies
      - Intel® AVX: dramatically increases performance for highly parallel HPC workloads such as life science engineering, data mining, financial analysis, media processing
      - Intel® AES-NI: enhances security with new encryption instructions that reduce the performance penalty associated with encrypting/decrypting data
      - Intel® Turbo Boost Technology: increases computing power with performance that adapts to spikes in workloads
      - Intel® Transactional Synchronization Extensions (TSX): enables execution of transactions that are independent to accelerate throughput
      - P state & C state control: provides granular performance tuning for cores and sleep states to improve overall application performance
  • 60. New X1 Instance: Tons of Memory
      - Designed for large-scale, in-memory applications in the cloud
      - Ideal for in-memory databases like SAP HANA and big data processing apps like Spark and Presto
      - Powered by Intel® Xeon® E7 8880 v3 Haswell processors
      - Features up to 2TB of memory and up to 128 vCPUs per instance
      - 8X the memory offered by any other Amazon EC2 instance
  • 61. Machine Learning Algorithms
      - Classification: sentiment analysis (Do people like my new product?)
      - Linear regression: trend prediction (How much revenue next month?)
      - Clustering: recommendation (Other people bought this!)
      - Association: market basket analysis (bundled products)
      - Neural networks: pattern recognition (speech recognition)
      Services shown: Amazon Machine Learning, Amazon EMR + Spark MLlib, GPU-optimized EC2 instance
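The "How much revenue next month?" question can be sketched with ordinary least squares on a single feature. The monthly revenue figures below are made up, and the closed-form fit is a minimal illustration, not the method used by any particular AWS service.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up revenue for months 1 to 6 (in thousands).
months = [1, 2, 3, 4, 5, 6]
revenue = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(months, revenue)
forecast_month_7 = slope * 7 + intercept
```

Real revenue series are noisier and usually seasonal, so a fitted trend like this is a starting point for the forecast rather than the forecast itself.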
  • 62. [Architecture diagram: Innovate for new revenues (personalization, demand forecasting, risk analysis)] 9. Complex analytical models are built against Amazon EMR (Spark) clusters
  • 63. [Architecture diagram: Innovate for new revenues] 10. Deep learning models are built against MXNet clusters
  • 64. [Architecture diagram: Innovate for new revenues] 11. Predictive models and scored datasets are published to data staging
  • 65. [Architecture diagram: Innovate for new revenues] 12. Analysts use the DWH, EMR, and ES to find patterns and measure performance
  • 66. [Architecture diagram: Innovate for new revenues] 13. Risk models are evaluated to create new products and assess customers
  • 67. [Architecture diagram: Innovate for new revenues] 14. Demand forecasts are loaded into supply chain management systems
  • 68. Amazon SNS & Amazon Pinpoint • Amazon SNS is a fully managed, cross-platform mobile push intermediary service • Fully scalable to millions of devices • Amazon Pinpoint lets you create targeted campaigns and measure engagement and results • Push platforms reached through Amazon SNS: Apple APNS (iPhones and iPads), Google GCM (Android phones and tablets), Amazon ADM (Kindle Fire devices), Windows WNS and MPNS (Windows Phone devices), Baidu CP (Android phones and tablets in China)
  • 69. [Architecture diagram: Innovate for new revenues; diagram adds Amazon SNS and Amazon Pinpoint] 15. Personalized offers are broadcast out over notification channels
  • 70. [Complete architecture diagram: Innovate for new revenues (personalization, demand forecasting, risk analysis)] • Lanes: Data sources, Ingest, Speed (real-time), Scale (batch), Serving • Sources and users: transactions, ERP, connected devices, web logs / cookies, social media, engagement platforms, data analysts, data scientists, business users • Ingest: AWS Database Migration Service, AWS Direct Connect, internet interfaces, Amazon Kinesis, AWS Storage Gateway • Storage and processing: Amazon S3 (raw data, staged data / data lake, clean data, schemaless), Amazon EMR (ETL, MLlib), AWS Glue, deep learning, Amazon ML, Amazon Athena • Serving: Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon SNS, Amazon Pinpoint • Security and monitoring: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
  • 71. See it live in action!
  • 72. BREAK Next up: Real-Time Analytics and Engagement
  • 73. Modern data architectures for real-time analytics and engagement
  • 74. Outcome 3: Real-time Engagement • Interactive customer experience, event-driven automation, fraud detection • Technology: Clickstream, mobile apps, sensors, video (computer vision), audio (intent comprehension), event detection and pipelining, in-line scoring, serverless compute, deep learning • Common initiatives • Interactive CX: Natural customer journeys with adaptive interfaces • Behavior-based recommendations, improving personalization along the journey • Seamless session transfer across UIs, from browser to mobile to physical location • Voice-driven commands, and use of gestures and other natural interfaces • Event-driven automation: Full execution of a business process driven by an action • Order fulfillment, with real-time update notifications to the customer • Fast response to customer complaints and comments over direct or social channels • Fraud detection: Protect customer and business with real-time anomaly detection • Purchase and payment verification, using behavioral models and location assessment • Application and account opening validation
  • 77. Personalized content - Account access - Track spending - Check balances - Pay bills - Prevent fraud
  • 78. The Power of Speech: Alexa • Alexa, the voice service that powers Echo, provides capabilities, or skills, that enable customers to interact with devices using voice • Alexa Skills Kit (ASK) allows everyone to build and publish their own skills • Skills can be powered by AWS Lambda
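To make "skills can be powered by AWS Lambda" concrete, here is a minimal sketch of a Lambda handler that answers an Alexa request with plain-text speech. The request and response envelopes follow the Alexa Skills Kit JSON format; the intent name `CheckBalanceIntent`, the wording of the replies, and the hard-coded balance are assumptions made for illustration (they echo the "check balances" example on slide 77).

```python
from typing import Any, Dict


def build_response(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Route an incoming Alexa request to a spoken reply."""
    request = event.get("request", {})
    request_type = request.get("type")
    if request_type == "LaunchRequest":
        return build_response("Welcome. Ask me for your balance.", end_session=False)
    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "CheckBalanceIntent":  # hypothetical intent name
            balance = 42.50  # placeholder; a real skill would query a backend
            return build_response(f"Your balance is {balance:.2f} dollars.")
    return build_response("Sorry, I did not understand that.")
```

The handler has no AWS dependencies, so it can be exercised locally with a hand-built event before it is attached to a skill.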
  • 79. Automated Speech Recognition (ASR) Natural Language Processing (NLP) Alexa Skills Kit (ASK) Over 80 services, including Core, Security, Database, Artificial Intelligence, Analytics, Mobile Development
  • 80. Real-time engagement: Interactive customer experience, event-driven automation, fraud detection • Provide superior customer service by responding to opportunities in real time. • Fulfill requests for products or services in an automated fashion to create a strong competitive advantage over those that cannot. • Assurance becomes a different challenge when speeds increase: fraud prevention must be adaptive and fast. • Adding another layer of opportunity and complexity is the use of vast streams of data from devices that measure location, video, behaviors, environmental conditions, and more.
  • 81. [Architecture diagram: Real-time engagement (interactive customer experience, event-driven automation, fraud detection); shows data analysts, data scientists, business users, engagement platforms, automation / events] 1. Real-time engagement requires personas that develop the analytics, and platforms for engaging and automating processes
  • 82. [Architecture diagram: Real-time engagement] 2. Real-time systems are built from a base of advanced data processing
  • 83. [Architecture diagram: Real-time engagement] 3. Events are pipelined through Amazon Kinesis, into multiple streams, at scale
  • 84. Also possible with Spark Streaming! • Amazon Kinesis, EMR with Spark Streaming • Counting tweets on a sliding window (simplified): KinesisUtils.createStream("twitter-stream").filter(_.getText.contains("Big Data")).countByWindow(Seconds(5))
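The sliding-window count on slide 84 can be illustrated without a cluster. The sketch below is plain Python, not the Spark Streaming or Kinesis API: it filters events by keyword and, for each incoming event, reports how many matching events fall inside the trailing window. Spark emits counts per batch interval rather than per event, so treat this only as an illustration of the windowing idea; the sample tweets are made up.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def count_by_window(
    events: Iterable[Tuple[float, str]],
    keyword: str,
    window_seconds: float,
) -> List[Tuple[float, int]]:
    """For time-ordered (timestamp, text) events, return (timestamp, count)
    pairs where count is the number of keyword matches in the interval
    (timestamp - window_seconds, timestamp]."""
    window: Deque[float] = deque()
    counts: List[Tuple[float, int]] = []
    for timestamp, text in events:
        if keyword in text:
            window.append(timestamp)
        # Evict matches that have slid out of the window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        counts.append((timestamp, len(window)))
    return counts


# Hypothetical tweet stream: (seconds since start, text).
tweets = [
    (0.0, "Big Data rocks"),
    (1.0, "lunch time"),
    (3.0, "Big Data on AWS"),
    (6.0, "more Big Data"),
    (12.0, "quiet now"),
]
window_counts = count_by_window(tweets, "Big Data", window_seconds=5.0)
```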
  • 85. [Architecture diagram: Real-time engagement; diagram adds Amazon S3 stream data] 4. Event data is given context and structure in Amazon EMR and pushed for batch processing
  • 86. Amazon Kinesis Firehose • Fully managed data streaming service to ingest and capture data into your storage or data warehouse • Ability to batch load, compress or encrypt streaming data • Elastic to scale to any throughput (no more sharding) • Charged only per GB processed ($0.035 per GB)
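The "batch load, compress" behaviour and the per-GB pricing on slide 86 can be sketched in a few lines. This is a local model of the idea, not the Firehose API: the class name `BatchBuffer`, the size threshold, and the in-memory `flushed` list (standing in for a delivery destination such as S3 or a data warehouse) are all assumptions. The default price is the $0.035 per GB quoted on the slide and may not reflect current pricing.

```python
import gzip
from typing import List


class BatchBuffer:
    """Collect records and emit a gzip-compressed batch once a size threshold is hit."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self._records: List[bytes] = []
        self._size = 0
        self.flushed: List[bytes] = []  # stand-in for the delivery destination

    def put(self, record: bytes) -> None:
        self._records.append(record)
        self._size += len(record)
        if self._size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if not self._records:
            return
        payload = b"\n".join(self._records) + b"\n"
        self.flushed.append(gzip.compress(payload))
        self._records = []
        self._size = 0


def ingestion_cost_usd(gigabytes: float, price_per_gb: float = 0.035) -> float:
    """Cost estimate using the per-GB price quoted on the slide."""
    return gigabytes * price_per_gb


buffer = BatchBuffer(max_bytes=20)
for i in range(5):
    buffer.put(f"event-{i}".encode())
buffer.flush()  # deliver whatever is left
```

A managed service would also flush on a time interval and handle retries and encryption, which this sketch leaves out.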
  • 87. [Architecture diagram: Real-time engagement; diagram adds Amazon Kinesis Firehose] 5. Kinesis Firehose pumps events into a DWH for near real-time analysis
  • 88. AWS Lambda • Use AWS Lambda to clean and massage incoming data • Write code to load data from sources (S3, DynamoDB) automatically into your data warehouse (e.g. Amazon Redshift) • React in real time to incoming events in Amazon Kinesis • Diagram: Amazon Kinesis, AWS Lambda, Amazon Redshift
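As a sketch of "clean and massage incoming data" and "react in real time to incoming events in Amazon Kinesis", the handler below decodes the base64-encoded payloads that Lambda receives from a Kinesis event source and applies a simple cleaning step. The event layout (`Records[].kinesis.data`) is the standard Kinesis-to-Lambda shape; the cleaning rules and field names are assumptions, and loading the rows into a warehouse is deliberately left out.

```python
import base64
import json
from typing import Any, Dict, List


def clean_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical cleaning rules: lower-case keys, trim strings, drop empty values."""
    cleaned: Dict[str, Any] = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue
        cleaned[key.lower()] = value
    return cleaned


def lambda_handler(event: Dict[str, Any], context: Any) -> List[Dict[str, Any]]:
    """Decode each Kinesis record, skip malformed payloads, return cleaned rows."""
    rows: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            raw = json.loads(payload)
        except ValueError:
            continue  # not valid JSON; a real pipeline might route this to a dead-letter store
        if isinstance(raw, dict):
            rows.append(clean_record(raw))
    return rows
```

In a deployed function the returned rows would typically be written onward, for example staged in S3 and then loaded into Amazon Redshift.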
  • 89. [Architecture diagram: Real-time engagement; diagram adds Event Scoring and AWS Lambda] 6. The event is streamed to a scoring server for processing
  • 91. Amazon Polly • Turn text into lifelike speech using deep learning technologies to synthesize speech that sounds like a human voice • Unlimited replays • Returns an MP3 or audio stream • Lightning-fast response • Fully managed and low cost
  • 92. Amazon Polly: Text In, Life-like Speech Out • Text in: "The temperature in WA is 75°F" • Speech out: "The temperature in Washington is 75 degrees Fahrenheit"
  • 93. Amazon Lex • Conversational interfaces for your applications, powered by the same Natural Language Understanding (NLU) & Automatic Speech Recognition (ASR) models as Alexa • Integrated development in the AWS console • Trigger AWS Lambda functions • Multi-step conversations • Continually improving ASR & NLU models • Enterprise connectors • Fully managed
  • 94. Amazon Lex concepts (example intent: BookHotel) • Intents: A particular goal that the user wants to achieve • Utterances: Spoken or typed phrases that invoke your intent • Slots: Data the user must provide to fulfill the intent • Prompts: Questions that ask the user to input data • Fulfillment: The business logic required to fulfill the user's intent
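The relationship between intents, utterances, slots, prompts, and fulfillment can be modelled as a small data structure. The sketch below is a local illustration, not the Amazon Lex API: only the intent name `BookHotel` comes from the slide, while the slot names, prompt wording, sample utterances, and the fulfillment function are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str    # data the user must provide
    prompt: str  # question asked when the slot is still empty


@dataclass
class Intent:
    name: str
    utterances: List[str]
    slots: List[Slot] = field(default_factory=list)

    def next_prompt(self, filled: Dict[str, str]) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when ready to fulfill."""
        for slot in self.slots:
            if slot.name not in filled:
                return slot.prompt
        return None


def fulfill_book_hotel(filled: Dict[str, str]) -> str:
    """Hypothetical fulfillment logic; a real bot might call a Lambda function here."""
    return (
        f"Booked {filled['Nights']} night(s) in {filled['Location']} "
        f"from {filled['CheckInDate']}."
    )


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel", "I want to make a hotel reservation"],
    slots=[
        Slot("Location", "What city will you be staying in?"),
        Slot("CheckInDate", "What day do you want to check in?"),
        Slot("Nights", "How many nights will you be staying?"),
    ],
)
```

A conversation then alternates between `next_prompt` and user answers until every slot is filled, at which point the fulfillment step runs.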
  • 95. Amazon Rekognition • Image recognition and analysis powered by deep learning, which lets you search, verify, and organize millions of images • Easy to use • Batch analysis • Real-time analysis • Continually improving • Low cost
  • 97. [Image analysis example] • Demographic data • Facial landmarks • Sentiment expressed • Image quality (Brightness: 25.84, Sharpness: 160) • General attributes
  • 98. [Architecture diagram: Real-time engagement; diagram adds Amazon AI] 7. Language, intent, and image processing are run and sent for scoring
  • 99. [Architecture diagram: Real-time engagement] 8. Simple analytical models are checked on demand against Amazon ML
  • 100. [Architecture diagram: Real-time engagement] 9. Complex analytical models are scored against coded models (PMML)
  • 101. [Architecture diagram: Real-time engagement] 10. Deep learning models are scored against imported models (e.g. JSON)
  • 102. [Architecture diagram: Real-time engagement; diagram adds a second AWS Lambda] 11. The scored response to the event is processed to be pushed for action
  • 103. [Architecture diagram: Real-time engagement; diagram adds Amazon DynamoDB] 12. Recommendations are pushed to DynamoDB for low-latency serving
  • 104. [Architecture diagram: Real-time engagement; diagram adds Amazon SQS] 13. Actions are pushed to RDS and SQS for business process automation
  • 105. [Complete architecture diagram: Real-time engagement (interactive customer experience, event-driven automation, fraud detection)] • Lanes: Data sources, Ingest, Speed (real-time), Scale (batch), Serving • Sources and users: transactions, ERP, connected devices, web logs / cookies, social media, engagement platforms, automation / events, data analysts, data scientists, business users • Ingest: AWS Database Migration Service, AWS Direct Connect, internet interfaces, Amazon Kinesis, AWS Storage Gateway • Real-time path: Amazon Kinesis, Amazon Kinesis Firehose, Amazon S3 stream data, Event Scoring, Amazon AI, AWS Lambda • Storage and processing: Amazon S3 (raw data, staged data / data lake, clean data, schemaless), Amazon EMR (ETL, MLlib), AWS Glue, deep learning, Amazon ML, Amazon Athena • Serving: Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon DynamoDB, Amazon SQS • Security and monitoring: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
  • 106. See it live in action!
  • 107. Outcome 4: Automate for expansive reach • Automation of self-service, deployment, policy, and quality assurance • Technology: Self-service, on-demand provisioning, DevOps, Spot pricing, AWS CloudFormation, security automation, performance monitoring (Amazon CloudWatch and AWS X-Ray), global rollouts • Common initiatives • Self-service: • Application catalog or portal for all employees, availability determined by role • Service provisioning backed by automation of policy and governance • Agile development: Use of DevOps to allow very few resources to deploy globally • CI/CD for software release, build/test, and deployment automation • Templated infrastructure provisioning, and configuration management • Business rules and policies are "gold coded" to be used for all deployments • Use of Security by Design (SbD) to codify network, O/S, and encryption • Comprehensive monitoring: Assurance of SLA and issue remediation • Logging and monitoring of all API calls and executions to ensure SLAs are met • Analysis of performance variance for faster root cause analysis
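"Templated infrastructure provisioning" and codifying encryption under Security by Design can be sketched as a small program that emits an AWS CloudFormation template. The resource type and property names follow the CloudFormation `AWS::S3::Bucket` schema; the logical ID, the description, and the choice of KMS encryption plus versioning as the "gold coded" policy are assumptions for illustration.

```python
import json
from typing import Any, Dict


def data_lake_bucket_template(bucket_logical_id: str = "RawDataBucket") -> Dict[str, Any]:
    """Build a CloudFormation template for an S3 bucket with encryption codified."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative data lake bucket with default encryption",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Policy expressed as code: every deployment gets the same settings.
                    "BucketEncryption": {
                        "ServerSideEncryptionConfiguration": [
                            {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}
                        ]
                    },
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


# Serialize to JSON, ready to hand to a deployment pipeline.
template_json = json.dumps(data_lake_bucket_template(), indent=2)
```

Keeping the template in version control lets the same reviewed definition be rolled out to every region and account through a CI/CD pipeline.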
  • 108. [Complete architecture diagram: Automate for expansive reach (automation of self-service, deployment, policy, and quality assurance); diagram adds AWS DevOps] • Lanes: Data sources, Ingest, Speed (real-time), Scale (batch), Serving • Sources and users: transactions, ERP, connected devices, web logs / cookies, social media, engagement platforms, automation / events, data analysts, data scientists, business users • Ingest: AWS Database Migration Service, AWS Direct Connect, internet interfaces, Amazon Kinesis, AWS Storage Gateway • Real-time path: Amazon Kinesis, Amazon Kinesis Firehose, Amazon S3 stream data, Event Scoring, Amazon AI, AWS Lambda • Storage and processing: Amazon S3 (raw data, staged data / data lake, clean data, schemaless), Amazon EMR (ETL, MLlib), AWS Glue, Amazon Machine Learning, Amazon Athena • Serving: Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon DynamoDB, Amazon SQS • Security, monitoring, and automation: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS, AWS DevOps
  • 109. Next steps for you to be the Big Data hero
  • 110. Sharpen your skills (Singapore) • Attend the official AWS Training courses organized by the AWS Authorized local training partner, Bespoke Training Services (www.bespoketraining.com). • Join the AWS Jumpstart (2 hr) session and hear from our customers and partners on how they enabled their teams and successfully deployed on AWS. Also stand a chance to win a free seat on the courses below. • Point of contact: Gilbert Cheo, gilbert@bespoketraining.com • Courses and dates: Architecting on AWS (28 Feb-2 Mar / 14-16 March), System Operations on AWS (22-24 Feb), Developing on AWS (4-6 April), Big Data on AWS (4-6 April) • Venue: AWS Singapore, Church Street, Capital Square, #10-01, Singapore 049481
  • 111. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. http://bit.ly/summitsg April 11 – Marina Bay Sands - Singapore Register Now!
  • 112. You, the Big Data hero! • Start with the persona • De-couple to scale • Experiment and iterate • Deploy with automation

Editor's notes

  1. 50 mins
  2. "Over the next two years as we move to our optimised platform, we'll be able to extract ... benefits of $170 million," in addition to benefits already realised from the transformation process begun in 2010, Snowball said. Suncorp's vision for its "optimised platform" is digitally enabled customer-facing systems sitting atop simplified core administration systems that feed into a data lake that can drive predictive analytics and business intelligence across the group. "Increasingly our customers want to connect digitally and we're living in a world of both mobility and technological disruption," Snowball said. "To ensure that we stay ahead of the competition, we've been investing in systems that are digitally enabled to allow our customers and business partners to access us, how and where they want. "Standing behind our digital frontend we are completing the development of four core administration systems: One policy and one claim system for all our general insurance businesses both here and in New Zealand, a world-class banking system and a new life administration system." "These core systems will feed our customer, policy and claims data along with HR, finance and management data, into our single, centralised data lake," the group CEO said. "This will allow us to establish a best in class business intelligence function providing forward-looking, predictive analytics to deliver better solutions and outcomes for our customers. "All of this will sit in a secure and flexible cloud environment where our lean and agile capabilities will enable us to deliver new services at high speed and lower cost," Snowball said.
  3. 10:04 Rodos comment on available
  4. Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. The EMR File System allows EMR clusters to efficiently and securely use Amazon S3 as an object store for Hadoop. You can store your data in Amazon S3 and use multiple Amazon EMR clusters to process the same data set. Each cluster can be optimized for a particular workload, which can be more efficient than a single cluster serving multiple workloads with different requirements. For example, you might have one cluster that is optimized for I/O and another that is optimized for CPU, each processing the same data set in Amazon S3. Additionally, by storing your input and output data in Amazon S3, you can shut down clusters when they are no longer needed. Amazon EMR makes it easy to use Spot Instances so you can save both time and money. Amazon EMR clusters include "core nodes" that run HDFS and "task nodes" that do not; task nodes are ideal for Spot because if the Spot price increases and you lose those instances, you will not lose data stored in HDFS. Amazon EMR supports powerful and proven Hadoop tools such as Hive, Pig, HBase, and Impala. Additionally, it can run distributed computing frameworks besides Hadoop MapReduce, such as Spark or Presto, using bootstrap actions. You can also use Hue and Zeppelin as GUIs for interacting with applications on your cluster.
  6. ...And by the way, if you thought that these innovative digital use cases were only happening outside India, you could NOT be more wrong. A lot of large Indian companies are increasing their digital presence and seeing massive success in those areas. [CLICK]
  7. More: https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/
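The EMR note above recommends keeping core nodes (which run HDFS) on stable capacity and using Spot for task nodes. A minimal sketch of that layout follows, expressed as the instance-group list accepted by the EMR `RunJobFlow` API (for example via boto3's `run_job_flow(Instances={"InstanceGroups": ...})`). The instance type, node counts, and bid price are placeholder assumptions, and the sketch only builds the configuration; it does not call AWS.

```python
from typing import Any, Dict, List


def emr_instance_groups(
    core_count: int,
    task_count: int,
    instance_type: str = "m4.xlarge",  # placeholder instance type
    task_bid_price: str = "0.10",      # placeholder Spot bid in USD per hour
) -> List[Dict[str, Any]]:
    """Master and core groups on On-Demand; task group on Spot."""
    return [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            # Core nodes hold HDFS blocks, so they stay on On-Demand capacity.
            "Name": "Core (HDFS)",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
        {
            # Task nodes hold no HDFS data; losing them to a Spot price rise loses no data.
            "Name": "Task (Spot)",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "BidPrice": task_bid_price,
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        },
    ]


groups = emr_instance_groups(core_count=2, task_count=4)
spot_roles = [g["InstanceRole"] for g in groups if g["Market"] == "SPOT"]
```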