SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Bob Griffiths, Solutions Architect, AWS
Brett Shriver, Senior Director, FINRA
Ricardo Portilla, Lead Architect, FINRA
December 2016
CMP316
Learn How FINRA Aligns
Billions of Time-Ordered Events
with Apache Spark on Amazon EC2
What to Expect from the Session
• Overview of Apache Spark on
AWS
• FINRA will describe their use
of Spark for aligning time
ordered events (business
problem, solution, benefits,
lessons learned)
AWS – Spark Overview
3
What is Spark?
• A fast and general
engine for large-scale
data processing
• Write applications quickly
in Java, Scala, Python, R
• Run programs up to
100x faster than Hadoop
MapReduce in memory,
or 10x faster on disk To learn more, check out http://spark.apache.org
4
Spark on AWS Compute
• Amazon EC2
• Standalone cluster mode
• Apache Mesos
• Amazon EMR
• YARN
5
AWS Storage options
Amazon EC2 ephemeral storage (HDFS)
Amazon EBS
Amazon S3
Amazon DynamoDB
6
Additional Apache Spark Use Cases
Yelp’s advertising
targeting team makes
prediction models to
determine the likelihood
of a user interacting with
an advertisement. By
using Apache Spark on
Amazon EMR to process
large amounts of data to
train machine learning
models, Yelp increased
revenue and advertising
click-through rate.
As part of its Data
Management Platform for
customer insights, Krux
runs many machine
learning and general
processing workloads
using Apache Spark. Krux
utilizes ephemeral
Amazon EMR clusters
with Amazon EC2 Spot
Capacity to save costs,
and uses Amazon S3 with
EMRFS as a data layer
for Apache Spark.
CrowdStrike provides
endpoint protection to
stop breaches. They use
Apache Spark on Amazon
EMR to process hundreds
of terabytes of event data
and roll it up into higher-
level behavioral
descriptions on the hosts.
From that data,
CrowdStrike can pull
event data together and
identify the presence of
malicious activity.
7
Spark on AWS Best Practices
8
Separate Storage & Compute
• Optimize cluster size based on
compute requirements
• Allows selection of optimal EC2
instance types
• Shut down your cluster when not in
use
• Share data among multiple clusters
• Fault tolerance and disaster recovery
9
Spot instances
• Bid on spare Amazon EC2 computing capacity
• Reduce operating costs by up to 50-90%
• Ideal for ephemeral compute clusters
10
FINRA: Spark Case Study
11
12
What Do We Do?
By the Numbers
• We oversee more than 3,900 securities firms with approximately
640,795 brokers.
• Every day, we watch over nearly 6 billion shares traded in U.S. listed
equities markets—using technology powerful enough to detect potential
abuses.
• In fact, FINRA processes approximately 6 terabytes of data and up to
75 billion events every day to build a complete, holistic picture of market
trading in the U.S.
• In 2015, FINRA:
• Referred over 800 fraud cases for prosecution
• Levied more than $191 million in fines & restitution
13
Problem Statement
What data is FINRA tracking, and what do we do with it?
- Up to 75 billion events/day
- 12 markets/exchanges
- Surveillance – Compare the state of activity across different
'venues', such as quotation systems, exchanges, firms, or
trade reporting facilities
• Involves grouping events by stock symbol and venue then
ordering by event time
- Alerts for ‘venues of interest’ where suspicious activity occurs
14
Problem Statement (cont’d)
What qualifies as suspicious activity? Actions taken by traders/brokers which disadvantage
customer orders or undermine fairness/efficiency of markets.
Individual investors Broker/Firm
Buy customer order placed
– 100,000 shares
Broker buys
(100 shares)
Buy customer order
executes, share price
increases
Broker sells
high
Time
Front-running
Layering
Broker places multiple sell orders, slowing
incrementing price
Customer drives price
lower
Broker buys at
lower price
15
Example 1 – Intermarket Price Protection
 Intermarket Price
Protection - Restrict broker
trading outside the best
publicly available bid/offer
to encourage best
execution of customer
orders
Best price = highest bid/lowest ask, broker is
trading outside this band!
85K quotes/sec
< 1K trades/sec
16
Example 2 – Exchange side of the picture
Exchange #2
Time Ticker Price
13:00:00 STOCK XYZ $57.42
13:00:00.1 STOCK XYZ $57.41
13:00:00.156 STOCK XYZ $57
13:00:00.243 STOCK XYZ $57.10
13:00:00.244 STOCK XYZ $57.11
13:00:00.260 STOCK XYZ $57.15
13:00:00.287 STOCK XYZ $57.92
13:00:00.29 STOCK XYZ $57.90
13:00:00.293 STOCK XYZ $57.40
13:00:00.297 STOCK XYZ $57.41
Intra-day Exchange Feeds
Best price per exchange – ‘top of order
book’ – BUY side
90K order events/second
4 billion order events/day across
11 exchanges
Time Ticker Excha
nge
Price Size
13:00:0
0
STOCK
XYZ
EX 1 $58 1000
13:00:0
0.1
STOCK
XYZ
EX 2 $57.41 1200
13:00:0
0.156
STOCK
XYZ
EX 1 $57.50 800
13:00:0
0.243
STOCK
XYZ
EX 3 $57.13 700
13:00:0
0.244
STOCK
XYZ
EX 4 $57.18 1200
13:00:0
0.260
STOCK
XYZ
EX 2 $57.15 900
Universal best
size/price
Limit Order Display - Obligation of
firm to publish the full
price/volume of its received
customer limit orders in an
exchange where the firm is
registered as a market-maker
Accumulate quantity at the best price
Best price at Exchange 2
at this point in time
Best universal price
(Exchange 1)
17
• Exchange 1
• Exchange 2
• Exchange 3
• Exchange 4
• etc
Example 2 – Customer order side of the picture
Time Firm Ticker Excha
nge
Price Size
13:00:
00
FIRM
ABC
STOCK
XYZ
EX 1 $58 1000
13:00:
00.1
FIRM
ABC
STOCK
XYZ
EX 2 $57.41 1200
13:00:
00.15
6
FIRM
ABC
STOCK
XYZ
EX 1 $57.50 800
13:00:
00.24
3
FIRM
ABC
STOCK
XYZ
EX 3 $57.13 700
13:00:
00.24
4
FIRM
ABC
STOCK
XYZ
EX 4 $57.18 1200
13:00:
00.26
0
FIRM
ABC
STOCK
XYZ
EX 2 $57.15 900
Published best
size/price
New
order
4-5 billion customer
orders/day
50 million
displayable
orders/day
Time Firm Size Pr
13:00 FIRM
ABC
1000 $60
13:05 FIRM
ABC
0 $60
13:06 FIRM
ABC
1000 $60
BUY Limit Order Size/Price
At this point, the exchange price is worse than
the customer’s order, so in violation!
Route New
order
Cancel
18
Legacy Solution
Storage
Compute
Proprietary processors
380 cores, 380 TB disk
> 300 SQL-based
surveillances
Data off-loading
to make way for
current
data/processing
Data loads run
day/night
Production DB Architecture
Storage
Compute
DR – Disaster Recovery
19
Pain Points
Pain points with the legacy solution – no longer present in
AWS/Spark architecture
- 7 figures yearly to maintain and operate
- Difficult to scale during periods of market volatility
- Storage and compute tightly coupled
- Won’t support industry movement toward real time
- Reprocessing is difficult (takes months)
- Unavailable during maintenance windows
- No control of managing internal compute model
21
Alternatives
Key requirements of new solution:
- Scalability / elasticity
- Cost effectiveness
- Ease of coding and testing
- Platform for future real-time processing
- Support for time-based iteration
What options were considered?
- Apache Spark on Amazon EMR
- Java MapReduce
- Apache Giraph
- Apache Crunch
Spark Processing Approach
Time
T
T
T
T
Venue = Quoting Facility
Q = Externally quoted/displayed
data
Q
Q
Q
Q
Q
Q
Q
Q
• What makes this comparison difficult?
• Have to join on ticker and firm -> large partitions
• Instead of joining -> union, sort, and iterate
Turns an M x N problem
into an M + N problem.
Venue = Trade Reporting
Facility
T = Surveillance events (e.g.
trades)
Q
20
AWS Architecture
Amazon S3 Storage
Input Bucket
Output Bucket
Spark on EMR Compute
Spot m1.xlarge (I/O intensive)
Spot m2.4xlarge (shuffle intensive)
Spot m3.2xlarge (compute intensive)
23
Choosing Spot Instances
This architecture gets us:
 Flexibility / Elasticity
 I/O intensive instances for portions of code (m1.xlarge)
 Shuffle-heavy (m2.4x)
 Time-based computations (compute optimized -> m3.2x)
 Cost Savings
 Targeting AWS Spot Instance types to match these profiles saved us
3X the cost. We also pinpointed instances which were not too
volatile.
24
Benefits
Realized Benefits
• Order of magnitude cost savings vs. on-premises database
architecture
• Increased speed on reprocessing requests
• Scalability, etc.
Expected Future Benefits
• Supports real-time processing (as data providers migrate to real time)
• Easier experimentation with new instance types
• Convert more surveillance to this model
25
FINRA’s Future Plans
• Capture additional benefits of Spark, such as use of data
frames API for easier manipulations and optimization
• Migrate additional workloads onto Spark – currently 240
surveillance applications are implemented in Hive but would
benefit from Spark
• Some experiments at FINRA show 2X speed with Spark vs Hive
for same design
• Use new AWS APIs
• Move toward real time (coordinating with data providers)
26
Thank you!
Remember to complete
your evaluations!
28
Related Sessions
FINRA Sessions:
- BDM203 – Building a Secure Data Science Platform
- DAT302 – Best Practices for Migrating to RDS / Aurora
- ENT313 – FINRA in the Cloud, Big Data Enterprise
- STG308 – Data Lake for Big Data on Amazon S3
- SVR202 – What’s new with Lambda
29

Weitere ähnliche Inhalte

Was ist angesagt?

AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)Amazon Web Services
 
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...Amazon Web Services
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksAmazon Web Services
 
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...Amazon Web Services
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon AuroraAmazon Web Services
 
getting started with amazon aurora
getting started with amazon auroragetting started with amazon aurora
getting started with amazon auroraAmazon Web Services
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Amazon Web Services
 
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)Amazon Web Services
 
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...Amazon Web Services
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Amazon Web Services
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)Amazon Web Services
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingAmazon Web Services
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)Amazon Web Services
 
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS SecurityAmazon Web Services
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)Amazon Web Services
 
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech Talks
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech TalksDeep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech Talks
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech TalksAmazon Web Services
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...Amazon Web Services
 

Was ist angesagt? (20)

AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
AWS re:Invent 2016: Introduction to Managed Database Services on AWS (DAT307)
 
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
AWS re:Invent 2016: Getting the most Bang for your buck with #EC2 #Winning (C...
 
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech TalksDeep Dive on Elastic File System - February 2017 AWS Online Tech Talks
Deep Dive on Elastic File System - February 2017 AWS Online Tech Talks
 
Cost Optimization at Scale
Cost Optimization at ScaleCost Optimization at Scale
Cost Optimization at Scale
 
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
AWS re:Invent 2016: Dollars and Sense: Technical Tips for Continual Cost Opti...
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
 
getting started with amazon aurora
getting started with amazon auroragetting started with amazon aurora
getting started with amazon aurora
 
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017 Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
Accelerate your Business with SAP on AWS - AWS Summit Cape Town 2017
 
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
SRV401 Deep Dive on Amazon Elastic File System (Amazon EFS)
 
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...
AWS re:Invent 2016: Case Study: How Atlassian Uses Amazon EFS with JIRA to Cu...
 
Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017Introduction to Storage on AWS - AWS Summit Cape Town 2017
Introduction to Storage on AWS - AWS Summit Cape Town 2017
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
 
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
AWS re:Invent 2016: High Performance Computing on AWS (CMP207)
 
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with AWS Security
Getting Started with AWS SecurityGetting Started with AWS Security
Getting Started with AWS Security
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
 
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech Talks
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech TalksDeep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech Talks
Deep Dive on the AWS Storage Gateway - April 2017 AWS Online Tech Talks
 
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
AWS re:Invent 2016: Event Handling at Scale: Designing an Auditable Ingestion...
 

Andere mochten auch

AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...Amazon Web Services
 
AWS re:Invent 2016: The State of Serverless Computing (SVR311)
AWS re:Invent 2016: The State of Serverless Computing (SVR311)AWS re:Invent 2016: The State of Serverless Computing (SVR311)
AWS re:Invent 2016: The State of Serverless Computing (SVR311)Amazon Web Services
 
Role-based Access Control on AWS
Role-based Access Control on AWSRole-based Access Control on AWS
Role-based Access Control on AWSFreeman Zhang
 
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)Amazon Web Services
 
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...Amazon Web Services
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...Amazon Web Services
 
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...Amazon Web Services
 
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)Amazon Web Services
 
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...Amazon Web Services
 
A Data Journey With AWS
A Data Journey With AWSA Data Journey With AWS
A Data Journey With AWSJulien SIMON
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?Eyal Ben Ivri
 
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...Amazon Web Services
 
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...Amazon Web Services
 
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)Amazon Web Services
 
(ARC301) Scaling Up to Your First 10 Million Users
(ARC301) Scaling Up to Your First 10 Million Users(ARC301) Scaling Up to Your First 10 Million Users
(ARC301) Scaling Up to Your First 10 Million UsersAmazon Web Services
 
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...Amazon Web Services
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...Amazon Web Services
 
DevOps en Amazon: Un vistazo a nuestras herramientas y procesos
DevOps en Amazon: Un vistazo a nuestras herramientas y procesosDevOps en Amazon: Un vistazo a nuestras herramientas y procesos
DevOps en Amazon: Un vistazo a nuestras herramientas y procesosAmazon Web Services
 
Next Generation Open Data Platforms | AWS Public Sector Summit 2016
Next Generation Open Data Platforms | AWS Public Sector Summit 2016Next Generation Open Data Platforms | AWS Public Sector Summit 2016
Next Generation Open Data Platforms | AWS Public Sector Summit 2016Amazon Web Services
 

Andere mochten auch (20)

AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
AWS re:Invent 2016: FINRA: Building a Secure Data Science Platform on AWS (BD...
 
AWS re:Invent 2016: The State of Serverless Computing (SVR311)
AWS re:Invent 2016: The State of Serverless Computing (SVR311)AWS re:Invent 2016: The State of Serverless Computing (SVR311)
AWS re:Invent 2016: The State of Serverless Computing (SVR311)
 
Role-based Access Control on AWS
Role-based Access Control on AWSRole-based Access Control on AWS
Role-based Access Control on AWS
 
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)
AWS re:Invent 2016: Taking DevOps to the AWS Edge (CTD302)
 
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...
AWS re:Invent 2016: 6 Million New Registrations in 30 Days: How the Chick-fil...
 
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
AWS re:Invent 2016: Leveraging Amazon Machine Learning, Amazon Redshift, and ...
 
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...
AWS re:Invent 2016: Amazon CloudWatch Logs and AWS Lambda: A Match Made in He...
 
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
AWS re:Invent 2016: Getting Started with Serverless Architectures (CMP211)
 
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...
AWS Partner Webcast - Use Your AWS CloudTrail Data and Splunk Software To Imp...
 
A Data Journey With AWS
A Data Journey With AWSA Data Journey With AWS
A Data Journey With AWS
 
What's New in Spark 2?
What's New in Spark 2?What's New in Spark 2?
What's New in Spark 2?
 
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
AWS re:Invent 2016: [JK REPEAT] The Enterprise Fast Lane - What Your Competit...
 
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
AWS re:Invent 2016: Leverage the Power of the Crowd To Work with Amazon Mecha...
 
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
AWS re:Invent 2016: Chalice: A Serverless Microframework for Python (DEV308)
 
(ARC301) Scaling Up to Your First 10 Million Users
(ARC301) Scaling Up to Your First 10 Million Users(ARC301) Scaling Up to Your First 10 Million Users
(ARC301) Scaling Up to Your First 10 Million Users
 
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
AWS re:Invent 2016: Turbocharge Your Microsoft .NET Developments with AWS (DE...
 
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
AWS re:Invent 2016: Building HPC Clusters as Code in the (Almost) Infinite Cl...
 
DevOps en Amazon: Un vistazo a nuestras herramientas y procesos
DevOps en Amazon: Un vistazo a nuestras herramientas y procesosDevOps en Amazon: Un vistazo a nuestras herramientas y procesos
DevOps en Amazon: Un vistazo a nuestras herramientas y procesos
 
Next Generation Open Data Platforms | AWS Public Sector Summit 2016
Next Generation Open Data Platforms | AWS Public Sector Summit 2016Next Generation Open Data Platforms | AWS Public Sector Summit 2016
Next Generation Open Data Platforms | AWS Public Sector Summit 2016
 
AWS Mobile Hub
AWS Mobile HubAWS Mobile Hub
AWS Mobile Hub
 

Ähnlich wie AWS re:Invent 2016: Learn How FINRA Aligns Billions of Time Ordered Events with Spark on EC2 (CMP316)

AWS Cloud Kata | Bangkok - Getting to Profitability
AWS Cloud Kata | Bangkok - Getting to ProfitabilityAWS Cloud Kata | Bangkok - Getting to Profitability
AWS Cloud Kata | Bangkok - Getting to ProfitabilityAmazon Web Services
 
AWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS EconomicsAWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS EconomicsAaron Klein
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in SparkSnappyData
 
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWSAWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWSAmazon Web Services
 
How to Reduce your Spend on AWS
How to Reduce your Spend on AWSHow to Reduce your Spend on AWS
How to Reduce your Spend on AWSJoseph K. Ziegler
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Amazon Web Services
 
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoOptimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoAmazon Web Services
 
AWS Cloud Kata | Manila - Getting to Profitability on AWS
AWS Cloud Kata | Manila - Getting to Profitability on AWSAWS Cloud Kata | Manila - Getting to Profitability on AWS
AWS Cloud Kata | Manila - Getting to Profitability on AWSAmazon Web Services
 
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...Amazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Getting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #WinningGetting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #WinningAmazon Web Services
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Amazon Web Services
 
Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Amazon Web Services
 
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...Amazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCO
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCOAWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCO
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCOAmazon Web Services
 
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAmazon Web Services
 

Ähnlich wie AWS re:Invent 2016: Learn How FINRA Aligns Billions of Time Ordered Events with Spark on EC2 (CMP316) (20)

AWS Cloud Kata | Bangkok - Getting to Profitability
AWS Cloud Kata | Bangkok - Getting to ProfitabilityAWS Cloud Kata | Bangkok - Getting to Profitability
AWS Cloud Kata | Bangkok - Getting to Profitability
 
AWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS EconomicsAWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS Economics
 
Thing you didn't know you could do in Spark
Thing you didn't know you could do in SparkThing you didn't know you could do in Spark
Thing you didn't know you could do in Spark
 
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWSAWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
 
How to Reduce your Spend on AWS
How to Reduce your Spend on AWSHow to Reduce your Spend on AWS
How to Reduce your Spend on AWS
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
 
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoOptimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
 
AWS Cloud Kata | Manila - Getting to Profitability on AWS
AWS Cloud Kata | Manila - Getting to Profitability on AWSAWS Cloud Kata | Manila - Getting to Profitability on AWS
AWS Cloud Kata | Manila - Getting to Profitability on AWS
 
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...
AWS Summit 2013 | Auckland - Optimizing Your AWS Applications and Usage to Re...
 
Amazon EC2
Amazon EC2Amazon EC2
Amazon EC2
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Getting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #WinningGetting the most Bang for your Buck with #EC2 #Winning
Getting the most Bang for your Buck with #EC2 #Winning
 
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
Get the Most Out of Amazon EC2: A Deep Dive on Reserved, On-Demand, and Spot ...
 
Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency Running Lean Architectures: How to Optimize for Cost Efficiency
Running Lean Architectures: How to Optimize for Cost Efficiency
 
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCO
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCOAWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCO
AWS Summit Tel Aviv - Enterprise Track - Cost Optimization & TCO
 
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Kürzlich hochgeladen (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

AWS re:Invent 2016: Learn How FINRA Aligns Billions of Time Ordered Events with Spark on EC2 (CMP316)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Bob Griffiths, Solutions Architect, AWS Brett Shriver, Senior Director, FINRA Ricardo Portilla, Lead Architect, FINRA December 2016 CMP316 Learn How FINRA Aligns Billions of Time-Ordered Events with Apache Spark on Amazon EC2
  • 2. What to Expect from the Session • Overview of Apache Spark on AWS • FINRA will describe their use of Spark for aligning time ordered events (business problem, solution, benefits, lessons learned)
  • 3. AWS – Spark Overview 3
  • 4. What is Spark? • A fast and general engine for large-scale data processing • Write applications quickly in Java, Scala, Python, R • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk To learn more, check out http://spark.apache.org 4
  • 5. Spark on AWS Compute • Amazon EC2 • Standalone cluster mode • Apache Mesos • Amazon EMR • YARN 5
  • 6. AWS Storage options Amazon EC2 ephemeral storage (HDFS) Amazon EBS Amazon S3 Amazon DynamoDB 6
  • 7. Additional Apache Spark Use Cases Yelp’s advertising targeting team makes prediction models to determine the likelihood of a user interacting with an advertisement. By using Apache Spark on Amazon EMR to process large amounts of data to train machine learning models, Yelp increased revenue and advertising click-through rate. As part of its Data Management Platform for customer insights, Krux runs many machine learning and general processing workloads using Apache Spark. Krux utilizes ephemeral Amazon EMR clusters with Amazon EC2 Spot Capacity to save costs, and uses Amazon S3 with EMRFS as a data layer for Apache Spark. CrowdStrike provides endpoint protection to stop breaches. They use Apache Spark on Amazon EMR to process hundreds of terabytes of event data and roll it up into higher- level behavioral descriptions on the hosts. From that data, CrowdStrike can pull event data together and identify the presence of malicious activity. 7
  • 8. Spark on AWS Best Practices 8
  • 9. Separate Storage & Compute • Optimize cluster size based on compute requirements • Allows selection of optimal EC2 instance types • Shut down your cluster when not in use • Share data among multiple clusters • Fault tolerance and disaster recovery 9
  • 10. Spot instances • Bid on spare Amazon EC2 computing capacity • Reduce operating costs by up to 50-90% • Ideal for ephemeral compute clusters 10
  • 11. FINRA: Spark Case Study 11
  • 12. 12
  • 13. What Do We Do? By the Numbers • We oversee more than 3,900 securities firms with approximately 640,795 brokers. • Every day, we watch over nearly 6 billion shares traded in U.S. listed equities markets—using technology powerful enough to detect potential abuses. • In fact, FINRA processes approximately 6 terabytes of data and up to 75 billion events every day to build a complete, holistic picture of market trading in the U.S. • In 2015, FINRA: • Referred over 800 fraud cases for prosecution • Levied more than $191 million in fines & restitution 13
  • 14. Problem Statement What data is FINRA tracking, and what do we do with it? - Up to 75 billion events/day - 12 markets/exchanges - Surveillance – Compare the state of activity across different 'venues', such as quotation systems, exchanges, firms, or trade reporting facilities • Involves grouping events by stock symbol and venue then ordering by event time - Alerts for ‘venues of interest’ where suspicious activity occurs 14
  • 15. Problem Statement (cont’d) What qualifies as suspicious activity? Actions taken by traders/brokers which disadvantage customer orders or undermine fairness/efficiency of markets. Individual investors Broker/Firm Buy customer order placed – 100,000 shares Broker buys (100 shares) Buy customer order executes, share price increases Broker sells high Time Front-running Layering Broker places multiple sell orders, slowing incrementing price Customer drives price lower Broker buys at lower price 15
  • 16. Example 1 – Intermarket Price Protection  Intermarket Price Protection - Restrict broker trading outside the best publicly available bid/offer to encourage best execution of customer orders Best price = highest bid/lowest ask, broker is trading outside this band! 85K quotes/sec < 1K trades/sec 16
  • 17. Example 2 – Exchange side of the picture Exchange #2 Time Ticker Price 13:00:00 STOCK XYZ $57.42 13:00:00.1 STOCK XYZ $57.41 13:00:00.156 STOCK XYZ $57 13:00:00.243 STOCK XYZ $57.10 13:00:00.244 STOCK XYZ $57.11 13:00:00.260 STOCK XYZ $57.15 13:00:00.287 STOCK XYZ $57.92 13:00:00.29 STOCK XYZ $57.90 13:00:00.293 STOCK XYZ $57.40 13:00:00.297 STOCK XYZ $57.41 Intra-day Exchange Feeds Best price per exchange – ‘top of order book’ – BUY side 90K order events/second 4 billion order events/day across 11 exchanges Time Ticker Excha nge Price Size 13:00:0 0 STOCK XYZ EX 1 $58 1000 13:00:0 0.1 STOCK XYZ EX 2 $57.41 1200 13:00:0 0.156 STOCK XYZ EX 1 $57.50 800 13:00:0 0.243 STOCK XYZ EX 3 $57.13 700 13:00:0 0.244 STOCK XYZ EX 4 $57.18 1200 13:00:0 0.260 STOCK XYZ EX 2 $57.15 900 Universal best size/price Limit Order Display - Obligation of firm to publish the full price/volume of its received customer limit orders in an exchange where the firm is registered as a market-maker Accumulate quantity at the best price Best price at Exchange 2 at this point in time Best universal price (Exchange 1) 17 • Exchange 1 • Exchange 2 • Exchange 3 • Exchange 4 • etc
  • 18. Example 2 – Customer order side of the picture Time Firm Ticker Excha nge Price Size 13:00: 00 FIRM ABC STOCK XYZ EX 1 $58 1000 13:00: 00.1 FIRM ABC STOCK XYZ EX 2 $57.41 1200 13:00: 00.15 6 FIRM ABC STOCK XYZ EX 1 $57.50 800 13:00: 00.24 3 FIRM ABC STOCK XYZ EX 3 $57.13 700 13:00: 00.24 4 FIRM ABC STOCK XYZ EX 4 $57.18 1200 13:00: 00.26 0 FIRM ABC STOCK XYZ EX 2 $57.15 900 Published best size/price New order 4-5 billion customer orders/day 50 million displayable orders/day Time Firm Size Pr 13:00 FIRM ABC 1000 $60 13:05 FIRM ABC 0 $60 13:06 FIRM ABC 1000 $60 BUY Limit Order Size/Price At this point, the exchange price is worse than the customer’s order, so in violation! Route New order Cancel 18
  • 19. Legacy Solution Storage Compute Proprietary processors 380 cores, 380 TB disk > 300 SQL-based surveillances Data off-loading to make way for current data/processing Data loads run day/night Production DB Architecture Storage Compute DR – Disaster Recovery 19
  • 20. Pain Points Pain points with the legacy solution – no longer present in AWS/Spark architecture - 7 figures yearly to maintain and operate - Difficult to scale during periods of market volatility - Storage and compute tightly coupled - Won’t support industry movement toward real time - Reprocessing is difficult (takes months) - Unavailable during maintenance windows - No control of managing internal compute model 21
  • 21. Alternatives Key requirements of new solution: - Scalability / elasticity - Cost effectiveness - Ease of coding and testing - Platform for future real-time processing - Support for time-based iteration What options were considered? - Apache Spark on Amazon EMR - Java MapReduce - Apache Giraph - Apache Crunch
  • 22. Spark Processing Approach Time T T T T Venue = Quoting Facility Q = Externally quoted/displayed data Q Q Q Q Q Q Q Q • What makes this comparison difficult? • Have to join on ticker and firm -> large partitions • Instead of joining -> union, sort, and iterate Turns an M x N problem into an M + N problem. Venue = Trade Reporting Facility T = Surveillance events (e.g. trades) Q 20
  • 23. AWS Architecture Amazon S3 Storage Input Bucket Output Bucket Spark on EMR Compute Spot m1.xlarge (I/O intensive) Spot m2.4xlarge (shuffle intensive) Spot m3.2xlarge (compute intensive) 23
  • 24. Choosing Spot Instances This architecture gets us:  Flexibility / Elasticity  I/O intensive instances for portions of code (m1.xlarge)  Shuffle-heavy (m2.4x)  Time-based computations (compute optimized -> m3.2x)  Cost Savings  Targeting AWS Spot Instance types to match these profiles saved us 3X the cost. We also pinpointed instances which were not too volatile. 24
  • 25. Benefits Realized Benefits • Order of magnitude cost savings vs. on-premises database architecture • Increased speed on reprocessing requests • Scalability, etc. Expected Future Benefits • Supports real-time processing (as data providers migrate to real time) • Easier experimentation with new instance types • Convert more surveillance to this model 25
  • 26. FINRA’s Future Plans • Capture additional benefits of Spark, such as use of data frames API for easier manipulations and optimization • Migrate additional workloads onto Spark – currently 240 surveillance applications are implemented in Hive but would benefit from Spark • Some experiments at FINRA show 2X speed with Spark vs Hive for same design • Use new AWS APIs • Move toward real time (coordinating with data providers) 26
  • 28. Remember to complete your evaluations! 28
  • 29. Related Sessions FINRA Sessions: - BDM203 – Building a Secure Data Science Platform - DAT302 – Best Practices for Migrating to RDS / Aurora - ENT313 – FINRA in the Cloud, Big Data Enterprise - STG308 – Data Lake for Big Data on Amazon S3 - SVR202 – What’s new with Lambda 29