SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Downloaden Sie, um offline zu lesen
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Getting Started with Real-Time
Analytics
Shawn Gandhi, Solutions Architect
{
"payerId": "Joe",
"productCode": "AmazonS3",
"clientProductCode": "AmazonS3",
"usageType": "Bandwidth",
"operation": "PUT",
"value": "22490",
"timestamp": "1216674828"
}
Metering Record
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700]
"GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z
mymachine.example.com evntslog - ID47
[exampleSDID@32473 iut="3"
eventSource="Application"
eventID="1011"][examplePriority@32473
class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime”
– 412309129140
MQTT Record <R,AMZN ,T,G,R1>
NASDAQ OMX Record
Generated data
Available for analysis
Data volume
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Abraham Wald (1902-1950)
Big data: best served fresh
Big data
• Hourly server logs:
were your systems misbehaving one
hour ago
• Weekly / Monthly Bill:
what you spent this billing cycle
• Daily customer-preferences report from your
website’s click stream:
what deal or ad to try next time
• Daily fraud reports:
was there fraud yesterday
Real-time big data
• Amazon CloudWatch metrics:
what went wrong now
• Real-time spending alerts/caps:
prevent overspending now
• Real-time analysis:
what to offer the current customer now
• Real-time detection:
block fraudulent use now
Talk outline
• A tour of Amazon Kinesis concepts in the context of
a Twitter Trends service
• Implementing the Twitter Trends service
• Amazon Kinesis in the broader context of a big data
ecosystem
Sending and reading data from Amazon Kinesis
streams
HTTP Post
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Kinesis Client
Library
+
Connector Library
Apache
Storm
Amazon Elastic
MapReduce
Sending Reading
Demo
http://us-iot.aws-proserv.de
A Tour of Amazon Kinesis Concepts in the
Context of a Twitter Trends Service
twitter-trends.com
Elastic Beanstalk
twitter-trends.com
twitter-trends.com website
twitter-trends.com
Too big to handle on one box
twitter-trends.com
The solution: streaming map/reduce
My top-10
My top-10
My top-10
Global top-10
twitter-trends.com
Core concepts
My top-10
My top-10
My top-10
Global top-10
Data record
Stream
Partition key
Shard
Worker
Shard: 14 17 18 21 23
Data record
Sequence number
twitter-trends.com
How this relates to Amazon Kinesis
Kinesis
Kinesis application
Core concepts recapped
• Data record ~ tweet
• Stream ~ all tweets (the Twitter Firehose)
• Partition key ~ Twitter topic (every tweet record belongs to exactly one of
these)
• Shard ~ all the data records belonging to a set of Twitter topics that will get
grouped together
• Sequence number ~ each data record gets one assigned when first
ingested
• Worker ~ processes the records of a shard in sequence number order
twitter-trends.com
Using the Amazon Kinesis API directly
K
I
N
E
S
I
S
iterator = getShardIterator(shardId, LATEST);
while (true) {
[records, iterator] =
getNextRecords(iterator, maxRecsToReturn);
process(records);
}
process(records): {
for (record in records) {
updateLocalTop10(record);
}
if (timeToDoOutput()) {
writeLocalTop10ToDDB();
}
}
while (true) {
localTop10Lists =
scanDDBTable();
updateGlobalTop10List(
localTop10Lists);
sleep(10);
}
K
I
N
E
S
I
S
twitter-trends.com
Challenges with using the Amazon Kinesis API
directly
Kinesis
application
Manual creation of workers and
assignment to shards
How many workers
per EC2 instance?
How many EC2 instances?
K
I
N
E
S
I
S
twitter-trends.com
Using the Amazon Kinesis Client Library
Kinesis
application
Shard mgmt
table
K
I
N
E
S
I
S
twitter-trends.com
Elastic scale and load balancing
Shard mgmt
table
Auto scaling
Group
Amazon
CloudWatch
K
I
N
E
S
I
S
twitter-trends.com
Shard management
Shard mgmt
table
Auto scaling
Group
Amazon
CloudWatch
Amazon
SNS
AWS
Console
K
I
N
E
S
I
S
twitter-trends.com
The challenges of fault tolerance
X
X
X Shard mgmt
table
K
I
N
E
S
I
S
twitter-trends.com
Fault tolerance support in KCL
Shard mgmt
table
X
Availability Zone
1
Availability Zone
3
Checkpoint, replay design pattern
Kinesis
1417182123
Shard-i
235810
Shard
ID
Lock Seq
num
Shard-i
Host A
Host B
Shard
ID
Local top-10
Shard-i
0
10
18
X2
3
5
8
10
14
17
18
21
23
0
310
Host AHost B
{#Obama: 10235, #Seahawks: 9835, …}{#Obama: 10235, #Congress: 9910, …}
1023
14
17
18
21
23
Catching up from delayed processing
Amazon Kinesis
1417182123
Shard-i
235810
Host A
Host B
X2
3
5
2
3
5
8
10
14
17
18
21
23
How long until
we’ve caught up?
Shard throughput SLA:
- 1 MBps in
- 2 MBps out
=>
Catch up time ~ down time
Unless your application
can’t process 2 MBps on a
single host
=>
Provision more shards
Concurrent processing applications
Amazon Kinesis
1417182123
Shard-i
235810
twitter-trends.com
spot-the-revolution.org
018
2
3
5
8
10
14
17
18
21
23
310
023
14
17
18
21
23
23
2
3
5
8
10
10
Why not combine applications?
Amazon Kinesis
twitter-trends.com
twitter-stats.com
Output state and
checkpoint every 10
seconds
Output state and
checkpoint every hour
Implementing twitter-trends.com
Creating an Amazon Kinesis stream
How many shards?
Additional considerations:
- How much processing is needed
per data record/data byte?
- How quickly do you need to
catch up if one or more of your
workers stalls or fails?
You can always dynamically reshard
to adjust to changing workloads
Twitter Trends shard processing code
Class TwitterTrendsShardProcessor implements IRecordProcessor {
public TwitterTrendsShardProcessor() { … }
@Override
public void initialize(String shardId) { … }
@Override
public void processRecords(List<Record> records,
IRecordProcessorCheckpointer checkpointer) { … }
@Override
public void shutdown(IRecordProcessorCheckpointer checkpointer,
ShutdownReason reason) { … }
}
Twitter Trends shard processing code
Class TwitterTrendsShardProcessor implements IRecordProcessor {
private Map<String, AtomicInteger> hashCount = new HashMap<>();
private long tweetsProcessed = 0;
@Override
public void processRecords(List<Record> records,
IRecordProcessorCheckpointer checkpointer) {
computeLocalTop10(records);
if ((tweetsProcessed++) >= 2000) {
emitToDynamoDB(checkpointer);
}
}
}
Twitter Trends shard processing code
private void computeLocalTop10(List<Record> records) {
for (Record r : records) {
String tweet = new String(r.getData().array());
String[] words = tweet.split(" t");
for (String word : words) {
if (word.startsWith("#")) {
if (!hashCount.containsKey(word)) hashCount.put(word, new AtomicInteger(0));
hashCount.get(word).incrementAndGet();
}
}
}
private void emitToDynamoDB(IRecordProcessorCheckpointer checkpointer) {
persistMapToDynamoDB(hashCount);
try {
checkpointer.checkpoint();
} catch (IOException | KinesisClientLibDependencyException | InvalidStateException
| ThrottlingException | ShutdownException e) {
// Error handling
}
hashCount = new HashMap<>();
tweetsProcessed = 0;
}
Amazon Kinesis as a gateway into the big
data ecosystem
Big data has a lifecycle
twitter-trends.com
Twitter trends
processing application
Twitter trends
statistics
Twitter trends
anonymized archive
Connector framework
public interface IKinesisConnectorPipeline<T> {
IEmitter<T> getEmitter(KinesisConnectorConfiguration configuration);
IBuffer<T> getBuffer(KinesisConnectorConfiguration configuration);
ITransformer<T> getTransformer(
KinesisConnectorConfiguration configuration);
IFilter<T> getFilter(KinesisConnectorConfiguration configuration);
}
Transformer and Filter interfaces
public interface ITransformer<T> {
public T toClass(Record record) throws IOException;
public byte[] fromClass(T record) throws IOException;
}
public interface IFilter<T> {
public boolean keepRecord(T record);
}
Tweet transformer for S3 archival
public class TweetTransformer implements ITransformer<TwitterRecord> {
private final JsonFactory fact =
JsonActivityFeedProcessor.getObjectMapper().getJsonFactory();
@Override
public TwitterRecord toClass(Record record) {
try {
return fact.createJsonParser(
record.getData().array()).readValueAs(TwitterRecord.class);
} catch (IOException e) {
throw new RuntimeException(e);
}
}
Example Amazon Redshift connector
K
I
N
E
S
I
S
Amazon
Redshift
Amazon
S3
Example Amazon Redshift connector v2
K
I
N
E
S
I
S
Amazon
Redshift
Amazon
S3
K
I
N
E
S
I
S
Your feedback is important to AWS
Please complete the session evaluation. Tell us what you think!
CHICAGO

Weitere ähnliche Inhalte

Was ist angesagt?

Anton Moldovan "Load testing which you always wanted"
Anton Moldovan "Load testing which you always wanted"Anton Moldovan "Load testing which you always wanted"
Anton Moldovan "Load testing which you always wanted"
Fwdays
 
OSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
OSMC 2012 | Neues in Nagios 4.0 by Andreas EricssonOSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
OSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
NETWAYS
 

Was ist angesagt? (20)

Using Apache Spark to Solve Sessionization Problem in Batch and Streaming
Using Apache Spark to Solve Sessionization Problem in Batch and StreamingUsing Apache Spark to Solve Sessionization Problem in Batch and Streaming
Using Apache Spark to Solve Sessionization Problem in Batch and Streaming
 
AWS Java SDK @ scale
AWS Java SDK @ scaleAWS Java SDK @ scale
AWS Java SDK @ scale
 
Triggers in MongoDB
Triggers in MongoDBTriggers in MongoDB
Triggers in MongoDB
 
Forgive me for i have allocated
Forgive me for i have allocatedForgive me for i have allocated
Forgive me for i have allocated
 
Triggers In MongoDB
Triggers In MongoDBTriggers In MongoDB
Triggers In MongoDB
 
Anton Moldovan "Load testing which you always wanted"
Anton Moldovan "Load testing which you always wanted"Anton Moldovan "Load testing which you always wanted"
Anton Moldovan "Load testing which you always wanted"
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with FlinkSanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
 
Reactive programming on Android
Reactive programming on AndroidReactive programming on Android
Reactive programming on Android
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Apache Spark in your likeness - low and high level customization
Apache Spark in your likeness - low and high level customizationApache Spark in your likeness - low and high level customization
Apache Spark in your likeness - low and high level customization
 
OpenStack Log Mining
OpenStack Log MiningOpenStack Log Mining
OpenStack Log Mining
 
MongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster TutorialMongoDB - Sharded Cluster Tutorial
MongoDB - Sharded Cluster Tutorial
 
Sharded cluster tutorial
Sharded cluster tutorialSharded cluster tutorial
Sharded cluster tutorial
 
Introduction to rx java for android
Introduction to rx java for androidIntroduction to rx java for android
Introduction to rx java for android
 
AJUG April 2011 Cascading example
AJUG April 2011 Cascading exampleAJUG April 2011 Cascading example
AJUG April 2011 Cascading example
 
Apache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API BasicsApache Flink Training: DataSet API Basics
Apache Flink Training: DataSet API Basics
 
Using MongoDB with Kafka - Use Cases and Best Practices
Using MongoDB with Kafka -  Use Cases and Best PracticesUsing MongoDB with Kafka -  Use Cases and Best Practices
Using MongoDB with Kafka - Use Cases and Best Practices
 
OSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
OSMC 2012 | Neues in Nagios 4.0 by Andreas EricssonOSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
OSMC 2012 | Neues in Nagios 4.0 by Andreas Ericsson
 
WOTC_Import
WOTC_ImportWOTC_Import
WOTC_Import
 
ELK stack at weibo.com
ELK stack at weibo.comELK stack at weibo.com
ELK stack at weibo.com
 

Andere mochten auch

I Truth #2 -ILove
I Truth #2 -ILoveI Truth #2 -ILove
I Truth #2 -ILove
bdpayne
 
İ N S A N B E Y Nİ
İ N S A N  B E Y Nİİ N S A N  B E Y Nİ
İ N S A N B E Y Nİ
kirbiyik
 
Hortinternet
HortinternetHortinternet
Hortinternet
marblocs
 
Marsh uditore
Marsh uditoreMarsh uditore
Marsh uditore
jexxon
 
xreferplus-dereksturdy
xreferplus-dereksturdyxreferplus-dereksturdy
xreferplus-dereksturdy
guestfbf1e1
 
Pink Ribbon Girls Newsletter
Pink Ribbon Girls NewsletterPink Ribbon Girls Newsletter
Pink Ribbon Girls Newsletter
cmcmahon
 
Callidus Software Product Installation And Performance Tuning
Callidus Software Product Installation And Performance TuningCallidus Software Product Installation And Performance Tuning
Callidus Software Product Installation And Performance Tuning
Callidus Software
 
Appendoectomy
AppendoectomyAppendoectomy
Appendoectomy
medgadget
 
Milieu
MilieuMilieu
Milieu
tekke
 
Miller's Moments
Miller's MomentsMiller's Moments
Miller's Moments
kmiller210
 

Andere mochten auch (20)

Hollywood vs Silicon Valley: Open Video als Vermittler
Hollywood vs Silicon Valley: Open Video als VermittlerHollywood vs Silicon Valley: Open Video als Vermittler
Hollywood vs Silicon Valley: Open Video als Vermittler
 
I Truth #2 -ILove
I Truth #2 -ILoveI Truth #2 -ILove
I Truth #2 -ILove
 
İ N S A N B E Y Nİ
İ N S A N  B E Y Nİİ N S A N  B E Y Nİ
İ N S A N B E Y Nİ
 
Hortinternet
HortinternetHortinternet
Hortinternet
 
Insectes3
Insectes3Insectes3
Insectes3
 
Marsh uditore
Marsh uditoreMarsh uditore
Marsh uditore
 
xreferplus-dereksturdy
xreferplus-dereksturdyxreferplus-dereksturdy
xreferplus-dereksturdy
 
Pink Ribbon Girls Newsletter
Pink Ribbon Girls NewsletterPink Ribbon Girls Newsletter
Pink Ribbon Girls Newsletter
 
Callidus Software Product Installation And Performance Tuning
Callidus Software Product Installation And Performance TuningCallidus Software Product Installation And Performance Tuning
Callidus Software Product Installation And Performance Tuning
 
Video Spielfelder
Video SpielfelderVideo Spielfelder
Video Spielfelder
 
Appendoectomy
AppendoectomyAppendoectomy
Appendoectomy
 
Video und Web 2.0
Video und Web 2.0Video und Web 2.0
Video und Web 2.0
 
Milieu
MilieuMilieu
Milieu
 
Internet
InternetInternet
Internet
 
Maggiolini etica it
Maggiolini etica itMaggiolini etica it
Maggiolini etica it
 
Tsukiji Fish Market
Tsukiji Fish MarketTsukiji Fish Market
Tsukiji Fish Market
 
Google Earth Business Uses
Google Earth Business UsesGoogle Earth Business Uses
Google Earth Business Uses
 
Social Innovation For Resource Management
Social Innovation For Resource ManagementSocial Innovation For Resource Management
Social Innovation For Resource Management
 
Miller's Moments
Miller's MomentsMiller's Moments
Miller's Moments
 
Finding Simple - Seat Map Design for Everyone UX Australia 2014
Finding Simple - Seat Map Design for Everyone  UX Australia 2014Finding Simple - Seat Map Design for Everyone  UX Australia 2014
Finding Simple - Seat Map Design for Everyone UX Australia 2014
 

Ähnlich wie Getting Started with Real-Time Analytics

Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Tom Diederich
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
Sages
 
Deep dumpster diving 2010
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010
RonnBlack
 

Ähnlich wie Getting Started with Real-Time Analytics (20)

Getting Started with Real-Time Analytics
Getting Started with Real-Time AnalyticsGetting Started with Real-Time Analytics
Getting Started with Real-Time Analytics
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Time for Functions
Time for FunctionsTime for Functions
Time for Functions
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
TSAR (TimeSeries AggregatoR) Tech Talk
TSAR (TimeSeries AggregatoR) Tech TalkTSAR (TimeSeries AggregatoR) Tech Talk
TSAR (TimeSeries AggregatoR) Tech Talk
 
Tsar tech talk
Tsar tech talkTsar tech talk
Tsar tech talk
 
Web program-peformance-optimization
Web program-peformance-optimizationWeb program-peformance-optimization
Web program-peformance-optimization
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
 
Durable functions 2.0 (2019-10-10)
Durable functions 2.0 (2019-10-10)Durable functions 2.0 (2019-10-10)
Durable functions 2.0 (2019-10-10)
 
Spicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QASpicy javascript: Create your first Chrome extension for web analytics QA
Spicy javascript: Create your first Chrome extension for web analytics QA
 
About Node.js
About Node.jsAbout Node.js
About Node.js
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
 
Where the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-OptimisationsWhere the wild things are - Benchmarking and Micro-Optimisations
Where the wild things are - Benchmarking and Micro-Optimisations
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Deep dumpster diving 2010
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010
 
Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6Tamir Dresher - What’s new in ASP.NET Core 6
Tamir Dresher - What’s new in ASP.NET Core 6
 

Mehr von Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Getting Started with Real-Time Analytics

  • 1. ©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Getting Started with Real-Time Analytics Shawn Gandhi, Solutions Architect
  • 2. { "payerId": "Joe", "productCode": "AmazonS3", "clientProductCode": "AmazonS3", "usageType": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": "1216674828" } Metering Record 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 Common Log Entry <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"] Syslog Entry “SeattlePublicWater/Kinesis/123/Realtime” – 412309129140 MQTT Record <R,AMZN ,T,G,R1> NASDAQ OMX Record
  • 3. Generated data Available for analysis Data volume Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
  • 5.
  • 6.
  • 7. Big data: best served fresh Big data • Hourly server logs: were your systems misbehaving one hour ago • Weekly / Monthly Bill: what you spent this billing cycle • Daily customer-preferences report from your website’s click stream: what deal or ad to try next time • Daily fraud reports: was there fraud yesterday Real-time big data • Amazon CloudWatch metrics: what went wrong now • Real-time spending alerts/caps: prevent overspending now • Real-time analysis: what to offer the current customer now • Real-time detection: block fraudulent use now
  • 8. Talk outline • A tour of Amazon Kinesis concepts in the context of a Twitter Trends service • Implementing the Twitter Trends service • Amazon Kinesis in the broader context of a big data ecosystem
  • 9. Sending and reading data from Amazon Kinesis streams HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Reading
  • 11. A Tour of Amazon Kinesis Concepts in the Context of a Twitter Trends Service
  • 13. twitter-trends.com Too big to handle on one box
  • 14. twitter-trends.com The solution: streaming map/reduce My top-10 My top-10 My top-10 Global top-10
  • 15. twitter-trends.com Core concepts My top-10 My top-10 My top-10 Global top-10 Data record Stream Partition key Shard Worker Shard: 14 17 18 21 23 Data record Sequence number
  • 16. twitter-trends.com How this relates to Amazon Kinesis Kinesis Kinesis application
  • 17. Core concepts recapped • Data record ~ tweet • Stream ~ all tweets (the Twitter Firehose) • Partition key ~ Twitter topic (every tweet record belongs to exactly one of these) • Shard ~ all the data records belonging to a set of Twitter topics that will get grouped together • Sequence number ~ each data record gets one assigned when first ingested • Worker ~ processes the records of a shard in sequence number order
  • 18. twitter-trends.com Using the Amazon Kinesis API directly K I N E S I S iterator = getShardIterator(shardId, LATEST); while (true) { [records, iterator] = getNextRecords(iterator, maxRecsToReturn); process(records); } process(records): { for (record in records) { updateLocalTop10(record); } if (timeToDoOutput()) { writeLocalTop10ToDDB(); } } while (true) { localTop10Lists = scanDDBTable(); updateGlobalTop10List( localTop10Lists); sleep(10); }
  • 19. K I N E S I S twitter-trends.com Challenges with using the Amazon Kinesis API directly Kinesis application Manual creation of workers and assignment to shards How many workers per EC2 instance? How many EC2 instances?
  • 20. K I N E S I S twitter-trends.com Using the Amazon Kinesis Client Library Kinesis application Shard mgmt table
  • 21. K I N E S I S twitter-trends.com Elastic scale and load balancing Shard mgmt table Auto scaling Group Amazon CloudWatch
  • 22. K I N E S I S twitter-trends.com Shard management Shard mgmt table Auto scaling Group Amazon CloudWatch Amazon SNS AWS Console
  • 23. K I N E S I S twitter-trends.com The challenges of fault tolerance X X X Shard mgmt table
  • 24. K I N E S I S twitter-trends.com Fault tolerance support in KCL Shard mgmt table X Availability Zone 1 Availability Zone 3
  • 25. Checkpoint, replay design pattern Kinesis 1417182123 Shard-i 235810 Shard ID Lock Seq num Shard-i Host A Host B Shard ID Local top-10 Shard-i 0 10 18 X2 3 5 8 10 14 17 18 21 23 0 310 Host AHost B {#Obama: 10235, #Seahawks: 9835, …}{#Obama: 10235, #Congress: 9910, …} 1023 14 17 18 21 23
  • 26. Catching up from delayed processing Amazon Kinesis 1417182123 Shard-i 235810 Host A Host B X2 3 5 2 3 5 8 10 14 17 18 21 23 How long until we’ve caught up? Shard throughput SLA: - 1 MBps in - 2 MBps out => Catch up time ~ down time Unless your application can’t process 2 MBps on a single host => Provision more shards
  • 27. Concurrent processing applications Amazon Kinesis 1417182123 Shard-i 235810 twitter-trends.com spot-the-revolution.org 018 2 3 5 8 10 14 17 18 21 23 310 023 14 17 18 21 23 23 2 3 5 8 10 10
  • 28. Why not combine applications? Amazon Kinesis twitter-trends.com twitter-stats.com Output state and checkpoint every 10 seconds Output state and checkpoint every hour
  • 30. Creating an Amazon Kinesis stream
  • 31. How many shards? Additional considerations: - How much processing is needed per data record/data byte? - How quickly do you need to catch up if one or more of your workers stalls or fails? You can always dynamically reshard to adjust to changing workloads
  • 32. Twitter Trends shard processing code Class TwitterTrendsShardProcessor implements IRecordProcessor { public TwitterTrendsShardProcessor() { … } @Override public void initialize(String shardId) { … } @Override public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) { … } @Override public void shutdown(IRecordProcessorCheckpointer checkpointer, ShutdownReason reason) { … } }
  • 33. Twitter Trends shard processing code Class TwitterTrendsShardProcessor implements IRecordProcessor { private Map<String, AtomicInteger> hashCount = new HashMap<>(); private long tweetsProcessed = 0; @Override public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) { computeLocalTop10(records); if ((tweetsProcessed++) >= 2000) { emitToDynamoDB(checkpointer); } } }
  • 34. Twitter Trends shard processing code private void computeLocalTop10(List<Record> records) { for (Record r : records) { String tweet = new String(r.getData().array()); String[] words = tweet.split(" t"); for (String word : words) { if (word.startsWith("#")) { if (!hashCount.containsKey(word)) hashCount.put(word, new AtomicInteger(0)); hashCount.get(word).incrementAndGet(); } } } private void emitToDynamoDB(IRecordProcessorCheckpointer checkpointer) { persistMapToDynamoDB(hashCount); try { checkpointer.checkpoint(); } catch (IOException | KinesisClientLibDependencyException | InvalidStateException | ThrottlingException | ShutdownException e) { // Error handling } hashCount = new HashMap<>(); tweetsProcessed = 0; }
  • 35. Amazon Kinesis as a gateway into the big data ecosystem
  • 36. Big data has a lifecycle twitter-trends.com Twitter trends processing application Twitter trends statistics Twitter trends anonymized archive
  • 37. Connector framework public interface IKinesisConnectorPipeline<T> { IEmitter<T> getEmitter(KinesisConnectorConfiguration configuration); IBuffer<T> getBuffer(KinesisConnectorConfiguration configuration); ITransformer<T> getTransformer( KinesisConnectorConfiguration configuration); IFilter<T> getFilter(KinesisConnectorConfiguration configuration); }
  • 38. Transformer and Filter interfaces public interface ITransformer<T> { public T toClass(Record record) throws IOException; public byte[] fromClass(T record) throws IOException; } public interface IFilter<T> { public boolean keepRecord(T record); }
  • 39. Tweet transformer for S3 archival public class TweetTransformer implements ITransformer<TwitterRecord> { private final JsonFactory fact = JsonActivityFeedProcessor.getObjectMapper().getJsonFactory(); @Override public TwitterRecord toClass(Record record) { try { return fact.createJsonParser( record.getData().array()).readValueAs(TwitterRecord.class); } catch (IOException e) { throw new RuntimeException(e); } }
  • 40. Example Amazon Redshift connector K I N E S I S Amazon Redshift Amazon S3
  • 41. Example Amazon Redshift connector v2 K I N E S I S Amazon Redshift Amazon S3 K I N E S I S
  • 42. Your feedback is important to AWS Please complete the session evaluation. Tell us what you think!