© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Logging at Scale
Alex Smith - @alexjs
Solutions Architect
April 2016
Logging is difficult
I thought I knew this
No Users
5.2m users
(~80k rps)
But.
It is really difficult
Problems
• Storage (Temporary)
• Capture
• Storage (Permanent)
• Visualisation
Stealing Content…
‘Your First 10m Users’
ARC301 – re:Invent 2015
http://bitly.com/2015arc301
- Joel Williams
AWS Solutions Architect
>1 User
• Amazon Route 53 for DNS
• A single Elastic IP
• A single Amazon EC2 instance
• With full stack on this host
• Web app
• Database
• Management
• And so on…
[Diagram: User, Amazon Route 53, Elastic IP, Amazon EC2 instance (source: ARC301)]
>1 User
• A single place to read logs from
@alexjs hacks – top URLs
# awk -F\" '{print $2}' access_log \
    | awk '{print $2}' \
    | sort | uniq -c | sort -rn
11208 /
3287 /2016/04/23/welcome
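The same count can be sketched in Python for cases where the shell pipeline gets unwieldy. This is a minimal sketch, assuming Apache combined log format; the function name `top_urls` and the sample lines are illustrative and not from the talk.

```python
from collections import Counter


def top_urls(lines, n=10):
    """Count request paths, mirroring the two awk stages above."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')          # same idea as awk -F\"
        if len(parts) < 2:
            continue                     # no quoted request field on this line
        request = parts[1].split()       # e.g. ['GET', '/', 'HTTP/1.1']
        if len(request) >= 2:
            counts[request[1]] += 1      # second word is the path
    return counts.most_common(n)


# Hypothetical sample lines in combined log format
sample = [
    '203.0.113.5 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.47"',
    '203.0.113.5 - - [23/Apr/2016:10:00:01 +0000] "GET /2016/04/23/welcome HTTP/1.1" 200 2048 "-" "curl/7.47"',
    '203.0.113.9 - - [23/Apr/2016:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.47"',
]
print(top_urls(sample))
```

Like the shell version, this only works while every log line lives on one host.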
@alexjs hacks – HTTP response codes
# awk '{print $9}' access_log \
    | sort | uniq -c | sort -rn
19307 200
1239 404
120 503
1 416
@alexjs hacks - top User-Agents
# awk -F\" '{print $6}' access_log | sort | uniq -c | sort -rn
3774 Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0;
Trident/6.0; IEMobile/10.0; ARM; Touch; Microsoft; Lumia 640 XL)
2949 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like
Gecko) Chrome/46.0.2490.86 Safari/537.36
2928 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
2900 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/39.5.2171.95 Safari/537.36
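The field positions used by these awk hacks ($9 for the status, the sixth quote-delimited field for the User-Agent) can be made explicit with one parser. The following is a sketch only, assuming Apache combined log format; the pattern `COMBINED`, the helper names and the sample lines are illustrative. It does not handle escaped quotes inside a quoted field.

```python
import re
from collections import Counter

# One named group per field of the combined log format
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def count_field(lines, field):
    """Equivalent of `sort | uniq -c | sort -rn` for one parsed field."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record[field]] += 1
    return counts.most_common()


# Hypothetical sample lines
sample = [
    '203.0.113.5 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 5.1)"',
    '203.0.113.6 - - [23/Apr/2016:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128 "-" "curl/7.47"',
    '203.0.113.7 - - [23/Apr/2016:10:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 5.1)"',
]
print(count_field(sample, "status"))
print(count_field(sample, "agent"))
```

Parsing into named fields is also the shape the data needs later on, when each line becomes a JSON object.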
@alexjs hacks – requests per second (realtime)
# tail -F access_log \
    | perl -e 'while (<>) {$l++;if (time > $e) {$e=time;print "$l\n";$l=0}}'
1
1
68
99
912
424
http://bitly.com/bashlps
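The Perl one-liner counts lines per wall-clock second on a live tail. For a log file that has already been written, a similar figure can be derived from the timestamps in the file. This is a sketch under assumptions: timestamps are in Apache's default `[dd/Mon/yyyy:HH:MM:SS +zzzz]` form, month names are English (`%b` is locale dependent), and `requests_per_second` is an illustrative name.

```python
from collections import Counter
from datetime import datetime

APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def requests_per_second(lines):
    """Return (timestamp, count) pairs, one per second that saw traffic."""
    counts = Counter()
    for line in lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue                      # no bracketed timestamp
        try:
            stamp = datetime.strptime(line[start + 1:end], APACHE_TIME)
        except ValueError:
            continue                      # bracketed text was not a timestamp
        counts[stamp] += 1
    return sorted(counts.items())


# Hypothetical sample lines
sample = [
    '203.0.113.5 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.6 - - [23/Apr/2016:10:00:00 +0000] "GET /a HTTP/1.1" 200 512',
    '203.0.113.7 - - [23/Apr/2016:10:00:01 +0000] "GET /b HTTP/1.1" 200 512',
]
for stamp, count in requests_per_second(sample):
    print(stamp.isoformat(), count)
```

Seconds with no requests do not appear in the result, unlike the live one-liner.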
Users >1000
[Diagram: User, Amazon Route 53, ELB Balancer, one Web Instance in each of two Availability Zones, RDS DB Instance Active (Multi-AZ), RDS DB Instance Standby (Multi-AZ)]
Real Life
Users >1 million+
[Diagram: User, Amazon Route 53, Amazon CloudFront, Amazon S3, ELB Balancer, Web Instances, Worker Instances, Internal App Instances, RDS DB Instance Active (Multi-AZ) with two Read Replicas, DynamoDB, ElastiCache, Amazon SQS, Amazon SES, Lambda, Amazon CloudWatch (source: ARC301)]
Problems
• Storage (Temporary)
• Capture
• Storage (Permanent)
• Visualisation
When the logs are written (AWS)
• Local memory
• Ephemeral Volumes
• EBS Volumes
• gp2
• st1/sc1
Problems
• Storage (Temporary)
• Capture
• Storage (Permanent)
• Visualisation
• Insight
Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Somewhere to search
To NoSQL, or not to NoSQL?
- Joel
Some folks won’t like this,
but…
Start with SQL databases
(even MPP SQL)
Why start with SQL?
• Established and well-worn technology.
• Lots of existing code, communities, books, and tools.
• You aren’t going to break SQL DBs in your first 10 million
users. No, really, you won’t.*
• Clear patterns to scalability (especially in analytics)
*Unless you are doing something SUPER peculiar with the data or you have MASSIVE amounts of it.
…but even then SQL will have a place in your stack.
Ah ha! You said
massive!
- Joel (again)
Why might you need NoSQL?
• Super low-latency applications
• Metadata-driven datasets
• Highly nonrelational data
• Need schema-less data constructs*
• Massive amounts of data (again, in the TB range)
• Rapid ingest of data (thousands of records/sec)
*Need != “It’s easier to do dev without schemas”
Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Somewhere to search
Log Dispatcher Architecture Revisited
[Diagram: four App Servers, Kinesis Firehose, two ElasticSearch Log Index nodes, Visualisation, Amazon S3 (JSON)]
Amazon S3
• Simple Storage Service
• Canonical logging target for ELB, CloudFront, etc.
• Virtually unlimited amounts of storage
• Support for Lambda operations
• Very fast – ideal for feeding other services (Redshift, EMR/Hadoop)
• Data can be automatically pushed here from Amazon Kinesis Firehose
Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Long tail
• Somewhere to search
Redshift
• PostgreSQL based MPP database
• Petabyte scale data warehousing
• Choice of nodes
• Dense compute
• Dense storage
• Already compatible with your existing BI tools
Up to 128 dense storage nodes, ~2PB/cluster
Three Problems of Persistence
• Somewhere to stage
• Somewhere to live
• Somewhere to search
(streaming data)
Amazon ElasticSearch Service
• ElasticSearch
• Popular/Open Source
• Commonly used for log and clickstream
• Managed Solution
• We prepackage Kibana
• Integrated with IAM, Firehose, etc
Demo: Storage!
ElasticSearch Index Mapping
curl -XPUT 'https://search-loggingatscale-demo-[...].us-east-1.es.amazonaws.com/blog-apache-combined' -d '
{
  "mappings": {
    "blog-apache-combined": {
      "properties": {
        "datetime": {
          "type": "date",
          "format": "dd/MMM/yyyy:HH:mm:ss Z"
        },
        "agent": {
          "type": "string",
          "index": "not_analyzed"
        }, [...]
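The two fields shown in the mapping can be rebuilt as a Python dict, which avoids the quoting mistakes that are easy to make inside a shell string. This is a sketch only: it covers just the `datetime` and `agent` fields visible on the slide (the elided remainder is not reconstructed), and the `strptime` pattern is my assumed Python equivalent of the mapping's date format. The `string` / `not_analyzed` types belong to the Elasticsearch versions current at the time of the talk; later versions use `keyword` instead.

```python
import json
from datetime import datetime

# Only the fields visible on the slide; the rest of the mapping is elided there.
mapping = {
    "mappings": {
        "blog-apache-combined": {
            "properties": {
                "datetime": {
                    "type": "date",
                    "format": "dd/MMM/yyyy:HH:mm:ss Z",
                },
                "agent": {
                    "type": "string",
                    "index": "not_analyzed",  # keep the full User-Agent as one term
                },
            }
        }
    }
}

# Request body for the PUT; json.dumps guarantees straight quotes and valid JSON
body = json.dumps(mapping, indent=2)
print(body)

# Assumed Python counterpart of "dd/MMM/yyyy:HH:mm:ss Z", useful for checking
# that a sample log timestamp matches what the index expects
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"
parsed = datetime.strptime("23/Apr/2016:10:00:00 +0000", APACHE_TIME)
print(parsed.isoformat())
```

Leaving `agent` unanalysed is what makes a "top User-Agents" aggregation return whole strings rather than individual tokens.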
Problems
• Storage (Temporary)
• Capture
• Storage (Permanent)
• Visualisation
How do I get my data in anyway?
Logging Architecture
[Diagram: four App Servers feed two Log Aggregators (Kafka/Kinesis/MQ), which feed two Log Index/Persist nodes (ElasticSearch, etc), which feed Visualisation]
Logging Architecture
[Diagram: four App Servers feed two Log Aggregators (Kafka/Kinesis/MQ), which feed two ElasticSearch nodes, which feed Visualisation]
Amazon Kinesis
• Firstly, a massively scalable, low cost way to send JSON objects to a ’stream’ hosted by AWS
• Users can write applications (using KCL) to take data from it and parse/evaluate
• Apps can be written in Java, Lambda (Node, Python, Java), etc
Amazon Kinesis: New Features (re:Invent 2015)
Kinesis Streams
• What was previously Kinesis
• Still very customisable, for innovative stream workloads
• Users still write app to parse data from the stream
Kinesis Firehose
• Fully managed data ingest service
• Provision end point
• Send data to end point
• ???
• Data!
• Outputs to S3, Redshift, ElasticSearch Service
• (And can do two at once)
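The "send data to end point" step can be sketched as follows. This is a minimal sketch under assumptions, not the demo code from the talk: `send_events` and `RecordingClient` are hypothetical names, and the stand-in client only mimics the `put_record_batch(DeliveryStreamName=..., Records=[{"Data": ...}])` call shape of boto3's Firehose client so the example runs without AWS credentials. The 500-record cap per call reflects the documented PutRecordBatch limit as I understand it. The per-call size limit and retries based on `FailedPutCount` are deliberately left out.

```python
import json

MAX_RECORDS_PER_BATCH = 500  # assumed PutRecordBatch per-call record limit


def to_record(event):
    """Serialise one event as newline-terminated JSON bytes."""
    # The trailing newline keeps objects separable once they land in S3.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_events(client, stream_name, events):
    """Send events in batches; returns how many records were submitted."""
    submitted = 0
    batch = []
    for event in events:
        batch.append(to_record(event))
        if len(batch) == MAX_RECORDS_PER_BATCH:
            client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
            submitted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        submitted += len(batch)
    return submitted


class RecordingClient:
    """Hypothetical stand-in for a Firehose client; records calls only."""

    def __init__(self):
        self.calls = []

    def put_record_batch(self, DeliveryStreamName, Records):
        self.calls.append((DeliveryStreamName, len(Records)))
        return {"FailedPutCount": 0}


client = RecordingClient()
total = send_events(client, "example-stream", ({"n": i} for i in range(1200)))
print(total, client.calls)
```

In real use the stand-in would be replaced by an actual Firehose client, and the response of each call checked for failed records.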
Amazon Kinesis: New Features (Apr 2016)
Amazon Kinesis Agent
• Standalone Java application from AWS
• Collect and send logs to Kinesis Firehose
• Built-in:
• File rotation
• Failure retries
• Checkpoints
• Integrated with CloudWatch for alerting
Amazon Kinesis Agent
• Multiple input options
• SINGLELINE
• CSVTOJSON
• LOGTOJSON
• Hoorah!
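An agent configuration using the LOGTOJSON option might look like the output of the sketch below. The key names (`flows`, `filePattern`, `deliveryStream`, `dataProcessingOptions`, `optionName`, `logFormat`) follow the Kinesis Agent configuration format as I understand it; the file path and stream name are placeholders, and `agent_config` is an illustrative helper, not part of the agent.

```python
import json


def agent_config(file_pattern, delivery_stream):
    """Build one flow that tails a log and converts each line to JSON."""
    return {
        "flows": [
            {
                "filePattern": file_pattern,        # which files to tail
                "deliveryStream": delivery_stream,  # Firehose stream to send to
                "dataProcessingOptions": [
                    {
                        "optionName": "LOGTOJSON",
                        "logFormat": "COMBINEDAPACHELOG",
                    }
                ],
            }
        ]
    }


# Placeholder path and stream name
config = agent_config("/var/log/httpd/access_log*", "example-stream")
print(json.dumps(config, indent=2))
```

The agent typically reads this JSON from its own configuration file on the host; check the agent documentation for the exact location on your distribution.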
Demo: Local Capture + Dispatch
S3
ElasticSearch
Problems
• Storage (Temp)
• Capture
• Storage (Perm)
• Visualisation
Kibana
• Pre-packaged with Amazon ElasticSearch Service
• Easy to manage with freeform data
• Dashboards!
Your existing BI tools
• As before – your data exists on S3 (JSON)
• S3 -> Redshift
• Commission a Redshift cluster with IAM roles
• Write a manifest of the files to load (JSON)
• Issue a load
• Redshift is PgSQL compatible
• Drivers exist for many tools
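The manifest and load steps can be sketched as below. This is an illustration under assumptions: the bucket, keys, table name and role ARN are placeholders, the helper names are hypothetical, and the statement is built with plain string formatting, which is only acceptable for trusted, hard-coded values. `JSON 'auto'` asks Redshift to match JSON keys to column names.

```python
import json


def build_manifest(bucket, keys):
    """List the S3 objects Redshift should load, failing if any is missing."""
    return {
        "entries": [
            {"url": f"s3://{bucket}/{key}", "mandatory": True} for key in keys
        ]
    }


def copy_statement(table, manifest_url, iam_role_arn):
    """Build a COPY statement that loads JSON files named in a manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST\n"
        "JSON 'auto';"
    )


# Placeholder names throughout
manifest = build_manifest("example-bucket", ["logs/part-0001.json", "logs/part-0002.json"])
print(json.dumps(manifest, indent=2))
print(copy_statement(
    "access_logs",
    "s3://example-bucket/manifests/load.manifest",
    "arn:aws:iam::123456789012:role/example-redshift-load",
))
```

The manifest itself is uploaded to S3 first; the COPY statement is then issued over any PostgreSQL-compatible connection to the cluster.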
Demo: visualisation!
(Kibana)
Problems
• Storage (Temporary)
• Capture
• Storage (Permanent)
• Visualisation
• Insight
Recap / Lessons / Next
• Logging is really hard.
• Use tools like Amazon Kinesis Firehose, the Kinesis Agent and Amazon Elasticsearch Service to make it easier
• Reuse data, tools and people where possible
Lessons
Don’t be big data dog
Use the right tools at the right time
Q&A
Twitter
@alexjs
LinkedIn
https://sg.linkedin.com/in/alexjs
Email
alexjs@amazon.com
Thank You!

Weitere ähnliche Inhalte

Was ist angesagt?

February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWS
Amazon Web Services
 
16h00 globant - aws globant-big-data_summit2012
16h00   globant - aws globant-big-data_summit201216h00   globant - aws globant-big-data_summit2012
16h00 globant - aws globant-big-data_summit2012
infolive
 

Was ist angesagt? (20)

Modern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementModern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & Engagement
 
A Serverless Approach to Operational Log Visualisation and Analytics
A Serverless Approach to Operational Log Visualisation and AnalyticsA Serverless Approach to Operational Log Visualisation and Analytics
A Serverless Approach to Operational Log Visualisation and Analytics
 
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You ScaleENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
ENT316 Keeping Pace With The Cloud: Managing and Optimizing as You Scale
 
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu...
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics services
 
Build a Website on AWS for Your First 10 Million Users
Build a Website on AWS for Your First 10 Million UsersBuild a Website on AWS for Your First 10 Million Users
Build a Website on AWS for Your First 10 Million Users
 
Gaming in the Cloud at Playhubs Oct 2015
Gaming in the Cloud at Playhubs Oct 2015Gaming in the Cloud at Playhubs Oct 2015
Gaming in the Cloud at Playhubs Oct 2015
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
 
February 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWSFebruary 2016 Webinar Series - 451 Research and AWS
February 2016 Webinar Series - 451 Research and AWS
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Structured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWSStructured, Unstructured and Streaming Big Data on the AWS
Structured, Unstructured and Streaming Big Data on the AWS
 
Modern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagementModern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagement
 
Building a Streaming Data Platform on AWS - Workshop
Building a Streaming Data Platform on AWS - WorkshopBuilding a Streaming Data Platform on AWS - Workshop
Building a Streaming Data Platform on AWS - Workshop
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
16h00 globant - aws globant-big-data_summit2012
16h00   globant - aws globant-big-data_summit201216h00   globant - aws globant-big-data_summit2012
16h00 globant - aws globant-big-data_summit2012
 
Workshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWSWorkshop: Building Your First Big Data Application on AWS
Workshop: Building Your First Big Data Application on AWS
 
BDA309 Building Your Data Lake on AWS
BDA309 Building Your Data Lake on AWSBDA309 Building Your Data Lake on AWS
BDA309 Building Your Data Lake on AWS
 

Ähnlich wie Log Analysis At Scale

Ähnlich wie Log Analysis At Scale (20)

Journey Towards Scaling Your Application to Million Users
Journey Towards Scaling Your Application to Million UsersJourney Towards Scaling Your Application to Million Users
Journey Towards Scaling Your Application to Million Users
 
Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)Why Scale Matters and How the Cloud is Really Different (at scale)
Why Scale Matters and How the Cloud is Really Different (at scale)
 
Big data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The NetherlandsBig data and serverless - AWS UG The Netherlands
Big data and serverless - AWS UG The Netherlands
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
 
Aplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuariosAplicaciones a gran escala: Cómo servir a millones de usuarios
Aplicaciones a gran escala: Cómo servir a millones de usuarios
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
 
Deep Dive: Scaling Up to Your First 10 Million Users
Deep Dive: Scaling Up to Your First 10 Million UsersDeep Dive: Scaling Up to Your First 10 Million Users
Deep Dive: Scaling Up to Your First 10 Million Users
 
Your First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web ServicesYour First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web Services
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Scaling up to your first 10 million users - Pop-up Loft Tel Aviv
Scaling up to your first 10 million users - Pop-up Loft Tel AvivScaling up to your first 10 million users - Pop-up Loft Tel Aviv
Scaling up to your first 10 million users - Pop-up Loft Tel Aviv
 
Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users Scaling on AWS to the First 10 Million Users
Scaling on AWS to the First 10 Million Users
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
ENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million UsersENT309 Scaling Up to Your First 10 Million Users
ENT309 Scaling Up to Your First 10 Million Users
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
ENT309 scaling up to your first 10 million users
ENT309 scaling up to your first 10 million usersENT309 scaling up to your first 10 million users
ENT309 scaling up to your first 10 million users
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 

Mehr von Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving

Log Analysis At Scale

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Logging at Scale Alex Smith - @alexjs Solutions Architect April 2016
  • 3. I thought I knew this
  • 5. But. It is really difficult
  • 6. Problems • Storage (Temporary) • Capture • Storage (Permanent) • Visualisation
  • 7. Stealing Content… ‘Your First 10m Users’ ARC301 – re:Invent 2015 http://bitly.com/2015arc301 - Joel Williams AWS Solutions Architect
  • 8. >1 User • Amazon Route 53 for DNS • A single Elastic IP • A single Amazon EC2 instance • With full stack on this host • Web app • Database • Management • And so on… Amazon EC2 instance Elastic IP User Amazon Route 53 ARC301
  • 9. >1 User • A single place to read logs Amazon EC2 instance Elastic IP User Amazon Route 53 ARC301
  • 10. >1 User • A single place to read logs from Amazon EC2 instance Elastic IP User Amazon Route 53 ARC301
  • 11. @alexjs hacks – top URLs # awk -F\" '{print $2}' access_log | awk '{print $2}' | sort | uniq -c | sort -rn 11208 / 3287 /2016/04/23/welcome
  • 12. @alexjs hacks – HTTP response codes # awk '{print $9}' access_log | sort | uniq -c | sort -rn 19307 200 1239 404 120 503 1 416
  • 13. @alexjs hacks – top User-Agents # awk -F\" '{print $6}' access_log | sort | uniq -c | sort -rn 3774 Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; Microsoft; Lumia 640 XL) 2949 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36 2928 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36 2900 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.5.2171.95 Safari/537.36
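
The three one-liners above (top URLs, response codes, top User-Agents) all follow the same pattern: pull one field out of each access log line, then count and rank. A minimal Python sketch of that pattern follows. It assumes the Apache "combined" log format; the regular expression, its group names and the `SAMPLE` lines are illustrative and not taken from the deck.

```python
import re
from collections import Counter

# Apache "combined" log format. Group names are chosen for this sketch only.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_counts(lines, field, n=10):
    """Rank the values of one parsed field, most frequent first.

    Plays the role of `sort | uniq -c | sort -rn` in the shell versions.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match:  # lines that do not parse are skipped
            counts[match.group(field)] += 1
    return counts.most_common(n)


# Made-up sample lines (documentation IP range), not real traffic.
SAMPLE = [
    '203.0.113.5 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.47.0"',
    '203.0.113.6 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.47.0"',
    '203.0.113.7 - - [23/Apr/2016:10:00:01 +0000] "GET /2016/04/23/welcome HTTP/1.1" 404 128 "-" "Mozilla/5.0"',
]

for field in ("path", "status", "agent"):
    print(field, top_counts(SAMPLE, field))
```

As with the awk versions, this only works while there is a single file on a single host to read from, which is the point the following slides build on.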
  • 14. @alexjs hacks – requests per second (realtime) # tail -F access_log | perl -e 'while (<>) {$l++;if (time > $e) {$e=time;print "$l\n";$l=0}}' 1 1 68 99 912 424 http://bitly.com/bashlps
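
The Perl one-liner counts lines as they arrive, using wall-clock seconds. A related sketch in Python is shown here as a variation rather than a port: it buckets requests by the timestamp recorded in each line, so it also works on a log that has already been written. It assumes the combined-log timestamp layout (`23/Apr/2016:10:00:00 +0000`) and an English locale for the month abbreviation; the function name and sample lines are hypothetical.

```python
from collections import Counter
from datetime import datetime

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 23/Apr/2016:10:00:00 +0000


def requests_per_second(lines):
    """Return (timestamp, request_count) pairs in chronological order."""
    counts = Counter()
    for line in lines:
        start = line.find("[")
        end = line.find("]", start)
        if start == -1 or end == -1:
            continue  # no bracketed timestamp on this line
        stamp = datetime.strptime(line[start + 1:end], LOG_TIME_FORMAT)
        counts[stamp] += 1
    return sorted(counts.items())


# Made-up sample lines for illustration.
LINES = [
    '203.0.113.5 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.6 - - [23/Apr/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [23/Apr/2016:10:00:01 +0000] "GET /a HTTP/1.1" 404 128',
]

for stamp, count in requests_per_second(LINES):
    print(stamp.isoformat(), count)
```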
  • 15. Users >1000 Web Instance RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone Web Instance RDS DB Instance Standby (Multi-AZ) ELB Balancer User Amazon Route 53
  • 17. Users >1 million+ RDS DB Instance Active (Multi-AZ) Availability Zone ELB Balancer RDS DB Instance Read Replica RDS DB Instance Read Replica Amazon Route 53 User Amazon S3 Amazon CloudFront DynamoDB Amazon SQS ElastiCache Worker Instance Worker Instance Amazon CloudWatch Internal App Instance Internal App Instance Amazon SES Lambda ARC301 Web Instance Web Instance Web Instance Web Instance
  • 22. Problems • Storage (Temporary) • Capture • Storage (Permanent) • Visualisation
  • 25. When the logs are written (AWS) • Local memory • Ephemeral Volumes • EBS Volumes • gp2 • st1/sc1
  • 26. Problems • Storage (Temporary) • Capture • Storage (Permanent) • Visualisation • Insight
  • 28. Three Problems of Persistence • Somewhere to stage • Somewhere to live • Somewhere to search
  • 29. To NoSQL, or not to NoSQL? - Joel
  • 30. Some folks won’t like this, but…
  • 31. Start with SQL databases (even MPP SQL)
  • 32. Why start with SQL? • Established and well-worn technology. • Lots of existing code, communities, books, and tools. • You aren’t going to break SQL DBs in your first 10 million users. No, really, you won’t.* • Clear patterns to scalability (especially in analytics) *Unless you are doing something SUPER peculiar with the data or you have MASSIVE amounts of it. …but even then SQL will have a place in your stack.
  • 33. Ah ha! You said massive! - Joel (again)
  • 34. Why might you need NoSQL? • Super low-latency applications • Metadata-driven datasets • Highly nonrelational data • Need schema-less data constructs* • Massive amounts of data (again, in the TB range) • Rapid ingest of data (thousands of records/sec) *Need != “It’s easier to do dev without schemas”
  • 37. Three Problems of Persistence • Somewhere to stage • Somewhere to live • Somewhere to search
  • 38. Log Dispatcher Architecture Revisited App Server App Server App Server App Server Kinesis Firehose Log Index ElasticSearch Log Index ElasticSearch Visualisation Amazon S3 JSON
  • 39. Amazon S3 • Simple Storage Service • Canonical logging target for ELB, CloudFront, etc. • Virtually unlimited amounts of storage • Support for Lambda operations • Very fast – ideal for feeding other services (Redshift, EMR/Hadoop) • Data can be automatically pushed here from Amazon Kinesis Firehose
  • 40. Three Problems of Persistence • Somewhere to stage • Somewhere to live • Long tail • Somewhere to search
  • 41. Redshift • PostgreSQL based MPP database • Petabyte scale data warehousing • Choice of nodes • Dense compute • Dense storage • Already compatible with your existing BI tools dense compute node dense storage node Amazon Redshift Up to 128 nodes at 2PB ~256PB/cluster
  • 42. Three Problems of Persistence • Somewhere to stage • Somewhere to live • Somewhere to search (streaming data)
  • 43. Amazon ElasticSearch Service • ElasticSearch • Popular/Open Source • Commonly used for log and clickstream • Managed Solution • We prepackage Kibana • Integrated with IAM, Firehose, etc Amazon Elasticsearch Service Amazon Kinesis Firehose
  • 44. Three Problems of Persistence • Somewhere to stage • Somewhere to live • Somewhere to search (streaming data)
  • 46. ElasticSearch Index Mapping curl -XPUT 'https://search-loggingatscale-demo-[...].us-east-1.es.amazonaws.com/blog-apache-combined' -d ' { "mappings": { "blog-apache-combined": { "properties": { "datetime": { "type": "date", "format": "dd/MMM/yyyy:HH:mm:ss Z" }, "agent": { "type": "string", "index": "not_analyzed" }, [...]
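
The mapping on the slide is easier to read as a structured document. The sketch below rebuilds only the two fields that are visible (`datetime` and `agent`); the fields elided with `[...]` are left out rather than guessed. It also shows a Python `strptime` pattern that accepts the same timestamps as `dd/MMM/yyyy:HH:mm:ss Z`. The `string` / `not_analyzed` pairing matches the Elasticsearch versions current when the talk was given; newer releases use the `keyword` type instead. The helper names are hypothetical.

```python
import json
from datetime import datetime

# Only the two fields shown on the slide; the rest are elided there and omitted here.
MAPPING = {
    "mappings": {
        "blog-apache-combined": {
            "properties": {
                # Parse Apache timestamps as dates so time-range queries work.
                "datetime": {"type": "date", "format": "dd/MMM/yyyy:HH:mm:ss Z"},
                # Keep the User-Agent as one exact value instead of tokenising it.
                "agent": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}


def mapping_body():
    """Serialise the mapping as the JSON body that the curl -XPUT call would send."""
    return json.dumps(MAPPING, indent=2)


def parse_log_datetime(value):
    """Python counterpart of the dd/MMM/yyyy:HH:mm:ss Z date format."""
    return datetime.strptime(value, "%d/%b/%Y:%H:%M:%S %z")


print(mapping_body())
print(parse_log_datetime("23/Apr/2016:10:00:00 +0000").isoformat())
```

Leaving `agent` unanalysed is what lets a "top User-Agents" count group on the full string, as the awk version did, rather than on individual words.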
  • 48. Problems • Storage (Temporary) • Capture • Storage (Permanent) • Visualisation
  • 49. How do I get my data in anyway?
  • 50. Logging Architecture App Server App Server App Server App Server Log Aggregator (Kafka/Kinesis/MQ) Log Aggregator (Kafka/Kinesis/MQ) Log Index/Persist (ElasticSearch, etc) Log Index/Persist (ElasticSearch, etc) Visualisation
  • 51. Logging Architecture App Server App Server App Server App Server Log Aggregator (Kafka/Kinesis/MQ) Log Aggregator (Kafka/Kinesis/MQ) ElasticSearch ElasticSearch Visualisation
  • 52. Amazon Kinesis • Firstly, a massively scalable, low cost way to send JSON objects to a ‘stream’ hosted by AWS • Users can write applications (using KCL) to take data from it and parse/evaluate • Apps can be written in Java, Lambda (Node, Python, Java), etc
  • 53. Amazon Kinesis: New Features (re:Invent 2015) Kinesis Streams • What was previously Kinesis • Still very customisable, for innovative stream workloads • Users still write an app to parse data from the stream Kinesis Firehose • Fully managed data ingest service • Provision end point • Send data to end point • ??? • Data! • Outputs to S3, Redshift, ElasticSearch Service • (And can do two at once)
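
The "send data to end point" step can be sketched from the application side. The function below assumes a boto3 Firehose client is passed in (created with `boto3.client("firehose")`) and uses its `put_record_batch` call, which accepts up to 500 records per request. The function name, the stream name and the newline-delimited JSON framing are choices made for this sketch, not details from the deck.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def send_events(client, stream_name, events):
    """Send dict events to a Firehose delivery stream in batches.

    `client` is expected to behave like boto3.client("firehose").
    Returns the number of records Firehose reported as failed.
    """
    failed = 0
    for i in range(0, len(events), MAX_BATCH):
        batch = events[i:i + MAX_BATCH]
        # One JSON object per line, so the files landing in S3 stay line-delimited.
        records = [{"Data": (json.dumps(event) + "\n").encode("utf-8")} for event in batch]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records,
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A call would look like `send_events(boto3.client("firehose"), "loggingatscale-demo", events)`, where the stream name is hypothetical. A production sender would also retry the individual records that the response marks as failed; this sketch only counts them.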
  • 54. Amazon Kinesis: New Features (Apr 2016) Amazon Kinesis Agent • Standalone Java application from AWS • Collect and send logs to Kinesis Firehose • Built-in: • File rotation • Failure retries • Checkpoints • Integrated with CloudWatch for alerting
  • 55. Amazon Kinesis Agent • Multiple input options • SINGLELINE • CSVTOJSON • LOGTOJSON • LOGTOJSON • Hoorah!
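
For the LOGTOJSON option, the agent is driven by a JSON configuration file (`/etc/aws-kinesis/agent.json` on a default install). The sketch below generates a minimal one-flow configuration that tails an Apache access log and converts each combined-format line to JSON before sending it to Firehose. The file pattern and delivery stream name are hypothetical placeholders.

```python
import json


def agent_config(file_pattern, delivery_stream):
    """Build a one-flow Kinesis Agent configuration for Apache combined logs."""
    return {
        "flows": [
            {
                "filePattern": file_pattern,         # which files to tail
                "deliveryStream": delivery_stream,   # target Firehose delivery stream
                "dataProcessingOptions": [
                    # Parse each line as Apache combined format and emit JSON.
                    {"optionName": "LOGTOJSON", "logFormat": "COMBINEDAPACHELOG"}
                ],
            }
        ]
    }


# Hypothetical path and stream name, for illustration only.
config = agent_config("/var/log/httpd/access_log*", "loggingatscale-demo")
print(json.dumps(config, indent=2))
```

Doing the conversion on the host means every downstream consumer (S3, Redshift, ElasticSearch) receives structured JSON rather than raw log lines.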
  • 56. Demo: Local Capture + Dispatch
  • 57. S3
  • 59. Problems • Storage (Temp) • Capture • Storage (Perm) • Visualisation
  • 60. Kibana • Pre-packaged with Amazon ElasticSearch Service • Easy to manage with freeform data • Dashboards!
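
Dashboards of this kind are typically backed by Elasticsearch aggregations. As a rough illustration, the sketch below builds a terms aggregation that ranks the most common values of a field (for example the unanalysed `agent` field from the mapping slide) and reads the buckets out of a response. The function names and the aggregation name `top_terms` are arbitrary choices for this sketch.

```python
def top_terms_query(field, size=5):
    """Request body for a terms aggregation; size=0 suppresses the raw hits."""
    return {
        "size": 0,
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def read_buckets(response):
    """Turn an aggregation response into (value, count) pairs."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


print(top_terms_query("agent"))
```

The body would be sent to the index's `_search` endpoint; the result is the same kind of ranking the awk one-liners produced, computed across every host's logs at once.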
  • 61. Your existing BI tools • As before – your data exists on S3 (JSON) • S3 -> Redshift • Commission a Redshift cluster with IAM roles • Write a manifest of the files to load (JSON) • Issue a load • Redshift is PgSQL compatible • Drivers exist for many tools
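
The "write a manifest, issue a load" steps can be sketched as follows. The manifest is a JSON document listing the S3 objects to load, and the load itself is a Redshift `COPY` statement that points at the manifest and authenticates with an IAM role. The bucket, keys, table name and role ARN below are hypothetical, and `JSON 'auto'` assumes the JSON keys match the table's column names.

```python
import json


def build_manifest(bucket, keys):
    """Manifest listing the S3 objects that COPY should load."""
    entries = [{"url": f"s3://{bucket}/{key}", "mandatory": True} for key in keys]
    return json.dumps({"entries": entries}, indent=2)


def copy_statement(table, manifest_url, iam_role_arn):
    """COPY statement that loads JSON objects listed in a manifest.

    Values are interpolated directly, so only use trusted inputs here.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST\n"
        "JSON 'auto';"
    )


# Hypothetical names, for illustration only.
print(build_manifest("example-log-bucket", ["2016/04/23/part-0001.json"]))
print(copy_statement(
    "access_logs",
    "s3://example-log-bucket/manifests/2016-04-23.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
))
```

If Firehose is configured to compress its output, the `COPY` statement would also need the matching compression option (for example `GZIP`).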
  • 63. Problems • Storage (Temporary) • Capture • Storage (Permanent) • Visualisation • Insight
  • 64. Recap / Lessons / Next • Logging is really hard. • Use tools like Amazon Kinesis Firehose, the Kinesis Agent and ElasticSearch Service to make it easier • Reuse data, tools and people where possible
  • 65. Lessons • Don’t be big data dog • Use the right tools at the right time