Log Analytics with Amazon Elasticsearch Service - September Webinar Series

Jon Handler (handler@amazon.com)

What we'll cover
• Understanding Elasticsearch capabilities
• Elasticsearch, the technology
• Aggregations; ad-hoc analysis
• Amazon Elasticsearch Service is a drop-in replacement for self-managed Elasticsearch
• Q&A

Understanding Elasticsearch capabilities

CloudTrail records your AWS API calls and delivers them to you
• AWS API call monitoring
• You need to understand the changing landscape of your AWS resources
• You need to perform security analysis and compliance auditing
• You want to dig into your logs in an intuitive, fine-grained way

How Elasticsearch can help
• Combined with Kibana, Elasticsearch provides search, real-time analytics, and data visualization

Demo Architecture
(Diagram: AWS resources, CloudTrail logs, Amazon CloudWatch Logs, and Amazon Elasticsearch Service)

Scenario: Log data analytics
• Application monitoring and event diagnosis
• You need to monitor the performance of your application, web servers, and hardware
• You need easy-to-use yet powerful data visualization tools to detect issues in near real time
• You want to dig into your logs in an intuitive, fine-grained way
• Kibana provides fast, easy visualization

Scenario: Batch data analytics
• Reporting and analysis
• You are a mobile app developer
• You have to monitor and manage users across multiple app versions
• You want to analyze and report on usage and on migration between app versions
• Use Kibana for dashboarding; use the query API for deeper analysis

Scenario: Full-text search
• Traditional search
• Your application or website provides search over diverse documents
• You are tasked with making this knowledge base searchable and accessible
• You need key search features, including text matching, faceting, filtering, fuzzy search, autocomplete, and highlighting
• Use the query API to support application search

Elasticsearch is like a database
Search | Database
Value | Value
Field | Column
Document | Row
Index | Table
Cluster | Database
Queries | SQL

Documents are the core entity
(Diagram: a document has an ID and a set of fields, each holding a value)
{
  "eventVersion": "1.03",
  "eventTime": "2016-06-01T00:16:19Z",
  "eventSource": "dynamodb.amazonaws.com",
  "eventName": "DescribeStream",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": "52.51.24.XX",
  "userAgent": "leb-kcl-580935a6-5f94-4ce0-ac69-cdeb609ba16a,amazon-kinesis-client-library-java-lambda_1.2.1, aws-internal/3",
  "requestParameters": {
    "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837"
  },
  "responseElements": null,
  "requestID": "KC608PH8POAF2I184E2SL1PS2FVV4KQNSO5AEMVJF66Q9ASUAAJG",
  "eventID": "49b56379-903b-4f04-8ce5-d21bbfcf8ab3",
  "eventType": "AwsApiCall",
  "apiVersion": "2012-08-10",
  "recipientAccountId": "17816119XXXX",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAJBQVRM7LN25CAHX7Y:awslambda_338_20160531233813522",
    "arn": "arn:aws:sts::178161197791:assumed-role/geospatial-rec-engine-ApplicationExecutionRole-9LPKB77QMR97/awslambda_338_20160531233813522", ...
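Nested objects in a document, such as userIdentity above, are addressed in queries and aggregations with dot notation (for example userIdentity.type). The Python sketch below is only an illustration of that naming: it walks a trimmed copy of the record and lists each leaf field's dotted path. The function name flatten is hypothetical and not part of any Elasticsearch client.

```python
def flatten(doc, prefix=""):
    """Yield (dotted_field_path, value) pairs for every leaf value in a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the object, extending the path with this key.
            yield from flatten(value, path)
        else:
            yield path, value

# A trimmed copy of the CloudTrail record above.
record = {
    "eventName": "DescribeStream",
    "awsRegion": "eu-west-1",
    "requestParameters": {
        "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837"
    },
    "responseElements": None,
    "userIdentity": {"type": "AssumedRole"},
}

fields = dict(flatten(record))
```

A query can then target fields["userIdentity.type"]'s path directly, e.g. a term query on "userIdentity.type".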

Lucene provides text analysis and indexing
Term ID | Term | Postings
0 | quick | 1,3,5
1 | brown | 2,3,4,6
2 | fox | 1,7,9
3 | lazy | 2,8
4 | dog | 24
(Diagram: Index Writer, Index Searcher, and Segment)
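The term/postings table is an inverted index: each term maps to the IDs of the documents containing it. A minimal Python sketch, assuming a crude lowercase-and-split tokenizer in place of a real Lucene analyzer (which also records term positions and frequencies and writes immutable segments), might build one like this. The documents and function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and keep runs of letters: a crude stand-in for a Lucene analyzer.
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs (its postings) containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {
    1: "The quick fox",
    2: "A lazy brown dog",
    3: "Quick brown cats",
}
index = build_inverted_index(docs)
# index["quick"] → [1, 3]; index["brown"] → [2, 3]
```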

Elasticsearch query processing
• Query: terms are looked up in the index (terms shown: quick, brown, fox, lazy, lorem, ipsum, dolor, sit)
• Matches from the index lookup: id 216, 305, 486, 713
• Query logic and post-filtering, then scoring and aggregations
• Sorted matches (results): id 713, 305, 486, 216
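The pipeline above (lookup, combine, filter, score, sort) can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's relevance scoring: the score here is just the number of query terms a document matches, and the postings in INDEX are hypothetical, chosen so the output reproduces the slide's sorted order.

```python
from collections import Counter

# Hypothetical inverted index: term -> IDs of matching documents.
INDEX = {
    "quick": [216, 305, 713],
    "brown": [305, 486, 713],
    "fox": [713],
    "lazy": [486],
}

def search(query_terms, index, keep=lambda doc_id: True):
    """Look up each term, combine the matches, post-filter, score, and sort."""
    # 1. Index lookup: every posting for a query term is a match.
    hits = Counter()
    for term in query_terms:
        for doc_id in index.get(term, []):
            hits[doc_id] += 1  # simplified score: number of query terms matched
    # 2. Post-filtering: drop matches that fail a non-scoring condition.
    scored = {doc_id: score for doc_id, score in hits.items() if keep(doc_id)}
    # 3. Sort by descending score, breaking ties by ID so output is deterministic.
    return sorted(scored, key=lambda doc_id: (-scored[doc_id], doc_id))

results = search(["quick", "brown", "fox", "lazy", "lorem"], INDEX)
# results → [713, 305, 486, 216]
```

Passing keep=lambda d: d != 305 shows post-filtering removing a match without affecting the others' scores.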

Faceting: basic aggregation
• Query: shirt
• Facets:
– Carhartt (1092)
– Russell Athletic (1087)
– Dickies (954)
– RALPH LAUREN (823)
– Wrangler (701)
– Doublju (259)
– Levi's (12)
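A facet list like this is a count of matching documents per distinct field value, which Elasticsearch computes with a terms aggregation. The Python sketch below counts facets locally for a few hypothetical matches and shows a request body of the same shape; the field names title and brand and the aggregation name brands are assumptions for illustration. In the Elasticsearch versions of this era, the facet field should be mapped not_analyzed so whole brand names become buckets.

```python
from collections import Counter

def facet_counts(docs, field, size=10):
    """Count documents per distinct value of `field`, largest buckets first,
    similar in spirit to a terms aggregation."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)

# Hypothetical product documents that matched the query "shirt".
matches = [
    {"title": "Work shirt", "brand": "Carhartt"},
    {"title": "Pocket shirt", "brand": "Carhartt"},
    {"title": "Crew shirt", "brand": "Dickies"},
]
facets = facet_counts(matches, "brand")  # [("Carhartt", 2), ("Dickies", 1)]

# A request body of the same shape: the query plus a terms aggregation on brand.
request_body = {
    "query": {"match": {"title": "shirt"}},
    "aggs": {"brands": {"terms": {"field": "brand", "size": 10}}},
}
```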

Elasticsearch Aggregations
• Buckets – a collection of documents meeting some criterion
• Metrics – calculations on the content of buckets
(Diagram: Bucket: time; Metric: count)
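To make "bucket by time, count per bucket" concrete, here is a small Python sketch that buckets ISO-8601 timestamps by hour and counts each bucket, followed by a date_histogram request body of roughly the same meaning. The aggregation name events_per_hour is hypothetical; the "interval" parameter matches the Elasticsearch versions of this era (later versions use calendar_interval or fixed_interval instead). "size": 0 asks for aggregation results only, without hits.

```python
from collections import Counter
from datetime import datetime

def count_per_hour(timestamps):
    """Bucket UTC timestamps by hour (the bucket) and count each bucket (the metric)."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        buckets[t.replace(minute=0, second=0)] += 1
    return sorted(buckets.items())

events = ["2016-06-01T00:16:19Z", "2016-06-01T00:47:02Z", "2016-06-01T01:05:00Z"]
histogram = count_per_hour(events)

# Roughly the same question as an Elasticsearch date_histogram aggregation.
request_body = {
    "size": 0,
    "aggs": {"events_per_hour": {"date_histogram": {"field": "eventTime", "interval": "1h"}}},
}
```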

A more complicated aggregation
(Diagram: nested buckets by ARN, then by Region, then by eventName, with a Count metric in each bucket)
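Nesting buckets means placing one terms aggregation inside another. The Python sketch below builds such a request body from a list of fields, outermost first, using field names from the sample CloudTrail record; the by_<field> aggregation names are hypothetical. No explicit metric is needed for Count, because every bucket in the response already reports its doc_count.

```python
def nested_terms(fields):
    """Build nested terms aggregations, one level per field, outermost first."""
    aggs = {}
    for field in reversed(fields):
        level = {"terms": {"field": field}}
        if aggs:
            level["aggs"] = aggs  # the previously built (inner) level nests here
        aggs = {"by_" + field.replace(".", "_"): level}
    return aggs

body = {"size": 0, "aggs": nested_terms(["userIdentity.arn", "awsRegion", "eventName"])}
```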

More kinds of aggregations
Buckets
• Date histogram
• Histogram
• Range
• Terms
• Filters
• Significant terms
Metrics
• Count
• Average
• Sum
• Min
• Max
• Std. Dev
• Unique Count
• Percentiles
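Each name on this slide corresponds to an Elasticsearch aggregation type. The Python lookup below records that correspondence as I understand it and builds a single metric clause; the helper name metric_agg is hypothetical. Two mappings are approximate: a bucket's doc_count already gives a document count (value_count counts field values), and standard deviation comes from extended_stats, which returns std_deviation among other statistics.

```python
# Elasticsearch aggregation types behind the names on this slide.
BUCKET_AGGS = {
    "Date histogram": "date_histogram",
    "Histogram": "histogram",
    "Range": "range",
    "Terms": "terms",
    "Filters": "filters",
    "Significant terms": "significant_terms",
}
METRIC_AGGS = {
    "Count": "value_count",         # counts field values; bucket doc_count counts documents
    "Average": "avg",
    "Sum": "sum",
    "Min": "min",
    "Max": "max",
    "Std. Dev": "extended_stats",   # std_deviation is one of its outputs
    "Unique Count": "cardinality",  # approximate distinct count
    "Percentiles": "percentiles",
}

def metric_agg(label, field):
    """Return one metric aggregation clause for a metric name from the slide."""
    return {METRIC_AGGS[label]: {"field": field}}
```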

Shards: independent collections of documents
(Diagram: the documents of an index/type, each identified by an Id, divided among Shard 1, Shard 2, Shard 3, and Shard 4)
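Elasticsearch decides which primary shard holds a document by hashing its routing value (the document _id by default) and taking the result modulo the number of primary shards. The Python sketch below shows that rule; note that Elasticsearch uses a Murmur3 hash, and zlib.crc32 is only a stand-in so the example needs no extra packages. The function name shard_for is hypothetical.

```python
import zlib

def shard_for(doc_id, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, routing on the document ID.

    zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

placements = {doc_id: shard_for(doc_id, 4) for doc_id in ["1", "2", "3", "4", "5"]}
```

Because the modulus is the primary shard count, changing that count would send existing documents to different shards, which is why it cannot simply be changed on an existing index.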

Deployment of indices to a cluster
• Index 1
– Shard 1
– Shard 2
– Shard 3
• Index 2
– Shard 1
– Shard 2
– Shard 3
(Diagram: an Amazon ES cluster of three instances, Instance 1 acting as master, with the primary and replica copies of each index's shards 1, 2, and 3 spread across Instance 1, Instance 2, and Instance 3)
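The key placement constraint is that a primary and its replica never share a node. The Python sketch below is a round-robin toy that respects that constraint; it is not Elasticsearch's actual shard allocator, and the function name allocate is hypothetical. Where this sketch raises an error for too few nodes, Elasticsearch would instead leave the replica unassigned and report yellow cluster status.

```python
def allocate(num_shards, num_replicas, nodes):
    """Place each primary and replica copy so no node holds two copies of one shard."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    layout = {node: [] for node in nodes}
    for shard in range(1, num_shards + 1):
        for copy in range(num_replicas + 1):
            # Consecutive copies go to consecutive nodes, so they never collide.
            node = nodes[(shard - 1 + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append((shard, role))
    return layout

layout = allocate(3, 1, ["Instance 1", "Instance 2", "Instance 3"])
```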

Determining storage
• The ratio of source data size to index size is typically close to 1:1
• Adding a replica doubles the storage
• Determine the data node count from the storage requirement
– Current limits: 10 TB with EBS, 32 TB with instance store
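The storage rules of thumb above reduce to a little arithmetic. This Python sketch assumes a hypothetical 512 GB of usable storage per data node and ignores headroom for merges and operating overhead; substitute your own figures. The function name is hypothetical.

```python
import math

def data_nodes_needed(source_gb, replicas=1, index_ratio=1.0, storage_per_node_gb=512):
    """Estimate the data node count from storage alone.

    index_ratio: index size divided by source size (about 1.0 per the slide).
    storage_per_node_gb: hypothetical usable storage per data node."""
    index_gb = source_gb * index_ratio
    total_gb = index_gb * (1 + replicas)  # each replica is a full extra copy
    return math.ceil(total_gb / storage_per_node_gb)

nodes = data_nodes_needed(source_gb=1000)  # 2,000 GB total / 512 GB per node → 4
```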

Determining instance type
• Instance type is workload-dependent
• T2: dev, test, QA
• M3: solid general-purpose performance
• R3: heavier queries and aggregations
• I2: largest storage option

Best practices
• Use the minimum number of shards that keeps each shard at or below 50 GB
• Set the number of replicas to 1
• For all production workloads, use 3 dedicated master nodes
• Use the _bulk API; some ingest mechanisms do this automatically
• Increase index.refresh_interval for higher indexing throughput
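Two of these practices can be sketched in Python: the shard-count rule and the newline-delimited body the _bulk API expects (an action line, then a source line, for each document, ending with a newline). The 120 GB index size and "30s" refresh interval are hypothetical values, and the function names are illustrative. The _type field matches the Elasticsearch versions this deck targets; later versions drop mapping types.

```python
import json
import math

def primary_shard_count(index_size_gb, max_shard_gb=50):
    """Smallest number of primary shards that keeps each shard at or below max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))

def bulk_body(index, doc_type, docs):
    """Build a _bulk request body: one action line plus one source line per document,
    newline-delimited, with the trailing newline the API requires."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"

# Settings for a hypothetical 120 GB index.
settings = {
    "number_of_shards": primary_shard_count(120),  # 120 / 50 rounds up to 3
    "number_of_replicas": 1,
    "refresh_interval": "30s",
}

body = bulk_body("blog", "post", [
    ("2", {"title": "Amazon ES for search", "author": "carl meadows"}),
    ("3", {"title": "Analytics too", "author": "vivek sriram"}),
])
```

When sending such a body with curl, use --data-binary so the newlines are preserved.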

Integration with the AWS ecosystem
(Diagram: Amazon Elasticsearch Service connected to Logstash, REST clients, the CloudWatch Logs agent on EC2 instances, a Logstash cluster, Amazon Kinesis, Amazon Kinesis Firehose, Amazon RDS, Amazon DynamoDB, an Amazon SQS queue, Amazon CloudWatch, AWS Lambda, AWS CloudTrail, access logs, Amazon VPC Flow Logs, an Amazon S3 bucket, AWS IoT, and Amazon ECS)

Indexing strategy for streaming data
• Use one index per time period: typically index-per-day; high-volume data can go to index-per-hour
• Shard each index according to data size, using 50 GB per shard as a soft limit
• Dedicated master nodes increase cluster stability
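With one index per time period, each event's timestamp determines the index it is written to. This Python sketch assumes a cwl- prefix (taken from the template example in this deck) and a dot-separated date suffix; both the naming pattern and the function name index_for are illustrative choices, not requirements.

```python
from datetime import datetime, timezone

def index_for(event_time, prefix="cwl", hourly=False):
    """Name the time-based index an event belongs to, e.g. cwl-2016.06.01 (daily)
    or cwl-2016.06.01.00 (hourly)."""
    pattern = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return f"{prefix}-{event_time.astimezone(timezone.utc).strftime(pattern)}"

t = datetime(2016, 6, 1, 0, 16, 19, tzinfo=timezone.utc)
daily = index_for(t)                # "cwl-2016.06.01"
hourly = index_for(t, hourly=True)  # "cwl-2016.06.01.00"
```

Retention then becomes deleting whole indices (for example, DELETE <endpoint>/cwl-2016.05.01) rather than deleting individual documents.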

Index settings control sharding and more
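curl -XPUT <endpoint>/<index> -H 'Content-Type: application/json' -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1,
    "refresh_interval" : "5s"
  }
}'
number_of_shards is fixed when the index is created; number_of_replicas and refresh_interval can be changed later through <endpoint>/<index>/_settings.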

Mappings control how data is indexed
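curl -XPUT <endpoint>/<index> -H 'Content-Type: application/json' -d '{
  "mappings" : {
    "<type>" : {
      "properties" : {
        "eventName" : {
          "type" : "string",
          "index" : "not_analyzed" } } } }
}'
A not_analyzed field is indexed as a single exact term, so searches and aggregations on eventName see whole values such as DescribeStream.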

Index templates simplify mapping creation
curl -XPUT <endpoint>/_template/<name> -H 'Content-Type: application/json' -d '{
  "template" : "<wildcard, e.g. cwl-*>",
  "settings" : { "number_of_shards" : 2 },
  "mappings" : {
    "<type, e.g. _default_>" : {
      "dynamic_templates" : [ {
        "<template name>" : {
          "match" : "*",
          "match_mapping_type" : "string",
          "mapping" : { "type" : "string", "index" : "not_analyzed" }
        } } ],
      "properties" : {
        "@timestamp" : { "type" : "date" } } } }
}'
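A template is applied when a newly created index's name matches its wildcard pattern; existing indices are not changed. The Python sketch below approximates that matching with fnmatch.fnmatchcase, which also understands ? and [] while Elasticsearch patterns rely on *; the template names and the second pattern are hypothetical.

```python
from fnmatch import fnmatchcase

def templates_for(index_name, templates):
    """Return the names of templates whose wildcard pattern matches a new index name."""
    return [name for name, pattern in templates.items() if fnmatchcase(index_name, pattern)]

# Hypothetical templates: name -> index name pattern.
TEMPLATES = {"cwl_template": "cwl-*", "blog_template": "blog*"}

matched = templates_for("cwl-2016.06.01", TEMPLATES)  # ["cwl_template"]
```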

Direct access to the Elasticsearch API
$ curl -XPUT https://<endpoint>/blog -H 'Content-Type: application/json' -d '{
  "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }'
$ curl -XPOST https://<endpoint>/blog/post/1 -H 'Content-Type: application/json' -d '{
  "author" : "jon handler",
  "title" : "Amazon ES Launch" }'
$ curl -XPOST https://<endpoint>/blog/post/_bulk -H 'Content-Type: application/x-ndjson' --data-binary '{ "index" : { "_index" : "blog", "_type" : "post", "_id" : "2" } }
{ "title" : "Amazon ES for search", "author" : "carl meadows" }
{ "index" : { "_index" : "blog", "_type" : "post", "_id" : "3" } }
{ "title" : "Analytics too", "author" : "vivek sriram" }
'
$ curl -XGET 'https://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},
 "hits":{"total":2,"max_score":0.13424811,"hits":[
  {"_index":"blog","_type":"post","_id":"1","_score":0.13424811,
   "_source":{"author":"jon handler","title":"Amazon ES Launch"}},
  {"_index":"blog","_type":"post","_id":"2","_score":0.11506981,
   "_source":{"title":"Amazon ES for search","author":"carl meadows"}}]}}

Elasticsearch is a full-featured search engine
• Built on Lucene, the popular open-source search library
• Search structured and unstructured data with complex Boolean queries
• Supports common search features: geo search, aggregations, highlighting, search suggestions, and more
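Boolean queries in Elasticsearch are expressed with the bool query, which combines scoring clauses (must) with non-scoring ones (filter, must_not). The Python sketch below builds such a body over fields from the sample CloudTrail record; the function name and the particular clauses are hypothetical. Exact-value term clauses like the one on awsRegion only match as intended when the field is mapped not_analyzed.

```python
def cloudtrail_query(text, region, since):
    """A bool query: a scored full-text match plus non-scoring filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"userAgent": text}}],           # contributes to the score
                "filter": [                                          # yes/no conditions only
                    {"term": {"awsRegion": region}},
                    {"range": {"eventTime": {"gte": since}}},
                ],
                "must_not": [{"term": {"eventName": "DescribeStream"}}],
            }
        }
    }

q = cloudtrail_query("kinesis", "eu-west-1", "2016-06-01T00:00:00Z")
```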

Challenges with self-managed Elasticsearch
• Easy to get started, challenging to scale
• Scaling ingest pipelines is difficult
• Undifferentiated heavy lifting

Amazon ES overview
(Diagram: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, and the Elasticsearch API)

Easy cluster configuration and reconfiguration
• Elasticsearch version
• Data nodes: count and type
• Master nodes: count and type
• Storage option: EBS or instance store
• High-availability (zone awareness) option
• Advanced options
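The settings on this slide map onto the request for creating a domain. The Python sketch below only assembles keyword arguments for the boto3 "es" client's create_elasticsearch_domain call; it does not contact AWS. All values (version, instance types, counts, volume size, advanced option) are illustrative assumptions, and the data node count defaults to an even number because a two-zone layout splits nodes evenly across Availability Zones.

```python
def domain_config(name, data_nodes=4, data_type="m3.large.elasticsearch",
                  master_type="m3.medium.elasticsearch", ebs_gb=100, zone_aware=True):
    """Keyword arguments for boto3.client("es").create_elasticsearch_domain.
    Values are illustrative defaults, not recommendations."""
    return {
        "DomainName": name,
        "ElasticsearchVersion": "2.3",
        "ElasticsearchClusterConfig": {
            "InstanceType": data_type,          # data nodes: type
            "InstanceCount": data_nodes,        # data nodes: count
            "DedicatedMasterEnabled": True,
            "DedicatedMasterType": master_type,  # master nodes: type
            "DedicatedMasterCount": 3,           # master nodes: count
            "ZoneAwarenessEnabled": zone_aware,  # HA option
        },
        "EBSOptions": {"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": ebs_gb},
        "AdvancedOptions": {"rest.action.multi.allow_explicit_index": "true"},
    }

cfg = domain_config("logs-domain")
```

Calling boto3.client("es").create_elasticsearch_domain(**cfg) would submit the request (and create billable resources).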

High availability with Zone Awareness
(Diagram: an Amazon ES cluster of four instances split across Availability Zone 1 and Availability Zone 2, with the primary and replica copies of shards 1, 2, and 3 distributed across the two zones)

Monitor with CloudWatch metrics
• FreeStorageSpace: monitor it and alarm before the cluster runs out of space
• CPUUtilization: alarm at 80% CPU to signal the need to scale up
• ClusterStatus.yellow: check whether replication requires additional nodes
• JVMMemoryPressure: check that the instance type and count provide sufficient resources
• MasterCPUUtilization: master nodes are monitored separately from data nodes
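As an example of the first item, this Python sketch assembles keyword arguments for the boto3 CloudWatch client's put_metric_alarm call on FreeStorageSpace, which Amazon ES publishes in megabytes under the AWS/ES namespace with DomainName and ClientId (the AWS account ID) dimensions. The alarm name, the 20,480 MB threshold, and the period settings are hypothetical; a real alarm would usually also set AlarmActions, for example an SNS topic.

```python
def free_storage_alarm(domain_name, account_id, threshold_mb=20480):
    """Keyword arguments for boto3.client("cloudwatch").put_metric_alarm."""
    return {
        "AlarmName": f"{domain_name}-free-storage-low",
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Minimum",      # the lowest value reported in each period
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": threshold_mb,   # megabytes
        "ComparisonOperator": "LessThanOrEqualToThreshold",
    }

alarm = free_storage_alarm("logs-domain", "123456789012")
```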

Security with IAM
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/susan"
      },
      "Action": [ "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
                  "es:CreateElasticsearchDomain",
                  "es:ListDomainNames" ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    } ] }

Pay for compute and storage you use
• With Amazon Elasticsearch Service, you pay only for the compute and storage resources you use. The AWS Free Tier is available for qualifying customers.

Wrap up
• Combined with Kibana, Elasticsearch provides search and visualization for streaming data and full-text use cases
• Elasticsearch is based on Lucene, which reads and writes search indices
• Aggregations let you analyze your data by splitting it into buckets and computing metrics on them
• Amazon Elasticsearch Service makes it easy to set up and manage your Elasticsearch cluster on AWS
• Amazon ES is a great way to get started with Elasticsearch!

Q&A
• Jon Handler: handler@amazon.com
• Vivek Sriram, Business Development Manager: vsriram@amazon.com
• https://run.qwiklab.com/searches/elasticsearch
