Elasticsearch is a popular open-source search and analytics engine used for log analytics. With Amazon Elasticsearch Service, you can easily run Elasticsearch on AWS. In this webinar, we will provide an overview of Amazon Elasticsearch Service and demo how to set up and configure an Amazon Elasticsearch domain for the log analytics use case.
Learning Objectives:
- Understand Amazon Elasticsearch Service use cases and key features
- Learn how to secure your Amazon Elasticsearch cluster for access from Kibana and other plug-ins
- Learn best practices for scaling, monitoring, and troubleshooting Amazon Elasticsearch domains
2. What we'll cover
• Understanding Elasticsearch capabilities
• Elasticsearch, the technology
• Aggregations; ad-hoc analysis
• Amazon Elasticsearch Service is a drop-in
replacement for self-managed Elasticsearch
• Q&A
4. CloudTrail delivers API calls to you
• AWS API call monitoring
• You need to understand the changing
landscape of your AWS resources
• You need to do security analysis and
compliance auditing
• You want the ability to dig into your logs
in an intuitive, fine-grained way
6. How Elasticsearch can help
• Combined with Kibana, Elasticsearch provides a
tool for search, real-time analytics, and data
visualization
10. Scenario: Log data analytics
• Application monitoring and
event diagnosis
• You need to monitor the performance of
your application, web servers, and
hardware
• You need easy-to-use yet powerful
data visualization tools to detect issues
in near real time
• You want the ability to dig into your logs
in an intuitive, fine-grained way
• Kibana provides fast, easy visualization
11. Scenario: Batch data analytics
• Reporting and Analysis
• You are a mobile app developer
• You have to monitor/manage users
across multiple app versions
• You want to analyze and report on
usage and migration between app
versions
• Use Kibana for dashboarding. Use the
query API for deeper analysis
12. Scenario: Full-text search
• Traditional search
• Your application or website provides
search capabilities over diverse
documents
• You are tasked with making this
knowledge base searchable and
accessible
• You need key search features including
text matching, faceting, filtering, fuzzy
search, auto complete, and highlighting
• Use the query API to support
application search
14. Elasticsearch is like a database
Search      Database
Value       Value
Field       Column
Document    Row
Index       Table
Cluster     Database
Queries     SQL
15. Documents are the core entity
[Diagram: a document consists of an ID and field values (F1 Value, F2 Value)]
{
  "eventVersion": "1.03",
  "eventTime": "2016-06-01T00:16:19Z",
  "eventSource": "dynamodb.amazonaws.com",
  "eventName": "DescribeStream",
  "awsRegion": "eu-west-1",
  "sourceIPAddress": "52.51.24.XX",
  "userAgent": "leb-kcl-580935a6-5f94-4ce0-ac69-cdeb609ba16a,amazon-kinesis-client-library-java-lambda_1.2.1, aws-internal/3",
  "requestParameters": {
    "streamArn": "arn:aws:dynamodb:eu-west-1:17816119XXXX:table/restaurant/stream/2016-04-08T18:07:53.837"
  },
  "responseElements": null,
  "requestID": "KC608PH8POAF2I184E2SL1PS2FVV4KQNSO5AEMVJF66Q9ASUAAJG",
  "eventID": "49b56379-903b-4f04-8ce5-d21bbfcf8ab3",
  "eventType": "AwsApiCall",
  "apiVersion": "2012-08-10",
  "recipientAccountId": "17816119XXXX",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAJBQVRM7LN25CAHX7Y:awslambda_338_20160531233813522",
    "arn": "arn:aws:sts::178161197791:assumed-role/geospatial-rec-engine-ApplicationExecutionRole-9LPKB77QMR97/awslambda_338_20160531233813522", ...
16. Lucene provides text analysis and indexing
Term ID   Term    Postings
0         quick   1,3,5
1         brown   2,3,4,6
2         fox     1,7,9
3         lazy    2,8
4         dog     24
[Diagram: Index Writer, Index Searcher, Segment]
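The term/postings table above can be illustrated with a toy inverted index. This is a minimal sketch in plain Python, not Lucene's actual implementation: the sample documents, the whitespace tokenizer, and the function names `build_inverted_index` and `search` are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analysis step: lowercase and split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, *terms):
    """Return IDs of documents that contain every term (boolean AND)."""
    sets = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "The quick fox",
    2: "The lazy brown dog",
    3: "Quick brown fox",
}
index = build_inverted_index(docs)
print(index["brown"])
print(search(index, "quick", "fox"))
```

A search intersects the postings lists of the query terms, which is why lookups stay fast even when the document collection is large.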
19. Faceting: basic aggregation
• Query: shirt
Facets
Carhartt (1092)
Russell Athletic (1087)
Dickies (954)
RALPH LAUREN (823)
Wrangler (701)
Doublju (259)
Levi's (12)
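Facet counts like the ones above amount to grouping the matching documents by a field and counting each group. The following is a rough sketch in plain Python rather than an Elasticsearch call; the product records, the `title` and `brand` field names, and the `facet_counts` helper are all hypothetical.

```python
from collections import Counter


def facet_counts(products, query, field="brand"):
    """Count matching products per value of `field`, most frequent first."""
    matches = [p for p in products if query.lower() in p["title"].lower()]
    return Counter(p[field] for p in matches).most_common()


# Hypothetical catalog entries.
products = [
    {"title": "Work Shirt", "brand": "Carhartt"},
    {"title": "Flannel shirt", "brand": "Carhartt"},
    {"title": "Denim shirt", "brand": "Wrangler"},
    {"title": "Work pants", "brand": "Dickies"},
]
for brand, count in facet_counts(products, "shirt"):
    print(f"{brand} ({count})")
```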
20. Elasticsearch Aggregations
• Buckets – a collection of documents meeting
some criterion
• Metrics – calculations on the content of buckets.
Bucket: time
Metric: count
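To make the bucket/metric split concrete, here is a small sketch that buckets events by hour (the bucket) and counts them (the metric). It runs in plain Python over in-memory records and only mimics what a date histogram aggregation would return; the sample events and the `count_per_hour` name are assumptions, while the `eventTime` field and its format follow the CloudTrail document shown earlier.

```python
from collections import Counter
from datetime import datetime


def count_per_hour(events):
    """Bucket events by the hour of eventTime and count each bucket."""
    buckets = Counter()
    for event in events:
        ts = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        buckets[ts.strftime("%Y-%m-%dT%H:00Z")] += 1
    return dict(sorted(buckets.items()))


# Hypothetical events; only eventTime is used here.
events = [
    {"eventTime": "2016-06-01T00:16:19Z"},
    {"eventTime": "2016-06-01T00:45:00Z"},
    {"eventTime": "2016-06-01T01:05:00Z"},
]
print(count_per_hour(events))
```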
21. A more complicated aggregation
Bucket: ARN
Bucket: Region
Bucket: eventName
Metric: Count
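A nested aggregation like this one can be expressed as terms aggregations placed inside one another. The sketch below only builds the request body as a Python dictionary; it does not send it anywhere. The field names come from the CloudTrail document shown earlier, the `by_...` aggregation names and the `nested_terms` helper are hypothetical, and it assumes the fields are indexed as not_analyzed so that whole values, not tokens, become buckets. Each terms bucket reports a document count by default, which serves as the count metric.

```python
import json


def nested_terms(fields):
    """Build nested terms aggregations, outermost field first."""
    body = None
    for field in reversed(fields):
        agg = {"terms": {"field": field}}
        if body is not None:
            agg["aggs"] = body  # nest the inner aggregation
        body = {"by_" + field.replace(".", "_"): agg}
    # size 0: return only aggregation results, no individual hits.
    return {"size": 0, "aggs": body}


request = nested_terms(["userIdentity.arn", "awsRegion", "eventName"])
print(json.dumps(request, indent=2))
```

The printed JSON could be sent as the body of a search request against the index.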
22. More kinds of aggregations
Buckets
• Date histogram
• Histogram
• Range
• Terms
• Filters
• Significant terms
Metrics
• Count
• Average
• Sum
• Min
• Max
• Std. Dev
• Unique Count
• Percentiles
24. Shards: independent collections of documents
[Diagram: the documents of an Index/Type, each with an Id, divided among Shard 1, Shard 2, Shard 3, and Shard 4]
25. Deployment of indices to a cluster
• Index 1
– Shard 1
– Shard 2
– Shard 3
• Index 2
– Shard 1
– Shard 2
– Shard 3
[Diagram: an Amazon ES cluster with three instances (Instance 1 is the master); primary and replica copies of shards 1, 2, and 3 of each index are distributed across Instance 1, Instance 2, and Instance 3]
26. Determining storage
• Data:Index ratio is typically close to 1:1
• Add a replica, double the storage
• Figure out data node count based on storage
– Current limits: 10 TB EBS, 32 TB instance store
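The sizing rules above reduce to simple arithmetic. The sketch below assumes the 1:1 data-to-index ratio from the slide, and the 2,000 GB of source data and 512 GB of storage per data node are made-up inputs, not recommendations. The function names are hypothetical.

```python
import math


def estimate_storage_gb(source_gb, replicas=1, index_ratio=1.0):
    """Index size = source size x ratio; each replica adds a full copy."""
    return source_gb * index_ratio * (1 + replicas)


def data_nodes_needed(total_gb, per_node_gb):
    """Smallest node count whose combined storage holds the data."""
    return math.ceil(total_gb / per_node_gb)


total = estimate_storage_gb(2000, replicas=1)
nodes = data_nodes_needed(total, per_node_gb=512)
print(total, nodes)
```

In practice you would likely leave headroom above this estimate for growth and operating overhead.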
27. Determining instance type
• Instance type is workload-dependent
• T2: dev, test, QA
• M3: solid performance
• R3: heavier queries, aggregations
• I2: largest storage option
28. Best practices
• Use the minimum number of shards that keeps
each shard at or below 50 GB of data
• Number of replicas = 1
• For all prod workloads: use 3 dedicated masters
• Use the _bulk API. Some ingest mechanisms do
this automatically
• Increase index.refresh_interval for higher
throughput
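For the _bulk recommendation, the request body is newline-delimited JSON: an action line followed by the document source, repeated per document, with a trailing newline. The sketch below builds such a body without sending it. The index name, type name, sample documents, and the `bulk_body` helper are hypothetical, and the `_type` key assumes an Elasticsearch version that still uses mapping types.

```python
import json


def bulk_body(index, doc_type, docs):
    """Build a _bulk request body that indexes each document."""
    lines = []
    for doc in docs:
        # Action line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(
    "logs-2016.06.01",
    "event",
    [{"eventName": "DescribeStream"}, {"eventName": "GetRecords"}],
)
print(body)
```

Sending many documents in one request like this avoids the per-request overhead of indexing them one at a time.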
31. Indexing strategy for streaming data
• Use an index per time period, typically index-per-day;
high-volume workloads can go to index-per-hour
• Shard the index according to data size; use
50GB as a soft limit per shard
• Master nodes increase cluster stability
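The index-per-period and shard-sizing guidance can be sketched as two small helpers. The `cloudtrail` prefix, the dotted date format in the index name, and the function names are assumptions; the 50 GB figure is the soft limit from the slide.

```python
import math
from datetime import datetime


def index_name(prefix, ts, hourly=False):
    """Name the index for the day (or hour) that the timestamp falls in."""
    fmt = "%Y.%m.%d.%H" if hourly else "%Y.%m.%d"
    return f"{prefix}-{ts.strftime(fmt)}"


def primary_shards(index_gb, max_shard_gb=50):
    """Fewest primary shards that keep each shard within the soft limit."""
    return max(1, math.ceil(index_gb / max_shard_gb))


ts = datetime(2016, 6, 1, 0, 16, 19)
print(index_name("cloudtrail", ts))
print(index_name("cloudtrail", ts, hourly=True))
print(primary_shards(120))
```

With one index per period, old data can be dropped by deleting whole indices rather than individual documents.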
32. Index settings control sharding and more
# number_of_shards is fixed when the index is created, so set it at creation
curl -XPUT <endpoint>/<index> -d '{
  "settings" : {
    "number_of_shards" : 5,
    "number_of_replicas" : 1,
    "refresh_interval" : "5s"
  }
}'
33. Mappings control how data is indexed
curl -XPUT <endpoint>/<index> -d '{
  "mappings" : {
    "<type>" : {
      "properties" : {
        "eventName" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}'
37. Elasticsearch is a full-featured search engine
• Built on Lucene, the popular, open-source library
• Search structured and unstructured data with
complex, boolean queries
• Supports common search features: geo search,
aggregations, highlighting, search suggestions,
and more
38. Challenges with self-managed Elasticsearch
• Easy to get started, challenging to scale
• Scaling ingest pipelines is difficult
• Undifferentiated heavy lifting
40. Amazon ES overview
[Diagram: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, Elasticsearch API, CloudTrail]
41. Easy cluster configuration and reconfiguration
• Elasticsearch Version
• Data nodes, count and type
• Master nodes, count and type
• Storage option – EBS/instance
• HA option
• Advanced options
42. High availability with Zone Awareness
[Diagram: an Amazon ES cluster with Instance 1, Instance 2, Instance 3, and Instance 4 spread across Availability Zone 1 and Availability Zone 2, with copies of shards 1, 2, and 3 distributed among them]
43. Monitor with CloudWatch metrics
• FreeStorageSpace – monitor and alarm before the
cluster runs out of space
• CPUUtilization – alarm at 80% CPU to signal the need to
scale up
• ClusterStatus.yellow – check whether replication
requires additional nodes
• JVMMemoryPressure – check instance type and count
for sufficient resources
• MasterCPUUtilization – monitoring for master nodes is
separated from data nodes
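The checks above can be sketched as a simple threshold function over one sample of metric values. This is plain Python, not a CloudWatch API call. Only the 80% CPU threshold comes from the slide; the free-storage floor, the JVM pressure threshold, the sample values, and the `check_metrics` name are assumptions, and the free-storage value is in whatever unit your samples use.

```python
def check_metrics(sample, min_free_storage, max_cpu_pct=80.0, max_jvm_pct=80.0):
    """Return the names of metrics that cross their alarm thresholds."""
    alerts = []
    if sample["FreeStorageSpace"] < min_free_storage:
        alerts.append("FreeStorageSpace")
    if sample["CPUUtilization"] >= max_cpu_pct:
        alerts.append("CPUUtilization")
    if sample["ClusterStatus.yellow"] >= 1:
        alerts.append("ClusterStatus.yellow")
    if sample["JVMMemoryPressure"] >= max_jvm_pct:
        alerts.append("JVMMemoryPressure")
    if sample["MasterCPUUtilization"] >= max_cpu_pct:
        alerts.append("MasterCPUUtilization")
    return alerts


# Hypothetical sample of metric values.
sample = {
    "FreeStorageSpace": 15000,
    "CPUUtilization": 85.0,
    "ClusterStatus.yellow": 0,
    "JVMMemoryPressure": 60.0,
    "MasterCPUUtilization": 20.0,
}
alerts = check_metrics(sample, min_free_storage=20000)
print(alerts)
```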
45. Pay for compute and storage you use
• With Amazon Elasticsearch Service, you pay
only for the compute and storage resources you
use. An AWS Free Tier is available for qualifying customers.
46. Wrap up
• Combined with Kibana, Elasticsearch provides search and
visualization for streaming data and full-text use cases.
• Elasticsearch is based on Lucene, which reads and writes
search indices
• Aggregations allow you to analyze your data, splitting into
Buckets and computing Metrics
• Amazon Elasticsearch Service makes it easy to set up and
manage your Elasticsearch cluster on AWS
• Amazon ES is a great way to get started with Elasticsearch!
47. Q&A
• Jon Handler: handler@amazon.com
• Vivek Sriram, Business Development Manager:
vsriram@amazon.com
• https://run.qwiklab.com/searches/elasticsearch