Elasticsearch 5.1 in Amazon Elasticsearch Service

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
16 Mar 2017
Jon Handler
AWS Principal Solutions Architect
handler@amazon.com or @_searchgeek
Darin Briskman
Amazon Web Services Technical Evangelist
briskman@amazon.com or @briskmad

Get started at https://aws.amazon.com/elasticsearch-service/
Amazon Search Services
• Amazon CloudSearch
• Amazon Elasticsearch Service

Open Source Distributed Index
Managed Service using Elasticsearch and Kibana
Fully managed; Zero admin
Highly Available and Reliable
RESTful API for easy integration
Amazon Elasticsearch Service

Amazon Elasticsearch Service Leading Use Cases
Log Analytics &
Operational Monitoring
• Monitor the performance of
applications, web servers, and
hardware
• Easy to use, powerful data
visualization tools to detect
issues quickly
• Dig into logs in an intuitive,
fine-grained way
• Kibana provides fast, easy
visualization
Search
• Application or website provides
search capabilities over diverse
documents
• Tasked with making this knowledge
base searchable and accessible
• Text matching, faceting, filtering,
fuzzy search, auto complete,
highlighting, and other search
features
• Query API to support application
search

Leading enterprises trust Amazon Elasticsearch
Service for their search and analytics applications
• Media & Entertainment
• Online Services
• Technology
• Other

Adobe Developer Platform (Adobe I/O)
P R O B L E M
• Cost-effective monitoring
for a very large volume
of log data
• Over 200,000 API calls
per second at peak:
destinations, response
times, bandwidth
• Integrate seamlessly
with other components
of the AWS ecosystem
S O L U T I O N
• Log data is routed with
Amazon Kinesis to
Amazon Elasticsearch
Service, then
displayed using Kibana on
Amazon Elasticsearch Service
• Adobe team can easily
see traffic patterns and
error rates, quickly
identifying anomalies
and potential
challenges
B E N E F I T S
• Management and
operational simplicity
• Flexibility to try out
different cluster config
during dev and test
[Diagram labels: Data Sources, Amazon Kinesis Streams, Spark Streaming, Amazon Elasticsearch Service]

McGraw Hill Education
P R O B L E M
• Supporting a wide catalog
across multiple services in
multiple jurisdictions
• Over 100 million learning
events each month
• Tests, quizzes, learning
modules begun / completed
/ abandoned
S O L U T I O N
• Search and analyze test
results, student/teacher
interaction, teacher
effectiveness, student
progress
• Analytics of applications
and infrastructure are now
integrated to understand
operations in real time
B E N E F I T S
• Confidence to scale
throughout the school year.
From 0 to 32TB in 9 months
• Focus on their business, not
their infrastructure

Easy to Use
Deploy a production-ready Elasticsearch
cluster in minutes
Simplifies time-consuming management
tasks such as software patching, failure
recovery, backups, and monitoring
Open
Get direct access to the Elasticsearch
open-source API
Fully compatible with the open source
Elasticsearch API, for all code and
applications
Secure
Secure Elasticsearch clusters with AWS
Identity and Access Management (IAM)
policies with fine-grained access control
for users and endpoints
Automatically applies security patches
without disruption, keeping Elasticsearch
environments secure
Available
Provides high availability using Zone
Awareness, which replicates data between
two Availability Zones
Monitors the health of clusters and
automatically replaces failed nodes,
without service disruption
AWS Integrated
Integrates with Amazon Kinesis Firehose,
AWS IoT, and Amazon CloudWatch Logs for
seamless data ingestion
Integrates with AWS CloudTrail for auditing,
AWS Identity and Access Management (IAM)
for security, and AWS CloudFormation for
cloud orchestration
Scalable
Scale clusters from a single node up to 20
nodes
Configure clusters to meet performance
requirements by selecting from a range of
instance types and storage options
including SSD-powered EBS volumes
Amazon Elasticsearch Service Benefits

Easy to use and scalable
• AWS SDK
• AWS CLI
• AWS CloudFormation
• Elastic Load Balancing
• AWS IAM
• Amazon CloudWatch
• AWS CloudTrail

Open
• Drop-in replacement
• Zero-change, no-risk
migration to or from open
source Elasticsearch

Secure
• Control access based on
originating IP or Principal
• Mix policies to provide
application access and
Kibana access
• Use IAM roles to provide
access for other services
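
A minimal sketch of how the policy mix described above could look as a resource-based access policy, built in Python. The domain ARN, account ID, IP range, and role name are hypothetical placeholders, and the choice of `es:ESHttp*` as the allowed action is an assumption for illustration, not a recommendation from the slides.

```python
import json


def build_access_policy(domain_arn, allowed_ips, principal_arn):
    """Return a policy document mixing IP-based and principal-based access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Unsigned requests (for example Kibana in a browser),
                # allowed only from the listed source IP ranges.
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_ips}},
            },
            {
                # Signed requests from one IAM principal (an application role).
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
            },
        ],
    }


# All identifiers below are made-up examples.
policy = build_access_policy(
    "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    ["192.0.2.0/24"],
    "arn:aws:iam::123456789012:role/example-app-role",
)
print(json.dumps(policy, indent=2))
```

The first statement covers Kibana access by originating IP; the second covers application access by principal.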

Available
[Diagram: an Amazon Elasticsearch Service cluster with Instances 1 to 4 spread across Availability Zone 1 and Availability Zone 2, each instance holding copies of shards 1, 2, and 3]

AWS integrated
[Diagram: data sources and ingestion paths into Amazon Elasticsearch Service. Labels shown: EC2 Instances, Amazon ECS, CWL Agent, Logstash, REST, Amazon Kinesis, Amazon Kinesis Firehose, Amazon RDS, Amazon DynamoDB, Amazon SQS Queue, Logstash Cluster, Amazon CloudWatch, AWS Lambda, AWS CloudTrail, Access Logs, Amazon VPC Flow Logs, Amazon S3 bucket, AWS IoT]

Dedicated master nodes improve stability
[Diagram: an Amazon ES cluster with dedicated master nodes alongside data nodes (Instances 1 to 3) that hold copies of shards 1, 2, and 3]
Data nodes: queries and updates

Firehose delivery architecture with transformations
[Diagram: a data source sends source records to a Firehose delivery stream; transformed records are delivered to Amazon Elasticsearch Service; source records go to a backup S3 bucket; transformation failures and delivery failures go to an intermediate Amazon S3 bucket]
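
A hedged sketch of what the transformation step in this architecture could look like as an AWS Lambda function in Python. It assumes the records are JSON documents, and the `processed` field is a hypothetical enrichment added only for illustration. Each output record carries the original `recordId`, a `result` status, and base64-encoded `data`.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and flag the ones that cannot be parsed."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment
            data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except (ValueError, TypeError):
            # Bad base64, bad JSON, or a non-object payload: report a
            # transformation failure and hand back the data unchanged.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Records returned with `ProcessingFailed` correspond to the "transformation failure" path in the diagram.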

Repository Search
• File metadata and
possibly file contents
for traditional search
• Lambda to keep the
repository current
• Good for up to ~60TB
of metadata/source
data (current limits)
See also: Indexing S3 Metadata blog post by Amit Sharma
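
As a rough illustration of the "Lambda to keep the repository current" idea, the sketch below turns an S3 event notification into metadata documents that could then be indexed. The document field names (`bucket`, `key`, `size_bytes`, and so on) are assumptions chosen for this example, and sending the documents to the domain is left out.

```python
from urllib.parse import unquote_plus


def documents_from_s3_event(event):
    """Build one metadata document per object referenced in an S3 event."""
    docs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        docs.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size_bytes": s3["object"].get("size"),
            "event_name": record.get("eventName"),
            "event_time": record.get("eventTime"),
        })
    return docs


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "eventTime": "2017-03-16T12:00:00.000Z",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "reports/annual+report.pdf", "size": 1024},
        },
    }]
}
print(documents_from_s3_event(sample_event))
```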

Amazon Elasticsearch Service
support for Elasticsearch 5.1

What to do with a terabyte of logs?

Visualize it with Kibana 5!

Scripting with Amazon Elasticsearch Service
Scripting is fully supported using the Painless language.
With scripts you can
• Change the precedence of search results
• Delete index fields by query
• Modify search results to return specific fields
• Alter elements in a field
Painless is explicitly designed for Elasticsearch and is both
performant and secure.
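
A small sketch of the "delete index fields by query" case, written as a Python helper that builds an `_update_by_query` request body with a Painless script. The index and field names are hypothetical, and the body uses the `inline` script key on the assumption that the target is the 5.1 engine.

```python
import json


def remove_field_by_query(field):
    """Build an _update_by_query body that drops `field` wherever it exists."""
    return {
        "query": {"exists": {"field": field}},
        "script": {
            "lang": "painless",
            # ctx._source is the document source; remove() deletes the key.
            "inline": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }


# Hypothetical usage: POST /example_index/_update_by_query with this body.
body = remove_field_by_query("obsolete_field")
print(json.dumps(body, indent=2))
```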

Ingest Pipelines and Processors
When you index documents, you can specify a pipeline.
The pipeline can have a series of processors that
pre-process the data before indexing.
Twenty processors are available. Some are simple:
{ "append":
{ "field": "field1",
"value": ["item2", "item3", "item4"] } }
Others are more complex, like the Grok processor for
regex with aliased expressions.
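
To make the pipeline idea concrete, here is a minimal sketch that assembles a pipeline definition combining the append processor with a Grok processor. The pipeline name, the `message` and `tags` fields, and the choice of the `COMBINEDAPACHELOG` pattern are assumptions for illustration only.

```python
import json


def build_pipeline(description, processors):
    """Return a pipeline definition suitable for PUT /_ingest/pipeline/<name>."""
    return {"description": description, "processors": processors}


pipeline = build_pipeline(
    "Parse Apache access logs and tag them",
    [
        # Grok: extract named fields from the raw log line with an aliased regex.
        {"grok": {"field": "message", "patterns": ["%{COMBINEDAPACHELOG}"]}},
        # Append: add values to an array field.
        {"append": {"field": "tags", "value": ["apache", "access-log"]}},
    ],
)
print(json.dumps(pipeline, indent=2))
```

Documents indexed with `?pipeline=<name>` on the request would then pass through both processors in order.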

Lots of New Elasticsearch APIs
/_alias
/_aliases
/_all
/_analyze
/_bulk
/_cache/clear (Index only)
/_cat
/_cluster/allocation/explain
/_cluster/health
/_cluster/pending_tasks
/_cluster/settings (PUT only):
indices.breaker.fielddata.limit
indices.breaker.request.limit
indices.breaker.total.limit
/_cluster/state
/_cluster/stats
/_count
/_delete_by_query*
/_explain
/_field_stats
/_flush
/_forcemerge (Index only)
/_mapping
/_mget
/_msearch
/_mtermvectors
/_nodes
/_plugin/kibana
/_recovery (Index only)
/_refresh
/_reindex*
/_rollover
/_search
/_search profile
/_segments (Index only)
/_shard_stores
/_shrink
/_snapshot
/_stats
/_status
/_tasks
/_template
/_termvectors
/_update_by_query*
/_validate

Shrink and Rollover
Shrink an index to a single shard:
POST source_index/_shrink/target_index
Very useful for time-series indexes once ingestion is done!
Rollover an index based on number of documents:
POST logs_index/_rollover
{ "conditions": {"max_docs": 100000 } }
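
The two calls above can be sketched as small request builders in Python. The index and alias names are placeholders. The sketch only assembles the method, path, and body, and assumes any prerequisites for shrinking (such as blocking writes on the source index) are handled separately.

```python
def shrink_request(source_index, target_index, shards=1):
    """Build the request that shrinks source_index into target_index."""
    body = {"settings": {"index.number_of_shards": shards}}
    return ("POST", f"/{source_index}/_shrink/{target_index}", body)


def rollover_request(alias, max_docs=None, max_age=None):
    """Build a rollover request; only the conditions that are set are sent."""
    conditions = {}
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    if max_age is not None:
        conditions["max_age"] = max_age
    return ("POST", f"/{alias}/_rollover", {"conditions": conditions})


print(shrink_request("source_index", "target_index"))
print(rollover_request("logs_index", max_docs=100000))
```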

Supported Elasticsearch 5 Plugins
• Smart Chinese Analysis plugin
• Stempel Polish Analysis plugin
• Ingest Processor Attachment plugin
• Ingest Geoip Processor Plugin
• Ingest User Agent Processor plugin
• Mapper Murmur3 Plugin
中文
Polskie

Testing Ingest Performance
• Load generator
• m4.large, single process, single thread
• Amazon Elasticsearch Service
• 1 instance, 1 primary, no replicas, EBS gp2 storage
• Data
• 1.8 million Apache web log lines, comprising 196 MB
• _bulk API calls with 10K lines per call
• Monitoring data gathered from load generator process
and from the Amazon Elasticsearch Service domain
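
A sketch of how a load generator might batch log lines into `_bulk` request bodies of 10K lines each, as in the test setup above. The index name, type name, and the single `message` field are assumptions. The actual test harness is not shown in the slides.

```python
import json


def bulk_bodies(lines, index, doc_type, batch_size=10_000):
    """Yield newline-delimited _bulk bodies with up to batch_size documents each."""
    batch = []
    for line in lines:
        # Each document takes two lines: an action line and a source line.
        batch.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        batch.append(json.dumps({"message": line}))
        if len(batch) == 2 * batch_size:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"


# Hypothetical usage with a tiny batch size to keep the output short.
sample_lines = [f"GET /page/{i} HTTP/1.1" for i in range(5)]
for body in bulk_bodies(sample_lines, "weblogs", "log", batch_size=2):
    print(body)
```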

Ingest Performance Test Results

Service with v2.3 Engine
Instance      Avg Index   Docs/sec
m3.medium     3.93 ms     2811
m3.2xlarge    11.83 ms    3966
r3.large      8.87 ms     3932
r3.8xlarge    10.58 ms    4404
i2.2xlarge    11.2 ms     5305

Service with v5.1 Engine
Instance      Avg Index   Docs/sec
m3.medium     3.12 ms     3629
m3.2xlarge    11.1 ms     5816
r3.large      8.76 ms     7221
r3.8xlarge    9.59 ms     7726
i2.2xlarge    10.3 ms     9676

Up to 82% more documents per second!
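
For readers who want to check the throughput change per instance type, the short sketch below computes the percentage difference in documents per second between the two engines, using the figures from the results above. The i2.2xlarge row, for instance, works out to roughly 82%.

```python
# Docs/sec figures copied from the v2.3 and v5.1 result tables.
docs_per_sec_v23 = {
    "m3.medium": 2811, "m3.2xlarge": 3966, "r3.large": 3932,
    "r3.8xlarge": 4404, "i2.2xlarge": 5305,
}
docs_per_sec_v51 = {
    "m3.medium": 3629, "m3.2xlarge": 5816, "r3.large": 7221,
    "r3.8xlarge": 7726, "i2.2xlarge": 9676,
}

# Relative gain of v5.1 over v2.3, in percent, per instance type.
improvement = {
    name: (docs_per_sec_v51[name] - base) / base * 100
    for name, base in docs_per_sec_v23.items()
}

for name, pct in improvement.items():
    print(f"{name}: {pct:.1f}% more docs/sec")
```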

Migrating from v2.3 to v5.1
The easy way:
1. Create a new Amazon Elasticsearch Service v5.1 cluster
2. Snapshot your v2.3 indexes
3. Restore the indexes to the v5.1 cluster
… but this won’t get most of the benefits of v5.1
There are many breaking changes in v5, documented at
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/breaking-changes.html
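
The snapshot and restore steps of the easy path can be sketched as a list of REST requests. The repository, snapshot, bucket, region, and role names below are placeholders, and the sketch assumes an S3 snapshot repository registered on both domains. It only builds the requests; signing and sending them is out of scope.

```python
def migration_requests(repo, snapshot, bucket, region, role_arn, indices):
    """Return the requests to run against the v2.3 and v5.1 domains, in order."""
    register = ("PUT", f"/_snapshot/{repo}", {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    })
    take = ("PUT", f"/_snapshot/{repo}/{snapshot}", {"indices": indices})
    restore = ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": indices})
    return {
        "v2.3 domain": [register, take],     # step 2: snapshot the old indexes
        "v5.1 domain": [register, restore],  # step 3: restore on the new cluster
    }


# All names below are hypothetical.
plan = migration_requests(
    "example-repo", "pre-migration", "example-snapshot-bucket", "us-east-1",
    "arn:aws:iam::123456789012:role/example-snapshot-role", "logs-*",
)
for domain, requests_to_send in plan.items():
    for method, path, _body in requests_to_send:
        print(domain, method, path)
```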

Three Things to Remember
• Amazon Elasticsearch Service is a drop-in replacement
for new and existing Elasticsearch workloads
• Deploy, manage, and scale Elasticsearch more easily in
the AWS cloud
• Support for Elasticsearch 5.1 brings scripting, additional
plugins and additional performance to Amazon
Elasticsearch Service

Find out more:
https://aws.amazon.com/elasticsearch-service/
AWS Centralized Logging:
https://aws.amazon.com/answers/logging/centralized-logging/
Elasticsearch at the AWS Database Blog:
https://aws.amazon.com/blogs/database/category/elasticsearch/
Or ask your Solutions Architect!
