Introduction to
Elasticsearch
Cliff James /omnisis@github.com

1
What is Elasticsearch?
Elasticsearch:
An open-source, distributed, real-time,
document indexer with support for online
analytics

2
Features at a Glance
Extremely elegant and powerful REST API
• Almost all search engine features are accessible over plain HTTP
• JSON formatted queries and results
• Can test/experiment/debug with simple tools like curl
Schema-Less Data Model
• Allows great flexibility for application designer
• Can index arbitrary documents right away with no schema metadata
• Can also tweak type/field mappings for indexes as needed
Fully Distributed and Highly-Available
• Tunable index-level write-path (index) and read-path (query) distribution policies
• P2P node operations with recoverable master node, multicast auto-discovery (configurable)
• Plays well in VM/Cloud provisioned environments
• Indexes scale horizontally as new nodes are added
• Search Cluster performs automatic failover and recovery
Advanced Search Features
• Full-Text search, autocomplete, facets, real-time search analytics
• Powerful Query DSL
• Multi-Language Support
• Built-in Tokenizers, Filters and Analyzers for most common search needs
3
Concepts
Clusters/Nodes
ES is deployed as a cluster of individual nodes with a single master node. Each node can have
many indexes hosted on it.
Documents
In ES you index documents. Document indexing is a distributed atomic operation with versioning
support and transaction logs. Every document is associated with an index and has at least a type
and an id.
Indexes
Similar to a database in traditional relational stores. Indexes are a logical namespace and have
one or more primary shards and zero or more replica shards in the cluster. A single index has mappings
which may define several types stored in the index. Indexes store a mapping between terms and
documents.
Mappings
Mappings are like schemas in a relational database. Mappings define a type within an index
along with some index-wide settings. Unlike a traditional database, in ES types do not have to be
explicitly defined ahead of time. Indexes can be created without explicit mappings at all, in which
case ES infers a mapping from the source documents being indexed.
4
Concepts
Types
Types are like tables in a database. A type defines fields along with optional information about how
that field should be indexed. If a request is made to index a document with fields that don’t have
explicit type information ES will attempt to guess an appropriate type based on the indexed data.
Queries
A query is a request to retrieve matching documents (“hits”) from one or more indexes. ES can
query for exact term matches or more sophisticated full text searches across several fields or indexes
at once. The query options are also quite powerful and support things like sorting, filtering,
aggregate statistics, facet counts and much more.
Analysis
Analysis is the process of converting unstructured text into terms. It includes things like ignoring
punctuation, dropping common stop words (‘the’, ‘a’, ‘on’, ‘and’), normalizing case, breaking a word
into ngrams (smaller pieces based on substrings), etc. to support full-text search. In ES, analysis
happens at both index time and query time.

5
Index Layout
[Diagram: a Node hosting Index 1 and Index 2; each index contains types (Type 1, Type 2, Type 3), and each type contains Documents.]
6
Shards and Replicas
curl -XPUT localhost:9200/test -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0 }
}'

[Diagram: a single Node holding the only shard, test(1).]
7
Shards and Replicas
curl -XPUT localhost:9200/test -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2 }
}'

[Diagram: four Nodes, each holding a mix of Shards and Replicas of the test and other indexes: test(1), test(2), test(3), other(1), other(2), other(3).]
8
Shard Placement
[Diagram: an Index Request for a Document arrives over REST at one of four Nodes, which together hold the shards test(1) to test(4).]

By default, documents in ES are placed onto shards by taking the hash of the
document id modulo the number of shards for the
destination index.
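The placement rule above can be sketched in a few lines of Python. This is only an illustration: the function name `shard_for` is made up for this example, and CRC32 is a stand-in hash, not the hash function ES uses internally.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard number in [0, num_shards) from a document id."""
    # CRC32 is a stand-in hash for illustration; ES uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "9wrADN4eS8uXm3gNpDvEJw"]:
        print(doc_id, "-> shard", shard_for(doc_id, 4))
```

Because the result depends on the shard count, the same id can map to a different shard if the number of shards changes.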
9
Shard Placement
[Diagram: a User Query arrives over REST at one of four Nodes and is sent to all of the shards test(1) to test(4).]

Querying is more complex. Potential search
hits are generally spread across all the shards for that index, so
the query is distributed to all shards and the per-shard results are
merged into a single result list before being returned (scatter/
gather architecture).
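A toy version of scatter/gather, assuming each shard is just a dict of document id to text and that the score is a simple term count (both are assumptions for illustration, not how ES scores documents):

```python
import heapq


def search_shard(shard_docs, term):
    """Return (score, doc_id) hits from one shard; score is a toy term count."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards, term, size=10):
    """Scatter the query to every shard, then gather the best `size` hits."""
    all_hits = []
    for shard_docs in shards:  # scatter: each shard answers independently
        all_hits.extend(search_shard(shard_docs, term))
    return heapq.nlargest(size, all_hits)  # gather: merge by score


if __name__ == "__main__":
    shards = [{"a": "ipod ipod case"}, {"b": "ipod"}, {"c": "phone"}]
    print(scatter_gather(shards, "ipod", size=2))
```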
10
Routing
curl -XGET 'localhost:9200/test/product/_search?routing=electronics'

[Diagram: a User Query arrives over REST at one of four Nodes holding the shards test(1) to test(4) and is sent only to the shard selected by the routing value.]

Routing can be used to control which shards (and therefore which
nodes) receive requests to search for a document. When routing is
enabled the user can specify a value at either index time or query
time to determine which shards are used for indexing/querying. The
same routing value is always routed to the same shard for a given
index.
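To illustrate the effect of routing on a search, here is a small sketch. The helper names are hypothetical and CRC32 again stands in for the real hash; the point is only that a routed search touches one shard while an unrouted search touches all of them.

```python
import zlib


def shard_for_routing(routing: str, num_shards: int) -> int:
    """Map a routing value to a shard number (stand-in hash, not ES's own)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shards_to_search(num_shards: int, routing=None):
    """Return the shard numbers a search has to visit."""
    if routing is None:
        return list(range(num_shards))  # no routing: ask every shard
    return [shard_for_routing(routing, num_shards)]  # routing: one shard only


if __name__ == "__main__":
    print(shards_to_search(4))
    print(shards_to_search(4, routing="electronics"))
```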
11
ES Document Model
•Documents are first broken down into terms to create an inverted index back to the original
source (more on this later)
•Document content is up to you and can be:
✴ unstructured (articles/tweets)
✴ semi-structured (log entries/emails)
✴ structured (patient records/employee records) or
✴ any combination thereof

•Queries can look for exact term matches (e.g. productCategory == entertainment)
or “best match” based on scoring each document against search criteria
•All documents in ES have an associated index, type and id.
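A minimal sketch of an inverted index, assuming whitespace tokenization and lowercasing as the only analysis steps (real analysis in ES is configurable and more involved):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "ipod touch", 2: "iPod nano"}
    index = build_inverted_index(docs)
    print(sorted(index["ipod"]))  # ids of documents containing the term
```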
12
Analyzers
• In ES Analysis is the process of breaking down raw document text into terms
that will be indexed in a single lucene index.
• The role of analysis is performed by Analyzers. Analyzers themselves are
broken into logical parts:
✴ CharFilter: An optional component that directly modifies the underlying
char stream for example to remove HTML tags or convert characters
✴ Tokenizer: Component that extracts multiple terms from a single text
string
✴ TokenFilters: Component that modifies, adds or removes tokens for
example to convert all characters to uppercase or remove common
stopwords
• Can be index-specific or shared globally.
• ES ships with several common analyzers. You can also create a custom
analyzer with a single logical name by specifying the CharFilter, Tokenizer
and TokenFilters that comprise it.
13
Analyzer Example
Input:
“<p>The quick brown Fox jumps over the Lazy dog</p>”
CharFilter (HTMLStripper):
“The quick brown Fox jumps over the Lazy dog”
Tokenizer (Standard):
[“The”, “quick”, “brown”, “Fox”, “jumps”, “over”, “the”, “Lazy”, “dog”]
TokenFilter (Stopwords):
[“quick”, “brown”, “Fox”, “jumps”, “over”, “Lazy”, “dog”]
TokenFilter (Lowercase), producing the Index Terms:
[“quick”, “brown”, “fox”, “jumps”, “over”, “lazy”, “dog”]
14
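The analyzer stages in this example (char filter, tokenizer, token filters) can be mimicked with plain Python. This is a rough sketch under simplifying assumptions: regular expressions stand in for the HTML stripper and the standard tokenizer, and the stop word list is a small hypothetical one.

```python
import re

STOPWORDS = {"the", "a", "on", "and"}  # small illustrative list


def strip_html(text):
    """CharFilter stand-in: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    """Tokenizer stand-in: split on non-word characters."""
    return re.findall(r"\w+", text)


def remove_stopwords(tokens):
    """TokenFilter: drop stop words, ignoring case."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lowercase(tokens):
    """TokenFilter: normalize case."""
    return [t.lower() for t in tokens]


def analyze(text):
    tokens = tokenize(strip_html(text))
    for token_filter in (remove_stopwords, lowercase):
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("<p>The quick brown Fox jumps over the Lazy dog</p>"))
```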
Testing Analyzers
curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty&format=text' -d 'this Is a tESt'

{
"tokens" : [ {
"token" : "test",
"start_offset" : 12,
"end_offset" : 16,
"type" : "<ALPHANUM>",
"position" : 4
} ]
}

•ES has several built-in analyzers and analyzer
components (which are highly configurable)
•You can mix-and-match analyzer components to
build custom analyzers and use the Analysis
REST API to test your analyzers.
•Here is an example of the standard analyzer
(default if you don’t explicitly define a mapping)
being applied to a sample text string. Notice
that several common English words (the, is, this, a)
were removed and the case was normalized to
lowercase
15
Testing Analyzers
curl -XGET 'localhost:9200/_analyze?tokenizer=standard&pretty' -d 'this Is A tESt'

{
"tokens" : [ {
"token" : "this",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 1
}, {
"token" : "Is",
"start_offset" : 5,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "A",
"start_offset" : 8,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "tESt",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 4
} ]
}

•We can also test tokenizers and token filters by themselves.
•You can mix-and-match analyzer components to build
custom analyzers and use the Analysis REST API to test your
analyzers.

16
E-Commerce Example
•Suppose we run an E-commerce site similar to Amazon and
have several “products” that we would like to be able to search
for easily and quickly.
•Customers need to be able to search with a variety of complex
criteria. Although all products have some common criteria, we
don’t know all possible product attributes for all possible
products ahead of time (dynamic schema).
•Our Simple JSON Data Model:

{
  "category": "electronics",
  "price": 129.99,
  "name": "ipod"
}

17
Indexing a Document
curl -XPUT localhost:9200/test/product/1 -d '{"category": "electronics", "price": 129.99, "id": 1, "name": "ipod"}'
--
{"ok":true,"_index":"test","_type":"product","_id":"1","_version":1}

•In the URL path, test is the Index, product is the Type and 1 is the Id; the JSON body is the Document
•This will index all the fields of our document in the index named test with a
type mapping of product and an id of 1
•Notice that we did not create any indexes ahead of time or define any
information about the schema of the document we just indexed!
•ES returns a response JSON object acknowledging our operation

18
Indexing a Document
curl -XPOST localhost:9200/test/product -d '{"category": "electronics", "price": 129.99, "name":"ipod"}'
--
{"ok":true,"_index":"test","_type":"product","_id":"9wrADN4eS8uXm3gNpDvEJw","_version":1}

•Using the POST method this time instead of PUT
•No explicit id was provided to ES so it auto-generates one for us. The id is returned in
the _id field of the JSON response.
•Notice the _version field in the response. ES keeps a version number for every
indexed document. The same document can be updated or re-indexed with
different attributes and the version will be automatically incremented by ES.

19
Introspecting Indexes
• The mapping API lets us see how ES mapped our document fields
• ES determined that the price field was of type double based on the first document indexed
• Using the ‘format=yaml’ parameter in API GET requests formats the response as YAML, which is
sometimes easier to read than JSON (the default)

curl -XGET 'localhost:9200/test/_mapping?format=yaml'
--
test:
  product:
    properties:
      category:
        type: "string"
      name:
        type: "string"
      price:
        type: "double"

• The _status path lets us examine lots of interesting facts about an index.
• Here we see that a new index ‘test’ was created after our document PUT call and that it is 2.2KB in
size and contains a single document

curl -XGET 'localhost:9200/test/_status?format=yaml'
--
ok: true
_shards:
  total: 1
  successful: 1
  failed: 0
indices:
  test:
    index:
      primary_size: "2.2kb"
      primary_size_in_bytes: 2282
      size: "2.2kb"
      size_in_bytes: 2282
    translog:
      operations: 1
    docs:
      num_docs: 1
      max_doc: 1
      deleted_docs: 0
    ...

20
Index Design

Date Bounded Indexes

• A very common pattern for user-generated data (e.g. tweets/emails) and machine generated
data (log events,system metrics) is to segregate data by date or timestamp.
• ES makes it easy to create a separate index at whatever interval makes sense for your
application (daily/weekly/monthly). For example if we are indexing log data by day our
indexes might look like:
logs-2013-10-01
logs-2013-10-02
logs-2013-10-03
• Now we can query for all the logs in October and November 2013 with the following URI form:
http://localhost:9200/logs-2013-10*,logs-2013-11*/
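A small sketch of how an application might build these index names and the multi-index search URI. The helper names, the `logs` prefix default and the host are assumptions for illustration; the `_search` suffix is the standard search endpoint.

```python
from datetime import date


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Name of the index that holds one day's documents, e.g. logs-2013-10-01."""
    return f"{prefix}-{day.isoformat()}"


def month_pattern(year: int, month: int, prefix: str = "logs") -> str:
    """Wildcard pattern matching every daily index of one month."""
    return f"{prefix}-{year:04d}-{month:02d}*"


def search_uri(patterns, host: str = "http://localhost:9200") -> str:
    """Search URI covering several index patterns at once."""
    return f"{host}/{','.join(patterns)}/_search"


if __name__ == "__main__":
    print(daily_index_name(date(2013, 10, 1)))
    print(search_uri([month_pattern(2013, 10), month_pattern(2013, 11)]))
```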

21
Index Aliases
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
        { "add" : { "index" : "logs-2013-10", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-09", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-08", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-07", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-06", "alias" : "logs_last_6months" } },
        { "add" : { "index" : "logs-2013-05", "alias" : "logs_last_6months" } }
    ]
}'

•Index aliases allow us to manage one or more individual
indexes under a single logical name.
•This is perfect for things like creating an index alias to hold a
sliding window of indexes or providing a filtered “view” on a
subset of an index's actual data.
•Like other aspects of ES, a REST API is exposed that allows
complete programmatic management of aliases
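As a sketch of that programmatic management, the request body for a six-month sliding window could be generated rather than written by hand. The function names are hypothetical; the output has the same "actions" / "add" shape as the request on this slide.

```python
import json


def last_n_months(year: int, month: int, n: int):
    """Return (year, month) pairs for the n months ending at the given month."""
    months = []
    for _ in range(n):
        months.append((year, month))
        month -= 1
        if month == 0:  # roll back over a year boundary
            year, month = year - 1, 12
    return months


def alias_actions(alias: str, year: int, month: int, n: int = 6, prefix: str = "logs"):
    """Build an _aliases request body that adds n monthly indexes to one alias."""
    actions = [
        {"add": {"index": f"{prefix}-{y:04d}-{m:02d}", "alias": alias}}
        for y, m in last_n_months(year, month, n)
    ]
    return {"actions": actions}


if __name__ == "__main__":
    print(json.dumps(alias_actions("logs_last_6months", 2013, 10), indent=2))
```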
22
Retrieving Documents
curl -XGET 'localhost:9200/test/product/1?pretty'
--
{
  "_index" : "test",
  "_type" : "product",
  "_id" : "1",
  "_version" : 2,
  "exists" : true, "_source" : {"category": "electronics", "price": 129.99, "name": "ipod"}
}

The primary purpose for setting up an ES
cluster is to support full-text or complex
querying across documents; however, you
can also retrieve a specific document if you
happen to know its id (similar to KV stores)
23
Manual Index Creation
•For indexes that are created “lazily” in ES, a mapping is created “on-the-fly”
from introspecting the documents being indexed.
•You can specify mappings at index creation time or in a config file stored
at each node.

curl -XPOST 'http://localhost:9200/test' -d '{
  "mappings": {
    "products": {
      "properties": {
        "name": {"type": "string"},
        "price": {"type": "float"},
        "category": {"type": "string"}
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  }
}'

•test in the URL is the name of the logical index that we are creating.
•The mappings section holds the mappings for types within the index.
•The settings section holds the index shard settings (overrides global defaults)

24
Mappings
mappings:
  product:
    properties:
      category:
        type: "string"
      name:
        fields:
          bare:
            index: "not_analyzed"
            type: "string"
          name:
            index: "analyzed"
            index_analyzer: "partial_word"
            search_analyzer: "full_word"
            type: "string"
        type: "multi_field"
      price:
        type: "float"

settings:
  analysis:
    analyzer:
      full_word:
        filter:
        - "standard"
        - "lowercase"
        - "asciifolding"
        tokenizer: "standard"
        type: "custom"
      partial_word:
        filter:
        - "standard"
        - "lowercase"
        - "contains"
        - "asciifolding"
        tokenizer: "standard"
        type: "custom"
    filter:
      contains:
        max_gram: 20
        min_gram: 2
        type: "nGram"

•Mappings can also define the underlying analyzer that is
used on indexed field values. Field mappings can specify
both an index analyzer and a query analyzer or opt out of
analysis completely.

•A single document field can actually have multiple settings (index settings, type,
etc) applied simultaneously using the multi_field type, see reference guide for a
full description.
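To give a feel for what an nGram token filter with min_gram 2 and max_gram 20 does to a token, here is a plain-Python approximation. The function name is made up, and the order in which ES emits the grams may differ from this sketch.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 20):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    # Indexing these pieces is what lets a partial word such as "pod" match "ipod".
    print(ngrams("ipod"))
```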
25
Dynamic Field Mappings
{
  "mappings" : {
    "logs" : {
      "dynamic_templates" : [
        {
          "logs": {
            "match" : "*",
            "mapping" : {
              "type" : "multi_field",
              "fields" : {
                "{name}": {
                  "type" : "{dynamic_type}",
                  "index_analyzer": "keyword"
                },
                "str": {"type" : "string"}
              }
            }
          }
        }
      ]
    }
  }
}

•Sometimes we want to control how certain fields get mapped dynamically
but we don’t know every possible field ahead of time;
dynamic mapping templates help with this.
•A dynamic mapping template allows us
to use pattern matching to control how
new fields get mapped dynamically
•Within the template spec {dynamic_type}
is a placeholder for the type that ES
automatically infers for a given field and
{name} is the original name of the field
in the source document
26
Index Templates
curl -XPUT localhost:9200/_template/logtemplate -d '
{
    "template" : "logs*",
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "logevent" : {
            "_source" : { "enabled" : false }
        }
    }
}'

• Index templates allow you to create templates
that will automatically be applied to new
indexes
• Very handy when using a temporal index
design strategy like ‘index per day’ or similar
• Templates use an index name matching
strategy to decide if they apply to a newly
created index. If there is a match the
contents of the template are copied into the
new index settings.
• Multiple templates can match a single index.
Unless the order parameter is given templates
are applied in the order they are defined.
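The matching and ordering behaviour described above can be approximated as follows. This is a sketch, not ES code: `fnmatchcase` stands in for ES's wildcard matching (it also understands `?` and `[]`, which is an extension), the second template is hypothetical, and later templates are assumed to override settings from earlier ones.

```python
from fnmatch import fnmatchcase


def settings_for_new_index(index_name, templates):
    """Merge the settings of every template whose pattern matches index_name."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    merged = {}
    # sorted() is stable, so templates with equal order keep their definition order.
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(template.get("settings", {}))
    return merged


if __name__ == "__main__":
    templates = [
        {"template": "logs*", "order": 0,
         "settings": {"number_of_shards": 5, "number_of_replicas": 1}},
        {"template": "logs-2013*", "order": 1,  # hypothetical second template
         "settings": {"number_of_replicas": 0}},
    ]
    print(settings_for_new_index("logs-2013-10-01", templates))
```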

27
Performing Queries
curl -XGET 'localhost:9200/test/product/_search?q="ipod"&format=yaml'
--
took: 104
timed_out: false
_shards:
  total: 1
  successful: 1
  failed: 0
hits:
  total: 1
  max_score: 0.15342641
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 0.15342641
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"

•The _search path is the standard way to
query an ES index
•Using the q=<query> form performs a full-text search by parsing the query string
value. While this is convenient for some
queries, ES offers a much richer query API
via its JSON Query object and query DSL
•Normally a search query also returns the
_source field for every search hit which
contains the document as it was originally
indexed

28
Multi-Index / MultiType API Conventions
URI Format                                                   Meaning
curl -XGET 'localhost:9200/test/_search'                     Searches all documents of any type under the test index
curl -XGET 'localhost:9200/test/product,sale/_search'        Searches inside documents of type product or sale in the test index
curl -XGET 'localhost:9200/test,production/product/_search'  Searches for documents of type product in the test or production indexes
curl -XGET 'localhost:9200/_all/product/_search'             Searches for documents of type product across all indexes
curl -XGET 'localhost:9200/_search'                          Searches across all indexes and all types within them

29
The ES Query Object
•By providing a “Query Object” (JSON blob) to ES during a search operation, you
can form very complex search queries
•The size, from, and sort attributes affect how many results are returned and in what
order
•The query, filter and facets attributes are used to control the content
of search results
•The query attribute is very customizable and has its own
flexible DSL

{
  size: number of results to return (defaults to 10) ...
  from: offset into results (defaults to 0) ...
  fields: ... specific fields that should be returned ...
  sort: ... sort options ...
  query: {
    ... "query" object following the Query DSL ...
  },
  filter: {
    ...a filter spec that will be used to eliminate documents from results
    note that this filter only filters on the returned results, not from the index
    itself...
  },
  facets: {
    ...facets specifications...
  }
}
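A query object is plain JSON, so it can be assembled as a dictionary and serialized. The sketch below uses only attributes named on this slide; the function name, the sort on price and the particular facet are choices made for this example.

```python
import json


def build_search_body(category: str, size: int = 10, offset: int = 0) -> dict:
    """Assemble a query object for a term query on the category field."""
    return {
        "size": size,                      # how many hits to return
        "from": offset,                    # offset into the result list
        "sort": [{"price": "asc"}],        # cheapest products first
        "query": {"term": {"category": category}},
        "facets": {
            "category_breakdown": {"terms": {"field": "category", "size": 10}}
        },
    }


if __name__ == "__main__":
    # The printed JSON could be saved to a file and passed to curl with -d @file.
    print(json.dumps(build_search_body("electronics", size=5), indent=2))
```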

30
Queries vs. Filters
•Since both queries and filters can return similar search results, it
can be confusing to know which one to use for a given search
scenario
•The ES Guide offers some general advice for when to use
queries vs. filters:
Use queries when:
•Full text search is needed
•The results of the search depend on a relevance score
Use filters when:
•Results of search are binary (yes/no)
•Querying against exact values
31
Simple Term Filter
000-term-filter.json
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "category": "electronics"
        }
      }
    }
  }
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @000-term-filter.json
--
...
hits:
  total: 7
  max_score: 1.0
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.0
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.0
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
...

•Matches all documents
that have a field containing
the search term.
•Search terms are not
analyzed
•Scores all matched
documents the same (1.0
by default)

32
Simple Term Query
001-basic-term-query.json
{
  "query": {
    "term": {
      "category": "electronics"
    }
  }
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @001-basic-term-query.json
--
...
hits:
  total: 7
  max_score: 1.8109303
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
  - _index: "test"
    _type: "product"
    _id: "3"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 499.0
      name: "ipad"
...

•This is the same as the
previous query but uses a
query instead of a filter
•Matches all documents that
have a field containing the
search term.
•Search terms are not analyzed
•Performs document relevancy
scoring on hits
33
Prex Queries
{
"query": {
"prex": {
"name": "ip"
}
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' 
-d @003-prex-query.json
--hits:
total: 3
max_score: 1.0
hits:
- _index: "test"
_type: "product"
_id: "1"
_score: 1.0
_source:
category: "electronics"
price: 129.99
name: "ipod"
- _index: "test"
_type: "product"
_id: "2"
_score: 1.0
_source:
category: "electronics"
price: 299.99
name: "iPhone"
...

}

003-prex-query.json

•Matches all documents that have
elds that start with the specied
prex
•Search terms are not analyzed

34
Complex Queries
006-complex-bool-query.json
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "category": "electronics"
        }
      },
      "must_not": {
        "range": {
          "price": {
            "from": 300
          }
        }
      }
    }
  }
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @006-complex-bool-query.json
--
hits:
  total: 5
  max_score: 1.8109303
  hits:
  - _index: "test"
    _type: "product"
    _id: "1"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 129.99
      name: "ipod"
  - _index: "test"
    _type: "product"
    _id: "2"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 299.99
      name: "iPhone"
  - _index: "test"
    _type: "product"
    _id: "5"
    _score: 1.8109303
    _source:
      category: "electronics"
      price: 139.99
      name: "beats audio headphones"
...

• This query finds all electronics
products costing less than 300
dollars
• A bool query allows us to combine
individual query pieces with must,
must_not and should clauses
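The logic of this bool query can be spelled out against an in-memory list. This is only an illustration of the must / must_not semantics, not how ES evaluates queries; the four products are the ones that appear in the sample outputs of this deck.

```python
PRODUCTS = [
    {"name": "ipod", "category": "electronics", "price": 129.99},
    {"name": "iPhone", "category": "electronics", "price": 299.99},
    {"name": "ipad", "category": "electronics", "price": 499.0},
    {"name": "beats audio headphones", "category": "electronics", "price": 139.99},
]


def matches(doc) -> bool:
    """must: category is electronics; must_not: price is 300 or more."""
    must = doc["category"] == "electronics"   # the term clause
    must_not = doc["price"] >= 300            # the range clause with "from": 300
    return must and not must_not


def search(docs):
    return [doc["name"] for doc in docs if matches(doc)]


if __name__ == "__main__":
    print(search(PRODUCTS))
```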
35
Facet Support in ES
• Facet queries allow for faceted navigation whereby users of a search enabled application can see
aggregate stats relevant to their search results
• Example: Querying a product catalog for all “electronics” products and then getting back a list of
the Top 10 sub-categories under that section with the total count of “number of items” per subcategory
• By default facet counts returned are scoped to the query being performed. This can be altered by
using the scope: global attribute on your search request
• In addition to TopN on arbitrary fields, ES also supports facets for:
✴ Documents within user-defined ranges (e.g. price)
✴ Histogram counts with user-defined bin sizes
✴ Date histograms with user-defined interval sizes
✴ Statistical field faceting (min, max, variance, std deviation, sum of squares)
✴ Geographical distance from an arbitrary lat-lon
• The true power of facets lies in the fact that they allow you to combine aggregate calculations with
arbitrary search-driven drill-down and get real-time results. This creates a powerful platform for
complex online analytics.
36
Facet Queries
007-complex-with-facets.json
{
  "query": {
    ... same as previous complex query ...
  },
  "facets": {
    "category_breakdown": {
      "terms" : {
        "field" : "category",
        "size" : 10
      },
      "global": true
    },
    "price_stats" : {
      "statistical": {
        "field": "price"
      }
    }
  }
}

curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @007-complex-with-facets.json
--
...
facets:
  category_breakdown:
    _type: "terms"
    missing: 0
    total: 18
    other: 0
    terms:
    - term: "electronics"
      count: 7
    - term: "sports"
      count: 3
    - term: "entertainment"
      count: 3
    - term: "clothing"
      count: 3
    - term: "housewares"
      count: 2
  price_stats:
    _type: "statistical"
    count: 5
    total: 1169.95
    min: 129.99
    max: 299.99
    mean: 233.99
    sum_of_squares: 306476.6005
    variance: 6543.999999999988
    std_deviation: 80.89499366462667
...

•Due to the search scope settings
the categories are global, but the
price statistics are local to the
search results
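What a terms facet and a statistical facet compute can be sketched in plain Python. The function names are made up, and the five prices in the demo are hypothetical values chosen to be consistent with the count, min, max and total reported in the output on this slide; they are not taken from the author's data set.

```python
import math
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count documents per distinct value of field and keep the top `size`."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return counts.most_common(size)


def statistical_facet(values):
    """Aggregate statistics over a list of numeric field values."""
    count = len(values)
    total = sum(values)
    mean = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = max(sum_of_squares / count - mean * mean, 0.0)  # guard rounding
    return {
        "count": count, "total": total, "min": min(values), "max": max(values),
        "mean": mean, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": math.sqrt(variance),
    }


if __name__ == "__main__":
    docs = [{"category": "electronics"}] * 2 + [{"category": "sports"}]
    print(terms_facet(docs, "category"))
    prices = [129.99, 139.99, 299.99, 299.99, 299.99]  # hypothetical prices
    print(statistical_facet(prices))
```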
37
Performance Tips
•Use filters instead of queries when possible. Doing so leverages
underlying efficiencies and cache opportunities from Lucene.
From the ES documentation:
Filters are very handy since they perform an order of magnitude better than plain queries since no
scoring is performed and they are automatically cached.
Filters can be a great candidate for caching. Caching the result of a filter does not require a lot of
memory, and will cause other queries executing against the same filter (same parameters) to be
blazingly fast.

•Don’t store implicit fields unless they are needed.
✴ _source: This field stores the entire source document by default. If
you don’t need it, not storing it saves significant storage space.
✴ _all: This field stores all stored fields in a single field by default. If
you don’t need to search for values in all fields for a given index
and type you can leave it off.
38
Security Considerations
• By default, ES, including its management APIs, is accessed through unauthenticated
REST-based APIs over plain HTTP. These APIs can be used for various tasks, such as dropping the index or
modifying the index definition to store more data.
• In a production setting you should ensure:
✴ ES is only accessible from behind a firewall; don’t expose HTTP endpoints outside of a
firewall!
✴ Set http.enabled = false to disable Netty and HTTP access on nodes that do not need to
expose it. Alternatively, you can use the ES Jetty plugin (https://github.com/sonian/elasticsearch-jetty)
to implement authentication and encryption.
• If you have more stringent security requirements consider the following:
✴ By default ES uses multicast auto-discovery with an auto-join capability for new nodes. Use
unicast whitelisting instead to ensure that new “rogue” nodes can’t be started nefariously.
✴ The Lucene index data is stored on the node-local filesystem by default in unencrypted files.
At a minimum, set proper file system access controls to prevent unauthorized access. You
may also want to consider using an encrypted filesystem for your data directories to protect
the data while it is stored.
39
Cluster Load Balancing
•ES nodes can have up to three roles:
✴Master - Master nodes are eligible for being declared the master
for the whole cluster. Master nodes act as coordinators for the
entire cluster. Only a single master is active at one time and if it
fails a new one is automatically selected from the master election
pool
✴Data Nodes - Data nodes hold the lucene index shards that make
up ES distributed indexes
✴Client Nodes - Client nodes handle incoming client REST requests
and coordinate data to satisfy them from the cluster’s data nodes
•The default mode of operation for ES is to have each node take on
all 3 roles within the cluster but you can tweak this in
elasticsearch.yml and opt out of being a master or data node.
40
Cluster Load Balancing
Example Architecture

[Diagram: Node 1, Node 2 and Node 3 are Data nodes; Node 4 and Node 5 are master-eligible client nodes.]

Data nodes are the workhorses of the cluster so they are
not configured to be master eligible.

Client nodes handle incoming REST client requests and are
also both eligible master nodes in this cluster topology.
If we had more nodes we could have configured
dedicated master nodes as well.

41
Plugins
•Plugins extend the core ES capability and provide
extended or optional functionality.
•Plugins can also have a “site” associated with them.
This allows plugin writers to create third-party web-based UIs for interacting with their plugins
•Some common plugins provide additional transport
capabilities or integrations with other data
technologies like NoSQL databases, relational
databases, JMS, etc.

42
Recommended Plugins
Two management plugins are
especially useful:
Elasticsearch Head
A plugin that provides a very nice
UI for visualizing the state of an entire
ES cluster. Also includes a query
UI with a tabular results grid

BigDesk
A plugin that shows the last hour’s
heap, thread, disk and CPU
utilization, index request stats and
much more across the cluster.

43
References
✴ Official Elasticsearch Reference Guide
http://bit.ly/1kx8g4R
✴ Elasticsearch Query Tutorial
http://bit.ly/1cfaTVj
✴ ES Index Design Patterns and Analytics (By ES creator)
http://bit.ly/1kx8s3X
✴ More complicated mapping in Elasticsearch
http://bit.ly/1bZNoPd
✴ Using Elasticsearch to speed up filtering
http://bit.ly/JSWtj7
✴ On Elasticsearch Performance
http://bit.ly/J84j8o

44

Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...DataWorks Summit
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineDaniel N
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Elasticsearch in Zalando
Elasticsearch in ZalandoElasticsearch in Zalando
Elasticsearch in ZalandoAlaa Elhadba
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionDavid Pilato
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearchsirensolutions
 
Elasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsElasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsAlaa Elhadba
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문SeungHyun Eom
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 

Andere mochten auch (11)

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search EngineElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
ElasticSearch: Distributed Multitenant NoSQL Datastore and Search Engine
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Elasticsearch in Zalando
Elasticsearch in ZalandoElasticsearch in Zalando
Elasticsearch in Zalando
 
Elasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English versionElasticsearch - Devoxx France 2012 - English version
Elasticsearch - Devoxx France 2012 - English version
 
Searching Relational Data with Elasticsearch
Searching Relational Data with ElasticsearchSearching Relational Data with Elasticsearch
Searching Relational Data with Elasticsearch
 
Elasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & AggregationsElasticsearch Introduction to Data model, Search & Aggregations
Elasticsearch Introduction to Data model, Search & Aggregations
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 

Ähnlich wie Introduction to Elasticsearch

Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and SparkAudible, Inc.
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!Alex Kursov
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introductionotisg
 
Elasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsElasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsTiziano Fagni
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Vinay Kumar
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchSperasoft
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with railsTom Z Zeng
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Elasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdfElasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdfInexture Solutions
 
Amazon Elasticsearch and Databases
Amazon Elasticsearch and DatabasesAmazon Elasticsearch and Databases
Amazon Elasticsearch and DatabasesAmazon Web Services
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersBen van Mol
 
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیDeep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیEhsan Asgarian
 
Search engine. Elasticsearch
Search engine. ElasticsearchSearch engine. Elasticsearch
Search engine. ElasticsearchSelecto
 

Ähnlich wie Introduction to Elasticsearch (20)

Elasticsearch and Spark
Elasticsearch and SparkElasticsearch and Spark
Elasticsearch and Spark
 
ElasticSearch
ElasticSearchElasticSearch
ElasticSearch
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Wanna search? Piece of cake!
Wanna search? Piece of cake!Wanna search? Piece of cake!
Wanna search? Piece of cake!
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Elasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsElasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analytics
 
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
JavaCro'15 - Elasticsearch as a search alternative to a relational database -...
 
Solr5
Solr5Solr5
Solr5
 
Roaring with elastic search sangam2018
Roaring with elastic search sangam2018Roaring with elastic search sangam2018
Roaring with elastic search sangam2018
 
Apache solr
Apache solrApache solr
Apache solr
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Using elasticsearch with rails
Using elasticsearch with railsUsing elasticsearch with rails
Using elasticsearch with rails
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Elasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdfElasticsearch Analyzers Field-Level Optimization.pdf
Elasticsearch Analyzers Field-Level Optimization.pdf
 
Amazon Elasticsearch and Databases
Amazon Elasticsearch and DatabasesAmazon Elasticsearch and Databases
Amazon Elasticsearch and Databases
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکیDeep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
Deep dive to ElasticSearch - معرفی ابزار جستجوی الاستیکی
 
Elastic pivorak
Elastic pivorakElastic pivorak
Elastic pivorak
 
Search engine. Elasticsearch
Search engine. ElasticsearchSearch engine. Elasticsearch
Search engine. Elasticsearch
 

KĂźrzlich hochgeladen

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

KĂźrzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

Introduction to Elasticsearch

  • 2. What is Elasticsearch? Elasticsearch is an open-source, distributed, real-time document indexer with support for online analytics.
  • 3. Features at a Glance
      Extremely elegant and powerful REST API:
        • Almost all search engine features are accessible over plain HTTP
        • JSON-formatted queries and results
        • Can test/experiment/debug with simple tools like curl
      Schema-less data model:
        • Allows great flexibility for the application designer
        • Can index arbitrary documents right away with no schema metadata
        • Can also tweak type/field mappings for indexes as needed
      Fully distributed and highly available:
        • Tunable index-level write-path (index) and read-path (query) distribution policies
        • P2P node operations with recoverable master node, multicast auto-discovery (configurable)
        • Plays well in VM/cloud-provisioned environments
        • Indexes scale horizontally as new nodes are added
        • Search cluster performs automatic failover and recovery
      Advanced search features:
        • Full-text search, autocomplete, facets, real-time search analytics
        • Powerful Query DSL
        • Multi-language support
        • Built-in tokenizers, filters and analyzers for most common search needs
  • 4. Concepts
      Clusters/Nodes: ES is deployed as a cluster of individual nodes with a single master node. Each node can have many indexes hosted on it.
      Documents: In ES you index documents. Document indexing is a distributed atomic operation with versioning support and transaction logs. Every document is associated with an index and has at least a type and an id.
      Indexes: Similar to a database in traditional relational stores. Indexes are a logical namespace and have one or more primary shards and zero or more replica shards in the cluster. A single index has mappings which may define several types stored in the index. Indexes store a mapping between terms and documents.
      Mappings: Mappings are like schemas in a relational database. Mappings define a type within an index along with some index-wide settings. Unlike a traditional database, in ES types do not have to be explicitly defined ahead of time. Indexes can be created without explicit mappings at all, in which case ES infers a mapping from the source documents being indexed.
  • 5. Concepts
      Types: Types are like tables in a database. A type defines fields along with optional information about how each field should be indexed. If a request is made to index a document with fields that don’t have explicit type information, ES will attempt to guess an appropriate type based on the indexed data.
      Queries: A query is a request to retrieve matching documents (“hits”) from one or more indexes. ES can query for exact term matches or run more sophisticated full-text searches across several fields or indexes at once. The query options are also quite powerful and support sorting, filtering, aggregate statistics, facet counts and much more.
      Analysis: Analysis is the process of converting unstructured text into terms. To support full-text search it includes things like ignoring punctuation, dropping common stop words ('the', 'a', 'on', 'and'), normalizing case, and breaking a word into n-grams (smaller pieces based on substrings). In ES, analysis happens at both index time and query time.
  • 6. Index Layout [Diagram: a node hosting two indexes (Index 1 and Index 2); each index contains types (Type 1, Type 2, Type 3), and the types hold documents.]
  • 7. Shards and Replicas
      curl -XPUT localhost:9200/test -d '{ "settings": { "number_of_shards": 1, "number_of_replicas": 0 } }'
      [Diagram: a single node holding the one shard, test(1).]
  • 8. Shards and Replicas
      curl -XPUT localhost:9200/test -d '{ "settings": { "number_of_shards": 3, "number_of_replicas": 2 } }'
      [Diagram: four nodes; the primary shards test(1) through test(3) and their replicas are spread across the nodes alongside the shards and replicas of a second index, other.]
  • 9. Shard Placement
      [Diagram: an index request for a document arrives at a node over REST and is forwarded to one of the shards test(1) through test(4), which are spread across four nodes.]
      By default, documents in ES are assigned to shards by taking the hash of the document id modulo the number of shards in the destination index.
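The placement rule on this slide can be sketched in a few lines of Python. This is only an illustration: `shard_for` is a hypothetical helper, and `zlib.crc32` is a stand-in for whatever hash function ES uses internally (the slide does not say which one).

```python
import zlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id modulo the shard count."""
    # crc32 is a stand-in hash: it is stable across runs, unlike Python's
    # built-in hash() for strings. It is not the hash ES itself uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Assign a few document ids to a 4-shard index.
placement = {doc_id: shard_for(doc_id, 4) for doc_id in ["1", "2", "3", "42"]}
print(placement)
```

Because the shard number depends on the shard count, a scheme like this only stays consistent while the number of shards is fixed.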
  • 10. Shard Placement
      [Diagram: a user query arrives at a node over REST and is sent to all of the shards test(1) through test(4) across the nodes.]
      Querying is more complex. Potential search hits are generally spread across all the shards of an index, so the query is distributed to every shard and the partial results are merged before being returned (scatter/gather architecture).
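A minimal sketch of the scatter/gather idea, assuming each shard can return its own best hits as `(score, doc_id)` pairs. The shard contents and the function names are made up for illustration; the real merge step in ES handles far more (sorting options, paging, fetching sources).

```python
import heapq

# Hypothetical per-shard data: the hits each shard found for one query.
shards = {
    "test(1)": [(0.9, "a"), (0.2, "b")],
    "test(2)": [(0.7, "c")],
    "test(3)": [],
    "test(4)": [(0.8, "d"), (0.5, "e")],
}

def search_shard(hits, size):
    """Scatter step: a shard returns its own top `size` hits."""
    return heapq.nlargest(size, hits)

def scatter_gather(shards, size):
    """Gather step: merge the per-shard results and keep the global top `size`."""
    partials = [search_shard(hits, size) for hits in shards.values()]
    merged = [hit for partial in partials for hit in partial]
    return heapq.nlargest(size, merged)

top = scatter_gather(shards, size=3)
print(top)  # [(0.9, 'a'), (0.8, 'd'), (0.7, 'c')]
```

Each shard only needs to return `size` hits, since no hit outside a shard's local top `size` can appear in the global top `size`.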
  • 11. Routing
      curl -XGET 'localhost:9200/test/product/_search?routing=electronics'
      [Diagram: a user query arrives at a node over REST and is sent to only one of the shards test(1) through test(4).]
      Routing can be used to control which shards (and therefore which nodes) receive requests to search for a document. When routing is enabled, the user can specify a value at either index time or query time to determine which shards are used for indexing/querying. The same routing value is always routed to the same shard for a given index.
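The effect of a routing value can be sketched as follows. This is a toy model under stated assumptions: `NUM_SHARDS`, `shard_for` and `shards_to_query` are hypothetical names, and `zlib.crc32` again stands in for the real hash.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count for the illustration

def shard_for(key: str) -> int:
    """Map a routing value (or document id) to a shard number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS

def shards_to_query(routing=None):
    """Without routing every shard is searched; with routing only one is."""
    if routing is None:
        return list(range(NUM_SHARDS))
    return [shard_for(routing)]

print(shards_to_query())               # all shards
print(shards_to_query("electronics"))  # a single shard
```

The trade-off is that documents must be indexed with the same routing value that queries will later use, otherwise a routed query looks in the wrong shard.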
  • 12. ES Document Model
      • Documents are first broken down into terms to create an inverted index that points back to the original source (more on this later)
      • Document content is up to you and can be:
          ✴ unstructured (articles/tweets)
          ✴ semi-structured (log entries/emails)
          ✴ structured (patient records/employee records), or
          ✴ any combination thereof
      • Queries can look for exact term matches (e.g. productCategory == entertainment) or for the “best match”, based on scoring each document against the search criteria
      • All documents in ES have an associated index, type and id.
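To make the "inverted index" idea concrete, here is a small sketch that maps each term to the set of documents containing it. The documents and the whitespace tokenization are simplifications chosen for the example, not how Lucene stores its index.

```python
from collections import defaultdict

# Hypothetical documents keyed by id.
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick dog",
}

def build_inverted_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)
print(sorted(index["brown"]))                 # [1, 2]
print(sorted(index["quick"] & index["dog"]))  # [3]
```

An exact term match is then a single dictionary lookup, and a multi-term query becomes a set intersection.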
  • 13. Analyzers
      • In ES, analysis is the process of breaking down raw document text into terms that will be indexed in a single Lucene index.
      • Analysis is performed by analyzers. Analyzers themselves are broken into logical parts:
          ✴ CharFilter: an optional component that directly modifies the underlying char stream, for example to remove HTML tags or convert characters
          ✴ Tokenizer: a component that extracts multiple terms from a single text string
          ✴ TokenFilters: components that modify, add or remove tokens, for example to convert all characters to uppercase or remove common stopwords
      • Analyzers can be index-specific or shared globally.
      • ES ships with several common analyzers. You can also create custom analyzers with a single logical name by specifying the CharFilter, Tokenizer and TokenFilters that comprise it.
  • 14. Analyzer Example
      Input: "<p>The quick brown Fox jumps over the Lazy dog</p>"
      CharFilter (HTMLStripper): "The quick brown Fox jumps over the Lazy dog"
      Tokenizer (Standard): ["The", "quick", "brown", "Fox", "jumps", "over", "the", "Lazy", "dog"]
      TokenFilter (Stopwords): ["quick", "brown", "Fox", "jumps", "over", "Lazy", "dog"]
      TokenFilter (Lowercase): ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
      The output of the last step is the list of index terms.
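The pipeline on this slide can be imitated with plain Python functions. This is a sketch of the idea only: the regular expressions and the `STOPWORDS` set are simplified assumptions, not the behaviour of the real HTML strip char filter, standard tokenizer or stop filter.

```python
import re

# Assumed stopword list, taken from the examples given in the slides.
STOPWORDS = {"the", "a", "on", "and"}

def html_strip(text):
    """CharFilter: remove HTML tags from the raw character stream."""
    return re.sub(r"<[^>]+>", "", text)

def standard_tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)

def stop_filter(tokens):
    """TokenFilter: drop stopwords, ignoring case."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

def lowercase_filter(tokens):
    """TokenFilter: normalize every token to lowercase."""
    return [t.lower() for t in tokens]

def analyze(text):
    """Run the stages in the same order as the slide."""
    tokens = standard_tokenize(html_strip(text))
    return lowercase_filter(stop_filter(tokens))

terms = analyze("<p>The quick brown Fox jumps over the Lazy dog</p>")
print(terms)  # ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

The stop filter compares case-insensitively here so that it can run before the lowercase filter, matching the order shown on the slide.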
  • 15. Testing Analyzers
      curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty&format=text' -d 'this Is a tESt'
      Response: { "tokens" : [ { "token" : "test", "start_offset" : 12, "end_offset" : 16, "type" : "<ALPHANUM>", "position" : 4 } ] }
      • ES has several built-in analyzers and analyzer components (which are highly configurable).
      • You can mix and match analyzer components to build custom analyzers and use the Analysis REST API to test your analyzers.
      • Here is an example of the standard analyzer (the default if you don’t explicitly define a mapping) being applied to a sample text string. Notice that several common English words (the, is, this, a) were removed and the case was normalized to lowercase.
  • 16. Testing Analyzers
      curl -XGET 'localhost:9200/_analyze?tokenizer=standard&pretty' -d 'this Is A tESt'
      Response:
        { "tokens" : [
            { "token" : "this", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 1 },
            { "token" : "Is", "start_offset" : 5, "end_offset" : 7, "type" : "<ALPHANUM>", "position" : 2 },
            { "token" : "A", "start_offset" : 8, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 3 },
            { "token" : "tESt", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 4 }
        ] }
      • We can also test tokenizers and tokenFilters by themselves.
      • You can mix and match analyzer components to build custom analyzers and use the Analysis REST API to test your analyzers.
  • 17. E-Commerce Example
      • Suppose we run an e-commerce site similar to Amazon and have several “products” that we would like to be able to search for easily and quickly.
      • Customers need to be able to search with a variety of complex criteria. Although all products have some common criteria, we don’t know all possible product attributes for all possible products ahead of time (dynamic schema).
      • Our simple JSON data model: { "category": "electronics", "price": 129.99, "name": "ipod" }
  • 18. Indexing a Document
      curl -XPUT localhost:9200/test/product/1 -d '{"category": "electronics", "price": 129.99, "id": 1, "name": "ipod"}'
      (The URL path segments are the index "test", the type "product" and the id "1"; the request body is the document.)
      Response: {"ok":true,"_index":"test","_type":"product","_id":"1","_version":1}
      • This will index all the fields of our document in the index named test, with a type mapping of product and an id of 1.
      • Notice that we did not create any indexes ahead of time or define any information about the schema of the document we just indexed!
      • ES returns a JSON response object acknowledging our operation.
  • 19. Indexing a Document
      curl -XPOST localhost:9200/test/product -d '{"category": "electronics", "price": 129.99, "name": "ipod"}'
      Response: {"ok":true,"_index":"test","_type":"product","_id":"9wrADN4eS8uXm3gNpDvEJw","_version":1}
      • This time we use the POST method instead of PUT.
      • No explicit id is provided to ES, so it auto-generates one for us. The id is returned in the _id field of the JSON response.
      • Notice the _version field in the response. ES keeps a version number for every indexed document. The same document can be updated or re-indexed with different attributes, and the version will be automatically incremented by ES.
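The id and version behaviour described on the last two slides can be modelled with a toy in-memory store. `ToyIndex` is a hypothetical class written for this illustration; it only mimics the `_id` and `_version` fields of the responses shown above and uses `uuid` for generated ids, which is not how ES builds its ids.

```python
import uuid

class ToyIndex:
    """In-memory stand-in for one index: doc_id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def index(self, source, doc_id=None):
        # No id given (the POST case): generate one, as ES does.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Re-indexing an existing id bumps its version by one.
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, source)
        return {"_id": doc_id, "_version": version}

store = ToyIndex()
first = store.index({"name": "ipod", "price": 129.99}, doc_id="1")
second = store.index({"name": "ipod", "price": 119.99}, doc_id="1")
auto = store.index({"name": "ipod"})
print(first["_version"], second["_version"], auto["_version"])  # 1 2 1
```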
  • 20. Introspecting Indexes
      • The mapping API lets us see how ES mapped our document fields.
      • ES determined that the price field was of type double based on the first document indexed.
      • Using the 'format=yaml' parameter in API GET requests formats the response as YAML, which is sometimes easier to read than JSON (the default).
      curl -XGET 'localhost:9200/test/_mapping?format=yaml'
      Response:
        test:
          product:
            properties:
              category:
                type: "string"
              name:
                type: "string"
              price:
                type: "double"
      • The _status path lets us examine lots of interesting facts about an index.
      • Here we see that a new index 'test' was created after our document PUT call, that it is 2.2KB in size, and that it contains a single document.
      curl -XGET 'localhost:9200/test/_status?format=yaml'
      Response:
        ok: true
        _shards:
          total: 1
          successful: 1
          failed: 0
        indices:
          test:
            index:
              primary_size: "2.2kb"
              primary_size_in_bytes: 2282
              size: "2.2kb"
              size_in_bytes: 2282
            translog:
              operations: 1
            docs:
              num_docs: 1
              max_doc: 1
              deleted_docs: 0
            ...
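The way a mapping can be guessed from the first document can be sketched as below. This is a deliberately simplified model: `infer_type` and `infer_mapping` are hypothetical helpers, and the rules cover only a few JSON value kinds, whereas real dynamic mapping also handles dates, nested objects, arrays and more.

```python
def infer_type(value):
    """Guess a field type name from a JSON value (simplified)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def infer_mapping(document):
    """Build a field -> type mapping from a single source document."""
    return {field: infer_type(value) for field, value in document.items()}

mapping = infer_mapping({"category": "electronics", "price": 129.99, "name": "ipod"})
print(mapping)  # {'category': 'string', 'price': 'double', 'name': 'string'}
```

Because the guess is made from the first value seen, a field whose first value is unrepresentative can end up with a type that is narrower than intended, which is one reason to define mappings explicitly.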
  • 21. Index Design: Date-Bounded Indexes
      • A very common pattern for user-generated data (e.g. tweets/emails) and machine-generated data (log events, system metrics) is to segregate data by date or timestamp.
      • ES makes it easy to create a separate index at whatever interval makes sense for your application (daily/weekly/monthly). For example, if we are indexing log data by day, our indexes might look like: logs-2013-10-01, logs-2013-10-02, logs-2013-10-03
      • Now we can query for all the logs in October and November 2013 with the following URI form: http://localhost:9200/logs-2013-10*,logs-2013-11*/
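A short sketch of how an application might derive these index names and the multi-index URL. The helper names `daily_index` and `monthly_patterns` and the `logs` prefix default are assumptions for the example; only the naming scheme and the URL form come from the slide.

```python
from datetime import date, timedelta

def daily_index(day, prefix="logs"):
    """Name of the index that holds the data for one day."""
    return f"{prefix}-{day:%Y-%m-%d}"

def monthly_patterns(year_months, prefix="logs"):
    """Comma-separated wildcard patterns covering whole months."""
    return ",".join(f"{prefix}-{y:04d}-{m:02d}*" for y, m in year_months)

start = date(2013, 10, 1)
names = [daily_index(start + timedelta(days=i)) for i in range(3)]
print(names)  # ['logs-2013-10-01', 'logs-2013-10-02', 'logs-2013-10-03']

url = "http://localhost:9200/" + monthly_patterns([(2013, 10), (2013, 11)]) + "/"
print(url)  # http://localhost:9200/logs-2013-10*,logs-2013-11*/
```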
  • 22. Index Aliases
        curl -XPOST 'http://localhost:9200/_aliases' -d '
        {
            "actions" : [
                { "add" : { "index" : "logs-2013-10", "alias" : "logs_last_6months" } },
                { "add" : { "index" : "logs-2013-09", "alias" : "logs_last_6months" } },
                { "add" : { "index" : "logs-2013-08", "alias" : "logs_last_6months" } },
                { "add" : { "index" : "logs-2013-07", "alias" : "logs_last_6months" } },
                { "add" : { "index" : "logs-2013-06", "alias" : "logs_last_6months" } },
                { "add" : { "index" : "logs-2013-05", "alias" : "logs_last_6months" } }
            ]
        }'
    • Index aliases allow us to manage one or more individual indexes under a single logical name.
    • This is perfect for things like creating an index alias to hold a sliding window of indexes, or providing a filtered "view" on a subset of an index's actual data.
    • Like other aspects of ES, a REST API is exposed that allows complete programmatic management of aliases.
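    The sliding-window idea can be sketched by generating the alias actions body rather than writing it by hand. This is an illustrative sketch only: `month_indexes` and `alias_actions` are hypothetical helper names, and the code prints the JSON body without sending it to a cluster. It relies on the `add` and `remove` actions of the `_aliases` API.

```python
import json


def month_indexes(prefix, year, month, count):
    """Return `count` monthly index names ending at (year, month), newest first."""
    names = []
    for _ in range(count):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        month -= 1
        if month == 0:  # roll back into December of the previous year
            year, month = year - 1, 12
    return names


def alias_actions(alias, add=(), remove=()):
    """Build an _aliases request body that removes and adds indexes in one call."""
    actions = [{"remove": {"index": name, "alias": alias}} for name in remove]
    actions += [{"add": {"index": name, "alias": alias}} for name in add]
    return {"actions": actions}


# Initial window: the six months ending October 2013, as on the slide.
initial = alias_actions("logs_last_6months", add=month_indexes("logs", 2013, 10, 6))
# Sliding the window forward one month: drop the oldest index, add the newest.
slide = alias_actions("logs_last_6months", add=["logs-2013-11"], remove=["logs-2013-05"])
print(json.dumps(initial, indent=2))
print(json.dumps(slide, indent=2))
```

    Putting the removal and the addition in the same request means clients querying the alias never see a window with a missing month.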
  • 23. Retrieving Documents
        curl -XGET 'localhost:9200/test/product/1?pretty'
        ---
        {
          "_index" : "test",
          "_type" : "product",
          "_id" : "1",
          "_version" : 2,
          "exists" : true,
          "_source" : {"category": "electronics", "price": 129.99, "name": "ipod"}
        }
    The primary purpose of setting up an ES cluster is to support full-text or complex querying across documents. However, you can also retrieve a specific document if you happen to know its id (similar to KV stores).
  • 24. Manual Index Creation
    • For indexes that are created "lazily" in ES, a mapping is created "on-the-fly" by introspecting the documents being indexed.
    • You can specify mappings at index creation time or in a config file stored at each node.
        curl -XPOST 'http://localhost:9200/test' -d '{
          "mappings": {
            "products": {
              "properties": {
                "name": {"type": "string"},
                "price": {"type": "float"},
                "category": {"type": "string"}
              }
            }
          },
          "settings": {
            "index": {
              "number_of_shards": 1,
              "number_of_replicas": 0
            }
          }
        }'
    Callouts:
    • test (in the URL): name of the logical index that we are creating.
    • "mappings": mappings for types within the index.
    • "settings": index shard settings (overrides global defaults).
  • 25. Mappings
        mappings:
          product:
            properties:
              category:
                type: "string"
              name:
                fields:
                  bare:
                    index: "not_analyzed"
                    type: "string"
                  name:
                    index: "analyzed"
                    index_analyzer: "partial_word"
                    search_analyzer: "full_word"
                    type: "string"
                type: "multi_field"
              price:
                type: "float"
        settings:
          analysis:
            analyzer:
              full_word:
                filter:
                - "standard"
                - "lowercase"
                - "asciifolding"
                tokenizer: "standard"
                type: "custom"
              partial_word:
                filter:
                - "standard"
                - "lowercase"
                - "contains"
                - "asciifolding"
                tokenizer: "standard"
                type: "custom"
            filter:
              contains:
                max_gram: 20
                min_gram: 2
                type: "nGram"
    • Mappings can also define the underlying analyzer that is used on indexed field values. Field mappings can specify both an index analyzer and a search analyzer, or opt out of analysis completely.
    • A single document field can actually have multiple settings (index settings, type, etc.) applied simultaneously using the multi_field type; see the reference guide for a full description.
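    To see why pairing an n-gram index analyzer with a whole-word search analyzer gives "contains" style matching, the two analyzers on this slide can be approximated in plain Python. This is a rough sketch under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, ASCII folding is skipped, and the function names are hypothetical. It does not reproduce Lucene's exact token order or positions.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Return every substring of token with length between min_gram and max_gram."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def partial_word_terms(text):
    """Approximate the partial_word analyzer: split, lowercase, then n-gram each token."""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms


def full_word_terms(text):
    """Approximate the full_word analyzer: split and lowercase only."""
    return set(text.lower().split())


indexed = partial_word_terms("iPhone")
print(sorted(indexed))
# A search for "Phone" is analyzed to the single term "phone",
# which is one of the n-grams stored at index time.
print(full_word_terms("Phone") <= indexed)
```

    Because only the index side is n-grammed, the search term is matched whole against the stored fragments, so a query never expands into its own fragments.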
  • 26. Dynamic Field Mappings
        {
          "mappings" : {
            "logs" : {
              "dynamic_templates" : [
                {
                  "logs": {
                    "match" : "*",
                    "mapping" : {
                      "type" : "multi_field",
                      "fields" : {
                        "{name}": { "type" : "{dynamic_type}", "index_analyzer": "keyword" },
                        "str": { "type" : "string" }
                      }
                    }
                  }
                }
              ]
            }
          }
        }
    • Sometimes we want to control how certain fields get mapped dynamically, but we don't know every possible field ahead of time. Dynamic mapping templates help with this.
    • A dynamic mapping template allows us to use pattern matching to control how new fields get mapped dynamically.
    • Within the template spec, {dynamic_type} is a placeholder for the type that ES automatically infers for a given field, and {name} is the original name of the field in the source document.
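    The placeholder substitution described on this slide can be illustrated with a small local simulation. This sketch only mimics the textual substitution of {name} and {dynamic_type}; it is not how ES implements dynamic templates internally, and the `expand` helper and the example field "host" are hypothetical.

```python
import json

# The "mapping" section of the dynamic template from the slide.
TEMPLATE_MAPPING = {
    "type": "multi_field",
    "fields": {
        "{name}": {"type": "{dynamic_type}", "index_analyzer": "keyword"},
        "str": {"type": "string"},
    },
}


def expand(node, name, dynamic_type):
    """Recursively substitute {name} and {dynamic_type} in dict keys and string values."""
    if isinstance(node, dict):
        return {
            expand(key, name, dynamic_type): expand(value, name, dynamic_type)
            for key, value in node.items()
        }
    if isinstance(node, str):
        return node.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return node


# Assume a new field "host" appears and its inferred type is "string".
print(json.dumps(expand(TEMPLATE_MAPPING, "host", "string"), indent=2))
```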
  • 27. Index Templates
        curl -XPUT localhost:9200/_template/logtemplate -d '
        {
            "template" : "logs*",
            "settings" : {
                "number_of_shards" : 5,
                "number_of_replicas" : 1
            },
            "mappings" : {
                "logevent" : {
                    "_source" : { "enabled" : false }
                }
            }
        }'
    • Index templates allow you to create templates that will automatically be applied to new indexes.
    • Very handy when using a temporal index design strategy like 'index per day' or similar.
    • Templates use an index name matching strategy to decide whether they apply to a newly created index. If there is a match, the contents of the template are copied into the new index settings.
    • Multiple templates can match a single index. Unless the order parameter is given, templates are applied in the order they are defined.
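    How several matching templates combine can be sketched locally. This is a simplified model, not ES's implementation: it assumes templates with a lower `order` are applied first so that higher orders override them, treats a missing `order` as 0, and merges only the flat `settings` section. The second template and the `merged_settings` helper are hypothetical.

```python
from fnmatch import fnmatchcase

TEMPLATES = [
    # The template from the slide.
    {"template": "logs*", "settings": {"number_of_shards": 5, "number_of_replicas": 1}},
    # A hypothetical, more specific template that overrides the replica count.
    {"template": "logs-2013*", "order": 1, "settings": {"number_of_replicas": 0}},
]


def merged_settings(index_name, templates):
    """Merge settings from every template whose pattern matches index_name."""
    merged = {}
    # sorted() is stable, so templates with equal order keep their definition order.
    for template in sorted(templates, key=lambda t: t.get("order", 0)):
        if fnmatchcase(index_name, template["template"]):
            merged.update(template.get("settings", {}))
    return merged


print(merged_settings("logs-2013-10-01", TEMPLATES))
print(merged_settings("metrics-2013-10-01", TEMPLATES))
```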
  • 28. Performing Queries
        curl -XGET 'localhost:9200/test/product/_search?q="ipod"&format=yaml'
        ---
        took: 104
        timed_out: false
        _shards:
          total: 1
          successful: 1
          failed: 0
        hits:
          total: 1
          max_score: 0.15342641
          hits:
          - _index: "test"
            _type: "product"
            _id: "1"
            _score: 0.15342641
            _source:
              category: "electronics"
              price: 129.99
              name: "ipod"
    • The _search path is the standard way to query an ES index.
    • Using the q=<query> form performs a full-text search by parsing the query string value. While this is convenient for some queries, ES offers a much richer query API via its JSON query object and query DSL.
    • Normally a search query also returns the _source field for every search hit, which contains the document as it was originally indexed.
  • 29. Multi-Index / Multi-Type API Conventions
        URI Format                                                    Meaning
        curl -XGET 'localhost:9200/test/_search'                      Searches all documents of any type under the test index
        curl -XGET 'localhost:9200/test/product,sale/_search'         Searches inside documents of type product or sale in the test index
        curl -XGET 'localhost:9200/test,production/product/_search'   Searches for documents of type product in the test or production indexes
        curl -XGET 'localhost:9200/_all/product/_search'              Searches for documents of type product across all indexes
        curl -XGET 'localhost:9200/_search'                           Searches across all indexes and all types within them
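    The URI conventions in this table follow one pattern, which a small path builder makes explicit. This is an illustrative sketch; `search_path` is a hypothetical helper that only assembles the path and does not issue a request.

```python
def search_path(indexes=(), types=()):
    """Build a _search path; empty indexes means all indexes, empty types means all types."""
    parts = []
    if indexes or types:
        # A type segment needs an index segment in front of it, so fall back to _all.
        parts.append(",".join(indexes) if indexes else "_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(search_path(["test"]))
print(search_path(["test"], ["product", "sale"]))
print(search_path(["test", "production"], ["product"]))
print(search_path(types=["product"]))
print(search_path())
```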
  • 30. The ES Query Object
    • By providing a "Query Object" (JSON blob) to ES during a search operation, you can form very complex search queries.
    • The size, from, and sort attributes affect how many results are returned and in what order.
    • The query, filter, and facets attributes are used to control the content of search results.
    • The query attribute is very customizable and has its own flexible DSL.
        {
          size: number of results to return (defaults to 10) ...
          from: offset into results (defaults to 0) ...
          fields: ... specific fields that should be returned ...
          sort: ... sort options ...
          query: {
            ... "query" object following the Query DSL ...
          },
          filter: {
            ... a filter spec that will be used to eliminate documents from results;
                note that this filter only filters the returned results, not the index itself ...
          },
          facets: {
            ... facets specifications ...
          }
        }
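    The size and from attributes are what make result paging possible. The sketch below builds a query object for a given page; `page_request` is a hypothetical helper, the page numbering is zero-based by assumption, and nothing is sent to a cluster.

```python
import json


def page_request(query, page, page_size=10, sort=None, fields=None):
    """Build a search body for a zero-based page number."""
    body = {"query": query, "from": page * page_size, "size": page_size}
    if sort:
        body["sort"] = sort
    if fields:
        body["fields"] = fields
    return body


body = page_request(
    {"term": {"category": "electronics"}},
    page=2,
    page_size=5,
    sort=[{"price": "asc"}],
    fields=["name", "price"],
)
print(json.dumps(body, indent=2))
```

    The resulting JSON can be passed to the _search endpoint with curl's -d option, as on the query slides that follow.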
  • 31. Queries vs. Filters
    • Since both queries and filters can return similar search results, it can be confusing to know which one to use for a given search scenario.
    • The ES Guide offers some general advice for when to use queries vs. filters:
      Use queries when:
        • Full-text search is needed
        • The results of the search depend on a relevance score
      Use filters when:
        • Results of the search are binary (yes/no)
        • Querying against exact values
  • 32. Simple Term Filter
    000-term-filter.json:
        {
          "query": {
            "constant_score": {
              "filter": {
                "term": { "category": "electronics" }
              }
            }
          }
        }
        curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @000-term-filter.json
        ---
        ...
        hits:
          total: 7
          max_score: 1.0
          hits:
          - _index: "test"
            _type: "product"
            _id: "1"
            _score: 1.0
            _source:
              category: "electronics"
              price: 129.99
              name: "ipod"
          - _index: "test"
            _type: "product"
            _id: "2"
            _score: 1.0
            _source:
              category: "electronics"
              price: 299.99
              name: "iPhone"
          ...
    • Matches all documents that have a field containing the search term.
    • Search terms are not analyzed.
    • Scores all matched documents the same (1.0 by default).
  • 33. Simple Term Query
    001-basic-term-query.json:
        {
          "query": {
            "term": { "category": "electronics" }
          }
        }
        curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @001-basic-term-query.json
        ---
        ...
        hits:
          total: 7
          max_score: 1.8109303
          hits:
          - _index: "test"
            _type: "product"
            _id: "1"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 129.99
              name: "ipod"
          - _index: "test"
            _type: "product"
            _id: "2"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 299.99
              name: "iPhone"
          - _index: "test"
            _type: "product"
            _id: "3"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 499.0
              name: "ipad"
          ...
    • This is the same as the previous search but uses a query instead of a filter.
    • Matches all documents that have a field containing the search term.
    • Search terms are not analyzed.
    • Performs document relevancy scoring on hits.
  • 34. Prex Queries { "query": { "prex": { "name": "ip" } } curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @003-prex-query.json --hits: total: 3 max_score: 1.0 hits: - _index: "test" _type: "product" _id: "1" _score: 1.0 _source: category: "electronics" price: 129.99 name: "ipod" - _index: "test" _type: "product" _id: "2" _score: 1.0 _source: category: "electronics" price: 299.99 name: "iPhone" ... } 003-prex-query.json •Matches all documents that have elds that start with the specied prex •Search terms are not analyzed 34
  • 35. Complex Queries
    006-complex-bool-query.json:
        {
          "query": {
            "bool": {
              "must": {
                "term": { "category": "electronics" }
              },
              "must_not": {
                "range": {
                  "price": { "from": 300 }
                }
              }
            }
          }
        }
        curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @006-complex-bool-query.json
        ---
        hits:
          total: 5
          max_score: 1.8109303
          hits:
          - _index: "test"
            _type: "product"
            _id: "1"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 129.99
              name: "ipod"
          - _index: "test"
            _type: "product"
            _id: "2"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 299.99
              name: "iPhone"
          - _index: "test"
            _type: "product"
            _id: "5"
            _score: 1.8109303
            _source:
              category: "electronics"
              price: 139.99
              name: "beats audio headphones"
          ...
    • This query finds all electronics products costing less than 300 dollars.
    • A bool query allows us to compose individual query pieces with must, must_not and should clauses.
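    The must / must_not logic of this bool query can be checked against a few documents without a cluster. The sketch below is a local simulation only: `bool_query` and `matches` are hypothetical helpers, `matches` understands just this one-term, one-range shape, and the "soccer ball" document is invented for illustration (the other three come from the slides). It assumes the range's "from" bound is inclusive.

```python
PRODUCTS = [
    {"name": "ipod", "category": "electronics", "price": 129.99},
    {"name": "iPhone", "category": "electronics", "price": 299.99},
    {"name": "ipad", "category": "electronics", "price": 499.0},
    {"name": "soccer ball", "category": "sports", "price": 19.99},  # hypothetical document
]


def bool_query(category, price_floor):
    """Build the slide's query: category must match, price must not be >= price_floor."""
    return {
        "query": {
            "bool": {
                "must": {"term": {"category": category}},
                "must_not": {"range": {"price": {"from": price_floor}}},
            }
        }
    }


def matches(doc, body):
    """Evaluate the single-term must / single-range must_not shape against one document."""
    clauses = body["query"]["bool"]
    (term_field, term_value), = clauses["must"]["term"].items()
    (range_field, bounds), = clauses["must_not"]["range"].items()
    return doc[term_field] == term_value and not doc[range_field] >= bounds["from"]


body = bool_query("electronics", 300)
print([doc["name"] for doc in PRODUCTS if matches(doc, body)])
```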
  • 36. Facet Support in ES
    • Facet queries allow for faceted navigation, whereby users of a search-enabled application can see aggregate stats relevant to their search results.
    • Example: querying a product catalog for all "electronics" products and then getting back a list of the top 10 sub-categories under that section, with the total count of items per sub-category.
    • By default, the facet counts returned are scoped to the query being performed. This can be altered by using the scope: global attribute on your search request.
    • In addition to TopN on arbitrary fields, ES also supports facets for:
      ✴ Documents within user-defined ranges (e.g. price)
      ✴ Histogram counts with user-defined bin sizes
      ✴ Date histograms with user-defined interval sizes
      ✴ Statistical field faceting (min, max, variance, std deviation, sum of squares)
      ✴ Geographical distance from an arbitrary lat-lon
    • The true power of facets lies in the fact that they allow you to combine aggregate calculations with arbitrary search-driven drill-down and get real-time results. This creates a powerful platform for complex online analytics.
  • 37. Facet Queries
    007-complex-with-facets.json:
        {
          "query": {
            ... same as previous complex query ...
          },
          "facets": {
            "category_breakdown": {
              "terms" : { "field" : "category", "size" : 10 },
              "global": true
            },
            "price_stats" : {
              "statistical": { "field": "price" }
            }
          }
        }
        curl -XGET 'localhost:9200/test/product/_search?format=yaml' -d @007-complex-with-facets.json
        ---
        ...
        facets:
          category_breakdown:
            _type: "terms"
            missing: 0
            total: 18
            other: 0
            terms:
            - term: "electronics"
              count: 7
            - term: "sports"
              count: 3
            - term: "entertainment"
              count: 3
            - term: "clothing"
              count: 3
            - term: "housewares"
              count: 2
          price_stats:
            _type: "statistical"
            count: 5
            total: 1169.95
            min: 129.99
            max: 299.99
            mean: 233.99
            sum_of_squares: 306476.6005
            variance: 6543.999999999988
            std_deviation: 80.89499366462667
        ...
    • Due to the search scope settings, the category counts are global, but the price statistics are local to the search results.
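    The difference between the global terms facet and the query-scoped statistical facet can be reproduced with a local calculation. This is a sketch under stated assumptions, not ES code: the helper names are hypothetical, and the catalog is invented. Only three of the five hit prices appear on the slides; the remaining two (299.99 each) and all non-hit prices are assumptions chosen so that the totals agree with the output above.

```python
from collections import Counter
from math import sqrt

# Five documents assumed to match the bool query (electronics under 300 dollars).
HITS = [{"category": "electronics", "price": p} for p in (129.99, 139.99, 299.99, 299.99, 299.99)]
# The rest of a hypothetical 18-document catalog; these prices are placeholders.
OTHERS = [
    {"category": c, "price": 499.0}
    for c in ["electronics"] * 2 + ["sports"] * 3 + ["entertainment"] * 3
    + ["clothing"] * 3 + ["housewares"] * 2
]
CATALOG = HITS + OTHERS


def terms_facet(docs, field, size=10):
    """Count the most common values of a field, like a terms facet."""
    counts = Counter(doc[field] for doc in docs)
    return [{"term": term, "count": count} for term, count in counts.most_common(size)]


def statistical_facet(docs, field):
    """Compute count, total, min, max, mean, sum of squares, population variance and std deviation."""
    values = [doc[field] for doc in docs]
    count, total = len(values), sum(values)
    mean = total / count
    sum_of_squares = sum(v * v for v in values)
    variance = max(sum_of_squares / count - mean * mean, 0.0)  # guard against float round-off
    return {
        "count": count, "total": total, "min": min(values), "max": max(values),
        "mean": mean, "sum_of_squares": sum_of_squares,
        "variance": variance, "std_deviation": sqrt(variance),
    }


print(terms_facet(CATALOG, "category"))    # global scope: the whole catalog
print(statistical_facet(HITS, "price"))    # query scope: only the matching documents
```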
  • 38. Performance Tips
    • Use filters instead of queries when possible. Doing so leverages underlying efficiencies and cache opportunities from Lucene. From the ES documentation:
        "Filters are very handy since they perform an order of magnitude better than plain queries since no scoring is performed and they are automatically cached."
        "Filters can be a great candidate for caching. Caching the result of a filter does not require a lot of memory, and will cause other queries executing against the same filter (same parameters) to be blazingly fast."
    • Don't store implicit fields unless they are needed:
      ✴ _source: This field stores the entire source document by default. If you don't need it, not storing it saves significant storage space.
      ✴ _all: This field stores all stored fields in a single field by default. If you don't need to search for values across all fields for a given index and type, you can leave it off.
  • 39. Security Considerations
    • By default, ES and its management APIs are accessed through unauthenticated, unauthorized REST-based APIs over plain HTTP. These APIs can be used for various tasks, such as dropping an index or modifying an index definition to store more data.
    • In a production setting you should ensure that:
      ✴ ES is only accessible from behind a firewall. Don't expose HTTP endpoints outside of a firewall!
      ✴ http.enabled = false is set to disable Netty and HTTP access on nodes that do not need to expose it. Alternatively, you can use the ES Jetty plugin (https://github.com/sonian/elasticsearch-jetty) to implement authentication and encryption.
    • If you have more stringent security requirements, consider the following:
      ✴ By default ES uses multicast auto-discovery with an auto-join capability for new nodes. Use unicast whitelisting instead to ensure that new "rogue" nodes can't be started nefariously.
      ✴ The Lucene index data is stored on the node-local filesystem by default in unencrypted files. At a minimum, set proper file system access controls to prevent unauthorized access. You may also want to consider using an encrypted filesystem for your data directories to protect the data while it is stored.
  • 40. Cluster Load Balancing
    • ES nodes can have up to three roles:
      ✴ Master Nodes: Master nodes are eligible for being declared the master for the whole cluster. Master nodes act as coordinators for the entire cluster. Only a single master is active at one time, and if it fails a new one is automatically selected from the master election pool.
      ✴ Data Nodes: Data nodes hold the Lucene index shards that make up ES distributed indexes.
      ✴ Client Nodes: Client nodes handle incoming client REST requests and coordinate data from the cluster's data nodes to satisfy them.
    • The default mode of operation for ES is to have each node take on all 3 roles within the cluster, but you can tweak this in elasticsearch.yml and opt out of being a master or data node.
  • 41. Cluster Load Balancing Example Architecture
    • Nodes 1, 2 and 3 (data): Data nodes are the workhorses of the cluster, so they are not configured to be master-eligible.
    • Nodes 4 and 5 (master): Client nodes handle incoming REST client requests and are also both eligible master nodes in this cluster topology. If we had more nodes we could have configured dedicated master nodes as well.
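    The role assignment in this example topology comes down to two boolean settings per node. The sketch below generates the corresponding elasticsearch.yml lines; it assumes the `node.master` and `node.data` setting names used by ES releases of this era, and the helper names and the node numbering dictionary are hypothetical.

```python
def node_settings(master_eligible, holds_data):
    """Return the two role settings for one node."""
    return {"node.master": master_eligible, "node.data": holds_data}


def to_yaml_lines(settings):
    """Render settings as elasticsearch.yml style lines with lowercase booleans."""
    return [f"{key}: {str(value).lower()}" for key, value in settings.items()]


# Topology from the slide: nodes 1-3 hold data only, nodes 4-5 are
# master-eligible client nodes that hold no data.
TOPOLOGY = {
    **{n: node_settings(master_eligible=False, holds_data=True) for n in (1, 2, 3)},
    **{n: node_settings(master_eligible=True, holds_data=False) for n in (4, 5)},
}

for node, settings in sorted(TOPOLOGY.items()):
    print(f"# node {node}")
    print("\n".join(to_yaml_lines(settings)))
```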
  • 42. Plugins
    • Plugins extend the core ES capability and provide extended or optional functionality.
    • Plugins can also have a "site" associated with them. This allows plugin writers to create third-party web-based UIs for interacting with their plugins.
    • Some common plugins provide additional transport capabilities or integrations with other data technologies like NoSQL databases, relational databases, JMS, etc.
  • 43. Recommended Plugins
    Two management plugins are especially useful:
    • Elasticsearch Head: A plugin that provides a very nice UI for visualizing the state of an entire ES cluster. It also includes a query UI with a tabular results grid.
    • BigDesk: A plugin that shows the last hour's heap, thread, disk and CPU utilization, index request stats and much more across the cluster.
  • 44. References
    ✴ Official Elasticsearch Reference Guide: http://bit.ly/1kx8g4R
    ✴ Elasticsearch Query Tutorial: http://bit.ly/1cfaTVj
    ✴ ES Index Design Patterns and Analytics (by ES creator): http://bit.ly/1kx8s3X
    ✴ More complicated mapping in Elasticsearch: http://bit.ly/1bZNoPd
    ✴ Using Elasticsearch to speed up filtering: http://bit.ly/JSWtj7
    ✴ On Elasticsearch Performance: http://bit.ly/J84j8o