ESKibana

© 2014 MapR Technologies
Elasticsearch & Kibana

Agenda
• Brief overview of search
• Brief overview of Elasticsearch
• Brief overview of Kibana 4
• MapR-DB integration with Elasticsearch
• Demo

Search

Information Retrieval (IR)
“Information retrieval is the activity of obtaining information
resources (in the form of documents) relevant to an information
need from a collection of information resources. Searches can be
based on metadata or on full-text (or other content-based)
indexing”
~ Wikipedia

Basics
• Term t : a noun or compound word used in a specific context
• tf (t in d) : term frequency in a document
– the number of times term t appears in the currently scored document d
• idf (t) : inverse document frequency
– a measure of whether the term is common or rare across a corpus of documents, i.e. how often the term appears across the index
• boost (index) : boost of the field at index-time
• boost (query) : boost of the field at query-time

What is TFIDF?
TF-IDF = Term Frequency × Inverse Document Frequency
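A minimal sketch of the product above, assuming the textbook variant (raw count for tf, log(N / df) for idf). Lucene's actual scoring differs in detail, and the function name `tf_idf` and the sample corpus are illustrative only.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook tf-idf: raw count of `term` in the document times log(N / df)."""
    tf = Counter(doc_tokens)[term]                      # term frequency in this document
    df = sum(1 for tokens in corpus if term in tokens)  # number of documents containing the term
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)                    # rarer terms get a larger idf
    return tf * idf


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["river", "safari", "kayaking"],
    ["river", "rafting"],
    ["bird", "watching"],
]

common = tf_idf("river", corpus[0], corpus)    # appears in 2 of 3 documents
rare = tf_idf("kayaking", corpus[0], corpus)   # appears in 1 of 3 documents
print(common, rare)
```

A term that appears in fewer documents ("kayaking") scores higher than one spread across the corpus ("river"), which is the effect idf is meant to capture.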

Lucene
• Fast, high performance, scalable search/IR library
• Open source
• Initially developed by Doug Cutting (also the creator of Hadoop)
• Indexing and Searching
• Inverted Index of documents
• Provides advanced search options such as synonyms, stopwords, and similarity- and proximity-based matching
• http://lucene.apache.org/

What is an inverted index?
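An inverted index maps each term to the set of documents that contain it, so a query looks up terms instead of scanning documents. The sketch below is a toy in-memory version, not Lucene's on-disk structure; the function names and sample documents are illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term of the query (AND semantics)."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents keyed by id.
docs = {
    1: "river safari",
    2: "river rafting",
    3: "bird watching safari",
}
index = build_inverted_index(docs)
print(search(index, "river safari"))
print(search(index, "safari"))
```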

Indexing pipeline
Analyzer: creates tokens using a Tokenizer and then applies Filters (Token Filters).
Each field can define an Analyzer at index time, at query time, or both.
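The tokenizer-then-filters pipeline can be sketched as a chain of small functions. This is an illustration of the idea only, not Lucene's Analyzer API; the stopword list and function names are assumptions.

```python
import re

# Hypothetical stopword list for the example.
STOPWORDS = frozenset({"a", "an", "and", "the", "of"})


def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize case."""
    return [token.lower() for token in tokens]


def stopword_filter(tokens):
    """Token filter: drop very common words."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text, filters=(lowercase_filter, stopword_filter)):
    """Analyzer: run the tokenizer, then each token filter in order."""
    tokens = tokenize(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("The River Safari and the Birds"))
```

Applying the same analyzer at index time and query time keeps the indexed terms and the query terms comparable.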

What is a search engine?

Elasticsearch

Elasticsearch
• Enterprise Search platform built on top of Apache Lucene
• Open source
• Highly reliable, scalable, fault tolerant
• Supports distributed indexing, replication, and load-balanced querying
• Distributed RESTful search server
• Document oriented
• Schemaless
• Easy to scale horizontally
• http://www.elasticsearch.org/
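Because the server is RESTful and document oriented, indexing a document is an HTTP request carrying JSON. The sketch below only builds such a request and does not send it. It assumes a node at `http://localhost:9200` and the `/{index}/{type}/{id}` URL layout of the Elasticsearch releases of this deck's era; the helper name and the sample document are illustrative.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that would index one JSON document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local node and hypothetical document.
req = build_index_request(
    "http://localhost:9200", "costarica", "activities", "row1",
    {"name": "kayaking", "price": 50},
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it to a running cluster.
```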

Features
• Highlighting
• Spelling Suggestions
• Aggregations – Bucketing and Metric
• Query DSL
– based on JSON to define queries
• Automatic shard replication, routing, splitting and rebalancing
• Zen discovery
– Unicast
– Multicast
• Master Election
– Re-election if the master node fails
• Schemaless or schema on the fly
• Percolation queries
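As an illustration of the JSON-based Query DSL and of bucketing plus metric aggregations, the sketch below builds a search body as a Python dict. The field names (`description`, `price`, `rating`) and the helper name are assumptions made for the example; the body would be sent to an index's `_search` endpoint.

```python
import json


def build_search_body(text):
    """Build a search body: a full-text match query plus a bucketed metric aggregation."""
    return {
        "query": {
            "match": {"description": text}          # full-text match on one field
        },
        "aggs": {
            "by_price": {
                "terms": {"field": "price"},        # bucketing: one bucket per price value
                "aggs": {
                    "avg_rating": {"avg": {"field": "rating"}}  # metric computed per bucket
                },
            }
        },
    }


body = build_search_body("river safari")
print(json.dumps(body, indent=2))
```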

Cluster Architecture

Index Request

Search Request

ES Client

High-level Client Architecture

Language clients
• Java
• Ruby
• Python
• Scala
• Go
• PHP
• Perl
• Groovy
• JavaScript
• C#
• .NET
• Haskell
• Clojure
• Erlang
• OCaml
• Smalltalk
• ColdFusion
• Node.js
• R
• EventMachine

ES and Friends
• Hadoop and YARN
• Spark
• Storm
• Samza
• Kafka
• Hive
• Cascading
• Pig
Reference: https://speakerdeck.com/elastic/elasticsearch-hadoop-and-friends-spark-storm-and-more

Elasticsearch vs. Solr
[Chart: Elasticsearch and Solr compared on Download, Expanded, and First run; axis from 0 to 300; values not recoverable]
Ref: http://www.slideshare.net/arafalov/solr-vs-elasticsearch-case-by-case

Kibana

Features
• Seamless Integration with Elasticsearch
• Give shape to data easily
• Sophisticated Analysis
• Flexible interface
• Empower more team members – easy to share
• Easy setup
• Visualize data from many sources: Logstash, Hadoop, Flume, Fluentd, etc.
• Simple data import
• Support for aggregations and sub-aggregations

Analytics

Demo

MapRDB - ES Integration
Mansi Shah

Agenda
• Architecture
• Setup and Monitoring
• Default Conversion
• Custom Conversion
• Performance Considerations
• Gateway Configuration
• Q & A

MapR-DB Replication
• 4.1
• DR
• Async and sync replication across geographically distributed clusters
• Connector to ES 4.2/5.0
• Connector to Spark
• …

Table Replication Architecture
[Diagram: a SRC cluster and a DST cluster, each serving client operations. MapR servers hold volumes that contain tables 1 to n. MapR servers on the SRC cluster write the replication stream to replication gateways running on the DST cluster's gateway nodes.]

Table Replication Architecture
[Diagram: the same layout with Cluster1 and Cluster2, each serving client operations. Both clusters now show gateway nodes running replication gateways, with replication stream writes between MapR servers and gateways.]

ES Replication Architecture
[Diagram: a MapR-DB cluster serving client operations, with MapR servers holding volumes that contain tables 1 to n. The replication stream is written to gateway nodes running the replication gateway and ES client, which write to the Elasticsearch cluster.]

Setup and Monitoring

Register Elasticsearch Cluster
[Diagram: MapR cluster (MFS nodes + gateway nodes) and Elasticsearch cluster]
● Register the Elasticsearch cluster with MapR

Create a replication to ES
[Diagram: MapR-DB table → ES index + type]
● Register the ES cluster with MapR
● Create a source table
● Start replication on the source table.
maprcli table replica elasticsearch autosetup -path /test1 -target elasticsearch -index demoidx -type demotype
* Will work with MCS in later versions (possibly 5.0)
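When the replication setup is scripted, the command above can be assembled from its parts. The sketch below only builds and prints the command line using the flags shown on this slide; the helper name is hypothetical, and nothing is executed.

```python
import shlex


def build_autosetup_command(table_path, target, index, doc_type):
    """Assemble the autosetup command shown above as an argument list."""
    return [
        "maprcli", "table", "replica", "elasticsearch", "autosetup",
        "-path", table_path,
        "-target", target,
        "-index", index,
        "-type", doc_type,
    ]


cmd = build_autosetup_command("/test1", "elasticsearch", "demoidx", "demotype")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would run it on a node where maprcli is installed.
```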

DEMO

Data Conversion

Default Data Conversion
● Converts the byte stream stored in MapR-DB to basic ES data types using mappings stored in ES.
● Data types supported: String, Int, Float, Double, Long, Short, Date (as epoch), Geo-point / Geo-hash, Boolean, Binary, IP, etc.
● Gateway reads data type mapping from ES and then converts data based on this mapping
● Example mapping added to Elasticsearch during index creation
PUT /costarica/_mapping/activities
{
  "activities" : {
    "properties" : {
      "CF1" : {
        "type" : "nested",
        "properties" : {
          "name" : {"type" : "string"},
          "price" : {"type" : "integer"},
          "rating" : {"type" : "float"},
          "location" : {"type" : "geo_point"}
        }
      }
    }
  }
}

Mapping Continued ...
GET /costarica/activities/row1
{
  "CF1" : {
    "name" : "kayaking",
    "price" : 50,
    "rating" : 8.6,
    "location" : "78.45, 14.33"
  },
  "CF2" : {
    "description" : "river safari, animals, bird-watching"
  }
}
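The default conversion can be pictured as a lookup from the ES field type to a decoder for the stored bytes. The sketch below is an illustration only, not the gateway's implementation: it assumes column values are stored as UTF-8 text and that unmapped columns fall back to strings, and all names are hypothetical.

```python
# Assumed decoders per ES field type; the real byte encoding may differ.
CONVERTERS = {
    "string": lambda raw: raw.decode("utf-8"),
    "integer": lambda raw: int(raw.decode("utf-8")),
    "float": lambda raw: float(raw.decode("utf-8")),
    "geo_point": lambda raw: raw.decode("utf-8"),  # kept as a "lat, lon" string
}


def convert_row(raw_row, field_types):
    """Convert {family: {column: bytes}} into a JSON-ready document using the field types."""
    doc = {}
    for family, columns in raw_row.items():
        types = field_types.get(family, {})
        doc[family] = {
            name: CONVERTERS[types.get(name, "string")](raw)
            for name, raw in columns.items()
        }
    return doc


# Field types taken from the example mapping; CF2 has no mapping and falls back to string.
field_types = {
    "CF1": {"name": "string", "price": "integer", "rating": "float", "location": "geo_point"},
}
raw_row = {
    "CF1": {"name": b"kayaking", "price": b"50", "rating": b"8.6", "location": b"78.45, 14.33"},
    "CF2": {"description": b"river safari, animals, bird-watching"},
}
print(convert_row(raw_row, field_types))
```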

Custom Conversion
{
  "CF1" : {
    "price" : 50,
    "rating" : 8.6,
    "location" : "78.45, 14.33"
  },
  "CF2" : {
    "description" : "river safari, animals"
  }
}
{
  "price" : 90,
  "rating" : 8,
  "location" : "78.45, 14.33",
  "tags" : ["river safari", "animals", "birds"]
}
● Custom mapping / conversion / data manipulation.
● Non-supported data types: arrays, nested, etc.
● Special settings such as replication, routing, scripts, and scripting language.
● Multiple JSON documents per source table update
● Delete something on a row update.
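A custom conversion reshapes the source row before it is indexed. The sketch below shows one such transformation in Python for illustration only: it flattens CF1 into top-level fields and turns the CF2 description into a tags array. It reproduces the shape of the second document above, not its exact values, and it is not the connector's actual extension API.

```python
def custom_convert(row):
    """Flatten CF1 into top-level fields and derive a tags array from CF2's description."""
    doc = dict(row.get("CF1", {}))
    description = row.get("CF2", {}).get("description", "")
    # Split the comma-separated description into an array, a type the default conversion lacks.
    doc["tags"] = [tag.strip() for tag in description.split(",") if tag.strip()]
    return doc


source_row = {
    "CF1": {"price": 50, "rating": 8.6, "location": "78.45, 14.33"},
    "CF2": {"description": "river safari, animals"},
}
print(custom_convert(source_row))
```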

Gateway configuration

Gateways overview
• mapr-gateway is a new package.
• Has no dependency on mapr-fileserver or mapr-cldb, can run
independently.
• Warden monitors the status, restarts if the service terminates.
• MCS shows the health of gateway service.
• Is not counted as a licensed node (only nodes having mfs are
counted).

Gateways discovery
• MFS nodes on the source cluster need to discover the gateways
to a destination cluster.
• Gateways can be specified via DNS
– Add a DNS entry with the key gateway.dstClusterName and, as its value, a list of hostnames or IPs
• Gateways can be specified via maprcli
maprcli cluster gateway set -dstcluster hyd -gateway "gw1.hyd.maprtech.com gw2.hyd.maprtech.com"

Q & A

Q&A
@mapr maprtech
mgunturu@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
