© 2014 MapR Technologies
Elasticsearch & Kibana
Agenda
• Brief overview of search
• Brief overview of Elasticsearch
• Brief overview of Kibana 4
• MapR-DB integration with Elasticsearch
• Demo
Search
Information Retrieval (IR)
“Information retrieval is the activity of obtaining information
resources (in the form of documents) relevant to an information
need from a collection of information resources. Searches can be
based on metadata or on full-text (or other content-based)
indexing”
~ Wikipedia
Basics
• Term t : a noun or compound word used in a specific context
• tf (t in d) : term frequency in a document
• The number of times term t appears in the currently scored document d
• idf (t) : inverse document frequency
• measure of whether the term is common or rare across a corpus of documents,
i.e. how many documents in the index contain the term
• boost (index) : boost of the field at index-time
• boost (query) : boost of the field at query-time
What is TF-IDF?
TF-IDF = Term Frequency × Inverse Document Frequency, i.e. tf-idf(t, d) = tf(t in d) × idf(t)
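The definitions above can be combined into a toy scorer. This is a minimal sketch of textbook TF-IDF (raw term count times log(N / df)), with the query/index boost treated as a plain multiplier; the corpus and function names are illustrative. Lucene's actual scoring differs in the details (its classic similarity dampens tf with a square root, smooths idf, and adds field-length normalization), so treat this as an illustration of the idea rather than of Lucene's exact numbers.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: raw number of times `term` occurs in one document."""
    return Counter(doc_tokens)[term]


def idf(term, corpus):
    """Inverse document frequency: log(N / df), where df is the number of
    documents containing `term` at least once (0.0 if no document does)."""
    n_docs = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log(n_docs / df) if df else 0.0


def tf_idf(term, doc_tokens, corpus, boost=1.0):
    """Simplified score: tf x idf, scaled by an optional field boost."""
    return tf(term, doc_tokens) * idf(term, corpus) * boost


corpus = [
    "kayaking on the river".split(),
    "river safari and bird watching".split(),
    "zip line over the forest".split(),
]

print(round(tf_idf("river", corpus[0], corpus), 4))     # 0.4055
print(round(tf_idf("kayaking", corpus[0], corpus), 4))  # 1.0986
```

"river" appears in two of the three documents, so it scores lower than the rarer "kayaking" even though both occur once in the first document.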
Lucene
• Fast, high performance, scalable search/IR library
• Open source
• Initially developed by Doug Cutting (also co-creator of Hadoop)
• Indexing and Searching
• Inverted Index of documents
• Provides advanced search options such as synonyms, stopwords,
similarity-based scoring and proximity search
• http://lucene.apache.org/
What is an inverted index?
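An inverted index maps each term to the list of documents that contain it (its postings list), so a query looks up terms instead of scanning every document. A minimal in-memory sketch, assuming simple lowercase/whitespace tokenization and storing only document IDs (Lucene's postings also record term frequencies and positions and are kept in compressed on-disk segments):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, *terms):
    """AND query: intersect the postings lists of every query term."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "Kayaking on the river",
    2: "River safari and bird watching",
    3: "Zip line over the forest",
}
index = build_inverted_index(docs)
print(index["river"])                        # [1, 2]
print(search_all(index, "river", "safari"))  # [2]
```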
Indexing pipeline
Analyzer: creates tokens using a Tokenizer, then applies Token Filters to them (e.g. lowercasing, stopword removal).
Each field can define an analyzer for index time, query time, or both.
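A toy version of this pipeline: one tokenizer followed by a chain of token filters. The regex tokenizer and the stopword list are simplified, hypothetical stand-ins (Lucene's StandardTokenizer follows Unicode word-segmentation rules). Running the same analyzer at index and query time is what lets a query for "River" match a document containing "river".

```python
import re

STOPWORDS = {"a", "an", "and", "on", "the", "of"}  # hypothetical stopword list


def simple_tokenizer(text):
    """Tokenizer: split text into runs of word characters (simplified)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords=STOPWORDS):
    """Token filter: drop stopwords."""
    return [t for t in tokens if t not in stopwords]


def analyze(text, token_filters=(lowercase_filter, stop_filter)):
    """Analyzer = one tokenizer followed by token filters applied in order."""
    tokens = simple_tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("Kayaking on the River, Bird-Watching!"))
# ['kayaking', 'river', 'bird', 'watching']
```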
What is a search engine?
Elasticsearch
Elasticsearch
• Enterprise Search platform built on top of Apache Lucene
• Open source
• Highly reliable, scalable, fault tolerant
• Supports distributed indexing, replication and load-balanced querying
• Distributed RESTful search server
• Document oriented
• Schemaless
• Easy to scale horizontally
• http://www.elasticsearch.org/
Features
• Highlighting
• Spelling Suggestions
• Aggregations – Bucketing and Metric
• Query DSL
– based on JSON to define queries
• Automatic sharding, shard replication, routing and rebalancing
• Zen discovery
– Unicast
– Multicast
• Master Election
– Re-election if Master Node fails
• Schemaless or schema on the fly
• Percolation queries
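As an illustration of the JSON Query DSL, the sketch below builds a search request body combining a full-text match query, a range clause, highlighting, and a bucketing aggregation with a metric sub-aggregation. The field names (description, price, rating) are borrowed from the Costa Rica activities example later in the deck, and the builder function is hypothetical; such a body would typically be sent with something like POST /costarica/_search. Check the exact syntax against the Query DSL reference for your Elasticsearch version.

```python
import json


def build_search_body(text, min_rating):
    """Build a Query DSL request body as a plain JSON-serializable dict."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"description": text}},            # full-text match
                    {"range": {"rating": {"gte": min_rating}}},  # numeric range
                ]
            }
        },
        # Highlight matching fragments of the description field.
        "highlight": {"fields": {"description": {}}},
        "aggs": {
            # Bucketing aggregation: group hits into price ranges.
            "price_ranges": {
                "range": {
                    "field": "price",
                    "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}],
                },
                # Metric sub-aggregation computed inside each bucket.
                "aggs": {"avg_rating": {"avg": {"field": "rating"}}},
            }
        },
    }


body = build_search_body("river safari", 8)
print(json.dumps(body, indent=2))
```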
Cluster Architecture
Index Request
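For an index request, Elasticsearch picks the primary shard with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID; the primary shard then forwards the operation to its replicas. A minimal sketch of the routing step, using zlib.crc32 as a stand-in hash (Elasticsearch uses its own hash function, so real shard numbers will differ):

```python
import zlib


def route_to_shard(doc_id, num_primary_shards, routing=None):
    """shard = hash(_routing) % number_of_primary_shards.

    _routing defaults to the document ID. zlib.crc32 is only a stand-in
    for Elasticsearch's real hash function.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


shards = [route_to_shard(f"row{i}", 5) for i in range(10)]
print(shards)
```

Because the result depends on the number of primary shards, that number is fixed when the index is created; a custom routing value sends all documents that share it to the same shard.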
Search Request
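Search requests are typically executed in two phases ("query then fetch"): the coordinating node sends the query to one copy of every shard, each shard returns only the IDs and scores of its local top hits, and the coordinator merges them and then fetches the full documents for the global top hits. A simplified sketch of the query-and-merge step, with in-memory dicts standing in for shards and a hypothetical term-counting scorer:

```python
import heapq


def shard_query_phase(shard_docs, scorer, k):
    """Query phase on one shard: score local docs and keep only the top k
    as (score, doc_id) pairs instead of whole documents."""
    scored = ((scorer(text), doc_id) for doc_id, text in shard_docs.items())
    return heapq.nlargest(k, scored)


def coordinate_search(shards, scorer, k):
    """Coordinating node: scatter the query to every shard, gather each
    shard's top k, and merge them into the global top k. A fetch phase
    would then load full documents only for the returned IDs."""
    candidates = []
    for shard_docs in shards:
        candidates.extend(shard_query_phase(shard_docs, scorer, k))
    return [doc_id for _score, doc_id in heapq.nlargest(k, candidates)]


def count_river(text):
    """Hypothetical scorer: number of times "river" occurs in the text."""
    return text.split().count("river")


shards = [
    {"row1": "river kayaking", "row2": "zip line"},
    {"row3": "river safari river", "row4": "forest hike"},
]
print(coordinate_search(shards, count_river, 2))  # ['row3', 'row1']
```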
ES Client
High-level Client Architecture
Language clients
• Java
• Ruby
• Python
• Scala
• Go
• PHP
• Perl
• Groovy
• JavaScript
• C#
• .NET
• Haskell
• Clojure
• Erlang
• OCaml
• Smalltalk
• ColdFusion
• Node.js
• R
• EventMachine
ES and Friends
• Hadoop and YARN
• Spark
• Storm
• Samza
• Kafka
• Hive
• Cascading
• Pig
Reference: https://speakerdeck.com/elastic/elasticsearch-hadoop-and-friends-spark-storm-and-more
Elasticsearch VS Solr
[Chart: Elasticsearch vs Solr compared at three stages (Download, Expanded, First run); y-axis 0 to 300, units not recoverable]
Ref: http://www.slideshare.net/arafalov/solr-vs-elasticsearch-case-by-case
Kibana
Features
• Seamless Integration with Elasticsearch
• Give shape to data easily
• Sophisticated Analysis
• Flexible interface
• Empower more team members – easy to share
• Easy setup
• Visualize data from many sources – Logstash, Hadoop, Flume,
Fluentd, etc.
• Simple data import
• Support for aggregations and sub-aggregations
Analytics
Demo
MapR-DB – ES Integration
Mansi Shah
Agenda
• Architecture
• Setup and Monitoring
• Default Conversion
• Custom Conversion
• Performance Considerations
• Gateway Configuration
• Q & A
MapR-DB Replication
• MapR 4.1
• DR
• Async and sync replication across geographically distributed
clusters
• Connector to ES (MapR 4.2/5.0)
• Connector to Spark
• …
Table Replication Architecture
[Diagram: an SRC Cluster and a DST Cluster, each with MapR Servers hosting volumes that contain Tables 1..n and serving client operations. The SRC cluster sends a Replication Stream Write to Replication Gateway nodes, which apply the updates to the corresponding tables on the DST cluster.]
Table Replication Architecture
[Diagram: Cluster1 and Cluster2, each with MapR Servers hosting volumes that contain Tables 1..n and serving client operations. Each cluster has its own Replication Gateway nodes, so Replication Stream Writes flow in both directions between the two clusters.]
ES Replication Architecture
[Diagram: a MapR-DB Cluster whose MapR Servers host volumes with Tables 1..n and serve client operations. Gateway nodes run the Replication Gateway together with an ES client; they receive the Replication Stream Write from the MapR-DB cluster and write it to the Elasticsearch Cluster.]
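One way to picture the ES client on the gateway: replicated row mutations become Elasticsearch Bulk API requests. The slides do not say which ES API the gateway uses, so this is a hypothetical sketch; the (op, row_key, document) mutation format is invented for illustration, while the newline-delimited action/source layout follows the Bulk API format of that era (including the _type field). The index and type names reuse demoidx/demotype from the autosetup command.

```python
import json


def to_bulk_body(mutations, index, doc_type):
    """Translate replicated row mutations into a Bulk API body.

    Each mutation is a hypothetical (op, row_key, document) tuple:
    "put" becomes an index action followed by the document source line,
    "delete" becomes a delete action with no source line.
    """
    lines = []
    for op, row_key, document in mutations:
        meta = {"_index": index, "_type": doc_type, "_id": row_key}
        if op == "put":
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(document))
        elif op == "delete":
            lines.append(json.dumps({"delete": meta}))
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body(
    [("put", "row1", {"CF1": {"name": "kayaking", "price": 50}}),
     ("delete", "row2", None)],
    index="demoidx",
    doc_type="demotype",
)
print(body)
```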
Setup and Monitoring
Register Elasticsearch Cluster
[Diagram: a MapR Cluster (MFS Nodes + Gateway Nodes) next to an Elasticsearch Cluster]
Register the Elasticsearch cluster with MapR.
Create a replication to ES
[Diagram: MapR-DB Table replicated to an ES Index + Type]
● Register the ES cluster with MapR
● Create a source table
● Start replication on the source table.
maprcli table replica elasticsearch autosetup -path /test1 -target elasticsearch -index demoidx -type demotype
* Setup through MCS (MapR Control System) is planned for a later version, possibly 5.0.
DEMO
Data Conversion
Default Data Conversion
● Converts the byte stream stored in MapR-DB into basic ES data types, using the mappings stored in ES.
● Supported data types: String, Int, Float, Double, Long, Short, Date (as epoch), Geo-point / Geo-Hash, Boolean, Binary, IP, etc.
● The gateway reads the data type mapping from ES and then converts the data based on this mapping.
● Example mapping added to Elasticsearch during index creation
PUT /costarica/_mapping/activities
{
  "activities" : {
    "properties" : {
      "CF1" : {
        "type" : "nested",
        "properties" : {
          "name" : {"type" : "string"},
          "price" : {"type" : "integer"},
          "rating" : {"type" : "float"},
          "location" : {"type" : "geo_point"}
        }
      }
    }
  }
}
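A minimal sketch of this mapping-driven conversion, assuming cell values were written with HBase Bytes.toBytes()-style big-endian encodings (UTF-8 strings, 4-byte ints, IEEE 754 floats). That encoding, the helper names, and the fallback to strings for unmapped columns are assumptions for illustration, not the gateway's documented behavior; only a subset of the listed types is covered.

```python
import struct

# Assumed cell encodings (HBase Bytes.toBytes()-style, big-endian); hypothetical.
DECODERS = {
    "string": lambda b: b.decode("utf-8"),
    "geo_point": lambda b: b.decode("utf-8"),  # kept as a "lat, lon" string
    "short": lambda b: struct.unpack(">h", b)[0],
    "integer": lambda b: struct.unpack(">i", b)[0],
    "long": lambda b: struct.unpack(">q", b)[0],
    "float": lambda b: struct.unpack(">f", b)[0],
    "double": lambda b: struct.unpack(">d", b)[0],
    "boolean": lambda b: b != b"\x00",
}


def field_types(es_mapping, doc_type):
    """Flatten an ES mapping into {column_family: {qualifier: es_type}}."""
    families = es_mapping[doc_type]["properties"]
    return {
        family: {q: spec["type"] for q, spec in fdef.get("properties", {}).items()}
        for family, fdef in families.items()
    }


def convert_row(row, types):
    """Decode {column_family: {qualifier: bytes}} into a JSON-ready document.
    Columns missing from the mapping fall back to UTF-8 strings."""
    doc = {}
    for family, columns in row.items():
        family_types = types.get(family, {})
        doc[family] = {
            qualifier: DECODERS[family_types.get(qualifier, "string")](value)
            for qualifier, value in columns.items()
        }
    return doc


mapping = {
    "activities": {
        "properties": {
            "CF1": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "integer"},
                    "rating": {"type": "float"},
                    "location": {"type": "geo_point"},
                },
            }
        }
    }
}
row = {
    "CF1": {
        "name": b"kayaking",
        "price": struct.pack(">i", 50),
        "rating": struct.pack(">f", 8.6),
        "location": b"78.45, 14.33",
    },
    "CF2": {"description": b"river safari, animals, bird-watching"},
}
doc = convert_row(row, field_types(mapping, "activities"))
print(doc)
```

Note that a 4-byte float decodes to the nearest single-precision value (about 8.6000004), not exactly 8.6.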
Mapping Continued ...
PUT /costarica/_mapping/activities
{
  "activities" : {
    "properties" : {
      "CF1" : {
        "type" : "nested",
        "properties" : {
          "name" : {"type" : "string"},
          "price" : {"type" : "integer"},
          "rating" : {"type" : "float"},
          "location" : {"type" : "geo_point"}
        }
      }
    }
  }
}
GET /costarica/activities/row1
{
  "CF1" : {
    "name" : "kayaking",
    "price" : 50,
    "rating" : 8.6,
    "location" : "78.45, 14.33"
  },
  "CF2" : {
    "description" : "river safari, animals, bird-watching"
  }
}
Custom Conversion
Default conversion:
GET /costarica/activities/row1
{
  "CF1" : {
    "name" : "kayaking",
    "price" : 50,
    "rating" : 8.6,
    "location" : "78.45, 14.33"
  },
  "CF2" : {
    "description" : "river safari, animals"
  }
}
With custom conversion:
GET /costarica/activities/row1
{
  "name" : "kayaking",
  "price" : 90,
  "rating" : 8,
  "location" : "78.45, 14.33",
  "tags" : ["river safari", "animals", "birds"]
}
● Custom mapping / conversion / data manipulation
● Data types not supported by the default conversion, e.g. arrays and nested objects
● Special settings such as replication, routing, scripts and scripting language
● Multiple JSON documents per source table update
● Deleting data in ES on a row update
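A custom converter could reshape the default output. This hypothetical sketch flattens CF1 into top-level fields and splits CF2's comma-separated description into a tags array, producing a document shaped like the custom example; the function name and signature are invented and do not reflect MapR's actual converter interface. It returns a list of (id, document) pairs so that one row update could fan out to several ES documents.

```python
def custom_convert(row_key, default_doc):
    """Hypothetical custom converter.

    Flattens the CF1 column family into top-level fields and turns CF2's
    comma-separated description into a "tags" array. Returns a list of
    (document_id, document) pairs so one row update could produce
    several ES documents.
    """
    doc = dict(default_doc.get("CF1", {}))  # copy, so the input is not mutated
    description = default_doc.get("CF2", {}).get("description", "")
    doc["tags"] = [tag.strip() for tag in description.split(",") if tag.strip()]
    return [(row_key, doc)]


default_doc = {
    "CF1": {"name": "kayaking", "price": 50, "rating": 8.6,
            "location": "78.45, 14.33"},
    "CF2": {"description": "river safari, animals"},
}
print(custom_convert("row1", default_doc))
```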
Gateway configuration
Gateways overview
• mapr-gateway is a new package.
• Has no dependency on mapr-fileserver or mapr-cldb, so it can run
independently.
• Warden monitors the gateway service and restarts it if it terminates.
• MCS shows the health of the gateway service.
• A gateway-only node is not counted as a licensed node (only nodes
running MFS are counted).
Gateways discovery
• MFS nodes on the source cluster need to discover the gateways
to a destination cluster.
• Gateways can be specified via DNS
– Add a DNS entry with key gateway.dstClusterName (where dstClusterName is the
destination cluster's name) whose value is a list of hostnames or IPs
• Gateways can be specified via maprcli
maprcli cluster gateway set -dstcluster hyd -gateway "gw1.hyd.maprtech.com gw2.hyd.maprtech.com"
Q & A
@mapr maprtech
mgunturu@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down InternetMapR Technologies
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBaseCarol McDonald
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopSpagoWorld
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Adam Doyle
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions ArchitectHUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions ArchitectSpagoWorld
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistSpagoWorld
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseMapR Technologies
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkMapR Technologies
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR Technologies
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APICarol McDonald
 

What's hot (20)

Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
Meet Spark
Meet SparkMeet Spark
Meet Spark
 
Getting started with HBase
Getting started with HBaseGetting started with HBase
Getting started with HBase
 
Architectural Evolution Starting from Hadoop
Architectural Evolution Starting from HadoopArchitectural Evolution Starting from Hadoop
Architectural Evolution Starting from Hadoop
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions ArchitectHUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect
 
Yarnthug2014
Yarnthug2014Yarnthug2014
Yarnthug2014
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Free Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBaseFree Code Friday - Spark Streaming with HBase
Free Code Friday - Spark Streaming with HBase
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batch
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Philly DB MapR Overview
Philly DB MapR OverviewPhilly DB MapR Overview
Philly DB MapR Overview
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Streaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka APIStreaming Patterns Revolutionary Architectures with the Kafka API
Streaming Patterns Revolutionary Architectures with the Kafka API
 

Viewers also liked

ElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and ElasticsearchElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and ElasticsearchMapR Technologies
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?Flink Forward
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitectureMaheedhar Gunturu
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
The Search for Meaning in B2B Marketing
The Search for Meaning in B2B MarketingThe Search for Meaning in B2B Marketing
The Search for Meaning in B2B MarketingVelocity Partners
 

Viewers also liked (9)

Kafka talk
Kafka talkKafka talk
Kafka talk
 
ElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and ElasticsearchElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
ElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Ted Dunning - Keynote: How Can We Take Flink Forward?
Ted Dunning -  Keynote: How Can We Take Flink Forward?Ted Dunning -  Keynote: How Can We Take Flink Forward?
Ted Dunning - Keynote: How Can We Take Flink Forward?
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
How Google Works
How Google WorksHow Google Works
How Google Works
 
The Search for Meaning in B2B Marketing
The Search for Meaning in B2B MarketingThe Search for Meaning in B2B Marketing
The Search for Meaning in B2B Marketing
 

Similar to ESKibana

Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Steve Min
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Chris Fregly
 
Incremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF GraphsIncremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF GraphsNikolaos Konstantinou
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Anthony Baker
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationPeter Haase
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over YarnInMobi Technology
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillMapR Technologies
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...aiuy
 
CitySprint Fleetmapper use case -Big Data Bootcamp
CitySprint  Fleetmapper use case -Big Data BootcampCitySprint  Fleetmapper use case -Big Data Bootcamp
CitySprint Fleetmapper use case -Big Data BootcampEduard Lazar
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 

Similar to ESKibana (20)

Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)Apache Spark Overview part1 (20161107)
Apache Spark Overview part1 (20161107)
 
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
Kinesis and Spark Streaming - Advanced AWS Meetup - August 2014
 
Incremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF GraphsIncremental Export of Relational Database Contents into RDF Graphs
Incremental Export of Relational Database Contents into RDF Graphs
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federation
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CIT
 
Tez Data Processing over Yarn
Tez Data Processing over YarnTez Data Processing over Yarn
Tez Data Processing over Yarn
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache Drill
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Apache Spark Streaming
Apache Spark StreamingApache Spark Streaming
Apache Spark Streaming
 
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
Coral-and-Transport_Portable-SQL-and-UDFs-for-the-Interoperability-of-Spark-a...
 
Keys for Success from Streams to Queries
Keys for Success from Streams to QueriesKeys for Success from Streams to Queries
Keys for Success from Streams to Queries
 
CitySprint Fleetmapper use case -Big Data Bootcamp
CitySprint  Fleetmapper use case -Big Data BootcampCitySprint  Fleetmapper use case -Big Data Bootcamp
CitySprint Fleetmapper use case -Big Data Bootcamp
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 

ESKibana

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies Elasticsearch & Kibana
  • 2. © 2014 MapR Technologies 2 Agenda • Brief overview of search • Brief overview of Elasticsearch • Brief overview of Kibana 4 • MapR-DB integration with Elasticsearch • Demo
  • 3. © 2014 MapR Technologies 3 Search
  • 4. © 2014 MapR Technologies 4 Information Retrieval (IR) “Information retrieval is the activity of obtaining information resources (in the form of documents) relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing” ~ Wikipedia
  • 5. © 2014 MapR Technologies 5 Basics • Term t : a noun or compound word used in a specific context • tf (t in d) : term frequency in a document • The number of times term t appears in the currently scored document d • idf (t) : inverse document frequency • measure of whether the term is common or rare across a corpus of documents, i.e. how often the term appears across the index • boost (index) : boost of the field at index-time • boost (query) : boost of the field at query-time
  • 6. © 2014 MapR Technologies 6 What is TFIDF? TF - IDF = Term Frequency X Inverse Document Frequency
  • 7. © 2014 MapR Technologies 7 Lucene • Fast, high performance, scalable search/IR library • Open source • Initially developed by Doug Cutting (Also author of Hadoop) • Indexing and Searching • Inverted Index of documents • Provides advanced Search options like synonyms, stopwords, based on similarity, proximity. • http://lucene.apache.org/
  • 8. © 2014 MapR Technologies 8 What is an inverted index?
  • 9. © 2014 MapR Technologies 10 Indexing pipeline Analyzer : create tokens using a Tokenizer and/or applying Filters (Token Filters) Each field can define an Analyzer at index time/query time or both at same time.
  • 10. © 2014 MapR Technologies 11 What is a search engine?
  • 11. © 2014 MapR Technologies 12 Elasticsearch
  • 12. © 2014 MapR Technologies 13 Elasticsearch • Enterprise Search platform built on top of Apache Lucene • Open source • Highly reliable, scalable, fault tolerant • Support distributed Indexing, Replication and load balanced querying • Distributed RESTful search server • Document oriented • Schema less • Easy to scale horizontally • http://www.elasticsearch.org/
  • 13. © 2014 MapR Technologies 14 Features • Highlighting • Spelling Suggestions • Aggregations – Bucketing and Metric • Query DSL – based on JSON to define queries • Automatic shard replication, routing, splitting and rebalancing • Zen discovery – Unicast – Multicast • Master Election - Re-election if Master Node fails • Schemaless or schema on the fly • Percolation queries
  • 14. © 2014 MapR Technologies 15 Cluster Architecture
  • 15. © 2014 MapR Technologies 16 Index Request
  • 16. © 2014 MapR Technologies 17 Search Request
  • 17. © 2014 MapR Technologies 18 ES Client
  • 18. © 2014 MapR Technologies 19 High-level Client Architecture
  • 19. © 2014 MapR Technologies 20 Language clients • Java • Ruby • Python • Scala • Go • PHP • Pearl • Groovy • Javascript • C# • .Net • Haskell • Clojure • Erlang • Ocaml • Smalltalk • Cold Fusion • NodeJS • R • Eventmachine
  • 20. © 2014 MapR Technologies 21 ES and Friends • Hadoop and Yarn • Spark • Storm • Samza • Kafka • Hive • Cascading • Pig Reference: https://speakerdeck.com/elastic/elasticsearch-hadoop-and-friends-spark-storm-and-more
  • 21. © 2014 MapR Technologies 22 Elasticsearch VS Solr Download Expanded First run 0 50 100 150 200 250 300 Ref: http://www.slideshare.net/arafalov/solr-vs-elasticsearch-case-by-case
  • 22. © 2014 MapR Technologies 23 Kibana
  • 23. © 2014 MapR Technologies 24 Features • Seamless Integration with Elasticsearch • Give shape to data easily • Sophisticated Analysis • Flexible interface • Empower more team members – easy to share • Easy setup • Visualize data from many sources – logstash, hadoop, Flume, Fluentd etc • Simple data import • Support for aggregations and sub-aggregations
  • 26. Analytics
  • 27. Demo
  • 28. MapR-DB/ES Integration, Mansi Shah
  • 29. Agenda
    • Architecture
    • Setup and monitoring
    • Default conversion
    • Custom conversion
    • Performance considerations
    • Gateway configuration
    • Q & A
  • 30. MapR-DB Replication
    • 4.1: DR, with async and sync replication between geographically distributed clusters
    • Connector to ES (4.2/5.0)
    • Connector to Spark
    • …
  • 31. Table Replication Architecture
    • [Diagram: client operations write to tables (Table 1 … Table n) in volumes on the MapR servers of the SRC cluster; the changes flow as a replication stream to gateway nodes running the Replication Gateway, which write them to the corresponding tables on the DST cluster]
  • 32. Table Replication Architecture
    • [Diagram: same layout with Cluster1 and Cluster2, each receiving client operations and each with its own gateway nodes running the Replication Gateway; replication streams are written to tables (Table 1 … Table n) on the other cluster]
  • 33. ES Replication Architecture
    • [Diagram: client operations write to tables (Table 1 … Table n) on the MapR servers of the MapR-DB cluster; a replication stream goes to gateway nodes running the Replication Gateway and ES client, which write to the Elasticsearch cluster]
  • 34. Setup and Monitoring
  • 35. Register Elasticsearch Cluster
    • [Diagram: MapR cluster (MFS nodes + gateway nodes) and Elasticsearch cluster]
    • Register the Elasticsearch cluster with MapR
  • 36. Create a Replication to ES
    • [Diagram: MapR-DB table replicated to an ES index + type]
    • Register the ES cluster with MapR
    • Create a source table
    • Start replication on the source table:
      maprcli table replica elasticsearch autosetup -path /test1 -target elasticsearch -index demoidx -type demotype
    • * Will work with MCS (MapR Control System) in later versions, possibly 5.0
  • 37. DEMO
  • 38. Data Conversion
  • 39. Default Data Conversion
    • Converts the byte stream stored in MapR-DB to basic ES data types, using mappings stored in ES.
    • Supported data types: String, Int, Float, Double, Long, Short, Date (as epoch), Geo-point/Geohash, Boolean, Binary, IP, etc.
    • The gateway reads the data type mapping from ES and then converts the data based on this mapping.
    • Example mapping added to Elasticsearch during index creation:
      PUT /costarica/_mapping/activities
      {
        "activities": {
          "properties": {
            "CF1": {
              "type": "nested",
              "properties": {
                "name":     {"type": "string"},
                "price":    {"type": "integer"},
                "rating":   {"type": "float"},
                "location": {"type": "geo_point"}
              }
            }
          }
        }
      }
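The slides do not show the gateway's conversion code. As a minimal sketch of the idea, assuming each column value was written with HBase-style big-endian encodings (as produced by `Bytes.toBytes`) and strings as UTF-8, a mapping-driven converter could look like this. `DECODERS` and `convert_row` are hypothetical names; the type names follow the Elasticsearch 1.x mapping types used in the example mapping.

```python
import base64
import struct
from typing import Any, Callable, Dict

# Hypothetical decoders keyed by ES mapping type. Assumes big-endian,
# HBase Bytes.toBytes-style encodings for numbers and UTF-8 for text.
DECODERS: Dict[str, Callable[[bytes], Any]] = {
    "string": lambda b: b.decode("utf-8"),
    "geo_point": lambda b: b.decode("utf-8"),      # "lat, lon" text form
    "ip": lambda b: b.decode("utf-8"),             # dotted-quad text form
    "short": lambda b: struct.unpack(">h", b)[0],
    "integer": lambda b: struct.unpack(">i", b)[0],
    "long": lambda b: struct.unpack(">q", b)[0],
    "date": lambda b: struct.unpack(">q", b)[0],   # epoch milliseconds
    "float": lambda b: struct.unpack(">f", b)[0],
    "double": lambda b: struct.unpack(">d", b)[0],
    "boolean": lambda b: b != b"\x00",             # any non-zero byte is true
    "binary": lambda b: base64.b64encode(b).decode("ascii"),  # ES expects base64
}


def convert_row(row: Dict[str, Dict[str, bytes]],
                properties: Dict[str, Any]) -> Dict[str, Any]:
    """Convert {column_family: {qualifier: bytes}} into a JSON-ready document.

    `properties` is the type's "properties" object from the ES mapping, with
    one object per column family. Unmapped columns and unknown types are
    decoded as UTF-8 strings.
    """
    doc: Dict[str, Any] = {}
    for family, columns in row.items():
        fields = properties.get(family, {}).get("properties", {})
        converted: Dict[str, Any] = {}
        for qualifier, value in columns.items():
            type_name = fields.get(qualifier, {}).get("type", "string")
            decode = DECODERS.get(type_name, DECODERS["string"])
            converted[qualifier] = decode(value)
        doc[family] = converted
    return doc


properties = {"CF1": {"type": "nested", "properties": {
    "name": {"type": "string"}, "price": {"type": "integer"},
    "rating": {"type": "float"}, "location": {"type": "geo_point"}}}}
row = {"CF1": {"name": b"kayaking", "price": struct.pack(">i", 50),
               "rating": struct.pack(">f", 8.6), "location": b"78.45, 14.33"}}
print(convert_row(row, properties))
```

Because the byte stream carries no type information of its own, the mapping is the only source of truth here; a column stored with a different encoding than the mapping implies would decode to the wrong value or fail.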
  • 40. Mapping Continued ...
    • With the PUT /costarica/_mapping/activities mapping from the previous slide in place, the replicated row is returned as:
      GET /costarica/activities/row1
      {
        "CF1": {
          "name": "kayaking",
          "price": 50,
          "rating": 8.6,
          "location": "78.45, 14.33"
        },
        "CF2": {
          "description": "river safari, animals, bird-watching"
        }
      }
  • 41. Custom Conversion
    • Default conversion:
      GET /costarica/activities/row1
      {
        "CF1": {
          "name": "kayaking",
          "price": 50,
          "rating": 8.6,
          "location": "78.45, 14.33"
        },
        "CF2": {
          "description": "river safari, animals"
        }
      }
    • Custom conversion:
      GET /costarica/activities/row1
      {
        "name": "kayaking",
        "price": 90,
        "rating": 8,
        "location": "78.45, 14.33",
        "tags": ["river safari", "animals", "birds"]
      }
    • Custom mapping, conversion, or data manipulation
    • Data types not supported by default conversion: arrays, nested, etc.
    • Special settings such as replication, routing, scripts, and scripting language
    • Multiple JSON documents per source table update
    • Deleting something on a row update
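The custom-conversion hook itself is not shown in the slides. The sketch below only mimics the shape of the example under stated assumptions: it takes a row that has already been decoded into Python values, flattens the `CF1` fields to the top level, and derives a `tags` array from the comma-separated `CF2` description, returning a list so that one row update could yield several documents. The function name and signature are hypothetical, not MapR's plugin interface.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]


def custom_convert(row_key: str,
                   row: Dict[str, Dict[str, Any]]) -> List[Tuple[str, Document]]:
    """Hypothetical converter: one decoded row in, a list of (doc_id, document) out."""
    doc: Document = dict(row.get("CF1", {}))  # flatten CF1 fields to top level
    description = row.get("CF2", {}).get("description", "")
    # Derive an array field, a shape the default conversion does not produce.
    tags = [tag.strip() for tag in description.split(",") if tag.strip()]
    if tags:
        doc["tags"] = tags
    return [(row_key, doc)]


row = {
    "CF1": {"name": "kayaking", "price": 50, "rating": 8.6, "location": "78.45, 14.33"},
    "CF2": {"description": "river safari, animals"},
}
print(custom_convert("row1", row))
```

Returning a list leaves room for the multiple-documents-per-update case; deleting documents on a row update would need a separate signal, which this sketch does not model.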
  • 42. Gateway Configuration
  • 43. Gateways Overview
    • mapr-gateway is a new package.
    • It has no dependency on mapr-fileserver or mapr-cldb and can run independently.
    • Warden monitors its status and restarts the service if it terminates.
    • MCS shows the health of the gateway service.
    • A gateway node is not counted as a licensed node (only nodes running MFS are counted).
  • 44. Gateway Discovery
    • MFS nodes on the source cluster need to discover the gateways to a destination cluster.
    • Gateways can be specified via DNS: add a DNS entry with the key gateway.dstClusterName and a value listing hostnames or IPs.
    • Gateways can be specified via maprcli:
      maprcli cluster gateway set -dstcluster hyd -gateway "gw1.hyd.maprtech.com gw2.hyd.maprtech.com"
  • 45. Q & A
  • 46. Q&A
    • Engage with us!
    • @mapr • maprtech • mgunturu@mapr.com • MapR • maprtech • mapr-technologies