ElasticSearch:
Search in the Age of Clouds
Christian Meder
Bernhard Pflugfelder
inovex GmbH
Background
‣  open source (free software)
‣  Linux
‣  Web
‣  Java
‣  Android
‣  CTO@inovex
Speaker: Christian Meder
Background
‣  Lucene
‣  Solr
‣  Text Mining Technologies, Information Retrieval
‣  Hadoop
‣  Java
‣  Big Data Engineer@inovex
‣  bpflugfelder@inovex.de
Speaker: Bernhard Pflugfelder
Agenda
‣  Search is everywhere
‣  Elasticsearch
‣  Examples
‣  Overview
‣  Features
Search, what?
Search applications: Enterprise search
Search applications: Online shops
Search applications: Semantic search
Search applications: Navigation & information access
Search applications: Data analysis
http://datarpm.com/product
Search applications: Log-file analysis
http://kibana.org/
Search applications: Document store
‣  Can you think of other scenarios where search applications would also do a good job?
‣  Remember the key capabilities of search technologies:
‣  Persistence
‣  Flexible data model
‣  Unstructured data (but not only)
‣  Extremely fast access to data
‣  Horizontal scalability
There are plenty of application scenarios where search technologies should be considered!
Search technologies: Open source
http://lucene.apache.org
http://lucene.apache.org/solr/
http://www.elasticsearch.org
Overview
Lucene is an open source, pure Java API for information retrieval
‣  Originally developed by Doug Cutting in 1999; joined the Apache Software Foundation in 2001 and became an Apache top-level project (TLP) in 2005
‣  Licensed under the Apache License 2.0
‣  Pure Java library, with ports and bindings for other languages:
‣  Lucene.NET (http://lucenenet.apache.org)
‣  PyLucene (http://lucene.apache.org/pylucene/)
‣  and more: http://wiki.apache.org/lucene-java/LuceneImplementations
‣  Large, very active developer community; well documented and supported (38 active committers!)
‣  Current stable release: 4.2.1
‣  Widely adopted in commercial and non-commercial projects: http://wiki.apache.org/lucene-java/PoweredBy
http://lucene.apache.org/
Overview
Solr is a standalone enterprise search server and document store based on Lucene
‣  Created by Yonik Seeley at CNET Networks in 2004
‣  Entered the Apache Incubator in 2006, became a TLP in 2007
‣  Licensed under the Apache License 2.0
‣  Seeley and others founded Lucid Imagination (now LucidWorks)
‣  Large, very active developer community; well documented and supported (closely tied to the Lucene community)
‣  Current stable release: 4.2.1
‣  Widely adopted in commercial and non-commercial projects: http://wiki.apache.org/solr/PublicServers
http://lucene.apache.org/solr/
Search technologies
“You know, for search” (Shay Banon)
Motivation
Elasticsearch is a “distributed-from-scratch” search server based on Lucene
Created by Shay Banon, with a first version made public in February 2010:
Elasticsearch itself was born out of my frustration with the fact that there isn’t really a good, open source, solution for distributed search engine out there, which also combines what I expect of search engines after building Compass (and on that, I will blog later…).
I have been working on this for the past several months, pouring my search and distributed knowledge into this (and portions of my heart and time ;) )
[http://www.elasticsearch.org/blog/2010/02/08/youknowforsearch.html]
http://www.elasticsearch.org/
Overview
‣  Current stable version: 0.20.6, based on Lucene 3.6
‣  Release candidate 0.90 RC2 integrates Lucene 4.2.1
‣  Licensed under the Apache License 2.0
‣  Small but growing group of core developers
‣  Strong support from prominent Lucene committers
‣  Company elasticsearch.com founded in 2012
‣  By the people behind elasticsearch.org
‣  www.elasticsearch.com
Customers
GitHub
‣  Code search runs on a cluster
‣  26 storage nodes holding the searchable data
‣  8 client nodes coordinating query requests
‣  Each storage node has 2 TB of SSD-based storage
‣  17 TB of indexed data is stored in the cluster
‣  distributed across the cluster with a replication factor of 1
‣  which makes 34 TB of indexed data overall
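The 34 TB figure follows from the replication factor: each primary shard has one replica, so the stored volume is the primary data multiplied by (1 + replicas). A minimal sketch of that arithmetic (the function name is our own, not an Elasticsearch API):

```python
def total_index_size_tb(primary_tb: float, replicas: int) -> float:
    """On-disk index volume when every primary shard has `replicas` copies."""
    return primary_tb * (1 + replicas)

# GitHub code search figures from the slide: 17 TB of primary data, replication factor 1
print(total_index_size_tb(17, 1))  # 34
```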
Quora
‣  Question-and-answer website
‣  Aggregates questions and answers by topic
‣  Sources: the web in general and social media
‣  Goals for search:
‣  low query latency
‣  increased relevance of results
‣  Evaluated Elasticsearch against Solr and Sphinx
‣  “After much benchmarking with our data set, we discovered that ElasticSearch was clearly the fastest of the possible search platforms we were considering.”
Quora
http://www.quora.com/Full-Text-Search-on-Quora/What-technology-does-Quora-use-for-its-full-text-search-infrastructure/answer/Adrien-Lucas-Ecoffet?srid=pilt&share=1
SoundCloud
http://bed-con.org/2013/wp-content/uploads/2013/04/Wie_SoundCloud_skaliert.pdf
Moloch
https://github.com/aol/moloch
Huffington Post
http://blogs.vmware.com/vfabric/2013/03/scaling-real-time-comments-huffpost-live-with-rabbitmq.html
Search pipeline
Highlights
‣  Scalable, High-Performance Indexing
‣  over 95GB/hour on modern hardware
‣  small RAM requirements
‣  incremental indexing as fast as batch indexing
‣  index size roughly 20-30% the size of text indexed
‣  Powerful, Accurate and Efficient Search Algorithms
‣  ranked searching -- best results returned first
‣  many powerful query types
‣  fielded searching (e.g., title, author, contents)
‣  date-range searching
‣  sorting by any field
‣  multiple-index searching with merged results
‣  allows simultaneous update and searching
[From http://lucene.apache.org/core/features.html]
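Lucene's ranked search is built on an inverted index that maps each term to the documents containing it. The toy sketch below shows the idea; it scores documents by summed term frequencies, a deliberate simplification and not Lucene's actual TF-IDF-based scoring, and its whitespace "analyzer" is far cruder than Lucene's analyzers.

```python
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase and split on whitespace
    return text.lower().split()

class InvertedIndex:
    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str) -> list[int]:
        # Sum term frequencies of all query terms per document; best results first
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, tf in self.postings.get(term, {}).items():
                scores[doc_id] += tf
        return [doc_id for doc_id, _ in scores.most_common()]

idx = InvertedIndex()
idx.add(1, "Elasticsearch is built on Lucene")
idx.add(2, "Lucene Lucene everywhere")
idx.add(3, "Solr is built on Lucene too")
print(idx.search("lucene"))      # [2, 1, 3]
print(idx.search("built solr"))  # [3, 1]
```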
Overview
‣  Pure Java application
‣  Powered by Lucene
‣  Document-oriented
‣  Schema-less
‣  HTTP API with JSON in & out
‣  Indexing / Updating
‣  Searching
‣  Administration / Monitoring
‣  Extensible via plugins
‣  Distribution is a fundamental paradigm of Elasticsearch
Architecture
[Figure: cluster of a master node and two further nodes, holding primary shards 1, 2 and 3 plus their replica shards]
Architecture
‣  Index distribution by automatic sharding
‣  Automatic replication and balancing
‣  Fault tolerance + high availability
‣  Cluster building & management
‣  node discovery through Zen discovery
‣  nodes communicate via unicast / multicast
‣  automatic master election
‣  master / data node roles can be assigned explicitly
‣  Master responsible for
‣  maintaining the cluster state, incl. the routing table used to route search requests
‣  including new nodes in the cluster
‣  Index / query routing (automatic or custom)
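Automatic sharding and custom routing both come down to one rule: a document goes to shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. A minimal sketch, assuming CRC32 as a stand-in for Elasticsearch's own hash function (so the concrete shard numbers will differ from a real cluster):

```python
import zlib

def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Pick the primary shard for a routing value (by default, the document ID)."""
    # CRC32 is a stand-in; Elasticsearch uses its own hash function here
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

# Default routing: the document ID decides the shard
placement = {doc_id: shard_for(doc_id, 3) for doc_id in ["book-1", "book-2", "book-3", "book-4"]}
print(placement)

# Custom routing: all documents routed by the same (hypothetical) user value land
# on one shard, so a query for that user only has to hit that shard
user_shard = shard_for("user-42", 3)
print(user_shard)

# A different number of primary shards changes the modulus and thus the placement,
# which is why that number is fixed when the index is created
print({doc_id: shard_for(doc_id, 5) for doc_id in placement})
```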
Elasticsearch-head
https://github.com/mobz/elasticsearch-head
Schema-less, but
‣  Define a mapping for type "books"
‣  Retrieve the current mapping for type "books"
```shell
# The put-mapping API requires an existing index, so create it first
curl -XPUT 'localhost:9200/gutenberg'

# Define a mapping for type "books" (the body is keyed by the type name)
cat > book.json <<'EOF'
{
  "books" : {
    "properties" : {
      "id" : { "type" : "string" },
      "title" : { "type" : "string" },
      "author" : { "type" : "string" },
      "subject" : { "type" : "string" },
      "view_count" : { "type" : "integer" },
      "created" : { "type" : "date", "format" : "dateOptionalTime" }
    }
  }
}
EOF
curl -XPUT 'localhost:9200/gutenberg/books/_mapping' -d @book.json

# Retrieve the current mapping for type "books"
curl 'localhost:9200/gutenberg/books/_mapping?pretty=1'
```
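Without an explicit mapping, Elasticsearch infers field types from the first document it sees (dynamic mapping). The sketch below roughly imitates that inference for the Elasticsearch versions discussed here (strings become "string", integers "long", floats "double", ISO-like date strings "date"); the regex is a simplified stand-in for the real date detection. It shows why the explicit mapping above is useful: `view_count` would otherwise become a `long`, not an `integer`.

```python
import re
from typing import Any

# Simplified ISO-8601 check, a stand-in for Elasticsearch's date detection
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?")

def infer_type(value: Any) -> str:
    # bool must be checked before int: in Python, bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value in this sketch: {value!r}")

def infer_mapping(doc_type: str, doc: dict) -> dict:
    # Same shape as the put-mapping body: {type: {"properties": {...}}}
    return {doc_type: {"properties": {f: {"type": infer_type(v)} for f, v in doc.items()}}}

book = {"title": "Faust", "author": "Goethe", "view_count": 42, "created": "2013-04-20"}
print(infer_mapping("books", book))
```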
Search highlights
‣  Search on terms, numeric values, dates, numeric ranges and date/time ranges
‣  Lots of query types
‣  terms, phrases, fuzzy, wildcard, ranges
‣  faceting, filtering
‣  Geospatial search, e.g. the GeoShape query
‣  Configurable caching for
‣  Filter queries
‣  Field values
‣  Near-real-time (NRT) search with a separate API
‣  Sorting, highlighting
‣  MoreLikeThis
‣  Multi tenancy
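Several of these features combine in a single search request. The sketch below builds such a request body for the `books` mapping shown earlier, assuming the query DSL of the 0.20/0.90 era: a `filtered` query pairs a scored `match` query with a cacheable `range` filter, plus sorting and highlighting. Later Elasticsearch versions replaced `filtered` with a `bool` query and a `filter` clause. The function name is our own.

```python
import json

def build_search(text: str, created_from: str, size: int = 10) -> dict:
    return {
        "query": {
            "filtered": {
                # Full-text part: scored match query on the analyzed title field
                "query": {"match": {"title": text}},
                # Structured part: date-range filter, cacheable and not scored
                "filter": {"range": {"created": {"gte": created_from}}},
            }
        },
        # Most viewed first, relevance score as tie-breaker
        "sort": [{"view_count": {"order": "desc"}}, "_score"],
        "highlight": {"fields": {"title": {}}},
        "size": size,
    }

body = build_search("faust", "2013-01-01")
print(json.dumps(body, indent=2))
# Saved as query.json, it could be sent with e.g.:
#   curl -XGET 'localhost:9200/gutenberg/books/_search' -d @query.json
```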
Faceted search
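A terms facet returns, for the documents matching a query, the most frequent values of a field together with their counts. The local sketch below mimics that computation; the request body follows the facet syntax of the Elasticsearch versions discussed here (facets were later superseded by aggregations), and the facet name `subjects` and the sample hits are hypothetical.

```python
from collections import Counter

def terms_facet(hits: list[dict], field: str, size: int = 10) -> list[dict]:
    # Count each distinct value of `field` among the matching documents, most frequent first
    counts = Counter(hit[field] for hit in hits if field in hit)
    return [{"term": term, "count": n} for term, n in counts.most_common(size)]

# Request body that would ask Elasticsearch for the same facet
request_body = {
    "query": {"match_all": {}},
    "facets": {"subjects": {"terms": {"field": "subject", "size": 2}}},
}

hits = [{"subject": "drama"}, {"subject": "poetry"}, {"subject": "drama"}, {"title": "untagged"}]
print(terms_facet(hits, "subject", size=2))
# [{'term': 'drama', 'count': 2}, {'term': 'poetry', 'count': 1}]
```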
Suggestion
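Spelling suggestions of the "did you mean?" kind rest on edit distance: candidate terms from the index that are only a few edits away from the input are proposed. A minimal sketch using plain Levenshtein distance over a hypothetical term dictionary; Elasticsearch's term suggester uses its own optimized distance and additional ranking, so this is only the core idea.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def suggest(term: str, dictionary: list[str], max_edits: int = 2) -> list[str]:
    # Candidates within max_edits (exact matches skipped), closest first, ties alphabetical
    scored = sorted((levenshtein(term, w), w) for w in dictionary)
    return [w for d, w in scored if 0 < d <= max_edits]

terms = ["elasticsearch", "lucene", "search", "searches", "solr"]
print(suggest("serch", terms))  # ['search']
print(suggest("lucen", terms))  # ['lucene']
```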
Highlighting
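Highlighting returns fragments of the matching field with the query terms wrapped in tags, by default `<em>` and `</em>`. The sketch below only illustrates the output format with a regex over whole words; Elasticsearch's highlighters work on the analyzed terms and also cut the text into fragments.

```python
import re

def highlight(text: str, terms: list[str], pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    # Wrap whole-word, case-insensitive matches of any query term in the tags
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)

print(highlight("You know, for search. Search is everywhere.", ["search"]))
# You know, for <em>search</em>. <em>Search</em> is everywhere.
```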
Local search
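Local search typically means "documents within a radius of a point". Conceptually, a distance filter computes the great-circle distance between each document's location and the query point; the sketch does this with the haversine formula. The sample coordinates are approximate, and the `geo_distance` body at the end assumes a hypothetical `geo_point` field named `location`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within(places: dict[str, tuple[float, float]], lat: float, lon: float, max_km: float) -> list[str]:
    # Keep only places inside the radius, as a distance filter would
    return [name for name, (plat, plon) in places.items()
            if haversine_km(lat, lon, plat, plon) <= max_km]

# Sample data: approximate city-centre coordinates
places = {"Karlsruhe": (49.0069, 8.4037), "Pforzheim": (48.8922, 8.6947), "Munich": (48.1351, 11.5820)}
print(within(places, 49.0069, 8.4037, 50))  # ['Karlsruhe', 'Pforzheim']

# Corresponding filter body, assuming a geo_point field named "location"
geo_filter = {"geo_distance": {"distance": "50km", "location": {"lat": 49.0069, "lon": 8.4037}}}
```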
Multi Tenancy
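One common way to serve several tenants from one cluster is an index per tenant; Elasticsearch can search several indices at once when they are listed comma-separated in the URL. The sketch below builds such request paths; the `tenant-<name>` naming convention is a hypothetical choice, not something Elasticsearch prescribes.

```python
from typing import Optional
from urllib.parse import quote

def tenant_index(tenant: str) -> str:
    # Hypothetical naming convention: one index per tenant
    return f"tenant-{tenant.lower()}"

def search_path(tenants: list[str], doc_type: Optional[str] = None) -> str:
    # Several indices are searched at once by listing them comma-separated
    indices = ",".join(tenant_index(t) for t in tenants)
    parts = [indices] + ([doc_type] if doc_type else []) + ["_search"]
    return "/" + "/".join(quote(p, safe=",") for p in parts)

print(search_path(["acme"]))                     # /tenant-acme/_search
print(search_path(["acme", "Globex"], "books"))  # /tenant-acme,tenant-globex/books/_search
```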
Some more features
‣  Gateway module persists cluster metadata to:
‣  Local FS, shared FS, Hadoop, Amazon S3
‣  River:
‣  Pluggable service that continuously pulls data into the cluster
‣  Managed via a dedicated REST endpoint
‣  Implementations for CouchDB, MongoDB, JDBC, Solr, …
‣  Bulk indexing
‣  Default: single-document indexing
‣  Bulk indexing via dedicated REST endpoints
‣  Lucene analyzers configurable via elasticsearch.yml or the API
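The bulk endpoint (`_bulk`) takes newline-delimited JSON: an action/metadata line followed by the document source, repeated, and terminated by a final newline. A minimal sketch that assembles such a body for the `gutenberg` index used earlier (the helper name and sample documents are our own):

```python
import json

def bulk_body(index: str, doc_type: str, docs: dict[str, dict]) -> str:
    lines = []
    for doc_id, source in docs.items():
        # Action/metadata line, then the document source on its own line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects newline-delimited JSON ending with a newline
    return "\n".join(lines) + "\n"

body = bulk_body("gutenberg", "books", {"1": {"title": "Faust"}, "2": {"title": "Hamlet"}})
print(body, end="")
# Saved as bulk.json, it could be sent with e.g. (--data-binary keeps the newlines):
#   curl -XPOST 'localhost:9200/_bulk' --data-binary @bulk.json
```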
Search API
‣  Query types such as term, terms, match, wildcard, fuzzy, range, …
‣  Multi Search
‣  Get
‣  Multi Get
‣  Filter
‣  Facets
‣  Highlighting
‣  Suggest
‣  MoreLikeThis
‣  Index boosting
‣  Explain
‣  Percolate
Indices API
‣  Create, Delete, Exists, Open, Close, Optimize, Refresh, Flush, Settings
‣  Index templates (mappings + settings)
‣  Get, Put, Delete Mapping
‣  Get, Update Settings
‣  Snapshot
‣  Aliases
‣  Warmers
‣  Statistics, Status
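Aliases let clients address a stable name while the index behind it changes, e.g. after reindexing into a new index with a different mapping. The `_aliases` endpoint applies a list of actions atomically, so a remove and an add swap the alias without a gap. A sketch of that body; the versioned index names are hypothetical.

```python
import json

def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    # Both actions are applied atomically by the _aliases endpoint
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}

body = swap_alias("gutenberg", "gutenberg_v1", "gutenberg_v2")
print(json.dumps(body))
# Could be sent with e.g.: curl -XPOST 'localhost:9200/_aliases' -d '<body>'
```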
Cluster API
‣  Live configuration of cluster settings
‣  minimum master nodes
‣  cache sizes
‣  routing
‣  allocation
‣  moving shards
‣  moving replicas
‣  Cluster health & status
‣  Node info & stats, shutdown of all / specific nodes
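The "minimum master nodes" setting (`discovery.zen.minimum_master_nodes`) guards against split brain: a master may only be elected if a majority of the master-eligible nodes can see each other, i.e. N // 2 + 1. A sketch computing that quorum and the corresponding cluster-settings body (the helper names are our own):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    # Majority quorum: two halves of a partitioned cluster cannot both elect a master
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1

def cluster_settings_body(master_eligible_nodes: int) -> dict:
    return {"persistent": {"discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_nodes)}}

for n in (1, 2, 3, 5):
    print(n, minimum_master_nodes(n))  # prints 1 1, 2 2, 3 2, 5 3 (one pair per line)
# Could be applied live with e.g.: curl -XPUT 'localhost:9200/_cluster/settings' -d '<body>'
```

Note that with two master-eligible nodes the quorum is 2, so losing either node blocks master election; three master-eligible nodes tolerate the loss of one.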
Pros & Cons
+  Elasticsearch feels lightweight
+  Simple but effective architecture
+  Ease of use, even for distributed search
+  High maturity, even though ES is young
+  High-performance search (at least according to the benchmarks we have seen)
+  Modern technologies (HTTP, JSON, no XML, Java, Guava)
-  Still a small community and a small group of core developers
-  Missing data connectors (e.g. Solr's DataImportHandler)
-  Missing search features: grouping and search result clustering
-  Fewer query types than Solr
-  Fewer boosting options (e.g. function queries)
-  Fewer analyzers
Wrap up
‣  The world is becoming data-driven and user-driven
‣  large data volumes
‣  multiple sources
‣  many users need access
‣  Therefore search technologies such as Elasticsearch are becoming important:
‣  Easy aggregation of data from multiple sources
‣  Unified access layer through search
‣  Scalable in data volume and number of users
‣  Highly configurable
‣  Elasticsearch is easy to use, distributed and scalable, and search is fast
Thank you!
End
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Kürzlich hochgeladen (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

ElasticSearch - Search in the Age of Clouds

  • 1. ElasticSearch – Search in the Age of Clouds
    Christian Meder, Bernhard Pflugfelder, inovex GmbH
  • 2. Speaker: Christian Meder
    Background
    ‣  open source (free software)
    ‣  Linux
    ‣  Web
    ‣  Java
    ‣  Android
    ‣  CTO@inovex
  • 3. Speaker: Bernhard Pflugfelder
    Background
    ‣  Lucene
    ‣  Solr
    ‣  Text Mining Technologies, Information Retrieval
    ‣  Hadoop
    ‣  Java
    ‣  Big Data Engineer@inovex
    ‣  bpflugfelder@inovex.de
  • 4. Agenda
    ‣  Search is everywhere
    ‣  Elasticsearch
    ‣  Examples
    ‣  Overview
    ‣  Features
  • 13. Search applications: Document store
    ‣  Can you think of other scenarios where search applications would also do a good job?
    ‣  Recall the key capabilities of search technologies:
      ‣  Persistence
      ‣  Flexible data model
      ‣  Unstructured data, but not only
      ‣  Extremely fast access to data
      ‣  Horizontal scalability
    There are plenty of application scenarios where search technologies should be considered!
  • 15. Overview
    Lucene is an open source, pure Java API for enabling information retrieval
    ‣  Originally developed by Doug Cutting in 1999; joined the Apache Software Foundation in 2001 and became an Apache TLP in 2005
    ‣  Licensed under the Apache License 2.0
    ‣  Pure Java library, with ports and bindings for other languages:
      ‣  Lucene.NET (http://lucenenet.apache.org)
      ‣  PyLucene (http://lucene.apache.org/pylucene/)
      ‣  and more: http://wiki.apache.org/lucene-java/LuceneImplementations
    ‣  Large and very active developer community, well documented and supported (38 active committers!)
    ‣  Current stable release: 4.2.1
    ‣  Widely used and adopted in commercial / non-commercial projects: http://wiki.apache.org/lucene-java/PoweredBy
    http://lucene.apache.org/
  • 16. Overview
    Solr is a standalone enterprise search server and document store based on Lucene
    ‣  Created by Yonik Seeley at CNET Networks in 2004
    ‣  Entered the Apache Incubator in 2006 and graduated in 2007 as a subproject of Apache Lucene
    ‣  Licensed under the Apache License 2.0
    ‣  Seeley and others founded Lucid Imagination (later renamed LucidWorks)
    ‣  Large and very active developer community, well documented and supported (also closely tied to the Lucene community)
    ‣  Current stable release: 4.2.1
    ‣  Widely used and adopted in commercial / non-commercial projects: http://wiki.apache.org/solr/PublicServers
    http://lucene.apache.org/solr/
  • 17. Search technologies
    “You know, for search” (Shay Banon)
  • 18. Motivation
    Elasticsearch is a “distributed-from-scratch” search server based on Lucene. It was created by Shay Banon, who made a first version public in February 2010:
    “Elasticsearch itself was born out of my frustration with the fact that there isn’t really a good, open source, solution for distributed search engine out there, which also combines what I expect of search engines after building Compass (and on that, I will blog later…). I have been working on this for the past several months, pouring my search and distributed knowledge into this (and portions of my heart and time ;) )”
    [http://www.elasticsearch.org/blog/2010/02/08/youknowforsearch.html]
    http://www.elasticsearch.org/
  • 19. Overview
    ‣  Current stable version 0.20.6 works with Lucene 3.6
    ‣  Available version 0.90 RC2 includes Lucene 4.2.1 integration
    ‣  Licensed under the Apache License 2.0
    ‣  Small but growing group of core developers
    ‣  Strong support from valuable Lucene committers
    ‣  Company elasticsearch.com founded in 2012
      ‣  by the people behind elasticsearch.org
      ‣  www.elasticsearch.com
  • 21. GitHub
    ‣  GitHub code search runs on an Elasticsearch cluster
      ‣  26 storage nodes hold the searchable data
      ‣  8 client nodes coordinate query requests
      ‣  each storage node has 2 TB of SSD-based storage
    ‣  17 TB of indexed data is stored in the cluster
      ‣  distributed across the cluster with a replication factor of 1 (one replica per shard)
      ‣  which makes 34 TB of indexed data overall
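The GitHub numbers follow from simple replica arithmetic: each shard is stored once as a primary plus one copy per replica. The sketch below redoes that calculation. It assumes perfectly even balancing across the 26 storage nodes and reads the 2 TB of SSD storage as a per-node figure. The function names are illustrative only.

```python
def total_index_size_tb(primary_tb: float, replicas: int) -> float:
    """Every shard is stored once as a primary plus `replicas` extra copies."""
    return primary_tb * (1 + replicas)


def avg_per_node_tb(total_tb: float, data_nodes: int) -> float:
    """Average disk usage per data node, assuming perfectly even balancing."""
    return total_tb / data_nodes


# Figures from the GitHub slide: 17 TB indexed, replication factor 1, 26 storage nodes
total = total_index_size_tb(17.0, 1)
per_node = avg_per_node_tb(total, 26)
print(f"total: {total:.0f} TB, per storage node: {per_node:.2f} TB")
# prints: total: 34 TB, per storage node: 1.31 TB
```

Under these assumptions, about 1.31 TB per node fits within 2 TB of SSD per node.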
  • 22. Quora
    ‣  Question-and-answer website
      ‣  aggregates questions and answers by topic
      ‣  sources are the web in general and social media
    ‣  Goals for search:
      ‣  low query latency
      ‣  more relevant results
    ‣  Evaluated Elasticsearch against Solr and Sphinx
      ‣  “After much benchmarking with our data set, we discovered that ElasticSearch was clearly the fastest of the possible search platforms we were considering.”
  • 28. Highlights (Lucene)
    ‣  Scalable, High-Performance Indexing
      ‣  over 95GB/hour on modern hardware
      ‣  small RAM requirements
      ‣  incremental indexing as fast as batch indexing
      ‣  index size roughly 20-30% the size of text indexed
    ‣  Powerful, Accurate and Efficient Search Algorithms
      ‣  ranked searching: best results returned first
      ‣  many powerful query types
      ‣  fielded searching (e.g., title, author, contents)
      ‣  date-range searching
      ‣  sorting by any field
      ‣  multiple-index searching with merged results
      ‣  allows simultaneous update and searching
    [From http://lucene.apache.org/core/features.html]
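"Ranked searching" rests on an inverted index: a map from each term to the documents containing it. The toy sketch below builds one and scores documents with a plain tf-idf sum so that the best results come first. It is a deliberate simplification and not Lucene's actual scoring, which adds length normalization, boosts and more. The sample documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Toy inverted index: term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return dict(index)


def search(index: dict[str, dict[str, int]], n_docs: int, query: str) -> list[tuple[str, float]]:
    """Ranked search: sum tf * idf over query terms, best results first."""
    scores: dict[str, float] = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term)
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings)) + 1.0  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # highest score first; ties broken by doc id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "a": "search is everywhere",
    "b": "elasticsearch distributed search",
    "c": "lucene",
}
results = search(build_index(docs), len(docs), "elasticsearch search")
print([doc_id for doc_id, _ in results])  # ['b', 'a']
```

Document "b" ranks first because it matches both query terms, including the rarer one.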
  • 29. Overview
    ‣  Pure Java application
    ‣  Powered by Lucene
    ‣  Document-oriented
    ‣  Schema-less
    ‣  HTTP API with JSON in & out
      ‣  Indexing / Updating
      ‣  Searching
      ‣  Administration / Monitoring
    ‣  Extendable by plugins
    ‣  Distribution is a fundamental paradigm of Elasticsearch
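"JSON in & out" means an indexing call is just an HTTP request with a JSON body. The minimal sketch below builds, but does not send, such a request with the standard library. It assumes a node on localhost:9200 and the /{index}/{type}/{id} URL scheme of these versions (types were removed in later releases). The index, type and document are made up.

```python
import json
import urllib.request

# Assumption: a node listening on the default HTTP port of localhost.
ES_URL = "http://localhost:9200"


def index_request(index: str, doc_type: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (without sending) a PUT /{index}/{type}/{id} request with a JSON body."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = index_request("gutenberg", "books", "1", {"title": "Faust", "author": "Goethe"})
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/gutenberg/books/1
# Against a running node, urllib.request.urlopen(req) would send it; the response is JSON too.
```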
  • 30. Architecture
    [Figure: a cluster of three nodes (one master node, two further nodes), each holding a mix of primary shards and replica shards]
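The figure's key property is that a shard's primary and its replicas sit on different nodes, so losing one node never loses a shard. The toy round-robin placement below has that property. It is an illustrative assumption and not Elasticsearch's real allocator, which also weighs disk usage, shard counts and allocation settings. The node names are hypothetical.

```python
def allocate(num_shards: int, replicas: int, nodes: list[str]) -> dict[str, list[str]]:
    """Toy round-robin placement: copy k of shard s goes to node (s + k) % n.

    With replicas <= len(nodes) - 1 the copies of one shard always land on
    distinct nodes. "P" marks a primary shard, "R" a replica shard.
    """
    if replicas > len(nodes) - 1:
        raise ValueError("need more nodes than replicas to separate the copies")
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for shard in range(num_shards):
        for copy in range(1 + replicas):
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return placement


placement = allocate(3, 1, ["node1", "node2", "node3"])
print(placement)
# {'node1': ['P0', 'R2'], 'node2': ['R0', 'P1'], 'node3': ['R1', 'P2']}
```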
  • 31. Architecture
    ‣  Index distribution by automatic sharding
    ‣  Automatic replication and balancing
    ‣  Fault tolerance + high availability
    ‣  Cluster building & management
      ‣  node detection through Zen discovery
      ‣  nodes discover each other via unicast / multicast
      ‣  automatic master election
      ‣  master / data node roles can be assigned explicitly
    ‣  Master is responsible for
      ‣  maintaining the cluster state, e.g. shard allocation (any node can accept a search request and route it to the right shards)
      ‣  adding new nodes to the cluster
    ‣  Index / query routing (automatic / custom)
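Routing picks a shard with the rule shard = hash(routing) % number_of_primary_shards. The routing value defaults to the document id, or can be set to a custom value. The sketch below uses zlib.crc32 as a stand-in hash. Elasticsearch uses its own hash function, so real shard numbers will differ. The 5 shards match the default primary shard count of these versions, and the routing values are made up.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % number_of_primary_shards.

    zlib.crc32 is only a stand-in hash; Elasticsearch uses its own hash
    function, so real shard numbers will differ from this sketch.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Automatic routing: the document id is the routing value, spreading docs over shards.
print({doc_id: shard_for(doc_id, 5) for doc_id in ["1", "2", "3", "4", "5"]})

# Custom routing: all documents routed by "user-42" share one shard, so a search
# with that routing value only has to hit that shard.
print(shard_for("user-42", 5))
```

The same routing value always maps to the same shard. This is also why the number of primary shards is fixed when an index is created: changing it would send existing documents to different shards.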
  • 35. Schema-less, but...
    ‣  Create the index gutenberg (the mapping API needs an existing index)
    ‣  Define a mapping for type books
    ‣  Retrieve the current mapping for type books
    # curl -XPUT 'localhost:9200/gutenberg'
    # cat > book.json <<'EOF'
    {
      "books" : {
        "properties" : {
          "id"         : { "type" : "string" },
          "title"      : { "type" : "string" },
          "author"     : { "type" : "string" },
          "subject"    : { "type" : "string" },
          "view_count" : { "type" : "integer" },
          "created"    : { "type" : "date", "format" : "dateOptionalTime" }
        }
      }
    }
    EOF
    # curl -XPUT 'localhost:9200/gutenberg/books/_mapping' -d @book.json
    # curl 'localhost:9200/gutenberg/books/_mapping?pretty=1'
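"Schema-less" means that without an explicit mapping, Elasticsearch infers field types from the first document it sees (dynamic mapping). The rough sketch below imitates that inference for flat documents using the type names of these versions: JSON integers become long, floats become double, and strings that look like dates become date. The date check is a simplified stand-in for the default dateOptionalTime detection, and the function names are hypothetical.

```python
import re

# Simplified stand-in for date detection with the default "dateOptionalTime"
# format: yyyy-MM-dd, optionally followed by a "T..." time part.
_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T.+)?$")


def infer_type(value) -> str:
    """Guess the core type dynamic mapping would assign to one JSON value."""
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if _DATE.match(value) else "string"
    raise TypeError(f"sketch handles flat scalar fields only, got {value!r}")


def infer_mapping(doc_type: str, doc: dict) -> dict:
    """Mapping in the same shape as book.json, derived from one sample document."""
    return {doc_type: {"properties": {
        field: {"type": infer_type(value)} for field, value in doc.items()}}}


sample = {"title": "Faust", "view_count": 42, "created": "2013-04-15"}
mapping = infer_mapping("books", sample)
print(mapping)
```

In this sketch, view_count comes out as "long" rather than the "integer" chosen in book.json. An explicit mapping is how such choices are pinned down.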
  • 36. Search highlights
    ‣  Search on terms, numeric values, dates, numeric ranges, date/time ranges
    ‣  Lots of query types
      ‣  terms, phrases, fuzzy, wildcard, ranges
      ‣  faceting, filtering
    ‣  Geospatial search, e.g. the GeoShape query
    ‣  Configurable caching for
      ‣  filter queries
      ‣  field values
    ‣  Near-real-time (NRT) search with a separate API
    ‣  Sorting, highlighting
    ‣  MoreLikeThis
    ‣  Multi-tenancy
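Several of these features combine in a single JSON request body. The sketch below builds one, as we understand the query DSL of the 0.20/0.90 era: a full-text match on title, a range filter on view_count, a terms facet, sorting and highlighting. The field names come from the book.json mapping on slide 35. The filtered query and facets were replaced in later versions (by bool filter clauses and aggregations), so treat this as era-specific. The function name and facet name are hypothetical.

```python
import json


def book_query(text: str, min_views: int, facet_field: str) -> dict:
    """Match on title, restrict by a range filter, facet on one field."""
    return {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                "filter": {"range": {"view_count": {"gte": min_views}}},
            }
        },
        "facets": {"by_subject": {"terms": {"field": facet_field}}},
        "sort": [{"view_count": {"order": "desc"}}],
        "highlight": {"fields": {"title": {}}},
    }


body = book_query("faust", 100, "subject")
print(json.dumps(body, indent=2))
# This body would be sent to localhost:9200/gutenberg/books/_search
```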
  • 42. Some more features
    ‣  Gateway module stores cluster metadata to:
      ‣  local FS, shared FS, Hadoop, Amazon S3
    ‣  River:
      ‣  pluggable service that constantly pulls data into the cluster
      ‣  managed via a specific REST endpoint
      ‣  implementations for CouchDB, MongoDB, JDBC, Solr, …
    ‣  Bulk indexing
      ‣  default: single-document indexing
      ‣  bulk indexing via a dedicated REST endpoint (_bulk)
    ‣  Lucene analyzers can be specified in elasticsearch.yml or via the API
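A bulk request body is newline-delimited JSON. Each document gets one action line (for example "index" with its _index, _type and _id) followed by one line holding the document source, and the body must end with a newline. The small sketch below builds such a body. The index, type and documents are illustrative.

```python
import json


def bulk_body(index: str, doc_type: str, docs: dict[str, dict]) -> str:
    """Build a _bulk body: one action line plus one source line per document.

    Every line, including the last one, must end with a newline.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = bulk_body("gutenberg", "books", {"1": {"title": "Faust"}, "2": {"title": "Hamlet"}})
print(body, end="")
```

If the body is written to a file and sent with curl, use `--data-binary @file` rather than `-d @file`, because `-d` strips the newlines the format depends on.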
  • 43. Search API
    ‣  Query types such as term, terms, match, wildcard, fuzzy, range, …
    ‣  Multi Search
    ‣  Get
    ‣  Multi Get
    ‣  Filter
    ‣  Facets
    ‣  Highlighting
    ‣  Suggest
    ‣  MoreLikeThis
    ‣  Index boosting
    ‣  Explain
    ‣  Percolate
  • 44. Indices API
    ‣  Create, Delete, Exists, Open, Close, Optimize, Refresh, Flush, Settings
    ‣  Index templates (mappings + settings)
    ‣  Get, Put, Delete Mapping
    ‣  Get, update settings
    ‣  Snapshot
    ‣  Aliases
    ‣  Warmers
    ‣  Statistics, Status
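A common use of aliases is reindexing without downtime. Searches go through an alias, and once a new index is built, the alias is moved to it in one call to the aliases endpoint, whose actions are applied atomically. The sketch below builds that request body. The index names gutenberg_v1 and gutenberg_v2 are hypothetical.

```python
def swap_alias(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Both actions go into one request, so clients searching the alias never
    see a moment where it points at no index.
    """
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


body = swap_alias("gutenberg", "gutenberg_v1", "gutenberg_v2")
print(body)
```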
  • 45. Cluster API
    ‣  Live configuration of cluster settings
      ‣  minimum master nodes
      ‣  cache sizes
      ‣  routing
      ‣  allocation
      ‣  moving shards
      ‣  moving replicas
    ‣  Cluster health & status
    ‣  Nodes info & stats, shutdown of all / specific nodes
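"Minimum master nodes" guards against split brain. A master is only elected when a quorum of master-eligible nodes, floor(n / 2) + 1, can see each other, so two partitions can never both elect one. The sketch below computes the quorum and builds a body for the cluster settings endpoint that sets discovery.zen.minimum_master_nodes at runtime. The helper names are hypothetical, and "persistent" (rather than "transient") is one choice among the two.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def cluster_settings_body(master_eligible: int) -> dict:
    """Body for PUT /_cluster/settings that updates the setting live."""
    return {"persistent": {
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible)}}


print(cluster_settings_body(3))
# {'persistent': {'discovery.zen.minimum_master_nodes': 2}}
```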
  • 46. Pros & Cons
    +  Elasticsearch feels lightweight
    +  Simple but effective architecture
    +  Ease of use, even for distributed search
    +  High maturity, even though ES is young
    +  High-performance search (at least in the benchmarks we have seen)
    +  Modern technologies (HTTP, JSON, no XML, Java, Guava)
    -  Community and group of core developers still small
    -  Missing data connectors (e.g. an equivalent of Solr's DataImportHandler)
    -  Missing search features: result grouping and search result clustering
    -  Fewer query types
    -  Fewer boosting options (e.g. function queries)
    -  Fewer analyzers
  • 47. Wrap up
    ‣  The world is becoming data-driven and user-driven
      ‣  large data volumes
      ‣  multiple sources
      ‣  many users need access
    ‣  Therefore search technologies such as Elasticsearch become important:
      ‣  easy aggregation of data from multiple sources
      ‣  unified access layer through search
      ‣  scalable in data volume and number of users
      ‣  highly configurable
    ‣  Elasticsearch is easy to use, distributed and scalable, and its search is fast