Ben van Mol
ElasticSearch for .NET
SEARCH ENGINE
Why would I need one?
Search is more than text comparison
Search must advise
Search must be intelligent
Search must aggregate
What is ElasticSearch?
• "flexible and powerful open-source, distributed (NoSQL), RESTful search engine built on top of Lucene" (http://www.elastic.co)
• Features: real-time data, real-time analytics, distributed, high availability, multi-tenancy, full-text search, document-oriented, conflict management, schema-free, RESTful API, per-operation persistence, Apache 2 open-source license, built on top of Apache Lucene.
Installation
Procedure
• Java-based, requires Java 7 or later
• The same JVM version is required on all nodes
• Set a number of environment variables
• Fill in the ElasticSearch config files
• Streamlined installation available for Windows (local service)
  - https://github.com/rgl/elasticsearch-setup/releases
Scalability & performance
Scalability
NoSQL databases are more scalable and provide
superior performance, and their data model addresses
several issues that the relational model is not designed
to address
- Structured & fixed data model vs. dynamic model
- Efficient, scale-out architecture instead of expensive,
monolithic architecture (scale-up)
- Object-oriented programming that is easy to use and
flexible
Data representation in JSON
Scalability - Architecture
• Cluster
  - Logical grouping of multiple nodes
• Node
  - An ElasticSearch server instance
  - Master node: in charge of managing cluster-wide operations
    - Only one master per cluster
    - No bottleneck for queries
• Shard
  - Low-level worker instance that holds a slice of all data
  - Each document belongs to a single primary shard
  - Primary shards are created during index creation
  - The number of primary shards determines the amount of data stored in each shard
• Replica
  - A copy of a primary shard on a different node
  - Can be created at any time
• Spreading over nodes is done automatically
POST /<index name>
{
"settings" :
{
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
Create an index
1 node
2 nodes
3 nodes
3 nodes
2 replicas
Having more replica shards on the same number of nodes does not increase performance, because each shard has access to a smaller fraction of its node's resources; it does add redundancy.
Default Routing
• Hashes the ID of a document and uses the hash to find the shard that holds it (e.g. to retrieve the document)
• Gives an even distribution of documents across the entire set of shards
• But what about search?
  - Incoming request
  - Broadcast and query all shards
  - Aggregate all results and send them back
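The routing and broadcast steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib.crc32` is a stand-in for whatever hash function ElasticSearch uses internally, and `pick_shard` and `search_all_shards` are hypothetical helper names, not part of any client API.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a shard number."""
    # Assumption: crc32 only stands in for the real internal hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards


def search_all_shards(shards, predicate):
    """Scatter the query to every shard, then gather the partial results."""
    hits = []
    for shard in shards:
        hits.extend(doc for doc in shard if predicate(doc))
    return hits


NUMBER_OF_SHARDS = 3
shards = [[] for _ in range(NUMBER_OF_SHARDS)]

# Indexing: every document lands on exactly one primary shard.
for doc_id in ("1", "2", "3", "4", "5", "6"):
    shards[pick_shard(doc_id, NUMBER_OF_SHARDS)].append({"id": doc_id})

# Retrieving by ID only needs the one shard the hash points to ...
shard_of_doc_4 = pick_shard("4", NUMBER_OF_SHARDS)

# ... whereas a search cannot know where matches live and must ask every shard.
all_hits = search_all_shards(shards, lambda doc: int(doc["id"]) % 2 == 0)
```

Because the shard number depends on the shard count, changing the count would send existing IDs to different shards, which is one reason the number of primary shards is chosen at index creation.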
Custom Routing
• Configure routing for a certain type:
PUT /<index name>/<type>/_mapping
{
"order":
{
"_routing":
{
"required":true,
"path":"customerID"
}
}
}
• Search for a specific document of user user123:
GET /<index name>/<type>/_search?routing=user123
{
"query":
{
"match_all":{}
}
}
• Tell ElasticSearch which property to use to determine routing
  - E.g. zipcode, age, …
Default routing ensures that the distribution is fairly uniform across all shards.
Once you start implementing your own custom schemes, this uniformity may be lost.
Advanced Search Capabilities
Dealing with human language
• Indexing
Example: <div>Here is some example text including an extract of 9 poems</div>
• Analyzers
  - Character filters
    - Convert 9 to nine
    - Strip HTML and extract the actual text
    - Lower-case all words
  - Tokenizer
    - Create individual terms or tokens from text, splitting on commas, whitespace, periods, hyphens, …
  - Token filters
    - Remove stopwords like 'an', 'the', …
    - Stemming: reduce verbs and other words to their stem
• {Here} {is} {some} {example} {text} {including} {extract} {nine} {poems}
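The three analysis stages can be mimicked with a toy pipeline. This is a minimal sketch, not how Lucene implements analysis: the regular expressions, the tiny `STOPWORDS` set and the function names are assumptions made for illustration, lower-casing is done as a token filter here, and stemming is left out.

```python
import re

# Assumption: a deliberately tiny stopword list, just enough for the example sentence.
STOPWORDS = {"a", "an", "the", "of"}


def char_filter(text):
    """Character filters: strip HTML tags and spell out the digit 9."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\b9\b", "nine", text)


def tokenize(text):
    """Tokenizer: split the text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens):
    """Token filters: lower-case every token, then drop stopwords."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the stages in order: character filters, tokenizer, token filters."""
    return token_filters(tokenize(char_filter(text)))


tokens = analyze("<div>Here is some example text including an extract of 9 poems</div>")
```

The order matters: character filters see the raw string, the tokenizer sees cleaned text, and token filters only ever see individual tokens.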
Text Analysis - Experiments
• Whitespace
  - Whitespace tokenizer: a tokenizer of type whitespace that divides text at whitespace.
Sentence: Convert the title-case text using the ToLower(string) command.
Result: {Convert} {the} {title-case} {text} {using} {the} {ToLower(string)} {command.}
Text Analysis - Experiments
• Simple
  - Standard tokenizer: a tokenizer of type standard providing a grammar-based tokenizer that works well for most European-language documents.
  - Lower-case token filter
Sentence: Convert the title-case text using the ToLower(string) command.
Result: {convert} {the} {title} {case} {text} {using} {the} {tolower} {string} {command}
Text Analysis - Experiments
• Stop analyzer:
  - Standard tokenizer
  - Lower-case token filter
  - Stop token filter
    - A token filter of type stop that removes stop words (words that carry no meaning for search) from token streams.
    - Support for multiple languages
Sentence: Convert the title-case text using the ToLower(string) command.
Result: {convert} {title} {case} {text} {using} {tolower} {string} {command}
Text Analysis - Experiments
• Snowball
  - Standard tokenizer
  - Lower-case token filter
  - Stop token filter
  - Stemming (Snowball-generated stemmer)
    - A filter that stems words (reduces a word to its core) using a Snowball-generated stemmer
    - Support for multiple languages
Sentence: Convert the title-case text using the ToLower(string) command.
Result: {convert} {title} {case} {text} {use} {tolower} {string} {command}
Text Analysis - Adding Custom Analyzers
PUT /my-index/_settings
{
"index":
{
"analysis":
{
"analyzer":
{
"YourCustomAnalyzer":
{
"type": "custom",
"char_filter": [ "html_strip" ],
"tokenizer": "standard",
"filter": [ "lowercase", "stop", "snowball" ]
}
}
}
}
}
A list of available analysis tools:
• Character filters: http://bit.ly/1H3hgJF
• Tokenizers: http://bit.ly/1zIU2IO
• Token filters: http://bit.ly/1AJXCO2
Possible to create your own combination!
Text Analysis – Define analyzer
• Create a mapping type (cf. a table)
  - Assign fields
  - Define field types (string, int, date, …)
  - Define the analyzer to be used
  - Define the boost value on a field
  - Define the routing
  - …
PUT /my_index/_mapping/my_type
{
"my_type": {
"properties": {
"english_title": {
"type": "string",
"analyzer": "english"
}
}
}
}
ELASTIC AND .NET
Let’s get dirty!
What is NEST?
NEST
• All request & response objects represented
• Strongly typed Query DSL implementation
• Supports fluent syntax
• Uses ElasticSearch.NET
ElasticSearch.NET
• Low-level, dependency-free client
• All ES endpoints are available as methods
ElasticSearch RESTful API
http://nest.azurewebsites.net/
NEST – Connection Initialization
• Initialize an ElasticClient:
All actions on the ElasticSearch cluster are performed using the ElasticClient
For example:
• Search
• Index
• DeleteIndex/CreateIndex
• …
Uri node = new Uri("http://192.168.137.73:9200");
ConnectionSettings settings = new ConnectionSettings(node, defaultIndex: "products");
ElasticClient client = new ElasticClient(settings);
Index your content
JSON
• PUT /products/product/1
  - Index the raw JSON string
.NET
• Index a type
  - Automatically infers the index, type and ID
• Use ElasticType to define type behavior
• Use ElasticProperty to define field behavior
• Define explicit values for inferred ones
More information: http://nest.azurewebsites.net/nest/index-type-inference.html
http://localhost:9200/products/product/1
{
"id":"1",
"name" : "MacBook Air",
"price" : 1099,
"descr" : "Some lengthy never-read description",
"attributes" :
{
"color" : "silver",
"display" : 13.3,
"ram" : 4
}
}
Index your Content - .NET
• Raw JSON string
client.Raw.Index("products", "product", new JavaScriptSerializer().Serialize(prod));
• Type-based indexing
client.Index(product);
• Modify out-of-the-box behavior using decorators
[ElasticType(Name = "Product", IdProperty = "id")]
public class Product
{
    public int id { get; set; }
    [ElasticProperty(Name = "name", Index = FieldIndexOption.Analyzed, Type = FieldType.String, Analyzer = "standard")]
    public string name { get; set; }
    // the remaining properties are not shown on the slide
}
Query your content – JSON Query
JSON examples
http://localhost:9200/products/product/_search
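Some of these queries return nothing when the analyzer has lower-cased the field and split it on whitespace: a term query is not analyzed, so "MacBook Air" has to match an indexed term exactly.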
{ "query" : { "term" : { "name": "MacBook Air" }}}
{ "query" : { "prefix" : { "name": "Mac" }}}
{ "query" : { "range" : { "price" : { "from" : 1000, "to": 2000 } } } }
{ "from": 0, "size": 10, "query" : { "term" : { "name": "MacBook Air" }}}
{ "sort" : { "name" : { "order": "asc" } }, "query" : { "term" : { "name": "MacBook Air" }}}
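Why a term query for "MacBook Air" can come back empty is easier to see with a toy inverted index. The sketch below is an assumption-laden model, not ElasticSearch code: `analyze` imitates an analyzer that lower-cases and splits the text, the second product name is invented for the example, and `term_query`, `match_query` and `prefix_query` are hypothetical helpers that only mimic the behaviour of their namesakes.

```python
import re
from collections import defaultdict


def analyze(text):
    """Imitate an analyzer that lower-cases the text and splits it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map every indexed term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for token in analyze(name):
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """Not analyzed: the search text must equal an indexed term exactly."""
    return index.get(term, set())


def prefix_query(index, prefix):
    """Not analyzed either: compares the raw prefix with the indexed terms."""
    hits = set()
    for token, doc_ids in index.items():
        if token.startswith(prefix):
            hits |= doc_ids
    return hits


def match_query(index, text):
    """Analyzed: the search text goes through the same analyzer first."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


# Assumption: product 2 is made up so that the index holds more than one document.
index = build_inverted_index({1: "MacBook Air", 2: "MacBook Pro"})

no_hits = term_query(index, "MacBook Air")    # no indexed term looks like this
both = term_query(index, "macbook")           # matches the lower-cased term
analyzed_hits = match_query(index, "MacBook Air")
```

In this model the index only contains `macbook`, `air` and `pro`, so any query that skips analysis has to be written in that lower-cased, single-word form.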
Query your content – JSON Result
{
"took": 1,
"timed_out": false,
"_shards": { "total": 5, "successful": 5, "failed": 0 },
"hits": {
"total": 2,
"max_score": 0.076713204,
"hits": [
{
"_index": "products",
"_type": "Product",
"_id": "1",
"_score": 0.076713204,
"_source": {
"id": 1,
"name": "MacBook Air",
"price": 1099.0,
"descr": "Some lengthy never-read description",
"attributes": {
"color": "silver",
"display": 13.300000190734863,
"ram": 4
}
}
},
Query your content – Query DSL .NET
• Retrieve all products from an index using a MatchAll search
• Retrieve all products by using a term query
• Search on all fields using the built-in _all field
• Search on a combination of fields using boolean operators (see Fiddler result)
result = client.Search<Product>(s => s.MatchAll());
result = client.Search<Product>(s => s.Query(q => q.Term(t => t.name, "macbook")));
result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook")));
result = client.Search<Product>(s => s.Query(q => q.Term("_all", "macbook")));
result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook") ||
q.Term("descr","macbook")));
Query your content – Query DSL
• Search on a combination of fields using boolean operators and a range query (here on price)
• Some more advanced query examples:
  - Wildcard query: use wildcards to search for relevant documents
  - Span near: search for word combinations within a certain span in the document
  - More like this query: finds documents which are "like" a given set of documents using representative terms
  - More information: http://bit.ly/1A6wpKs
result = client.Search<Product>(s => s
.Query(q => (q.Term("name", "macbook") || q.Term("descr", "macbook"))
&& q.Range(r => r
.OnField("price")
.Greater(1000)
.LowerOrEquals(2000)
)));
Query your content – Fuzzy searches
• Perform a fuzzy search to tolerate spelling errors in the query string
result = client.Search<Product>(s => s
.Query(q => q
.Match(m => m
.Query("makboek")
.OnField("name")
.Fuzziness(10)
.PrefixLength(1)
)));
Query your content - Paging
• Select a page from the full result set using the From and Size parameters
result = client.Search<Product>(s => s
.Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook"))
.From(0)
.Size(1));
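What From and Size mean in a sharded index can be modelled as follows. This is a simplified sketch of the scatter/gather idea from the routing slides, assuming each shard already holds scored hits; `search_page` and the sample scores are invented for illustration.

```python
import heapq


def search_page(shards, from_, size):
    """Return one page of (score, doc_id) hits, best score first."""
    per_shard = []
    for shard_hits in shards:
        # Any shard could own the best hits, so each returns its top from_ + size.
        per_shard.append(heapq.nlargest(from_ + size, shard_hits))
    # The coordinating node merges the partial lists and cuts out the page.
    merged = heapq.nlargest(from_ + size, (hit for hits in per_shard for hit in hits))
    return merged[from_:from_ + size]


# Assumption: three shards with made-up relevance scores.
shards = [
    [(0.9, "a"), (0.2, "b")],
    [(0.8, "c"), (0.7, "d")],
    [(0.1, "e")],
]

first_page = search_page(shards, from_=0, size=2)
second_page = search_page(shards, from_=2, size=2)
```

The work per request grows with `from_ + size` on every shard, which is why paging very deep into a result set gets progressively more expensive.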
Query your content – Hit Highlighting
.NET Code JSON Result
• Hit highlighting
• Possible to add other pre- and post-tags on specific fields
result = client.Search<Product>(s => s
.Query(q => q.Term("name", "macbook"))
.Highlight(h => h
.PreTags("<b>")
.PostTags("</b>")
.OnFields(f => f
.OnField(e => e.name))));
Query your content – Aggregations
.NET Code JSON Result
• Aggregations group documents based on term values
• Useful to create a faceted search interface
result = client.Search<Product>(s => s
.Aggregations(a => a
.Terms("color", st => st
.Field(o => o.attributes.color))));
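Conceptually, a terms aggregation counts documents per distinct field value. The sketch below assumes a handful of in-memory product documents (only the silver MacBook Air comes from the slides, the others are invented) and a hypothetical `terms_aggregation` helper; the `key`/`doc_count` bucket shape is modelled on the aggregation response format.

```python
from collections import Counter


def get_path(doc, field_path):
    """Follow a dotted path such as 'attributes.color' into a nested dict."""
    value = doc
    for key in field_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def terms_aggregation(docs, field_path):
    """Group documents by field value and count them, biggest bucket first."""
    counts = Counter()
    for doc in docs:
        value = get_path(doc, field_path)
        if value is not None:
            counts[value] += 1
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


# Assumption: sample data; only the first product appears in the original slides.
products = [
    {"name": "MacBook Air", "attributes": {"color": "silver"}},
    {"name": "Laptop B", "attributes": {"color": "silver"}},
    {"name": "Laptop C", "attributes": {"color": "gold"}},
    {"name": "Laptop D"},
]

color_buckets = terms_aggregation(products, "attributes.color")
```

Each bucket can be rendered as one facet entry (value plus count) in a search interface.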
Query your content – Suggesters
• Did you mean
  - Term suggester
    - Suggests terms based on edit distance (= the number of single-character edits needed to change one term into another)
    - More info: http://bit.ly/1FDFPwr
  - Phrase suggester
    - Adds logic on top of the term suggester to select entire corrected phrases instead of individual tokens, weighted using n-gram language models
    - Provides better suggestions because it takes co-occurrence and frequency into account
    - More info: http://bit.ly/1FbfAKg
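The edit distance behind the term suggester can be computed with the classic Levenshtein dynamic program. This is a minimal sketch of the idea only: `suggest_terms`, its `max_edits` parameter and the small dictionary are assumptions for illustration, and the real suggester also weighs term frequency.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def suggest_terms(word, dictionary, max_edits=2):
    """Return dictionary terms within max_edits of word, closest first."""
    scored = sorted((edit_distance(word, term), term) for term in dictionary)
    return [term for distance, term in scored if 0 < distance <= max_edits]


# Assumption: a tiny hand-made dictionary standing in for the indexed terms.
suggestions = suggest_terms("makboek", ["macbook", "marriot", "hotel"])
```

The misspelling "makboek" from the fuzzy search example is two substitutions away from "macbook", so it falls within the default limit used here.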
Query your content – Suggesters
• Search as you type
  - Completion suggester
    - A so-called prefix suggester
    - Does not do spell correction like the term or phrase suggesters, but allows basic auto-complete functionality
    - Uses FST (finite state transducer) models and makes them part of the index for faster querying
    - More info: http://bit.ly/1HwFKbO
hotel, marriot, mercure, munchen and munich
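Prefix completion over the example terms above can be approximated with a sorted list and binary search. Note the swap: this sketch uses `bisect` on a sorted list as a simple stand-in for the FST the completion suggester builds, and `PrefixSuggester` is a hypothetical class name.

```python
import bisect


class PrefixSuggester:
    """Toy auto-complete: binary search for the prefix in a sorted term list."""

    def __init__(self, terms):
        self._terms = sorted({term.lower() for term in terms})

    def suggest(self, prefix, size=5):
        prefix = prefix.lower()
        # All terms sharing the prefix sit next to each other in sorted order.
        start = bisect.bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix) or len(matches) == size:
                break
            matches.append(term)
        return matches


suggester = PrefixSuggester(["hotel", "marriot", "mercure", "munchen", "munich"])
after_m = suggester.suggest("m")
after_mu = suggester.suggest("mu")
```

As on the slide, there is no spell correction here: a prefix that matches no term simply yields an empty list.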
QUESTIONS?

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 

ElasticSearch for .NET Developers

  • 3. Search is more than text comparison
  • 5. Search must be intelligent
  • 7. What is ElasticSearch?
      - "flexible and powerful open-source, distributed (NoSQL), RESTful search engine built on top of Lucene" (http://www.elastic.co)
      - Features: real-time data, real-time analytics, distributed, high availability, multi-tenancy, full-text search, document-oriented, conflict management, schema-free, RESTful API, per-operation persistence, Apache 2 open-source license, built on top of Apache Lucene.
  • 8. Installation Procedure
      - Java-based, requires Java 7 or later
      - The same JVM version is required on all nodes
      - Set a number of environment variables
      - Fill in the ElasticSearch config files
      - Streamlined installation available for Windows (local service): https://github.com/rgl/elasticsearch-setup/releases
  • 10. Scalability
      NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
      - Structured & fixed data model vs. dynamic model
      - Efficient, scale-out architecture instead of expensive, monolithic architecture (scale-up)
      - Object-oriented programming that is easy to use and flexible
      - Data representation in JSON
  • 11. Scalability - Architecture
      - Cluster: logical grouping of multiple nodes
      - Node: an ElasticSearch server instance
      - Master node
          - Only one per cluster, in charge of managing cluster-wide operations
          - Not a bottleneck for queries
      - Shard
          - Low-level worker instance that holds a slice of all data
          - Each document belongs to a single primary shard
          - Created during index creation
          - The number of shards determines the amount of data stored in each shard
      - Replica
          - A copy of a primary shard on a different node
          - Can be created at any time
      - Spreading shards over nodes is done automatically
  • 12. Create an index
      POST /<index name>
      {
        "settings": {
          "number_of_shards": 3,
          "number_of_replicas": 1
        }
      }
      - Diagram panels: 1 node, 2 nodes, 3 nodes, 3 nodes with 2 replicas
      - Having more replica shards on the same number of nodes does not increase performance, because each shard has access to a smaller fraction of its node's resources, but it does add redundancy.
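The shard arithmetic behind slide 12 can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper names `index_settings`, `shard_copies` and `copies_per_node` are hypothetical, and the even spread over nodes is an idealization of what the cluster does automatically.

```python
import json


def index_settings(number_of_shards: int, number_of_replicas: int) -> str:
    """Build the JSON body of the index-creation request shown on the slide."""
    body = {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }
    return json.dumps(body)


def shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Total shard copies in the cluster: every primary plus its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def copies_per_node(number_of_shards: int, number_of_replicas: int, nodes: int) -> float:
    """Average number of shard copies each node hosts (idealized even spread)."""
    return shard_copies(number_of_shards, number_of_replicas) / nodes


if __name__ == "__main__":
    print(index_settings(3, 1))
    # Same three nodes, one versus two replicas per primary.
    print(copies_per_node(3, 1, nodes=3), copies_per_node(3, 2, nodes=3))
```

Going from one to two replicas on the same three nodes raises the average from two to three shard copies per node, which is why each shard gets a smaller share of the node's resources while redundancy goes up.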
  • 13. Default Routing
      - Hashes the ID of a document and uses the hash to find a shard (the same lookup is used to retrieve the document)
      - Gives an even distribution of documents across the entire set of shards
      - But what about search?
          1. Incoming request
          2. Broadcast to and query all shards
          3. Aggregate all results and send them back
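A minimal sketch of the default routing idea from slide 13, assuming the usual "hash modulo number of primary shards" scheme. `zlib.crc32` is only a stand-in hash chosen because it is deterministic and in the standard library; it is not the hash function ElasticSearch uses, and `pick_shard` and `distribution` are hypothetical names.

```python
import zlib


def pick_shard(document_id: str, number_of_primary_shards: int) -> int:
    """Map a document ID to a shard number with hash(id) % shard_count."""
    digest = zlib.crc32(document_id.encode("utf-8"))  # stand-in hash, see lead-in
    return digest % number_of_primary_shards


def distribution(document_ids, number_of_primary_shards: int):
    """Count how many documents land on each shard."""
    counts = [0] * number_of_primary_shards
    for document_id in document_ids:
        counts[pick_shard(document_id, number_of_primary_shards)] += 1
    return counts


if __name__ == "__main__":
    ids = [str(i) for i in range(3000)]
    print(distribution(ids, 3))
```

Because the shard number depends on the shard count, the same ID always resolves to the same shard for indexing and retrieval, but a search without an ID has no such shortcut and must be broadcast to every shard.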
  • 14. Custom Routing
      - Tell ElasticSearch which property to use to determine routing (e.g. zipcode, age)
      - Configure routing for a certain type:
          PUT /<index name>/<type>/_mapping
          { "order": { "_routing": { "required": true, "path": "customerID" } } }
      - Search only the documents routed for user user123:
          GET /<index name>/<type>/_search?routing=user123
          { "query": { "match_all": {} } }
      - Default routing ensures that distribution is fairly uniform across all shards. Once you start implementing your own custom schemes, it is entirely possible that this uniformity is lost.
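The trade-off on slide 14 can be illustrated with a small Python sketch. It assumes the same "hash modulo shard count" scheme as default routing, with `zlib.crc32` as a stand-in hash; `shard_for`, `shards_to_query` and `shard_load` are hypothetical helper names, not ElasticSearch or NEST APIs.

```python
import zlib
from collections import Counter


def shard_for(routing_value: str, number_of_shards: int) -> int:
    """Pick a shard from a routing value (stand-in hash, not the real one)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def shards_to_query(routing_value, number_of_shards: int):
    """With a routing value only one shard is searched; without it, all of them."""
    if routing_value is None:
        return list(range(number_of_shards))
    return [shard_for(routing_value, number_of_shards)]


def shard_load(docs, routing_field: str, number_of_shards: int) -> Counter:
    """Count documents per shard when routing on the given field."""
    load = Counter()
    for doc in docs:
        load[shard_for(str(doc[routing_field]), number_of_shards)] += 1
    return load


if __name__ == "__main__":
    # One very active customer: every order lands on the same shard.
    orders = [{"id": i, "customerID": "user123"} for i in range(100)]
    print(shards_to_query("user123", 3), shards_to_query(None, 3))
    print(shard_load(orders, "customerID", 3))
```

Routing on `customerID` lets a search for one customer touch a single shard, but a customer with many orders concentrates all of them on that shard, which is the loss of uniformity the slide warns about.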
  • 16. Dealing with human language
      - Indexing example: <div>Here is some example text including an extract of 9 poems</div>
      - Analyzers
          - Character filters
              - convert 9 to nine
              - strip HTML and extract the actual text
              - lower-case all words
          - Tokenizer
              - create individual terms or tokens from text, minding commas, whitespace, periods, hyphens, …
          - Token filters
              - remove stopwords like 'an', 'the', …
              - stemming: reduce verbs and other words to their stem
      - Result: {Here} {is} {some} {example} {text} {including} {extract} {nine} {poems}
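The three analysis stages on slide 16 can be mimicked with a toy Python pipeline. This is a sketch under stated assumptions, not how Lucene implements analysis: the stopword list and the `9` to `nine` mapping are illustrative, lower-casing is done as a token filter here, stemming is left out, and all function names are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the", "of"}  # illustrative list, not Lucene's


def char_filter(text: str) -> str:
    """Character filters: strip HTML tags, then map the digit 9 to a word."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\b9\b", "nine", text)


def tokenize(text: str):
    """Tokenizer: split into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Token filters: lower-case every token, then drop stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text: str):
    """Run the full chain: character filters, tokenizer, token filters."""
    return token_filter(tokenize(char_filter(text)))


if __name__ == "__main__":
    html = "<div>Here is some example text including an extract of 9 poems</div>"
    print(analyze(html))
```

The order matters: character filters see the raw string, the tokenizer sees cleaned text, and token filters only ever see individual tokens.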
  • 17. Text Analysis - Experiments: Whitespace
      - Whitespace tokenizer: a tokenizer of type whitespace that divides text at whitespace
      - Sentence: Convert the title-case text using the ToLower(string) command.
      - Result: {Convert} {the} {title-case} {text} {using} {the} {ToLower(string)} {command.}
  • 18. Text Analysis - Experiments: Simple
      - Standard tokenizer: a tokenizer of type standard, providing a grammar-based tokenizer that works well for most European-language documents
      - Lower-case token filter
      - Sentence: Convert the title-case text using the ToLower(string) command.
      - Result: {convert} {the} {title} {case} {text} {using} {the} {tolower} {string} {command}
  • 19. Text Analysis - Experiments: Stop analyzer
      - Standard tokenizer
      - Lower-case token filter
      - Stop token filter
          - A token filter of type stop that removes stop words (words that carry no meaning for search) from token streams
          - Support for multiple languages
      - Sentence: Convert the title-case text using the ToLower(string) command.
      - Result: {convert} {title} {case} {text} {using} {tolower} {string} {command}
  • 20. Text Analysis - Experiments: Snowball
      - Standard tokenizer
      - Lower-case token filter
      - Stop token filter
      - Stemming (Snowball-generated stemmer)
          - A filter that stems words (reduces a word to its core) using a Snowball-generated stemmer
          - Support for multiple languages
      - Sentence: Convert the title-case text using the ToLower(string) command.
      - Result: {convert} {title} {case} {text} {usinge} {tolower} {string} {command}
  • 21. Text Analysis - Adding Custom Analyzers
      PUT /my-index/_settings
      {
        "index": {
          "analysis": {
            "analyzer": {
              "YourCustomAnalyzer": {
                "type": "custom",
                "char_filter": [ "html_strip" ],
                "tokenizer": "standard",
                "filter": [ "lowercase", "stop", "snowball" ]
              }
            }
          }
        }
      }
      - A list of available analysis tools:
          - Character filters: http://bit.ly/1H3hgJF
          - Tokenizers: http://bit.ly/1zIU2IO
          - Token filters: http://bit.ly/1AJXCO2
      - Possible to create your own combination!
  • 22. Text Analysis – Define analyzer
      - Create a mapping type (cf. a table)
          - Assign fields
          - Define field types (string, int, date, …)
          - Define the analyzer to be used
          - Define the boost value on a field
          - Define the routing
          - …
      PUT /my_index/_mapping/my_type
      {
        "my_type": {
          "properties": {
            "english_title": {
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
  • 24. What is NEST?
      - NEST
          - All request & response objects represented
          - Strongly typed Query DSL implementation
          - Supports fluent syntax
          - Uses ElasticSearch.NET
      - ElasticSearch.NET
          - Low-level, dependency-free client
          - All ES endpoints are available as methods
      - ElasticSearch RESTful API
      - http://nest.azurewebsites.net/
  • 25. NEST – Connection Initialization
      - Initialize an ElasticClient. All actions on the ElasticSearch cluster are performed using the ElasticClient, for example:
          - Search
          - Index
          - DeleteIndex/CreateIndex
          - …
      Uri node = new Uri("http://192.168.137.73:9200");
      ConnectionSettings settings = new ConnectionSettings(node, defaultIndex: "products");
      ElasticClient client = new ElasticClient(settings);
  • 26. Index your content (JSON and .NET)
      - JSON: PUT /products/product/1 (http://localhost:9200/products/product/1)
          {
            "id": "1",
            "name": "MacBook Air",
            "price": 1099,
            "descr": "Some lengthy never-read description",
            "attributes": { "color": "silver", "display": 13.3, "ram": 4 }
          }
      - .NET
          - Index the raw JSON string
          - Index a type; NEST automatically infers the index, type and ID
          - Use ElasticType to define type behavior
          - Use ElasticProperty to define field behavior
          - Define explicit values for inferred ones
      - More information: http://nest.azurewebsites.net/nest/index-type-inference.html
  • 27. Index your Content - .NET
      - Raw JSON string:
          client.Raw.Index("products", "product", new JavaScriptSerializer().Serialize(prod));
      - Type-based indexing:
          client.Index(product);
      - Modify out-of-the-box behavior using decorators:
          [ElasticType(Name = "Product", IdProperty="id")]
          public class Product
          {
              public int id { get; set; }

              [ElasticProperty(Name = "name", Index = FieldIndexOption.Analyzed, Type = FieldType.String, Analyzer = "standard")]
              public string name { get; set; }
              …
  • 28. Query your content – JSON
      - Query JSON examples, sent to http://localhost:9200/products/product/_search
      - Term and prefix queries are not analyzed, so some of these queries return nothing if the indexed field was lower-cased and split on whitespace by the analyzer!
          { "query": { "term": { "name": "MacBook Air" } } }
          { "query": { "prefix": { "name": "Mac" } } }
          { "query": { "range": { "price": { "from": 1000, "to": 2000 } } } }
          { "from": 0, "size": 10, "query": { "term": { "name": "MacBook Air" } } }
          { "sort": { "name": { "order": "asc" } }, "query": { "term": { "name": "MacBook Air" } } }
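Why the term query for "MacBook Air" on slide 28 can come back empty is easiest to see with a toy inverted index. The Python sketch below assumes a simple analyzer that lower-cases and splits on non-word characters; `build_index`, `term_query` and `match_query` are hypothetical helpers that imitate the behavior, not ElasticSearch APIs.

```python
import re
from collections import defaultdict


def analyze(text: str):
    """Toy analyzer: split on non-word characters and lower-case each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs: dict, field: str):
    """Inverted index: analyzed token -> set of document IDs containing it."""
    inverted = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            inverted[token].add(doc_id)
    return inverted


def term_query(inverted, term: str):
    """Exact lookup: the query term is NOT analyzed before the lookup."""
    return sorted(inverted.get(term, set()))


def match_query(inverted, text: str):
    """Analyze the query text first, then OR the lookups of its tokens."""
    hits = set()
    for token in analyze(text):
        hits |= inverted.get(token, set())
    return sorted(hits)


if __name__ == "__main__":
    docs = {"1": {"name": "MacBook Air"}, "2": {"name": "MacBook Pro"}}
    index = build_index(docs, "name")
    print(term_query(index, "MacBook Air"))   # no such token in the index
    print(term_query(index, "macbook"))
    print(match_query(index, "MacBook Air"))
```

The index only contains the tokens `macbook`, `air` and `pro`, so an exact lookup of the unanalyzed string `MacBook Air` finds nothing, while the lower-cased single token or an analyzed query does.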
  • 29. Query your content – JSON Result
      {
        "took": 1,
        "timed_out": false,
        "_shards": { "total": 5, "successful": 5, "failed": 0 },
        "hits": {
          "total": 2,
          "max_score": 0.076713204,
          "hits": [
            {
              "_index": "products",
              "_type": "Product",
              "_id": "1",
              "_score": 0.076713204,
              "_source": {
                "id": 1,
                "name": "MacBook Air",
                "price": 1099.0,
                "descr": "Some lengthy never-read description",
                "attributes": { "color": "silver", "display": 13.300000190734863, "ram": 4 }
              }
            },
            …
  • 30. Query your content – Query DSL .NET
      - Retrieve all products from an index using a MatchAll search:
          result = client.Search<Product>(s => s.MatchAll());
      - Retrieve all products by using a term query:
          result = client.Search<Product>(s => s.Query(q => q.Term(t => t.name, "macbook")));
          result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook")));
      - Search on all fields using the _all built-in property:
          result = client.Search<Product>(s => s.Query(q => q.Term("_all", "macbook")));
      - Search on a combination of fields using boolean operators (see Fiddler result):
          result = client.Search<Product>(s => s.Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook")));
  • 31. Query your content – Query DSL
      - Search on a combination of fields using boolean operators and a range filter (here on price):
          result = client.Search<Product>(s => s
              .Query(q => (q.Term("name", "macbook") || q.Term("descr", "macbook"))
                  && q.Range(r => r
                      .OnField("price")
                      .Greater(1000)
                      .LowerOrEquals(2000))));
      - Some more advanced query examples:
          - Wildcard query: use wildcards to search for relevant documents
          - Span near: search for word combinations within a certain span in the document
          - More-like-this query: finds documents which are 'like' a given set of documents using representative terms
      - More information: http://bit.ly/1A6wpKs
  • 32. Query your content – Fuzzy searches
      - Perform a fuzzy search to overcome errors in the query string:
          result = client.Search<Product>(s => s
              .Query(q => q
                  .Match(m => m
                      .Query("makboek")
                      .OnField("name")
                      .Fuzziness(10)
                      .PrefixLength(1))));
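Fuzzy matching as used on slide 32 rests on edit distance. The Python sketch below uses plain Levenshtein distance as a simplification (it is not claimed to be the exact distance ElasticSearch applies), and `fuzzy_match` with its `max_edits` and `prefix_length` parameters is a hypothetical helper that only imitates the idea of the `Fuzziness` and `PrefixLength` settings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, term: str, max_edits: int = 2, prefix_length: int = 0) -> bool:
    """True when the first prefix_length characters agree and the terms are close."""
    if query[:prefix_length] != term[:prefix_length]:
        return False
    return edit_distance(query, term) <= max_edits


if __name__ == "__main__":
    print(edit_distance("makboek", "macbook"))
    print(fuzzy_match("makboek", "macbook", max_edits=2, prefix_length=1))
```

Requiring an exact prefix keeps the number of candidate terms small: only terms that start with the same first characters need their distance computed.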
  • 33. Query your content - Paging
      - Select pages from the full result set using the From and Size parameters:
          result = client.Search<Product>(s => s
              .Query(q => q.Term("name", "macbook") || q.Term("descr", "macbook"))
              .From(0)
              .Size(1));
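The From and Size values on slide 33 behave like an offset and a page length. A minimal Python sketch, with hypothetical helper names `page_params` and `page`, shows how a zero-based page number maps onto them:

```python
def page_params(page_number: int, page_size: int) -> dict:
    """Translate a zero-based page number into from/size values."""
    return {"from": page_number * page_size, "size": page_size}


def page(hits: list, from_: int, size: int) -> list:
    """Return the slice of the full, ordered result set that from/size selects."""
    return hits[from_:from_ + size]


if __name__ == "__main__":
    all_hits = ["doc%d" % i for i in range(1, 8)]
    params = page_params(page_number=2, page_size=3)
    print(params, page(all_hits, params["from"], params["size"]))
```

The last page may contain fewer than `size` hits, and a `from` beyond the end of the result set simply yields an empty page.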
  • 34. Query your content – Hit Highlighting
      - Slide panels: .NET code, JSON result
      - Hit highlighting
      - Possible to add other pre- and post-tags on specific fields
          result = client.Search<Product>(s => s
              .Query(q => q.Term("name", "macbook"))
              .Highlight(h => h
                  .PreTags("<b>")
                  .PostTags("</b>")
                  .OnFields(f => f
                      .OnField(e => e.name))));
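What the pre- and post-tags on slide 34 do to a matching field value can be imitated in a few lines of Python. This is only a sketch of the effect, assuming whole-word, case-insensitive matching; the `highlight` function is hypothetical and says nothing about how ElasticSearch builds highlight fragments internally.

```python
import re


def highlight(text: str, term: str, pre_tag: str = "<b>", post_tag: str = "</b>") -> str:
    """Wrap every whole-word, case-insensitive occurrence of term in the tags."""
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda match: pre_tag + match.group(0) + post_tag, text)


if __name__ == "__main__":
    print(highlight("MacBook Air", "macbook"))
    print(highlight("MacBook Air", "macbook", pre_tag="<em>", post_tag="</em>"))
```

The original casing of the field value is preserved; only the tags are added around the hit.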
  • 35. Query your content – Aggregations
      - Slide panels: .NET code, JSON result
      - Aggregations group documents based on term values
      - Useful to create a faceted search interface
          result = client.Search<Product>(s => s
              .Aggregations(a => a
                  .Terms("color", st => st
                      .Field(o => o.attributes.color))));
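A terms aggregation like the one on slide 35 boils down to counting documents per field value. The Python sketch below imitates that on in-memory documents; `terms_aggregation` is a hypothetical helper, the sample products other than the MacBook Air are made up for illustration, and the `key`/`doc_count` bucket shape is modeled on the way ElasticSearch reports buckets.

```python
from collections import Counter


def terms_aggregation(docs, path: str):
    """Group documents by the value found at a dotted path and count them."""
    counts = Counter()
    for doc in docs:
        value = doc
        for key in path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
            if value is None:
                break
        if value is not None:
            counts[value] += 1
    # Most frequent value first, like facet counts in a search UI.
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


if __name__ == "__main__":
    products = [
        {"name": "MacBook Air", "attributes": {"color": "silver"}},
        {"name": "Example Laptop A", "attributes": {"color": "silver"}},  # made-up sample
        {"name": "Example Laptop B", "attributes": {"color": "gold"}},    # made-up sample
        {"name": "Example Laptop C"},  # no color: not counted
    ]
    print(terms_aggregation(products, "attributes.color"))
```

Each bucket can be rendered as a facet link ("silver (2)", "gold (1)") that narrows the search when clicked.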
  • 36. Query your content – Suggesters: Did you mean
      - Term suggester
          - Suggests terms based on edit distance (the number of operations needed to turn one term into another)
          - More info: http://bit.ly/1FDFPwr
      - Phrase suggester
          - Adds logic on top of the term suggester to select entire corrected phrases, instead of individual tokens, weighted using n-gram language models
          - Provides better suggestions because of co-occurrence & frequency
          - More info: http://bit.ly/1FbfAKg
  • 37. Query your content – Suggesters: Search as you type
      - Completion suggester
          - A so-called prefix suggester
          - Does not do spell correction like the term or phrase suggesters, but allows basic auto-complete functionality
          - Uses FST models and makes them part of the index for faster querying
          - More info: http://bit.ly/1HwFKbO
      - Example terms: hotel, marriot, mercure, munchen and munich
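The prefix behavior of the completion suggester on slide 37 can be approximated with a sorted list and binary search. This Python sketch is a stand-in only: a sorted list is not an FST, and the `PrefixSuggester` class is hypothetical. It uses the example terms from the slide.

```python
import bisect


class PrefixSuggester:
    """Auto-complete over a fixed vocabulary using a sorted list and bisect."""

    def __init__(self, terms):
        # Lower-case and de-duplicate so lookups are case-insensitive.
        self._terms = sorted({term.lower() for term in terms})

    def suggest(self, prefix: str, size: int = 5):
        """Return up to size terms that start with prefix, in sorted order."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._terms, prefix)
        suggestions = []
        for term in self._terms[start:]:
            if not term.startswith(prefix) or len(suggestions) == size:
                break
            suggestions.append(term)
        return suggestions


if __name__ == "__main__":
    suggester = PrefixSuggester(["hotel", "marriot", "mercure", "munchen", "munich"])
    for typed in ("m", "mu", "mun", "h"):
        print(typed, suggester.suggest(typed))
```

Because all terms sharing a prefix sit next to each other in sorted order, each keystroke costs one binary search plus a short scan, and a misspelled prefix returns nothing, since no spell correction is involved.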

Editor's notes

  1. Text can appear in multiple places; descriptions need to be created; hit highlighting; relevance needs to be calculated
  2. Search as you type Did you mean Facets
  3. Fuzzy searches; synonyms
  4. Aggregates
  5. an Open Source (Apache 2) distributed (NoSQL) RESTful search engine built on top of Lucene