Webinar: Modern Techniques for Better Search Relevance with Fusion

Modern Techniques for Better Search Relevance in Fusion
Grant Ingersoll
CTO, Lucidworks
November 15, 2017

😊 iPad case
"ipad accessory"~3 OR "ipad case"~5

Index Time:
if (doc.name.contains("Vikings")) { doc.boost = 100 }
OR
Query Time:
q:(MAIN QUERY) OR (name:Vikings)^10

TF*IDF

• Term Frequency: “How well a term describes a document”
  • Measure: how often a term occurs per document

• Inverse Document Frequency: “How important is a term overall”
  • Measure: how rare the term is across all documents
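The two measures can be sketched in a few lines of Python. This is a minimal illustration, not Lucene's implementation: the function names and the toy `corpus` are made up for the example, and the smoothed form `1 + log(numDocs / (docFreq + 1))` is assumed for IDF.

```python
import math
from collections import Counter


def term_frequency(term, doc_tokens):
    """How often the term occurs in one document."""
    return Counter(doc_tokens)[term]


def inverse_document_frequency(term, corpus):
    """How rare the term is across all documents (assumed smoothed form)."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return 1.0 + math.log(len(corpus) / (doc_freq + 1))


def tf_idf(term, doc_tokens, corpus):
    """Weight of a term in a document: frequent locally, rare globally."""
    return term_frequency(term, doc_tokens) * inverse_document_frequency(term, corpus)


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["ipad", "case", "leather"],
    ["ipad", "accessory", "stylus"],
    ["phone", "case"],
    ["ipad", "ipad", "charger"],
]

for term in ("ipad", "leather"):
    print(term, round(inverse_document_frequency(term, corpus), 3))
```

A term that appears in most documents ("ipad" here) gets a lower IDF than one that appears in a single document ("leather"), so it contributes less to the score.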

BM25 (aka Okapi)

$$\mathrm{Score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \cdot \frac{\mathrm{tf}(t \text{ in } d) \cdot (k + 1)}{\mathrm{tf}(t \text{ in } d) + k \cdot \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}$$

Where:

t = term; d = document; q = query; i = index

$$\mathrm{tf}(t \text{ in } d) = \mathrm{numTermOccurrencesInDocument}^{1/2}$$

$$\mathrm{idf}(t) = 1 + \log\left(\frac{\mathrm{numDocs}}{\mathrm{docFreq} + 1}\right)$$

$$|d| = \sum_{t \in d} 1$$

$$\mathrm{avgdl} = \frac{\sum_{d \in i} |d|}{\sum_{d \in i} 1}$$

k = Free parameter. Usually ~1.2 to 2.0. Increases term frequency saturation point.

b = Free parameter. Usually ~0.75. Increases impact of document normalization.
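The formula on this slide can be transcribed directly into Python. The sketch below follows the slide's definitions as written (square-root term frequency, `1 + log(numDocs / (docFreq + 1))` for IDF, document length as a token count); it is not the code Lucene/Solr runs, and the function name and toy `corpus` are hypothetical.

```python
import math


def bm25_score(query_terms, doc_tokens, corpus, k=1.2, b=0.75):
    """Score one document against a query using the slide's BM25 definitions."""
    num_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / num_docs  # average document length
    doc_len = len(doc_tokens)                       # |d| = number of tokens
    score = 0.0
    for term in query_terms:
        count = doc_tokens.count(term)
        if count == 0:
            continue  # a term absent from the document contributes nothing
        tf = math.sqrt(count)
        doc_freq = sum(1 for d in corpus if term in d)
        idf = 1.0 + math.log(num_docs / (doc_freq + 1))
        # Length normalization: b controls how much long documents are penalized.
        norm = k * (1.0 - b + b * doc_len / avgdl)
        score += idf * (tf * (k + 1.0)) / (tf + norm)
    return score


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["ipad", "case"],
    ["ipad", "stylus"],
    ["phone", "case", "black", "leather", "slim"],
    ["ipad", "charger"],
]

for doc in corpus:
    print(doc, round(bm25_score(["case"], doc, corpus), 3))
```

With the same number of matches, the shorter document scores higher; setting `b=0` switches length normalization off, and a larger `k` lets repeated occurrences keep adding to the score for longer before saturating.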

Measure, Measure, Measure

• Capture and log pretty much everything
  • Searches, clicks, time on page, seen/not, etc.
• Precision: Of those shown, what’s relevant?
• Recall: Of all that’s relevant, what was found?
• NDCG: Account for position
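A minimal sketch of the three metrics, assuming relevance judgments are already available (from logged clicks or human ratings). The function names are made up for illustration, and NDCG here normalizes against the ideal ordering of the shown results only, which is a common simplification.

```python
import math


def precision(shown, relevant):
    """Of those shown, what fraction is relevant? Assumes `shown` has no duplicates."""
    if not shown:
        return 0.0
    return sum(1 for doc_id in shown if doc_id in relevant) / len(shown)


def recall(shown, relevant):
    """Of all that is relevant, what fraction was found?"""
    if not relevant:
        return 0.0
    return len(set(shown) & set(relevant)) / len(relevant)


def dcg(gains):
    """Discounted cumulative gain: gains lower in the list count for less."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for a single query.
shown = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7"}
print(precision(shown, relevant), recall(shown, relevant))
print(ndcg([1, 2, 3]))  # the most relevant result is ranked last
```

Precision and recall ignore ordering; NDCG drops below 1.0 as soon as a more relevant result sits beneath a less relevant one.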


Magic
Guessing
Core Information Theory (aka Lucene/Solr)
Search Aids (Facets, Did You Mean, Highlighting)
Machine Learning/NLP (Clicks, Crowd Sourcing, Recs, Personalization, User feedback)
Rules, Domain Specific Knowledge
fuhgeddaboudit

Next Generation Relevance

Content
• Core Solr capabilities: text matching, faceting, spell checking, highlighting
• Business Rules for content: landing pages, boost/block, promotions, etc.

Collaboration
• Leverage collective intelligence to predict what users will do based on historical, aggregated data
• Recommenders, Popularity, Search Paths

Context
• Who are you? Where are you? What have you done previously?
• User/Market Segmentation, Roles, Security, Personalization

But What About the Real World? Indexing Edition

Machine Learning/NLP: NER, Topic Detection, Clustering, Word2Vec, etc.
Domain Rules: Synonyms, Regexes, Lexical Resources
Extraction
Load Into Spark
Build W2V, PageRank, Topic, Clustering Models
Offline Content Models

Real World? Query Edition

😊 iPad case
Query Intent: Strategic, Tactical, Semantic
Head/Tail/Clickstream/Recommenders
User Factors: Segmentation, Location, History, Profile, Security
Parse
Domain Specific Rules
Transform
Results
…
Cascading Rerankers: Learn To Rank (multi-model), Bias corrections

Real World? Users Edition

Load Into Spark
Signals
Query Analysis
Recommenders/Personalization
😊 iPad case
Query Edition
Raw Models
Clickstream Models

The Perfect(?!?) Query*

(Exact/Original Match)^X
(Sloppy Phrase)~M^Y
(AND Q)^Z
(OR Q)^XX
(Expansions/Click/Head/Tail Boosts)^YY
(Personalization Biases)^ZZ
({!ltr model=…})
Filters+Options: security, rules, hard preferences, categories

[Diagram labels: Precision, Recall, Learn to Rank]

X > Y > Z > XX
All weights can be learned

YMMV! Caveat Emptor!

* Note: there are a lot of variations on this. edismax handles most.
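The first four clauses of the layered query can be sketched as a small string builder. This is only an illustration in Lucene query syntax: the function name, the default slop, and the weight values are assumptions chosen to respect X > Y > Z > XX. The expansion, personalization, and `{!ltr}` clauses are left out, and special characters in the terms are not escaped.

```python
def build_layered_query(terms, slop=3, weights=None):
    """Combine exact, sloppy-phrase, AND, and OR clauses with decreasing boosts."""
    # Hypothetical weights; in practice these would be tuned or learned.
    w = {"exact": 100, "sloppy": 50, "and": 20, "or": 5}
    if weights:
        w.update(weights)
    phrase = " ".join(terms)
    clauses = [
        f'"{phrase}"^{w["exact"]}',                # (Exact/Original Match)^X
        f'"{phrase}"~{slop}^{w["sloppy"]}',        # (Sloppy Phrase)~M^Y
        f'({" AND ".join(terms)})^{w["and"]}',     # (AND Q)^Z
        f'({" OR ".join(terms)})^{w["or"]}',       # (OR Q)^XX
    ]
    return " OR ".join(clauses)


print(build_layered_query(["ipad", "case"]))
```

The high-boost clauses at the top favor precision and the broad OR clause at the bottom favors recall, so a document matching the exact phrase outranks one that merely contains one of the terms.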

Experimentation, Not Editorialization

• Don’t take my word for it, experiment!
• A/B Tests, Multi-arm Bandits
• Good primer:
  • http://www.slideshare.net/InfoQ/online-controlled-experiments-introduction-insights-scaling-and-humbling-statistics
• Rules are fine, as long as they are contained, have a lifespan, and are measured for effectiveness
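One simple way to read an A/B test on click-through rate is a two-proportion z-test. The sketch below is a generic statistics example, not anything Fusion-specific; the function name and the click and view counts are made up.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant B (new ranking) vs. control A.
z, p = two_proportion_z_test(clicks_a=100, views_a=1000, clicks_b=150, views_b=1000)
print(round(z, 2), round(p, 4))
```

A small p-value suggests the difference in click-through rate is unlikely to be noise. Multi-arm bandits go a step further by shifting traffic toward the better variant while the experiment is still running.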

Show Us Already, Will You!

Lucidworks Fusion Product Suite
The Lucidworks platform provides all of the components needed to create and run smart enterprise and consumer applications.
• Create rich UI with modular components for web and mobile
• Surface the insights that matter most with the power of machine learning and artificial intelligence
• Highly scalable search engine and NoSQL datastore that gives you instant access to all your data
• Combine the power of the Fusion stack with the simplicity you’d expect in a SaaS-based application

Lucidworks Fusion Architecture
Web App Mobile
BI/Analytics
Logs File
Web Database
Box/Dropbox
Elasticsearch
SDK
Sharepoint
Slack
Jive
Connectors
Admin UI
Search
Analytics
Visualization
REST/SQL
Hadoop
Google Drive
Security Built In
Proven Speed CDCR
Extensible Scalable Responsive
NLP: NER, Phrases, POS
Query Intent & Doc Classification
Recommenders
Anomaly Detection
Signals and Query Analytics
Clustering
A/B Testing
ETL and Query Pipelines
Alerting and Messaging
SQL and Catalog
Scheduling
Connectors/Federation
Import/Export
Custom Jobs
Rules
Topic Detection
Custom
Search
Devs
Data
Scientists
Business
Users
Cross Cutting Features
HDFS (Optional)

Fusion: Meeting the Search Challenge
Relevance and Discovery: Personalization, Recommendations, Query Intent, Experimentation (A/B)
Business Support: Query/Doc Simulations, Rules, Analytics, User Interface
Intelligence: Signal Proc., Machine Learning, NLP, Math/Stats
Open & Scalable: Proven Search, Extensible, Simplified DevOps, Real Time

Demo Details
• Ecommerce Data Set
  - Product Catalog: ~1.3M
  - Signals: 1 month of query, document logs
• Fusion 3.1 + Rules (open source add-on module) + Solr LTR contrib
• Spark 2.x, Solr 6.5
• App Studio UI (http://twigkit.com)

Resources

• http://lucidworks.com
• grant@
• http://lucene.apache.org/solr
• http://spark.apache.org/
• https://github.com/lucidworks/spark-solr
• https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank
• Bloomberg talk on LTR: https://www.youtube.com/watch?v=M7BKwJoh96s
