SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Query-time Nonparametric
Regression with Temporally
Bounded Models
Gus Heck & David Smiley
#Activate18 #ActivateSearch
Agenda
• Presenters
• Query time Nonparametric Regression
• Demo – Suggesting tagged images
• Time Routed Aliases
• Demo – Creating a Time Routed Alias
Presenters
Patrick (Gus) Heck
• Solr Contributor since 2013
• Consulting since 2012
• Enterprise Search since
2010
• Apache Ant contributor
2003-2004
• Web Applications since 2003
David Smiley
• Lucene/Solr Committer
(PMC)
• Consulting
• Author of first book on Solr
• Presentations & Training
In the beginning..
Dave Mackey - Vision for an online coaching
platform leveraging machine learning
technology to help companies help their
middle managers.
Engaged me first as Chief Architect and later
as CTO to bring this vision to reality.
It worked, but was not funded...
Brief Overview of Simply Coached
Simply Coached provided online career coaching
system
Key Goal was connecting users with relevant
curated content
Relevance was determined by
• Identifying topics the user is interested in by
observing and learning from click-throughs
• Several other customized metrics
Learning What Interests the User
Goals:
• Suggest things of interest, things the user is willing
to learn
• Make predictions based on the user’s behavior
• Unique prediction per user
• Predictions based on entire user base
• Avoid calcification of the predictive model, old data
needs to be sunset regularly.
Modeling Candidate - Neural Nets
Simply Coached started right at the dawn of
the “3rd wave” of neural networks
Limitations of Neural Networks
• They don’t adapt well to a changing
problem
• We would need to retrain predictive
network regularly
• Require massive data for training
Modeling Candidate - Regression
Not as “cool” as deep learning, but more suitable
• Can converge faster with less data
• But, most techniques are assuming that the
residuals are normally distributed
• Many of our features will be binary
• Normality is right out the window
I decided to look for a Non-Parametric Regression
technique
Non-Parametric Multiplicative Regression
• Can predict continuous variable
• Good for sorting/ranking
• Kernel smoother, presence/absence
kernels already well known
• Non-parametric
Nice discrete sub-calculations...
Used in Habitat Ecology for predicting habitat suitability
https://ir.library.oregonstate.edu/concern/defaults/z029p910z
NPMR Equations
Performance across multiple factors (features)
Local Mean estimator
Example kernels continuous or
presence/absence:
Key realization: the products and sums for i,j can
pre precalculated:
I called these pre-calculated portions “Partials” and
added temporal metadata
But wait... What are we predicting?
Needed a continuous statistic to predict
Thesis: Given a good teaser etc., the user
will click more rapidly on more interesting
articles.
Suggest documents in order of predicted
reaction time
Demo
Three users:
• Sporty Black likes black sports cars
• Cooper Blue likes blue coupes
• Racy Yeller likes yellow formula cars
System was trained by presenting ~20 sets of 9
randomly selected cars and clicking with
varying speed
Images copyright Andrew Mauldin, used with permission
Streaming
Expression!
As mentioned,
NPMR prediction
was only part of the
overall product
It occupies the top
line of Boxes A1 and
B1
The actual demo expression
Streaming Expressions
The final sorted list of documents can be calculated
The bold portion is the final summation
Continuously running, recalculating
sums/products once a minute
Included traditional document indexing
(article scanner) and also streaming
expression update()
(send_per_user_sum_update)
By end, running continuously for months
at a time unattended.
http://www.jesterj.org for more info about
JesterJ
Indexing With JesterJ
Adapting and Scaling
• Adapt at query time with filters, could also
filter in other ways with additional metadata
• Four Dimensions:
1. Content to recommend - constant set
2. Users - per user pricing to the rescue
3. Activity - price for average activity level
4. Time - data accumulates over time
Time Routed Aliases are a solution to #4
Time Routed Aliases
David Smiley
Have a Lot of Timestamped Data?
And you need keyword search and/or analytics
(faceting/aggregations) capabilities.
Examples:
• Logs
• Sensor data (IoT)
• Social media posts
Characteristics: tons of docs, continuously flowing in, limited
retention
Strategies
Hash Partitioned:
• One collection, hash routed shards (built-in)
Time Partitioned:
• One collection, time routed shards (DIY)
• Time partitioned collections (DIY)
• Time partitioned collections via TRAs (built-in)
“Partitioned” = “Routed” = “Organized”
One Collection, Hash Routed Shards
Hash on ID, even distribution
router.name=compositeId
+ Easy (default)
+ High write throughput
- Deleted dead-weight
- Poor realtime search
- Queries execute everywhere
- Inflexible sizing
- … thus Expensive (uniform
hardware requirements)
Deleted docs
Live docs
See DocExpirationUpdateProcessorFactory
shard1 shard4 shard7
2 5 8
9
Time Partitioned Data
Generally better...
• New data always goes to most recent partition
• Variable or equal sized (depends on approach)
• Addresses all negatives of hash routing
• Write throughput can be addressed by sharding within partition
• Opportunities to “optimize” aged indexes
• Flexibility in assigning partitions to better or cheaper hardware
• … all leads to cost savings
How?...
2017-01 2017-02 2017-03 2017-04
2017-05
2017-06 2017-07 2017-08
2017-09
Time Partitioned Data
Implementation Strategies:
• One Collection, router.name=implicit
• See this sample code: (DIY) https://github.com/cga-harvard/hhypermap-
bop/tree/master/bop-core/solr-
plugins/src/main/java/edu/harvard/gis/hhypermap/bop/solrplugins by me
• Multiple Collections
• See this blog: (DIY)
http://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for-
really-big-data/ by Mark Miller
• TRAs… (built-in)
2017-01 2017-02 2017-03 2017-04
2017-05
2017-06 2017-07 2017-08
2017-09
About Collection Aliases
SolrCloud supports collection aliases
Aliases point to one or more collections
Ex: Alias “tra-demo” → 2017-09, 2017-08, 2017-07, ...
2017-01 2017-02 2017-03 2017-04
2017-05
2017-06 2017-07 2017-08
2017-09
Time Routed Aliases
Aliases have new tricks up their sleeves…
• Aliases now have metadata (API to read & edit)
• Can create a “time routed alias” w/ first collection
• Collections in a TRA can
• Route update requests to the correct collections
• Adds/deletes collections automatically
• data driven
• just-in-time or preemptively
Mutable configuration, mostly
New
in Solr 7.3
TRA Creation (using V2 API)
curl http://localhost:8983/api/c -H
'Content-type:application/json' -d '{
"create-alias":{
"name": "tra-demo",
"router": {
"name": "time",
"field": "evt_dt",
"start": "2018-01-01T00:00:00Z",
"interval": "+1DAY",
"autoDeleteAge": "-2DAY"
},
"create-collection": {
"config": "_default",
"numShards": 2,
"maxShardsPerNode": 2
}
}
}'
TRAs, the fine print
TODOs
• Size capped (e.g. 5M docs / collection)
• Query routing to subset of collections
• Better “auto-scaling” tie-ins
• “optimize” of older collections
• TRA deletion, ease of use
TRAs may not be for everyone
• Partitioning is strict and must be adjacent (no gaps)
• Doesn’t work with CDCR
Thank you!
Gus Heck & David Smiley
#Activate18 #ActivateSearch

Weitere ähnliche Inhalte

Was ist angesagt?

20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetupRik Van Bruggen
 
Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training CypherMax De Marzi
 
Intro to graphs for HR analytics
Intro to graphs for HR analyticsIntro to graphs for HR analytics
Intro to graphs for HR analyticsRik Van Bruggen
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoBitly
 
Getting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jGetting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jSuroor Wijdan
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jMax De Marzi
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to CypherNeo4j
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageNeo4j
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j FundamentalsMax De Marzi
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
 
Building Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopBuilding Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopJayant Shekhar
 
Data modeling with neo4j tutorial
Data modeling with neo4j tutorialData modeling with neo4j tutorial
Data modeling with neo4j tutorialMax De Marzi
 
Neo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseNeo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseMindfire Solutions
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph databaseAmirhossein Saberi
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jMax De Marzi
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Lucidworks
 
Using graphs for recommendations
Using graphs for recommendationsUsing graphs for recommendations
Using graphs for recommendationsRik Van Bruggen
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsMaya Hristakeva
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...jexp
 

Was ist angesagt? (20)

20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup
 
Neo4j Training Cypher
Neo4j Training CypherNeo4j Training Cypher
Neo4j Training Cypher
 
Intro to graphs for HR analytics
Intro to graphs for HR analyticsIntro to graphs for HR analytics
Intro to graphs for HR analytics
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah Guido
 
Getting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4jGetting started with Graph Databases & Neo4j
Getting started with Graph Databases & Neo4j
 
Bootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4jBootstrapping Recommendations with Neo4j
Bootstrapping Recommendations with Neo4j
 
Intro to Cypher
Intro to CypherIntro to Cypher
Intro to Cypher
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Neo4j Fundamentals
Neo4j FundamentalsNeo4j Fundamentals
Neo4j Fundamentals
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, Lucidworks
 
Building Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopBuilding Recommendation Platforms with Hadoop
Building Recommendation Platforms with Hadoop
 
Data modeling with neo4j tutorial
Data modeling with neo4j tutorialData modeling with neo4j tutorial
Data modeling with neo4j tutorial
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
 
Neo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseNeo4J : Introduction to Graph Database
Neo4J : Introduction to Graph Database
 
Introducing Neo4j graph database
Introducing Neo4j graph databaseIntroducing Neo4j graph database
Introducing Neo4j graph database
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4j
 
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
 
Using graphs for recommendations
Using graphs for recommendationsUsing graphs for recommendations
Using graphs for recommendations
 
Sparking Science up with Research Recommendations
Sparking Science up with Research RecommendationsSparking Science up with Research Recommendations
Sparking Science up with Research Recommendations
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...
 

Ähnlich wie Query-time Nonparametric Regression with Temporally Bounded Models - Patrick Heck, Needham Software & David Smiley, D W Smiley LLC

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample TrackingBruce Kozuma
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningSergey Karayev
 
Developer Night - Opticon18
Developer Night - Opticon18Developer Night - Opticon18
Developer Night - Opticon18Optimizely
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...Dataiku
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Lucidworks
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using sparkDatabricks
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDBMongoDB
 

Ähnlich wie Query-time Nonparametric Regression with Temporally Bounded Models - Patrick Heck, Needham Software & David Smiley, D W Smiley LLC (20)

kaggle_meet_up
kaggle_meet_upkaggle_meet_up
kaggle_meet_up
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
2019-04-17 Bio-IT World G Suite-Jira Cloud Sample Tracking
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
 
Developer Night - Opticon18
Developer Night - Opticon18Developer Night - Opticon18
Developer Night - Opticon18
 
Recsys 2016
Recsys 2016Recsys 2016
Recsys 2016
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...Dataiku   hadoop summit - semi-supervised learning with hadoop for understand...
Dataiku hadoop summit - semi-supervised learning with hadoop for understand...
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 

Mehr von Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mehr von Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 

Query-time Nonparametric Regression with Temporally Bounded Models - Patrick Heck, Needham Software & David Smiley, D W Smiley LLC

  • 1. Query-time Nonparametric Regression with Temporally Bounded Models Gus Heck & David Smiley #Activate18 #ActivateSearch
  • 2. Agenda • Presenters • Query time Nonparametric Regression • Demo – Suggesting tagged images • Time Routed Aliases • Demo – Creating a Time Routed Alias
  • 3. Presenters Patrick (Gus) Heck • Solr Contributor since 2013 • Consulting since 2012 • Enterprise Search since 2010 • Apache Ant contributor 2003-2004 • Web Applications since 2003 David Smiley • Lucene/Solr Committer (PMC) • Consulting • Author of first book on Solr • Presentations & Training
  • 4. In the beginning.. Dave Mackey - Vision for an online coaching platform leveraging machine learning technology to help companies help their middle managers. Engaged me first as Chief Architect and later as CTO to bring this vision to reality. It worked, but was not funded...
  • 5. Brief Overview of Simply Coached Simply Coached provided online career coaching system Key Goal was connecting users with relevant curated content Relevance was determined by • Identifying topics the user is interested in by observing and learning from click-throughs • Several other customized metrics
  • 6. Learning What Interests the User Goals: • Suggest things of interest, things the user is willing to learn • Make predictions based on the user’s behavior • Unique prediction per user • Predictions based on entire user base • Avoid calcification of the predictive model, old data needs to be sunset regularly.
  • 7. Modeling Candidate - Neural Nets Simply Coached started right at the dawn of the “3rd wave” of neural networks Limitations of Neural Networks • They don’t adapt well to a changing problem • We would need to retrain predictive network regularly • Require massive data for training
  • 8. Modeling Candidate - Regression Not as “cool” as deep learning, but more suitable • Can converge faster with less data • But, most techniques are assuming that the residuals are normally distributed • Many of our features will be binary • Normality is right out the window I decided to look for a Non-Parametric Regression technique
  • 9. Non-Parametric Multiplicative Regression • Can predict continuous variable • Good for sorting/ranking • Kernel smoother, presence/absence kernels already well known • Non-parametric Nice discrete sub-calculations... Used in Habitat Ecology for predicting habitat suitability https://ir.library.oregonstate.edu/concern/defaults/z029p910z
  • 10. NPMR Equations Performance across multiple factors (features) Local Mean estimator Example kernels continuous or presence/absence: Key realization: the products and sums for i,j can pre precalculated: I called these pre-calculated portions “Partials” and added temporal metadata
  • 11. But wait... What are we predicting? Needed a continuous statistic to predict Thesis: Given a good teaser etc., the user will click more rapidly on more interesting articles. Suggest documents in order of predicted reaction time
  • 12. Demo Three users: • Sporty Black likes black sports cars • Cooper Blue likes blue coupes • Racy Yeller likes yellow formula cars System was trained by presenting ~20 sets of 9 randomly selected cars and clicking with varying speed Images copyright Andrew Mauldin, used with permission
  • 13. Streaming Expression! As mentioned, NPMR prediction was only part of the overall product It occupies the top line of Boxes A1 and B1
  • 14. The actual demo expression
  • 15. Streaming Expressions The final sorted list of documents can be calculated The bold portion is the final summation
  • 16. Continuously running, recalculating sums/products once a minute Included traditional document indexing (article scanner) and also streaming expression update() (send_per_user_sum_update) By end, running continuously for months at a time unattended. http://www.jesterj.org for more info about JesterJ Indexing With JesterJ
  • 17. Adapting and Scaling • Adapt at query time with filters, could also filter in other ways with additional metadata • Four Dimensions: 1. Content to recommend - constant set 2. Users - per user pricing to the rescue 3. Activity - price for average activity level 4. Time - data accumulates over time Time Routed Aliases are a solution to #4
  • 19. Have a Lot of Timestamped Data? And you need keyword search and/or analytics (faceting/aggregations) capabilities. Examples: • Logs • Sensor data (IoT) • Social media posts Characteristics: tons of docs, continuously flowing in, limited retention
  • 20. Strategies Hash Partitioned: • One collection, hash routed shards (built-in) Time Partitioned: • One collection, time routed shards (DIY) • Time partitioned collections (DIY) • Time partitioned collections via TRAs (built-in) “Partitioned” = “Routed” = “Organized”
  • 21. One Collection, Hash Routed Shards Hash on ID, even distribution router.name=compositeId + Easy (default) + High write throughput - Deleted dead-weight - Poor realtime search - Queries execute everywhere - Inflexible sizing - … thus Expensive (uniform hardware requirements) Deleted docs Live docs See DocExpirationUpdateProcessorFactory shard1 shard4 shard7 2 5 8 9
  • 22. Time Partitioned Data Generally better... • New data always goes to most recent partition • Variable or equal sized (depends on approach) • Addresses all negatives of hash routing • Write throughput can be addressed by sharding within partition • Opportunities to “optimize” aged indexes • Flexibility in assigning partitions to better or cheaper hardware • … all leads to cost savings How?... 2017-01 2017-02 2017-03 2017-04 2017-05 2017-06 2017-07 2017-08 2017-09
  • 23. Time Partitioned Data Implementation Strategies: • One Collection, router.name=implicit • See this sample code: (DIY) https://github.com/cga-harvard/hhypermap- bop/tree/master/bop-core/solr- plugins/src/main/java/edu/harvard/gis/hhypermap/bop/solrplugins by me • Multiple Collections • See this blog: (DIY) http://blog.cloudera.com/blog/2013/10/collection-aliasing-near-real-time-search-for- really-big-data/ by Mark Miller • TRAs… (built-in) 2017-01 2017-02 2017-03 2017-04 2017-05 2017-06 2017-07 2017-08 2017-09
  • 24. About Collection Aliases SolrCloud supports collection aliases Aliases point to one or more collections Ex: Alias “tra-demo” → 2017-09, 2017-08, 2017-07, ... 2017-01 2017-02 2017-03 2017-04 2017-05 2017-06 2017-07 2017-08 2017-09
  • 25. Time Routed Aliases Aliases have new tricks up their sleeves… • Aliases now have metadata (API to read & edit) • Can create a “time routed alias” w/ first collection • Collections in a TRA can • Route update requests to the correct collections • Adds/deletes collections automatically • data driven • just-in-time or preemptively Mutable configuration, mostly New in Solr 7.3
  • 26. TRA Creation (using V2 API) curl http://localhost:8983/api/c -H 'Content-type:application/json' -d '{ "create-alias":{ "name": "tra-demo", "router": { "name": "time", "field": "evt_dt", "start": "2018-01-01T00:00:00Z", "interval": "+1DAY", "autoDeleteAge": "-2DAY" }, "create-collection": { "config": "_default", "numShards": 2, "maxShardsPerNode": 2 } } }'
  • 27. TRAs, the fine print TODOs • Size capped (e.g. 5M docs / collection) • Query routing to subset of collections • Better “auto-scaling” tie-ins • “optimize” of older collections • TRA deletion, ease of use TRAs may not be for everyone • Partitioning is strict and must be adjacent (no gaps) • Doesn’t work with CDCR
  • 28. Thank you! Gus Heck & David Smiley #Activate18 #ActivateSearch