SlideShare ist ein Scribd-Unternehmen logo
1 von 12
Downloaden Sie, um offline zu lesen
Using SweetSpotSimilarity for
Solr Fulltext Indexing
(A Public Service Message)
Jay Luker
SAO/NASA Astrophysics Data System
http://adsabs.harvard.edu/
From http://lucene.apache.org/java/2_9_3/api/all/org/apache/lucene/search/Similarity.html
Score for a
particular
result
Buncha stuff you probably
ought to read up on.
"encapsulates a
few (indexing
time) boost and
length factors"
{
norm(t,d)
Includes...
● Document boost - e.g. <doc boost="2.5">
● Field boost - e.g. <field boost="3.0">
and what we're concerned with...
● lengthNorm(field) - computed at index time based
on the number of tokens in the field of the input
document.
These factors, multiplied together, make up the norm(t,
d) for a given document
lengthNorm(String fieldName, int numTokens)
"Matches in longer fields are less precise, so implementations of
this method usually return smaller values when numTokens is
large, and larger values when numTokens is small."
Translation:
SHORTER DOCUMENTS SCORE HIGHER
from the javadoc:
changes this ...
to this ...
lengthNorm(L) =
1
sqrt(L)
SweetSpotSimilarity
lucene/contrib/misc/...
lengthNorm(L) =
1
sqrt(steepness*(|L-min|+|L-max|-(max-min))+1)
min/max = your "sweet spot" range. Lengths within
this range compute to a constant, i.e., 1.
steepness = controls the curve up to and down from
the sweet spot "plateau".
(termcounts for all ADS's searchable fulltext since 01/2000)
<similarity class="org.ads.solr.SweetSpotSimilarityFactory">
<str name="min">1000</str>
<str name="max">20000</str>
<str name="steepness">0.5</str>
</similarity>
In schema.xml
public class SweetSpotSimilarityFactory extends SimilarityFactory {
public static final Logger log = 
LoggerFactory.getLogger(SolrResourceLoader.class);
@Override
public Similarity getSimilarity() {
SweetSpotSimilarity sim = new SweetSpotSimilarity();
int max = this.params.getInt("max");
int min = this.params.getInt("min");
float steepness = this.params.getFloat("steepness");
log.info("max: " + max);
log.info("min: " + min);
log.info("steepness: " + steepness);
// yuck! hardcoded field settings for now
sim.setLengthNormFactors("body", min, max, steepness, true);
return sim;
}
}
Thanks!
Further reading:
"Lucene and Juru at TREC 2007: 1-Million Queries Track"
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
Also, check out our Blacklight beta search!
http://labs.adsabs.harvard.edu/fulltext

Weitere ähnliche Inhalte

Was ist angesagt?

Similarity Measurement Preliminary Results
Similarity  Measurement  Preliminary ResultsSimilarity  Measurement  Preliminary Results
Similarity Measurement Preliminary Resultsxiaojuzheng
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clusteringguest0edcaf
 
Query Optimization
Query OptimizationQuery Optimization
Query Optimizationrohitsalunke
 
ML2014_Poster_ TextClusteringDemo
ML2014_Poster_ TextClusteringDemoML2014_Poster_ TextClusteringDemo
ML2014_Poster_ TextClusteringDemoGeorge Simov
 
LoryfelNunezInsight
LoryfelNunezInsightLoryfelNunezInsight
LoryfelNunezInsightLory Nunez
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 
RDBMS
RDBMSRDBMS
RDBMSsowfi
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...IJCSEA Journal
 
Modelling Accessibility Performance in LTE networks, An Analytics Methodology
Modelling Accessibility Performance in LTE networks, An Analytics MethodologyModelling Accessibility Performance in LTE networks, An Analytics Methodology
Modelling Accessibility Performance in LTE networks, An Analytics Methodologyalien_gmx
 
Poster-SetCoverAlgorithm
Poster-SetCoverAlgorithmPoster-SetCoverAlgorithm
Poster-SetCoverAlgorithmDivya Jain
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTDavide Gallitelli
 

Was ist angesagt? (18)

Similarity Measurement Preliminary Results
Similarity  Measurement  Preliminary ResultsSimilarity  Measurement  Preliminary Results
Similarity Measurement Preliminary Results
 
Textmining Retrieval And Clustering
Textmining Retrieval And ClusteringTextmining Retrieval And Clustering
Textmining Retrieval And Clustering
 
Query Optimization
Query OptimizationQuery Optimization
Query Optimization
 
Os
OsOs
Os
 
ML2014_Poster_ TextClusteringDemo
ML2014_Poster_ TextClusteringDemoML2014_Poster_ TextClusteringDemo
ML2014_Poster_ TextClusteringDemo
 
LoryfelNunezInsight
LoryfelNunezInsightLoryfelNunezInsight
LoryfelNunezInsight
 
LoryfelNunez
LoryfelNunezLoryfelNunez
LoryfelNunez
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 
RDBMS
RDBMSRDBMS
RDBMS
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
Journal paper 1
Journal paper 1Journal paper 1
Journal paper 1
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
 
Modelling Accessibility Performance in LTE networks, An Analytics Methodology
Modelling Accessibility Performance in LTE networks, An Analytics MethodologyModelling Accessibility Performance in LTE networks, An Analytics Methodology
Modelling Accessibility Performance in LTE networks, An Analytics Methodology
 
Poster-SetCoverAlgorithm
Poster-SetCoverAlgorithmPoster-SetCoverAlgorithm
Poster-SetCoverAlgorithm
 
Entropy scaling search method
Entropy scaling search methodEntropy scaling search method
Entropy scaling search method
 
Query trees
Query treesQuery trees
Query trees
 
Ghost
GhostGhost
Ghost
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDT
 

Ähnlich wie Using SweetSpotSimilarity for Solr Fulltext Indexing

Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learningtelss09
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluationavniS
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code executionAlexander Decker
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code executionAlexander Decker
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Declarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTDeclarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTdiannepatricia
 
Implementation of query optimization for reducing run time
Implementation of query optimization for reducing run timeImplementation of query optimization for reducing run time
Implementation of query optimization for reducing run timeAlexander Decker
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0 Databricks
 
엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630Yong Joon Moon
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networksHarshavardhan Achrekar
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseBenjamin Bengfort
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)maclean liu
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithmTrupti Agrawal
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachFindwise
 

Ähnlich wie Using SweetSpotSimilarity for Solr Fulltext Indexing (20)

Aggarwal Draft
Aggarwal DraftAggarwal Draft
Aggarwal Draft
 
Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learning
 
Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 
Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Query optimization to improve performance of the code execution
Query optimization to improve performance of the code executionQuery optimization to improve performance of the code execution
Query optimization to improve performance of the code execution
 
11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution11.query optimization to improve performance of the code execution
11.query optimization to improve performance of the code execution
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Declarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemTDeclarative Multilingual Information Extraction with SystemT
Declarative Multilingual Information Extraction with SystemT
 
Implementation of query optimization for reducing run time
Implementation of query optimization for reducing run timeImplementation of query optimization for reducing run time
Implementation of query optimization for reducing run time
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
 
엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630엘라스틱서치 적합성 이해하기 20160630
엘라스틱서치 적합성 이해하기 20160630
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
 
Stress test data pipeline
Stress test data pipelineStress test data pipeline
Stress test data pipeline
 
The life of a query (oracle edition)
The life of a query (oracle edition)The life of a query (oracle edition)
The life of a query (oracle edition)
 
Data structure and algorithm
Data structure and algorithmData structure and algorithm
Data structure and algorithm
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised Approach
 

Mehr von Jay Luker

Learning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCELearning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCEJay Luker
 
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...N Characters in Search of an Author: Improving Author Name Indexing & Searchi...
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...Jay Luker
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentJay Luker
 
LexFarm Busa Farm Site Plan
LexFarm Busa Farm Site PlanLexFarm Busa Farm Site Plan
LexFarm Busa Farm Site PlanJay Luker
 
LexFarm Presentation
LexFarm PresentationLexFarm Presentation
LexFarm PresentationJay Luker
 
LexFarm Proposal
LexFarm ProposalLexFarm Proposal
LexFarm ProposalJay Luker
 

Mehr von Jay Luker (7)

Coinage
CoinageCoinage
Coinage
 
Learning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCELearning Engineering Initiatives at Harvard DCE
Learning Engineering Initiatives at Harvard DCE
 
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...N Characters in Search of an Author: Improving Author Name Indexing & Searchi...
N Characters in Search of an Author: Improving Author Name Indexing & Searchi...
 
Letting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search ComponentLetting In the Light: Using Solr as an External Search Component
Letting In the Light: Using Solr as an External Search Component
 
LexFarm Busa Farm Site Plan
LexFarm Busa Farm Site PlanLexFarm Busa Farm Site Plan
LexFarm Busa Farm Site Plan
 
LexFarm Presentation
LexFarm PresentationLexFarm Presentation
LexFarm Presentation
 
LexFarm Proposal
LexFarm ProposalLexFarm Proposal
LexFarm Proposal
 

Kürzlich hochgeladen

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 

Kürzlich hochgeladen (20)

CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 

Using SweetSpotSimilarity for Solr Fulltext Indexing

  • 1. Using SweetSpotSimilarity for Solr Fulltext Indexing (A Public Service Message) Jay Luker SAO/NASA Astrophysics Data System http://adsabs.harvard.edu/
  • 2. From http://lucene.apache.org/java/2_9_3/api/all/org/apache/lucene/search/Similarity.html Score for a particular result Buncha stuff you probably ought to read up on. "encapsulates a few (indexing time) boost and length factors" {
  • 3. norm(t,d) Includes... ● Document boost - e.g. <doc boost="2.5"> ● Field boost - e.g. <field boost="3.0"> and what we're concerned with... ● lengthNorm(field) - computed at index time based on the number of tokens in the field of the input document. These factors, multiplied together, make up the norm(t, d) for a given document
  • 4. lengthNorm(String fieldName, int numTokens) "Matches in longer fields are less precise, so implementations of this method usually return smaller values when numTokens is large, and larger values when numTokens is small." Translation: SHORTER DOCUMENTS SCORE HIGHER from the javadoc:
  • 5. changes this ... to this ... lengthNorm(L) = 1 sqrt(L) SweetSpotSimilarity lucene/contrib/misc/... lengthNorm(L) = 1 sqrt(steepness*(|L-min|+|L-max|-(max-min))+1)
  • 6. min/max = your "sweet spot" range. Lengths within this range compute to a constant, i.e., 1. steepness = controls the curve up to and down from the sweet spot "plateau".
  • 7. (termcounts for all ADS's searchable fulltext since 01/2000)
  • 8. <similarity class="org.ads.solr.SweetSpotSimilarityFactory"> <str name="min">1000</str> <str name="max">20000</str> <str name="steepness">0.5</str> </similarity> In schema.xml
  • 9. public class SweetSpotSimilarityFactory extends SimilarityFactory { public static final Logger log = LoggerFactory.getLogger(SolrResourceLoader.class); @Override public Similarity getSimilarity() { SweetSpotSimilarity sim = new SweetSpotSimilarity(); int max = this.params.getInt("max"); int min = this.params.getInt("min"); float steepness = this.params.getFloat("steepness"); log.info("max: " + max); log.info("min: " + min); log.info("steepness: " + steepness); // yuck! hardcoded field settings for now sim.setLengthNormFactors("body", min, max, steepness, true); return sim; } }
  • 10.
  • 11.
  • 12. Thanks! Further reading: "Lucene and Juru at TREC 2007: 1-Million Queries Track" http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf Also, check out our Blacklight beta search! http://labs.adsabs.harvard.edu/fulltext