Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.

Presented at Lucene/Solr Revolution 2014


  1. Search Analytics Component
     Steven Bower
     ©2014 Bloomberg L.P.
  2. Bloomberg
     • Largest provider of financial news and information
     • Our strength is quickly and accurately delivering data, news and analytics
     • Creating high-performance and accurate information retrieval systems is core to our strength
  3. Bloomberg Search Team
     • Search infrastructure
       • Develop and support search as a service platform
       • Support for other search applications within the company
     • Consultancy
       • Provide design consultancy/support to application teams
       • Promote search best practices/standardization throughout the company
     • Machine learning
       • Develop machine learning techniques to improve relevancy
       • Create natural language processors to answer questions
     • Unified search
       • Create information retrieval tools to organize and connect the vast and varied datasets provided to our clients
  4. Our Challenge
  5. Our Approach
     • Use Search/Solr as it provides flexible search/filtering over large, fast-moving result sets
     • Initially used StatsComponent, but quickly ran into limitations
     • Wanted to push the bounds of analytics capabilities in Solr/Lucene
     • Needed a pluggable framework to perform complex calculations/aggregations on numerical time-series data
     • DocValues provided high-performance columnar access to fields in the index (without un-inversion cost)
  6. DocValues
     • DocValues provide high-performance columnar access to fields in the index
     • No un-inversion cost
     • Increased storage footprint
     • Helps achieve NRT
     • Values live off-heap in a memory map
  7. Analytics Component
     • New component from the ground up
     • Designed and implemented by the Bloomberg Search Team over the summer of 2013
     • Initial implementation was built using the DocValues API directly, but moved to FieldCache
     • Refactored existing faceting implementation to support analytics
     • Created simple prefix notation for statistical expressions
     • Available as a Solr Contrib module in Solr 5.x or as patches for 4.8+ on SOLR-5302
  8. Features
     • Flexible, extensible framework for adding additional statistics/faceting
     • Supports multiple analytics requests per query execution
       • Multiple statistic calculations per request
       • Multiple facets per request
       • Each request can facet statistics over different fields and ranges
  9. Features - Faceting
     • Field Faceting
       • Support for int, long, float, double, date, string fields
       • Support for multi-value fields
       • Support for limit, offset and mincount
       • Support for sorting of stats-facets by any statistic (e.g. sort by mean)
     • Range faceting
       • Numeric types and dates
       • Dynamically calculate range/gap based on calculated statistics
     • Support for query faceting of stats
       • Use calculated statistics to generate facet queries
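
To make the field-faceting options concrete, here is a minimal in-memory sketch of what "facet a statistic, apply mincount, sort by mean, then apply offset and limit" amounts to. It is plain Python, not the component's code or request syntax; the function `facet_mean`, the document layout, and the field names `sector` and `price` are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def facet_mean(docs, facet_field, stat_field, mincount=1, offset=0, limit=None):
    """Group docs by facet_field, compute mean(stat_field) per bucket,
    then sort buckets by that mean (descending) and page through them."""
    buckets = defaultdict(list)
    for doc in docs:
        if facet_field in doc and stat_field in doc:
            buckets[doc[facet_field]].append(doc[stat_field])

    rows = [
        {"value": key, "count": len(vals), "mean": fmean(vals)}
        for key, vals in buckets.items()
        if len(vals) >= mincount  # drop buckets with too few documents
    ]
    rows.sort(key=lambda row: row["mean"], reverse=True)
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Hypothetical documents; field names are made up for illustration.
docs = [
    {"sector": "tech", "price": 10.0},
    {"sector": "tech", "price": 30.0},
    {"sector": "energy", "price": 50.0},
    {"sector": "energy", "price": 70.0},
    {"sector": "retail", "price": 5.0},
]
top = facet_mean(docs, "sector", "price", mincount=2, limit=1)
print(top)
```

With `mincount=2` the single-document `retail` bucket is dropped, and `limit=1` keeps only the bucket with the highest mean.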
  10. Features – Map Operators
     • Basic Math
       • neg(<expr>)
       • add(<expr>,...)
       • mult(<expr>,...)
       • div(<expr>,<expr>)
       • pow(<expr>,<expr>)
       • log(<expr>,<expr>)
     • Constants
       • const_num(<number>)
       • const_date(<date>)
       • const_str(<string>)
     • Date Math
       • date_math(<date expr>,<date op>,...)
     • String operations
       • rev(<expr>)
       • concat(<expr>,...)
     • Field
       • <field>
     • Missing Values
       • miss(<expr>,<value>)
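
The prefix notation used by the map operators is simple enough to parse in a few lines. The sketch below is an illustrative Python toy, not the component's parser: it handles only a subset of the basic-math operators plus `const_num` and bare field names, and the helper names `tokenize`, `parse`, and `evaluate` are hypothetical. Map operators are evaluated once per document.

```python
import math
import re

# Identifiers, numeric literals, or one of the punctuation marks ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|-?\d*\.?\d+|[(),])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parse one expression into a nested tuple (name, arg, ...),
    or a plain string for a bare field name or numeric literal."""
    name = tokens.pop(0)
    if tokens and tokens[0] == "(":
        tokens.pop(0)  # consume "("
        args = []
        while tokens[0] != ")":
            args.append(parse(tokens))
            if tokens[0] == ",":
                tokens.pop(0)
        tokens.pop(0)  # consume ")"
        return (name, *args)
    return name


# Subset of the basic-math map operators from the slide.
MAP_OPS = {
    "neg": lambda x: -x,
    "add": lambda *xs: sum(xs),
    "mult": lambda *xs: math.prod(xs),
    "div": lambda x, y: x / y,
    "pow": lambda x, y: x ** y,
}


def evaluate(node, doc):
    if isinstance(node, str):
        return doc[node]  # bare name: look the field up in the document
    name, *args = node
    if name == "const_num":
        return float(args[0])
    return MAP_OPS[name](*(evaluate(arg, doc) for arg in args))


tree = parse(tokenize("div(add(field_a, neg(field_b)), const_num(2))"))
result = evaluate(tree, {"field_a": 10.0, "field_b": 4.0})
print(tree, result)
```

Here the expression computes (field_a - field_b) / 2 for a single hypothetical document.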
  11. Features – Reduction Operators
     • Statistical
       • min(<expr>)
       • max(<expr>)
       • sum(<expr>)
       • count(<expr>)
       • miss(<expr>)
       • unique(<expr>)
     • Complex
       • sumofsquares(<expr>)
       • mean(<expr>)
       • stddev(<expr>)
       • median(<expr>)
       • percentile(<expr>)
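
Where map operators work per document, reduction operators collapse a whole column of values into one number. A rough Python sketch of most of the reductions listed above follows. The exact semantics are assumptions, not taken from the slides: `count` is taken to count present values, `miss` to count absent ones, `unique` to count distinct values, and `stddev` to be the population standard deviation. `percentile` is left out because its parameters are not given here. The function `reduce_column` is hypothetical.

```python
import math
from statistics import fmean, median, pstdev


def reduce_column(values):
    """Compute reductions over one column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "sum": sum(present),
        "count": len(present),                 # assumed: present values only
        "miss": len(values) - len(present),    # assumed: number of missing values
        "unique": len(set(present)),           # assumed: number of distinct values
        "sumofsquares": sum(v * v for v in present),
        "mean": fmean(present),
        "stddev": pstdev(present),             # assumed: population std. deviation
        "median": median(present),
    }


column = [2.0, 4.0, 4.0, None, 6.0]
stats = reduce_column(column)

# sumofsquares, count and mean are enough to recover the population stddev.
from_moments = math.sqrt(stats["sumofsquares"] / stats["count"] - stats["mean"] ** 2)
print(stats, from_moments)
```

The last step shows one reason a sum of squares is useful as a reduction in its own right: together with the count and the mean it yields the standard deviation without a second pass over the data.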
  12. Examples
     • Weighted Average
       • Calculate the weighted average of field_a with field_b as the weight
         div( sum( mult(field_a, field_b) ), sum(field_b) )
     • Variance
       • Calculate the variance of field_a
         pow( stddev(field_a), const_num(2) )
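
As a quick numerical check of these two quantities, the sketch below computes them in plain Python on made-up data. It takes the weighted average to be the sum of value times weight divided by the sum of the weights, and assumes `stddev` is the population standard deviation; the sample values are hypothetical.

```python
import math
from statistics import pstdev

field_a = [10.0, 20.0, 30.0]  # values (hypothetical)
field_b = [1.0, 2.0, 3.0]     # weights (hypothetical)

# Weighted average: sum(a * b) / sum(b)
weighted_avg = sum(a * b for a, b in zip(field_a, field_b)) / sum(field_b)

# Variance as the square of the (population) standard deviation
variance = pstdev(field_a) ** 2

print(weighted_avg, variance)
```

The weighted average is 140 / 6, pulled toward 30 because that value carries the largest weight, and the variance is 200 / 3.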
  13. Examples
     • T-Score
       • Calculate a t-score where ## is the value and all values in your sample are stored in field_a.
         div( add( const_num(##), neg( mean(field_a) ) ), div( stddev(field_a), pow( count(field_a), const_num(.5) ) ) )
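
Read inside out, the t-score expression is (## - mean) / (stddev / sqrt(count)). The following Python sketch mirrors that structure on a made-up sample. It assumes `stddev` is the population standard deviation, which may differ from the component's definition (a textbook t-statistic normally uses the sample standard deviation); the function name `t_score` is hypothetical.

```python
import math
from statistics import fmean, pstdev


def t_score(x, sample):
    """(x - mean) / (stddev / sqrt(n)), mirroring the prefix expression."""
    n = len(sample)
    standard_error = pstdev(sample) / math.sqrt(n)  # div(stddev, pow(count, .5))
    return (x - fmean(sample)) / standard_error      # div(add(x, neg(mean)), ...)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, population stddev 2
score = t_score(6.0, sample)
print(score)
```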
  14. How We Use It
     • Segment, aggregate and analyze financial data quickly
     • Aggregate time series data across multiple fields to render charts
     • Created flexible diagnostic tools/visualizations to analyze Solr performance
  15. Future Plans
     • Multi-shard support
     • Pivot Facet Support
     • Statistics on Multi-value fields
       • To support unique()
     • Filter result set based upon calculated statistics
     • Generalize facet implementation
  16. Links and Questions?
     • Analytics Component
       https://issues.apache.org/jira/browse/SOLR-5302
     • More About Bloomberg
       http://www.bloomberglabs.com/
