Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.

Presented at Lucene/Solr Revolution 2014

Published in: Software


  1. Search Analytics Component
     Steven Bower
     ©2014 Bloomberg L.P.
  2. Bloomberg
     • Largest provider of financial news and information
     • Our strength is quickly and accurately delivering data, news and analytics
     • Creating high performance and accurate information retrieval systems is core to our strength
  3. Bloomberg Search Team
     • Search infrastructure
       • Develop and support search as a service platform
       • Support for other search applications within the company
     • Consultancy
       • Provide design consultancy/support to application teams
       • Promote search best practices/standardization throughout the company
     • Machine learning
       • Develop machine learning techniques to improve relevancy
       • Create natural language processors to answer questions
     • Unified search
       • Create information retrieval tools to organize and connect the vast and varied datasets provided to our clients
  4. Our Challenge
  5. Our Approach
     • Use Search/Solr, as it provides flexible search/filtering over large, fast-moving result sets
     • Initially used StatsComponent, but quickly ran into limitations
     • Wanted to push the bounds of analytics capabilities in Solr/Lucene
     • Needed a pluggable framework to perform complex calculations/aggregations on numerical time-series data
     • DocValues provided high performance columnar access to fields in the index (without un-inversion cost)
  6. DocValues
     • DocValues provide high performance columnar access to fields in the index
     • No un-inversion cost
     • Increased storage footprint
     • Helps achieve NRT
     • Values live off-heap in memory map
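
To make "un-inversion" concrete, here is a minimal conceptual sketch in Python. It is not Lucene code: the dictionary layout and the `uninvert` helper are assumptions made up for illustration. An inverted index maps each term to the documents containing it, while statistics need the opposite direction (document to value). Without a stored column, that per-document array has to be rebuilt by walking every term; a columnar store such as DocValues keeps it ready at index time.

```python
# Hypothetical toy data: an inverted index maps term -> list of doc ids.
inverted_index = {"10": [0, 3], "25": [1], "40": [2]}


def uninvert(index, num_docs):
    """Rebuild a per-document column by walking every term's posting list.

    This is the work a columnar store avoids, because it already keeps
    one value per document.
    """
    column = [None] * num_docs
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            column[doc_id] = float(term)
    return column


# The column a columnar layout would have stored directly.
column = uninvert(inverted_index, num_docs=4)

# Statistics then read the column sequentially, one value per document.
mean_value = sum(column) / len(column)
print(column, mean_value)
```

The cost of `uninvert` grows with the number of terms and postings, which is why avoiding it matters for large, frequently updated indexes.
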
  7. Analytics Component
     • New component from the ground up
     • Designed/Implemented by the Bloomberg Search Team over summer of 2013
     • Initial implementation was built using DocValues API directly, but moved to FieldCache
     • Refactored existing faceting implementation to support analytics
     • Created simple prefix notation for statistical expressions
     • Available as a Solr Contrib module in Solr 5.x or patches for 4.8+ on SOLR-5302
  8. Features
     • Flexible/Extendable framework for adding additional statistics/faceting
     • Supports Multiple Analytics Requests per query execution
     • Multiple statistic calculations per request
     • Multiple facets per request
     • Each request can facet statistics over different fields and ranges
  9. Features - Faceting
     • Field Faceting
       • Support for int, long, float, double, date, string fields
       • Support for multi-value fields
       • Support for limit, offset and mincount
       • Support for sorting of stats-facets by any statistic (e.g. sort by mean)
     • Range faceting
       • Numeric types and dates
       • Dynamically calculate range/gap based on calculated statistics
     • Support for query faceting of stats
       • Use calculated statistics to generate facet queries
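
The field-faceting options above can be sketched in a few lines of Python. This is an illustration of the idea only, not the component's implementation: the `field_facet` function, its parameter names, and the sample documents are all hypothetical. It groups documents by a facet field, computes one statistic (the mean) per bucket, drops buckets below `mincount`, sorts by the statistic, and applies `offset` and `limit`.

```python
from collections import defaultdict
from statistics import fmean


def field_facet(docs, facet_field, stat_field, limit=10, offset=0, mincount=1):
    """Group docs by facet_field and report count and mean of stat_field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[facet_field]].append(doc[stat_field])

    rows = [
        {"value": key, "count": len(values), "mean": fmean(values)}
        for key, values in buckets.items()
        if len(values) >= mincount  # mincount filters out sparse buckets
    ]
    # Sort the facet buckets by the computed statistic, highest mean first.
    rows.sort(key=lambda row: row["mean"], reverse=True)
    return rows[offset:offset + limit]


# Made-up sample documents.
docs = [
    {"sector": "tech", "price": 10.0},
    {"sector": "tech", "price": 30.0},
    {"sector": "energy", "price": 50.0},
    {"sector": "retail", "price": 5.0},
]

top_two = field_facet(docs, "sector", "price", limit=2)
busy_only = field_facet(docs, "sector", "price", mincount=2)
print(top_two)
print(busy_only)
```
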
  10. Features - Map Operators
     • Basic Math
       • neg(<expr>)
       • add(<expr>,...)
       • mult(<expr>,...)
       • div(<expr>,<expr>)
       • pow(<expr>,<expr>)
       • log(<expr>,<expr>)
     • Constants
       • const_num(<number>)
       • const_date(<date>)
       • const_str(<string>)
     • Date Math
       • date_math(<date expr>,<date op>,...)
     • String operations
       • rev(<expr>)
       • concat(<expr>,...)
     • Field
       • <field>
     • Missing Values
       • miss(<expr>,<value>)
  11. Features - Reduction Operators
     • Statistical
       • min(<expr>)
       • max(<expr>)
       • sum(<expr>)
       • count(<expr>)
       • miss(<expr>)
       • unique(<expr>)
     • Complex
       • sumofsquares(<expr>)
       • mean(<expr>)
       • stddev(<expr>)
       • median(<expr>)
       • percentile(<expr>)
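
To show how map and reduction operators compose in the prefix notation, here is a small Python sketch of an evaluator for a numeric subset of the operators listed above. It is a toy written for this transcript, not the component's code: `MAP_OPS`, `REDUCE_OPS`, `tokenize`, `parse`, `evaluate`, and `run` are hypothetical names, and using the population form for `stddev` is an assumption. Map operators work per document (or on already-reduced scalars), while reduction operators collapse a column into one number.

```python
import math
import re
import statistics

# Map operators: combine values document by document (or scalar with scalar).
MAP_OPS = {
    "neg": lambda x: -x,
    "add": lambda *xs: sum(xs),
    "mult": lambda *xs: math.prod(xs),
    "div": lambda x, y: x / y,
    "pow": lambda x, y: x ** y,
}

# Reduction operators: collapse one value per document into a single number.
REDUCE_OPS = {
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
    "mean": statistics.fmean,
    "stddev": statistics.pstdev,  # population form: an assumption
    "median": statistics.median,
}

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[0-9]*\.?[0-9]+|[(),])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Build a tree: ("name", [args]) for calls, a plain string for leaves."""
    head = tokens.pop(0)
    if tokens and tokens[0] == "(":
        tokens.pop(0)
        args = [parse(tokens)]
        while tokens[0] == ",":
            tokens.pop(0)
            args.append(parse(tokens))
        tokens.pop(0)  # closing ")"
        return (head, args)
    return head


def evaluate(node, docs):
    """Return a column (list, one value per doc) or a reduced scalar."""
    if isinstance(node, str):  # bare field name
        return [doc[node] for doc in docs]
    name, args = node
    if name == "const_num":
        return float(args[0])
    values = [evaluate(arg, docs) for arg in args]
    if name in REDUCE_OPS:
        return REDUCE_OPS[name](values[0])
    op = MAP_OPS[name]
    if any(isinstance(v, list) for v in values):
        # Broadcast scalars so every argument has one value per document.
        columns = [v if isinstance(v, list) else [v] * len(docs) for v in values]
        return [op(*row) for row in zip(*columns)]
    return op(*values)


def run(expression, docs):
    return evaluate(parse(tokenize(expression)), docs)


# Made-up sample documents.
docs = [{"field_a": 2.0, "field_b": 1.0}, {"field_a": 4.0, "field_b": 3.0}]
print(run("sum( mult(field_a, field_b) )", docs))
print(run("pow( stddev(field_a), const_num(2) )", docs))
```

Treating "column or scalar" as the only two value shapes keeps the evaluator short: a map operator applied after a reduction, such as `div(sum(...), sum(...))`, needs no special case.
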
  12. Examples
     • Weighted Average
       • Calculate weighted average of field_a with field_b as the weight
         div( sum( mult(field_a, field_b) ), sum(field_b) )
     • Variance
       • Calculate the variance of field_a
         pow( stddev(field_a), const_num(2) )
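
In conventional notation, the two examples correspond to the formulas below. The symbols are introduced here for illustration: a_i and b_i are the values of field_a and field_b in the i-th matching document, n is the number of matching documents, and the variance is written in its population form, which is an assumption since the slides do not say which form stddev uses. A weighted average divides the sum of the weighted values by the sum of the weights.

```latex
\bar{a}_w = \frac{\sum_{i=1}^{n} a_i \, b_i}{\sum_{i=1}^{n} b_i},
\qquad
\operatorname{Var}(a) = \sigma_a^{2} = \frac{1}{n} \sum_{i=1}^{n} \left(a_i - \bar{a}\right)^{2}
```
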
  13. Examples
     • T-Score
       • Calculate a t-score where ## is the value and all values in your sample are stored in field_a.
         div( add( const_num(##), neg( mean(field_a) ) ), div( stddev(field_a), pow( count(field_a), const_num(.5) ) ) )
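
Read from the inside out, the t-score expression is (value - mean) / (stddev / sqrt(count)). The sketch below mirrors it step by step in plain Python. The `t_score` function and the sample data are hypothetical, and the population form of the standard deviation is an assumption.

```python
import math


def t_score(value, sample):
    """Mirror the prefix expression step by step on a list of numbers."""
    n = len(sample)                              # count(field_a)
    mean = sum(sample) / n                       # mean(field_a)
    # stddev(field_a); the population form is assumed here.
    stddev = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    standard_error = stddev / n ** 0.5           # div(stddev, pow(count, const_num(.5)))
    return (value - mean) / standard_error       # div(add(const_num(##), neg(mean)), ...)


# Made-up sample values standing in for field_a.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(t_score(6.0, sample))
```
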
  14. How We Use It
     • Segment, aggregate and analyze financial data quickly
     • Aggregate time series data across multiple fields to render charts
     • Created flexible diagnostic tools/visualizations to analyze Solr performance
  15. Future Plans
     • Multi-shard support
     • Pivot Facet Support
     • Statistics on Multi-value fields
       • To support unique()
     • Filter result set based upon calculated statistics
     • Generalize facet implementation
  16. Links and Questions?
     • Analytics Component: https://issues.apache.org/jira/browse/SOLR-5302
     • More About Bloomberg: http://www.bloomberglabs.com/
