3. Bloomberg
• Largest provider of financial news and information
• Our strength is quickly and accurately delivering data, news and analytics
• Creating high-performance and accurate information retrieval systems is core to our strength
4. Bloomberg Search Team
• Search infrastructure
  • Develop and support a search-as-a-service platform
  • Support for other search applications within the company
• Consultancy
  • Provide design consultancy/support to application teams
  • Promote search best practices/standardization throughout the company
• Machine learning
  • Develop machine learning techniques to improve relevancy
  • Create natural language processors to answer questions
• Unified search
  • Create information retrieval tools to organize and connect the vast and varied datasets provided to our clients
6. Our Approach
• Use Search/Solr, as it provides flexible search/filtering over large, fast-moving result sets
• Initially used StatsComponent, but quickly ran into limitations
• Wanted to push the bounds of analytics capabilities in Solr/Lucene
• Needed a pluggable framework to perform complex calculations/aggregations on numerical time-series data
• DocValues provided high-performance columnar access to fields in the index (without un-inversion cost)
7. DocValues
• DocValues provide high-performance columnar access to fields in the index
• No un-inversion cost
• Increased storage footprint
• Helps achieve near-real-time (NRT) search
• Values live off-heap, in a memory map
8. Analytics Component
• New component, built from the ground up
• Designed and implemented by the Bloomberg Search Team over the summer of 2013
• Initial implementation was built using the DocValues API directly, but later moved to FieldCache
• Refactored the existing faceting implementation to support analytics
• Created a simple prefix notation for statistical expressions
• Available as a Solr contrib module in Solr 5.x, or as patches for 4.8+ on SOLR-5302
9. Features
• Flexible, extensible framework for adding statistics and faceting
• Supports multiple analytics requests per query execution
  • Multiple statistic calculations per request
  • Multiple facets per request
  • Each request can facet statistics over different fields and ranges
10. Features - Faceting
• Field faceting
  • Support for int, long, float, double, date, string fields
  • Support for multi-value fields
  • Support for limit, offset and mincount
  • Support for sorting of stats-facets by any statistic (e.g. sort by mean)
• Range faceting
  • Numeric types and dates
  • Dynamically calculate range/gap based on calculated statistics
• Support for query faceting of stats
  • Use calculated statistics to generate facet queries
13. Examples
• Weighted Average
  • Calculate the weighted average of field_a with field_b as the weight
    div( sum( mult(field_a, field_b) ), sum(field_b) )
• Variance
  • Calculate the variance of field_a
    pow( stddev(field_a), const_num(2) )
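For reference, the same two quantities can be computed directly in Python. A weighted average is the sum of each value times its weight, divided by the sum of the weights; the variance is the square of the standard deviation. The sample data is hypothetical, and the sketch assumes a population standard deviation, since the slides do not say whether `stddev` is the population or the sample form.

```python
import statistics

# Hypothetical sample data standing in for the two indexed fields.
field_a = [2.0, 4.0, 6.0]  # values
field_b = [1.0, 1.0, 2.0]  # weights


def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def variance_from_stddev(values):
    """Square of the standard deviation (population form assumed here)."""
    return statistics.pstdev(values) ** 2


w_avg = weighted_average(field_a, field_b)  # (2 + 4 + 12) / 4 = 4.5
var_a = variance_from_stddev(field_a)
```

Because the last document carries twice the weight of the others, the weighted average (4.5) sits above the plain mean of field_a (4.0).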
14. Examples
• T-Score
  • Calculate a t-score, where ## is the value being tested and all values in your sample are stored in field_a
    div( add( const_num(##), neg( mean(field_a) ) ),
         div( stddev(field_a), pow( count(field_a), const_num(.5) ) ) )
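Written out, the expression is the value minus the sample mean, divided by the standard error, which is the standard deviation divided by the square root of the count. The sketch below follows that structure term by term in Python. The sample data is hypothetical, and the population standard deviation is an assumption: a textbook one-sample t-statistic uses the sample standard deviation, and the slides do not say which form `stddev` computes.

```python
import math
import statistics


def t_score(value, sample):
    """(value - mean) / (stddev / sqrt(count)), mirroring the prefix expression."""
    mean = statistics.mean(sample)
    # Assumption: population standard deviation; swap in statistics.stdev
    # for the sample form if that is what the deployment computes.
    std_err = statistics.pstdev(sample) / math.sqrt(len(sample))
    return (value - mean) / std_err


field_a = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample
score = t_score(7.0, field_a)
```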
15. How We Use It
• Segment, aggregate and analyze financial data quickly
• Aggregate time series data across multiple fields to render charts
• Created flexible diagnostic tools/visualizations to analyze Solr performance
16. Future Plans
• Multi-shard support
• Pivot facet support
• Statistics on multi-value fields
  • To support unique()
• Filter result set based upon calculated statistics
• Generalize facet implementation
17. Links and Questions?
Analytics Component
https://issues.apache.org/jira/browse/SOLR-5302
More About Bloomberg
http://www.bloomberglabs.com/