SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
1©MapR Technologies - Confidential
Crowd Sourcing Reflected Intelligence
Using Search and Big Data
Ted Dunning, MapR
Grant Ingersoll, LucidWorks
2©MapR Technologies - Confidential
Is Search Enough?
• Keyword search is a commodity
• Holistic view of the data and the
user interactions with that data are
critical
• Search, Discovery and Analytics are
the key to unlocking this view of
users and data
Search
MapR and LucidWorks
3©MapR Technologies - Confidential
Grant’s Background
 Co-founder:
– LucidWorks – CTO
– Apache Mahout
 Long time Lucene/Solr committer
 Author: Taming Text
 Background in IR and NLP
– Built CLIR, QA and a variety of other search-based apps
4©MapR Technologies - Confidential
Ted’s Background
 Academia, Startups
– MapR, MusicMatch, ID Analytics, Veoh Networks
– Big data since before big
 Open source
– since the dark ages before the internet
– Mahout, Zookeeper, Drill
– bought the beer at first HUG
 Co-Author
– Mahout in Action
 MapR
– Chief Application Architect
 Founding member of Apache Drill
5©MapR Technologies - Confidential
Agenda
 Intro
 Search (R)evolution
 Reflected Intelligence Use Cases
 Building a Next Generation Search and Discovery Platform
– MapR
– LucidWorks
 1+1=3
6©MapR Technologies - Confidential
User Interactions With Big Data
Data
Data
Data
DFS
Key Value
Store
Index
Command
Line
Query
Language
Keyword
Search
System
Administrator
Engineer
End User
7©MapR Technologies - Confidential
User Interactions With Big Data
Data
Data
Data
DFS
Key Value
Store
Index
Command
Line
Query
Language
Keyword
Search
System
Administrator
Engineer
End User
Reflected
Intelligence
8©MapR Technologies - Confidential
Search (R)evolution
 Search use leads to search abuse
– denormalization frees your mind
– scoring is just a sparse matrix multiply
 Lucene/Solr evolution
– non free text usages abound
– many DB-like features
– noSQL before NoSQL was cool
– flexible indexing
– finite State Transducers FTW!
 Scale
 “This ain’t your father’s relevance anymore”
9©MapR Technologies - Confidential
Search, Discovery and Analytics
 Large-scale analysis is key to reflected intelligence
– correlation analysis
• based on queries, clicks, mouse tracks,
even explicit feedback
• produce clusters, trends, topics, SIP’s
– start with engineered knowledge,
refine with user feedback
 Large-scale discovery features
encourage experimentation
 Always test, always enrich!
Search
DiscoveryAnalytics
10©MapR Technologies - Confidential
Social Media Analysis in Telecom
 Goal
– Detect flash-mob traffic events
– Provision additional resources before failures
 Method: Correlate mobile traffic analysis with social media analysis
– events cause traffic micro-bursts
– participants tweet the events ahead of time
– tweet locations converge on burst location
 Deploy operations faster to predict outages and better handle
emergency situations
– high cost bandwidth augmentation can be marshaled as the traffic appears
– anticipation beats reaction
11©MapR Technologies - Confidential
Provenance is 80% of Value
 Problem
– Broadcasters don’t know what audiences really like at a micro level
 Method:
– Analysis of social media to determine advertising reach and response
– Time resolution of social traffic can provide detailed response metrics
 Results:
– In one case the untargeted advertising was worth 5x more if with
supporting response data
12©MapR Technologies - Confidential
Claims Analysis
 Goal
– Insurance claims processing and analysis
– fraud analysis
 Method
– Combine free text search with metadata analysis to identify high risk
activities across the country
– Integrate with corporate workflows to detect and fix outliers in customer
relations
 Results
– Questions that took 24-48 hours now take seconds to answer
13©MapR Technologies - Confidential
Virginia Tech - Help the World
 Grab data around crisis
 Search immediately
 Large-scale analysis enriches data to find
ways to improve responses and
understanding
 http://www.ctrnet.net
14©MapR Technologies - Confidential
Can Search Catch the Bad Guys?
 Online Drug Counterfeit
detection
 Identify commonly used language
indicating counterfeits
– you know it when you see it
– and you know you have seen it
 Feed to analyst via search-driven
application
– enrich based on analysts feedback
15©MapR Technologies - Confidential
Veoh - Cross Recommendations
 Cross recommendation as search
– with search used to build cross recommendation!
 Recommend content to people who exhibit certain behaviors
(clicks, query terms, other)
 (Ab)use of a search engine
– but not as a search engine for content
– more like a search engine for behavior
16©MapR Technologies - Confidential
Recommendation Basics
 History:
User Thing
1 3
2 4
3 4
2 3
3 2
1 1
2 1
17©MapR Technologies - Confidential
Recommendation Basics
 History as matrix:
 t1+t3 cooccur 2 times, t1+t4 once, t2+t4 once
t1 t2 t3 t4
u1 1 0 1 0
u2 1 0 1 1
u3 0 1 0 1
18©MapR Technologies - Confidential
Recommendation Basics
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
t3 not t3
t1 2 1
not t1 1 1
21©MapR Technologies - Confidential
Spot the Anomaly
 Root LLR is roughly like standard deviations
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 2
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
0.44 0.98
2.26 7.15
22©MapR Technologies - Confidential
Threshold by Score
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
23©MapR Technologies - Confidential
Threshold by Score
 Indicators
t1 t2 t3 t4
t1 1 0 0 1
t2 0 1 0 1
t3 0 0 1 1
t4 1 0 0 1
indicator: t2, t4
24©MapR Technologies - Confidential
Search Engine for Reflected Intelligence
 Map-reduce “big data” part
– Logs record user + item occurrence
– Group by user to get rows of occurrence matrix
– Self-join to get cooccurrence
– Log-likelihood test to find anomalies
 Search part
– Anomalous cooccurrences are indicators
(or use statistical scores to provide fancy boosts)
– Indicator fields and other meta-data are indexed
– Recommendation implemented using a single search
– Boosts, functions, similarity also can reflect learned behavior
25©MapR Technologies - Confidential
What Platform Do You Need?
 Fast, efficient, scalable search
– bulk and near real-time indexing
– handle billions of records with sub-second search and faceting
 Large scale, cost effective storage and processing capabilities
 NLP and machine learning tools that scale to enhance discovery
and analysis
 Integrated log analysis workflows that close the loop between the
raw data and user interactions
26©MapR Technologies - Confidential
Shards
1
2
3 N
Search View
•Documents
•Users
•Logs
Document
Store
Analytic
Services
View into
numeric/histor
ic data
Classification
Recommendation
Personalization &
Machine Learning
Services
Classification Models
In memory
Replicated
Multi-tenant
Discovery &
Enrichment
Clustering,
classification, NLP,
topic identification,
search log analysis,
user behavior
Content Acquisition
ETL, batch or near
real-time
Access APIs
Data
• LucidWorks Search
connectors
• Push
Reference Architecture
27©MapR Technologies - Confidential
MapR
 MapR provides the technology leading Hadoop distribution
– full eco-system distribution
– integrated data platform
– complete solution for data integrity
 MapR clusters also provide tight integration with search
technologies like LucidWorks
– integration is key for effective ops
28©MapR Technologies - Confidential
LucidWorks
 LucidWorks provides the leading packaging of Apache Lucene and
Solr
– build your own, we support
– founded by the most prominent Lucene/Solr experts
 LucidWorks Search
– “Solr++”
• UI, REST API, MapR connectors, relevance tools, much more
 LucidWorks Big Data
– Big Data as a Service
– Integrated LucidWorks Search, Hadoop, machine learning with prebuilt
workflows for many of these tasks
29©MapR Technologies - Confidential
LucidWorks Big Data
Inputs
API
MgmtSearch, Discovery, Analytics
Processing & Storage
Analytics Service Document Service
Big Data LucidWorks Web HDFS
Admin
Service
Mgmt
Data
Mgmt
Provisioning, Monitoring & Configuration
30©MapR Technologies - Confidential
Easy Wins
 Analyze logs from application stored in MapR
 Seamlessly store search indexes in MapR
– and feed to Pig, Mahout and others
– use mirrors + NFS to directly deploy indexes
 Snapshots make backups a snap
– A lot like version control for data
 LucidWorks 2.5.2 easily connects with MapR
31©MapR Technologies - Confidential
1 + 1 = 3
32©MapR Technologies - Confidential
Learn More
 Talk to Ted
@ted_dunning
tdunning@maprtech.com
 Talk to Grant
@gsingers
grant@lucidworks.com
 MapR and Lucid Works
http://www.mapr.com
http://www.lucidworks.com
Hash Tags
#mapr #lucenesolr #lucidworks

Weitere ähnliche Inhalte

Was ist angesagt?

Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBase
DataWorks Summit
 
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
Alex Pinto
 
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Alex Pinto
 
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Alex Pinto
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
Ravi Teja
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Alex Pinto
 

Was ist angesagt? (20)

Basic Sentiment Analysis using Hive
Basic Sentiment Analysis using HiveBasic Sentiment Analysis using Hive
Basic Sentiment Analysis using Hive
 
Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBase
 
Machine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern DetectionMachine Learning Approaches for Crime Pattern Detection
Machine Learning Approaches for Crime Pattern Detection
 
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based DetectionBeyond Matching: Applying Data Science Techniques to IOC-based Detection
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
 
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
 
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
 
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
Data-Driven Threat Intelligence: Useful Methods and Measurements for Handling...
 
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
 
Foresight conversation
Foresight conversationForesight conversation
Foresight conversation
 
Whooster for Law Enforcement
Whooster for Law EnforcementWhooster for Law Enforcement
Whooster for Law Enforcement
 
EventShop ISG talk 140213
EventShop ISG talk 140213EventShop ISG talk 140213
EventShop ISG talk 140213
 
Observing real world phenomena through event web
Observing real world phenomena through event webObserving real world phenomena through event web
Observing real world phenomena through event web
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
How To Drive Value with Security Data
How To Drive Value with Security DataHow To Drive Value with Security Data
How To Drive Value with Security Data
 
Law Enforcement - Overview
Law Enforcement - OverviewLaw Enforcement - Overview
Law Enforcement - Overview
 
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
 
Social Life Networks (Eventshop and Personal Event Shop)
Social Life Networks (Eventshop and Personal Event Shop)Social Life Networks (Eventshop and Personal Event Shop)
Social Life Networks (Eventshop and Personal Event Shop)
 
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...
 
Advanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time SpeedAdvanced Analytics for Any Data at Real-Time Speed
Advanced Analytics for Any Data at Real-Time Speed
 

Andere mochten auch

Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...
lucenerevolution
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniques
lucenerevolution
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
lucenerevolution
 
Television news search and analysis with lucene solr
Television news search and analysis with lucene solrTelevision news search and analysis with lucene solr
Television news search and analysis with lucene solr
lucenerevolution
 
Estatutos 2012
Estatutos 2012Estatutos 2012
Estatutos 2012
fondiscol
 
Senior BI -Consultant -Sumit Kumar Thakur
Senior BI -Consultant -Sumit Kumar ThakurSenior BI -Consultant -Sumit Kumar Thakur
Senior BI -Consultant -Sumit Kumar Thakur
Sumit thakur
 

Andere mochten auch (20)

Lucene solr 4 spatial extended deep dive
Lucene solr 4 spatial   extended deep diveLucene solr 4 spatial   extended deep dive
Lucene solr 4 spatial extended deep dive
 
Brahe mass scale flexible indexing
Brahe   mass scale flexible indexingBrahe   mass scale flexible indexing
Brahe mass scale flexible indexing
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...
 
Cms integration of apache solr how we did it.
Cms integration of apache solr   how we did it.Cms integration of apache solr   how we did it.
Cms integration of apache solr how we did it.
 
Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniques
 
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote   Yonik Seeley & Steve Rowe lucene solr roadmapKeynote   Yonik Seeley & Steve Rowe lucene solr roadmap
Keynote Yonik Seeley & Steve Rowe lucene solr roadmap
 
Implementing a custom search syntax using solr, lucene & parboiled
Implementing a custom search syntax using solr, lucene & parboiledImplementing a custom search syntax using solr, lucene & parboiled
Implementing a custom search syntax using solr, lucene & parboiled
 
Semantic search in the cloud
Semantic search in the cloudSemantic search in the cloud
Semantic search in the cloud
 
Migration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntoshMigration from Fast ESP to Lucene Solr - Michael McIntosh
Migration from Fast ESP to Lucene Solr - Michael McIntosh
 
How to Access Your Library Book Collections Using Solr
How to Access Your Library Book Collections Using SolrHow to Access Your Library Book Collections Using Solr
How to Access Your Library Book Collections Using Solr
 
Updateable Fields in Lucene and other Codec Applications
Updateable Fields in Lucene and other Codec ApplicationsUpdateable Fields in Lucene and other Codec Applications
Updateable Fields in Lucene and other Codec Applications
 
Television news search and analysis with lucene solr
Television news search and analysis with lucene solrTelevision news search and analysis with lucene solr
Television news search and analysis with lucene solr
 
How to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/SolrHow to Gain Greater Business Intelligence from Lucene/Solr
How to Gain Greater Business Intelligence from Lucene/Solr
 
Things Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & DrupalThings Made Easy: One Click CMS Integration with Solr & Drupal
Things Made Easy: One Click CMS Integration with Solr & Drupal
 
Japanese Linguistics in Lucene and Solr
Japanese Linguistics in Lucene and Solr Japanese Linguistics in Lucene and Solr
Japanese Linguistics in Lucene and Solr
 
Search with Polygons: Another Approach to Solr Geospatial Search
Search with Polygons: Another Approach to Solr Geospatial SearchSearch with Polygons: Another Approach to Solr Geospatial Search
Search with Polygons: Another Approach to Solr Geospatial Search
 
Lectura Comprensiva - Anexo 12
Lectura Comprensiva - Anexo 12Lectura Comprensiva - Anexo 12
Lectura Comprensiva - Anexo 12
 
cv-chrif
cv-chrifcv-chrif
cv-chrif
 
Estatutos 2012
Estatutos 2012Estatutos 2012
Estatutos 2012
 
Senior BI -Consultant -Sumit Kumar Thakur
Senior BI -Consultant -Sumit Kumar ThakurSenior BI -Consultant -Sumit Kumar Thakur
Senior BI -Consultant -Sumit Kumar Thakur
 

Ähnlich wie Crowd sourced intelligence built into search over hadoop

Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
Paul Barsch
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211
MapR Technologies
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
Revolution Analytics
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
javed75
 
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar 18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
Revolution Analytics
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Sarah Aerni
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
Ted Dunning
 

Ähnlich wie Crowd sourced intelligence built into search over hadoop (20)

MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
 
Harnessing Big Data_UCLA
Harnessing Big Data_UCLAHarnessing Big Data_UCLA
Harnessing Big Data_UCLA
 
MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211MapR LucidWorks Joint Webinar 121211
MapR LucidWorks Joint Webinar 121211
 
MapR lucidworks joint webinar
MapR lucidworks joint webinarMapR lucidworks joint webinar
MapR lucidworks joint webinar
 
Batter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and StormBatter Up! Advanced Sports Analytics with R and Storm
Batter Up! Advanced Sports Analytics with R and Storm
 
Advanced Analytics and Data Science Expertise
Advanced Analytics and Data Science ExpertiseAdvanced Analytics and Data Science Expertise
Advanced Analytics and Data Science Expertise
 
Crowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and HadoopCrowd Sourced Reflected Intelligence for Solr and Hadoop
Crowd Sourced Reflected Intelligence for Solr and Hadoop
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Big data beyond the hype may 2014
Big data beyond the hype may 2014Big data beyond the hype may 2014
Big data beyond the hype may 2014
 
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar 18Mar14 Find the Hidden Signal in Market Data Noise Webinar
18Mar14 Find the Hidden Signal in Market Data Noise Webinar
 
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
Data Science as a Commodity: Use MADlib, R, & other OSS Tools for Data Scienc...
 
Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendation
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
Zero Touch Analytics
Zero Touch AnalyticsZero Touch Analytics
Zero Touch Analytics
 
Crowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopCrowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over Hadoop
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 

Mehr von lucenerevolution

Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
lucenerevolution
 

Mehr von lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Kürzlich hochgeladen

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Kürzlich hochgeladen (20)

Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

Crowd sourced intelligence built into search over hadoop

  • 1. 1©MapR Technologies - Confidential Crowd Sourcing Reflected Intelligence Using Search and Big Data Ted Dunning, MapR Grant Ingersoll, LucidWorks
  • 2. 2©MapR Technologies - Confidential Is Search Enough? • Keyword search is a commodity • Holistic view of the data and the user interactions with that data are critical • Search, Discovery and Analytics are the key to unlocking this view of users and data Search MapR and LucidWorks
  • 3. 3©MapR Technologies - Confidential Grant’s Background  Co-founder: – LucidWorks – CTO – Apache Mahout  Long time Lucene/Solr committer  Author: Taming Text  Background in IR and NLP – Built CLIR, QA and a variety of other search-based apps
  • 4. 4©MapR Technologies - Confidential Ted’s Background  Academia, Startups – MapR, MusicMatch, ID Analytics, Veoh Networks – Big data since before big  Open source – since the dark ages before the internet – Mahout, Zookeeper, Drill – bought the beer at first HUG  Co-Author – Mahout in Action  MapR – Chief Application Architect  Founding member of Apache Drill
  • 5. 5©MapR Technologies - Confidential Agenda  Intro  Search (R)evolution  Reflected Intelligence Use Cases  Building a Next Generation Search and Discovery Platform – MapR – LucidWorks  1+1=3
  • 6. 6©MapR Technologies - Confidential User Interactions With Big Data Data Data Data DFS Key Value Store Index Command Line Query Language Keyword Search System Administrator Engineer End User
  • 7. 7©MapR Technologies - Confidential User Interactions With Big Data Data Data Data DFS Key Value Store Index Command Line Query Language Keyword Search System Administrator Engineer End User Reflected Intelligence
  • 8. 8©MapR Technologies - Confidential Search (R)evolution  Search use leads to search abuse – denormalization frees your mind – scoring is just a sparse matrix multiply  Lucene/Solr evolution – non free text usages abound – many DB-like features – noSQL before NoSQL was cool – flexible indexing – finite State Transducers FTW!  Scale  “This ain’t your father’s relevance anymore”
  • 9. 9©MapR Technologies - Confidential Search, Discovery and Analytics  Large-scale analysis is key to reflected intelligence – correlation analysis • based on queries, clicks, mouse tracks, even explicit feedback • produce clusters, trends, topics, SIP’s – start with engineered knowledge, refine with user feedback  Large-scale discovery features encourage experimentation  Always test, always enrich! Search DiscoveryAnalytics
  • 10. 10©MapR Technologies - Confidential Social Media Analysis in Telecom  Goal – Detect flash-mob traffic events – Provision additional resources before failures  Method: Correlate mobile traffic analysis with social media analysis – events cause traffic micro-bursts – participants tweet the events ahead of time – tweet locations converge on burst location  Deploy operations faster to predict outages and better handle emergency situations – high cost bandwidth augmentation can be marshaled as the traffic appears – anticipation beats reaction
  • 11. 11©MapR Technologies - Confidential Provenance is 80% of Value  Problem – Broadcasters don’t know what audiences really like at a micro level  Method: – Analysis of social media to determine advertising reach and response – Time resolution of social traffic can provide detailed response metrics  Results: – In one case the untargeted advertising was worth 5x more if with supporting response data
  • 12. 12©MapR Technologies - Confidential Claims Analysis  Goal – Insurance claims processing and analysis – fraud analysis  Method – Combine free text search with metadata analysis to identify high risk activities across the country – Integrate with corporate workflows to detect and fix outliers in customer relations  Results – Questions that took 24-48 hours now take seconds to answer
  • 13. 13©MapR Technologies - Confidential Virginia Tech - Help the World  Grab data around crisis  Search immediately  Large-scale analysis enriches data to find ways to improve responses and understanding  http://www.ctrnet.net
  • 14. 14©MapR Technologies - Confidential Can Search Catch the Bad Guys?  Online Drug Counterfeit detection  Identify commonly used language indicating counterfeits – you know it when you see it – and you know you have seen it  Feed to analyst via search-driven application – enrich based on analysts feedback
  • 15. 15©MapR Technologies - Confidential Veoh - Cross Recommendations  Cross recommendation as search – with search used to build cross recommendation!  Recommend content to people who exhibit certain behaviors (clicks, query terms, other)  (Ab)use of a search engine – but not as a search engine for content – more like a search engine for behavior
  • 16. 16©MapR Technologies - Confidential Recommendation Basics  History: User Thing 1 3 2 4 3 4 2 3 3 2 1 1 2 1
  • 17. 17©MapR Technologies - Confidential Recommendation Basics  History as matrix:  t1+t3 cooccur 2 times, t1+t4 once, t2+t4 once t1 t2 t3 t4 u1 1 0 1 0 u2 1 0 1 1 u3 0 1 0 1
  • 18. 18©MapR Technologies - Confidential Recommendation Basics  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2 t3 not t3 t1 2 1 not t1 1 1
  • 19. 21©MapR Technologies - Confidential Spot the Anomaly  Root LLR is roughly like standard deviations A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 2 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000 0.44 0.98 2.26 7.15
  • 20. 22©MapR Technologies - Confidential Threshold by Score  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2
  • 21. 23©MapR Technologies - Confidential Threshold by Score  Indicators t1 t2 t3 t4 t1 1 0 0 1 t2 0 1 0 1 t3 0 0 1 1 t4 1 0 0 1 indicator: t2, t4
  • 22. 24©MapR Technologies - Confidential Search Engine for Reflected Intelligence  Map-reduce “big data” part – Logs record user + item occurrence – Group by user to get rows of occurrence matrix – Self-join to get cooccurrence – Log-likelihood test to find anomalies  Search part – Anomalous cooccurrences are indicators (or use statistical scores to provide fancy boosts) – Indicator fields and other meta-data are indexed – Recommendation implemented using a single search – Boosts, functions, similarity also can reflect learned behavior
  • 23. 25©MapR Technologies - Confidential What Platform Do You Need?  Fast, efficient, scalable search – bulk and near real-time indexing – handle billions of records with sub-second search and faceting  Large scale, cost effective storage and processing capabilities  NLP and machine learning tools that scale to enhance discovery and analysis  Integrated log analysis workflows that close the loop between the raw data and user interactions
  • 24. 26©MapR Technologies - Confidential Shards 1 2 3 N Search View •Documents •Users •Logs Document Store Analytic Services View into numeric/histor ic data Classification Recommendation Personalization & Machine Learning Services Classification Models In memory Replicated Multi-tenant Discovery & Enrichment Clustering, classification, NLP, topic identification, search log analysis, user behavior Content Acquisition ETL, batch or near real-time Access APIs Data • LucidWorks Search connectors • Push Reference Architecture
  • 25. 27©MapR Technologies - Confidential MapR  MapR provides the technology leading Hadoop distribution – full eco-system distribution – integrated data platform – complete solution for data integrity  MapR clusters also provide tight integration with search technologies like LucidWorks – integration is key for effective ops
  • 26. 28©MapR Technologies - Confidential LucidWorks  LucidWorks provides the leading packaging of Apache Lucene and Solr – build your own, we support – founded by the most prominent Lucene/Solr experts  LucidWorks Search – “Solr++” • UI, REST API, MapR connectors, relevance tools, much more  LucidWorks Big Data – Big Data as a Service – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks
  • 27. 29©MapR Technologies - Confidential LucidWorks Big Data Inputs API MgmtSearch, Discovery, Analytics Processing & Storage Analytics Service Document Service Big Data LucidWorks Web HDFS Admin Service Mgmt Data Mgmt Provisioning, Monitoring & Configuration
  • 28. 30©MapR Technologies - Confidential Easy Wins  Analyze logs from application stored in MapR  Seamlessly store search indexes in MapR – and feed to Pig, Mahout and others – use mirrors + NFS to directly deploy indexes  Snapshots make backups a snap – A lot like version control for data  LucidWorks 2.5.2 easily connects with MapR
  • 29. 31©MapR Technologies - Confidential 1 + 1 = 3
  • 30. 32©MapR Technologies - Confidential Learn More  Talk to Ted @ted_dunning tdunning@maprtech.com  Talk to Grant @gsingers grant@lucidworks.com  MapR and Lucid Works http://www.mapr.com http://www.lucidworks.com Hash Tags #mapr #lucenesolr #lucidworks