The Analytics Challenges Posed by Big Data
Roger Bradford 
Agilex Technologies 
15 April 2013
2 
Standard Big Data View
[Figure: Big Data versus Traditional BI; labeled dimensions include Volume and Velocity]
Source: Forrester Group
3 
Big Data - Volume Examples
Activity        Rate
E-mail          > 300 Billion*/Day
Text Messages   > 24 Billion/Day
Cell Phones     > 10 Billion Calls/Day
YouTube         > 1 Million New Videos/Day
Twitter         > 500 Million Tweets/Day
Facebook        > 1 Billion Posts/Day
*Short Scale Billion = 1,000 Million = 10^9
4 
Big Data - Velocity Example
5
Big Data Variety Example – Internet Language Usage
[Figure: two pie charts of Internet language usage, "By Website Content" and "By User Native Language". Languages shown: English, Spanish, German, French, Chinese, Japanese, Russian, Portuguese, Arabic, Other]
6
Big Data - Variability Example
Functions of 17,209 Genes
7 
Structured and Unstructured Data
Structured            Unstructured
Sales Data            E-mail
Financial Data        Instant Messaging
Climate Data          Tweets
Census Data           Audio
Movie Ratings         Images
Sensor Measurements   Video
Unstructured Information Accounts for more than 80% of all Data in Organizations and is Growing 15X Faster than Structured Data
8 
Challenges: Big Data vs. Hard Problems 
Big Data 
Volume 
Velocity 
Variety 
Variability 
Hard Problems 
Ambiguity 
Nth-order Relations 
Cardinality 
Non-locality
9 
Ambiguity in Text
• Synonymy:
  Common English Nouns have 6-8 Close Synonyms
  Common English Verbs have 9-11
• Polysemy:
  The Word Strike has 30 Common Meanings
• Entity Ambiguity:
  - There are more than 45,000 People Named John Smith in the United States
  - There are more than 300,000 People Named Zhang Wei in China
• Entity Variability:
  Some Person Names in Collections of Interest Occur in over 100 Variants
10
Name Variant Example
Vladimir Putin     Vladimir Poutine   Vladimir V. Putin
Vladmir Putin      Valdimir Putin     Vladimir Vladimirovich Putin
Vladamir Putin     Vladimr Putin      Vladimir Vladimirovitch Putin
Vlaidimir Putin    Vladimir Puttin    Vladimir Vladimirovic Putin
Vladimir Poutin    Putin, Vladimir    Putin, Vladimir Vladimirovitch
Vladimir Puttin    Vladamir Putin     Putin, Vladimir Vladimirovich
Vlademir Putin     Vladimier Putin    V.V. Putin
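A small sketch can show why exact string matching misses these variants and how approximate matching recovers many of them. It uses Python's standard difflib module; the normalize helper, the canonical form, and the 0.8 threshold are illustrative assumptions, not the method used in the talk.

```python
from difflib import SequenceMatcher

# Assumed canonical form for this illustration only.
CANONICAL = "vladimir putin"


def normalize(name: str) -> str:
    """Lowercase, turn 'Last, First' into 'First Last', and drop periods."""
    name = name.strip().lower()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.replace(".", " ").split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_variant(name: str, canonical: str = CANONICAL, threshold: float = 0.8) -> bool:
    """Treat a name as a variant if it is similar enough to the canonical form."""
    return similarity(name, canonical) >= threshold


for candidate in ["Vladmir Putin", "Putin, Vladimir", "Vlademir Putin", "John Smith"]:
    print(candidate, round(similarity(candidate, CANONICAL), 3), is_variant(candidate))
```

Forms such as "V.V. Putin" or the patronymic variants differ structurally rather than by a few characters, so character similarity alone is unlikely to catch them; they need additional rules or context.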
11
Nth-order Relationships
John-Bob Relationship:                              # of Relations in 5,998 Documents:
First Order:   JOHN - BOB                           51,474
Second Order:  JOHN - TOM, TOM - BOB                11,026,553
Third Order:   JOHN - TOM, TOM - DAVE, DAVE - BOB   68,070,600
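The order of a relationship can be read as the length of the shortest chain of co-occurrences linking two names. The sketch below assumes each document is reduced to the list of names it mentions; the function names and the toy documents are hypothetical and only mirror the JOHN, TOM, DAVE, BOB chain on the slide.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(docs):
    """Link two names whenever they appear in the same document."""
    graph = defaultdict(set)
    for names in docs:
        for a, b in combinations(set(names), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def relation_order(graph, source, target):
    """Breadth-first search: return the shortest co-occurrence chain length, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Toy corpus: three documents, each mentioning two names.
docs = [["JOHN", "TOM"], ["TOM", "DAVE"], ["DAVE", "BOB"]]
graph = build_cooccurrence(docs)
print(relation_order(graph, "JOHN", "BOB"))  # 3
```

Each extra hop multiplies the number of candidate chains, which is consistent with the growth from thousands of first-order relations to tens of millions at higher orders.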
12 
Cardinality Example – Alias Detection
                Arthur Bishop   Raul Sanchez   Joel Rifkin   Jose Haddock   William Bonin
Arthur Bishop
Raul Sanchez    .0366
Joel Rifkin     -.0464          .0616
Jose Haddock    .0366           .9675          .0616
William Bonin   .1526           .0125          .0016         .0125
Challenge: Many-by-Many Comparisons. Processing 10 Million Names Requires 50 Trillion Comparisons
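The 50 trillion figure follows from counting unordered pairs: n names give n(n-1)/2 comparisons. A short check is below; the throughput of one million comparisons per second is an assumed figure for illustration only.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


n_names = 10_000_000
total = pairwise_comparisons(n_names)
print(f"{total:,}")  # 49,999,995,000,000

# Assumed rate, purely illustrative: one million comparisons per second.
assumed_rate_per_second = 1_000_000
days = total / assumed_rate_per_second / 86_400
print(f"about {days:.0f} days at the assumed rate")
```

The quadratic growth is the core of the cardinality problem: doubling the number of names roughly quadruples the work.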
13
Non-locality Example – Clustering Documents
14 
Twitter Example
15 
The Tweet Analysis Problem 
• Volume – 500 Million Tweets per Day Worldwide 
• Challenges: 
  - Very Low Signal to Noise Ratio (31 Million People Follow Lady Gaga)
  - Implicit Context (“Let’s all Meet at Bob’s House”)
  - Incomplete, Conflicting, and Erroneous Information
  - Deliberate Deception (50% of all Tweets are Machine-generated)
16 
Applicable Analytic Techniques 
• Statistical Analysis 
• Categorization 
• Clustering 
• NLP Techniques 
• Semantic Analysis 
In General, Application of such Techniques to 
Big Data Problems is Computationally Intensive
17 
Cloud Enabling 
[Figure: Semantic Indexing Time (in Hours) versus Millions of Documents, comparing a Datacenter Server with Map-Reduce with 64 Nodes]
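The map-reduce pattern behind this comparison can be sketched in a few lines. The example below only counts terms across shards of documents in a single process; it is not the semantic indexing measured in the talk. The shard count and function names are assumptions chosen for illustration.

```python
from collections import Counter
from functools import reduce


def map_shard(shard):
    """Map step: count terms within one shard of documents."""
    counts = Counter()
    for doc in shard:
        counts.update(doc.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial term counts."""
    return left + right


def term_counts(docs, n_shards=4):
    """Split documents into shards, map each shard, then reduce the partial results."""
    shards = [docs[i::n_shards] for i in range(n_shards)]
    # Each map_shard call is independent, so on a cluster it could run on its own node.
    partials = map(map_shard, shards)
    return reduce(reduce_counts, partials, Counter())


docs = ["big data volume", "big data velocity", "semantic analysis of big data"]
counts = term_counts(docs)
print(counts["big"], counts["semantic"])  # 3 1
```

Because the map step needs no shared state, adding nodes shortens the map phase roughly in proportion, which is the property that makes the approach attractive for large collections.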
18 
GPU Enabling 
kNN Calculation
[Figure: Seconds (in Thousands) versus Elements (in Billions) for CPU and GPU]
CPU: Intel Xeon X5660
GPU: Nvidia Quadro 2000
19
Semantic Enabling
[Diagram: Data -> Semantic Analysis -> Semantic Representation Space]
• Accommodates Nth-order Relationships
• Automatically Coalesces Term Variants
• Supports Automated Entity Disambiguation
• Identifies Subtle Relationships
• Can Combine Structured and Unstructured Data
But Not as Well Understood as Structured Data Analysis Techniques
20 
IBM WATSON Winning “Jeopardy” 
• Volume: “Only” 1TB of Data (Mostly Text)
• Velocity: Meeting the 3-second Response Requirement of Jeopardy Required 80 Teraflops of Processing Power
Challenge:
• Question Decomposition
21 
Music Genome 
Objective: Match Liked Songs to Recommended Ones
• 400 Attributes per Song
• 10 Million Songs
• Each Song Represented by a Vector of Elements
• 140 Trillion Elements
• Distance Function is Calculated between All Songs
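Matching liked songs to recommendations by vector distance can be sketched as a brute-force nearest-neighbor search. The three-element vectors, song titles, and the choice of Euclidean distance are assumptions for illustration; the slide does not specify the distance function, and real vectors would have about 400 attributes.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(liked, catalog, k=2):
    """Return the k titles whose vectors lie closest to the liked song."""
    candidates = (
        (euclidean(catalog[liked], vector), title)
        for title, vector in catalog.items()
        if title != liked
    )
    return [title for _, title in heapq.nsmallest(k, candidates)]


# Hypothetical catalog with toy attribute vectors.
catalog = {
    "song_a": [0.9, 0.1, 0.3],
    "song_b": [0.8, 0.2, 0.3],
    "song_c": [0.1, 0.9, 0.7],
    "song_d": [0.7, 0.1, 0.5],
}
print(recommend("song_a", catalog))  # ['song_b', 'song_d']
```

A single query scans the whole catalog, so computing distances between all songs grows quadratically with catalog size.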
22 
Literature-based Discovery 
• PubMed Abstracts
• Gene – Function Relationships Derived Semantically
• 98,074,359 Potential Gene-function Associations
Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
23 
Literature-based Discovery (Cont’d) 
Latent Gene and Function Relationships from the June 2006 Gene Ontology Later Documented in the January 2007 Gene Ontology
Challenges:
• Nth-order Relationships
• Complexity of Relations
24 
Patent Analysis
[Diagram: Patent Databases, Online Technical Literature, and Internal Publications feed a Semantic Representation Space, which supports Prior Art Analysis and White Space Analysis]
Challenges:
• Need for Conceptual Comparisons
• Technical Terminology / Obfuscation
• Convoluted Structure (Claims)
25 
Concept-driven Discovery 
[Diagram: an Incoming Reporting Stream and Fraud Exemplars are mapped into a Semantic Representation Space; sample report text contains terms such as "defraud" and "scheme"]
Continuous Cycling through ALL Names
Generate Alerts
Issue: Name Disambiguation
26 
Rapid Data Overview 
[Diagram: Incoming Data passes through Clustering into groups labeled Political, Economic, Admin, Technical, and Regulatory]
Challenges:
• Technical Information
• Multilingual Data
27
Crosslingual Document Categorization – Big Data Solution
[Figure: Accuracy + Completeness of Categorization versus Number of Simultaneous Languages. Series: English Docs with English Examples; Docs in 13 Languages with English Examples. The Range of Human Performance is marked]
28 
Where is Big Data Analytics Going? 
• Real-time Analysis 
• Multimedia Collections
  - Text
  - Structured Data
  - Audio
  - Video
  - Sensor Data
• Temporal and Spatial Data Integration 
• Interactive Visualization 
• Continuous Retrospective Analysis 
• Advanced Analytics (Especially Semantic Analysis)
29 
Integration of Multimedia Data 
[Diagram: Structured Data, Images, Multi-lingual Text, Audio, Sensor Data, and Video feed Integrated Analytics]
Buyer        Seller         Material       Amount    Date
John Smith   Ace Jewelers   Diamond Ring   3 Carat   8/18/06
30 
Spatiotemporal Data Integration 
Challenges:
• Fully Automatic Integration of Spatial, Temporal, and Semantic Information
• Location Disambiguation
31 
Questions or Comments 
Roger Bradford 
Agilex Technologies Inc 
1-703-889-3916 
r.bradford@agilex.com

'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 

The Analytics Challenges Posed by Big Data and Hard Problems

  • 1. The Analytics Challenges Posed by Big Data. Roger Bradford, Agilex Technologies, 15 April 2013
  • 2. Standard Big Data View [figure labels: Volume, Velocity, Big Data, Traditional BI]. Source: Forrester Group
  • 3. Big Data - Volume Examples (Activity: Rate). E-mail: > 300 Billion*/Day; Text Messages: > 24 Billion/Day; Cell Phones: > 10 Billion Calls/Day; YouTube: > 1 Million New Videos/Day; Twitter: > 500 Million Tweets/Day; Facebook: > 1 Billion Posts/Day. *Short Scale Billion = 1,000 Million = 10^9
  • 4. Big Data - Velocity Example
  • 5. Big Data Variety Example – Internet Language Usage [two charts: By Website Content; By User Native Language. Languages shown: English, Chinese, Spanish, Japanese, Portuguese, German, Arabic, French, Russian, Other]
  • 6. Big Data - Variability Example: Functions of 17,209 Genes
  • 7. Structured and Unstructured Data. Structured: Sales Data, Financial Data, Climate Data, Census Data, Movie Ratings, Sensor Measurements. Unstructured: E-mail, Instant Messaging, Tweets, Audio, Images, Video. Unstructured Information Accounts for more than 80% of all Data in Organizations and is Growing 15X Faster than Structured Data
  • 8. Challenges: Big Data vs. Hard Problems. Big Data: Volume, Velocity, Variety, Variability. Hard Problems: Ambiguity, Nth-order Relations, Cardinality, Non-locality
  • 9. Ambiguity in Text. Synonymy: Common English Nouns have 6-8 Close Synonyms; Common English Verbs have 9-11. Polysemy: The Word "Strike" has 30 Common Meanings. Entity Ambiguity: There are more than 45,000 People Named John Smith in the United States; There are more than 300,000 People Named Zhang Wei in China. Entity Variability: Some Person Names in Collections of Interest Occur in over 100 Variants
  • 10. Name Variant Example: Vladimir Putin; Vladimir Poutine; Vladimir V. Putin; Vladmir Putin; Valdimir Putin; Vladimir Vladimirovich Putin; Vladamir Putin; Vladimr Putin; Vladimir Vladimirovitch Putin; Vlaidimir Putin; Vladimir Puttin; Vladimir Vladimirovic Putin; Vladimir Poutin; Putin, Vladimir; Putin, Vladimir Vladimirovitch; Vladimir Puttin; Vladamir Putin; Putin, Vladimir Vladimirovich; Vlademir Putin; Vladimier Putin; V.V. Putin
  • 11. Nth-order Relationships. Relationship between John and Bob: First Order: JOHN-BOB; Second Order: JOHN-TOM, TOM-BOB; Third Order: JOHN-TOM, TOM-DAVE, DAVE-BOB. # of Relations in 5,998 Documents: First Order: 51,474; Second Order: 11,026,553; Third Order: 68,070,600
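  • 12. Cardinality Example – Alias Detection. Pairwise values (lower triangle of the name-by-name matrix): Raul Sanchez vs. Arthur Bishop: .0366; Joel Rifkin vs. Arthur Bishop: -.0464, vs. Raul Sanchez: .0616; Jose Haddock vs. Arthur Bishop: .0366, vs. Raul Sanchez: .9675, vs. Joel Rifkin: .0616; William Bonin vs. Arthur Bishop: .1526, vs. Raul Sanchez: .0125, vs. Joel Rifkin: .0016, vs. Jose Haddock: .0125. Challenge: Many-by-Many Comparisons. Processing 10 Million Names Requires 50 Trillion Comparisons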
  • 15. The Tweet Analysis Problem. Volume: 500 Million Tweets per Day Worldwide. Challenges: Very Low Signal-to-Noise Ratio (31 Million People Follow Lady Gaga); Implicit Context (“Let’s all Meet at Bob’s House”); Incomplete, Conflicting, and Erroneous Information; Deliberate Deception (50% of all Tweets are Machine-generated)
  • 16. Applicable Analytic Techniques: Statistical Analysis; Categorization; Clustering; NLP Techniques; Semantic Analysis. In General, Application of such Techniques to Big Data Problems is Computationally Intensive
  • 17. Cloud Enabling [chart: Semantic Indexing Time (in Hours) vs. Millions of Documents; series: Datacenter Server, Map-Reduce with 64 Nodes]
  • 18. GPU Enabling [chart: kNN Calculation, Seconds (in Thousands) vs. Elements (in Billions); series: CPU (Intel Xeon X5660), GPU (Nvidia Quadro 2000)]
  • 19. Semantic Enabling [diagram labels: Data, Semantic Analysis, Semantic Representation Space]. The Semantic Representation Space: Accommodates Nth-order Relationships; Automatically Coalesces Term Variants; Supports Automated Entity Disambiguation; Identifies Subtle Relationships; Can Combine Structured and Unstructured Data. But Not as Well Understood as Structured Data Analysis Techniques
  • 20. IBM Watson Winning “Jeopardy”. Volume: “Only” 1TB of Data (Mostly Text). Velocity: Meeting the 3-second Response Requirement of Jeopardy Required 80 Teraflops of Processing Power. Challenge: Question Decomposition
  • 21. Music Genome. Objective: Match Liked Songs to Recommended Ones. 400 Attributes per Song; 10 Million Songs; Each Song Represented by a Vector of Elements; 140 Trillion Elements; Distance Function is Calculated between All Songs
  • 22. Literature-based Discovery. PubMed Abstracts; Gene-Function Relationships Derived Semantically; 98,074,359 Potential Gene-function Associations. Source: Zukas, A., GO-Driven Literature-Based Discovery using Semantic Analysis, MS Thesis, George Mason University, 2007.
  • 23. Literature-based Discovery (Cont’d). Latent Gene and Function Relationships from the June 2006 Gene Ontology, Later Documented in the January 2007 Gene Ontology. Challenges: Nth-order Relationships; Complexity of Relations
  • 24. Patent Analysis [diagram: Patent Databases, Online Technical Literature, and Internal Publications feed a Semantic Representation Space, which supports Prior Art Analysis and White Space Analysis]. Challenges: Need for Conceptual Comparisons; Technical Terminology / Obfuscation; Convoluted Structure (Claims)
  • 25. Concept-driven Discovery [diagram: Incoming Reporting Stream and Fraud Exemplars feed a Semantic Representation Space; sample text: "Xxxxxxxxx Xxxxxxxxx defraud Xxxxxxxxx scheme"]. Continuous Cycling through ALL Names; Generate Alerts. Issue: Name Disambiguation
  • 26. Rapid Data Overview [diagram: Incoming Data clustered into Political, Economic, Admin, Technical, Regulatory]. Challenges: Technical Information; Multilingual Data
  • 27. Crosslingual Document Categorization – Big Data Solution [chart: Accuracy + Completeness of Categorization vs. Number of Simultaneous Languages; labels: English Docs / English Examples, Docs in 13 Languages / English Examples, Range of Human Performance]
  • 28. Where is Big Data Analytics Going? Real-time Analysis; Multimedia Collections (Text, Structured Data, Audio, Video, Sensor Data); Temporal and Spatial Data Integration; Interactive Visualization; Continuous Retrospective Analysis; Advanced Analytics (Especially Semantic Analysis)
  • 29. Integration of Multimedia Data. Integrated Analytics across: Structured Data, Images, Multi-lingual Text, Audio, Sensor Data, Video. Structured Data Example: Buyer: John Smith; Seller: Ace Jewelers; Material: Diamond Ring; Amount: 3 Carat; Date: 8/18/06
  • 30. Spatiotemporal Data Integration. Challenges: Fully Automatic Integration of Spatial, Temporal, and Semantic Information; Location Disambiguation
  • 31. Questions or Comments. Roger Bradford, Agilex Technologies Inc, 1-703-889-3916, r.bradford@agilex.com
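
The cardinality figure on slide 12 (10 million names requiring about 50 trillion comparisons) is consistent with comparing every name against every other name exactly once, which for n items is n(n-1)/2 pairs. The short Python sketch below checks that arithmetic. It assumes unordered, all-pairs comparison; the function name `pairwise_comparisons` is hypothetical and is not taken from the slides, which do not state how the figure was derived.

```python
from itertools import combinations


def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Sanity check against explicit enumeration, using the five names from slide 12.
names = [
    "Arthur Bishop",
    "Raul Sanchez",
    "Joel Rifkin",
    "Jose Haddock",
    "William Bonin",
]
explicit_pairs = list(combinations(names, 2))
assert len(explicit_pairs) == pairwise_comparisons(len(names))  # 10 pairs

# Scale up to the 10 million names quoted on the slide.
total = pairwise_comparisons(10_000_000)
print(f"{total:,}")  # 49,999,995,000,000, i.e. roughly 50 trillion
```

The ten pairs for five names match the ten entries in the lower triangle of the slide's matrix, and the quadratic growth is why the slide treats cardinality as a hard problem in its own right.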