Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data WSU & AFRL Window-on-Science Seminar on Data Mining Amit P. Sheth, LexisNexis Ohio Eminent Scholar Director, Kno.e.sis center, Wright State University knoesis.org Thanks: K. Gomadam, M. Nagarajan, C. Thomas, C. Henson, C. Ramakrishnan, P. Jain  and Kno.e.sis Researchers
Data & Knowledge Ecosystem Situational Awareness Decision Support Insight Knowledge Discovery Analysis (e.g., Patterns) Understanding & Perception Data Mining Integration Search Browsing Multimedia Data Structured, Semistructured Unstructured Data Textual Data: Scientific Literature, Web Pages, News, Blogs, Reports, Wiki, Forums, Comments, Tweets Experimental Data Observational Data Transactional Data
Some examples of R&D we have done Semantic Search & Ranking of Stories and Reports – connecting the dots applications (insider threat, financial risk analysis) Mining of biomedical (scientific) literature (extraction of entities and relationships) – discovering hidden public knowledge Semantic Integration, Analysis and Decision Support over Sensor Data Extracting taxonomy/domain model from Wikipedia Discovering Hidden Relationships (insights) in Community Created Content (Wikipedia)
Understanding User Generated Content (on Social Networking Sites)* What are people talking about How people write Why people write With application to:
Advertisement on Social Media
Identifying Social Signals – spatio-temporal-thematic analysis of Citizen Sensor Data
* Meena Nagarajan
Search Integration Analysis Discovery Question    Answering Situational     Awareness Domain Models Patterns / Inference / Reasoning RDB Relationship Web Meta data / Semantic Annotations Metadata Extraction Multimedia Content and Web data Text Sensor Data Structured and Semi-structured data
Insider threat demo (semantic search/querying, ranking, …)
Knowledge Discovery from Scientific Literature Cartic Ramakrishnan
What Knowledge Discovery is NOT Search Keyword-in-document-out Keywords are fully specified features of expected outcome Searching for prospective mining sites Mining Know where to look Underspecified characteristics of what is sought are available Patterns Cartic Ramakrishnan
What is knowledge discovery? “knowledge discovery is more like sifting through a warehouse filled with small gears, levers, etc., none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts.” – James Caruther “discovery is often described as more opportunistic search in a less well-defined space, leading to a psychological element of surprise” – James Buchanan Opportunistic search over an ill-defined space leading to surprising but useful emergent knowledge Cartic Ramakrishnan
Element of surprise – Swanson’s discoveries Stress ? Swanson’s Discoveries Magnesium Migraine Calcium Channel Blockers Spreading Cortical Depression 11 possible associations found PubMed Associations discovered based on keyword searches followed by manual analysis of text to establish possible relevant relationships
Knowledge Discovery over text Text Assigning interpretation to text Semantic metadata in the form of semi-structured data Extraction of Semantics from text Semantic Metadata Guided Knowledge Explorations Semantic Metadata Guided Knowledge Discovery Triple-based Semantic Search Semantic browser Subgraph discovery Cartic Ramakrishnan
Information Extraction via Ontology assisted text mining – Relationship extraction 4733 documents 9284 documents 5 documents UMLS Semantic Network complicates Biologically active substance affects causes causes Disease or Syndrome Lipid affects instance_of instance_of ??????? Fish Oils Raynaud’s Disease MeSH PubMed Cartic Ramakrishnan
Background knowledge and Data used
UMLS – A high level schema of the biomedical domain 136 classes and 49 relationships Synonyms of all relationships – using variant lookup (tools from NLM) 49 relationships + their synonyms = ~350 verbs
MeSH 22,000+ topics organized as a forest of 16 trees Used to query PubMed
PubMed Over 16 million abstracts Abstracts annotated with one or more MeSH terms
Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo)
“adenomatous” modifies “hyperplasia”
“An excessive endogenous or exogenous stimulation” modifies “estrogen”
Entities can also occur as composites of 2 or more other entities
“adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium”
(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )
Cartic Ramakrishnan
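The bracketed parse on this slide can be read back into a tree and searched for noun phrases, which is where modified and composite entities show up. The following is a minimal sketch in plain Python, assuming only the bracket notation shown above; the function names `parse_tree`, `leaves` and `phrases` are illustrative and are not part of SS-Tagger or SS-Parser.

```python
def parse_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words


def phrases(node, label):
    """Yield the text of every subtree with the given label, outermost first."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield " ".join(leaves(node))
    for child in node[1]:
        yield from phrases(child, label)


PARSE = (
    "(TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) "
    "(JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) "
    "(VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) "
    "(PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) )"
)

tree = parse_tree(PARSE)
noun_phrases = list(phrases(tree, "NP"))
for phrase in noun_phrases:
    print(phrase)
```

Nested noun phrases come out from the outermost inwards, so the composite “adenomatous hyperplasia of the endometrium” is listed before its parts “adenomatous hyperplasia” and “the endometrium”.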
Method – Identify entities and relationships in Parse Tree
[Figure: parse tree of “An excessive endogenous or exogenous stimulation by estrogen induces adenomatous hyperplasia of the endometrium”, with nodes marked as Modifiers, Modified entities and Composite Entities, and labelled with UMLS ID T147 and MeSH IDs D004967, D006965 and D004717]
Representation – Resulting RDF Modifiers Modified entities Composite Entities 17
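One way to picture the resulting representation is as a small set of subject-predicate-object triples. This is a sketch under stated assumptions, not the authors' schema: the `ex:` node names and the `ex:baseEntity` predicate are invented for illustration, `hasPart` and `hasModifiers` follow the predicate names mentioned on a later slide, and pairing each MeSH ID with a term is a reading of the figure labels.

```python
# Each triple is a plain (subject, predicate, object) tuple.
# All "ex:" identifiers are hypothetical placeholders.
triples = {
    # Modified entity: "estrogen" carrying its modifier phrase.
    ("ex:mod_estrogen", "ex:hasModifiers", "An excessive endogenous or exogenous stimulation"),
    ("ex:mod_estrogen", "ex:baseEntity", "mesh:D004967"),
    # Modified entity: "hyperplasia" modified by "adenomatous".
    ("ex:mod_hyperplasia", "ex:hasModifiers", "adenomatous"),
    ("ex:mod_hyperplasia", "ex:baseEntity", "mesh:D006965"),
    # Composite entity: "adenomatous hyperplasia of the endometrium".
    ("ex:composite_1", "ex:hasPart", "ex:mod_hyperplasia"),
    ("ex:composite_1", "ex:hasPart", "mesh:D004717"),
    # The named relationship extracted from the verb.
    ("ex:mod_estrogen", "ex:induces", "ex:composite_1"),
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


parts = objects("ex:composite_1", "ex:hasPart")
print(parts)
```

Keeping modifiers and parts as separate structural predicates means the named relationship (`ex:induces` here) can connect whole composite entities rather than bare keywords.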
Preliminary Results Swanson’s discoveries – Associations between Migraine and Magnesium [Hearst99]
stress can lead to loss of magnesium
calcium channel blockers prevent some migraines
magnesium is a natural calcium channel blocker
spreading cortical depression (SCD) is implicated in some migraines
high levels of magnesium inhibit SCD
migraine patients have high platelet aggregability
magnesium can suppress platelet aggregability
Data sets generated using these entities (highlighted in red on the slide) as boolean keyword queries against PubMed
Bidirectional breadth-first search used to find paths in resulting RDF
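A bidirectional breadth-first search grows one frontier from each endpoint and stops when the two meet. The sketch below is a generic version of that idea over hand-written triples paraphrasing the statements on this slide; the predicate names and the choice to treat edges as undirected are assumptions, and it returns a connecting path without claiming it is the shortest one.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index triples as an undirected adjacency map (edge labels are ignored here)."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def bidirectional_bfs(graph, start, goal):
    """Return one path between start and goal, or None if they are not connected."""
    if start == goal:
        return [start]
    parents_s, parents_g = {start: None}, {goal: None}
    frontier_s, frontier_g = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return the node where the two searches meet.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbour in sorted(graph[node]):
                if neighbour not in parents:
                    parents[neighbour] = node
                    if neighbour in other_parents:
                        return neighbour
                    frontier.append(neighbour)
        return None

    while frontier_s and frontier_g:
        meet = expand(frontier_s, parents_s, parents_g)
        if meet is None:
            meet = expand(frontier_g, parents_g, parents_s)
        if meet is not None:
            path = []
            node = meet
            while node is not None:  # walk back to the start
                path.append(node)
                node = parents_s[node]
            path.reverse()
            node = parents_g[meet]
            while node is not None:  # walk forward to the goal
                path.append(node)
                node = parents_g[node]
            return path
    return None


# Hypothetical triples paraphrasing the slide; predicate names are illustrative.
TRIPLES = [
    ("stress", "leads_to_loss_of", "magnesium"),
    ("calcium channel blockers", "prevent", "migraine"),
    ("magnesium", "is_a", "calcium channel blockers"),
    ("spreading cortical depression", "implicated_in", "migraine"),
    ("magnesium", "inhibits", "spreading cortical depression"),
]

graph = build_graph(TRIPLES)
path = bidirectional_bfs(graph, "migraine", "magnesium")
print(path)
```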
Paths between Migraine and Magnesium Paths are considered interesting if they have one or more named relationships other than hasPart or hasModifiers in them Cartic Ramakrishnan
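The interestingness rule on this slide amounts to a simple filter over the predicates along a path. A minimal sketch, assuming a path is given as the list of predicate names on its edges (that encoding is an assumption made for illustration):

```python
# Structural predicates named on the slide; any other predicate counts as a named relationship.
STRUCTURAL_PREDICATES = {"hasPart", "hasModifiers"}


def is_interesting(path_predicates):
    """True if at least one edge on the path is a named (non-structural) relationship."""
    return any(p not in STRUCTURAL_PREDICATES for p in path_predicates)


print(is_interesting(["hasPart", "inhibits", "hasModifiers"]))
print(is_interesting(["hasPart", "hasModifiers"]))
```

A path made only of `hasPart` and `hasModifiers` edges just restates how entities are composed, so it is filtered out.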
An example of such a path
CONCLUSION
Our definitions of compound and modified entities are critical for identifying both implicit and explicit relationships
Swanson’s discovery can be automated – if recall can be improved – what hurts recall?
Unsupervised Joint Extraction of Compound Entities and Relationship Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang and Amit P. Sheth  "Unsupervised Discovery of Compound Entities for Relationship Extraction" EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management Knowledge Patterns
Joint Extraction approach governor dependent Dependency parse – Stanford Parser amod = adjectival modifier nsubjpass = nominal subject in passive voice
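To make the governor/dependent vocabulary concrete, the sketch below groups `amod` dependents with their head noun to form a compound entity and then reads a triple off a passive clause. It is an illustration only, not the algorithm from the EKAW 2008 paper: the dependency tuples are written by hand for the made-up sentence “Adenomatous hyperplasia is induced by estrogen” rather than produced by the Stanford Parser, and the function names are hypothetical.

```python
# Hand-written typed dependencies as (relation, governor, dependent);
# each word is paired with its 1-based position in the sentence.
DEPS = [
    ("amod", ("hyperplasia", 2), ("Adenomatous", 1)),
    ("nsubjpass", ("induced", 4), ("hyperplasia", 2)),
    ("auxpass", ("induced", 4), ("is", 3)),
    ("agent", ("induced", 4), ("estrogen", 6)),
]


def expand_head(head, deps):
    """Attach adjectival modifiers (amod) to a head noun, in sentence order."""
    modifiers = sorted(
        (dep for rel, gov, dep in deps if rel == "amod" and gov == head),
        key=lambda word: word[1],
    )
    return " ".join([word for word, _ in modifiers] + [head[0]])


def passive_triples(deps):
    """Build (agent, verb, passive subject) triples from passive clauses."""
    found = []
    for rel, gov, dep in deps:
        if rel != "nsubjpass":
            continue
        for rel2, gov2, agent in deps:
            if rel2 == "agent" and gov2 == gov:
                found.append((expand_head(agent, deps), gov[0], expand_head(dep, deps)))
    return found


extracted = passive_triples(DEPS)
print(extracted)
```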
Algorithm Relationship head Subject head Object head Object head Cartic Ramakrishnan
Preliminary results Cartic Ramakrishnan
Extracted Triples
Semantic Metadata Guided Knowledge Explorations and Discovery
Results Cartic Ramakrishnan
Hypothesis Driven retrieval of Scientific Literature affects Migraine Magnesium Stress isa inhibit Patient Calcium Channel Blockers Complex Query Supporting Document sets retrieved Keyword query: Migraine[MH] + Magnesium[MH] PubMed
Applications Triple-based semantic search Semantic Browser
Knowledge Discovery = Extraction + Heuristic Aggregation Undiscovered Public Knowledge
Understanding, Analyzing, Mining Social Media Meena Nagarajan, Karthik Gomadam
mumbai, india
november 26, 2008
another chapter in the war against civilization
 and
 the world saw it Through the eyes of the people
 the world read it Through the words of the people
PEOPLE told their stories to PEOPLE
A powerful new era in  Information dissemination had taken firm ground
Making it possible for us to create a global network of citizens Citizen Sensors –  Citizens observing, processing, transmitting, reporting
Geocoder (Reverse Geo-coding) Address to location database 18 Hormusji Street, Colaba Vasant Vihar Image Metadata latitude: 18° 54′ 59.46″ N, longitude: 72° 49′ 39.65″ E Structured Meta Extraction Nariman House Income Tax Office Identify and extract information from tweets Spatio-Temporal Analysis
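The image metadata on this slide gives the location in degrees, minutes and seconds, while spatial queries usually work in decimal degrees. A small sketch of the conversion follows; the function name `dms_to_decimal` is illustrative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal(18, 54, 59.46, "N")
lon = dms_to_decimal(72, 49, 39.65, "E")
print(round(lat, 6), round(lon, 6))
```

The results agree to about five decimal places with the decimal coordinates 18.916517°N, 72.827682°E used in the spatial analysis question that follows.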
Research Challenge #1 Spatio Temporal and Thematic analysis What else happened “near” this event location? What events occurred “before” and “after” this event? Any message about “causes” for this event?
Spatial Analysis: Which tweets originated from an address near 18.916517°N, 72.827682°E?
Which tweets originated during Nov 27th 2008, from 11PM to 12 PM?
Giving us tweets that originated from an address near 18.916517°N, 72.827682°E during the time interval 27th Nov 2008 between 11PM and 12PM
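Combining the spatial and temporal questions comes down to a distance check plus a time-window check on each message. The sketch below uses the haversine great-circle distance; the tweet records, the 2 km radius and the reading of the time window as 11 PM to midnight are all assumptions made for illustration, not data from the talk.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def near_and_during(tweets, lat, lon, radius_km, start, end):
    """Keep tweets within radius_km of (lat, lon) whose time falls in [start, end)."""
    return [
        t for t in tweets
        if haversine_km(lat, lon, t["lat"], t["lon"]) <= radius_km
        and start <= t["time"] < end
    ]


# Made-up records: one nearby and in the window, one far away, one nearby but too early.
TWEETS = [
    {"id": "a", "lat": 18.9220, "lon": 72.8347, "time": datetime(2008, 11, 27, 23, 15)},
    {"id": "b", "lat": 19.0760, "lon": 72.8777, "time": datetime(2008, 11, 27, 23, 20)},
    {"id": "c", "lat": 18.9170, "lon": 72.8280, "time": datetime(2008, 11, 27, 9, 0)},
]

matches = near_and_during(
    TWEETS, 18.916517, 72.827682, radius_km=2.0,
    start=datetime(2008, 11, 27, 23, 0), end=datetime(2008, 11, 28, 0, 0),
)
print([t["id"] for t in matches])
```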
Research Challenge #2: Understanding and Analyzing Casual Text Casual text Microblogs are often written in SMS style language Slang, abbreviations
Understanding Casual Text Not the same as news articles or scientific literature Grammatical errors Implications on NL parser results Inconsistent writing style Implications on learning algorithms that generalize from corpus
Nature of Microblogs Additional constraint of limited context Max. of x chars in a microblog Context often provided by the discourse Entity identification and disambiguation Pre-requisite to other sophisticated information analytics
NL understanding is hard to begin with..
Not so hard: “commando raid appears to be nigh at Oberoi now” Oberoi = Oberoi Hotel, Nigh = high
Challenging: “new wing, live fire @ taj 2nd floor on iDesi TV stream” Fire on the second floor of the Taj hotel, not on iDesi TV
Research Opportunities NER, disambiguation in casual, informal text is a budding area of research Another important area of focus: Combining information of varied quality from a  corpus (statistical NLP),  domain knowledge (tags, folksonomies, taxonomies, ontologies),  social context (explicit and implicit communities)
Social Context surrounding content Social context in which a message appears is also an added valuable resource Post 1: “Hareemane House hostages said by eyewitnesses to be Jews. 7 Gunshots heard by reporters at Taj” Follow-up post: that is Nariman House, not (Hareemane)
Understanding content … informal text I say: “Your music is wicked” What I really mean: “Your music is good”
Urban Dictionary Sentiment expression: Rocks  Transliterates to: cool, good Structured text (biomedical literature) Semantic Metadata: Smile is a Track Lil transliterates to Lilly Allen Lilly Allen is an Artist MusicBrainz Taxonomy Informal Text (Social Network chatter) Artist: Lilly Allen Track: Smile     Your smile rocks Lil Multimedia Content and Web data Web Services
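The annotation shown on this slide can be approximated with dictionary lookups: a slang dictionary for sentiment expressions and a music taxonomy for artists and tracks. The sketch below hard-codes the few entries that appear on the slide; the table and function names are hypothetical stand-ins for Urban Dictionary and MusicBrainz lookups, and real disambiguation would need far more context than single-token matching.

```python
# Tiny stand-ins for the knowledge sources named on the slide.
SENTIMENT_SLANG = {"rocks": ["cool", "good"]}            # Urban Dictionary style entry
ARTIST_ALIASES = {"lil": "Lilly Allen"}                  # nickname to artist
TRACKS_BY_ARTIST = {"Lilly Allen": {"smile": "Smile"}}   # taxonomy: artist to tracks


def annotate(comment):
    """Return artist, track and sentiment annotations for an informal comment."""
    tokens = [t.strip(".,!?").lower() for t in comment.split()]
    annotation = {"artist": None, "track": None, "sentiment": []}
    for token in tokens:
        if token in ARTIST_ALIASES:
            annotation["artist"] = ARTIST_ALIASES[token]
        if token in SENTIMENT_SLANG:
            annotation["sentiment"] = SENTIMENT_SLANG[token]
    # Look for track titles only once an artist is known, so that a common
    # word such as "smile" is not tagged as a track without supporting context.
    tracks = TRACKS_BY_ARTIST.get(annotation["artist"], {})
    for token in tokens:
        if token in tracks:
            annotation["track"] = tracks[token]
    return annotation


result = annotate("Your smile rocks Lil")
print(result)
```

Using the artist as context before matching track names is one simple way domain knowledge can help resolve ambiguity in casual text.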
Example: Pulse of a Community Imagine millions of such informal opinions Individual expressions to mass opinions “Popular artists” lists from MySpace comments Lilly Allen	 Lady Sovereign	 Amy Winehouse Gorillaz Coldplay Placebo Sting Kean Joss Stone
What Drives the Spatio-Temporal-Thematic Analysis and Casual Text Understanding Semantics with the help of Domain Models Domain Models Domain Models(ontologies, folksonomies)
Domain Knowledge: A key driver Places that are nearby ‘Nariman house’ Spatial query Messages originated around this place Temporal analysis Messages about related events / places Thematic analysis
Research Challenge #3: But where does the Domain Knowledge come from? Expert and committee based ontology creation … works in some domains (e.g., biomedicine, health care,…) Community driven knowledge extraction How to create models that are “socially scalable”? How to organically grow and maintain this model?
Building models…seed word to hierarchy creation using WIKIPEDIA Query: “cognition”
Identifying relationships: Hard, harder than many hard things  But NOT that Hard, When WE do it
Games with a purpose Get humans to give their solitaire time  Solve real hard computational problems Image tagging, Identifying part of an image  Tag a tune, Squigl, Verbosity, and Matchin Pioneered by Luis Von Ahn
OntoLablr Relationship Identification Game
Explosion causes Traffic congestion
How do you get comprehensive situational awareness by merging “human sensing” and “machine sensing”?
Research Challenge #4: Semantic Sensor Web
Semantically Annotated O&M
<swe:component name="time">
  <swe:Time definition="urn:ogc:def:phenomenon:time" uom="urn:ogc:def:unit:date-time">
    <sa:swe rdfa:about="?time" rdfa:instanceof="time:Instant">
      <sa:sml rdfa:property="xs:date-time"/>
    </sa:swe>
  </swe:Time>
</swe:component>
<swe:component name="measured_air_temperature">
  <swe:Quantity definition="urn:ogc:def:phenomenon:temperature" uom="urn:ogc:def:unit:fahrenheit">
    <sa:swe rdfa:about="?measured_air_temperature" rdfa:instanceof="senso:TemperatureObservation">
      <sa:swe rdfa:property="weather:fahrenheit"/>
      <sa:swe rdfa:rel="senso:occurred_when" resource="?time"/>
      <sa:swe rdfa:rel="senso:observed_by" resource="senso:buckeye_sensor"/>
    </sa:swe>
  </swe:Quantity>
</swe:component>
<swe:value name="weather-data">
  2008-03-08T05:00:00,29.1
</swe:value>
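The `weather-data` value above packs the two declared components, time and measured air temperature, into one comma-separated record. A minimal sketch of unpacking it follows, assuming the field order matches the order of the component declarations and that the temperature is in Fahrenheit as the `uom` attribute states; the function names are illustrative.

```python
from datetime import datetime


def parse_observation(value):
    """Split a 'timestamp,reading' record into a datetime and a float."""
    timestamp, reading = value.strip().split(",")
    return datetime.fromisoformat(timestamp), float(reading)


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9


observed_at, temp_f = parse_observation("  2008-03-08T05:00:00,29.1 ")
print(observed_at.isoformat(), temp_f, round(fahrenheit_to_celsius(temp_f), 2))
```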

Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data

  • 1. 1
  • 2. Semantics-Empowered Understanding, Analysis and Mining of Nontraditional and Unstructured Data WSU & AFRL Window-on-Science Seminar on Data Mining Amit P. Sheth, LexisNexis Ohio Eminent Scholar Director, Kno.e.sis center, Wright State University knoesis.org Thanks: K. Gomadam, M. Nagarajan, C. Thomas, C. Henson, C. Ramakrishnan, P. Jain and Kno.e.sis Researchers
  • 3. Data & Knowledge Ecosystem 3 Situational Awareness Decision Support Insight Knowledge Discovery Analysis (eg Patterns) Understanding & Perception Data Mining Integration Search Browsing Multimedia Data Structured, Semistructured Unstructured Data Textual Data: Scientific Literature, Web Pages, News, Blogs, Reports, Wiki, Forums, Comments, Tweets Experimental Data Observational Data Transactional Data
  • 4. Some examples of R&D we have done Semantic Search & Ranking of Stories and Reports – connecting the dots applications (insider threat, financial risk analysis) Mining of biomedical (scientific) literature (extraction of entities and relationships) – discovering hidden public knowledge Semantic Integration, Analysis and Decision Support over Sensor Data Extracting taxonomy/domain model from Wikipedia Discovering Hidden Relationships (insights) in Community Created Content (Wikipedia) 4
  • 5.
  • 7. Identifying Social Signals – spatio-temporal-thematic analysis of Citizen Sensor Data5 * MeenaNagarajan
  • 8. Search Integration Analysis Discovery Question Answering Situational Awareness Domain Models Patterns / Inference / Reasoning RDB Relationship Web Meta data / Semantic Annotations Metadata Extraction Multimedia Content and Web data Text Sensor Data Structured and Semi-structured data
  • 9. Insider threat demo (semantic search/querying, ranking, …)
  • 10. Knowledge Discovery from Scientific Literature Cartic Ramakrishnan
  • 11. What Knowledge Discovery is NOT. Search: keyword-in-document-out; keywords are fully specified features of expected outcome. Searching for prospective mining sites. Mining: know where to look; underspecified characteristics of what is sought are available; patterns. Cartic Ramakrishnan
  • 12. What is knowledge discovery? “knowledge discovery is more like sifting through a warehouse filled with small gears, levers, etc., none of which is particularly valuable by itself. After appropriate assembly, however, a Rolex watch emerges from the disparate parts.” – James Caruther “discovery is often described as more opportunistic search in a less well-defined space, leading to a psychological element of surprise” – James Buchanan Opportunistic search over an ill-defined space leading to surprising but useful emergent knowledge. Cartic Ramakrishnan
  • 13. Element of surprise – Swanson’s discoveries Stress ? Swanson’s Discoveries Magnesium Migraine Calcium Channel Blockers Spreading Cortical Depression 11 possible associations found PubMed Associations discovered based on keyword searches followed by manual analysis of text to establish possible relevant relationships
  • 14. Knowledge Discovery over text Text Assigning interpretation to text Semantic metadata in the form of semi-structured data Extraction of Semantics from text Semantic Metadata Guided Knowledge Explorations Semantic Metadata Guided Knowledge Discovery Triple-based Semantic Search Semantic browser Subgraph discovery Cartic Ramakrishnan
  • 15. Information Extraction via Ontology assisted text mining – Relationship extraction 4733 documents 9284 documents 5 documents UMLS Semantic Network complicates Biologically active substance affects causes causes Disease or Syndrome Lipid affects instance_of instance_of ??????? Fish Oils Raynaud’s Disease MeSH PubMed Cartic Ramakrishnan
  • 16. Background knowledge and data used. UMLS: a high-level schema of the biomedical domain; 136 classes and 49 relationships; synonyms of all relationships found using variant lookup (tools from NLM); 49 relationships + their synonyms = ~350 verbs. MeSH: 22,000+ topics organized as a forest of 16 trees; used to query PubMed. PubMed: over 16 million abstracts; abstracts annotated with one or more MeSH terms
  • 17.
  • 19. “An excessive endogenous or exogenous stimulation” modifies “estrogen”
  • 20. Entities can also occur as composites of 2 or more other entities
  • 21. “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous) ) (NN stimulation) ) (PP (IN by) (NP (NN estrogen) ) ) ) (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT the) (NN endometrium) ) ) ) ) ) ) Cartic Ramakrishnan
  • 22. Method – Identify entities and relationships in Parse Tree [Figure: parse tree of the example sentence with Modifiers, Modified entities and Composite Entities marked; node annotations include UMLS ID T147 and MeSH IDs D004967, D006965 and D004717]
  • 23. Representation – Resulting RDF [Figure legend: Modifiers, Modified entities, Composite Entities]
  • 24.
  • 25. stress can lead to loss of magnesium
  • 27. magnesium is a natural calcium channel blocker
  • 28. spreading cortical depression (SCD) is implicated in some migraines
  • 29. high levels of magnesium inhibit SCD
  • 30. migraine patients have high platelet aggregability
  • 31. magnesium can suppress platelet aggregability. Data sets generated using these entities (marked red above) as boolean keyword queries against PubMed. Bidirectional breadth-first search used to find paths in the resulting RDF
  • 32. Paths between Migraine and Magnesium. Paths are considered interesting if they have one or more named relationships other than hasPart or hasModifiers in them. Cartic Ramakrishnan
  • 33.
  • 34. Our definitions of compound and modified entities are critical for identifying both implicit and explicit relationships
  • 35. Swanson’s discovery can be automated – if recall can be improved – what hurts recall?
  • 36. Unsupervised Joint Extraction of Compound Entities and Relationship. Cartic Ramakrishnan, Pablo N. Mendes, Shaojun Wang and Amit P. Sheth, "Unsupervised Discovery of Compound Entities for Relationship Extraction", EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management: Knowledge Patterns
  • 37. Joint Extraction approach. Dependency parse (Stanford Parser), with governor and dependent marked. amod = adjectival modifier; nsubjpass = nominal subject in passive voice
  • 38. Algorithm [Figure labels: Relationship head, Subject head, Object head] Cartic Ramakrishnan
  • 39. Preliminary results Cartic Ramakrishnan
  • 41. Semantic Metadata Guided Knowledge Explorations and Discovery
  • 43. Hypothesis Driven retrieval of Scientific Literature affects Migraine Magnesium Stress isa inhibit Patient Calcium Channel Blockers Complex Query Supporting Document sets retrieved Keyword query: Migraine[MH] + Magnesium[MH] PubMed
  • 44. Applications: Triple-based semantic search, Semantic Browser
  • 45. Knowledge Discovery = Extraction + Heuristic Aggregation Undiscovered Public Knowledge
  • 46. Understanding, Analyzing, Mining Social Media Meena Nagarajan, Karthik Gomadam
  • 49. another chapter in the war against civilization
  • 51.
  • 52.
  • 53. the world saw it Through the eyes of the people
  • 54. the world read it Through the words of the people
  • 55. PEOPLE told their stories to PEOPLE
  • 56. A powerful new era in Information dissemination had taken firm ground
  • 57. Making it possible for us to create a global network of citizens Citizen Sensors – Citizens observing, processing, transmitting, reporting
  • 58. Geocoder (Reverse Geo-coding) Address to location database 18 Hormusji Street, Colaba Vasant Vihar Image Metadata latitude: 18° 54′ 59.46″ N, longitude: 72° 49′ 39.65″ E Structured Meta Extraction Nariman House Income Tax Office Identify and extract information from tweets Spatio-Temporal Analysis
  • 59. Research Challenge #1 Spatio Temporal and Thematic analysis What else happened “near” this event location? What events occurred “before” and “after” this event? Any message about “causes” for this event?
  • 60. Spatial Analysis…. Which tweets originated from an address near 18.916517°N 72.827682°E?
  • 61. Which tweets originated during Nov 27th 2008, from 11PM to 12PM?
  • 62. Giving us: which tweets originated from an address near 18.916517°N, 72.827682°E during the time interval 27th Nov 2008 between 11PM and 12PM?
  • 63. Research Challenge #2: Understanding and Analyzing Casual Text. Casual text: microblogs are often written in SMS-style language; slang, abbreviations
  • 64. Understanding Casual Text. Not the same as news articles or scientific literature. Grammatical errors: implications on NL parser results. Inconsistent writing style: implications on learning algorithms that generalize from corpus
  • 65. Nature of Microblogs. Additional constraint of limited context: max. of x chars in a microblog; context often provided by the discourse. Entity identification and disambiguation: pre-requisite to other sophisticated information analytics
  • 66. NL understanding is hard to begin with.. Not so hard: “commando raid appears to be nigh at Oberoi now” (Oberoi = Oberoi Hotel, Nigh = high). Challenging: “new wing, live fire @ taj 2nd floor on iDesi TV stream” (fire on the second floor of the Taj hotel, not on iDesi TV)
  • 67. Research Opportunities NER, disambiguation in casual, informal text is a budding area of research Another important area of focus: Combining information of varied quality from a corpus (statistical NLP), domain knowledge (tags, folksonomies, taxonomies, ontologies), social context (explicit and implicit communities)
  • 68. Social Context surrounding content. The social context in which a message appears is also a valuable added resource. Post 1: “Hareemane House hostages said by eyewitnesses to be Jews. 7 Gunshots heard by reporters at Taj” A follow-up post clarifies that it is Nariman House, not Hareemane
  • 69. Understanding content … informal text I say: “Your music is wicked” What I really mean: “Your music is good”
  • 70. Urban Dictionary Sentiment expression: Rocks Transliterates to: cool, good Structured text (biomedical literature) Semantic Metadata: Smile is a Track Lil transliterates to Lilly Allen Lilly Allen is an Artist MusicBrainz Taxonomy Informal Text (Social Network chatter) Artist: Lilly Allen Track: Smile Your smile rocks Lil Multimedia Content and Web data Web Services
  • 71. Example: Pulse of a Community. Imagine millions of such informal opinions. Individual expressions to mass opinions. “Popular artists” lists from MySpace comments: Lilly Allen, Lady Sovereign, Amy Winehouse, Gorillaz, Coldplay, Placebo, Sting, Kean, Joss Stone
  • 72. What Drives the Spatio-Temporal-Thematic Analysis and Casual Text Understanding: Semantics with the help of Domain Models (ontologies, folksonomies)
  • 73. Domain Knowledge: A key driver Places that are nearby ‘Nariman house’ Spatial query Messages originated around this place Temporal analysis Messages about related events / places Thematic analysis
  • 74. Research Challenge #3: But where does the Domain Knowledge come from? Expert and committee based ontology creation … works in some domains (e.g., biomedicine, health care,…) Community driven knowledge extraction: How to create models that are “socially scalable”? How to organically grow and maintain this model?
  • 75. Building models…seed word to hierarchy creation using WIKIPEDIA Query: “cognition”
  • 76. Identifying relationships: Hard, harder than many hard things But NOT that Hard, When WE do it
  • 77. Games with a purpose Get humans to give their solitaire time Solve real hard computational problems Image tagging, Identifying part of an image Tag a tune, Squigl, Verbosity, and Matchin Pioneered by Luis Von Ahn
  • 78.
  • 80. How do you get comprehensive situational awareness by merging “human sensing” and “machine sensing”?
  • 81. Research Challenge #4: Semantic Sensor Web
  • 82. Semantically Annotated O&M <swe:component name="time"> <swe:Time definition="urn:ogc:def:phenomenon:time" uom="urn:ogc:def:unit:date-time"> <sa:swe rdfa:about="?time" rdfa:instanceof="time:Instant"> <sa:sml rdfa:property="xs:date-time"/> </sa:swe> </swe:Time> </swe:component> <swe:component name="measured_air_temperature"> <swe:Quantity definition="urn:ogc:def:phenomenon:temperature" uom="urn:ogc:def:unit:fahrenheit"> <sa:swe rdfa:about="?measured_air_temperature" rdfa:instanceof="senso:TemperatureObservation"> <sa:swe rdfa:property="weather:fahrenheit"/> <sa:swe rdfa:rel="senso:occurred_when" resource="?time"/> <sa:swe rdfa:rel="senso:observed_by" resource="senso:buckeye_sensor"/> </sa:swe> </swe:Quantity> </swe:component> <swe:value name="weather-data"> 2008-03-08T05:00:00,29.1 </swe:value>
  • 83. Semantic Sensor ML – Adding Ontological Metadata Domain Ontology Person Company Spatial Ontology Coordinates Coordinate System Temporal Ontology Time Units Timezone Mike Botts, "SensorML and Sensor Web Enablement," Earth System Science Center, UAB Huntsville
  • 84. Semantic Query. Semantic Temporal Query: model references from SML to OWL-Time ontology concepts provide the ability to perform semantic temporal queries. Supported semantic query operators include: contains: user-specified interval falls wholly within a sensor reading interval (also called inside); within: sensor reading interval falls wholly within the user-specified interval (inverse of contains or inside); overlaps: user-specified interval overlaps the sensor reading interval. Example SPARQL query defining the temporal operator ‘within’
  • 86. Semantic Sensor Web demo (online) Semantic Sensor Web demo (local)
  • 87. Synthetic but realistic scenario: an image taken from a raw satellite feed
  • 88. Synthetic but realistic scenario: an image taken by a camera phone with an associated label, “explosion.”
  • 89. Synthetic but realistic scenario: textual messages (such as tweets) using STT analysis
  • 90. Correlating to get Synthetic but realistic scenario
  • 91. Create better views (smart mashups)
  • 92. Extracting Social Signals: what are the important topics of discussions and concerns in different parts of the world on a particular day; how different cultures or countries are reacting to the same event or situation (e.g., Mumbai Attack); how a situation such as financial crisis is evolving over a period of time in terms of key topics of discussion and issues of concern (e.g., subprime mortgages and foreclosures, followed by troubled banks and credit freeze, followed by massive government intervention and borrowing, and so on). Twitris Demo
  • 93. A few more things. Use of background knowledge: event extraction from text; time and location extraction; such information may not be present; someone from Washington DC can tweet about Mumbai. Scalable semantic analytics: subgraph and pattern discovery; meaningful subgraphs like relevant and interesting paths; ranking paths
  • 94. The Sum of the Parts Spatio-Temporal analysis Find out where and when + Thematic What and how + Semantic Extraction from text, multimedia and sensor data - tags, time, location, concepts, events + Semantic models & background knowledge Making better sense of STT Integration + Semantic Sensor Web The platform = Situational Awareness
  • 95. KNO.E.SIS as a case study of a world-class, research-based higher education environment http://knoesis.org
  • 96.
  • 97. Exceptional students Six of the senior PhD students: 84 papers, 43 program committees, contributed to winning NIH and NSF grants. Successfully competed with two Stanford PhDs, 1000+ citations in 2 years of his graduation. “BTW, Meena is an absolute find. If all of your other students are as talented, you are very lucky. … I’d definitely like to work with more interns of her caliber, ...” [Dr. Kevin Haas, Director of Search at Yahoo!] “It has been a few years since I visited Dayton (Wright AFB). However, it is clear that Wright State has transformed itself. Congratulations on your success with the Knoesis Center.” [Dr. Alpers Caglayan – looking to hire Kno.e.sis grads]
  • 98. Funding, Collaboration, etc UGA, Stanford, CCHMC, SAIC, HP, IBM, Yahoo! NIH, NSF, AFRL-HE, AFRL-Sensor, HP, IBM, Microsoft, Google 70% Federal, 19% State, 11% Industry Students intern at the best industry labs & national labs Graduates very successful
  • 99. Interested in more background? Semantics-Empowered Social Computing Semantic Sensor Web Traveling the Semantic Web through Space, Theme and Time Relationship Web: Blazing Semantic Trails between Web Resources Text Mining, Workflow Management, Semantic Web Services, Cloud Computing with application to healthcare, biomedicine, defense/intelligence, energy Contact/more details: amit @ knoesis.org Special thanks: Karthik Gomadam, Meena Nagarajan, Christopher Thomas Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research and IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).
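
Slides 31 and 32 describe finding paths in the extracted RDF with a bidirectional breadth-first search and keeping only paths that contain at least one named relationship other than hasPart or hasModifiers. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the RDF is available as a plain list of (subject, predicate, object) triples that are traversed as undirected edges. The function names, the predicate spellings and the toy triples (paraphrased from the facts on slides 25 to 31) are illustrative assumptions.

```python
from collections import defaultdict, deque

# Structural predicates that do not make a path interesting on their own.
STRUCTURAL = {"hasPart", "hasModifiers"}


def build_adjacency(triples):
    """Index triples as an undirected graph: node -> [(predicate, neighbour)]."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))
    return adj


def bidirectional_bfs(triples, source, target):
    """Return a connecting path [node, predicate, node, ...] or None."""
    if source == target:
        return [source]
    adj = build_adjacency(triples)
    # Each map records, per visited node, (neighbour closer to the root, predicate).
    parents_s, parents_t = {source: None}, {target: None}
    frontier_s, frontier_t = deque([source]), deque([target])

    def expand(frontier, parents, other):
        # Expand one full BFS level; return the meeting node if the searches touch.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for pred, nxt in adj[node]:
                if nxt not in parents:
                    parents[nxt] = (node, pred)
                    if nxt in other:
                        return nxt
                    frontier.append(nxt)
        return None

    def join(meet):
        # Walk back to the source, then forward to the target.
        left, node = [], meet
        while parents_s[node] is not None:
            prev, pred = parents_s[node]
            left = [prev, pred] + left
            node = prev
        path, node = left + [meet], meet
        while parents_t[node] is not None:
            nxt, pred = parents_t[node]
            path += [pred, nxt]
            node = nxt
        return path

    while frontier_s and frontier_t:
        # Always grow the smaller frontier to keep the search balanced.
        if len(frontier_s) <= len(frontier_t):
            meet = expand(frontier_s, parents_s, parents_t)
        else:
            meet = expand(frontier_t, parents_t, parents_s)
        if meet is not None:
            return join(meet)
    return None


def is_interesting(path):
    """True if at least one predicate on the path is not purely structural."""
    return any(pred not in STRUCTURAL for pred in path[1::2])


# Toy triples (hypothetical encoding of the statements on the slides).
TRIPLES = [
    ("stress", "leads_to_loss_of", "magnesium"),
    ("magnesium", "inhibits", "spreading cortical depression"),
    ("spreading cortical depression", "implicated_in", "migraine"),
]

path = bidirectional_bfs(TRIPLES, "migraine", "magnesium")
print(path, is_interesting(path))
```

Growing the smaller frontier first is one common way to balance the two searches; the slides do not say which variant was used.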

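Slide 84 defines three semantic temporal operators (contains, within, overlaps) between a user-specified interval and a sensor reading interval. The sketch below restates those definitions over closed intervals in plain Python. It is not the SPARQL query referenced on the slide, and the `Interval` type, the function names and the sample intervals are assumptions made for illustration. Whether `overlaps` should exclude full containment is not stated on the slide; this version treats any shared instant as an overlap.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interval:
    """A closed time interval [start, end] (hypothetical helper type)."""
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("interval start must not be after its end")


def contains(user: Interval, reading: Interval) -> bool:
    """User-specified interval falls wholly within the sensor reading interval."""
    return reading.start <= user.start and user.end <= reading.end


def within(user: Interval, reading: Interval) -> bool:
    """Sensor reading interval falls wholly within the user-specified interval."""
    return user.start <= reading.start and reading.end <= user.end


def overlaps(user: Interval, reading: Interval) -> bool:
    """The two intervals share at least one instant (assumed interpretation)."""
    return user.start <= reading.end and reading.start <= user.end


# Sample data: the reading starts at the timestamp shown on slide 82;
# its one-hour duration and the user interval are made up for the example.
reading = Interval(datetime(2008, 3, 8, 5, 0), datetime(2008, 3, 8, 6, 0))
user = Interval(datetime(2008, 3, 8, 4, 0), datetime(2008, 3, 8, 7, 0))

print(within(user, reading), contains(user, reading), overlaps(user, reading))
```

With these definitions `within` is the inverse of `contains`, matching the wording on the slide: swapping the two arguments of one gives the other.
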
Editor's notes

  1. Microblogs are one of the most powerful sources of citizen sensor data (CSD)
  2. Implicit social context is created by people responding to other messages. In this example we are showing how the system can identify that it is Nariman and not Hareemane
  3. In the scenario, what techniques and technologies are being brought together? Semantic + Social Computing + Mobile Web
  4. Users are shown two images along with labels. Labels are obtained from GI or a similar data source. Users add relationships. When 2 users agree, the labels are tagged with this relationship. Given multiple relationships, the system will learn using ML techniques.
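
The agreement rule in note 4 (a relationship is attached to a pair of labels once two users propose it independently) can be sketched as follows. This is only an illustration under stated assumptions: the class name, the method names, the normalisation of relationship strings and the example labels are all hypothetical, and the machine-learning step mentioned in the note is not covered.

```python
from collections import defaultdict


class RelationshipGame:
    """Collects relationship proposals and accepts those that users agree on."""

    def __init__(self, agreement_threshold=2):
        # Number of distinct users who must propose the same relationship.
        self.threshold = agreement_threshold
        # (label_a, relation, label_b) -> set of user ids that proposed it.
        self._votes = defaultdict(set)
        self.accepted = set()

    def propose(self, user, label_a, relation, label_b):
        """Record a proposal; return True once the relationship is accepted."""
        key = (label_a, relation.strip().lower(), label_b)
        self._votes[key].add(user)  # a set, so repeat votes by one user count once
        if len(self._votes[key]) >= self.threshold:
            self.accepted.add(key)
            return True
        return False


# Made-up labels and relationship, purely for demonstration.
game = RelationshipGame()
first = game.propose("user1", "dog", "is a", "animal")
repeat = game.propose("user1", "dog", "is a", "animal")
second = game.propose("user2", "dog", "Is A", "animal")
print(first, repeat, second, game.accepted)
```

Storing user ids in a set means a single user repeating a proposal cannot trigger acceptance on their own, which seems to be the intent of requiring two users to agree.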