
Semantic Web: introduction & overview


A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.

See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/

Semantic Web: introduction & overview

  1. Semantic Web: intro & overview. A conversation with students – 1 Sept 2015. Amit Sheth, http://knoesis.org/amit. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH, USA
  2. What are the most important recent software/Internet success stories?
  3. Apple’s Siri, IBM’s Watson, Google Semantic Search. What are the common technologies?
  4. Just stepping back a bit
  5. Semantic technologies in the mainstream • Microsoft purchased Powerset in 2008 • Apple purchased Siri [Apr 2010] – “Once Again The Back Story Is About Semantic Web” • Google buys Metaweb [June 2010] – “Google Snaps Up Metaweb in Semantic Web Play” – and releases Semantic search in 2013 – Now see: “Google Knowledge Graph Could Change Search Forever” • Facebook OpenGraph, Twitter annotation … “another example of semantic web going mainstream”, “Google, Twitter and Facebook build the semantic web”
  6. • RDFa adoption … Search engines (esp. Bing) started using domain models, and all of them use background knowledge/structured databases with large entity bases (these are part of the Knowledge Graph and its equivalents) • Bing, Yahoo! and Google are using schema.org in a big way
  7. A bit of history • Semantics with metadata and ontologies for heterogeneous documents and multiple repositories of data, including the Web, was discussed in the 1990s (semantic information brokering, faceted search, InfoHarness, SIMS, Ariadne, OBSERVER, SHOE, MREF, InfoQuilt, …). Also DAML and OIL. • Tim Berners-Lee used “Semantic Web” in his 1999 book • I founded a company, Taalee, in 1999, gave a keynote on Semantic Web & commercialization in 2000, and filed for a patent in 2000 (awarded 2001). • The well-known TBL, Hendler, Lassila paper in Scientific American took an AI-ish approach (agents, …) to the Semantic Web • The first 5 years saw too much of AI/DL, but more practical/applied work has dominated recently
  8. Different foci • TBL – focus on data: Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”) • Others focus on reasoning and intelligent processing • But the biggest current use seems to be search: 15 years of Semantic Search and Ontology-enabled Semantic Applications
  9. 1 2 3 of Semantic Web
  10. (1) Ontology: agreement on a common vocabulary/nomenclature, conceptual models and domain knowledge • Schema + knowledge base • Agreement is what enables interoperability • Formal description: machine processability is what leads to automation
  11. (2) Semantic Annotation (Metadata Extraction): associating meaning with data, or labeling data so it is more meaningful to the system and people • Can be manual, semi-automatic (automatic with human verification), or automatic
  12. From Syntax to Semantics • Shallow semantics • Deep semantics • Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics
  13. Levels of Abstraction (SSN Ontology, Intellego) • Level 0: Raw data [in TEXT], e.g., number: “150” • Level 1: Annotated data [in RDF], e.g., label: Systolic blood pressure of 150 mmHg • Level 2: Interpreted data (deductive) [in OWL], e.g., threshold: Elevated Blood Pressure • Level 3: Interpreted data (abductive) [in OWL], e.g., diagnosis: Hyperthyroidism ……
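The first three levels on this slide can be sketched in a few lines of Python. This is only an illustration: the cut-off value, the `Annotated` record, the function names and the "Normal Blood Pressure" label are assumptions made for the example, not part of the SSN ontology or of Intellego, and the threshold is not clinical guidance.

```python
from dataclasses import dataclass

# Hypothetical cut-off, for illustration only; not taken from the slides
# and not a clinical guideline.
ELEVATED_SYSTOLIC_MMHG = 140


@dataclass
class Annotated:
    """Level 1: a raw value plus the label and unit that give it meaning."""
    property_name: str
    value: int
    unit: str


def annotate(raw: str) -> Annotated:
    """Level 0 -> 1: attach a label and a unit to the raw text."""
    return Annotated("systolic blood pressure", int(raw), "mmHg")


def interpret(obs: Annotated) -> str:
    """Level 1 -> 2: deductive step, compare the value against a threshold."""
    if obs.value >= ELEVATED_SYSTOLIC_MMHG:
        return "Elevated Blood Pressure"
    return "Normal Blood Pressure"


obs = annotate("150")
print(obs, "->", interpret(obs))
```

Level 3 (abduction to a diagnosis) is left out on purpose: it needs background knowledge linking observations to possible explanations, which a threshold check cannot express.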
  14. (3) Reasoning/Computation: semantics-enabled search, integration, answering complex queries, connections and analyses (paths, subgraphs), pattern finding, mining, hypothesis validation, discovery, visualization
  15. Semantic Web Stack • Web of Linked Data • Introduced by Berners-Lee et al. as the next step for the Web of Documents • Allows “machine understanding” of data • Creates “common” models of domains using formal languages (ontologies) • Semantic Web Layer Cake; image source: http://www.w3.org; see W3C SW publications
  16. Characteristics of Semantic Web • Self-describing • Machine & human readable • Issued by a trusted authority • Easy to understand • Convertible • Can be secured • The Semantic Web: XML, RDF & Ontology. Adapted from William Ruh (CISCO)
  17. Resource Description Framework • Resource Description Framework: recommended by W3C for metadata modeling [RDF] • A standard common modeling framework, usable by humans and machine understandable [Figure: IBM (Company); Armonk, New York, United States and Zurich, Switzerland (Location)] RDF/OWL slides from: Semantic Web in Health Informatics (thanks: Satya)
  18. RDF: Triple Structure, IRI, Namespace • RDF triple o Subject: the resource that the triple is about o Predicate: the property of the subject that is described by the triple o Object: the value of the property • Web-addressable resource: Uniform Resource Locator (URL), Uniform Resource Identifier (URI), Internationalized Resource Identifier (IRI) • Qualified namespace: http://www.w3.org/2001/XMLSchema# as xsd: o xsd:string instead of http://www.w3.org/2001/XMLSchema#string [Figure: IBM, “Headquarters located in”, Armonk, New York, United States]
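A qualified name such as xsd:string is just shorthand for a full IRI. The sketch below shows the expansion using only the Python standard library; the `ex` namespace, the property name `headquartersLocatedIn` and the helper `expand` are made up for the example.

```python
# Prefix table. "xsd" is the namespace from the slide; "ex" is a
# hypothetical example namespace.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/company#",
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'xsd:string' into a full IRI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {qname!r}")
    return PREFIXES[prefix] + local


# A triple is an ordered (subject, predicate, object) tuple of IRIs.
triple = (
    expand("ex:IBM"),
    expand("ex:headquartersLocatedIn"),
    expand("ex:Armonk"),
)
print(expand("xsd:string"))
print(triple)
```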
  19. RDF Representation • Two types of property values in a triple o Web resource o Typed literal [Figure: IBM, “Headquarters located in”, Armonk, New York, United States; IBM, “Has total employees”, “430000”^^xsd:integer] • The graph model of RDF: node-arc-node is the primary representation model • Secondary notations: triple notation o companyExample:IBM companyExample:has-Total-Employee “430000”^^xsd:integer .
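The two kinds of object value, a Web resource and a typed literal, can be modelled as two small Python types and written out in an N-Triples-like notation. This is a minimal sketch: the `EX` namespace and the class and function names are assumptions, and the lexical form is written as 430000 because xsd:integer does not allow a thousands separator.

```python
from typing import NamedTuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/company#"  # hypothetical namespace for the example


class IRI(NamedTuple):
    """A Web resource, identified by its IRI."""
    value: str


class Literal(NamedTuple):
    """A typed literal: a lexical form plus a datatype IRI."""
    lexical: str
    datatype: str


Term = Union[IRI, Literal]


def term(t: Term) -> str:
    """Serialize one term: <iri> or "lexical"^^<datatype>."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    return f'"{t.lexical}"^^<{t.datatype}>'


def to_ntriples(s: IRI, p: IRI, o: Term) -> str:
    """Serialize one triple as a single line ending in ' .'."""
    return f"{term(s)} {term(p)} {term(o)} ."


line = to_ntriples(
    IRI(EX + "IBM"),
    IRI(EX + "hasTotalEmployees"),
    Literal("430000", XSD + "integer"),
)
print(line)
```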
  20. RDF Schema • RDF Schema: vocabulary for describing groups of resources [RDFS] [Figure: IBM, “Headquarters located in”, Armonk, New York, United States; Oracle, “Headquarters located in”, Redwood Shores, California, United States; at the class level: Company, “Headquarters located in”, Geographical Location]
  21. RDF Schema • Property domain (rdfs:domain) and range (rdfs:range) [Figure: “Headquarters located in” has domain Company and range Geographical Location] • Class hierarchy/taxonomy: rdfs:subClassOf [Figure: Computer Technology Company, Banking Company and Insurance Company are subclasses (rdfs:subClassOf) of the parent class Company]
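What rdfs:domain, rdfs:range and rdfs:subClassOf buy you is inference: new rdf:type facts follow from the schema. The sketch below applies three RDFS entailment rules (the ones the RDF Semantics specification calls rdfs2, rdfs3 and rdfs9) until nothing new appears. It is a toy, not a full RDFS reasoner, and every `ex:` name is invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

# Schema and data as plain (subject, predicate, object) tuples.
# All "ex:" names are hypothetical.
schema = {
    ("ex:ComputerTechnologyCompany", SUBCLASS, "ex:Company"),
    ("ex:BankingCompany", SUBCLASS, "ex:Company"),
    ("ex:headquartersLocatedIn", DOMAIN, "ex:Company"),
    ("ex:headquartersLocatedIn", RANGE, "ex:GeographicalLocation"),
}
data = {
    ("ex:IBM", RDF_TYPE, "ex:ComputerTechnologyCompany"),
    ("ex:Oracle", "ex:headquartersLocatedIn", "ex:RedwoodShores"),
}


def rdfs_closure(triples):
    """Apply the three rules repeatedly until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs2: (p domain C) and (s p o)  =>  (s type C)
                if p2 == DOMAIN and s2 == p:
                    new.add((s, RDF_TYPE, o2))
                # rdfs3: (p range C) and (s p o)  =>  (o type C)
                if p2 == RANGE and s2 == p:
                    new.add((o, RDF_TYPE, o2))
                # rdfs9: (C subClassOf D) and (s type C)  =>  (s type D)
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(schema | data)
for t in sorted(closure - schema - data):
    print(t)
```

Nobody stated that Oracle is a Company or that Redwood Shores is a Geographical Location; both follow from the domain and range of the property alone.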
  22. Ontology: A Working Definition • Ontologies are shared conceptualizations of a domain represented in a formal language* • Ontologies: o Common representation model: facilitate interoperability and integration across different projects, and enforce consistent use of terminology o Closely reflect domain-specific details (domain semantics) essential to answer end-user questions o Support reasoning to discover implicit knowledge * Paraphrased from Gruber, 1993
  23. Expressiveness Range: Knowledge Representation and Ontologies • Ontology Dimensions, after McGuinness and Finin [Figure: spectrum from simple taxonomies to expressive ontologies. Stages: Catalog/ID; Terms/glossary; Thesauri, “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restriction; Disjointness, inverse, part of…; General logical constraints. Examples placed along the spectrum: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, TAMBIS, EcoCyc, BioPAX, GlycO, SWETO, Pharma]
  24. OWL2 Web Ontology Language • A language for modeling ontologies [OWL] • OWL2 is declarative • An OWL2 ontology (schema) consists of: o Entities: Company, Person o Axioms: Company employs Person o Expressions: a Person employed by a Company = CompanyEmployee • Reasoning: draw a conclusion given that certain constraints are satisfied o RDF(S) entailment o OWL2 entailment
  25. OWL2 Constructs • Class disjointness: an instance of class A cannot be an instance of class B • Complex classes: combining multiple classes with set-theory operators: o Union: Parent = ObjectUnionOf (:Mother :Father) o Logical negation: UnemployedPerson = ObjectComplementOf (:EmployedPerson) o Intersection: Mother = ObjectIntersectionOf (:Parent :Woman)
  26. OWL2 Constructs • Property restrictions: defined over a property • Existential quantification: o Parent = ObjectSomeValuesFrom (:hasChild :Person) o To capture incomplete knowledge • Universal quantification: o US President = ObjectAllValuesFrom (:hasBirthPlace United States) • Cardinality restriction
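The class constructors and property restrictions above have a direct set-theoretic reading, which Python sets can imitate over a small, fixed group of individuals. One loud caveat: this sketch is closed-world (everything not listed is treated as false), whereas OWL 2 reasoning is open-world, so it illustrates the operators rather than OWL entailment. All individuals and helper names are invented; negation is written as a set complement, which is ObjectComplementOf in OWL 2 functional syntax.

```python
# A tiny closed universe of hypothetical individuals.
people = {"ann", "bob", "cat", "dan"}
mother = {"ann"}
father = {"bob"}
woman = {"ann", "cat"}
employed = {"ann", "dan"}
has_child = {"ann": {"cat"}, "bob": {"cat"}}  # property as subject -> objects

parent = mother | father            # ObjectUnionOf(:Mother :Father)
unemployed = people - employed      # ObjectComplementOf(:EmployedPerson)
mother_again = parent & woman       # ObjectIntersectionOf(:Parent :Woman)


def some_values_from(prop, cls):
    """Individuals with at least one property value in cls (existential)."""
    return {x for x, values in prop.items() if values & cls}


def all_values_from(prop, cls, universe):
    """Individuals all of whose property values are in cls (universal).

    Individuals with no values at all satisfy this vacuously.
    """
    return {x for x in universe if prop.get(x, set()) <= cls}


print(sorted(parent), sorted(unemployed), sorted(mother_again))
print(sorted(some_values_from(has_child, people)))
print(sorted(all_values_from(has_child, woman, people)))
```

Note that the universal restriction also admits individuals with no children at all; that vacuous truth carries over to OWL and is a common source of surprise.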
  27. SPARQL: Querying Semantic Web Data • A SPARQL query pattern is composed of triples • Triples correspond to the RDF triple structure, but can have a variable at: o Subject: ?company ex:hasHeadquaterLocation ex:NewYork. o Predicate: ex:IBM ?whatislocatedin ex:NewYork. o Object: ex:IBM ex:hasHeadquaterLocation ?location. • The result of a SPARQL query is a list of values; the values can replace the variables in the query pattern
  28. SPARQL: Query Patterns • An example query pattern: PREFIX ex:<http://www.eecs600.case.edu/> SELECT ?company ?location WHERE {?company ex:hasHeadquaterLocation ?location.} • Query result (multiple matches), as company / location pairs: IBM / NewYork; Oracle / RedwoodCity; MicrosoftCorporation / Bellevue
  29. SPARQL: Query Forms • SELECT: returns the values bound to the variables • CONSTRUCT: returns an RDF graph • DESCRIBE: returns a description (RDF graph) of a resource (e.g. IBM) o The contents of the RDF graph are determined by the SPARQL query processor • ASK: returns a Boolean (true or false)
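To make the idea of a triple pattern with variables concrete, here is a small pattern matcher in plain Python, with SELECT, ASK and CONSTRUCT expressed on top of it. It handles a single triple pattern only and is in no way a SPARQL engine; the helper names and the `ex:locatedIn` property in the CONSTRUCT template are assumptions, and the data reuses two rows from the slide's example result.

```python
EX = "http://www.eecs600.case.edu/"
HQ = EX + "hasHeadquaterLocation"  # spelling as on the slide

graph = [
    (EX + "IBM", HQ, EX + "NewYork"),
    (EX + "Oracle", HQ, EX + "RedwoodCity"),
]


def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            yield binding


def select(variables, pattern, triples):
    """SELECT: the values bound to the chosen variables, one row per match."""
    return [tuple(b[v] for v in variables) for b in match(pattern, triples)]


def ask(pattern, triples):
    """ASK: True if at least one triple matches."""
    return any(True for _ in match(pattern, triples))


def construct(template, pattern, triples):
    """CONSTRUCT: a new list of triples built by filling in the template."""
    return [tuple(b.get(t, t) for t in template) for b in match(pattern, triples)]


rows = select(("?company", "?location"), ("?company", HQ, "?location"), graph)
print(rows)
print(ask((EX + "IBM", HQ, "?location"), graph))
print(construct(("?location", EX + "locatedIn", "?company"),
                ("?company", HQ, "?location"), graph))
```

DESCRIBE is omitted because, as the slide says, what it returns is left to the query processor.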
  30. A little bit about ontologies
  31. Many Ontologies Available Today • Open Biomedical Ontologies: http://bioportal.bioontology.org/, http://obo.sourceforge.net/
  32. From simple ontologies
  33. Drug Ontology Hierarchy (showing is-a relationships) [Figure: class tree rooted at owl:thing. Classes shown: prescription_drug, prescription_drug_brand_name, brandname_undeclared, brandname_composite, brandname_individual, prescription_drug_generic, generic_individual, generic_composite, monograph_ix_class, cpnum_group, prescription_drug_property, indication_property, formulary_property, interaction_property, property, formulary, non_drug_reactant, interaction, interaction_with_prescription_drug, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class, indication]
  34. to complex ontologies
  35. N-Glycosylation metabolic pathway • GNT-I attaches GlcNAc at position 2: UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2 • GNT-V attaches GlcNAc at position 6: UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 • Ontology terms shown: N-acetyl-glucosaminyl_transferase_V, N-glycan_beta_GlcNAc_9, N-glycan_alpha_man_4
  36. A little bit about semantic metadata extraction and annotation
  37. Extraction for Metadata Creation • Sources: WWW, enterprise repositories, Nexis, UPI, AP, feeds/documents, digital maps, digital audio, digital video, digital images, data stores • Metadata extractors: create/extract as much (semantic) metadata automatically as possible; use ontologies to improve and enhance extraction
  38. Automatic Semantic Metadata Extraction/Annotation
  39. Semantics & Semantic Web in 1999-2002
  40. Sample applications • Early Semantic Search, use baby steps of today’s engines • Enterprise applications – healthcare & life sciences, financial, security • Driving the innovation with new types of data: sensor (Semantic Sensor Web), social (Semantic Social Web), semantic IoT/WoT
  41. Taalee Semantic/Faceted Search & Browsing (1999-2001) • Blended browsing & querying interface • Attribute & keyword querying: uniform view of worldwide distributed assets of similar type • Semantic browsing • Targeted e-shopping/e-commerce assets access
  42. Semantic Search/Browsing/Directory (2001-….) • Search for company ‘Commerce One’ • Links to news on companies that compete against Commerce One • Links to news on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba) • Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically
  43. Semantic Search/Browsing/Directory (2001-….) • System recognizes ENTITY & CATEGORY • Relevant portion of the Directory is automatically presented
  44. Semantic Search/Browsing/Directory (2001-….) • Users can explore semantically related information
  45. Equity Research Dashboard with Blended Semantic Querying and Browsing • Focused relevant content organized by topic (semantic categorization) • Automatic content aggregation from multiple content providers and feeds • Related relevant content not explicitly asked for (semantic associations) • Competitive research inferred automatically • Automatic 3rd-party content integration
  46. Managing Semantic Content on the Web • Semagix Freedom for building ontology-driven information systems • Extracting semantic metadata from semistructured and structured sources (1999 – 2002)
  47. Ontology Creation and Maintenance Steps • 1. Ontology model creation (description) • 2. Knowledge agent creation • 3. Automatic aggregation of knowledge • 4. Querying the ontology • Components shown: Ontology, Semantic Query Server • © Semagix, Inc.
  48. Semantic Associations - Connecting the Dots (2004, Semagix) • Classes: Watch list, Organization, Company • Instances: FBI Watchlist, Hamas, WorldCom • Relationships: appears on Watchlist, member of Organization, works for Company • Ahmed Yaseer: o Appears on Watchlist ‘FBI’ o Works for Company ‘WorldCom’ o Member of a banned organization
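"Connecting the dots" amounts to finding a path between two entities in the relationship graph. The sketch below runs a breadth-first search over the slide's three relationships, treating each edge as traversable in both directions. It is an assumption-laden illustration, not the Semagix implementation; the function name and the "inverse of" labels are made up.

```python
from collections import deque

# The relationships from the slide, as (subject, relation, object) edges.
edges = [
    ("Ahmed Yaseer", "appears on", "FBI Watchlist"),
    ("Ahmed Yaseer", "works for", "WorldCom"),
    ("Ahmed Yaseer", "member of", "Hamas"),
]


def find_association(start, goal, triples):
    """Return the shortest chain of entities and relations, or None."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        # Also allow walking each edge backwards.
        neighbours.setdefault(o, []).append((f"inverse of {p}", s))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None


print(find_association("WorldCom", "FBI Watchlist", edges))
```

The path runs through the shared person, which is the kind of indirect link between a company and a watch list that the slide describes.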
  49. Global Investment Bank: Establishing New Account • Fraud prevention application used in financial services; a related KYC application is deployed at a majority of global banks • The user will be able to navigate the ontology using a number of different interfaces • Sources: World Wide Web content, public records, blogs, RSS (unstructured text, semi-structured data); watch lists, law enforcement, regulators (semi-structured government data) • Scores the entity based on the content and entity relationships
  50. Fast forward to 2005-2006
  51. Semantic Web + Clinical Practice Informatics = Active Semantic Electronic Medical Record (ASEMR) • Operationally deployed in January 2006, in use (as of 2012)
  52. ASEMR: SW application in use • In daily use at Athens Heart Center – 28-person staff o Interventional cardiologists o Electrophysiology cardiologists – Deployed since January 2006 – 40-60 patients seen daily – 3000+ active patients – Serves a population of 250,000 people
  53. Information Overload in Clinical Practice • New drugs added to market – Adds interactions with current drugs – Changes possible procedures to treat an illness • Insurance coverage changes – Insurance may pay for drug X but not drug Y even though drugs X and Y are equivalent – A patient may need a certain diagnosis before some expensive tests are run • Physicians need a system to keep track of the ever-changing landscape
  54. Active Semantic Document (ASD) • A document (typically in XML) with the following features: • Semantic annotations – Linking entities found in a document to an ontology – Linking terms to a specialized lexicon [TR] • Actionable information – Rules over semantic annotations – Violated rules can modify the appearance of the document (show an alert)
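The two ingredients of an Active Semantic Document, annotations that link text to ontology concepts and rules that run over those annotations, can be sketched as follows. Everything here is hypothetical: the drug names, the concept identifiers and the interaction table are placeholders, and the structure is an illustration rather than the ASEMR design.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    text: str      # the span as it appears in the document
    concept: str   # the ontology concept it is linked to


@dataclass
class Document:
    annotations: list
    alerts: list = field(default_factory=list)


# Placeholder interaction knowledge; NOT real pharmacology.
INTERACTS = {frozenset({"drug:A", "drug:B"})}


def check_interactions(doc: Document) -> list:
    """Rule over annotations: flag every pair of interacting drugs."""
    drugs = [a.concept for a in doc.annotations if a.concept.startswith("drug:")]
    for i, first in enumerate(drugs):
        for second in drugs[i + 1:]:
            if frozenset({first, second}) in INTERACTS:
                # A violated rule changes what the document shows.
                doc.alerts.append(f"ALERT: {first} interacts with {second}")
    return doc.alerts


record = Document([
    Annotation("Drug A 10 mg", "drug:A"),
    Annotation("Drug B 5 mg", "drug:B"),
    Annotation("Drug C 20 mg", "drug:C"),
])
print(check_interactions(record))
```

Because the rule works on concepts rather than on the literal text, it would still fire if the record used a brand name that the annotation step mapped to the same concept.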
  55. Active Semantic Patient Record • An application of ASD • Three ontologies – Practice: information about the practice, such as patient/physician data – Drug: information about drugs, interactions, formularies, etc. – ICD/CPT: describes the relationships between CPT and ICD codes • Medical records in XML created from database
  56. Active Semantic Electronic Medical Record App • In use today at Athens Heart Center for clinical decision support since January 2006 • Amit P. Sheth, S. Agrawal, Jonathan Lathem, Nicole Oldham, H. Wingate, P. Yadav, and K. Gallagher, Active Semantic Electronic Medical Record, Proc. of the 5th International Semantic Web Conference, 2006
  57. Demo of ASEMR and other applications: http://knoesis.org/showcase http://archive.knoesis.org/library/demos/
  58. Benefits of ASEMR • Error prevention (drug interactions, allergy) – Patient care – Insurance • Decision support (formulary, billing) – Patient satisfaction – Reimbursement • Efficiency/time – Real-time chart completion – “Semantic” and automated linking with billing
  59. Using large data sets for Structured Data on the web: Linked Open Data – samples from 2005 to 2010
  60. Linked Open Data • Publish open data sets in RDF • By 2010: 203 data sets, 25 billion triples • Image: http://richard.cyganiak.de/2007/10/lod/
  61. Semantic Web Adoption and Application: You publish the raw data…
  62. Semantic Web Adoption and Application: … and others can use it
  63. Semantic Web Adoption and Application: Using the LOD to build Web site: BBC
  64. Semantic Web Adoption and Application: Using the LOD to build Web site: BBC
  65. Semantic Web Adoption and Application: GoodRelations Ontology - RDFa
  66. Semantic Web Adoption and Application: GoodRelations Ontology - RDFa
  67. Semantic Web Adoption and Application: GoodRelations Ontology - RDFa
  68. Fast forward to 2010-2011
  69. Schema.org • Shared vocabulary • Amazing things can happen • Will give some on-line examples
  70. Twitris: Semantic Social Web Mash-up • Select topic • Select date • Topic tree • Spatial marker • N-gram summaries • Wikipedia articles • Reference news • Related tweets • Images & videos • Tweet traffic • Sentiment analysis
  71. Web (and associated computing) is evolving: Computing for Human Experience • Web of pages: text, manually created links, extensive navigation • Web of databases: dynamically generated pages, web query interfaces • Web of resources: data, service, data, mashups; 4 billion mobile computing • Web of people, Sensor Web: social networks, user-created casual content; 40 billion sensors, 500M+ FB users, 1B tweets/wk • Web as an oracle / assistant / partner: “ask the Web”, using semantics to leverage text + data + services; Powerset • Timeline markers: 1997, 2007 • Progression: Keywords, Patterns, Objects, Situations, Events • Enhanced experience, tech assimilated in life
  72. Semantics, Meaning Processing • Inputs: structured text (scientific publications / white papers), experimental results, clinical trial data, public domain knowledge (PubMed) • Metadata extraction/semantic annotations using ontologies/domain models/knowledge • Metadata / semantic annotations • Semantic search/browsing/personalization/analysis, knowledge discovery, visualization, situational awareness • Big data • Search and browsing • Patterns / inference / reasoning • 2D-3D & immersive visualization, human-computer interfaces • Impacting bottom line • Knowledge discovery • Example graph: Migraine, Stress, Patient, Magnesium, Calcium Channel Blockers; relations: affects, isa, inhibit
  73. Semantics as core enabler, enhancer @ Kno.e.sis
  74. Take Home Message (Cont.) • Semantics play a key role in referring to the “meaning” behind the data. • Requires progress from keywords -> entities -> relationships -> events, from raw data to human-centric abstractions.
  75. Take Home Message (Cont.) • A wide variety of semantic models and KBs (vocabularies, social dictionaries, community-created semi-structured knowledge, domain-specific datasets, ontologies) empower semantic solutions. • This can lead to Semantic Scalability: scalability that is meaningful to human activities and decision making.
  76. Interested in more? Kno.e.sis Wiki for the following and more: • Computing for Human Experience • Continuous Semantics to Analyze Real-Time Data • Semantic Modeling for Cloud Computing • Citizen Sensing, Social Signals, and Enriching Human Experience • Semantics-Empowered Social Computing • Semantic Sensor Web • Traveling the Semantic Web through Space, Theme and Time • Relationship Web: Blazing Semantic Trails between Web Resources • SA-REST: Semantically Interoperable and Easier-to-Use Services and Mashups • Semantically Annotating a Web Service. Tutorials: Semantic Web: Technologies and Applications for the Real-World (WWW2007); Citizen Sensor Data Mining, Social Media Analytics and Development Centric Web Applications (WWW2011). Partial Funding: NSF (Semantic Discovery: IIS: 071441, Spatio Temporal Thematic: IIS-0842129), AFRL and DAGSI (Semantic Sensor Web), Microsoft Research (Semantic Search), IBM Research (Analysis of Social Media Content), and HP Research (Knowledge Extraction from Community-Generated Content).
  77. Future: Computing for Human Experience • http://knoesis.org • Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA • Vision Paper: Computing for Human Experience: http://wiki.knoesis.org/index.php/Computing_For_Human_Experience

Editor's notes

  • RDF: Triple structure
  • Review types of heterogeneity. Why we need to reconcile data heterogeneity. Uniform Resource Locator: a network location, used as an identifier for resources on the Web. A URL is a specific type of URI. A URI can be used to refer to anything. IRI: in addition to the ASCII character set, contains the Universal Character Set (from RFC 3987)
  • RDF uses XML Schema datatypes
  • Allows creation of an abstract representation of domain
  • Review types of heterogeneity. Why we need to reconcile data heterogeneity
  • Taalee (subsequently Voquette and Semagix) was founded in 1999 as an audio/video Web search company (the focus on A/V was mainly for scalability and market-focus reasons; service name: MediaAnywhere). Domain models/ontologies were created in major areas (many more than what you can find on Bing in 2011) and automatically populated to build knowledge bases (populated ontologies or WorldModel) from a variety of structured and semistructured sources, and periodically kept up to date. This was then used for semantic annotation/metadata extraction to drive semantic search, browsing, and other applications over data crawled from Web sites.
  • The important thing is that the system knew that Robert Duvall is a movie actor, is a different person than David Duval, who is a golfer and a sportsperson, and had an understanding of a variety of relationships Robert Duvall participates in – such as
  • Obtained from Ivan’s slide
  • Let me give a technological introduction to what our center is about: we all face a fire hose of data. PubMed adds 2000 to 4000 citations per day, it is usual to add about 5 gig from a single run of a scientific experiment, and just imagine how much data is created by all the cameras and 40 billion mobile sensors in the world! But even with all the search and browsing tools we have, we face a huge information glut. How do we make sense of the data? Just as humans apply their knowledge and experience to understand what they see, we apply a domain model or knowledge to attach meaningful labels to these data. Then we can apply computational techniques to visualize, provide situational awareness, and discover nuggets of knowledge, information and insight. For example, from all that biomedical data, what a scientist may be looking for is: how can we treat Migraine? What has Magnesium to do with Migraine? Why does Magnesium deficiency cause Migraine? What is the process by which Magnesium affects Migraine?
  • Kno.e.sis has 15 faculty in computer science, life sciences and health care, cognitive science and business. It has about 50 PhD students and postdocs, about 2/3 of these in computer science. Its faculty members have 40 labs, and it occupies a majority of the 50K sq ft Joshi Research Center. Its students are highly successful, e.g., tenure-track faculty at Case Western Reserve Univ. or researcher at IBM Almaden. It has received recent funding from Microsoft Research, IBM Research, HP Labs, Google, and small companies (Janya, EZdi, …) and collaborates with many more (Yahoo! Labs, NLM, …).
