DB and IR Integration
1. in collaboration with Georgiana Ifrim, Gjergji Kasneci, Josiane Parreira, Maya Ramanath, Ralf Schenkel, Fabian Suchanek, Martin Theobald
2. DB and IR: Two Parallel Universes
   Database Systems (DB) vs. Information Retrieval (IR):
   • canonical application: accounting (DB); libraries (IR)
   • data type: numbers, short strings (DB); text (IR)
   • foundation: algebraic / logic based (DB); probabilistic / statistics based (IR)
   • search paradigm: Boolean retrieval with exact queries and result sets/bags (DB); ranked retrieval with vague queries and result lists (IR)
   • market leaders: Oracle, IBM DB2, MS SQL Server, etc. (DB); Google, Yahoo!, MSN, Verity, Fast, etc. (IR)
   Parallel universes forever?
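The search-paradigm contrast on this slide can be illustrated with a small sketch. The toy documents, the function names, and the simple TF-IDF weighting are assumptions made for illustration; they are not taken from the slides or from any particular system.

```python
import math
from collections import Counter

# Toy corpus (hypothetical documents, for illustration only).
DOCS = {
    "d1": "database systems store accounting records",
    "d2": "information retrieval ranks library text documents",
    "d3": "ranking text in database systems",
}


def tokenize(text):
    return text.lower().split()


def boolean_and(query, docs):
    """Exact-match retrieval: the unordered set of docs containing every query term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))}


def ranked(query, docs):
    """Ranked retrieval: docs ordered by a simple TF-IDF score; partial matches count."""
    n = len(docs)
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: in how many docs each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t]) for t in tokenize(query) if t in tf)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)
```

For the query "database text", `boolean_and` returns only the one document that contains both terms, while `ranked` also returns the partially matching documents, placed after it.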
6. Outline
   • Past: Matter, Antimatter, and Wormholes
   • Present: XML and Graph IR
   • Future: From Data to Knowledge
8. Timeline of DB and IR integration work, 1990 to 2005 (figure; entries as they appear on the slide):
   • VAGUE (Motro)
   • Proximal Nodes (Baeza-Yates et al.)
   • WHIRL (Cohen)
   • Prob. Datalog (Fuhr et al.)
   • INEX
   • XPath
   • XPath Full-Text
   • Prob. DB (Cavallo & Pittarelli)
   • Prob. Tuples (Barbara et al.)
   • Web Entity Search: Libra, Avatar, ExDB …
   • Faceted Search: Flamenco …
   • 1st Gen. XML IR: XXL, XIRQL, Elixir, JuruXML
   • Multimedia IR
   • Web Query Languages: W3QS, WebOQL, Araneus …
   • Semistructured Data: Lore, Xyleme …
   • 2nd Gen. XML IR: XRank, Timber, TIJAH, XSearch, FleXPath, CoXML, TopX, MarkLogic, Fast …
   • Uncertain & Prob. Relations: Mystiq, Trio …
   • Struct. Docs
   • Deep Web Search
   • Digital Libraries
   • Graph IR
10. XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
    Query: Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
    Setting: union of heterogeneous sources without global schema
    Similarity-aware XPath:
      //~Professor [//* = "~SB"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
    Example data (trees flattened in the transcript; nesting inferred from element order):
      Professor
        Name: Gerhard Weikum
        Address ...
          City: SB
          Country: Germany
        Teaching
          Course
            Title: IR
            Description: Information retrieval ...
            Syllabus ... Book Article ... ...
        Research
          Project
            Title: Intelligent Search of Heterogeneous XML Data
            Funding: EU ...
      Lecturer
        Name: Ralf Schenkel
        Address: Max-Planck Institute for Informatics, Germany
        Activities
          Seminar
            Contents: Ranked retrieval …
            Literature: …
          Scientific
            Name: INEX task coordinator (Initiative for the Evaluation of XML …)
          Other
            Sponsor: EU …
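The idea of a similarity-aware path condition can be sketched as follows. This is a toy illustration and not XXL's actual evaluation or scoring model: the `SIMILAR` table, its weights, the product-based score, and all function names are hypothetical, and the Research condition of the query is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()


# Hypothetical similarity weights; a real system would derive them from an
# ontology or thesaurus. All keys are lowercase.
SIMILAR = {
    ("professor", "lecturer"): 0.7,
    ("course", "seminar"): 0.8,
    ("sb", "saarbruecken"): 0.9,
    ("sb", "max-planck institute for informatics"): 0.5,
    ("ir", "information retrieval"): 0.9,
    ("ir", "ranked retrieval"): 0.6,
}


def sim(a, b):
    """Similarity of two element names (1.0 for identical names)."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return max(SIMILAR.get((a, b), 0.0), SIMILAR.get((b, a), 0.0))


def content_sim(term, text):
    """Best similarity between `term` and any known related phrase found in `text`."""
    term, text = term.lower(), text.lower()
    best = 1.0 if term in text.split() else 0.0
    for (a, b), weight in SIMILAR.items():
        if a == term and b in text:
            best = max(best, weight)
    return best


def score(node):
    """Score for //~Professor[//* = "~SB"][//~Course[//* = "~IR"]] (assumed product model)."""
    tag_score = sim("Professor", node.tag)
    place = max((content_sim("SB", d.text) for d in node.descendants()), default=0.0)
    course = max(
        (sim("Course", c.tag)
         * max((content_sim("IR", d.text) for d in c.descendants()), default=0.0)
         for c in node.descendants()),
        default=0.0,
    )
    return tag_score * place * course


professor = Node("Professor", children=[
    Node("Name", "Gerhard Weikum"),
    Node("Address", children=[Node("City", "SB"), Node("Country", "Germany")]),
    Node("Teaching", children=[
        Node("Course", children=[Node("Title", "IR"),
                                 Node("Description", "Information retrieval ...")]),
    ]),
])

lecturer = Node("Lecturer", children=[
    Node("Name", "Ralf Schenkel"),
    Node("Address", "Max-Planck Institute for Informatics, Germany"),
    Node("Activities", children=[
        Node("Seminar", children=[Node("Contents", "Ranked retrieval …")]),
    ]),
])
```

With these assumed weights the exact match scores 1.0, and the Lecturer record still qualifies with a lower score instead of being dropped, which is the point of relaxing tag and content conditions.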
13. Outline
    Past: Matter, Antimatter, and Wormholes
    • Present: XML and Graph IR
    • Future: From Data to Knowledge
20. Outline
    Past: Matter, Antimatter, and Wormholes
    Present: XML and Graph IR
    • Future: From Data to Knowledge
21. Knowledge Queries
    Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")!
    Answer "knowledge queries" such as:
    • Nobel laureate who survived both world wars and his children
    • drama with three women making a prophecy to a British nobleman that he will become king
    • proteins that inhibit both protease and some other enzyme
    • connection between Thomas Mann and Goethe
    • differences in Rembetiko music from Greece and from Turkey
    • neutron stars with Xray bursts > 10^40 erg s^-1 & black holes in 10''
    • market impact of Web2.0 technology in December 2006
    • sympathy or antipathy for Germany from May to August 2006
26. Exploit Hand-Crafted Knowledge
    Wikipedia, WordNet, and other lexical sources
    {{Infobox_Scientist
    | name = Max Planck
    | birth_date = [[April 23]], [[1858]]
    | birth_place = [[Kiel]], [[Germany]]
    | death_date = [[October 4]], [[1947]]
    | death_place = [[Göttingen]], [[Germany]]
    | residence = [[Germany]]
    | nationality = [[Germany|German]]
    | field = [[Physicist]]
    | work_institution = [[University of Kiel]]</br> [[Humboldt-Universität zu Berlin]]</br> [[Georg-August-Universität Göttingen]]
    | alma_mater = [[Ludwig-Maximilians-Universität München]]
    | doctoral_advisor = [[Philipp von Jolly]]
    | doctoral_students = [[Gustav Ludwig Hertz]]</br> …
    | known_for = [[Planck's constant]], [[Quantum mechanics|quantum theory]]
    | prizes = [[Nobel Prize in Physics]] (1918)
    …
28. YAGO Knowledge Representation
    Knowledge graph (figure; labels as they appear on the slide), in three layers:
    • concepts: Entity, Person, Location, City, Country, Scientist, Physicist, Biologist (connected by subclass edges)
    • individuals: Max_Planck, Erwin_Planck, Kiel, Nobel Prize, April 23, 1858, October 4, 1947 (edges: instanceOf, bornOn, diedOn, bornIn, hasWon, FatherOf)
    • words: "Max Planck", "Dr. Planck", "Max Karl Ernst Ludwig Planck" (connected to individuals by means edges)
    Online access and download at http://www.mpi-inf.mpg.de/~suchanek/yago/
    Accuracy: 97%
    Knowledge Base    # Facts
    KnowItAll            30 000
    SUMO                 60 000
    WordNet             200 000
    OpenCyc             300 000
    Cyc               5 000 000
    YAGO              6 000 000
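The entity-relationship view on this slide can be sketched as subject-predicate-object triples. The edge endpoints below are one reading of the flattened figure (for example, that Scientist is a subclass of Person); they are assumptions, and the code is not YAGO's actual storage format or API.

```python
# Triples read off the slide's figure; the exact edge endpoints are assumed.
FACTS = [
    ("Physicist", "subclass", "Scientist"),
    ("Biologist", "subclass", "Scientist"),
    ("Scientist", "subclass", "Person"),
    ("Person", "subclass", "Entity"),
    ("City", "subclass", "Location"),
    ("Country", "subclass", "Location"),
    ("Location", "subclass", "Entity"),
    ("Max_Planck", "instanceOf", "Physicist"),
    ("Kiel", "instanceOf", "City"),
    ("Max_Planck", "bornOn", "April 23, 1858"),
    ("Max_Planck", "diedOn", "October 4, 1947"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Max_Planck", "hasWon", "Nobel Prize"),
    ("Max_Planck", "FatherOf", "Erwin_Planck"),
    ('"Max Planck"', "means", "Max_Planck"),
    ('"Dr. Planck"', "means", "Max_Planck"),
    ('"Max Karl Ernst Ludwig Planck"', "means", "Max_Planck"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return {o for s, p, o in FACTS if s == subject and p == predicate}


def classes_of(entity):
    """Direct classes of an individual plus all their superclasses."""
    seen = set()
    frontier = list(objects(entity, "instanceOf"))
    while frontier:
        cls = frontier.pop()
        if cls not in seen:
            seen.add(cls)
            frontier.extend(objects(cls, "subclass"))
    return seen


def entities_for(word):
    """Individuals that a surface string can refer to, via the 'means' relation."""
    return objects(f'"{word}"', "means")
```

Keeping words, individuals, and concepts in one triple set means that a lookup such as `classes_of("Max_Planck")` can follow instanceOf and subclass edges uniformly.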
29. NAGA: Graph IR on YAGO [G. Kasneci et al.: WWW'07]
    Graph-based search on YAGO-style knowledge bases with built-in ranking based on confidence and informativeness
    Statistical language model for result graphs
    • conjunctive queries (example graph): $x isa scientist, $x bornIn Kiel
    • queries with regular expressions (example graph; labels as they appear on the slide): nodes Ling, $x, scientist, $y, Zhejiang, Beng Chin Ooi; edge labels isa, hasFirstName | hasLastName, worksFor, locatedIn*, (coAuthor | advisor)*
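A conjunctive query such as "$x isa scientist, $x bornIn Kiel" can be answered by matching triple patterns against the fact graph. The sketch below is a simplified stand-in, not NAGA's implementation: the confidence values are hypothetical, and the product of confidences replaces NAGA's language-model-based ranking, which the slide only names.

```python
# (subject, predicate, object, confidence); the confidence values are made up.
FACTS = [
    ("Max_Planck", "isa", "scientist", 0.95),
    ("Max_Planck", "bornIn", "Kiel", 0.9),
    ("Albert_Einstein", "isa", "scientist", 0.95),
    ("Albert_Einstein", "bornIn", "Ulm", 0.9),
]

# Terms starting with "$" are variables; everything else is a constant.
QUERY = [("$x", "isa", "scientist"), ("$x", "bornIn", "Kiel")]


def _unify(term, value, binding):
    """Bind a variable to a value, or check a constant / existing binding."""
    if term.startswith("$"):
        return binding.setdefault(term, value) == value
    return term == value


def answers(patterns, facts, binding=None, score=1.0):
    """Yield (variable binding, score) for every way to satisfy all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding), score
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    for fs, fp, fo, conf in facts:
        if fp != p:
            continue
        new = dict(binding)
        if all(_unify(t, v, new) for t, v in ((s, fs), (o, fo))):
            yield from answers(rest, facts, new, score * conf)


def ranked_answers(patterns, facts):
    """All answers, best score first."""
    return sorted(answers(patterns, facts), key=lambda pair: pair[1], reverse=True)
```

Regular-expression edges such as locatedIn* would additionally need path expansion over repeated edges, which this sketch leaves out.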
31. Information Extraction (IE): Text to Records
    Combine NLP, pattern matching, lexicons, statistical learning

    Person            BirthDate     BirthPlace   ...
    Max Planck        4/23, 1858    Kiel
    Albert Einstein   3/14, 1879    Ulm
    Mahatma Gandhi    10/2, 1869    Porbandar

    Person            ScientificResult
    Max Planck        Quantum Theory

    Person            Collaborator
    Max Planck        Albert Einstein
    Max Planck        Niels Bohr

    Constant            Value            Dimension
    Planck's constant   6.626 × 10^-34   Js
33. Methods for Web-Scale Fact Extraction
    Bootstrapping cycle: seeds → text → rules → new facts
    • seeds: city(Beijing); plays(Coltrane, sax)
    • text: "old center of Beijing"; "sax player Coltrane"
    • rules: "old center of X"; "Y player X"
    • new facts: "in downtown Beijing" → city(Beijing); "Coltrane blows sax" → plays(C., sax)

    Example:
    fact                     text                         rule
    city(Seattle)            in downtown Seattle          in downtown X
    city(Seattle)            Seattle and other towns      X and other towns
    city(Las Vegas)          Las Vegas and other towns    X and other towns
    plays(Zappa, guitar)     playing guitar: … Zappa      playing Y: … X
    plays(Davis, trumpet)    Davis … blows trumpet        X … blows Y

    Assessment of facts & generation of rules based on statistics
    Rules can be more sophisticated:
    playing NN: (ADJ|ADV)* NP & class(NN)=instrument & class(head(NP))=person → plays(head(NP), NN)
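The seeds, text, rules, new facts cycle can be sketched for the unary city(X) relation. This is a minimal illustration under stated assumptions: the corpus sentences, the two-word context window, and the function names are hypothetical, and the statistical assessment of facts and rules mentioned on the slide is left out.

```python
import re

# Hypothetical mini-corpus; real systems run over Web-scale text.
CORPUS = [
    "the old center of Beijing is crowded",
    "in downtown Beijing traffic is heavy",
    "we met in downtown Seattle",
    "the old center of Prague attracts tourists",
]

SEEDS = {"Beijing"}  # known instances of city(X)


def induce_patterns(seeds, corpus, window=2):
    """Generalize each seed occurrence to a rule: `window` preceding words, then X."""
    patterns = set()
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            if word in seeds and i >= window:
                patterns.add(" ".join(words[i - window:i]) + " X")
    return patterns


def apply_patterns(patterns, corpus):
    """Match each rule against the corpus; X captures one capitalized word."""
    found = set()
    for pattern in patterns:
        prefix = re.escape(pattern[:-2])  # drop the trailing " X"
        regex = re.compile(prefix + r" ([A-Z]\w+)")
        for sentence in corpus:
            found.update(m.group(1) for m in regex.finditer(sentence))
    return found


patterns = induce_patterns(SEEDS, CORPUS)
new_facts = apply_patterns(patterns, CORPUS) - SEEDS
```

In a full system the new facts would be scored and fed back as seeds for the next round; without such scoring, overly general rules can quickly pull in wrong facts.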
34. Performance of Web-IE
    State-of-the-art precision/recall results:
    relation        precision   recall   corpus      systems
    countries       80%         90%      Web         KnowItAll
    cities          80%         ???      Web         KnowItAll
    scientists      60%         ???      Web         KnowItAll
    headquarters    90%         50%      News        Snowball, LEILA
    birthdates      80%         70%      Wikipedia   LEILA
    instanceOf      40%         20%      Web         Text2Onto, LEILA
    Open IE         80%         ???      Web         TextRunner

    Precision value-chain: entities 80%, attributes 70%, facts 60%, events 50%

    Anecdotal evidence (two groups on the slide):
    • invented (A.G. Bell, telephone); married (Hillary Clinton, Bill Clinton); isa (yoga, relaxation technique); isa (zearalenone, mycotoxin); contains (chocolate, theobromine); contains (Singapore sling, gin)
    • invented (Johannes Kepler, logarithm tables); married (Segolene Royal, Francois Hollande); isa (yoga, excellent way); isa (your day, good one); contains (chocolate, raisins); plays (the liver, central role); makes (everybody, mistakes)
38. Outline
    Past: Matter, Antimatter, and Wormholes
    Present: XML and Graph IR
    Future: From Data to Knowledge
39. Major Trends in DB and IR
    Trends shown between Database Systems and Information Retrieval (figure; labels as they appear on the slide):
    • malleable schema (later)
    • deep NLP, adding structure
    • record linkage
    • info extraction
    • graph mining
    • entity-relationship graph IR
    • ontologies
    • ranking
    • statistical language models
    • data uncertainty
    • programmability
    • search as Web Service
    • dataspaces
    • Web objects
    • Web 2.0