Knowledge Engineering for TELDAP

Keh-Jiann Chen
Principal Investigator
Core Platforms for Digital Contents Project, TELDAP
Research Fellow
Research Center for Information Technology Innovation &
Institute of Information Science, Academia Sinica

  1. Knowledge Engineering for TELDAP. Keh-Jiann Chen, Principal Investigator, Core Platforms for Digital Contents Project, TELDAP; Research Fellow, Research Center for Information Technology Innovation & Institute of Information Science, Academia Sinica
  2. Outline: Introduction; Union catalog; Databases and metadata for digital contents and websites; Knowledge engineering; Future perspectives
  3. Introduction: Integrating and managing digital content has become an important issue as the amount of digital content produced by different projects and institutions grows rapidly. Our project goal is to optimize the preservation, retrieval, and presentation of digital collections.
  4. Section 1. Union Catalog
  5. What is the union catalog? It is a catalog of, and portal to, all TELDAP digital collections: an integrated platform for browsing and searching the entire digital content of TELDAP. Its metadata provides the core description and licensing information of each digital collection.
  6. Home page of the union catalog: browsing by topic; searching by keyword
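The union catalog's two access paths, browsing by topic and searching by keyword, can be pictured as two inverted indexes over the metadata records. This is a minimal sketch, not the TELDAP implementation: the record layout, the sample records, and the names `build_indexes` and `search` are hypothetical, and whitespace tokenization stands in for the Chinese word segmentation a real catalog would need.

```python
from collections import defaultdict

# Hypothetical sample records; field names loosely follow Dublin Core.
RECORDS = [
    {"id": "obj-001", "title": "Bronze ritual vessel", "subject": ["archaeology"],
     "description": "A bronze vessel from an excavation"},
    {"id": "obj-002", "title": "Black bear specimen", "subject": ["animal"],
     "description": "Natural history specimen record"},
    {"id": "obj-003", "title": "Landscape painting", "subject": ["painting"],
     "description": "Ink painting of mountains"},
]


def tokenize(text):
    """Lowercase whitespace tokenization (a placeholder for real word segmentation)."""
    return text.lower().split()


def build_indexes(records):
    """Build a keyword index and a topic index, each mapping a term to a set of record ids."""
    keyword_index = defaultdict(set)
    topic_index = defaultdict(set)
    for record in records:
        for field in ("title", "description"):
            for token in tokenize(record.get(field, "")):
                keyword_index[token].add(record["id"])
        for topic in record.get("subject", []):
            topic_index[topic].add(record["id"])
    return keyword_index, topic_index


def search(keyword_index, query):
    """Return the ids of records whose title or description contains every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(keyword_index.get(tokens[0], set()))  # copy, so the index is not mutated
    for token in tokens[1:]:
        result &= keyword_index.get(token, set())
    return result


keyword_index, topic_index = build_indexes(RECORDS)
print(search(keyword_index, "bronze vessel"))  # search by keyword → {'obj-001'}
print(sorted(topic_index["painting"]))         # browse by topic → ['obj-003']
```

Keyword search here is an AND over query tokens; a portal could equally rank by the number of matching tokens instead.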
  7. Section 2. Databases and metadata for digital contents and websites
  8. Metadata models for different types of objects: archived digital items use the union catalog metadata model (Dublin Core+); websites use DCCAP (Dublin Core Collections Application Profile), plus fields for internal use only (Unique Identifier, Format, Evaluation, Cataloging History); documents use a document metadata model (Dublin Core).
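The split between published fields and fields reserved for internal use can be enforced when records are exported. A minimal sketch, assuming records are plain dictionaries keyed by field name; the internal field names come from the slide, while `split_record` and the sample record are hypothetical.

```python
# Internal-only field names as listed on the slide; every other field is treated as public.
INTERNAL_FIELDS = frozenset({"Unique Identifier", "Format", "Evaluation", "Cataloging History"})


def split_record(record):
    """Split a metadata record into (public fields, internal-only fields)."""
    public = {name: value for name, value in record.items() if name not in INTERNAL_FIELDS}
    internal = {name: value for name, value in record.items() if name in INTERNAL_FIELDS}
    return public, internal


# Hypothetical website record.
site = {
    "Title": "Digital Archives portal",
    "Subject": "archaeology",
    "Unique Identifier": "site-0001",
    "Evaluation": 87,
}
public, internal = split_record(site)
print(public)    # only these fields would be published
print(internal)  # kept for cataloging staff
```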
  9. Metadata for digital items (over 2 million digital items, and still increasing). Element: Definition
     - Title: A name given to the resource
     - Creator: An entity primarily responsible for making the content of the resource
     - Subject and Keywords: The topic of the content of the resource
     - Description: An account of the content of the resource
     - Publisher: An entity responsible for making the resource available
     - Contributor: An entity responsible for making contributions to the content of the resource
     - Date: A date associated with an event in the life cycle of the resource
     - Resource Type: The nature or genre of the content of the resource
     - Format: The physical or digital manifestation of the resource
     - Resource Identifier: An unambiguous reference to the resource within a given context
     - Source: A reference to a resource from which the present resource is derived
     - Language: A language of the intellectual content of the resource
     - Relation: A reference to a related resource
     - Coverage: The extent or scope of the content of the resource
     - Rights Management: Information about rights held in and over the resource
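Since every Dublin Core element is optional and repeatable, a record maps naturally to element names with lists of values. The sketch below serializes such a record with Python's standard `xml.etree.ElementTree` under the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. The `<record>` wrapper element and the function name `to_dc_xml` are hypothetical, and the mapping of the slide's labels to element names (for example "Resource Type" to `type`, "Rights Management" to `rights`) is our reading of the list.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")

ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc:" prefix


def to_dc_xml(record):
    """Serialize {element: value or list of values} as Dublin Core XML.

    Elements are optional and repeatable; keys that are not DC elements raise ValueError.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        values = record.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_dc_xml({"title": "Landscape painting", "subject": ["painting", "ink"], "language": "zh"}))
```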
  11. Metadata for websites: over 200 websites, and still increasing. The metadata follows DCCAP (Dublin Core Collections Application Profile), combining the standard with our own requirements in 19 data fields.
  12. Metadata for websites: the website's homepage picture and URL; project information (Type, Name, Author, Subject, Description, Language, Item Type, Target); archived information (URL, time, authorization); Copyright, Purpose, and other information. Figure: http://digitalarchives.tw
  13. Dynamic categorization: user-oriented categorization (general public, elementary school students, high school students, researchers, etc.); topic-based categorization (archaeology, painting, animal, plant, document, etc.); function-based categorization (research, education, business, technology, etc.); categorization by institution (Academia Sinica, National Taiwan University, National Palace Museum, etc.).
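One way to read "dynamic categorization" is as faceted classification: each website carries values along several independent dimensions, and the portal groups or filters the same collection along whichever dimension the user picks. A minimal sketch under that assumption; the facet keys, the sample sites, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sites described along the four dimensions on the slide.
SITES = [
    {"name": "Site A", "users": ["elementary", "general"], "topics": ["animal"],
     "functions": ["education"], "institution": "Academia Sinica"},
    {"name": "Site B", "users": ["researcher"], "topics": ["archaeology"],
     "functions": ["research"], "institution": "Academia Sinica"},
    {"name": "Site C", "users": ["high school", "general"], "topics": ["painting"],
     "functions": ["education", "business"], "institution": "Palace Museum"},
]


def facet_values(site, facet):
    """Return a site's values for one facet as a list (a single string value is wrapped)."""
    values = site.get(facet, [])
    return [values] if isinstance(values, str) else list(values)


def categorize(sites, facet):
    """Group site names by facet value; a site may appear under several values."""
    groups = defaultdict(list)
    for site in sites:
        for value in facet_values(site, facet):
            groups[value].append(site["name"])
    return dict(groups)


def filter_sites(sites, **criteria):
    """Return names of sites that match every facet=value criterion."""
    return [site["name"] for site in sites
            if all(value in facet_values(site, facet) for facet, value in criteria.items())]


print(categorize(SITES, "institution"))
print(filter_sites(SITES, users="general", functions="education"))
```

Because categories are computed from the facet values on demand, adding a new dimension only means tagging records with one more key.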
  14. Figure: http://digitalarchives.tw. Purpose: education; target: elementary school students, junior high school students, teachers, …; selected items: the top 5 websites according to 40 evaluation indicators. Purpose: creative applications; selected items: the top 5 websites according to 40 evaluation indicators. Purpose: academic research; subject: animal, archaeology, anthropology, …; selected items: the top 3 websites according to 40 evaluation indicators.
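The slide does not say how the 40 evaluation indicators are combined into a ranking. The sketch below assumes an equally weighted sum, with optional per-indicator weights, and picks the top k sites among those tagged with the requested purpose. The data layout, the three-indicator sample scores, and `select_top` are hypothetical.

```python
import heapq


def select_top(sites, purpose, k, weights=None):
    """Return the names of the k highest-scoring sites that serve `purpose`.

    sites maps a site name to {"purposes": set of purposes, "scores": list of indicator scores}.
    Without weights the scores are simply summed (an assumption); otherwise a weighted sum is used.
    """
    def total(name):
        scores = sites[name]["scores"]
        if weights is None:
            return sum(scores)
        if len(weights) != len(scores):
            raise ValueError("expected one weight per indicator")
        return sum(w * s for w, s in zip(weights, scores))

    candidates = [name for name, info in sites.items() if purpose in info["purposes"]]
    return heapq.nlargest(k, candidates, key=total)


# Hypothetical scores on three indicators instead of the slide's 40.
SITES = {
    "Site A": {"purposes": {"education"}, "scores": [4, 5, 3]},
    "Site B": {"purposes": {"education", "research"}, "scores": [5, 5, 5]},
    "Site C": {"purposes": {"research"}, "scores": [2, 3, 6]},
    "Site D": {"purposes": {"education"}, "scores": [1, 2, 2]},
}
print(select_top(SITES, "education", 2))  # → ['Site B', 'Site A']
```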
  15. Metadata for project documents: over 5,000 documents, and still increasing. Metadata: Dublin Core. Construct Teldapwiki, a Wikipedia-style wiki for TELDAP: http://wiki.teldap.tw/
  16. Section 3. Knowledge Engineering
  17. Plans for building knowledge structures for TELDAP: construct metadata models for different objects; establish hyperlinks between contents and objects; develop keyword extraction tools; design automatic tagging tools; construct a TELDAP ontology and thesaurus, drawing on the Getty Art & Architecture Thesaurus and Chinese WordNet.
  18. (1) Metadata models for different objects: digital collections use the union catalog metadata model (Dublin Core+); websites use DCCAP (Dublin Core Collections Application Profile), with public fields and private fields (Unique Identifier, Format, Evaluation, Cataloging History); documents use a document metadata model (Dublin Core).
  19. (2) Establish hyperlinks between contents and objects: identify keywords in the contents, then tag the keywords with hyperlinks to related objects.
  20. Develop hyperlink tagging tools: word segmentation tools resolve word segmentation ambiguities and identify keywords. CKIP word segmentation system: http://ckipsvr.iis.sinica.edu.tw/
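The CKIP segmenter is a full system. As a much simpler illustration of what resolving a segmentation ambiguity means, the sketch below uses a common dictionary-based baseline, bidirectional maximum matching, which is not the CKIP algorithm. It segments greedily from both ends and keeps the result with fewer words, breaking ties by fewer single-character words. The dictionary is a toy example.

```python
def forward_max_match(text, dictionary, max_len):
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len):
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def segment(text, dictionary):
    """Bidirectional maximum matching: prefer fewer words, then fewer single-character words."""
    max_len = max(map(len, dictionary), default=1)
    forward = forward_max_match(text, dictionary, max_len)
    backward = backward_max_match(text, dictionary, max_len)

    def cost(tokens):
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # On a full tie, keep the backward result (a common heuristic for Chinese).
    return backward if cost(backward) <= cost(forward) else forward


# Toy dictionary. "研究生命起源" ("study the origin of life") is a classic ambiguity:
# forward matching grabs 研究生 ("graduate student") and strands the single character 命.
DICTIONARY = {"研究", "研究生", "生命", "命", "起源", "故宮", "博物院", "故宮博物院", "收藏", "青銅器"}
print(forward_max_match("研究生命起源", DICTIONARY, 5))  # → ['研究生', '命', '起源']
print(segment("研究生命起源", DICTIONARY))               # → ['研究', '生命', '起源']
```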
  21. Develop hyperlink tagging tools: build a TELDAP keyword dictionary by extracting keywords from the metadata and establishing object-keyword relations. First extract the text from each object's XML data; the text is classified by topic, title, description, author, location, era, etc. Then extract keywords from each class of text using automatic word segmentation and keyword extraction techniques.
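The dictionary-building step can be sketched with Python's standard `xml.etree.ElementTree`. The XML layout (`<object id=…>` elements with `<topic>`, `<author>`, `<location>`, and `<era>` children) is hypothetical, as is the simplification that each value of these short, classified fields is taken whole as a keyword. Free-text fields such as titles and descriptions would first go through word segmentation and keyword extraction, which this sketch omits.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical element names for the short, classified fields.
KEYWORD_FIELDS = ("topic", "author", "location", "era")

# Hypothetical sample metadata.
SAMPLE_XML = """<objects>
  <object id="obj-001">
    <title>Bronze vessel</title>
    <topic>archaeology</topic><location>Anyang</location><era>Shang dynasty</era>
  </object>
  <object id="obj-002">
    <title>Hanging scroll</title>
    <topic>painting</topic><author>Painter X</author><era>Qing dynasty</era>
  </object>
  <object id="obj-003">
    <title>Pottery shard</title>
    <topic>archaeology</topic><era>Qing dynasty</era>
  </object>
</objects>"""


def build_keyword_relations(xml_text, fields=KEYWORD_FIELDS):
    """Return (keyword -> object ids, keyword -> field classes) from object metadata XML."""
    root = ET.fromstring(xml_text)
    keyword_objects = defaultdict(set)
    keyword_classes = defaultdict(set)
    for obj in root.iter("object"):
        obj_id = obj.get("id")
        for field in fields:
            for elem in obj.findall(field):
                keyword = (elem.text or "").strip()
                if keyword:
                    keyword_objects[keyword].add(obj_id)
                    keyword_classes[keyword].add(field)
    return dict(keyword_objects), dict(keyword_classes)


objects_by_keyword, classes = build_keyword_relations(SAMPLE_XML)
print(sorted(objects_by_keyword["Qing dynasty"]))  # → ['obj-002', 'obj-003']
```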
  22. Prototype hyperlink tagger: identify and select keywords in the input text
  23. Prototype hyperlink tagger: produce text with hyperlinks
  24. Prototype hyperlink tagger: hyperlinks point to the related digital collections
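The prototype's behaviour, finding dictionary keywords in the input text and turning them into links to the related collections, can be sketched as a longest-match scan. This is a minimal illustration, not the prototype's code: the keyword-to-URL dictionary and the `example.org` URLs are hypothetical. The character-level scan suits unsegmented Chinese text; for English text one would also check word boundaries so that a keyword is not matched inside a longer word.

```python
import html


def tag_hyperlinks(text, links):
    """Wrap dictionary keywords in <a> tags, preferring the longest match at each position."""
    max_len = max(map(len, links), default=0)
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in links:
                url = html.escape(links[piece], quote=True)
                out.append(f'<a href="{url}">{html.escape(piece)}</a>')
                i += size
                break
        else:  # no keyword starts here: copy one character, HTML-escaped
            out.append(html.escape(text[i]))
            i += 1
    return "".join(out)


# Hypothetical keyword dictionary; in practice it would come from the keyword-object relations.
LINKS = {
    "故宮博物院": "https://example.org/collections/palace-museum",
    "故宮": "https://example.org/collections/gugong",
    "青銅器": "https://example.org/collections/bronze",
}
print(tag_hyperlinks("故宮博物院收藏青銅器", LINKS))
```

Longest match means 故宮博物院 is linked as a whole rather than as the shorter keyword 故宮.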
  25. (3) Construct the TELDAP ontology and thesaurus: topical relations, hypernym/hyponym relations, and synonym relations (e.g. […] = Sushi, […]). Establish implicit links between objects by author, material, object type, etc.
  26. (3) Construct the TELDAP ontology and thesaurus: establish association links between Chinese keywords and the Getty AAT; merge Chinese WordNet with English WordNet.
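The thesaurus relations on these two slides (synonym, hypernym/hyponym, and cross-language links such as those to the Getty AAT) can be held in a small graph and used for query expansion. A minimal sketch: the `Thesaurus` class, its methods, and the sample terms are hypothetical, and plain English terms stand in for AAT concepts; a real link would store the AAT record identifier, which is not shown here.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus with synonym, broader/narrower, and cross-language links."""

    def __init__(self):
        self.synonyms = defaultdict(set)  # term -> equivalent terms
        self.narrower = defaultdict(set)  # hypernym -> hyponyms
        self.broader = defaultdict(set)   # hyponym -> hypernyms
        self.external = {}                # term -> linked English (AAT-style) term

    def add_synonyms(self, *terms):
        for term in terms:
            self.synonyms[term].update(t for t in terms if t != term)

    def add_hyponym(self, hypernym, hyponym):
        self.narrower[hypernym].add(hyponym)
        self.broader[hyponym].add(hypernym)

    def link_external(self, term, concept):
        self.external[term] = concept

    def expand(self, term):
        """Return the term, its synonyms, and all narrower terms (transitively)."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.synonyms.get(current, ()))
            stack.extend(self.narrower.get(current, ()))
        return seen

    def linked_concepts(self, term):
        """Return the external concepts linked to any term in the expansion."""
        return {self.external[t] for t in self.expand(term) if t in self.external}


th = Thesaurus()
th.add_hyponym("陶瓷", "瓷器")     # ceramics > porcelain
th.add_hyponym("陶瓷", "陶器")     # ceramics > pottery
th.add_hyponym("瓷器", "青花瓷")   # porcelain > blue-and-white porcelain
th.add_synonyms("青花瓷", "青花")  # two names for blue-and-white porcelain
th.link_external("瓷器", "porcelain")
th.link_external("陶器", "pottery")
print(sorted(th.expand("陶瓷")))
print(th.linked_concepts("陶瓷"))
```

A query for 陶瓷 thus also retrieves objects tagged only with its narrower terms or their synonyms, and the external links give English search terms for the same concepts.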
  27. Future perspectives (technology development): construct multilingual thesauri (Getty AAT); maintain the TELDAP keyword-object relation database; construct name authority files, gazetteers, and universal calendars; design hyperlink taggers and keyword extension tools; design an authoring tool that automatically provides hyperlinks to keyword-related digital content; design a knowledge-based content retrieval system.
  28. Future perspectives (content enrichment within TELDAP): standardize the object metadata model and data format; all TELDAP objects should have metadata; write scripts and stories on different topics with a wiki-like knowledge structure; enrich the digital collections; establish hyperlinks between textbooks and TELDAP collections; extend the knowledge sources, e.g. Wikipedia.
  29. Thank you for your attention! Comments and suggestions are welcome.
