Taxonomies for Publishing: Enhancing the User Experience


  1. Taxonomies for Publishing: Enhancing the User Experience. August 31, 2011, 11:30am-12:30pm Eastern. Marjorie Hlava, President, Access Innovations, Inc. [email_address] www.accessinn.com
  2. Abstract: In the modern environment of Web-hosted content, where users can instantly find and access vast numbers of works, the competitive value of a journal article, a book, or a collection is increasingly derived from users’ ability to connect the information contained within the works themselves with other content that is important to them. This ability to create instant connections between works that have been heretofore kept separate or “siloed” – by journal title, content type, or publishing house – represents a revolution in the way users search for, consume, and pay for information. This trend has been happening for several years, often to the commercial disadvantage of publishers: by indexing every page on the Web, Google has changed the economy of publishing to one that is increasingly based around the article or the chapter, rather than the journal or the book. Instead of starting at a journal’s home page, users go directly to articles, bypassing the publisher’s home page branding and revenue-generating advertising. Google has effectively become the “hub” connecting all content on the Web. Nearly all of the major publishing organizations are discovering that in order to protect their brands and preserve the value of subscription sales, they need to become more active participants in this process and create new paths that enable users to find additional value within their content collections.
  3. The Publisher Challenge: Google is a one stop shop and they index your content. Information-rich assets, unconnected from each other, leave the total organization less than the sum of its parts.
     • Information in silos
     • Disjointed content
     • Untapped value within archives – new products
     • Points of contact vary
     Diagram labels: Content, People, Programs
  4. The Taxonomy Solution: not just to be discovered ….. Subjects: Concepts. Entities: People, Places, Things.
  5. Fight back! Index your content yourself
     • Google indexes it – for the world
     • You need to index it for your community
     • Use the language your users use
     • Set the context for your content
     • Leverage the access to that content
     • Maximize the Google “hub”
  6. Create a new roadmap
     • Link the content
     • Link the people
     • Link the terms
     • Take charge of the terminology used
     • Manage your community
     • Let’s look at some options
  7. The Semantic Roadmap: Knowledge Organization Systems
     • Semantic network
     • Ontology
     • Thesaurus
     • Taxonomy
     • Controlled vocabulary
     • Synonym set/ring
     • Name authority file
     • Uncontrolled list
     Bottom of the list: unrelated entities, ambiguity, simple, low value.
     Top of the list: linked entities, contextual specificity, complex, high value.
  8. Semantically enrich content. Organize it well.
  9. Here is an old-fashioned approach: data is siloed and hard to find. It is easier to use Google to search this site. Screenshot callouts:
     • Search all publications
     • Search database for journals and pubs
     • Bookstore search
     • Search of 53 crawled sites including journals, books, web site, conference sites, etc.
     • Site search
     • Navigation
  10. Once tagged, the data can be used in many different ways: web sites, mobile apps, social networks, author networks, etc.
  11. Repurpose the data. Data can be fed to the web site or to publications from the same interface. Use for database distribution.
  12. More Task 2 Key Points Take-away. And on the website
  13. Improve the overall user experience.
  14. Semantics = Words + = Taxonomy
  15. Supporting the user experience
  16. A highly trusted source. Taxonomies support social technologies and the utility of long-trusted sources.
  17. Add a DC:Subject area for all linking options
  18. Your Taxonomy
  19. Power content links to enable discovery
  20. Domain-based connections
  21. IEEE Subject Browsing. Note broad categories as well as deep indexing.
  22. Again the user goes to Google and finds the resource faster
  23. Target resources by subject or user role. Double approach to find data: by user role or by subject.
  24. Targeted resources based on subject or user role. Quick access to special areas based on the taxonomy.
  25. Taxonomy Driven Search Presentation
  26. E-commerce drill down: taxonomy top categories, expanded categories & additional information. Copyright © 2005 - Access Innovations, Inc.
  27. Enable links to ideas and data sets outside the article
  28. Link to Society Resources
     • Journal Article on Topic A
     • Other Journal Articles on Topic A
     • Upcoming Conference on Topic A
     • Podcast Interview with Researcher Working on Topic A
     • Grant Available for Researchers Working on Topic A
     • CME Activity on Topic A
     • Job Posting for Expert on Topic A
     • Author Networks
     • Social Networking
  29. Link to Society Resources
     Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003. © 2003 American Association for Cancer Research. Short Communications.
     Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
     Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
     Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251
     Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.
     • Related Press Releases
     • How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
     • Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk: Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
     • COX-2 Levels Are Elevated in Smokers
     • Related AACR Workshops and Conferences
     • Frontiers in Cancer Prevention Research
     • Continuing Medical Education (CME)
     • Molecular Targets and Cancer Therapeutics
     • Related Meeting Abstracts
     • Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
     • Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
     • Dietary folate intake and risk of prostate cancer in a large prospective cohort study
     • Related Working Groups
     • Finance
     • Charter
     • Molecular Epidemiology
     • Related Education Book Content
     • Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
     • Physical Activity and Cancer
     • Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention
     • Related Awards
     • AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
     • ACS Award
     • Weinstein Distinguished Lecture
     • Webcasts: Related Webcasts
     • Think Tank Report: Related Think Tank Report Content
  30. HTML headers: META NAME KEYWORD. Use the taxonomy here.
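One way to picture the advice on slides 17 and 30 is a small helper that turns the taxonomy terms assigned to an article into HTML header tags. This is a minimal sketch, not the presenter's tooling: the function name `subject_meta_tags` is hypothetical, and writing the Dublin Core subject as `name="DC.subject"` is an assumption based on the common HTML encoding of Dublin Core.

```python
from html import escape


def subject_meta_tags(terms: list[str]) -> str:
    """Build <meta> header tags from taxonomy terms (hypothetical helper)."""
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    # One combined keywords tag, as on the "META NAME KEYWORD" slide.
    keywords = escape(", ".join(unique), quote=True)
    lines = [f'<meta name="keywords" content="{keywords}">']
    # One DC.subject tag per term (assumed Dublin Core naming convention).
    lines += [
        f'<meta name="DC.subject" content="{escape(t, quote=True)}">'
        for t in unique
    ]
    return "\n".join(lines)
```

Calling `subject_meta_tags(["Breast cancer", "Folate"])` yields one keywords tag followed by one subject tag per term, ready to paste into a page template.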
  31. Connect people, places, events, products
  32. Search: ORCID, DOI, keyword
  33. … or by location…
  34. … or in the document itself: http://dx.doi.org/10.1371/journal.pntd.0000228.x001
  35. Authors at a Place
  36. Create networks of authors, peer reviewers, and society members
  37. Integrating Identity into Publisher Systems
  38. Editorial Workflow Integration: Author Submission Module. The author fills in the data to the document template, attaching images and graphs as necessary. An API calls Data Harmony and generates a list of indexing terms based on the content.
  39. Editorial Workflow Integration: Author Submission Module. Authors review the indexing and may change it. Content is stored into a data repository as HTML, XML, etc.
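The submission step on slides 38 and 39 (generate candidate indexing terms, then let the author review them) can be approximated with a very small phrase matcher. The sketch below is an illustration only: it is not the Data Harmony API, whose interface the slides do not show, and `suggest_terms` and the vocabulary layout (preferred term mapped to its synonyms) are assumptions.

```python
import re


def suggest_terms(text: str, vocabulary: dict[str, list[str]]) -> list[str]:
    """Return preferred terms whose label or synonym appears in the text.

    `vocabulary` maps each preferred term to its synonyms (assumed layout).
    """
    suggestions = []
    for preferred, synonyms in vocabulary.items():
        for phrase in [preferred, *synonyms]:
            # Whole-phrase, case-insensitive match.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                suggestions.append(preferred)
                break  # one hit is enough to suggest this term
    return suggestions
```

The returned list is only a proposal; in the workflow described on the slides the author still reviews and may change the indexing before the content is stored.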
  40. Expert Reviewer Identification
     • American Chemical Society had separate reviewer lists for all journals
     • Difficult to find the expertise to review articles
     • Paper-based
     • Can now use the taxonomy, reviewer lists, and member profiles to quickly find appropriate experts
  41. Member Profiles
     • Members can choose from a list of terms
     • Browse it
     • Use synonyms
     • Have automatic suggestions from their controlled vocabulary
  42. Corporate Profile Tagging. User pastes or uploads CV. Button to auto-extract taxonomy attributes.
  43. Author Authority Database
     • Create full author records
     • Use author linking
     • Identify authors in all name forms
     • Find co-authors
     • Find authors in a place
     • Find potential collaborators
     • Find new employees
     • Add Taxonomy terms to build a full system to ….
  44. One Person, Many Representations. VIAF: Virtual International Authority File http://viaf.org/viaf/95216565/
  45. Creating an Author Authority Database
     • Tag all articles in the repository with standard subjects
     • Export author names, subjects, institutions, locations, etc.
     • Disambiguate authors with the same or similar names
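The disambiguation step on slide 45 usually starts by grouping name variants that could belong to one person. The following sketch shows only that first pass, under the assumption that surname plus first initial is an acceptable blocking key; `name_key` and `candidate_groups` are hypothetical names, and a real authority file would confirm each group with further evidence such as subjects, institutions, and co-authors.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Reduce a name such as 'Hlava, Marjorie M. K.' to the key 'hlava m'."""
    # Fold accents so that accented and unaccented spellings share a key.
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in folded:
        surname, given = (part.strip() for part in folded.split(",", 1))
    else:
        parts = folded.split()
        if not parts:
            return ""
        surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname} {given[:1]}".strip().lower()


def candidate_groups(names: list[str]) -> dict[str, list[str]]:
    """Group name forms that share a key; each group still needs review."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)
```

Groups with more than one entry are candidates for merging, not confirmed matches: two different people can easily share a surname and an initial.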
  46. Many Repositories for Names
  47. Enable Collaboration Links: double connection and fish-eye views available from prefuse.org
  48. VIAF: Virtual International Authority File http://viaf.org
  49. Use Subject Headings (Taxonomy) or go back to Google
  50. Deep tag data for direct access
  51. Inline Tagging - HTML View. Show the exact point where the concept is mentioned. Mouse-over to view the term record. Statistical summary, showing the number of times each term is mentioned in the article.
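The inline tagging view on slide 51 combines two outputs: the text with each concept marked where it occurs, and a count of mentions per term. A rough sketch of that idea follows; the `span` markup, the `tag_inline` name, and the assumption that the input is plain text without HTML special characters are all illustrative choices, not a description of the product shown on the slide.

```python
import re
from collections import Counter
from html import escape


def tag_inline(text: str, terms: list[str]) -> tuple[str, Counter]:
    """Wrap each term mention in a <span> and count mentions per term.

    Assumes `text` is plain text with no HTML special characters.
    """
    counts: Counter = Counter()
    if not terms:
        return text, counts
    # Longest terms first, so 'breast cancer' wins over 'cancer'.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    canonical = {t.lower(): t for t in terms}

    def wrap(match: re.Match) -> str:
        term = canonical[match.group(0).lower()]
        counts[term] += 1
        # The title attribute stands in for the mouse-over term record.
        return f'<span class="term" title="{escape(term)}">{match.group(0)}</span>'

    return pattern.sub(wrap, text), counts
```

The returned counter is the statistical summary, and the marked-up string is what a browser would render with mouse-over titles.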
  52. XML View for Inline Tagging
  53. Find more like these… based on taxonomy terms
  54. More like these
  55. More like these using taxonomy. Recommendations: People who liked this also liked
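A "more like these" feature driven by taxonomy terms can be sketched as set overlap between the terms assigned to each document. The example below uses Jaccard similarity, which is one reasonable choice and not necessarily what the systems on the slides use; `more_like_this` and the `index` layout (document id mapped to its set of terms) are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def more_like_this(
    doc_id: str, index: dict[str, set[str]], limit: int = 5
) -> list[tuple[str, float]]:
    """Rank other documents by taxonomy-term overlap with `doc_id`."""
    target = index[doc_id]
    scored = [
        (jaccard(target, terms), other)
        for other, terms in index.items()
        if other != doc_id
    ]
    # Keep only documents sharing at least one term; best matches first.
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 2)) for score, other in scored[:limit]]
```

Because the comparison runs on controlled terms instead of raw text, two articles that use different wording for the same concept can still be matched.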
  56. Forecast directions from your data… A higher level view
  57. Term Analytics (not text)
     • Where are your publication strengths?
     • What are the emerging topics?
     • Use your own data to address these questions
  58. Term Analytics
     1. Select related corpus
     2. Identify related terms
     3. Resulting term set
     4. Term:Term Matrix
     Corpora shown: IEEE 2k terms, 1.2M documents; 14k DTIC, 475k patents; 24k MeSH, PubMed 525k docs
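The Term:Term matrix in step 4 can be understood as a count of how often two taxonomy terms are assigned to the same document. The sketch below builds such a matrix as a sparse counter; the function name `cooccurrence` and the input layout (one set of terms per document) are assumptions made for illustration, and the real analysis on the slide may weight or normalize the counts differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs: list[set[str]]) -> Counter:
    """Count how often each pair of terms is assigned to the same document.

    Keys are (term_a, term_b) tuples in alphabetical order, so each
    unordered pair appears once.
    """
    matrix: Counter = Counter()
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            matrix[(a, b)] += 1
    return matrix
```

Sorting the pairs by count gives a first view of which subjects cluster together, which is the kind of question the Term Analytics slides raise about publication strengths and emerging topics.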
  59. Simplified Top Three Levels
  60. ALL terms (9 levels)
  61. Red lines show T-BT links
  62. Gray lines show Term – Related Term Links
  63. Use Case 10: Data Visualization. Matrix Visualization Software.
  64. About Access Innovations. The Access Innovations team are experts in content creation, enrichment and conversion services. We provide services to semantically enrich and tag raw text into highly structured data. We deliver clean, well-formed, metadata-enriched data so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for data. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!
     • Founded in 1978
     • Headquartered in Albuquerque
     • Privately held
     • Delivered more than 2000 engagements
  65. What we covered
     • You can take control - remove the silos
     • Leverage Web-hosted content
     • Getting users to connect directly gives competitive value
     • Join the revolution in user search
     • Make Google's indexes a commercial advantage
     • Take users to the article and author
     • Become active participants - create instant connections
     • Create new paths, enable users to find additional value from your collection
     • Manage the language - use a taxonomy!
  66. Thank you for your attention! Slides will be available on the Access Web site tomorrow, or send me an email. Marjorie M. K. Hlava, Access Innovations / Data Harmony, [email_address], +505.998.0800. Questions?

Editor's Notes

  • The problem is that users don’t find what they need – and more importantly, they don’t find what the organization wants them to discover! This is because the knowledge surrounding the key revenue driving assets is often siloed, disorganized and disjointed. [Click] Content – journal articles, conference proceedings, and newsletters might be hosted in multiple archives; [Click] People – society members, authors, editors, reporters, bloggers – might be found in a database or member-only portal; and [Click] Programs – conferences, workshops, products, and services – might be somewhere else! When someone searching for an article doesn’t learn about an upcoming conference on that same subject, or fails to connect with fellow members or other experts, then the organization fails to reach its potential. [CLICK]
  • So how do we solve this problem? In an XML-driven world, we do it by enriching the content: [Click] By creating well-formed metadata around people, places, and things; by tagging subjects, using a taxonomy built for the vertical market or discipline; and [Click] by combining human understanding with automated processes to ensure consistency and efficiency. This provides a number of benefits: [Click] - Improve the search experience on your site: better precision and recall, and user-friendly features [Click] - Serve the needs of different markets, with different views of the content [Click] - Unlock the value of “long-tail” content: increase discovery, repurpose and cross-market content [Click] - Increase web traffic with improved Search Engine Optimization [Click] - Enable visualization and interactive functionality – a gateway to further discovery [Click] - Provide easy-to-use current awareness tools that alert subscribers to new content in subjects of interest. I am going to show you how a couple of organizations have created better user experiences, increased revenue, and reduced costs through Access Innovations data enrichment and the Data Harmony software.
  • What we’re really talking about here are the systems by which we organize the knowledge assets of our organizations. At its simplest level, you might have an uncontrolled list – such as the keywords that authors pick when they submit their manuscripts – where there is no attempt to seek agreement on the meanings of terms. [click] A step up in organizing and increasing findability is to standardize names, like institutional names, place names, or chemical formulas. [click] Synonym sets help expand a search. These are often used in search engines. [click] Controlled vocabularies mandate the use of predefined, authorized terms, turning natural language into terms computers can easily understand. [click] A taxonomy introduces the hierarchical relationships between terms, as in parent/child relationships: for example, “microbiology” would be a child, or narrower term, under the broader term “biology”. [click] A thesaurus adds the associative relationships, like synonyms and related terms, as well as extras like scope notes, editorial notes, and codes for different classification systems. [click] An ontology can include further descriptors showing different kinds of relationship, like “a catalyzes b”. [click] A semantic network – sometimes called “Linked Data” – enables the expression of ad hoc and dynamic relationships, like friend of a friend. The simplest of systems are unrelated and often ambiguous entities; those at the top of the list are more meaningful, and consequently the most useful. 
If you think of this from the perspective of costs, your organization and users pay a premium price in more time spent searching and organizing the content every time they do a project. I am going to show you a number of ways that publishers and scholarly organizations have used these strategies of semantic enrichment to create new value for their organizations through better user experiences and increased revenue, while at the same time reducing costs through Access Innovations data enrichment and the Data Harmony software.
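The progression described in this note (synonyms, then hierarchy, then associative links) maps naturally onto a small data structure. The sketch below is one possible shape, with hypothetical class and field names; it covers the taxonomy and thesaurus levels only and assumes the broader-term links form a tree with no cycles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """One vocabulary entry; all field names are illustrative."""
    label: str                                        # preferred term
    broader: Optional[str] = None                     # parent term (taxonomy level)
    synonyms: set[str] = field(default_factory=set)   # entry terms (synonym set)
    related: set[str] = field(default_factory=set)    # associative links (thesaurus level)
    scope_note: str = ""


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self._lookup: dict[str, str] = {}

    def add(self, term: Term) -> None:
        self.terms[term.label] = term
        # Every synonym resolves to the single preferred label.
        for name in {term.label, *term.synonyms}:
            self._lookup[name.lower()] = term.label

    def resolve(self, text: str) -> Optional[str]:
        """Map free text to the preferred term, if the vocabulary knows it."""
        return self._lookup.get(text.strip().lower())

    def narrower(self, label: str) -> list[str]:
        return sorted(t.label for t in self.terms.values() if t.broader == label)

    def ancestors(self, label: str) -> list[str]:
        """Walk broader-term links upward (assumes no cycles)."""
        path = []
        current = self.terms[label].broader
        while current is not None:
            path.append(current)
            current = self.terms[current].broader
        return path
```

With "Biology" and "Microbiology" loaded as in the note's example, `narrower("Biology")` returns the child term and `ancestors("Microbiology")` returns the path back up to the broader term.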
  • Our solution will transform ERIC from a static repository of information into a true community of interconnected stakeholders. It dramatically improves processing time and quality through the use of a proven, field-tested product in Data Harmony. Our solution is architected for the future, to capitalize on emerging technologies that will enhance the ERIC community. We have spoken with ERIC “power-users” and incorporated their feedback into our solution.
  • Let’s watch these numbers shift as social applications increasingly facilitate asynchronous, many-to-many communications between peers.
  • In February, the IEEE released their new Xplore platform, providing access to more than 2 million articles from more than 12,000 publications. In planning for this upgrade, IEEE client services managers traveled around the world to solicit feedback on the features most desired by academic, corporate, and government customers. Among the features that were highly requested by users was subject browsing. [CLICK] In preparation for this new feature, Access Innovations updated and restructured the thesaurus to reflect the many changes in the engineering fields over the past few years. [CLICK] The Data Harmony software tools, powered by Machine-Aided Indexing or MAI, increased productivity in the indexing staff, enabling more comprehensive coverage at reduced cost. [CLICK] The result has been a critically acclaimed upgrade to one of the world’s richest technology information resources.
  • NICEM, the National Information Center for Educational Media, is a database of over 640,000 audio and visual items in all subject areas that apply to learning, from preschool through professional. Access Innovations applies subject terms from the NICEM taxonomy to each bibliographic record. In addition to that backend data work, we also created a search and presentation layer for the website, which we call Search Harmony. Here are some of the user-friendly features that are included in Search Harmony: [Click] Users can navigate the site by browsing the full taxonomy, and see the number of records tagged with each subject [CLICK] Auto-completion of search terms, which is a common feature of many search engines, but in this case the user is assisted in formulating a search by seeing a pick list of terms from the taxonomy, including all synonyms – even if they appear in the middle of a phrase. [CLICK] The user is also guided to expand the topic with broader terms or related terms from the taxonomy, or narrow the search to find more precise information. [CLICK] The resulting site has been recognized as an indispensable resource for educators.
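The auto-completion behaviour described in this note (a pick list drawn from the taxonomy, including synonyms, matching even in the middle of a phrase) can be sketched in a few lines. This is not Search Harmony's implementation, which the note does not describe; `autocomplete` and the vocabulary layout (preferred term mapped to its synonyms) are assumptions.

```python
def autocomplete(
    fragment: str, vocabulary: dict[str, list[str]], limit: int = 10
) -> list[tuple[str, str]]:
    """Return (matched entry, preferred term) pairs for a typed fragment.

    Matches anywhere inside a preferred term or synonym; entries where the
    fragment appears earlier are listed first.
    """
    needle = fragment.strip().lower()
    if not needle:
        return []
    hits = []
    for preferred, synonyms in vocabulary.items():
        for entry in [preferred, *synonyms]:
            position = entry.lower().find(needle)
            if position >= 0:
                hits.append((position, entry, preferred))
    hits.sort(key=lambda hit: (hit[0], hit[1].lower()))
    return [(entry, preferred) for _, entry, preferred in hits[:limit]]
```

Returning the preferred term alongside the matched entry lets the interface run the actual search on the authorized term even when the user picked a synonym.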
  • Expands to 2nd and 3rd levels of taxonomy, includes Related Terms
  • Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can all go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web”
  • Linking Data Elements to External Resources
  • But as various efforts are made to create profiles, the big question often comes down to “Is this the same person?” from one repository to another.
  • American Chemical Society had separate reviewer lists for all journals. It was difficult to find the expertise to review articles, and the process was paper-based. They now use the taxonomy, reviewer lists, and member profiles to quickly find appropriate experts.
  • But now let’s take the challenge a step further: These repositories of people data are not walled gardens. The real power of the web is in connecting to all the instances where a person might be named. Each of these different repositories serves a specific market and supports specific applications. How do we create some consistency as we move across these interconnected data stores?
  • 16 browse terms in RED, first two levels of thesaurus terms in GREEN, lower levels in YELLOW