HATHI TRUST
A Shared Digital Repository

Delivering Data for New Generations of Research
Strategies and Challenges

Jeremy York
NISO/BISG Forum
ALA 2010
Introduction
• Digital Repository
  – Initial focus on digitized book and journal content
  – “Light” archive
• Collections and Collaboration
  – Comprehensive collection
  – Shared strategies
  – Local services
  – Public Good
Content Distribution
[Pie chart: rights status of content]
• In Copyright: 81%
• Public Domain: 19%
• 6,173,575 – Total
• 1,177,667 – Public Domain
* As of June 15, 2010
Language Distribution (1)
The top 10 languages make up ~86% of all content
• English: 48%
• German: 8%
• French: 7%
• Russian: 5%
• Spanish: 4%
• Chinese: 4%
• Japanese: 4%
• Italian: 3%
• Arabic: 2%
• Polish: 1%
• Remaining Languages: 14%
* As of June 15, 2010
Language Distribution (2)
The next 40 languages make up ~13% of total
[Pie chart: each language's share of this group, with labeled slices ranging from 1% to 6%. Languages shown: Portuguese, Hindi, Hebrew, Indonesian, Dutch, Latin, Urdu, Swedish, Turkish, Unknown, Czech, Thai, Danish, Croatian, Tamil, Undetermined, Persian, Korean, Bengali, Norwegian, Sanskrit, Hungarian, Ukrainian, Vietnamese, Bulgarian, Serbian, Romanian, Ancient Greek, Slovenian, Multiple, Yiddish, Panjabi, Malayalam, Slovak, Finnish, Greek, Catalan, Armenian, Malay]
* As of June 15, 2010
Originating Institution
• University of Michigan: 65%
• University of California: 25%
• University of Wisconsin: 6%
• Indiana University: 3%
• University of Minnesota: 1%
• Penn State University: 0%
* As of June 15, 2010
Content over time
[Stacked chart: share of content by originating institution (0% to 100%) over time, with monthly axis labels starting at Sep-04. Series: Michigan, Wisconsin, Indiana, California, Penn State, Minnesota]
* As of June 15, 2010
Content Growth
Data Distribution & APIs
• OAI-PMH
• Metadata files
• Bibliographic API
• Data API
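As a rough illustration of the OAI-PMH item above, the sketch below builds harvesting request URLs using the verbs and argument names defined by the OAI-PMH 2.0 specification (`Identify`, `ListRecords`, `metadataPrefix`, `set`, `resumptionToken`). The base URL and both function names are hypothetical placeholders; the slides do not give the repository's actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; substitute the
# repository's documented OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied (None) to keep the URL minimal.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """URL for a ListRecords request.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    present, no other argument besides verb is sent.
    """
    if resumption_token is not None:
        return oai_request_url(base_url, "ListRecords",
                               resumptionToken=resumption_token)
    return oai_request_url(base_url, "ListRecords",
                           metadataPrefix=metadata_prefix, set=set_spec)


if __name__ == "__main__":
    print(oai_request_url(BASE_URL, "Identify"))
    print(list_records_url(BASE_URL))
```

A harvester would typically issue the first `ListRecords` request, then keep requesting with each returned `resumptionToken` until the response no longer carries one.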
Extended Services
• Community Development Environment
• Non-Google Ingest
• Non-Book/Non-Journal Ingest
• Computational Research
Strategies for Computational Research
• Data distribution
• Protocol-based access
• Research Center
SEASR Architecture
[Architecture diagram; recoverable labels:]
• Visualizations
• User Interfaces: Apps, Plugins, Web Apps, Services
• Meandre Workbench and Developer Tools
• Meandre Data-Intensive Flows
• Components: Data, Analytics, Visualization
• Repositories: Data, Analysis, Components, Flows
• Component Repository and Component Discovery
• Meandre Infrastructure
• Virtualization Infrastructure
• Cloud Computing
SEASR @ Work – Tag Cloud
• Count tokens
• Filter options supported
• Stem words
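The three steps on this slide (count, filter, stem) can be sketched in a few lines. This is not SEASR's implementation: the stop list is a tiny illustrative assumption, and `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm.

```python
import re
from collections import Counter

# Tiny illustrative stop list; an assumption, not the filter SEASR ships.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def naive_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag_cloud_counts(text, stop_words=STOP_WORDS, min_count=1):
    """Return {stem: count} for tokens that survive the stop-word filter."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [naive_stem(t) for t in tokens if t not in stop_words]
    counts = Counter(stems)
    # min_count acts as a second filter option: drop rare stems.
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = "The reader reads books and the readers read a book."
    print(tag_cloud_counts(sample))
```

The resulting counts are what a tag cloud renderer would map to font sizes.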
SEASR @ Work – Entity Mash-up
• Entity Extraction with OpenNLP or Stanford NER
• Locations viewed on Google Map
• Dates viewed on Simile Timeline
SEASR @ Work – Entities To Network
• Identify entities
• Define relationships between entities within same sentence
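A minimal sketch of the same-sentence rule, assuming the entities have already been identified (in the SEASR flow that step would come from an entity extractor). The sentence splitter and substring matching are deliberately naive, and the function name and sample text are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities):
    """Count entity pairs that appear together in the same sentence."""
    edges = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Sort so each pair is always stored in the same order.
        present = sorted(e for e in entities if e in sentence)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    # Toy text written for this example, not a quotation.
    text = ("Waverley met Fergus in Edinburgh. Fergus admired Flora. "
            "Waverley left Edinburgh.")
    names = ["Waverley", "Fergus", "Flora", "Edinburgh"]
    for (a, b), weight in cooccurrence_edges(text, names).items():
        print(a, "--", b, weight)
```

Each counted pair becomes a weighted edge in the network; the weight is the number of sentences the two entities share.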
SEASR @ Work – Text Clustering

• Clustering of Text by token counts
• Filtering options for stop words, Part of Speech
• Dendrogram Visualization
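One way to picture clustering by token counts is the sketch below: each text becomes a token-count vector, distances are cosine distances, and clusters are merged bottom-up. Single linkage and cosine distance are assumptions chosen for brevity; the slide does not say which linkage or distance SEASR uses. The returned merge list is the information a dendrogram draws.

```python
import math
import re
from collections import Counter


def token_counts(text):
    """Token-count vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity of two count vectors (1.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def single_linkage(docs):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    vectors = [token_counts(d) for d in docs]
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (minimum over member pairs).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(cosine_distance(vectors[x], vectors[y])
                        for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    docs = ["cat sat mat", "cat sat hat", "stock market fell"]
    for a, b, dist in single_linkage(docs):
        print(a, b, round(dist, 3))
```

Stop-word or part-of-speech filtering would be applied inside `token_counts` before the vectors are compared.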
SEASR @ Work – Audio Analysis
• NEMA: Executes a SEASR flow for each run
  – Loads audio data
  – Extracts features for every 10 sec moving window of audio
  – Loads and applies the models
  – Sends results back to the WebUI
• NESTER: Annotation of Audio via Spectral Analysis
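The moving-window step can be sketched as follows. Only the 10-second window length comes from the slide; the 5-second hop and the choice of RMS energy as the feature are assumptions made for illustration, and `window_features` is a hypothetical name.

```python
import math


def window_features(samples, sample_rate, window_sec=10.0, hop_sec=5.0):
    """Yield (start_time_sec, rms) for each full window of audio samples."""
    win = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    # Slide the window forward by `hop` samples; partial windows are skipped.
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        yield start / sample_rate, rms


if __name__ == "__main__":
    # 30 "seconds" of constant signal at 1 sample per second.
    for start, rms in window_features([1.0] * 30, sample_rate=1):
        print(start, rms)
```

In a flow like the one described, each window's feature vector would then be passed to the loaded models.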
SEASR @ Work – Zotero
• Plugin to Firefox
• Zotero manages the
  collection
• Launch SEASR Analytics
  – Citation Analysis uses the
    JUNG network importance
    algorithms to rank the authors
    in the citation network that is
    exported as RDF data from
    Zotero to SEASR
  – Zotero Export to Fedora
    through SEASR
  – Saves results from SEASR
    Analytics to a Collection
• Launch MONK
  Processing
  – MONK DB Ingestion Workflow
SEASR @ Work – Emotion Tracking
Goal is to have this type of visualization to track emotions across a text document (leveraging flare.prefuse.org)
Sentiment Analysis: Visualization
Person Extraction:
Scott's Waverley, Ivanhoe, and The Heart of Midlothian. 
Location Extraction:
Top: Walter Scott's Waverley   Bottom: Maria Edgeworth's Castle Rackrent
Thank you!
hathitrust-info@umich.edu
jjyork@umich.edu
