Building a Public Research Center for the HathiTrust Digital Library @hathitresearch | @hathitrust http://www.hathitrust-research.org Robert H. McDonald Associate Dean for Library Technologies and Digital Libraries Associate Director-Data to Insight Center, Pervasive Technology Institute Indiana University June 14, 2011 JCDL 2011: Big Data! Big Deal? Panel
HathiTrust Research Center (HTRC) Team Indiana University Beth Plale – Director Robert McDonald – Executive Committee University of Illinois Scott Poole – Co-Director John Unsworth – Executive Committee
HathiTrust Digital Library History To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge. Launched in October 2008 University of Michigan Indiana University Used Google Books Repository at Michigan as Model Expanded to include content from  CIC Member Libraries UC System Libraries University of Virginia Now includes more than 50 partner institutions and more than 8 million volumes
Towards a HathiTrust Research Center Started in response to proposed Google Settlement - June 2009
Worked to identify key stakeholders from HT institutions to collaborate and write RFP
Google Settlement in early 2011 did not stop the center
Developed specific RFP for HathiTrust to solicit proposals – Summer/Fall 2009 HTRC RFP Working Group RFP Released – Winter 2010
Our Collaboration HTRC is founded as a joint venture between Indiana University and the University of Illinois Urbana-Champaign, aimed at solving the difficult challenges of increasing computational access to the public domain and copyrighted material in HathiTrust.
Our Mission Phase I : starting Apr 2011 and going for 18 mos. Phase II : starting Fall 2012 and going for … Goal: enable strong computational research and education on a collection that has not been amenable to computational exploration EVER before!
Our Goals Maintain repository of text mining algorithms and retrieval tools available on-line for human and programmatic discovery.  Also register derived data sets, indexes, and versions in registry repository.   Be a user-driven resource, with an active advisory board, and a community model that allows users to share algorithms and tools.   Support interoperability across collections and institutions, through use of inCommon SAML identity.
Our Future Support innovation in cyberinfrastructure to deliver optimal access and use of the HathiTrust corpus. Implement “Non-consumptive” research: a technical and intellectual challenge. Identify and host existing data analysis, text mining and retrieval tools that are of interest to the community. Stimulate development of new analytical methods and tools. We hope that the scale of the HTRC will promote new levels of collaboration in tool development.
HathiTrust Research Center Today HTRC is dedicated to the provision of access to a comprehensive body of published works for scholarship and education for computational research purposes. Lightweight Organization Executive Committee Beth Plale, Indiana Scott Poole, Illinois Robert H. McDonald, Indiana John Unsworth, Illinois Advisory Board TBD HathiTrust Executive Committee Liaison Laine Farley, California Digital Library
HathiTrust Research Center Today	 $250K in funding for initial 18 month startup Creating Themed Collections for early Use Cases Astronomy – Victorian Literature - Influenza Ingest and Replication Mechanisms Between HT and HTRC Full-text SOLR indexes Data Capsule integration Karma integration Integration with SEASR/MEANDRE SOA services at NCSA Alignment with Bamboo Technology Project Alignment with international Google Books Research Centers Establishing long-term non-consumptive research methodologies
HTRC Proposed Technical Architecture Courtesy IU Data to Insight Center – Beth Plale/Yiming Sun
Current SEASR Integration Demo (Courtesy IU Data to Insight Center – Felix Terkhorn/Yiming Sun)
Data flow: 1. User enters Author name or Volume title 2. Query RIS for Author Name or Volume Title 3. Volume ID 4. Invoke Tag Cloud service with URL 5. Use URL to Retrieve Volume 6. OCR for volume 7. Tag Cloud returned to user
Components: Book Search Interface by Author or Title; JS/PHP Auto-completer; Sample Collection Bibliography Database (Converted from MARC to RIS); Public-domain OCR Web Access Servlet (A persistent RESTful Web Service); Sample Public Domain Collection (Organized as pairtree for demo only); Tag Cloud Viewer; Meandre Workbench; SEASR Infrastructure
Administrator creates tag cloud viewer in advance through SEASR
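The last two steps of the demo flow (OCR text for a volume goes in, a tag cloud comes back) can be illustrated with a small term-frequency sketch. This is not the SEASR/Meandre component used in the demo; the function name `tag_cloud_terms`, the stopword list, and the sample text are hypothetical and chosen only for illustration. Note that it returns aggregate counts rather than the text itself.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real service would use a fuller one.
STOPWORDS = frozenset({"the", "and", "of", "a", "to", "in", "is", "it", "that", "was"})


def tag_cloud_terms(ocr_text, top_n=10, min_len=3):
    """Return the top_n (term, count) pairs for a block of OCR text.

    Only aggregate counts leave this function, never the running text.
    """
    # Lowercase the text and keep runs of ASCII letters as candidate terms.
    words = re.findall(r"[a-z]+", ocr_text.lower())
    # Drop very short tokens and stopwords before counting.
    counts = Counter(w for w in words if len(w) >= min_len and w not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample standing in for the OCR of one public-domain volume.
    sample = ("The comet was seen. The comet returned; "
              "astronomers measured the comet and the orbit.")
    for term, count in tag_cloud_terms(sample, top_n=3):
        print(term, count)
```

A tag cloud viewer would then scale each term's display size by its count.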
Non-Consumptive Research Track No action or set of actions on the part of HathiTrust Research Center users, either acting alone or in cooperation with other users over the duration of one or multiple sessions can result in sufficient information gathered from the HathiTrust collection to reassemble pages from the collection.  Beth Plale (Indiana University) Atul Prakash (University of Michigan) Geoffrey Fox (Indiana University) Robert H. McDonald (Indiana University)
HTRC Managed Data-Intensive Compute Resources HathiTrust Digital Library Content
Access to HT copyrighted indices

TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Building a Public Research Center for the HathiTrust Digital Library

  • 1. Building a Public Research Center for the HathiTrust Digital Library @hathitresearch | @hathitrust http://www.hathitrust-research.org Robert H. McDonald Associate Dean for Library Technologies and Digital Libraries Associate Director-Data to Insight Center, Pervasive Technology Institute Indiana University June 14, 2011 JCDL 2011: Big Data! Big Deal? Panel
  • 2. HathiTrust Research Center (HTRC) Team Indiana University Beth Plale – Director Robert McDonald – Executive Committee University of Illinois Scott Poole – Co-Director John Unsworth – Executive Committee
  • 3. HathiTrust Digital Library History To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge. Launched in October 2008 University of Michigan Indiana University Used Google Books Repository at Michigan as Model Expanded to include content from CIC Member Libraries UC System Libraries University of Virginia Now includes more than 50 partner institutions and more than 8 million volumes
  • 4. Towards a HathiTrust Research Center Started in response to proposed Google Settlement - June 2009
  • 5. Worked to identify key stakeholders from HT institutions to collaborate and write RFP
  • 6. Google Settlement in early 2011 did not stop the center. Developed specific RFP for HathiTrust to solicit proposals – Summer/Fall 2009. HTRC RFP Working Group. RFP Released – Winter 2010
  • 7. Our Collaboration HTRC is founded as a joint venture between Indiana University and the University of Illinois Urbana-Champaign, aimed at solving the difficult challenges of increasing computational access to the public domain and copyrighted material in HathiTrust.
  • 8. Our Mission Phase I : starting Apr 2011 and going for 18 mos. Phase II : starting Fall 2012 and going for … Goal: enable strong computational research and education on a collection that has not been amenable to computational exploration EVER before!
  • 9. Our Goals Maintain a repository of text mining algorithms and retrieval tools available online for human and programmatic discovery. Also register derived data sets, indexes, and versions in a registry repository. Be a user-driven resource, with an active advisory board, and a community model that allows users to share algorithms and tools. Support interoperability across collections and institutions, through use of InCommon SAML identity.
  • 10. Our Future Support innovation in cyberinfrastructure to deliver optimal access and use of the HathiTrust corpus. Implement “Non-consumptive” research: a technical and intellectual challenge. Identify and host existing data analysis, text mining and retrieval tools that are of interest to the community. Stimulate development of new analytical methods and tools. We hope that the scale of the HTRC will promote new levels of collaboration in tool development.
  • 11. HathiTrust Research Center Today HTRC is dedicated to the provision of access to a comprehensive body of published works for scholarship and education for computational research purposes. Lightweight Organization Executive Committee Beth Plale, Indiana Scott Poole, Illinois Robert H. McDonald, Indiana John Unsworth, Illinois Advisory Board TBD HathiTrust Executive Committee Liaison Laine Farley, California Digital Library
  • 12. HathiTrust Research Center Today $250K in funding for initial 18-month startup; Creating themed collections for early use cases (Astronomy, Victorian Literature, Influenza); Ingest and replication mechanisms between HT and HTRC; Full-text Solr indexes; Data Capsule integration; Karma integration; Integration with SEASR/Meandre SOA services at NCSA; Alignment with Bamboo Technology Project; Alignment with international Google Books Research Centers; Establishing long-term non-consumptive research methodologies
  • 13. HTRC Proposed Technical Architecture Courtesy IU Data to Insight Center – Beth Plale/Yiming Sun
  • 14. Current SEASR Integration Demo (courtesy IU Data to Insight Center – Felix Terkhorn/Yiming Sun). Tag cloud viewer data flow: 1. User enters author name or volume title in the book search interface (JS/PHP auto-completer). 2. Query RIS for author name or volume title against the sample collection bibliography database (converted from MARC to RIS). 3. Volume ID returned. 4. Invoke tag cloud service with URL. 5. Use URL to retrieve volume from the public-domain OCR web access servlet (a persistent RESTful web service). 6. OCR for volume returned from the sample public domain collection (organized as pairtree for demo only). 7. Tag cloud returned to user. The administrator creates the tag cloud viewer in advance through SEASR (Meandre Workbench, SEASR infrastructure).
  • 15. Non-Consumptive Research Track No action or set of actions on the part of HathiTrust Research Center users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information gathered from the HathiTrust collection to reassemble pages from the collection. Beth Plale (Indiana University), Atul Prakash (University of Michigan), Geoffrey Fox (Indiana University), Robert H. McDonald (Indiana University)
  • 16.
  • 17. Access to HT copyrighted indices
  • 18.
  • 19. HathiTrust Research Center Events HTRC Kickoff Event at Digital Humanities Conference 2011, Stanford University - June 20, 2011. Working on models for collaborative research. AHRC/ESRC/IMLS/JISC/NEH/NSF/NWO/SSHRC Digging into Data Round 2 http://www.diggingintodata.org/ Working on early advanced user case studies for the HathiTrust Corpus
  • 20. Support and Acknowledgements IU UITS Research Technologies National Center for Supercomputing Applications IU Data to Insight Center iCHASS Illinois Informatics Institute Lilly Endowment, Inc. The Alfred P. Sloan Foundation
  • 21. For More on HathiTrust Research Center See – http://www.hathitrust-research.org Follow us @hathitresearch on twitter Robert H. McDonald @mcdonald on twitter robert@indiana.edu
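
The tag cloud data flow described on slide 14 can be sketched roughly as follows. This is a minimal illustration only, not the HTRC or SEASR implementation: the in-memory bibliography, the `example.org` URL scheme, the sample text, and every function name are assumptions made up for the sketch, and the real demo used a RIS database, a RESTful OCR servlet, and a Meandre flow instead.

```python
import re
from collections import Counter

# Hypothetical stand-in for the sample bibliography database (RIS records).
BIBLIOGRAPHY = [
    {"id": "vol-001", "author": "A. Writer", "title": "Fog on the River"},
    {"id": "vol-002", "author": "B. Scribe", "title": "A Quiet Town"},
]

# Hypothetical stand-in for the public-domain OCR store, keyed by volume URL.
# The text is invented sample data, not real OCR output.
OCR_STORE = {
    "http://example.org/ocr/vol-001":
        "fog on the river and fog on the bridge over the river; fog everywhere",
    "http://example.org/ocr/vol-002":
        "the town was quiet and the town was small",
}

STOPWORDS = {"on", "the", "and", "over", "was"}


def find_volume_ids(query):
    """Steps 1-3: match the query against author or title, return volume IDs."""
    q = query.lower()
    return [rec["id"] for rec in BIBLIOGRAPHY
            if q in rec["author"].lower() or q in rec["title"].lower()]


def volume_url(volume_id):
    """Step 4: build the URL handed to the tag cloud service (made-up scheme)."""
    return f"http://example.org/ocr/{volume_id}"


def fetch_ocr(url):
    """Steps 5-6: retrieve the OCR text for a volume; here a dict lookup."""
    return OCR_STORE[url]


def tag_cloud(text, top_n=2):
    """Step 7: count word frequencies, the data behind a tag cloud."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


def run_demo(query):
    """Run the whole flow for every volume that matches the query."""
    return {vid: tag_cloud(fetch_ocr(volume_url(vid)))
            for vid in find_volume_ids(query)}
```

Calling `run_demo("fog")` walks the same path as the slide: the query resolves to a volume ID, the ID becomes a URL, the URL yields OCR text, and the text is reduced to its most frequent terms. Only the term counts reach the caller, which is also in the spirit of the non-consumptive approach on slide 15.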

Editor's Notes

  1. State core team names. Talk about partnership between IU and UIUC.
  2. Basic History of HathiTrust Digital Library – Digital Public Library of America - LAC