Preserving the Inputs and Outputs of
            Scholarship




                Tim Babbitt
          SVP, ProQuest Platforms
Our Vision




         ProQuest will be
        central to research
         around the world
THE CHANGING CONTEXT


A Revolution in Research




 What is at stake is nothing less
 than the ways in which
 astronomy will be done in the era
 of information abundance
          Astronomer George Djorgovski
Drivers of context change

    Growth of the internet
    Low cost, rapid digitization of print materials
    Open Source movement
    Rise of Social Software, Web 2.0 tools, mobile
    Publishing and scholarship ecosystem
      Changing policies
  Internationalization of scholarship
  Growth in primary source datasets




Key characteristics of the current
research landscape
   The products of research and the starting point of
    new research are increasingly digital and increasingly
    "born-digital"
   Exploding data volumes and rising demand for data use,
    driven by the rapid pace of digital technology innovation
   The rapid expansion of the inputs and outputs of
    scholarship




Linking the Scholarly lifecycle
  [Diagram: linked elements of the scholarly lifecycle: Vitae, Grants,
   Related Articles, Comments & Reviews, Notebooks, Models, Codes,
   Presentations, Algorithms, Preprints, Podcasts, Methods, Video, Plans,
   Data, Ontologies, Intermediate Results]
Network of Ideas (citations)
Network of datasets
Examples of text as data

  Changes in word sense (e.g. consumption (TB), moot, oratio)
   and spelling (e.g. 18th C. ſ to s, *re to *er)
  Bibliometrics and other usage analyses
     Citation patterns
     Institution vs. discipline
     Author demographics
  Pharma: Drug / Symptom correlation.
  Biology: Species / date / location observations.
  Social Sci: Work/life habits of undergrads based
   on access patterns at different institutions
   (from usage data)
  …

Text Mining
            Unstructured text to queryable data structures

WHY?
 TOO MUCH TEXT TO HAND ANALYZE.
 Improved discovery (better 'metadata')
 Business Intelligence
      e.g. content stats -> content acquisitions
 Saleable datasets
      e.g. Distribution of authors vs. disciplines vs. grants
 End User research agendas
    High-End : Custom (user specified) mining as a service
    Simple : Visualization of results ( frequency / co-occurrence
     …)
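
As a rough illustration of the "Simple" case above (frequency / co-occurrence), here is a minimal Python sketch that turns unstructured text into two queryable counters. The sample sentences, the function names, and the very naive tokenizer are all assumptions made for illustration; they are not taken from any ProQuest system.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(documents):
    """Count unordered token pairs that appear in the same document."""
    pairs = Counter()
    for doc in documents:
        # Sort the unique tokens so each pair has one canonical ordering.
        unique = sorted(set(tokenize(doc)))
        pairs.update(combinations(unique, 2))
    return pairs


# Hypothetical sample sentences, invented for illustration only.
SAMPLE_DOCS = [
    "Aspirin reduces headache",
    "Aspirin may cause nausea",
    "Headache and nausea reported",
]

if __name__ == "__main__":
    print(term_frequencies(SAMPLE_DOCS).most_common(3))
    print(cooccurrences(SAMPLE_DOCS)[("aspirin", "headache")])
```

A real pipeline would likely add stop-word removal, stemming, and a sentence or window-level notion of co-occurrence; the document-level pairing here is only the simplest possible choice.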


Datasets: Factoids & point data
   ca. 1.4M Faculty (50% full-time) in US HE, ~75M people enrolled in US HE
   ca. 100k Faculty in UK HE
   44% of Researchers use online (other people's) datasets for their research
   48% of Researchers use datasets > 1GB
   10.8% store their data outside their institution (50% store it in their "lab")
   1 - 5% of datasets are formally moved into the curation process.
   66% of faculty have requested other people's data (and 49% of those got it).
   26.5% have the expertise to analyze their own data.
   80.3% do not have sufficient expertise to manage their own data
   Institutional storage costs ~ $600 / TB / year
   58% is the annual increase in the amount of data being generated
   20-40% is annual growth in the amount of storage deployed (est.)

   < 1% of ecological data is accessible after publication.
   > 85% of all information is in text form

   2.7 times more citations accrue to papers with accessible data
   3 to 6 times more papers emerge if the data is accessible.
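
The gap between the 58% data growth figure and the 20-40% storage growth figure compounds year over year. The sketch below works through that arithmetic in Python. It assumes, purely for illustration, that data and deployed storage start at the same size and that the quoted rates stay constant; the function names and the five-year horizon are hypothetical.

```python
def grow(initial, annual_rate, years):
    """Compound an initial quantity at a fixed annual growth rate."""
    return initial * (1 + annual_rate) ** years


def data_to_storage_ratio(years, data_rate=0.58, storage_rate=0.40):
    """Ratio of data generated to storage deployed after `years`.

    Assumes both start at the same size (1.0) and grow at constant
    rates; the defaults are the slide's 58% and upper-bound 40%.
    """
    return grow(1.0, data_rate, years) / grow(1.0, storage_rate, years)


def annual_storage_cost(terabytes, cost_per_tb=600.0):
    """Yearly cost in dollars at the slide's ~$600 / TB / year figure."""
    return terabytes * cost_per_tb


if __name__ == "__main__":
    for storage_rate in (0.40, 0.20):
        ratio = data_to_storage_ratio(5, storage_rate=storage_rate)
        print(f"storage growth {storage_rate:.0%}: ratio after 5 years = {ratio:.2f}")
    print(f"10 TB for one year: ${annual_storage_cost(10):,.0f}")
```

Under these assumptions, after five years the data generated would outgrow deployed storage by a factor of roughly 1.8 (at 40% storage growth) to roughly 4 (at 20%).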




Curation OF scholar data

  Tools to ingest, add & validate schemas, publish,
   migrate and preserve. (DMP provision)
  Tools to analyze
  Tools to discover datasets
     "Summon" for IR datasets, gov't datasets …
  Tools to merge (create composite datasets)
  Citation management & attribution for datasets.
  Generic capabilities (domain specific later).




Dataset provision TO scholars

  Content procurement and dissemination.
     What we do already (intermediary)
     Needs discovery tools
  Easy to focus on selected domains that are
   publicly available.
  Most research does not use publicly available data




Towards reproducible research

 Reproducible
  research
    means context, quality,
     trust
    means easy access to
     the sources
 Science depends
  entirely on the
  knowledge and data
  gained in the past to
  further advance


Preserving Research Data

  Growing trend of journals and publishers linking to
   open-access data repositories
     Elsevier and PANGAEA – Publishing Network for Geoscientific
      & Environmental Data
         Reciprocal linking of articles and the data behind the research
  Journals and funding agencies setting policy to preserve
   and associate data supporting research results
     e.g. American Naturalist new policy:
         This journal requires, as a condition for publication, that data
          supporting the results in the paper should be archived in an
          appropriate public archive, such as GenBank, TreeBASE, Dryad,
          or the Knowledge Network for Biocomplexity. Data are important
          products of the scientific enterprise, and they should be preserved
          and usable for decades in the future. Authors may elect to have the
          data publicly available at time of publication, or, if the technology of
          the archive allows, may opt to embargo access to the data for a
          period up to a year after publication. Exceptions may be granted at
          the discretion of the editor, especially for sensitive information such
          as human subject data or the location of endangered species.


Digital Universe Growth
Falling Costs/Rising Investments
PROQUEST & PRESERVATION
ProQuest Microfilm
  PQ business original objectives: preservation and access
     New technology, microfilming
     1938 British Library – 120,000 first printed books in English
     1939 established Dissertations filming, printing program
     1940‘s began microfilming newspapers
     1948 began microfilming serials
     Added 700+ Research Collections for Academic market, still
      actively filming several
     2.5M Dissertations and Theses, actively filming
     Newspaper Archive contains 10,700 titles, 900 titles actively
      filming
Microfilm Commitment
   With the ongoing research and archival need for
    microfilmed content, ProQuest invested significantly to
    build a new filming operation in Ypsilanti, MI.
   Opened May, 2010
   Employing 65 staff
   Utilizing eBeam Cameras: digital images to film masters
   Scanning operation.
   Utilizing 2 archive locations: Iron Mountain and Ypsilanti
Film Archive at Iron Mountain
Camera Work
eBeam Cameras
Newspaper Microfilm Archive - Ypsilanti
Microfiche Archive - Ypsilanti
Microform and Digital Interface
  Microforms are the source materials for numerous
   historical digital products.
       Historical Newspapers
       Periodical Archive Online, Periodical Index Online
       Early English Books Online
       Parliamentary Papers
       Sanborn Maps, Geo-edition Sanborn Maps
       Gerritsen Collection of Women's History
       700+ Research Collections…
Digital Microfilm


  [Screenshot callouts: Adobe controls for zooming, rotating, printing,
   saving, emailing PDFs or links; use this area for further date
   selection; Image Adjustment]
Dissertations

  ProQuest "UMI" Dissertation Publishing
     Over 50 years
     Official repository of dissertations and theses for the national
      libraries of Canada and the United States
  Archive
     Use of Microform
     Multi-location digital copies
     Tape
GOING FORWARD
Preservation of inputs and outputs
of scholarship
  Publication part of wider network of scholarly
   information:
     Original data
     Shared databases
     Multimedia expressions
     Social media
  Preservation should encompass all of this

  [Diagram: linked elements of the scholarly lifecycle: Vitae, Grants,
   Related Articles, Comments & Reviews, Notebooks, Models, Codes,
   Presentations, Algorithms, Preprints, Podcasts, Methods, Video, Plans,
   Data, Ontologies, Intermediate Results]
Our concern for scholarship

  Secondary source publications are much better
   protected than inputs to research
  Research data-explosion
     Primary sources
     Datasets
     Text as data
  Focus on objects rather than linkages

  We need to continue to support the preservation of
   scholarship inputs and outputs as they evolve
Our questions for us…

  Can practices of preservation and sustainability
   become commonplace?
  What is the right balance of new digital technology
   and analog methods of preservation?
     Film industry: research and practice on preserving
      born-digital films
  How should we approach going beyond the current
   atomic level of preservation—the object? How should
   we deal with:
     Links
     Text as data
     Mining
Towards increasing the
sustainability of research output
   Persistent identifiers: linkages of underlying output
    of scholarship
      e.g. DOI, ISBN, ISNI
   Establishing a network of safe/trusted repositories
    for all outputs of scholars
   Link/citation practices to outputs, not just official
    publications; focus on reliability
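
To make the idea of persistent identifiers concrete, here is a small Python sketch that checks an ISBN-13 check digit and turns a DOI into a resolver link. The function names are hypothetical, the DOI check is deliberately loose (it only looks for the "10." prefix and a slash), and the DOI in the usage lines is a made-up placeholder, not a real identifier.

```python
from urllib.parse import quote


def is_valid_isbn13(isbn):
    """Return True if the string is a 13-digit ISBN with a correct check digit.

    Hyphens and spaces are ignored. Digits are weighted 1, 3, 1, 3, ...
    and the weighted sum must be divisible by 10.
    """
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(cleaned))
    return total % 10 == 0


def doi_to_url(doi):
    """Build a https://doi.org/ link from a DOI name.

    Only a loose sanity check is applied: the name must start with
    "10." and contain a "/" separating prefix and suffix.
    """
    doi = doi.strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode unusual characters but keep the slash separator.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))
    # "10.1234/example" is a hypothetical DOI used only for illustration.
    print(doi_to_url("10.1234/example"))
```

Linking practices built on identifiers like these stay stable when the hosting location of an output changes, which is the property the slide is after.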
Preservation of born digital outputs

  Capability to preserve objects in digital formats—
   addressing storage capacity; accessibility; and
   frequent churn in digital formats, media, and tools
   that turn bits into humanly-recognizable artifacts—is
   a core requirement of digital scholarship.
     Leverage Microfilm as superior vehicle for "born digital"
      preservation
  Driver for movement from print to digital in library
   collections. See, for example, the 2009 Ithaka paper
   "What to Withdraw: Print Collections Management in
   the Wake of Digitization"
Preservation as a practice

  We have a history in the preservation of
   scholarship that continues today
  Build preservation practices into our everyday
   management of scholarly inputs and outputs.
  Work with the community of scholars, libraries,
   and publishers to evolve our thinking on needs
   and practices
  Working with CRL towards TRAC criteria audit of
   our digital data and content
  Partner with repositories for sustainability


Thank you!


Questions?
 Tim Babbitt
 timothy.babbitt@proquest.com
 (734) 997-4593





Preserving the Inputs and Outputs of Scholarship

  • 1. Preserving the Inputs and Outputs of Scholarship Tim Babbitt SVP, ProQuest Platforms
  • 2. Our Vision ProQuest will be central to research around the world
  • 4. A Revolution in Research What is at stake is nothing less than the ways in which astronomy will be done in the era of information abundance Astronomer George Djorgovski 4
  • 5. Drivers of context change  Growth of the internet  Low cost, rapid digitization of print materials  Open Source movement  Rise of Social Software, Web 2.0 tools, mobile  Publishing and scholarship ecosystem  Changing policies  Internationalization of scholarship  Growth in primary source datasets 5
  • 6. Key characteristics of the current research landscape  The products of research and the starting point of new research are increasingly digital and increasingly ―born-digital‖  Exploding volumes and rising demand for data use by the rapid pace of digital technology innovations  The rapid expansion of the inputs and outputs of scholarship 6
  • 7. Linking the Scholarly lifecycle Vitae Grants Related Articles Comments Notebooks & Reviews Models Codes Presentations Algorithms Preprints Podcasts Models Methods Video Plans Data Ontologies Intermediate Results 7
  • 8. Network of Ideas (citations)
  • 10. Examples of text as data  Changes in word sense ( e.g. consumption( TB ) , moot, oratio1 ) and spelling (e.g. 18th C. ſ to s , *re  *er )  Bibliometrics and other usage analyses  Citation patterns  Institution vs. discipline  Author demographics  Pharma: Drug / Symptom correlation.  Biology: Species / date / location observations.  Social Sci: Work/life habits of undergrads based on access patterns at different institutions [ usage data based]  … 10
  • 11. Text Mining Unstructured text to queryable data structures WHY?  TOO MUCH TEXT TO HAND ANALYZE.  Improved discovery ( better ‗metadata‘ )  Business Intelligence  e.g. content stats -> content acquisitions  Saleable datasets E.g. Distribution of authors vs. disciplines vs. grants  End User research agendas  High-End : Custom (user specified) mining as a service  Simple : Visualization of results ( frequency / co-occurrence …) 11
  • 12. Datasets: Factoids & point data  ca. 1.4M Faculty ( 50% full-time ) in US HE, ~75M people enrolled in US HE  ca. 100k Faculty in UK HE  44% of Researchers use online (other people‘s) datasets for their research  48% of Researchers use datasets > 1GB  10.8% store their data outside their institution ( 50% store it in their ―lab‖)  1 - 5% of datasets are formally moved into the curation process.  66%of faculty have requested other people‘s data ( and 49% of those got it).  [ 26.5% have the expertise to analyze their own data.  [ 80.3% do not have sufficient expertise to manage their own data  Institutional storage costs ~ $600 / TB / year  [ 58% is the annual increase in the amount of data being generated  [ 20-40% is annual growth in the amount of storage deployed (est.)  < 1% of ecological data is accessible after publication.  > 85% of all information is in text form  2.7 times more citations accrue to papers with accessible data  3 to 6 times more papers emerge if the data is accessible. 12
  • 13. Curation OF scholar data  Tools to ingest, add & validate schemas, publish, migrate and preserve. ( DMP1 provision )  Tools to analyze2  Tools to discover datasets  ―Summon‖ for IR datasets, gov‘t datasets …  Tools to merge (create composite datasets) 3  Citation management & attribution for datasets.  Generic capabilities (domain specific later). 13
  • 14. Dataset provision TO scholars  Content procurement and dissemination.  What we do already (intermediary)  Needs discovery tools  Easy to focused on selected domains that are publicly available.  Most research does not use publicly available data 14
  • 15. Towards reproducible research
      ◦ Reproducible research
      ◦ means context, quality, trust
      ◦ means easy access to the sources
      ◦ Science depends entirely on the knowledge and data gained in the past to further advance
  • 16. Preserving Research Data
      ◦ Growing trend of journals and publishers linking to open-access data repositories
      ◦ Elsevier and PANGAEA (Publishing Network for Geoscientific & Environmental Data)
      ◦ Reciprocal linking of articles and the data behind the research
      ◦ Journals and funding agencies setting policy to preserve and associate data supporting research results
      ◦ e.g. American Naturalist new policy:
      ◦ This journal requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, TreeBASE, Dryad, or the Knowledge Network for Biocomplexity. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.
  • 20. ProQuest Microfilm
      ◦ PQ's original business objectives: preservation and access
      ◦ New technology: microfilming
      ◦ 1938: British Library, 120,000 first printed books in English
      ◦ 1939: established dissertations filming and printing program
      ◦ 1940s: began microfilming newspapers
      ◦ 1948: began microfilming serials
      ◦ Added 700+ Research Collections for the academic market, still actively filming several
      ◦ 2.5M dissertations and theses, actively filming
      ◦ Newspaper Archive contains 10,700 titles, 900 titles actively filming
  • 21. Microfilm Commitment
      ◦ With the ongoing research and archival need for microfilmed content, ProQuest invested significantly to build a new filming operation in Ypsilanti, MI.
      ◦ Opened May 2010
      ◦ Employing 65 staff
      ◦ Utilizing eBeam cameras: digital images to film masters
      ◦ Scanning operation
      ◦ Utilizing 2 archive locations: Iron Mountain and Ypsilanti
  • 22. Film Archive at Iron Mountain
  • 23. Film Archive at Iron Mountain
  • 24. Film Archive at Iron Mountain
  • 28. Microfiche Archive - Ypsilanti
  • 29. Microform and Digital Interface
      ◦ Microforms are the source materials for numerous historical digital products.
      ◦ Historical Newspapers
      ◦ Periodical Archive Online, Periodical Index Online
      ◦ Early English Books Online
      ◦ Parliamentary Papers
      ◦ Sanborn Maps, Geo-edition Sanborn Maps
      ◦ Gerritsen Collection of Women's History
      ◦ 700+ Research Collections …
  • 30. Digital Microfilm
      ◦ Adobe controls for zooming, rotating, printing, saving, emailing PDFs or links
      ◦ Use this area for further date selection
  • 33. Dissertations
      ◦ ProQuest "UMI" Dissertation Publishing
      ◦ Over 50 years
      ◦ Official repository of dissertations and theses for the national libraries of Canada and the United States
      ◦ Archive
      ◦ Use of microform
      ◦ Multi-location digital copies
      ◦ Tape
  • 35. Preservation of inputs and outputs of scholarship
      ◦ Publication is part of a wider network of scholarly information:
      ◦ Original data
      ◦ Shared databases
      ◦ Multimedia expressions
      ◦ Social media
      ◦ Preservation should encompass all of this
      ◦ [Diagram labels: Vitae, Grants, Related Articles, Comments & Reviews, Notebooks, Models, Codes, Presentations, Algorithms, Preprints, Podcasts, Methods, Video, Plans, Data, Ontologies, Intermediate Results]
  • 36. Our concern for scholarship
      ◦ Secondary source publications are much better protected than inputs to research
      ◦ Research data explosion
      ◦ Primary sources
      ◦ Datasets
      ◦ Text as data
      ◦ Focus on objects rather than linkages
      ◦ We need to continue to support the preservation of scholarship inputs and outputs as they evolve
  • 37. Our questions for us…
      ◦ Can practices of preservation and sustainability become commonplace?
      ◦ What is the right balance of new digital technology and analog methods of preservation?
      ◦ Film industry: research and practice on preserving born-digital films
      ◦ How should we approach going beyond the current atomic level of preservation, the object? How should we deal with:
      ◦ Links
      ◦ Text as data
      ◦ Mining
  • 38. Towards increasing the sustainability of research output
      ◦ Persistent identifiers: linkages of the underlying output of scholarship
      ◦ e.g. DOI, ISBN, ISNI
      ◦ Establishing a network of safe/trusted repositories for all outputs of scholars
      ◦ Link/citation practices to outputs, not just official publications; focus on reliability
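Persistent identifiers are only useful for linkage if they can be checked and resolved mechanically. A minimal sketch of two such checks follows: validating an ISBN-13 check digit (weights alternate 1 and 3, and the weighted sum must be divisible by 10) and turning a DOI into a link through the public https://doi.org/ resolver. The function names are hypothetical, and the DOI pattern is a rough shape heuristic assumed here, not the formal DOI syntax.

```python
import re
from urllib.parse import quote

# Heuristic shape check (an assumption of this sketch): "10.", a numeric
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_isbn13(isbn):
    """Return True if `isbn` has 13 digits and a correct ISBN-13 check digit."""
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(cleaned))
    return total % 10 == 0


def doi_url(doi):
    """Build a resolver link for a DOI-shaped string, or raise ValueError."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI-shaped string: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")
```

For instance, `is_valid_isbn13("978-0-306-40615-7")` passes the checksum, and `doi_url("10.1000/182")` yields a link to the DOI resolver. ISNI validation uses a different check-character scheme and is left out of this sketch.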
  • 39. Preservation of born digital outputs
      ◦ The capability to preserve objects in digital formats (addressing storage capacity; accessibility; and frequent churn in digital formats, media, and tools that turn bits into humanly recognizable artifacts) is a core requirement of digital scholarship.
      ◦ Leverage microfilm as a superior vehicle for "born digital" preservation
      ◦ Driver for movement from print to digital in library collections. See, for example, the 2009 Ithaka paper "What to Withdraw: Print Collections Management in the Wake of Digitization"
  • 40. Preservation as a practice
      ◦ We have a history in the preservation of scholarship that continues today
      ◦ Build preservation practices into our everyday management of scholarly inputs and outputs
      ◦ Work with the community of scholars, libraries, and publishers to evolve our thinking on needs and practices
      ◦ Working with CRL towards a TRAC criteria audit of our digital data and content
      ◦ Partner with repositories for sustainability
  • 41. Thank you! Questions?
      ◦ Tim Babbitt, timothy.babbitt@proquest.com, (734) 997-4593

Editor's Notes

  1. Whilst content can be obfuscated or reduced, there are thorny issues with usage data. Early policy decisions need to be taken with respect to exposing usage data, even indirectly (triangulation is always possible). Footnote 1: "Oratio" has shifted from 'speech' to 'prayer' and back again in the Latin literature. See Greg Crane et al.
  2. Figures on faculty demographics from http://nces.ed.gov/programs/digest/d09. Sources in earlier paper on datasets.
  3. Footnote 1: DMP is the JISC / NSF mandated Data Management Plan. Footnote 2: both 'canned' analyses, such as histograms, and user-scriptable ones. Footnote 3: e.g. combining observational data over time and space to turn point measurements into a time series of distribution maps.
  4. A reminder: Digital Microfilm acts like an extension of microfilm; there is no searching. It does provide basic metadata (for newspapers: title, year, month, day, and page) that makes it easy to skip through the reels. Another reminder: it is web-based, so researchers can access the film content from their kitchens or their dorm rooms.