This document discusses the opportunities and challenges of big data for libraries, researchers, and digital humanities. It notes that big data is growing exponentially from sensors, internet data, and scientific instruments. Libraries and librarians have new roles to play in data management, curation, and research data services. Researchers need help with data literacy, data management plans, and archiving research data. Digital humanities can use big data and visualization to gain new insights. Standards like TEI and services like data repositories are important to enable access and reuse of data.
From DARPA to Shakespeare: All the Data we Can Handle
2. From DARPA to Shakespeare
and all the data we can handle
Big Data and Digital Humanities
February 2014
http://www.darpa.mil/newsevents/releases/2012/03/29.aspx
3. 1. Big Data
2. Libraries & Librarians
3. University Researchers & Beyond
4. Digital Humanities
5. High-Performance Computing (HPC) Act of 1991 (Public Law 102-194)
as amended by the
Next Generation Internet Research Act of 1998 (Public Law 105-305)
and
America COMPETES Act of 2007 (Public Law 110-69).
It’s the law!
These laws authorize Federal agencies to set goals, prioritize their investments, and coordinate their activities in networking and information technology research and development.
George O. Strawn NITRD
Networking and Information Technology Research and Development (NITRD) Program
From : Hot Topics in Big Data: What You Need to Know Now!
FEDLINK, NFAIS, CENDI; December 11, 2012
7. Big data...
is a mystery
is a child of the internet
Big Data has grown
from...
CPUs of information
Disks of information
...to
Networks of information
Sensors everywhere
George O. Strawn NITRD
8. Urban computing also aims to deeply understand the nature and science behind the phenomena
occurring in urban spaces, using a variety of heterogeneous data sources, such as traffic flows, human
mobility, geographic and map data, environment, energy consumption, populations, and economics.
Recently, real-world data reflecting city dynamics has become widely available, including users'
mobile phone signals, GPS traces of vehicles and people, ticketing data in public transportation systems,
user-generated content (such as tweets, micro-blogs, check-ins, and photos), data from transportation sensor
networks (camera and loop sensors) and environment sensor networks (temperature and air quality), as
well as data from the Internet of Things.
http://www.meetup.com/UrbanComputing/
Smart
Cities
9. Examples of big data:
• Electronic Health Records
• Text vs tables
• Textual analytics TEI
• Sentiment analysis - FB posts, Twitter
• Distributed data, distributed computing
• Atmospheric sensors, undersea sensors
• Hubble telescope
• Library ERM
10. Big Data & Science...
• Analyzing output from simulations
• Analyzing instrument output - LHC, Curiosity
• Creating databases to support wide collaboration: Human Genome Project
• Creating knowledge bases from textual information: Semantic Medline
• Proteomics will be bigger than genomics
How do you move 100TB of information
within a University or a research area?
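A back-of-the-envelope estimate helps put the 100 TB question in perspective. The sketch below is illustrative only: the link speeds and the 70% efficiency factor are assumptions, not figures from the presentation, the function name is hypothetical, and 1 TB is taken as 10^12 bytes.

```python
def transfer_time_hours(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the hours needed to move size_tb terabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, shared traffic, and so on).
    """
    bits = size_tb * 1e12 * 8                      # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency   # usable bits per second
    return bits / effective_bps / 3600             # seconds -> hours


if __name__ == "__main__":
    # Assumed link speeds, from a slow campus link to a research backbone.
    for gbps in (0.1, 1, 10, 100):
        hours = transfer_time_hours(100, gbps)
        print(f"{gbps:>5} Gbit/s: {hours:8.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions, 100 TB takes roughly 13 days on a 1 Gbit/s link and a little over a day on a 10 Gbit/s link, which is why moving data at this scale becomes a planning question rather than a routine file copy.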
13. From bits to its...
Does the world consist of ...
matter, energy and information?
Newton - matter and motion
Steam engine - thermodynamics, matter, energy
Computer - science of information, matter, energy and information
Data intensive science is revolutionary science
Big Data is TOO BIG To KNOW!
The dust hasn't settled; dust is swirling all around us; it is FUN dust!
George O. Strawn
14. See presentation:
Philosophy & Big Data: Big Data, the Individual, and Society
by Melanie Swan
January 24, 2013
http://www.slideshare.net/lablogga/philosophy-and-big-data-big-data-the-individual-and-society
18. Michael Furlough
Associate Dean for Research and Scholarly Communications
Penn State University Libraries
Libraries' roles and challenges:
Libraries will have to operate on faith
Libraries will need deep collaboration
19. Librarians - new roles
Instruction - Best Practices
Data Information Literacy
Collaborate - DMP & more
Data Management Plans
Preserving/curating research
DO
Manage - RDS Services
Keeping up!
20. Conversion & Interoperability
Cultures of Practice
Databases & Data Formats
Data Curation & Reuse
Data Management & Organization
Data Processing & Analysis
Data Quality & Documentation
Discovery & Acquisition
Ethics & Attribution
Metadata & Data Description
Preservation
Visualization & Representation
See more at: Data Information Literacy Competencies
http://wiki.lib.purdue.edu/display/ste/Materials+for+the+DIL+Symposium
Data
is
information
22. Build on successes
MANTRA - Research Data Management Training
http://datalib.edina.ac.uk/mantra/
Data Management Course 2014 - University of Minnesota
https://sites.google.com/a/umn.edu/data-management-workshop-series/
Data Train
http://archaeologydataservice.ac.uk/learning/DataTrain#section-DataTrain-AimsObjectives
23. See Data Management Modules
from University of Minnesota
Lisa Johnston
https://sites.google.com/a/umn.edu/data-management-workshop-series/module1
26. What do researchers care about?
Where can I put my stuff?
What is a data management plan?
Data needs to be...
• available
• findable
• re-usable
• citable
30. DataNet from NSF
http://datafed.org/
Digital Preservation from the LoC
http://www.digitalpreservation.gov/
HathiTrust Digital Library
http://www.hathitrust.org/
Digital Preservation Network
http://www.dpn.org/
31. Title:
State of Sustainability Practices among Minnesota Tourism Businesses, 2007-2013
Authors:
Qian, Xinyi (Lisa)
Schneider, Ingrid E.
32. Title:
Public-Use Data from the Obstetrics and Periodontal
Therapy (OPT) Study, a randomized trial of periodontal
therapy to prevent pre-term birth
Authors:
Hodges, James S.
Michalowicz, Bryan S.
33. Title:
"Laundry Soap" from the Ojibwe Conversational Archives Project
Authors:
Hermes, Mary
Tainter, Rose
Kingbird-Porter, Margaret
37. Research Data Services
University of Minnesota
https://www.lib.umn.edu/datamanagement/archiving
George Mason University
http://dataservices.gmu.edu/resources/data-management
University of Maryland
http://www.lib.umd.edu/data
38. For all links please see:
http://guides.lib.cua.edu/hoffman
[tab] BigData
Keeping Research Data Safe
http://www.beagrie.com/krds.php
47. Geography of the London Ballad Trade 1500-1700
http://ebba.english.ucsb.edu/balladprintersite/LBP_main.html
World War I Document Archive
http://www.gwpda.org/
49. Examples and Tools for DH projects
http://miriamposner.com/blog/how-did-they-make-that/#more-1571
ScrollKit
https://www.scrollkit.com/
53. Examples of TEI:
American Memory (uses a TEI-conformant DTD)
http://memory.loc.gov/ammem/index.html
Early Canada Online
http://www.canadiana.org/
Victorian Women Writers Project
http://www.indiana.edu/~letrs/vwwp/index.html
Oxford Text Archive
http://ota.ahds.ac.uk/
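To show why TEI encoding matters for reuse, here is a minimal sketch of reading a TEI document with Python's standard library. The sample document is invented for illustration and is not taken from any of the collections listed above. The sketch assumes the TEI P5 namespace; older DTD-based encodings may not declare a namespace, in which case the lookups would need adjusting. The function name `extract_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents (an assumption about the input).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny invented TEI document: a header with a title and two verse lines.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Sonnet (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l n="1">First invented line of verse</l>
        <l n="2">Second invented line of verse</l>
      </lg>
    </body>
  </text>
</TEI>"""


def extract_lines(tei_xml: str):
    """Return the title and a list of (line number, text) pairs for each <l>."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    lines = [
        (line.get("n"), "".join(line.itertext()).strip())
        for line in root.iterfind(".//tei:l", TEI_NS)
    ]
    return title, lines


if __name__ == "__main__":
    title, lines = extract_lines(SAMPLE)
    print(title)
    for number, text in lines:
        print(number, text)
```

Because the structure (title, stanza, line) is marked up explicitly, the same few lines of code could in principle be pointed at any conformant text, which is the kind of access and reuse that shared standards are meant to enable.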
56. • Data is information
• Libraries can be partners in providing value
- access and analytics
• Deep Collaboration - Federal, University,
Business, Researchers/Industry, Future of
Research
• Data Policies
• Renaissance of Archivists
• Librarians as information consultants
• Librarians as researchers
58. References
DARPA calls for advances in big data to help the warfighter. (2012, March 29). Retrieved from http://www.darpa.mil/newsevents/releases/2012/03/29.aspx
Boyle, D. E., Yates, D. C., & Yeatman, E. M. (2013). Urban sensor data streams: London 2013. IEEE Internet Computing, 17(6), 12-20. doi:10.1109/MIC.2013.85
Domingo, A., Bellalta, B., Palacin, M., Oliver, M., & Almirall, E. (2013). Public open sensor data: Revolutionizing smart cities. IEEE Technology and Society Magazine, 32(4), 50-56. doi:10.1109/MTS.2013.2286421
Gladney, H. M. (2012). Long-term digital preservation: A digital humanities topic? Historical Social Research / Historische Sozialforschung, 37(3), 201-217.
IBM Smarter Cities: Overview (Ireland). Retrieved from http://www.ibm.com/smarterplanet/ie/en/smarter_cities/overview/index.html?re=CS1
Cummings, J. (2013). ODDly pragmatic: Documenting encoding practices in digital humanities projects. Prezi presentation, JADH 2013. Retrieved from http://prezi.com/af2auinap-ug/jadh-2013-oddly-pragmatic-documenting-encoding-practices-in-digital-humanities-projects/
Johnston, L. (2014). A workflow model for curating research data in the University of Minnesota Libraries: Report from the 2013 Data Curation Pilot. University of Minnesota Digital Conservancy.
Pepi, M. (2013). The postmodernity of big data. The New Inquiry. Retrieved from http://thenewinquiry.com/essays/the-postmodernity-of-big-data/
Van den Eynden, V., Corti, L., Woollard, M., Bishop, L., & Horton, L. (2011). Managing and sharing data: Best practice for researchers