Sharing Datta-Mine-ing John Unsworth (with contributions from Ted Underwood and the HTRC executive committee) Graduate School of Library and Information Science University of Illinois, Urbana-Champaign June 2011
Where we’re going: our anxieties about quantitative methods in the humanities; worse and better examples of arguments in the humanities using quantitative data; the real problem, which is data that exists but to which we don’t have access; and a solution to that problem, involving three-foot-long spoons.
Julia Flanders, Digital Humanities and the Politics of Scholarly Work: “Debates about quantification, numerical analysis, and the reductiveness of detail have figured significantly in discussions of scholarly method and the nature of literary study for over two centuries, and they also seemed to me to have important continuities with the problem of the aesthetic[….] consider the following brief episode from Italo Calvino’s If on a Winter's Night a Traveller [1979]. In this vignette, a novelist named Calixto Bandera is considering a new spin on the problem of audience:” http://dev.stg.brown.edu/staff/Julia_Flanders/pubs/flanders_dissertation.xhtml
The Computer as Reader “I asked Lotaria if she has already read some books of mine that I lent her. She said no, because here she doesn't have a computer at her disposal.”  (186) “She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency [...]
The Computer as Writer Now, every time I write a word, I see it spun around by the electronic brain, ranked according to its frequency, next to other words whose identity I cannot know [….] Perhaps instead of a book I could write lists of words, in alphabetical order, an avalanche of isolated words which expresses that truth I still do not know, and from which the computer, reversing its program, could construct the book, my book.” (188)
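The reading Lotaria describes, a list of all the words in a text in order of frequency, can be sketched in a few lines. This is only an illustration: the function name `word_frequencies`, the tokenizing rule, and the sample sentence are assumptions made here, not anything taken from Calvino or from the talk.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first; ties are alphabetical."""
    # Assumption: a "word" is a run of letters or apostrophes, case-folded.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample text, standing in for a novel.
sample = "the book of the reader and the book of the writer"
ranked = word_frequencies(sample)

for word, count in ranked:
    print(f"{count:3d}  {word}")
```

The list is easy to produce. Going the other way, from the list back to "the book, my book," is not possible, because word order is discarded in the counting.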
Arguing with Data: Data enables arguments based on quantitative and/or empirical data. Data still requires interpretation, and you can still make better and worse interpretations, and more or less compelling arguments. In addition to new kinds of arguments, you can make new kinds of mistakes, especially mistakes based on incomplete data or on an incomplete understanding of data.
Mistakes based on incomplete data
New kinds of arguments: Ted Underwood is exploring the changing etymological basis of diction in English over a 200-year period, especially the shift from words of Germanic origin to words derived from Latin, and back again.
Etymology and Style, Ted Underwood, 2011
There is nevertheless good evidence that older words do predominate in informal, and especially spoken English. [Laly Bar-Ilan and Ruth A. Berman, “Developing register differentiation: the Latinate-Germanic divide in English,” Linguistics 45 (2007): 1-35.]
Can we use this fact to trace broad changes of register in the history of written English?
To understand the significance of the result, it needs to be broken down by genre. Initial results suggest that fiction and nonfiction prose both become more formal (less like speech) in the 18c. Drama and poetry change little, although older, less formal, “speechlike” words always predominate in drama.
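The kind of measurement behind such a result can be sketched as the share of a text's words that are of Germanic rather than Latinate origin. The slides do not show Underwood's actual method or word lists, so everything here is an assumption for illustration: the eight-word lexicon `ORIGIN`, the function `germanic_share`, and the two sample sentences.

```python
import re

# Toy lexicon, an assumption for illustration only. A real study would need
# etymological data for a whole vocabulary, not four synonym pairs.
ORIGIN = {
    "begin": "germanic", "commence": "latinate",
    "help": "germanic", "assist": "latinate",
    "freedom": "germanic", "liberty": "latinate",
    "house": "germanic", "residence": "latinate",
}


def germanic_share(text):
    """Fraction of lexicon-covered words in `text` that are Germanic, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    # Words missing from the lexicon are skipped rather than guessed at.
    tagged = [ORIGIN[w] for w in words if w in ORIGIN]
    if not tagged:
        return None
    return tagged.count("germanic") / len(tagged)


# Hypothetical sentences in a speechlike and a formal register.
informal = "Help me begin to build a house"
formal = "Assist me as I commence construction of a residence"

print(germanic_share(informal), germanic_share(formal))
```

Computed per text and averaged by year and by genre, a figure like this would give one line per genre over time. Note that the result depends entirely on which words the lexicon covers, which is one way to make a mistake based on incomplete data.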
Datum = Something Given So, Ted’s investigation concerns historical trends: as such, it is reasonable to think that it might be interesting to extend beyond 1900.   Can we do that?  Only if we are given the data.
Copyright Creep http://en.wikipedia.org/wiki/Copyright_Term_Extension_Act
A murine chronology (1928). April: Production begins on the Mickey Mouse film Plane Crazy, inspired by Charles Lindbergh's trans-atlantic flight. May 15: Work is completed on the film Plane Crazy. Walt Disney's first silent film featuring Mickey Mouse, Plane Crazy premieres as a sneak preview at a theatre on Sunset Boulevard, in Los Angeles, California. It cost US$1772.89 to make. Minnie Mouse also debuts. May 16: Walt Disney applies for a trademark for "Mickey Mouse", for use in motion pictures. http://www.islandnet.com/~kpolsson/mmouse/
Steamboat Willie “Steamboat Willie has been close to entering the public domain in the United States several times. Each time, copyright protection in the United States has been extended. It could have entered public domain in 4 different years; first in 1956, renewed to 1984, then to 2003 by the Copyright Act of 1976, and finally to the current public domain date of 2023 by the Copyright Term Extension Act (also known pejoratively as the Mickey Mouse Protection Act) of 1998. The U.S. copyright on Steamboat Willie will be in effect through 2023 unless there is another change of the law.” http://en.wikipedia.org/wiki/Steamboat_Willie
The Waste Land. T.S. Eliot, by Wyndham Lewis, 1938. Original publication of the poem: 1922, in The Dial (an American literary magazine)
Copyright and The Waste Land “The copyright was registered in the United States sometime in 1922.  The copyright gave 28 years of protection plus any additional time to cause it to expire after midnight on the last day of the year. Thus it was protected up to and throughout 1950 (1922 + 28).  In 1950 the copyright could be renewed for 28 more years meaning that it would enter the public domain in the United States after the end of 1978 (1950 + 28).  In the United States, the Copyright Act of 1976 extended the renewal from 28 years to 47 years giving The Waste Land protection for 19 more years or throughout 1997 (1950 + 28 + 19).”
Copyright and The Waste Land “On January 1, 1998, The Waste Land went into public domain in the United States.  On October 27, 1998 U.S. public law 105-298 extended renewal of copyrighted items (that were still under protection) by 20 years.  The Waste Land was, however, already in the public domain in the United States and thus remains in that state.  If The Waste Land was written in 1923 it would be protected for 95 years (28 + 28 + 19 + 20) plus the remainder of the last calendar year meaning that it would go into the public domain (in the US) January 1, 2019.”
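The arithmetic in the quoted account can be checked with a short sketch. It follows only the terms named in the quotations (a 28-year first term, a 28-year renewal, 19 more years from the 1976 Act, 20 more from the 1998 Act) and assumes the copyright was renewed. The function names are hypothetical, and this is a simplification of the quoted reasoning, not a statement of the law.

```python
FIRST_TERM = 28        # initial term, in years
RENEWAL_TERM = 28      # renewal term, in years
EXTENSION_1976 = 19    # added by the Copyright Act of 1976
EXTENSION_1998 = 20    # added by the Copyright Term Extension Act of 1998


def last_protected_year(publication_year, extensions):
    """Last calendar year of protection, assuming the copyright was renewed."""
    return publication_year + FIRST_TERM + RENEWAL_TERM + sum(extensions)


def public_domain_year(publication_year, extensions):
    """Year on whose January 1 the work enters the US public domain."""
    return last_protected_year(publication_year, extensions) + 1


# The Waste Land (1922) expired before the 1998 extension took effect.
waste_land = public_domain_year(1922, [EXTENSION_1976])
# A work from 1923 was still protected in 1998 and gets both extensions.
work_of_1923 = public_domain_year(1923, [EXTENSION_1976, EXTENSION_1998])
# Steamboat Willie (1928): protected "through 2023".
steamboat = last_protected_year(1928, [EXTENSION_1976, EXTENSION_1998])

print(waste_land, work_of_1923, steamboat)
```

One year of difference in publication date, 1922 against 1923, becomes 21 years of difference in public domain date, 1998 against 2019.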
And in England… “The Waste Land is still under copyright restrictions in the United Kingdom and most likely in the countries of the European Union, the Commonwealth of Nations and other countries. Copies of T.S. Eliot's poems, plays, essays and other of his works that are placed on computers for public access through the internet may be infringing on copyrights held by Faber and Faber, Mrs. T.S. Eliot and others.” Copyright information about the Waste Land comes from R.A. Parker, “Exploring the Waste Land,” a hobbyist site at http://www.std.com/~raparker/exploring/thewasteland/excopy.html
Give, sympathize, control 401. 'Datta, dayadhvam, damyata' (Give, sympathize, control). The fable of the meaning of the Thunder is found in the Brihadaranyaka--Upanishad, 5, 1. A translation is found in Deussen's Sechzig Upanishads des Veda, p. 489.
The Waste Land
Faber, in England, isn’t letting go of its property yet, either. By continuing to produce new products based on it, they strengthen their claim to it. The battle for the Waste Land may be lost in the colonies, but not yet in the Kingdom.
The Waste Land marks the chronological beginning of a wasteland created by Datta-mine-ing.
[Figure annotations: “1923 is about here”; “Date range starts here, and goes counter-clockwise.”]
HathiTrust Research Center
HathiTrust Digital Library History: To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge. Launched in October 2008: University of Michigan, Indiana University. Used Google Books Repository at Michigan as model. Expanded to include content from CIC Member Libraries, UC System Libraries, and University of Virginia. Now includes more than 50 partner institutions.
HathiTrust Research Center Today: HTRC is dedicated to the provision of access to a comprehensive body of published works for scholarship and education for computational research purposes. Lightweight Organization. Executive Committee: Beth Plale, Indiana; Scott Pool, Illinois; Robert H. McDonald, Indiana; John Unsworth, Illinois. Advisory Board: TBD. HT Executive Committee Sponsor: Laine Farley, California Digital Library.
HathiTrust Research Center Will: Maintain repository of text mining algorithms and retrieval tools available on-line for human and programmatic discovery.  Also register derived data sets, indexes, and versions in registry repository.   Be a user-driven resource, with an active advisory board, and a community model that allows users to share algorithms and tools.   Support interoperability across collections and institutions, through use of inCommon SAML identity.
Non-consumptive Research “Research in which computational analysis is performed on one or more Books, but not research in which a researcher reads or displays substantial portions of a Book to understand the intellectual content presented within the Book.”

  • 1. Sharing Datta-Mine-ing John Unsworth (with contributions from Ted Underwood and the HTRC executive committee) Graduate School of Library and Information Science University of Illinois, Urbana-Champaign June 2011
  • 2. Where we’re going Our anxieties about quantitative methods in the humanities Worse and better examples of arguments in the humanities using quantitative data The real problem: data which exists, but to which we don’t have access A solution to that problem, involving three-foot-long spoons.
  • 3. Julia Flanders, Digital Humanities and the Politics of Scholarly Work: “Debates about quantification, numerical analysis, and the reductiveness of detail have figured significantly in discussions of scholarly method and the nature of literary study for over two centuries, and they also seemed to me to have important continuities with the problem of the aesthetic[….] consider the following brief episode from ItaloCalvino’s If on a Winter's Night a Traveller [1979]. In this vignette, a novelist named Calixto Bandera is considering a new spin on the problem of audience:” http://dev.stg.brown.edu/staff/Julia_Flanders/pubs/flanders_dissertation.xhtml
  • 4. The Computer as Reader “I asked Lotaria if she has already read some books of mine that I lent her. She said no, because here she doesn't have a computer at her disposal.” (186) “She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency [...]
  • 5. The Computer as Writer Now, every time I write a word, I see it spun around by the electronic brain, ranked according to its frequency, next to other words whose identity I cannot know [….] Perhaps instead of a book I could write lists of words, in alphabetical order, an avalanche of isolated words which expresses that truth I still do not know, and from which the computer, reversing its program, could construct the book, my book.” (188)
  • 6. Arguing with Data Data enables arguments based on quantitative and/or empirical data Data still requires interpretation, and you can still make better and worse interpretations, and more or less compelling arguments In addition to new kinds of arguments, you can make new kinds of mistakes, especially mistakes based on incomplete data or on an incomplete understanding of data
  • 7. Mistakes based on incomplete data
  • 8. Mistakes based on incomplete data
  • 9. New kinds of arguments Ted Underwood is exploring the changing etymological basis of diction in English, over a 200-year period, especially the shift from words derived from German, to words derived from Latin, and back again.
  • 10.
  • 11. There is nevertheless good evidence that older words do predominate in informal, and especially spoken English. [Laly Bar-Ilan and Ruth A. Berman, “Developing register differentiation: the Latinate-Germanic divide in English,” Linguistics 45 (2007): 1-35.]
  • 12.
  • 13. To understand the significance of the result, it needs to be broken down by genre. Initial results suggest that fiction and nonfiction prose both become more formal (less like speech) in the 18c. Drama and poetry change little, although older, less formal, “speechlike” words always predominate in drama.
  • 14. Datum = Something Given So, Ted’s investigation concerns historical trends: as such, it is reasonable to think that it might be interesting to extend beyond 1900. Can we do that? Only if we are given the data.
  • 16. A murine chronology (1928) AprilProduction begins on the Mickey Mouse film Plane Crazy inspired by Charles Lindbergh's trans-atlanticflight May 15 Work is completed on the film Plane Crazy. Walt Disney's first silent film featuring Mickey Mouse, Plane Crazy premieres as a sneak preview at a theatre on Sunset Boulevard, in Los Angeles, California. It cost US$1772.89 to make. Minnie Mouse also debuts. May 16 Walt Disney applies for a trademark for "Mickey Mouse", for use in motion pictures. http://www.islandnet.com/~kpolsson/mmouse/
  • 17. Steamboat Willie “Steamboat Willie has been close to entering the public domain in the United States several times. Each time, copyright protection in the United States has been extended. It could have entered public domain in 4 different years; first in 1956, renewed to 1984, then to 2003 by the Copyright Act of 1976, and finally to the current public domain date of 2023 by the Copyright Term Extension Act (also known pejoratively as the Mickey Mouse Protection Act)[3] of 1998. The U.S. copyright on Steamboat Willie will be in effect through 2023 unless there is another change of the law.” http://en.wikipedia.org/wiki/Steamboat_Willie
  • 18. The Waste Land T.S. Eliot, by Wyndham Lewis, 1938 Original publication of the poem: 1922, in The Dial (an American literary magazine
  • 19. Copyright and The Waste Land “The copyright was registered in the United States sometime in 1922. The copyright gave 28 years of protection plus any additional time to cause it to expire after midnight on the last day of the year. Thus it was protected up to and throughout 1950 (1922 + 28). In 1950 the copyright could be renewed for 28 more years meaning that it would enter the public domain in the United States after the end of 1978 (1950 + 28). In the United States, the Copyright Act of 1976 extended the renewal from 28 years to 47 years giving The Waste Land protection for 19 more years or throughout 1997 (1950 + 28 + 19).”
  • 20. Copyright and The Waste Land “On January 1, 1998, The Waste Land went into public domain in the United States. On October 27, 1998 U.S. public law 105-298 extended renewal of copyrighted items (that were still under protection) by 20 years. The Waste Land was, however, already in the public domain in the United States and thus remains in that state. If The Waste Land was written in 1923 it would be protected for 95 years (28 + 28 + 19 + 20) plus the remainder of the last calendar year meaning that it would go into the public domain (in the US) January 1, 2019.”
  • 21. And in England… “The Waste Land is still under copyright restrictions in the United Kingdom and most likely in the countries of the European Union, the Commonwealth of Nations and other countries. Copies of T.S. Eliot's poems, plays, essays and other of his works that are placed on computers for public access through the internet may be infringing on copyrights held by Faber and Faber, Mrs. T.S. Eliot and others.” Copyright information about the Waste Land comes from R.A. Parker, “Exploring the Waste Land,” a hobbyist site at http://www.std.com/~raparker/exploring/thewasteland/excopy.html
  • 22. Give, sympathize, control 401. 'Datta, dayadhvam, damyata' (Give, sympathize, control). The fable of the meaning of the Thunder is found in the Brihadaranyaka--Upanishad, 5, 1. A translation is found in Deussen's Sechzig Upanishads des Veda, p. 489.
  • 23.
  • 24. Faber, in England, isn’t letting go of their property yet, either. By continuing to produce new products based on it, they strengthen their claim to it. The battle for the Waste Land may be lost in the colonies, but not yet in the Kingdom.
  • 25.
  • 26. 1923 is about here | Date range starts here, and goes counter-clockwise.
  • 28. HathiTrust Digital Library History: To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge. Launched in October 2008 (University of Michigan, Indiana University). Used Google Books Repository at Michigan as model. Expanded to include content from CIC Member Libraries, UC System Libraries, University of Virginia. Now includes more than 50 partner institutions.
  • 29. HathiTrust Research Center Today: HTRC is dedicated to the provision of access to a comprehensive body of published works for scholarship and education for computational research purposes. Lightweight Organization. Executive Committee: Beth Plale, Indiana; Scott Pool, Illinois; Robert H. McDonald, Indiana; John Unsworth, Illinois. Advisory Board: TBD. HT Executive Committee Sponsor: Laine Farley, California Digital Library.
  • 30. HathiTrust Research Center Will: Maintain repository of text mining algorithms and retrieval tools available on-line for human and programmatic discovery. Also register derived data sets, indexes, and versions in registry repository. Be a user-driven resource, with an active advisory board, and a community model that allows users to share algorithms and tools. Support interoperability across collections and institutions, through use of inCommon SAML identity.
  • 31. Non-consumptive Research “Research in which computational analysis is performed on one or more Books, but not research in which a researcher reads or displays substantial portions of a Book to understand the intellectual content presented within the Book.”
  • 32. Non-consumptive Research One of HTRC’s unique challenges is support for non-consumptive research. This will entail bringing algorithms to data, and exporting results, and/or providing people with secure computational environments in which they can work with copyrighted materials without exporting them. We are still going to need to persuade publishers that this not only doesn’t threaten their business, but actually enhances and expands their business.
  • 33. Starving at the Banquet Variously attributed to Japanese, Chinese, and Jewish tradition; I think the story probably originates with a 13th-century Hassidic rabbi, Maharam MiRottenberg. A (man, woman) asks an (angel, monk) for a preview of heaven and hell. The angel takes the man to a beautiful place where a banquet has been set, and yet the people at the banquet are starving, because they each have three-foot-long spoons strapped to their arms, and they can’t get their food into their mouths. This is clearly hell. Then they go to visit heaven: same setting, same banquet, but everybody’s fat and happy, because they’re using their spoons to feed each other.
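
The term arithmetic quoted from R.A. Parker on slides 19 and 20 can be checked with a few lines of Python. This is only a sketch of that arithmetic, assuming a work whose US copyright was registered and renewed under the pre-1978 rules; the function names are made up for this illustration, and nothing here is legal advice.

```python
# Sketch of the US term arithmetic quoted on slides 19 and 20.
# Assumes a registered, renewed copyright under the pre-1978 rules.
# Function and constant names are hypothetical, chosen for this example.

FIRST_TERM = 28           # initial term, in years
RENEWAL_TERM = 28         # original renewal term
EXTENSION_1976 = 19       # Copyright Act of 1976: renewal goes from 28 to 47 years
EXTENSION_1998 = 20       # Public Law 105-298, October 27, 1998
EXTENSION_1998_YEAR = 1998


def last_protected_year(publication_year: int) -> int:
    """Last calendar year of US protection under the assumptions above."""
    end = publication_year + FIRST_TERM + RENEWAL_TERM + EXTENSION_1976
    # The 1998 extension applied only to works still under protection in 1998.
    if end >= EXTENSION_1998_YEAR:
        end += EXTENSION_1998
    return end


def public_domain_year(publication_year: int) -> int:
    """Protection runs through December 31, so the work is free on the next January 1."""
    return last_protected_year(publication_year) + 1


for year in (1922, 1923, 1928):
    print(year, "protected through", last_protected_year(year),
          "| public domain on January 1,", public_domain_year(year))
```

With these assumptions, 1922 (The Waste Land) comes out as public domain on January 1, 1998, a hypothetical 1923 publication on January 1, 2019, and 1928 (Steamboat Willie) as protected through 2023, matching the figures quoted on the slides.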

Editor's notes

  1. Human(ities)?
  2. Humanists have a long and uncomfortable relationship with quantitative methods, one that Julia Flanders explored in her dissertation on digital humanities and the politics of scholarly work (online, recommended). The passage she cites here, from If On a Winter’s Night a Traveller, sets out these anxieties from an author’s point of view—the fear of being read quantitatively—but (since that work succeeds in making us identify with the author) it’s a good opening to a discussion of our anxieties about reading computationally, as well.
  3. cf. “How Not To Read A Million Books” or “Distant Reading” or other such arguments.
  4. Calvino’s last move here is to fear something he doesn’t at all believe, namely that the analytical program could reverse engineer a book and a writer’s truth out of a list of words. But there’s something closer to reality here as well: the writer can’t keep track of his use of language in the same mindless numerical way that a computer can—and there are dimensions of style, for example, that can be understood through quantitative analysis in ways the author himself might not understand them. Usually, though, that kind of analysis is not just performed within one book, but in some kind of comparative framework, where comparative frequencies are what’s interesting—and even if the author could keep track of his own word frequencies, he certainly couldn’t keep track of the frequency with which he uses a word compared to all other writing in his language. And finally, to give Calvino his due, it is incumbent on a reader to make some interesting observation or argument based on these frequencies: simply reporting them is neither interesting nor useful.
  5. Data vs. Lore: something my son noticed when he was about five: “Oh, I get it: Data vs. Lore: facts vs. stories.” Never noticed that, myself. He’s a lawyer now, and I’m a lapsed English professor. “Digital computing doesn’t change the nature of argument, but it changes elements of argument, such as proof” (Robert Mitchell, Duke, cf. TILTS). But I would also say that you tend to argue about different things when you argue with data. I’m going to give some examples that focus not just on computing, but on computing with data. But I’ll also talk about the fact that with these new kinds of arguments there also come new limitations and new kinds of error.
  6. “incomplete data” or, more accurately in this case, mistakes based on an incomplete understanding of data.
  7. Also, you need to understand something about the history of the language to understand the significance of certain quirks in data. And you might say “well, why doesn’t Google just treat its ngrams in a case-insensitive way?” That assumes that case isn’t of interest to some users (as it might be in the example here), and that Google can predict all transformations of the data that would be useful (which it can’t) and that even if they could, they’d have the time and money to perform them (which they don’t). So, they are absolutely literal about the data—and that’s probably the best choice, under the circumstances. But that’s why we need a computational environment in which we can make our own representations of this data, representations suitable to answer particular research questions.
  8. Latin vs. German roots have been used anecdotally to describe, define, characterize, analyze the style of various writers of English, usually along some spectrum of formality to informality, or high to low culture, or artificiality to authenticity. A lot of that’s baloney, but there’s a kernel of truth in this: linguists have provided some data—some empirical evidence that informal—that is, spoken—language has a higher proportion of older words than written English.
  9. Fate of the 500 most common words that entered the English lexicon before 1125, vs. the rest of the language. Why 1125? Really, it’s 1066: but 1125 is the point from which French really becomes the official language of England, for a couple of hundred years. That’s why it’s important to keep track of words that entered the language before that date: if they survive, they are some kind of ground truth about English, against which we could measure trends. The trendline we see here seems to tell the story of those words getting systematically pushed out of the lexicon until a bit before 1800, and then making a resurgence through the 19th century. If you think there’s a 19th-century movement to restore some kind of authentic diction to writing in English, then this trendline is interesting.
  10. So, actual evidence (data) suggests that there’s another important dimension to this problem, namely genre. The trends in the etymological basis of diction are different for writing than they are for speech, and they’re different for those genres of writing that imitate speech (poetry, drama) than they are for those that don’t (fiction, non-fiction). We don’t have a trendline for poetry and drama here, but if we did, it wouldn’t look like the trends for non-fiction and fiction [click] it would be more of a straight line.
  11. Datum, from Latin: something given. Or not given, when access to the data is prevented, for example because of copyright. Legally, the Waste Land is available for data mining, at least in the US, and it would be an interesting text to look at, in terms of patterns of speech and poetry (and drama, bits of which are embedded throughout). It would no doubt be an outlier or a limit case for Ted’s investigation, but a very interesting outlier. The problem is that we can’t mine much beyond this work: it marks […]. Consider the irony of having started a scholarly career studying 20th-century literature, and arriving at text-mining as your preferred method.
  12. Note the lengthening reach-back segments in this graph. What’s not clear from the graph is that between 1923 and 1963, copyright wasn’t automatically renewed after the initial 28-year period. The University of Michigan (praise be) is working hard to expand the public domain by investigating the copyright status of works in the collection from that time-period, which includes a great deal of the amorphous and difficult category of orphan works (that is, works which could be in copyright, given their publication dates, but for which copyright might not have been renewed, or for which copyright holders might no longer exist). Ownership, as it turns out, is a messy business.
  13. Steamboat Willie, our secret overlord.
  14. Every time we extend the term of copyright, we reach back in time to make sure Steamboat Willie is on board. Why? Disney spends a lot of money lobbying your senators and representatives, that’s why.
  15. Note the publication date: 1922. Also note the scholarly mien of our author. Eliot acknowledged Jessie Weston's From Ritual to Romance as a source for the central idea of his poem, and Weston related the theme of the wasteland to the infertility or injury of the king, specifically the Fisher King. So, hold on tight here: for the purposes of the current metaphorical application of the Waste Land, the Fisher King is Mickey Mouse, and he can be healed only if the Scholar (let’s call him Percy) asks the right question about whom copyright is intended to serve. But having been warned he might be sued if he opens his mouth, Percy keeps silent, and the land remains barren. The Waste Land, then, is my boundary marker for the beginning of the new dark ages, under the dominion of Mickey the Merciless. We can mine it, but we can’t mine much beyond it.
  16. A hobbyist’s notes on the copyright status of the Waste Land: correct as far as I can tell, and it gives you some idea of what a mess this all is.
  17. Remember, it’s a mess like this for every single book published in, oh, the last hundred years or so.
  18. And once you think you have it figured out for one book for one country, then you get into international copyright law, and start all over.
  19. So, according to the Upanishads, humans, demons, and celestial beings went to the creator and asked for direction. The creator said DA, but each category of being heard that differently, according to their spiritual needs. As Swami Krishnananda explains: “Human beings are greedy. They want to grab everything. Hoarding is their basic nature. "I want a lot of money"; "I have got a lot of land and property"; "I want to keep it with myself"; "I do not want to give anything to anybody". This is how they think. [….] So, to the human beings this was the instruction - Datta, give, because they are not prepared to give. They always want to keep. Greed is to be controlled by charity.” So the advice to humans: don’t be greedy: Give. The advice to demons was “sympathize,” because their nature is cruel; the advice to celestial beings was “control yourselves” because their nature was to do whatever they wanted to do.
  20. I do believe that Eliot would have relished the kind of exploration of the cultural record that digitized texts and text-mining now make possible. But if we were interested in doing a mashup of 20th century culture, like what Eliot did for earlier periods, that would be impossible. Copyright exists to encourage creativity, first from the author, but then later, from others.
  21. So, you guessed it: The Waste Land has an app. Actually, Apple’s app of the week, last week, when it was introduced. Produced by Faber & Faber (owner of most of the content) and TouchPress (an app developer that mostly concentrates on scientific educational apps). The app is nicely designed and produced, and it’s interesting to watch it dance around rights issues: it has newly commissioned readings and commentary, now owned by Faber, materials mined from the Faber archives (Faber was Eliot’s employer for much of his life), and one “buy-in-app” feature (the reading by Sir Alec Guinness), which uses material Faber doesn’t own. Had this been made by an American company, it couldn’t have been sold on iTunes in England.
  22. Google is basically on track to digitize everything in WorldCat (30M volumes of print material). Other smaller digitization efforts like the Open Content Alliance have produced significant amounts of material; cultural institutions are engaged in their own smaller scale efforts, but taken together those still amount to a significant amount of digital content. Much of this is coming from academic research libraries, and much of that is flowing back into the HathiTrust, which began as a shared digital repository for CIC (Big Ten) libraries who were contributing to (and getting OCR'd text and page images back from) the Google Books project. The HathiTrust, today, contains 8.7M digitized volumes, 4.7M titles (lots of titles are more than one volume, and some single-volume titles get digitized more than once), and about 3B pages. Of that, about 27% is in the public domain.
  23. In its most restrictive sense, this means you can mine, but you're not allowed to read and understand. The idea is that with copyrighted material, we have to find a way to permit research, but this research can't include activities that would normally be benefits of having purchased the material being used for research; so, for example, if the material is a book, and one normally has to buy a book in order to read it (pace lending rights, etc. etc.) then you can't read. But of course, as any humanist will tell you, if you can't read, you can't understand. And as any machine learning expert will tell you, if you can't read, you can't supervise machine learning. So, in this wasteland, we can do research, but only with our eyes closed, only in the dark, and only if we promise not to understand our results.
  24. Example: Underwood and OED’s etymological data—taking what was a component of a product and developing an understanding of how it might be used on its own, as a product, in the form of a web service, for example. Using the interest of scholars as a way of understanding how to evolve the business of academic publishing and research support.
  25. In order to determine whether the HTRC represents heaven or hell, though, scholars are going to have to figure out how to work with publishers—how do we use our respective constraints to feed each other, rather than starving at a non-consumptive banquet?
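
Note 7 argues that researchers need an environment in which they can make their own representations of literal n-gram data, such as a case-insensitive view. As a minimal sketch of one such transformation, assuming counts keyed by (n-gram, year) pairs, the following Python merges entries that differ only in letter case. The function name and the toy counts are invented for illustration; they are not Google's data or anyone's API.

```python
from collections import Counter


def fold_case(counts):
    """Merge counts of n-grams that differ only in letter case.

    `counts` maps (ngram, year) pairs to occurrence counts.
    """
    folded = Counter()
    for (ngram, year), n in counts.items():
        folded[(ngram.lower(), year)] += n
    return folded


# Made-up counts, for illustration only.
toy_counts = {
    ("Nature", 1800): 5,
    ("nature", 1800): 7,
    ("NATURE", 1800): 1,
    ("nature", 1900): 4,
}

folded = fold_case(toy_counts)
for key in sorted(folded):
    print(key, folded[key])
```

The literal counts stay untouched, so a user who does care about case (as in the example in note 7) can still work from the original representation.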