
Text Data Mining & Publishing

If you are working on a computational text analysis project and have wondered how to legally acquire, use, and publish text and data, this workshop is for you! We will teach you 5 legal literacies (copyright, contracts, privacy, ethics, and special use cases) that will empower you to make well-informed decisions about compiling, using, and sharing your corpus. By the end of this workshop, and with a useful checklist in hand, you will be able to confidently design lawful text analysis projects or be well positioned to help others design such projects.

Published in: Education

Text Data Mining & Publishing

  1. Copyright & Fair Use for Digital Projects: Text Data Mining & Publishing. UC Berkeley Library. Rachael Samberg, J.D., MLIS; Stacy Reardon, MA, MLS
  2. What you can do, not what you can’t
  3. Scholars are turning content into data
  4. But scholars (and academic staff, in supporting them) face questions about rights
  5. The Basics of TDM: “Text mining is the use of automated tools, techniques or technology to process large volumes of digital content that is often not well structured - to identify and select relevant information; to extract information from the content, to identify relationships within / between / across documents and incidents or events for meta-analysis.” (from Text & Data Mining - A Librarian Overview by Ann Okerson, 2013)
  6. TDM Literacies: Contracts; Privacy; Copyright; Ethics & Policy; Other Statutes/Use Cases
  7. Copyright: Exclusive rights to original expression for limited periods of time
  8. Exclusive Rights: Reproduction; Derivative works; Distribution; Public performance; Public display
  9. Public Domain: War and Peace, Tolstoy, English translation 1899; CDC report
  10. Facts & Ideas: Nicholas Mazza, Poetry therapy: Toward a research agenda for the 1990s, The Arts in Psychotherapy, Volume 20, Issue 1, 1993, 51-59
  11. Content vs. data about the content: TDM researchers can use copyrighted content!
  12. Fair Use, 17 U.S.C. § 107: “The fair use of a copyrighted work…for purposes such as criticism, comment, news reporting, teaching…, scholarship, or research, is not an infringement of copyright.”
  13. Four-Factor Balancing Test: (1) Purpose & character of use: “transformativeness” often dominates. (2) Nature of copyrighted work: whether factual/scholarly work. (3) Amount and substantiality: size & importance of portion. (4) Effect on potential market: whether it supplants market.
  14. Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014): Textual analysis that digital library enabled was transformative under factor one, and overall fair. Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015): Creation of full-text searchable database with “snippet view” and “ngram viewer” [search strings] were fair uses.
  15. iParadigms, 562 F.3d 630 (4th Cir. 2009): Plagiarism detection software that replicated content to detect similarities was fair use
  16. From research to publishing
  17. Fox News v. TVEyes, 883 F.3d 169 (2d Cir. 2018): Basic functionality and archiving features were fair use, but making available 10-minute clips was not
  18. Takeaways: ● Likely fair to digitize to conduct text data mining (w/ security precautions) ● May not be fair to republish large portions of content ● May not be fair to circulate the digitized texts/corpus ● Case-by-case
  19. Contracts
  20. Database Agreements. Challenges: Terms; Visibility
  21. Archives Agreement: “I understand that permission to publish, or otherwise publicly use, materials . . . must be [granted by library]. I understand further that the University makes no representation that it is the owner of the copyright... and that permission to publish must also be obtained from the owner of the copyright.”
  22. Website Terms: “If you intend to quote extensive amounts of text, use other original content, or reproduce images from this site, please contact us for permission.”
  23. California Digital Library’s Model Database Language: Authorized Users may use the Licensed Materials to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes... and may utilize and share the results of text and/or data mining in their scholarly work and make the results available for use by others, so long as the purpose is not to create a product for use by third parties that would substitute for the Licensed Materials.
  24. CDL Model License, Preserving Fair Use: Notwithstanding the foregoing, nothing in this agreement shall otherwise restrict uses of the material that would be fair use pursuant to 17 U.S.C. § 107 et seq.
  25. Takeaways: ● Agreements may constrict uses that would otherwise be fair ● Familiarize yourself with the agreement(s), ask for help, evaluate risk ● Alternatives: ○ Check to see if site has an API ○ Negotiate with content providers / ask permission
  26. Other Statutes/Use Cases
  27. Other Issues: Computer Fraud & Abuse Act; Digital Rights Management (DRM) & Digital Millennium Copyright Act
  28. Privacy
  29. Rights of Privacy: ● © protects copyright holders' property rights ● Privacy protects people who are subjects of works ● Federal (FERPA, HIPAA) vs. State ● State limits: ○ Expire at death ○ Newsworthiness and permission are defenses
  30. Ethics & Policy
  31. Indigenous knowledge; Cultural heritage materials; Endangered species protection
  32. Exercise: http://ucblib.link/rw
  33. Text Data Mining & Publishing. UC Berkeley Library. Rachael Samberg, J.D., MLIS; Stacy Reardon, MA, MLS. Text Data Mining Guide (Library): guides.lib.berkeley.edu/text-mining. TDM Access Help: tdm-access@berkeley.edu
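
To make the distinction on slide 11 between content and data about the content more concrete, here is a minimal Python sketch of one common kind of derived data: word n-gram counts. The function name `ngram_counts`, the tokenization rule, and the sample sentence (taken from the § 107 quotation on slide 12) are illustrative assumptions, not something the slides prescribe, and the sketch says nothing about whether any particular use is lawful.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in `text` (hypothetical helper for illustration).

    Tokenization is deliberately naive: lowercase the text and keep runs
    of letters, digits, and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of length n over the token list.
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Sample text: a fragment of the fair use quotation from slide 12.
sample = "the fair use of a copyrighted work is not an infringement of copyright"

unigrams = ngram_counts(sample, n=1)
bigrams = ngram_counts(sample, n=2)

# The counts describe the text without reproducing it in readable order.
print(unigrams.most_common(3))
print(bigrams[("fair", "use")])
```

The resulting counts are the sort of output a researcher might want to share in place of the corpus itself; whether that is appropriate still depends on the case-by-case analysis and any agreements discussed in the slides.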