SIMON BELL PUBLISHER
MAY/JUNE 2022
Wiley Digital Archives:
Automated Text Recognition
Discussion
• Wiley programme: context
• What is Automated Text Recognition (ATR)?
• Take a live look at examples from the Wiley
Digital Archives' manuscript collections from the
Royal Geographical Society and Royal College of
Physicians
• Q and A
Wiley Digital Archives Program
• Royal Anthropological Institute of Great
Britain and Ireland
• Royal Geographical Society (with IBG)
• Royal College of Physicians
• New York Academy of Sciences
• British Association for the Advancement
of Science
Bespoke Platform for
Primary Sources
Digital humanities tools and features
Tools to power research and teaching
The platform is embedded with an advanced set of digital
humanities tools, designed to maximize the value researchers
and students derive from primary source content.
Functionalities include:
• Textual analysis tools for concordance, collocation,
popularity, relationships and frequency distribution of
terms across archives, disciplines and timelines.
• Geo-tagged maps, even those drawn by hand, are overlaid
with current coordinates and can be downloaded as GeoTIFF
files for use within GIS software suites.
• Exportable (Excel and CSV), fielded datasets for tables
and statistics from printed and handwritten sources.
• Textual materials can be downloaded as images, PDF or as
OCR/ATR text and translated into 105 languages.
• Enhanced metadata to facilitate discovery, citations and
references.
Royal Geographical
Society (with IBG)
1478-1953
Partner Archive
Royal Geographical Society (with IBG)
• Founded in 1830, Royal Charter granted by Queen Victoria in 1859
• The Society successfully advocated for the inclusion of geography
in schools and is responsible for the first university positions in
the discipline.
• Merged with the Institute of British Geographers (founded in 1933)
in 1995.
• Holds the world’s largest private collection of maps and charts,
featuring all parts of the world, along with atlases, globes, world
gazetteers, and original manuscript mapping.
• Notable members include John Hanning Speke, David
Livingstone, Gertrude Bell, Robert Falcon Scott, Stanley, Ernest
Shackleton, Edmund Hillary.
• Membership of 16,000+
RGS (with IBG): What’s inside
• Years covered: 1478-1953*
• The numbers:
• Over 20,000 manuscript items
• Over 2,800 monographs and pamphlets
• ~100,000 photographs and 20,000 lantern slides
• ~190,000 maps
• Scope:
• The RGS archive covers the expansion of
European colonial powers, trade efforts, and
conflicts and diplomacy throughout the Middle
East, Africa, South Asia, the Caribbean, the
Americas, East and Southeast Asia, and part of
South America.
• Research and exploration efforts throughout the
world, but especially concentrated in the Polar
regions, Africa, South Asia, and the Middle East.
RGS: Some highlights
• Ernest Shackleton’s expedition notes, photographs, maps and correspondence
(including the Burberry® helmet) from his expeditions to the Antarctic, part of the
National Antarctic Expedition collection
• Gertrude Bell’s work alongside rich materials about other groundbreaking female
explorers of the late 19th and early 20th centuries
• Historic images, documents and notes from the great Antarctic adventures of Robert
Falcon Scott.
• John Hanning Speke’s African expeditions and first 19th century maps of the
continent
• David Livingstone and his search for the source of the Nile.
• Photos and documents recording Edmund Hillary’s first successful Mount Everest
ascent in 1953.
• Sir Clements Markham collection
• RGS’s council minute books for over 150 years
• Manuscript maps from RGS as well as those collected by Fellows
• India and “Africa” reports, detailing the RGS’s interactions with the British Government
RGS archive subjects, area studies, and themes
Subjects
• Anthropology
• Agricultural Geography
• Cartography
• Cultural Studies
• Environmental History
• Ethnography
• Geography
• Geology
• Geopolitics
• Historical Geography
• History
• History of Colonization and Decolonization
• International Relations
• Natural Resources
• Meteorology
• Physical Geography
• Urban Studies
Area Studies
• Arctic and Antarctic Studies
• African Studies
• Asian and Asian Pacific Studies
• British and Commonwealth Studies
• Caribbean Studies
• European Studies
• Latin American Studies
• Middle Eastern Studies
• North American Studies
• Southeast Asian Studies
Themes
• Expeditions into Africa
• Expeditions to Arctic and Antarctic
• British Empire
• European colonization in Africa and the Middle East
• Climate Change
• Colonial History, Law and Policies
• Colonization and Decolonization
• Connected Continents
• Environmental Degradation
• International Trade Route Development
• Power and Borders
• Slavery and Manumission
• Women in Science and Exploration
Royal College of
Physicians
1100s-1980s
Partner Archive: Royal College of Physicians
Founded in 1518 by a Royal Charter from King Henry VIII, the Royal College of
Physicians of London (RCP) is the oldest medical college in England. The RCP is
a professional membership body for physicians, with 34,000 members and
fellows across the globe, and the leading body for physicians in the UK and
internationally. The RCP archive brings rare and unique historical materials to
researchers, students and educators across a variety of fields and
departments, helping shape public health today.
Goals and
Activities
• influencing the way that healthcare is designed and delivered
• promoting good health and leading the prevention of ill
health across communities
• supporting physicians to fulfil their potential.
“Drawings by St Aubin of the
Intestine and Early
Classification of the Glandular
Structures.” 1795. Regulation
of Clinical Practice and
Standards. Wiley Digital
Archives: The Royal College of
Physicians. 1795.
Inside: Royal College of Physicians
What’s inside:
~2M page images, from new scanning, drawn from the archives and the Dorchester and John Dee Library collections.
Over 7 centuries of medical history and medical humanities, from ~1100 to ~1980.
Collections span a range of topics, serving researchers and students in areas including:
• Medical Humanities
• History/Philosophy of Science,
Medicine, and Technology
• Bioethics
• Anatomy
• Medical Law
• Medical Policy
• Non-Traditional Medicine
• Non-Western Medicine
• Medical Research
(Disease/Treatment),
• Military Medical Practices
• British History
• Colonial/Post-colonial history
(Empire)
• Public Health
• Global Health Policy
• Gender Studies: Women in
Medicine
• Health Education
• Health and Human Rights
• Health Economics
• Tobacco-related topics,
• Medical and Biological Illustration,
• Medicine or Science and
the Humanities
• Social Factors in Health
• Religion and Medicine
• History of Mental Illness
• History of Pharmacology
• Cultural and Social History
• Medieval Studies
• Early Modern Studies
• 18th-20th century Studies
• History of Education
• General History Research
Royal College of Physicians
Key Areas of Research Supported
• History
• Medical Humanities
• History of the RCP
• Military Medicine
• Early and Medieval Medical Texts
• Public Health
• Non-Western Medicine
• Anatomical Studies
• Medicine
• Disease
• Law, Regulation, Policy, and Control
• World Health
• 19th C Questionnaires
• Early Medical Textbooks and Education
RCP: Thomas Bateman
Watercolor; Drawing:
Diseases
RCP: Autographed letter from
Elizabeth Garrett
Inside: Royal College of Physicians
RCP: Manifestations of Cholera
at Sea Map
“Distribution of Disease in
Africa: To Illustrate Paper
by R. W. Felkin, M.D.” 1894.
Map. Wiley Digital Archives:
Royal Geographical Society
(with IBG). 1894.
http://WDAgo.com/s/463b8132
Connecting the RAI to the
RCP and the RGS: A Visual
Aid
DEMO drawn from RCP and RGS databases
Automated Text
Recognition (ATR)
Seven centuries of manuscripts become searchable and accessible
What is ATR?
Automated Text Recognition (ATR) makes
manuscripts fully discoverable in search.
Before ATR
This manuscript page
can only be found via
top-level metadata.
The text isn’t
searchable. It can
only be analyzed by
reading it, which
handwritten scripts
make taxing.
After ATR
This page has been
converted into
typeset text. All the text is
searchable, and it can
be seamlessly
analyzed with digital
humanities tools.
Discovery
“South American Notes: History of Ecuador; Rocafuerte; Tupac Amaru, Etc.”
1814–1861. Special Collections: Sir Clements Markham. Wiley Digital Archives:
Royal Geographical Society (with IBG). http://WDAgo.com/s/fba31c53
Accessibility
Moorcroft, William. 1820–
1825. “Despatches
Concerning the Journey to
Leh, Ladakh and Kashmere.”
Journal Manuscripts. Wiley
Digital Archives: Royal
Geographical Society (with
IBG).
http://WDAgo.com/s/11bbe5a8
Opportunity
Royal College of Physicians of
London. 1592–1675. “Affidavit
of Dr. Thomas Lawrence, the
President.” Membership. Wiley
Digital Archives: The Royal
College of Physicians.
February 19, 1592–July 29, 1675.
http://WDAgo.com/s/1f7336b7
Handwritten Text Recognition: Paving the Way for ATR
Handwritten Text Recognition: Probability
Handwritten Text Recognition: HTR uses algorithms to
determine the possible combinations of characters in
manuscript content in order to generate full-text hits. The
artificial intelligence then assigns a confidence rating to
each result to return relevant hits, and discards those which
the AI deems irrelevant.
HTR has been in development for many years, but the process
has never yielded an “acceptable” level of confidence.
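The confidence-rating step described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the engine behind Wiley Digital Archives: the `Candidate` class, the `search_hits` function, the 0.5 threshold, and the sample readings and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str          # one possible reading of a handwritten word
    confidence: float  # recognizer's confidence in that reading (0.0 to 1.0)


def search_hits(candidates_by_word, query, threshold=0.5):
    """Return (word position, confidence) for words whose candidate
    readings match the query with confidence at or above the threshold."""
    hits = []
    for position, candidates in enumerate(candidates_by_word):
        matching = [c.confidence for c in candidates
                    if c.text.lower() == query.lower()]
        best = max(matching, default=0.0)
        if best >= threshold:  # low-confidence readings are discarded
            hits.append((position, best))
    return hits


# Hypothetical candidate readings for three words on a manuscript page.
page = [
    [Candidate("new", 0.91), Candidate("now", 0.06)],
    [Candidate("route", 0.72), Candidate("rode", 0.20)],
    [Candidate("peper", 0.48), Candidate("paper", 0.45)],
]

print(search_hits(page, "route"))  # [(1, 0.72)]
print(search_hits(page, "paper"))  # [] because 0.45 is below the threshold
```

The threshold is the trade-off: lowering it returns more hits from uncertain readings, while raising it discards more of them.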
Automated Text Recognition: Discovery, Accessibility, Opportunity
Automated Text Recognition: Analysis-based
ATR is an approach to text recognition that references sets of
baseline data (collections of words in different script styles from
across multiple sources) and then analyzes each word against
these “ground truth” datasets (via network analysis), identifying the
most suitable dataset.
The ATR engine then runs the images of the content against that
dataset (at the collection or document level, depending on the
analysis) to identify terms.
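As a rough illustration of the dataset-selection step, the sketch below picks the "ground truth" dataset that best matches a document's words. The slides mention network analysis but give no detail, so this example swaps in plain cosine similarity over made-up feature vectors; the function names, the two script-style labels, and every number are assumptions for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_dataset(word_features, ground_truth):
    """Score each ground-truth dataset against the document's words and
    return the best-scoring dataset name along with all scores.
    Assumes word_features and every reference list are non-empty."""
    scores = {}
    for name, references in ground_truth.items():
        # For each word, keep its closest reference sample, then average.
        per_word = [max(cosine(w, r) for r in references)
                    for w in word_features]
        scores[name] = sum(per_word) / len(per_word)
    return max(scores, key=scores.get), scores


# Hypothetical reference samples for two script styles.
ground_truth = {
    "secretary_hand": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "copperplate": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.9]],
}
# Hypothetical features extracted from words in one document.
document_words = [[0.15, 0.85, 0.8], [0.1, 0.95, 0.6]]

chosen, scores = best_dataset(document_words, ground_truth)
print(chosen)  # copperplate
```

Once a dataset is chosen, recognition would run against it at the collection or document level, as described above.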
The Value of ATR: Unlocking Manuscripts
ATR makes the text of
handwritten documents in
WDA fully discoverable
in searches that
previously read only top-
level metadata (citation info,
notes field, item list,
catalogue entry).
Wiley’s old processes would
leave this page completely
undiscoverable via search
and excluded from the
Analysis Hub tools.
Before After
Search on “new route”
“..The greater part of it, is
entirely new, as he folloire a
new route; 4 of the places on
the Coast, which he
describes at length in the
beginning of the peper,
scarcely any thing was
previously known.”
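The before-and-after difference can be sketched as a search over metadata only versus a search over metadata plus ATR text. This is a minimal sketch: the `find_items` function, the item identifier, and the catalogue entry are hypothetical, and only the quoted ATR excerpt comes from the slide.

```python
def find_items(items, phrase, fields):
    """Return ids of items where the phrase occurs in any of the given fields."""
    phrase = phrase.lower()
    return [item["id"] for item in items
            if any(phrase in item.get(field, "").lower() for field in fields)]


items = [
    {
        "id": "ms-001",  # hypothetical identifier
        "metadata": "Journal manuscript, coastal survey report",  # hypothetical entry
        # ATR output as shown on the slide, recognition errors included.
        "atr_text": (
            "The greater part of it, is entirely new, as he folloire a "
            "new route; 4 of the places on the Coast, which he describes "
            "at length in the beginning of the peper, scarcely any thing "
            "was previously known."
        ),
    },
]

before = find_items(items, "new route", fields=["metadata"])
after = find_items(items, "new route", fields=["metadata", "atr_text"])
print(before)  # []
print(after)   # ['ms-001']
```

Recognition errors still limit recall: a search for "paper" would miss this page, because the ATR output reads "peper".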
What value does ATR bring to primary source-based research?
• Discoverability: ATR enables the manuscript-based items in
Wiley Digital Archives to be fully discoverable in searches, and to
“compete” in relevancy rankings against OCR’d content.
• Accessibility: ATR levels the playing field for non-specialists by
presenting easy-to-read text files for handwritten (including
cursive) documents which can also be translated into the user’s
preferred languages.
• Opportunity: Even expert users tend to favor typeset
documents because, until now, they were more discoverable in
searches, easier to read, and lent themselves to analysis via DH
tools (Analysis Hub).
• Now, WDA’s manuscript items are closer to being on par with the
typeset content in these areas, representing a more holistic view
of a collection than was possible with OCR alone.
“Before and After ATR” Report
A “before and after” analysis
is available based on a
sample set of search terms
Discovery, Accessibility, Opportunity
Automated Text Recognition (ATR):
• Adds 84M terms to the first five Wiley archives, including results
that otherwise wouldn’t be found easily
• Makes the collections more useable by a wider group (non-specialist)
• Leads to greater discovery and new scholarship opportunities
• Saves researchers time in close-reading
• Enables new questions to be asked of the expanded dataset; and
• Balances the inherent bias toward published/typeset collections, which
until now have been more discoverable and accessible through OCR than manuscripts.
ATR: How it accelerates research
• Search access to unique manuscript primary sources
that before could only be digitally discovered via top-level
metadata.
• Enables digital navigation for handwritten primary
source materials that before ATR was only possible by
reading the entire material.
• Solves comprehension challenges: Handwriting presents
readability issues that ATR solves by converting it into
typeset text.
• Makes all manuscript content available for data analysis,
and easy to quote and to reference in
citations.
• Powers easy, efficient, seamless, and meaningful search,
discovery, and analysis of unique manuscript content,
enabling focus on insightful rather than time-consuming
work.
DEMO drawn from RCP and RGS databases
Simon Bell Publisher
12 May 2022
sbell@wiley.com
WDA: Environmental
Science and History
WHAT IS IN IT?
WDA: Environmental Science and History
• focuses on critical aspects of anthropogenic change, with unique and rare
archival collections from multiple, global sources.
• builds on Wiley’s unrivalled publishing in these subject areas
• presents a rich historical dimension to growing fields of research related
to environmental history and environmental science.
SUBJECT AREAS
Content:
Collections touch on the following subjects (inter alia):
• Deforestation, Conservation and Forestry
• Agriculture, Livestock, and Fisheries (Food Production)
• Biodiversity
• Habitat Loss and Extinction
• Water Sources, Irrigation, Wetlands, and Hydrology
• Climate Change, Extreme Weather, and Meteorology
• Industrial Ecology and Pollution
• Natural Resources, Fossil Fuels, and Energy Consumption
• Polar studies
CONTENT PARTNERS
➢ Two partners currently signed:
➢ 150,000 pages of content
ATR: Feedback and Discussion
Your opinions matter:
• What do you think of ATR as a tool for students who
cannot read manuscript material, or who would otherwise
overlook manuscripts in favor of more accessible
content?
• Are ATR and the translation of documents into other
languages of value to your students whose native
language is not English?
• How do you see ATR being used?
Thank you
sbell@wiley.com

Weitere ähnliche Inhalte

Ähnlich wie Automated Text Recognition (ATR).pdf

Inspiring Research, Inspiring Scholarship The value and benefits of digitise...
Inspiring Research, Inspiring Scholarship  The value and benefits of digitise...Inspiring Research, Inspiring Scholarship  The value and benefits of digitise...
Inspiring Research, Inspiring Scholarship The value and benefits of digitise...Simon Tanner
 
ArchiveDevelopment
ArchiveDevelopmentArchiveDevelopment
ArchiveDevelopmentCort Egan
 
WEEK 1.1.1 - Cradle of Early Science.pptx
WEEK 1.1.1 - Cradle of Early Science.pptxWEEK 1.1.1 - Cradle of Early Science.pptx
WEEK 1.1.1 - Cradle of Early Science.pptxJOEYJIMENEZ7
 
An Introduction to the Biodiversity Heritage Library
An Introduction to the Biodiversity Heritage LibraryAn Introduction to the Biodiversity Heritage Library
An Introduction to the Biodiversity Heritage LibraryMartin Kalfatovic
 
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...Martin Kalfatovic
 
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdf
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdfEncyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdf
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdfAndrsHernndezGarca3
 
General & Multidisciplinary Science and Technology Resources
General & Multidisciplinary Science and Technology ResourcesGeneral & Multidisciplinary Science and Technology Resources
General & Multidisciplinary Science and Technology ResourcesAlyson Gamble
 
An Archaeology of the East Midlands: Class 1
An Archaeology of the East Midlands: Class 1An Archaeology of the East Midlands: Class 1
An Archaeology of the East Midlands: Class 1Keith Challis
 
Maurizio Tosi - What is Archaeology
Maurizio Tosi - What is ArchaeologyMaurizio Tosi - What is Archaeology
Maurizio Tosi - What is ArchaeologyAssociazione Minerva
 
3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage LibraryMartin Kalfatovic
 
CUA LSC 747_2011
CUA LSC 747_2011CUA LSC 747_2011
CUA LSC 747_2011SCPilsk
 
WEEK 2 - Cradle of Early Science In STS.pptx
WEEK 2 - Cradle of Early Science In STS.pptxWEEK 2 - Cradle of Early Science In STS.pptx
WEEK 2 - Cradle of Early Science In STS.pptxJOEYJIMENEZ7
 
diss-lesson-3-introducing-geography-and-history-200805022056.pptx
diss-lesson-3-introducing-geography-and-history-200805022056.pptxdiss-lesson-3-introducing-geography-and-history-200805022056.pptx
diss-lesson-3-introducing-geography-and-history-200805022056.pptxDan Lhery Gregorious
 
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014Keith Challis
 
History of Archives Administration (LIS 170)
History of Archives Administration (LIS 170)History of Archives Administration (LIS 170)
History of Archives Administration (LIS 170)Roy Santos Necesario
 

Ähnlich wie Automated Text Recognition (ATR).pdf (20)

Working with Archives
Working with ArchivesWorking with Archives
Working with Archives
 
Inspiring Research, Inspiring Scholarship The value and benefits of digitise...
Inspiring Research, Inspiring Scholarship  The value and benefits of digitise...Inspiring Research, Inspiring Scholarship  The value and benefits of digitise...
Inspiring Research, Inspiring Scholarship The value and benefits of digitise...
 
Hidden Histories
Hidden HistoriesHidden Histories
Hidden Histories
 
ArchiveDevelopment
ArchiveDevelopmentArchiveDevelopment
ArchiveDevelopment
 
WEEK 1.1.1 - Cradle of Early Science.pptx
WEEK 1.1.1 - Cradle of Early Science.pptxWEEK 1.1.1 - Cradle of Early Science.pptx
WEEK 1.1.1 - Cradle of Early Science.pptx
 
Lib sci2 spring-1
Lib sci2 spring-1Lib sci2 spring-1
Lib sci2 spring-1
 
EBHL 2008
EBHL 2008EBHL 2008
EBHL 2008
 
An Introduction to the Biodiversity Heritage Library
An Introduction to the Biodiversity Heritage LibraryAn Introduction to the Biodiversity Heritage Library
An Introduction to the Biodiversity Heritage Library
 
Lib sci2 spring-1a
Lib sci2 spring-1aLib sci2 spring-1a
Lib sci2 spring-1a
 
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...
Free and Open Access to Biodiversity Literature: An Introduction to the Biodi...
 
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdf
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdfEncyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdf
Encyclopedia.of.Archaeology.History.and.Discoveries.eBook-EEn.pdf
 
General & Multidisciplinary Science and Technology Resources
General & Multidisciplinary Science and Technology ResourcesGeneral & Multidisciplinary Science and Technology Resources
General & Multidisciplinary Science and Technology Resources
 
An Archaeology of the East Midlands: Class 1
An Archaeology of the East Midlands: Class 1An Archaeology of the East Midlands: Class 1
An Archaeology of the East Midlands: Class 1
 
Maurizio Tosi - What is Archaeology
Maurizio Tosi - What is ArchaeologyMaurizio Tosi - What is Archaeology
Maurizio Tosi - What is Archaeology
 
3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library
 
CUA LSC 747_2011
CUA LSC 747_2011CUA LSC 747_2011
CUA LSC 747_2011
 
WEEK 2 - Cradle of Early Science In STS.pptx
WEEK 2 - Cradle of Early Science In STS.pptxWEEK 2 - Cradle of Early Science In STS.pptx
WEEK 2 - Cradle of Early Science In STS.pptx
 
diss-lesson-3-introducing-geography-and-history-200805022056.pptx
diss-lesson-3-introducing-geography-and-history-200805022056.pptxdiss-lesson-3-introducing-geography-and-history-200805022056.pptx
diss-lesson-3-introducing-geography-and-history-200805022056.pptx
 
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014
Archaeology of the East Midlands: Class 3. Radcliffe Autumn 2014
 
History of Archives Administration (LIS 170)
History of Archives Administration (LIS 170)History of Archives Administration (LIS 170)
History of Archives Administration (LIS 170)
 

Mehr von UKSG: connecting the knowledge community

UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open ResearchUKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open ResearchUKSG: connecting the knowledge community
 
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...UKSG: connecting the knowledge community
 
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...UKSG: connecting the knowledge community
 
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...UKSG 2024 - You don't know what you've got till it's gone: Future directions ...
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...UKSG: connecting the knowledge community
 
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...UKSG: connecting the knowledge community
 
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...UKSG: connecting the knowledge community
 
UKSG 2024 - Creating credibility through community: Encouraging high quality ...
UKSG 2024 - Creating credibility through community: Encouraging high quality ...UKSG 2024 - Creating credibility through community: Encouraging high quality ...
UKSG 2024 - Creating credibility through community: Encouraging high quality ...UKSG: connecting the knowledge community
 
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...UKSG: connecting the knowledge community
 
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...UKSG: connecting the knowledge community
 

Mehr von UKSG: connecting the knowledge community (20)

UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
 
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdfUKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
 
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
 
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
 
UKSG 2024 Plenary 2 - Let's Talk About Green
UKSG 2024 Plenary 2 - Let's Talk About GreenUKSG 2024 Plenary 2 - Let's Talk About Green
UKSG 2024 Plenary 2 - Let's Talk About Green
 
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
 
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
 
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
 
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
Automated Text Recognition (ATR).pdf

  • 1. SIMON BELL PUBLISHER MAY/JUNE 2022 Wiley Digital Archives: Automated Text Recognition
  • 2. Discussion • Wiley programme: context • What is Automated Text Recognition (ATR)? • Take a live look at examples from the Wiley Digital Archives' manuscript collections from the Royal Geographical Society and Royal College of Physicians • Q and A
  • 3. Wiley Digital Archives Program • Royal Anthropological Institute of Great Britain and Ireland • Royal Geographical Society (with IBG) • Royal College of Physicians • New York Academy of Sciences • British Association for the Advancement of Science
  • 4. Bespoke Platform for Primary Sources Digital humanities tools and features
  • 5. Tools to power research and teaching The platform is embedded with an advanced set of digital humanities tools, designed to maximize the value researchers and students derive from primary source content. Functionalities include: • Textual analysis tools for concordance, collocation, popularity, relationships and frequency distribution of terms across archives, disciplines and timelines. • Geo-tagged maps, even those drawn by hand, are overlaid with current coordinates and downloaded as GeoTiff files for use within GIS software suites. • Exportable (EXCEL and CSV), fielded datasets for tables and statistics from printed and handwritten sources. • Textual materials can be downloaded as images, PDF or as OCR/ATR text and translated into 105 languages. • Enhanced metadata to facilitate discovery, citations and references.
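The concordance and frequency-distribution tools named on this slide can be illustrated with a minimal sketch. This is not the platform's implementation; the function names, the tokenizer, and the sample sentence are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative rule)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


def concordance(text, term, width=3):
    """Return (left context, term, right context) for each occurrence of term."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, not taken from the archives.
sample = "He followed a new route along the coast. The new route was unknown."

print(frequency(sample)["route"])
for left, term, right in concordance(sample, "route", width=2):
    print(f"{left} [{term}] {right}")
```

A keyword-in-context listing like this only works once the page text exists as machine-readable characters, which is why OCR/ATR output is a precondition for these tools.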
  • 7. Partner Archive Royal Geographical Society (with IBG) • Founded in 1830, Royal Charter granted by Queen Victoria in 1859 • The Society successfully advocated for the inclusion of geography in schools and is responsible for the first university positions in the discipline. • Merged with the Institute of British Geographers (founded in 1933) in 1995. • Holds the world’s largest private collection of maps and charts, featuring all parts of the world, along with atlases, globes, world gazetteers, and original manuscript mapping. • Notable members include John Hanning Speke, David Livingstone, Gertrude Bell, Robert Falcon Scott, Stanley, Ernest Shackleton, Edmund Hillary. • Membership of 16,000+
  • 8. RGS (with IBG): What’s inside • Years covered: 1478-1953* • The numbers: • Over 20,000 manuscript items • Over 2,800 monographs and pamphlets • ~100,000 photographs and 20,000 lantern slides • ~190,000 maps • Scope: • The RGS archive covers the expansion of European colonial powers, trade efforts, and conflicts and diplomacy throughout the Middle East, Africa, South Asia, the Caribbean, the Americas, East and Southeast Asia, and part of South America. • Research and exploration efforts throughout the world, but especially concentrated in the Polar regions, Africa, South Asia, and the Middle East.
  • 9. RGS: Some highlights • Ernest Shackleton’s expedition notes, photographs, maps and correspondence (including the Burberry® helmet) from his expeditions to the Antarctic, part of the National Antarctic Expedition collection • Gertrude Bell’s work alongside rich materials about other groundbreaking female explorers of the late 19th and early 20th centuries • Historic images, documents and notes from the great Antarctic adventures of Robert Falcon Scott. • John Hanning Speke’s African expeditions and first 19th century maps of the continent • David Livingstone and his search for the source of the Nile. • Photos and documents recording Edmund Hillary’s first successful Mount Everest ascent in 1953. • Sir Clements Markham collection • RGS’s council minute books for over 150 years • Manuscript maps from RGS as well as those collected by Fellows • India and “Africa” reports, detailing the RGS’s interactions with the British Government
  • 10. RGS archive subjects, area studies, and themes Subjects Area Studies Themes Anthropology Agricultural Geography Cartography Cultural Studies Environmental History Ethnography Geography Geology Geopolitics Historical Geography History History of Colonization and Decolonization International Relations Natural Resources Meteorology Physical Geography Urban Studies Arctic and Antarctic Studies African Studies Asian and Asian Pacific Studies British and Commonwealth Studies Caribbean Studies European Studies Latin American Studies Middle Eastern Studies North American Studies Southeast Asian Studies Expeditions into Africa Expeditions to Arctic and Antarctic British Empire European colonization in Africa and the Middle East Climate Change Colonial History, Law and Policies Colonization and Decolonization Connected Continents Environmental Degradation International Trade Route Development Power and Borders Slavery and Manumission Women in Science and Exploration
  • 12. Partner Archive: Royal College of Physicians Founded in 1518 by a Royal Charter from King Henry VIII, the Royal College of Physicians of London (RCP) is the oldest medical college in England. The RCP is a professional membership body for physicians, with 34,000 members and fellows across the globe. As the leading body for physicians in the UK and internationally, The RCP archive brings rare and unique historical materials to researchers, students and educators across a variety of fields and departments, helping shape public health today. Goals and Activities • influencing the way that healthcare is designed and delivered • promoting good health and leading the prevention of ill health across communities • supporting physicians to fulfil their potential. “Drawings by St Aubin of the Intestine and Early Classification of the Glandular Structures.” 1795. Regulation of Clinical Practice and Standards. Wiley Digital Archives: The Royal College of Physicians. 1795.
  • 13. Inside: Royal College of Physicians What’s inside: ~2M page images, from new scanning, drawn from the archives and the Dorchester and John Dee Library collections. Over 7 centuries of medical history and medical humanities, from ~1100 to ~1980. Collections across a range of topics, including serving researchers and students in the areas of: • Medical Humanities • History/Philosophy of Science, Medicine, and Technology • Bioethics • Anatomy • Medical Law • Medical Policy • Non-Traditional Medicine • Non-Western Medicine • Medical Research (Disease/Treatment), • Military Medical Practices • British History • Colonial/Post-colonial history (Empire) • Public Health • Global Health Policy • Gender Studies: Women in Medicine • Health Education • Health and Human Rights • Health Economics • Tobacco-related topics, • Medical and Biological Illustration, • Medicine or Science and the Humanities • Social Factors in Health • Religion and Medicine • History of Mental Illness • History of Pharmacology • Cultural and Social History • Medieval Studies • Early Modern Studies • 18th-20th century Studies • History of Education • General History Research
  • 14. Royal College of Physicians Key Areas of Research Supported History Medical Humanities History of the RCP Military Medicine Early and Medieval Medical Texts Public Health Non-Western Medicine Anatomical Studies MEDICINE Medicine Disease Law, Regulation, Policy, and Control World Health 19th C Questionnaires Early Medical Textbooks and Education
  • 15. RCP: Thomas Bateman Watercolor; Drawing: Diseases RCP: Autographed letter from Elizabeth Garrett Inside: Royal College of Physicians RCP: Manifestations of Cholera at Sea Map
  • 16. “Distribution of Disease in Africa: To Illustrate Paper by R. W. Felkin, M.D.” 1894. Map. Wiley Digital Archives: Royal Geographical Society (with IBG). 1894. http://WDAgo.com/s/463b8132 Connecting the RAI to the RCP and the RGS: A Visual Aid
  • 17. DEMO drawn from RCP and RGS databases
  • 18. Automated Text Recognition (ATR) Seven centuries of manuscripts become searchable and accessible
  • 19. What is ATR? Automated Text Recognition (ATR) makes manuscripts fully discoverable in search. Before ATR This manuscript page can only be found via top-level metadata. The text isn’t searchable. It can only be analyzed by reading it, which handwritten scripts make taxing. After ATR This page has been converted into typeset. All the text is searchable, and it can be seamlessly analyzed with digital humanities tools.
  • 20. Discovery “South American Notes: History of Ecuador; Rocafuerte; Tupac Amaru, Etc.” 1814–1861. Special Collections: Sir Clements Markham. Wiley Digital Archives: Royal Geographical Society (with IBG). http://WDAgo.com/s/fba31c53
  • 21. Accessibility Moorcroft, William. 1820–1825. “Despatches Concerning the Journey to Leh, Ladakh and Kashmere.” Journal Manuscripts. Wiley Digital Archives: Royal Geographical Society (with IBG). http://WDAgo.com/s/11bbe5a8
  • 22. Opportunity Royal College of Physicians of London. 1592–1675. “Affidavit of Dr. Thomas Lawrence, the President.” Membership. Wiley Digital Archives: The Royal College of Physicians. February 19, 1592–July 29, 1675. http://WDAgo.com/s/1f7336b7
  • 23. Handwritten Text Recognition: Paving the Way for ATR Handwritten Text Recognition: Probability Handwritten Text Recognition: HTR uses algorithms to determine the possible combinations of characters in manuscript content in order to generate full-text hits. The artificial intelligence then assigns a confidence rating to each result to return relevant hits, and discards those which the AI deems irrelevant. HTR has been in development for many years, but the process has never yielded an “acceptable” level of confidence.
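The confidence-rating step described on this slide can be sketched as a simple threshold filter. The `Candidate` class, the scores, and the 0.6 cut-off below are hypothetical values chosen for illustration; the slide does not state how the real engine scores or discards results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible reading of a handwritten phrase, with a confidence score."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def filter_hits(candidates, threshold=0.6):
    """Keep readings at or above the threshold, highest confidence first."""
    kept = [c for c in candidates if c.confidence >= threshold]
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


# Hypothetical candidate readings for one manuscript phrase.
candidates = [
    Candidate("new route", 0.91),
    Candidate("new rouge", 0.42),
    Candidate("near route", 0.65),
]

for hit in filter_hits(candidates):
    print(hit.text, hit.confidence)
```

Raising the threshold trades recall for precision: fewer wrong readings are returned, but more genuine hits are discarded along with them.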
  • 24. Automated Text Recognition: Discovery, Accessibility, Opportunity Automated Text Recognition: Analysis-based ATR is an approach to text recognition that references sets of baseline data (collections of words in different script styles from across multiple sources) and then analyzes each word against these “ground truth” datasets (via network analysis), identifying the most suitable dataset. The ATR engine then runs the images of the content against that dataset (at the collection or document level, depending on the analysis) to identify terms.
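The idea of matching content against several "ground truth" datasets and picking the most suitable one can be sketched roughly as follows. This toy version compares already-transcribed strings using `difflib` string similarity as a stand-in; it is not the network analysis the slide mentions, and the dataset names and word lists are invented for illustration.

```python
from difflib import SequenceMatcher


def best_match_score(word, vocabulary):
    """Similarity (0.0 to 1.0) between word and its closest vocabulary entry."""
    return max(SequenceMatcher(None, word, v).ratio() for v in vocabulary)


def pick_dataset(sample_words, datasets):
    """Score each dataset by average best-match similarity and return the winner."""
    scores = {
        name: sum(best_match_score(w, vocab) for w in sample_words) / len(sample_words)
        for name, vocab in datasets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical ground-truth vocabularies keyed by script style.
datasets = {
    "secretary_hand": ["ye", "physitian", "colledge", "sayd"],
    "victorian_cursive": ["route", "coast", "journey", "expedition"],
}

# Hypothetical first-pass readings sampled from one document.
sample_words = ["route", "coast", "expedition"]

best, scores = pick_dataset(sample_words, datasets)
print(best)
```

Selecting the dataset per document rather than per collection would simply mean calling `pick_dataset` once for each document's sample, which mirrors the "collection or document level" choice on the slide.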
  • 25. The Value of ATR: Unlocking Manuscripts ATR enables the text of handwritten documents of WDA to be fully discoverable in searches that were previously only reading top-level metadata (citation info, notes field, item list, catalogue entry). Wiley’s old processes would leave this page completely undiscoverable via search and excluded from the Analysis Hub tools. Before After
  • 26. Search on “new route” “..The greater part of it, is entirely new, as he folloire a new route; 4 of the places on the Coast, which he describes at length in the beginning of the peper, scarcely any thing was previously known.”
  • 27. What value does ATR bring to primary source-based research? • Discoverability: ATR enables the manuscript-based items in Wiley Digital Archives to be fully discoverable in searches, and to “compete” in relevancy rankings against OCR’d content. • Accessibility: ATR levels the playing field for non-specialists by presenting easy-to-read text files for handwritten (including cursive) documents which can also be translated into the user’s preferred languages. • Opportunity: Even expert users tend to favor typeset documents because, until now, they were more discoverable in searches, easier to read, and lent themselves to analysis via DH tools (Analysis Hub). • Now, WDA’s manuscript items are closer to being on par with the typeset content in these areas, representing a more holistic view of a collection than was possible with OCR alone.
  • 28. “Before and After ATR” Report A “before and after” analysis is available based on a sample set of search terms
  • 29. Discovery, Accessibility, Opportunity Automated Text Recognition (ATR): • Adds an additional 84M terms to the first five Wiley archives and amongst those are results that otherwise wouldn’t be found easily • Makes the collections more useable by a wider group (non-specialist) • Leads to greater discovery and new scholarship opportunities • Saves researchers time in close-reading • Enables new questions to be asked of the expanded dataset; and • Balances the inherent bias of other digitized collections which are published/typeset, that until now have been more discoverable and accessible through OCR than manuscripts.
  • 30. ATR: How it accelerates research • Search access to unique manuscript primary sources that before could only be digitally discovered via top-level metadata. • Enables digital navigation of handwritten primary source materials, which before ATR was only possible by reading the entire material. • Solves comprehension challenges: Handwriting presents readability issues that ATR solves by converting it into typeset. • Makes all manuscript content available for data analysis, and easy to quote and to reference in citations. • Powers easy, efficient, seamless, and meaningful search, discovery, and analysis of unique manuscript content, enabling focus on insightful rather than time-consuming work.
  • 31. DEMO drawn from RCP and RGS databases
  • 32. Simon Bell Publisher 12 May 2022 sbell@wiley.com WDA: Environmental Science and History
  • 33. WHAT IS IN IT? WDA: Environmental Science and History • focuses on critical aspects of anthropogenic change, with unique and rare archival collections from multiple, global sources. • builds on Wiley’s unrivalled publishing in these subject areas • presents a rich historical dimension to growing fields of research related to environmental history and environmental science.
  • 34. SUBJECT AREAS Content: Collections touch on the following subjects (inter alia): • Deforestation, Conservation and Forestry • Agriculture, Livestock, and Fisheries (Food Production) • Biodiversity • Habitat Loss and Extinction • Water Sources, Irrigation, Wetlands, and Hydrology • Climate Change, Extreme Weather, and Meteorology • Industrial Ecology and Pollution • Natural Resources, Fossil Fuels, and Energy Consumption • Polar studies
  • 35. CONTENT PARTNERS ➢ Two partners currently signed: ➢ 150,000 pages of content
  • 36. ATR: Feedback and Discussion Your opinions matter: • What do you think of ATR as a tool for students who cannot read manuscript material? Or would otherwise overlook manuscripts in favor of more accessible content? • Are ATR and the translated documents in other languages a value to your students whose native language is not English? • How do you see ATR being used?