Simon Bell (Wiley)
ATR is a breakthrough AI that accelerates research work by making handwritten content fully discoverable via search and by turning handwriting into easily readable typeset text that can be used seamlessly for data analysis, quotation, and citation. Through ATR, manuscripts and printed materials will, for the first time, come close to parity in their discoverability. With examples drawn from Wiley Digital Archives, find out how ATR can improve archive collection management and librarianship, supporting institutional objectives and publishing by placing researchers ahead of the curve in their fields. You’ll hear how the technology behind ATR works, how ATR differs from OCR and HTR, and how ATR will enhance research and teaching by solving manuscript comprehension challenges.
2. Discussion
• Wiley programme: context
• What is Automated Text Recognition (ATR)?
• Take a live look at examples from the Wiley Digital Archives' manuscript collections from the Royal Geographical Society and Royal College of Physicians
• Q and A
3. Wiley Digital Archives Program
• Royal Anthropological Institute of Great Britain and Ireland
• Royal Geographical Society (with IBG)
• Royal College of Physicians
• New York Academy of Sciences
• British Association for the Advancement of Science
5. Tools to power research and teaching
The platform is embedded with an advanced set of digital humanities tools, designed to maximize the value researchers and students derive from primary source content.
Functionalities include:
• Textual analysis tools for concordance, collocation, popularity, relationships and frequency distribution of terms across archives, disciplines and timelines.
• Geo-tagged maps, even those drawn by hand, are overlaid with current coordinates and can be downloaded as GeoTIFF files for use within GIS software suites.
• Exportable (Excel and CSV), fielded datasets for tables and statistics from printed and handwritten sources.
• Textual materials can be downloaded as images, as PDFs, or as OCR/ATR text, and can be translated into 105 languages.
• Enhanced metadata to facilitate discovery, citations and references.
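Concordance, collocation and frequency distribution are standard corpus-linguistics operations. The sketch below illustrates what each one computes over transcribed (OCR/ATR) text. It is not the platform's implementation. The function names and the lowercase-letters tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a token is a run of ASCII letters, lowercased.
    return re.findall(r"[a-z]+", text.lower())


def frequency(text: str) -> Counter:
    """Frequency distribution: how often each term occurs."""
    return Counter(tokenize(text))


def concordance(text: str, term: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines: each hit with up to `width` tokens on either side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(text: str, term: str, window: int = 1) -> Counter:
    """Collocation counts: terms appearing within `window` tokens of `term`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            counts.update(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    sample = "The greater part of it is entirely new, as he followed a new route"
    print(frequency(sample)["new"])
    print(concordance(sample, "new", width=2))
    print(collocates(sample, "new"))
```

A production tool would also handle stop words, lemmatization and non-English scripts. Here these are left out so that each operation stays visible.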
7. Partner Archive
Royal Geographical Society (with IBG)
• Founded in 1830; Royal Charter granted by Queen Victoria in 1859
• The Society successfully advocated for the inclusion of geography in schools and is responsible for the first university positions in the discipline.
• Merged in 1995 with the Institute of British Geographers (founded in 1933).
• Holds the world’s largest private collection of maps and charts, covering all parts of the world, along with atlases, globes, world gazetteers, and original manuscript mapping.
• Notable members include John Hanning Speke, David Livingstone, Gertrude Bell, Robert Falcon Scott, Henry Morton Stanley, Ernest Shackleton, and Edmund Hillary.
• Membership of 16,000+
8. RGS (with IBG): What’s inside
• Years covered: 1478-1953*
• The numbers:
  • Over 20,000 manuscript items
  • Over 2,800 monographs and pamphlets
  • ~100,000 photographs and 20,000 lantern slides
  • ~190,000 maps
• Scope:
  • The RGS archive covers the expansion of European colonial powers, trade efforts, and conflicts and diplomacy throughout the Middle East, Africa, South Asia, the Caribbean, the Americas, East and Southeast Asia, and part of South America.
  • Research and exploration efforts throughout the world, but especially concentrated in the Polar regions, Africa, South Asia, and the Middle East.
9. RGS: Some highlights
• Ernest Shackleton’s expedition notes, photographs, maps and correspondence (including the Burberry® helmet) from his Antarctic expeditions, part of the National Antarctic Expedition collection
• Gertrude Bell’s work, alongside rich materials about other groundbreaking female explorers of the late 19th and early 20th centuries
• Historic images, documents and notes from the great Antarctic adventures of Robert Falcon Scott
• John Hanning Speke’s African expeditions and the first 19th-century maps of the continent
• David Livingstone and his search for the source of the Nile
• Photos and documents recording Edmund Hillary’s 1953 first ascent of Mount Everest
• Sir Clements Markham collection
• RGS’s council minute books for over 150 years
• Manuscript maps from RGS as well as those collected by Fellows
• India and “Africa” reports, detailing the RGS’s interactions with the British Government
10. RGS archive subjects, area studies, and themes
Subjects:
• Anthropology
• Agricultural Geography
• Cartography
• Cultural Studies
• Environmental History
• Ethnography
• Geography
• Geology
• Geopolitics
• Historical Geography
• History
• History of Colonization and Decolonization
• International Relations
• Natural Resources
• Meteorology
• Physical Geography
• Urban Studies
Area Studies:
• Arctic and Antarctic Studies
• African Studies
• Asian and Asian Pacific Studies
• British and Commonwealth Studies
• Caribbean Studies
• European Studies
• Latin American Studies
• Middle Eastern Studies
• North American Studies
• Southeast Asian Studies
Themes:
• Expeditions into Africa
• Expeditions to Arctic and Antarctic
• British Empire
• European colonization in Africa and the Middle East
• Climate Change
• Colonial History, Law and Policies
• Colonization and Decolonization
• Connected Continents
• Environmental Degradation
• International Trade Route Development
• Power and Borders
• Slavery and Manumission
• Women in Science and Exploration
12. Partner Archive: Royal College of Physicians
Founded in 1518 by a Royal Charter from King Henry VIII, the Royal College of Physicians of London (RCP) is the oldest medical college in England. The RCP is a professional membership body for physicians, with 34,000 members and fellows across the globe. It is the leading body for physicians in the UK and internationally. The RCP archive brings rare and unique historical materials to researchers, students and educators across a variety of fields and departments, helping to shape public health today.
Goals and Activities:
• influencing the way that healthcare is designed and delivered
• promoting good health and leading the prevention of ill health across communities
• supporting physicians to fulfil their potential.
“Drawings by St Aubin of the Intestine and Early Classification of the Glandular Structures.” 1795. Regulation of Clinical Practice and Standards. Wiley Digital Archives: The Royal College of Physicians. 1795.
13. Inside: Royal College of Physicians
What’s inside:
~2M page images, from new scanning, drawn from the archives and the Dorchester and John Dee Library collections.
Over 7 centuries of medical history and medical humanities, from ~1100 to ~1980.
Collections span a range of topics, serving researchers and students in areas including:
• Medical Humanities
• History/Philosophy of Science, Medicine, and Technology
• Bioethics
• Anatomy
• Medical Law
• Medical Policy
• Non-Traditional Medicine
• Non-Western Medicine
• Medical Research (Disease/Treatment)
• Military Medical Practices
• British History
• Colonial/Post-colonial history (Empire)
• Public Health
• Global Health Policy
• Gender Studies: Women in Medicine
• Health Education
• Health and Human Rights
• Health Economics
• Tobacco-related topics
• Medical and Biological Illustration
• Medicine or Science and the Humanities
• Social Factors in Health
• Religion and Medicine
• History of Mental Illness
• History of Pharmacology
• Cultural and Social History
• Medieval Studies
• Early Modern Studies
• 18th–20th century Studies
• History of Education
• General History Research
14. Royal College of Physicians
Key Areas of Research Supported:
• History
• Medical Humanities
• History of the RCP
• Military Medicine
• Early and Medieval Medical Texts
• Public Health
• Non-Western Medicine
• Anatomical Studies
• Medicine
• Disease
• Law, Regulation, Policy, and Control
• World Health
• 19th-Century Questionnaires
• Early Medical Textbooks and Education
15. Inside: Royal College of Physicians
• RCP: Thomas Bateman. Watercolor; Drawing: Diseases
• RCP: Autographed letter from Elizabeth Garrett
• RCP: Manifestations of Cholera at Sea Map
16. Connecting the RAI to the RCP and the RGS: A Visual Aid
“Distribution of Disease in Africa: To Illustrate Paper by R. W. Felkin, M.D.” 1894. Map. Wiley Digital Archives: Royal Geographical Society (with IBG). 1894. http://WDAgo.com/s/463b8132
19. What is ATR?
Automated Text Recognition (ATR) makes manuscripts fully discoverable in search.
Before ATR: This manuscript page can only be found via top-level metadata. The text isn’t searchable. It can only be analyzed by reading it, which the handwritten script makes taxing.
After ATR: This page has been converted into typeset text. All the text is searchable, and it can be seamlessly analyzed with digital humanities tools.
20. Discovery
“South American Notes: History of Ecuador; Rocafuerte; Tupac Amaru, Etc.” 1814–1861. Special Collections: Sir Clements Markham. Wiley Digital Archives: Royal Geographical Society (with IBG). http://WDAgo.com/s/fba31c53
21. Accessibility
Moorcroft, William. 1820–1825. “Despatches Concerning the Journey to Leh, Ladakh and Kashmere.” Journal Manuscripts. Wiley Digital Archives: Royal Geographical Society (with IBG). http://WDAgo.com/s/11bbe5a8
22. Opportunity
Royal College of Physicians of London. 1592–1675. “Affidavit of Dr. Thomas Lawrence, the President.” Membership. Wiley Digital Archives: The Royal College of Physicians. February 19, 1592–July 29, 1675. http://WDAgo.com/s/1f7336b7
23. Handwritten Text Recognition: Paving the Way for ATR
Handwritten Text Recognition: Probability
HTR uses algorithms to determine the possible combinations of characters in manuscript content in order to generate full-text hits. The AI then assigns a confidence rating to each result, returns the relevant hits, and discards those it deems irrelevant.
HTR has been in development for many years, but the process has never yielded an “acceptable” level of confidence.
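The HTR steps described above are to enumerate candidate character combinations, score each one, and discard low-confidence results. They can be sketched as a toy example. This is not Wiley's method or any HTR vendor's method. The per-character probabilities are invented. A reading's confidence is assumed to be the product of its character probabilities, and the threshold is arbitrary.

```python
from itertools import product
from math import prod

# Hypothetical recognizer output: for each character position in one
# handwritten word, the candidate characters and their probabilities.
CharDist = dict[str, float]


def candidate_readings(positions: list[CharDist], threshold: float) -> list[tuple[str, float]]:
    """Score every character combination and keep those at or above threshold."""
    results = []
    for combo in product(*(dist.items() for dist in positions)):
        text = "".join(ch for ch, _ in combo)
        # Assumption: confidence = product of the chosen characters' probabilities.
        confidence = prod(p for _, p in combo)
        if confidence >= threshold:
            results.append((text, confidence))
    # Most confident readings first; everything below threshold was discarded.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    word = [
        {"r": 0.9, "n": 0.1},
        {"o": 0.8, "a": 0.2},
        {"u": 0.7, "n": 0.3},
        {"t": 0.95, "l": 0.05},
        {"e": 0.9, "c": 0.1},
    ]
    for text, conf in candidate_readings(word, threshold=0.1):
        print(f"{text}: {conf:.3f}")
```

Exhaustive enumeration grows exponentially with word length. Practical recognizers therefore prune candidates (for example, with beam search) rather than scoring every combination.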
24. Automated Text Recognition: Discovery, Accessibility, Opportunity
Automated Text Recognition: Analysis-based
ATR is an approach to text recognition that references sets of baseline data (collections of words in different script styles, drawn from multiple sources). It analyzes each word against these “ground truth” datasets (via network analysis) to identify the most suitable dataset.
The ATR engine then runs the content images against that dataset (at the collection or document level, depending on the analysis) to identify terms.
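The two-stage ATR process described above first chooses the best-fitting ground-truth dataset and then recognizes words against it. It can be sketched as follows. The slides do not specify the network analysis or the recognition model. This sketch therefore swaps in plain cosine similarity over precomputed word-image feature vectors and a nearest-neighbour lookup. The vectors, the `GroundTruthSet` class and the dataset names are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt

# Hypothetical: each word image is already reduced to a small feature vector.
Vector = tuple[float, ...]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class GroundTruthSet:
    name: str
    # Hypothetical reference words: transcription -> features of a labeled word image.
    words: dict[str, Vector]

    def style_score(self, word_features: list[Vector]) -> float:
        """Mean over the document's words of the best match in this set
        (a stand-in for the unspecified network analysis)."""
        best = [max(cosine(w, ref) for ref in self.words.values()) for w in word_features]
        return sum(best) / len(best)

    def transcribe(self, features: Vector) -> str:
        """Nearest-neighbour lookup: the reference word with the most similar features."""
        return max(self.words, key=lambda t: cosine(features, self.words[t]))


def atr_document(word_features: list[Vector], sets: list[GroundTruthSet]) -> tuple[str, list[str]]:
    # Stage 1: pick the ground-truth set that best fits this document's script style.
    best = max(sets, key=lambda s: s.style_score(word_features))
    # Stage 2: recognize every word against that set only.
    return best.name, [best.transcribe(w) for w in word_features]


if __name__ == "__main__":
    sets = [
        GroundTruthSet("copperplate", {"new": (1.0, 0.0, 0.1), "route": (0.9, 0.1, 0.0)}),
        GroundTruthSet("secretary_hand", {"newe": (0.0, 1.0, 0.2), "rowte": (0.1, 0.9, 0.3)}),
    ]
    print(atr_document([(0.95, 0.05, 0.1), (0.88, 0.12, 0.02)], sets))
```

Scoring per document, as here, corresponds to the slide's "document level" option. Working at the collection level would pool the words of every document in the collection before choosing a dataset.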
25. The Value of ATR: Unlocking Manuscripts
ATR makes the text of handwritten documents in WDA fully discoverable in searches that previously read only top-level metadata (citation info, notes field, item list, catalogue entry).
Wiley’s old processes would have left this page completely undiscoverable via search and excluded from the Analysis Hub tools.
(Page images: before and after ATR)
26. Search on “new route”
“..The greater part of it, is entirely new, as he folloire a new route; 4 of the places on the Coast, which he describes at length in the beginning of the peper, scarcely any thing was previously known.”
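A small positional inverted index can show the difference between metadata-only search and full-text search over ATR output. This is an illustration under stated assumptions, not WDA's search engine. The record ID, the catalogue string and the tokenizer are hypothetical. The ATR text is the sample above, with its recognition errors left in.

```python
import re
from collections import defaultdict

# term -> document ID -> positions at which the term occurs
Index = dict[str, dict[str, list[int]]]


def tokenize(text: str) -> list[str]:
    # Assumption: letters only; digits and punctuation are ignored.
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs: dict[str, str]) -> Index:
    index: Index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index: Index, phrase: str) -> set[str]:
    """Return IDs of documents containing the phrase's terms at consecutive positions."""
    terms = tokenize(phrase)
    hits: set[str] = set()
    if not terms:
        return hits
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            if all(start + k in index.get(t, {}).get(doc_id, [])
                   for k, t in enumerate(terms[1:], start=1)):
                hits.add(doc_id)
                break
    return hits


if __name__ == "__main__":
    # Hypothetical catalogue entry: the only text indexed before ATR.
    metadata = {"ms-001": "Journal manuscript, catalogue entry"}
    # ATR transcription of the page, errors ("folloire", "peper") left as-is.
    atr_text = {"ms-001": "The greater part of it, is entirely new, as he folloire "
                          "a new route; 4 of the places on the Coast, which he "
                          "describes at length in the beginning of the peper"}
    before = build_index(metadata)
    after = build_index({k: metadata[k] + " " + atr_text[k] for k in metadata})
    print(phrase_search(before, "new route"))  # set()
    print(phrase_search(after, "new route"))   # {'ms-001'}
```

The phrase match depends only on the correctly recognized words, so the page is found even though ATR misread nearby words. Those misreadings would still hide the page from searches for "followed" or "paper".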
27. What value does ATR bring to primary source-based research?
• Discoverability: ATR makes the manuscript-based items in Wiley Digital Archives fully discoverable in searches, able to “compete” in relevancy rankings against OCR’d content.
• Accessibility: ATR levels the playing field for non-specialists by presenting easy-to-read text files for handwritten (including cursive) documents, which can also be translated into the user’s preferred language.
• Opportunity: Even expert users tend to favor typeset documents because, until now, those documents were more discoverable in searches, easier to read, and better suited to analysis with DH tools (Analysis Hub).
• Now, WDA’s manuscript items are closer to being on par with typeset content in these areas, giving a more holistic view of a collection than OCR alone made possible.
28. “Before and After ATR” Report
A “before and after” analysis is available, based on a sample set of search terms.
29. Discovery, Accessibility, Opportunity
Automated Text Recognition (ATR):
• Adds 84M terms to the first five Wiley archives, including results that otherwise would not be found easily
• Makes the collections more usable by a wider (non-specialist) audience
• Leads to greater discovery and new scholarship opportunities
• Saves researchers time on close reading
• Enables new questions to be asked of the expanded dataset; and
• Counterbalances the inherent bias toward published/typeset digitized collections, which until now have been more discoverable and accessible through OCR than manuscripts.
30. ATR: How it accelerates research
• Provides search access to unique manuscript primary sources that previously could be discovered digitally only via top-level metadata.
• Enables digital navigation of handwritten primary sources, which before ATR was possible only by reading the entire item.
• Solves comprehension challenges: handwriting presents readability issues that ATR solves by converting it into typeset text.
• Makes all manuscript content available for data analysis and easy to quote and cite.
• Powers easy, efficient, seamless, and meaningful search, discovery, and analysis of unique manuscript content, enabling researchers to focus on insightful rather than time-consuming work.
33. WHAT IS IN IT?
WDA: Environmental Science and History
• focuses on critical aspects of anthropogenic change, with unique and rare archival collections from multiple global sources.
• builds on Wiley’s unrivalled publishing in these subject areas
• presents a rich historical dimension to growing fields of research related to environmental history and environmental science.
34. SUBJECT AREAS
Content:
Collections touch on the following subjects (inter alia):
• Deforestation, Conservation and Forestry
• Agriculture, Livestock, and Fisheries (Food Production)
• Biodiversity
• Habitat Loss and Extinction
• Water Sources, Irrigation, Wetlands, and Hydrology
• Climate Change, Extreme Weather, and Meteorology
• Industrial Ecology and Pollution
• Natural Resources, Fossil Fuels, and Energy Consumption
• Polar Studies
36. ATR: Feedback and Discussion
Your opinions matter:
• What do you think of ATR as a tool for students who cannot read manuscript material, or who would otherwise overlook manuscripts in favor of more accessible content?
• Are ATR and documents translated into other languages of value to your students whose native language is not English?
• How do you see ATR being used?