Presented by Marjorie Hlava, president of Access Innovations, Inc., at the American Society for Information Science and Technology's 23rd Annual SIG/CR Classification Research Workshop on October 26, 2012.
Tales From the Field: Implementing Information Technology
1. Tales from the Field: Implementing Information Theory
SIG CR - 2012
Marjorie Hlava, President
Access Innovations, Inc.
www.accessinn.com
2. Implementing Information Theory
The case of the missing abstracts
Russian information
US PTO
Getty adventures
Vatican bibles
Past basics
Thoughts on directions
3. The Bleeding Edge
Figure out the client's needs
Figure out the specifications
Get approval on the specifications
Figure out how to deliver the data following the specs
Quality control the data delivery
... But then life happens
4. The Case of Missing Abstracts
Tests showed that searching the indexing alone did not give users the full answers they wanted. Searching the titles and abstracts as well would improve search.
Enough space could be found on servers if the data was moved in-house from Dialog and Orbit.
New platform going into production
New format – Messenger
Specifications written, test file approved
5. Specifications
Need 99.998% accuracy for user acceptance
Left tagged ASCII
Office in Mexico City – Access de Mexico
Triple key - double proof
Two sets of volumes
792,000 abstract tapes destroyed
1970 – 1982 data
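To put the specification above in concrete terms: 99.998% accuracy leaves room for about 2 wrong characters in every 100,000, and triple keying can be reconciled by a majority vote across the three keyings, with any disagreement that has no majority sent to a proofreader. The following is a minimal sketch only. It assumes accuracy is counted per character and that the three keyed versions are already aligned to the same length; neither detail is stated in the talk, and the function names are hypothetical.

```python
from collections import Counter


def allowed_errors(total_chars, accuracy=0.99998):
    """Maximum number of wrong characters permitted at the given accuracy."""
    return round(total_chars * (1 - accuracy))


def reconcile(keying_a, keying_b, keying_c):
    """Merge three independent keyings of the same text by majority vote.

    Returns the merged text and the positions where all three keyings
    disagree; those positions are marked "?" and left for proofreading.
    Assumption: the keyings are already aligned character by character.
    """
    if not (len(keying_a) == len(keying_b) == len(keying_c)):
        raise ValueError("keyings must be aligned to the same length")
    merged, flagged = [], []
    for position, chars in enumerate(zip(keying_a, keying_b, keying_c)):
        char, votes = Counter(chars).most_common(1)[0]
        if votes >= 2:
            merged.append(char)      # at least two typists agree
        else:
            merged.append("?")       # no majority: needs a human proof
            flagged.append(position)
    return "".join(merged), flagged
```

A single typist's slip is outvoted by the other two, so only positions where all three disagree reach the proofing step.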
7. CAS to Philippines
Limo from the airport with the remaining volumes
Typhoon Dot
October 12, 1985
Clark Air Force base evacuated
Power out for weeks
8. Jamaica
Hurricane Kate November 1985
4 inches of water in the computer room
No power on the island
9. Beijing China November 1985
NOTHING HAPPENED
Finished
On time
Under budget
At promised accuracy level
Client said: “When I read your contract I thought you had an unusual level of detail on the Acts of God clauses... But I didn’t expect you to use every one of them!”
21. Puzzles, Keys, and Digitization
Photocomposition keys
Science typographers
Puzzles – SGML
Encyclopaedia Britannica
Marquis Who’s Who
Designing the Chicago Research and Trading “desks”
28. US PTO Conversions
Scan at 300 dpi
OCR to 97%
5,400,000 patents
Create the machines
Test
QC algorithms
Display image
Search dirty OCR
Spell right once in 30 pages = findable
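The last point on this slide can be checked with a little arithmetic. The sketch below assumes the 97% figure is a per-character accuracy, that OCR errors are independent of one another, and that a search term simply needs to come through intact at least once in the document. These are illustrative assumptions rather than figures from the talk, and the function names are hypothetical.

```python
def p_word_correct(word_length, char_accuracy=0.97):
    """Chance that every character of one occurrence is recognized correctly."""
    return char_accuracy ** word_length


def p_findable(word_length, occurrences, char_accuracy=0.97):
    """Chance that at least one occurrence of the word survives OCR intact."""
    p_miss = 1 - p_word_correct(word_length, char_accuracy)
    return 1 - p_miss ** occurrences


# A 10-letter term comes through intact roughly 74% of the time per occurrence;
# with 5 occurrences in a long patent, the chance of at least one clean copy
# rises above 99%.
single = p_word_correct(10)
repeated = p_findable(10, 5)
```

Under these assumptions, repetition across a long document does most of the work, which is why dirty OCR can still be searched usefully.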
35. Success - Failure - Future
Successes
• Chemical Abstracts
• USPTO
• Getty AATA
• British Map Collection
Failures
• Access Russia
• Ipsoa Video Disk
• MAI Mail
36. All projects use classification
To organize the job
To organize the information
To allow the finding of the items once digital
Apply term tags
• Thesaurus and controlled vocabulary
Apply notation
• Not necessarily classification
• Just reflects the content
The classification is NEVER done
• Needs to reflect the ever-changing data
37. Theoretical Underpinnings
Outlines of Knowledge
• Thomas Aquinas
• John Knox (Bacon)
• Morton Taube - Encyclopaedia Britannica
Organization of Knowledge
• Cutter – 1896
• COSATI – 1964
• Alvin Weinberg
• Cranfield Institute papers
• Cleverdon, Aitchison, Vickery
53. Information access is changing
Teletype
Fax
Online
CD-ROM
Downloading
Internet
54. The players are changing
Standalone publishers
Aggregators
Serials and book vendors
Hosting services
Cloud
Disaggregation
Everyone is an author
Loss of quality, accuracy, review
55. The formats are changing
Handwritten
Gutenberg
Linotype
Web Presses
• Photocomposition
Digital layout
Desktop publishing
Web publishing
62. Indexes
Pre-coordinate
• Back of the book
• Subject headings
Post-coordinate
Bayesian
Co-occurrence
Neural nets
Machine learning
Rules systems
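Two of the approaches listed above are easy to show in a few lines. In a post-coordinate index, documents are tagged with single terms and the terms are combined only at search time; a co-occurrence count records how often two terms are assigned to the same document. This is a minimal sketch with hypothetical function names and made-up document identifiers, not a description of any system mentioned in the talk.

```python
from collections import defaultdict
from itertools import combinations


def build_index(docs):
    """Map each index term to the set of documents tagged with it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def search_all(index, *terms):
    """Post-coordinate search: intersect the postings at query time."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def cooccurrence(docs):
    """Count how many documents each pair of terms shares."""
    counts = defaultdict(int)
    for terms in docs.values():
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return dict(counts)
```

A pre-coordinate heading such as a back-of-the-book entry fixes the combination of terms when the index is built; here the combination is left to the searcher.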
63. Now
Changing the way we learn
Changing the way we find things
Easier to manipulate what we know
• http://www.youtube.com/watch?v=B8ofWFx525s
Comprehensive information / invasive
• http://www.youtube.com/watch?v=RNJl9EEcsoE
People now know what search is.
64. Future
Information any place, any time
A great big mess - Unless we corral it.
• Tag it,
• Clean it,
• Weed it
• Curate it
Everyone is creating content