JSTOR Labs, an experimental product development group at JSTOR, has been working on a new form of search in which users upload their own document and use it as the query. Scholars can upload near-finished manuscripts as a way to complete a literature review, and students can enter a few pages of a work-in-progress paper to find the scholarship they'll need to finish it. In this Lightning Round, I will demonstrate the new tool and the technology that powers it, and step through the "design thinking" and rapid prototyping processes that led to its development.
Text Analyzer: a New Way to Search from JSTOR Labs
1. TEXT ANALYZER: A NEW WAY TO SEARCH, FROM JSTOR LABS
DPLAfest 2017
20 April 2017
Alex Humphreys
@abhumphreys
2. JSTOR Labs works with partner publishers, libraries and labs to create tools for researchers, teachers and students that are immediately useful – and a little bit magical.
4. UNDER THE HOOD
  1. Extract Text
    a. From many textual formats (PDF, Word, et al.)
    b. OCR, if needed (e.g. a picture of a page in a magazine)
  2. Identify Terms
    a. Topics: JSTOR Thesaurus & an LDA Topic Model
    b. Entities: Alchemy (Watson), OpenCalais, Stanford, Apache
  3. Generate Results
    a. TF-IDF to select 5 terms
    b. “OR” search
    c. Relevance ranked based on “equalizer”
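The "Generate Results" step can be illustrated with a minimal sketch. It is not JSTOR Labs' implementation: the tokenizer, the smoothed IDF formula, the function names (`top_terms`, `or_query`, `rank_by_equalizer`), and the reading of the "equalizer" as a set of per-term weights are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(document, corpus, k=5):
    """Return the k terms of `document` with the highest TF-IDF score against `corpus`."""
    term_counts = Counter(tokenize(document))
    corpus_vocab = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for vocab in corpus_vocab if term in vocab)
        # Smoothed IDF (assumed variant): rare terms score high, common terms low.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = count * idf
    # Sort by descending score; break ties alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def or_query(terms):
    """Join the selected terms into a single "OR" query string."""
    return " OR ".join(f'"{t}"' for t in terms)


def rank_by_equalizer(candidates, weights):
    """Rank texts that match at least one term by the summed weight of the terms they match.

    `weights` maps term -> weight and stands in for the "equalizer" (hypothetical model).
    """
    scored = []
    for text in candidates:
        words = set(tokenize(text))
        matched = [term for term in weights if term in words]
        if matched:  # "OR" semantics: any one matching term is enough
            scored.append((sum(weights[t] for t in matched), text))
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored]
```

Under these assumptions, raising the weight of one term moves results that contain it up the list without dropping results that match only the other terms, which is the behaviour an "OR" search with adjustable term weights would be expected to show.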
5. HOW WE BUILT IT
The Design Squiggle, by Damien Newman:
http://cargocollective.com/central/The-Design-Squiggle/
6. THE SEED…