A case study presented at UX Cambridge 2016.
For hundreds of years, discoveries in science have been discussed, debated and advanced within the scientific literature. Finding evidence in the literature, to test a hypothesis, is fundamental to scientific research.
But finding evidence in the scientific literature can be time-consuming and difficult, especially as the number of published articles increases significantly each year. Advances in text mining technology offer the potential to make this task easier and quicker. Text miners are software engineers and subject experts who write algorithms to find useful information in vast amounts of unstructured text. Deciding what information is useful to end users, and presenting it intuitively at the right point in time, is where UX can help.
This is a case study about annotating scientific terms and concepts in millions of research articles, with the goal of helping life science researchers identify relevant information in articles quickly and easily. We explain how text miners, UX and developers collaborated; what we discovered about user needs; the challenges and constraints we faced; and the iterative improvements we have made to the design.
Designing with algorithms
1. Designing with algorithms
How text miners and UX can work together
@micheleidesmith @j_h_kim
Michele Ide-Smith, Product Manager
Jee-Hyub Kim, Text Miner
European Bioinformatics Institute (EMBL-EBI)
3. What we'll cover
• Context - finding evidence in research literature
• What are annotations?
• What is text mining?
• Research insights and our design process
• Summary - what we learnt
12. "Sometimes it's nicer to scan a PDF, in my opinion...less scrolling and the figures are more prominent. I really don't like to read on the screen."
"I can search in the PDF a little bit more easily than in the full text article."
"This [full text] is fairly clear but sometimes PDFs are slightly easier to read, slightly easier on the eye."
14. "I almost never look at PDFs, they are a bit of a pain."
"I never go to the publisher site - I like to see all the articles in the same format. I don't go to the PDF unless I want to print it out."
15. Our users
• Life sciences researchers - find evidence for their research questions, learn new methods and find all available literature on a topic
• Curators - find evidence for e.g. a gene function so that they can curate a page in a database
34. Scientific literature
• Biological terms e.g. diseases, organisms, genes, proteins and chemicals (using ontologies).
• Biological processes and functions e.g. gene-disease relationships, protein-protein interactions or gene function (from proximity of words in text and position in the article)
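As a rough illustration of the two kinds of annotation listed above, the sketch below tags terms by dictionary lookup and then pairs genes with diseases that appear in the same sentence. The dictionary entries, the function names and the same-sentence rule are all assumptions made for this example; the slides do not describe the actual algorithms, which also take position in the article into account.

```python
import re
from itertools import product

# Hypothetical mini-dictionary; a real pipeline would load terms from ontologies.
DICTIONARY = {
    "BRCA1": "gene",
    "TP53": "gene",
    "breast cancer": "disease",
    "zebrafish": "organism",
}


def annotate(text):
    """Return (start, end, matched_text, category) for each dictionary hit."""
    hits = []
    for term, category in DICTIONARY.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), category))
    return sorted(hits)


def gene_disease_pairs(text):
    """Pair genes and diseases that occur in the same sentence (assumed proximity rule)."""
    pairs = set()
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        hits = annotate(sentence)
        genes = {h[2] for h in hits if h[3] == "gene"}
        diseases = {h[2] for h in hits if h[3] == "disease"}
        pairs.update(product(genes, diseases))
    return pairs
```

Calling `gene_disease_pairs` on an abstract returns a set of `(gene, disease)` tuples. Same-sentence co-occurrence is only a crude proxy for a real relationship, so it will produce false positives.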
38. Prioritise what to read
Researchers prioritise what they want to read, as their time is limited. They use different strategies to identify articles which are worth reading in full:
• Skim read abstracts
• Look at figures
• Skim read results
• CTRL & F to find keywords in text
• Check for data files
41. Research questions
• Do participants discover/use the feature?
• How easy is it to use/navigate through annotations?
• Do they trust the information?
• How do they feel about inaccurate annotations?
• Would they provide feedback if they had the opportunity?
47. "If it's not specific enough, I end up with a lot of things being highlighted."
48. Granularity
• Some terms appeared too frequently, or were too general to be useful e.g. "cell" or "formation".
• Participants expected us to split Gene Ontology (GO) terms into three separate categories: biological process, molecular function and cellular component
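One way to act on both findings is sketched below: drop annotations that are on a list of over-general terms or that dominate the article, then group what remains by GO category. The stoplist comes from the examples participants gave; the 50% threshold, the function names and the input shape are assumptions for illustration, not the approach actually taken.

```python
from collections import Counter, defaultdict

# Terms participants found too general to be useful.
GENERAL_TERMS = {"cell", "formation"}
# Hypothetical cut-off: drop a term that makes up more than half of all annotations.
MAX_SHARE = 0.5


def filter_annotations(annotations):
    """Keep (term, category) pairs that are neither too general nor too frequent."""
    counts = Counter(term.lower() for term, _ in annotations)
    total = len(annotations)
    return [
        (term, category)
        for term, category in annotations
        if term.lower() not in GENERAL_TERMS
        and counts[term.lower()] / total <= MAX_SHARE
    ]


def group_by_category(annotations):
    """Group terms under the three GO categories so each can be toggled separately."""
    groups = defaultdict(set)
    for term, category in annotations:
        groups[category].add(term)
    return dict(groups)
```

For example, `group_by_category(filter_annotations(found))` yields one set of terms per category, which a page could present as three separate highlight options.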
49. "I guess false positives automatically make me anxious about whether to believe..."
57. Discoverability
• We can only show annotations on articles with a CC-BY, CC-BY-NC or CC0 license
• We can't display numbers in brackets due to the performance impact on page loading
• Participants didn't want highlights on by default
• Some people claim to ignore the right column
60. "I think it's good that you can click more than one. Because you can more easily associate proteins or genes with GO, or the organism. Which is very good. I would look for yellow close to blue or orange."
62. "The details one is an extra level of clicking that's frustrating. This [structure diagram] is great."
63. Engagement
• Once annotations were highlighted in the text, participants didn't necessarily realise they could interact with them
• They expected to see something useful, which makes clicking on the annotation worthwhile
64. "Maybe I'm trying to be too lazy".
"With my curator hat on accession numbers are exciting, but I wouldn't want to have to scroll through the article to see if there was one."
"If you click on organisms I'd expect it to expand out and see the unique items e.g. zebrafish"
69. "I really think this is amazingly useful to have all the names of the genes highlighted because you can get a quick overview, which is much better than trying to read the text quickly."
70. "I do like it, it's clever! ...It makes life much faster, rather than going in and out... It makes information and searching much faster"