Sophia Ananiadou talks about big mechanisms (from text to experiments using their text mining)
Training title: TDM unlocking a goldmine of information
Training overview:
Text and Data Mining (TDM) is a natural ‘next step’ in open science. It can lead to new and unexpected discoveries and increase the impact of publications and repositories. This workshop showcases examples of successful TDM and infrastructural solutions for researchers. We will also discuss what is needed to make the most of these infrastructures and how publishers and repositories can open up their content.
DAY 2 - PARALLEL SESSION 4 & 5
8. EventMine
• Machine learning pipeline event extraction system
– Rich linguistic features
• Several parse results: deep parser (Enju), dependency parser
• Dictionaries
• Coreference resolution, domain adaptation, filtering
http://www.nactem.ac.uk/EventMine/
Miwa, M. & Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal
configuration, BMC Bioinformatics, 16(10), S7
Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from
the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13)
Miwa, M., Pyysalo, S., Ohta, T. and Ananiadou, S. (2013) Wide coverage biomedical event
extraction using multiple partially overlapping corpora. BMC Bioinformatics, 14(175)
10. Event interpretation
• Supports users of search systems
– Discovery of new knowledge, research hypotheses
– Detection of uncertainty as confidence measure
• Multiple dimensions (meta-knowledge)
– Knowledge Type (observation, investigation, analysis,
method, fact)
– Knowledge Source (current, other)
– Polarity (positive, negated)
Thompson, P., Nawaz, R., McNaught, J. and Ananiadou, S. (2011) Enriching a biomedical
event corpus with meta-knowledge annotation. BMC Bioinformatics, 12, 393
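The meta-knowledge dimensions listed on this slide can be pictured as fields attached to each extracted event. The following is a minimal sketch only, not EventMine's actual data model: the class names, the Event fields and the is_asserted_new_finding filter are hypothetical, and only the three dimensions and their values are taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    OBSERVATION = "observation"
    INVESTIGATION = "investigation"
    ANALYSIS = "analysis"
    METHOD = "method"
    FACT = "fact"


class KnowledgeSource(Enum):
    CURRENT = "current"  # reported by the paper itself
    OTHER = "other"      # attributed to other work


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATED = "negated"


@dataclass(frozen=True)
class MetaKnowledge:
    knowledge_type: KnowledgeType
    knowledge_source: KnowledgeSource
    polarity: Polarity


@dataclass(frozen=True)
class Event:
    # Hypothetical minimal event record: a typed trigger plus its participants.
    event_type: str
    trigger: str
    participants: tuple
    meta: MetaKnowledge


def is_asserted_new_finding(event: Event) -> bool:
    """Hypothetical search filter: keep positive observations or analyses
    that the current paper reports itself."""
    meta = event.meta
    return (
        meta.knowledge_source is KnowledgeSource.CURRENT
        and meta.polarity is Polarity.POSITIVE
        and meta.knowledge_type
        in (KnowledgeType.OBSERVATION, KnowledgeType.ANALYSIS)
    )
```

A search system could use a filter of this kind to separate events that a paper asserts as its own findings from negated events or events cited from elsewhere.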
31. Use case
• A pathway model
• contains reactions involving the Ras protein
• output of querying PathwayCommons for one- and two-hop
reactions centred on Ras
• A corpus of 12,660 full papers
• Retrieved from the PubMed Central Open Access repository
• using “breast cancer” and its synonyms as query keywords,
combined with names of breast cancer cell lines,
e.g., “T-47D”, “MCF-7” (and their variants).
• Methods for event extraction, model mapping and
confidence computation were applied to the events
extracted from the corpus
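The "one- and two-hop reactions centred on Ras" can be read as a bounded neighbourhood search over a pathway graph. The sketch below illustrates that idea with a breadth-first search; it does not call PathwayCommons, the function name within_hops is hypothetical, and TOY_GRAPH is a hand-written illustrative adjacency list, not real query output.

```python
from collections import deque


def within_hops(adjacency, centre, max_hops=2):
    """Return {node: hop distance} for every node reachable from `centre`
    in at most `max_hops` steps (breadth-first search)."""
    distances = {centre: 0}
    queue = deque([centre])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Illustrative toy graph only (assumption): nodes are proteins, edges mean
# "take part in a common reaction". Not derived from PathwayCommons.
TOY_GRAPH = {
    "Ras": ["Raf", "PI3K"],
    "Raf": ["Ras", "MEK"],
    "MEK": ["Raf", "ERK"],
    "ERK": ["MEK"],
    "PI3K": ["Ras", "AKT"],
    "AKT": ["PI3K"],
}

neighbourhood = within_hops(TOY_GRAPH, "Ras", max_hops=2)
```

In this toy graph the neighbourhood contains the nodes one and two hops from Ras, while nodes three hops away are left out, which is the kind of cut-off the slide describes for the pathway model.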