NAISTビッグデータシンポジウム - 情報 松本先生

  1. Scientific Paper Analysis
     Yuji Matsumoto, Computational Linguistics Lab, Graduate School of Information Science
     March 6, 2015, Big Data Symposium at NAIST
  2. Large Scale Text Data
     - Data on the Web
       - SNS: twitter, blog
       - Wikipedia
       - News, …
     - Scientific/Technical documents
       - Scientific papers
       - Legal documents: law reports, casebooks
       - Patent documents
  3. Knowledge Bases
     - Constructed manually
       - WordNet, domain ontologies
     - Constructed by community
       - (Wikipedia)
       - Freebase
     - Constructed automatically
       - NELL: Never-Ending Language Learning
       - MindNet
  4. Applications
     - Knowledge Graph (Google)
       - Knowledge extracted from Freebase, Wikipedia, …
     - Watson (IBM)
       - Extracted from Wikipedia
       - Deep QA
  5. Structures of KB
     - Linked structure
       - Entities and relations (RDF)
       - Entity: person, country, products, etc.
       - Relation: born_in(Barack Obama, Honolulu), locates_in(Honolulu, Hawaii), state_of(Hawaii, USA)
     (A minimal sketch of this triple structure follows this slide.)
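The sketch below stores the slide's three facts as a plain Python set of (subject, relation, object) triples and follows a chain of relations to derive a fact that is never stated directly. The chain-following helper and the single-valued-relation assumption are illustrative only, not part of any particular KB system.

```python
# Facts from the slide, stored as (subject, relation, object) triples.
TRIPLES = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Honolulu", "locates_in", "Hawaii"),
    ("Hawaii", "state_of", "USA"),
}

def follow(entity, relations):
    """Follow a chain of relations from an entity, one hop per relation."""
    for rel in relations:
        targets = [o for (s, r, o) in TRIPLES if s == entity and r == rel]
        if not targets:
            return None
        entity = targets[0]  # assumes each relation is single-valued here
    return entity

# "In which country was Barack Obama born?" answered by walking the links.
print(follow("Barack Obama", ["born_in", "locates_in", "state_of"]))  # USA
```

This chained lookup is what lets a linked KB answer questions whose answer is not recorded in any single fact.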
  6. Natural Language Analysis
     How text is analyzed (a toolkit sketch follows this slide):
     - Word segmentation, part-of-speech tagging
     - Named entity recognition
     - Syntactic parsing
     - Semantic disambiguation
     - Semantic parsing
     - Discourse analysis
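As a concrete illustration of the first few steps, here is a minimal sketch using spaCy, one of several off-the-shelf toolkits (the slide does not name a specific tool). It assumes spaCy and its small English model are installed (`pip install spacy`, then `python -m spacy download en_core_web_sm`).

```python
import spacy

# Load a small English pipeline: tokenizer, POS tagger, dependency parser, NER.
nlp = spacy.load("en_core_web_sm")
doc = nlp("TPA induction inhibits the binding of NF-E2 to this element.")

# Word segmentation, part-of-speech tagging, and syntactic dependency parsing.
for token in doc:
    print(f"{token.text}\t{token.pos_}\t{token.dep_} -> {token.head.text}")

# Named entity recognition (a biomedical model would do better on this text).
for ent in doc.ents:
    print(ent.text, ent.label_)
```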
  7. Linked Knowledge Extraction
     - Named entity recognition
       - Extraction of entities and concepts
     - Syntactic dependency parsing
       - Direct dependencies between entities
     - Semantic parsing
       - Predicate-argument structure analysis
       - Subject-predicate-object relations between entities (sketched after this slide)
     - Discourse analysis
       - Co-reference: the same entity referred to by different mentions
       - Relations between facts: temporal, causal
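A rough sketch of the subject-predicate-object idea on top of a dependency parse, again assuming spaCy. Real systems would use full predicate-argument (semantic role) analysis, but nsubj/dobj arcs already approximate the triple structure.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("TPA induction increases the binding of AP-1 factors.")

# Extract (subject, predicate, object) triples from direct dependency arcs.
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c for c in token.children if c.dep_ == "nsubj"]
        objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
        for subj in subjects:
            for obj in objects:
                print((subj.text, token.lemma_, obj.text))
```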
  8. Semantic Parsing: Example
     S1: We analyzed the effect on the binding and the activity of transcription factors at a regulatory element.
     S2: TPA induction increases the binding of AP-1 factors to this element.
     S3: TPA induction inhibits the binding of the transcription factor NF-E2 to this transcriptional control element.
     [Figure: event-argument arcs over S1-S3, labeled Cause and Theme]
     Katsumasa Yoshikawa, Sebastian Riedel, Tsutomu Hirao, Masayuki Asahara, Yuji Matsumoto, "Coreference Based Event-Argument Relation Extraction on Biomedical Text," Journal of Biomedical Semantics, Volume 2, Supplement 5, S6, October 2011
  9. Co-reference analysis
     "this element" in S2 is coreferent to "a regulatory element" in S1.
     [Figure: the same three sentences, with a Corefer link added between the two mentions]
  10. Information conflation
      The true argument (Theme) of "binding" is "a regulatory element"; "this element" is just an anaphor of it.
      Transitivity enables us to conflate the information: (A) Theme & (B) Corefer => (C) Theme (sketched in code after this slide).
      [Figure: the same three sentences, with arcs (A) Theme and (B) Corefer yielding the derived arc (C) Theme]
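A direct sketch of the conflation rule (A) Theme & (B) Corefer => (C) Theme, using plain Python sets. The mention strings follow the example sentences, while the event identifiers (binding@S2 and the like) are invented labels for illustration.

```python
# (A) Theme: event -> argument mention, as labeled within each sentence.
theme = {
    ("binding@S2", "this element"),
    ("binding@S3", "this transcriptional control element"),
}
# (B) Corefer: mention pairs that refer to the same entity.
corefer = {
    ("this element", "a regulatory element"),
    ("this transcriptional control element", "a regulatory element"),
}

# Make coreference symmetric, then apply (A) & (B) => (C).
pairs = corefer | {(m2, m1) for (m1, m2) in corefer}
conflated = set(theme)
for event, arg in theme:
    for m1, m2 in pairs:
        if arg == m1:
            conflated.add((event, m2))  # (C): coreferent mention is also a Theme

for fact in sorted(conflated):
    print(fact)
```

Running this adds ("binding@S2", "a regulatory element") and ("binding@S3", "a regulatory element"), conflating both events onto the true argument.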
  11. Discourse analysis
      [Figure: the same three sentences, with the full set of Cause, Theme, and Corefer links drawn across S1-S3]
  12. NLP Technologies for Document Analysis
      [Figure: analysis pipeline] Part-of-Speech (POS) tagging → NE chunking → Syntactic parsing → semantic/context processing (Predicate-argument structure analysis, Coreference resolution, Relation extraction) → Document Structure Analysis, supported by Machine Learning / Knowledge Acquisition and Knowledge Bases (Domain Ontologies)
  13. What we can do with Scientific Papers
      - Knowledge extraction (domain knowledge)
      - New fact discovery
      - Content-aware paper search
      - Summarization
        - Automatic generation of abstracts
        - Keyword generation (a minimal sketch follows this slide)
        - Survey generation
      - Recommendation of related papers
      - Similar article/case search
        - Structural similarity: papers, law reports, patents
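For the keyword-generation bullet, a minimal sketch using TF-IDF with scikit-learn. The three toy "papers" are placeholders; a real system would run over full text and filter candidates by part of speech or phrase structure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

papers = [
    "coreference based event argument relation extraction on biomedical text",
    "dependency parsing and semantic parsing for knowledge extraction",
    "automatic summarization and survey generation for scientific papers",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(papers)
terms = vectorizer.get_feature_names_out()

# Rank each paper's terms by TF-IDF weight; keep the top three as keywords.
for i, row in enumerate(tfidf.toarray()):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(f"paper {i}:", [term for _, term in top])
```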
  14. Example: Structured Abstract Generation
      [Figure: example of a generated structured abstract]
  15. Related Project
      - Big Mechanism (2014.07-, by DARPA)
        - http://www.darpa.mil/Our_Work/I2O/Programs/Big_Mechanism.aspx
        - The Big Mechanism program aims to develop technology to read research abstracts and papers to extract pieces of causal mechanisms, assemble these pieces into more complete causal models, and reason over these models to produce explanations. The domain of the program is cancer biology with an emphasis on signaling pathways.
  16. Architecture of Big Mechanism
      [Figure: architecture diagram, from Paul Cohen, "DARPA's Big Mechanism Program"]
  17. Deep Language Analysis
      - Complex sentence structure analysis
      - Robust semantic parsing
      - Discourse analysis
        - Co-reference
        - Causal / temporal relations
      - Representation and reasoning
      - Explanation / anticipation
      - Confidence/credibility (of extracted facts / of what is written in documents)
  18. Language Processing and Document Analysis Layers
      [Figure: layered stack, bottom to top: large-scale text data → POS tags, phrase/NE chunking → syntactic dependency structure → argument structure, coreference → rhetorical / document structure → relations (temporal, causal, entailment) → Document Analysis (document understanding, similarity-based search, knowledge discovery/assembling), with Knowledge/Ontology resources alongside]
  19. We may be able to do more
      - Research trend survey
      - Research (paper) evaluation
        - Content-aware citation analysis
      - Innovation foresight
        - E.g., the Foresight and Understanding from Scientific Exposition (FUSE) Project
        - http://www.iarpa.gov/index.php/research-programs/fuse
      - Collaboration with people in application areas who need to read/understand documents

Editor's Notes

  1. OK, let's see a typical example of event-argument relations including coreference information. Probably, most people here know biomedical event extraction much better than I do. Actually, I'm a stranger to bioinformatics; I'm an NLP researcher. OK, anyway, here we can see events and arguments in S2. The event-argument relations in S2 are perfectly labeled, at least under the intra-sentential constraints. However, arguments are often related to other mentions through coreference relations. So, when considering the context from the preceding and following sentences...
  2. We can see that "this element" in S2 is coreferent to "a regulatory element" in S1. "Corefer" means that two or more mentions refer to the same entity.
  3. In that case, the true Theme of "binding" should be "a regulatory element"; "this element" is just an anaphor of it. Here we find a transitivity. I'll show you: "this element" is a Theme of the "binding", and "this element" is coreferent to the mention "a regulatory element". Then "a regulatory element" is also a Theme of the "binding". It's very simple but pretty effective for identifying this kind of event-argument relation. This is transitivity. OK, let's move on to another strategy.
  4. If we look at the third sentence, another phrase, "this transcriptional control element", is coreferent to "a regulatory element". So the entity described by "a regulatory element" is mentioned several times, over and over again, right? This red line is sometimes called an anaphoric chain, and arguments in such a long chain have higher salience in discourse. They are valuable in the discourse structure and can help our document understanding, so we want to extract such arguments aggressively. Our approach with Markov Logic can implement this idea in a very direct fashion. Moreover, such arguments are more likely to be arguments of events, and this information can improve the performance of event-argument extraction.
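The transitivity and salience ideas in these notes can be written as weighted first-order formulas, which is how Markov Logic expresses soft rules. The rendering below is only a schematic sketch: the predicate names (Theme, Corefer, InAnaphoricChain, IsArgument) and the weights are assumptions for illustration, not the formulas from the cited paper.

```latex
% Schematic Markov Logic rules; weights w_1, w_2 would be learned from data.
% Transitivity: an argument propagates across coreferent mentions.
w_1 :\quad \mathit{Theme}(e, m_1) \land \mathit{Corefer}(m_1, m_2) \Rightarrow \mathit{Theme}(e, m_2)
% Salience: mentions in a long anaphoric chain are likely event arguments.
w_2 :\quad \mathit{InAnaphoricChain}(m) \Rightarrow \mathit{IsArgument}(m)
```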