The past decade has seen the emergence of various approaches for the automatic identification and extraction of chemical information from unstructured sources. These have opened new possibilities to exploit, organize, query, and analyse chemical content in support of research and development processes as well as IP-related tasks.
Several solutions for chemical named entity recognition exist, all of which show reasonable annotation quality. Each uses a slightly different approach depending on its focus and therefore has specific strengths and weaknesses. Real-world applications, however, bring technical challenges such as large and/or heterogeneous text corpora, and questions of scalability, performance, and parallelization emerge.
This talk addresses the above-mentioned questions and challenges in the context of a joint FIZ Karlsruhe and InfoChem project. In this project, FIZ Karlsruhe will leverage chemical annotations, based on InfoChem’s chemical text mining technology, for its comprehensive range of patent full-texts, making them more easily accessible and allowing for more precise and complete user queries.