II-SDV 2017: From KNIME to HighThroughPut Pipelining - from KNIME to HTPP
1. From KNIME 2 HTPP
Transforming Prototypes into High Performance
Boehringer Ingelheim Pharma GmbH & Co. KG
Aleksandar Kapisoda
II-SDV 2017 • April 24th 2017 • Nice, France
2. Talk in brief
1. 2015 – Building Prototypes
Background: KNIME & ChemCurator process
2. 2016 – Transforming Prototypes into High Performance
1. Motivation & Goals
How
3. BI UIMA Pipeline
1. Location for Input Data
2. Pipeline Setup for Process
3. Processing & Producing Results
4. Conclusions
5. Outlook
6. Acknowledgements
6. Motivations & Goals
Why?
• Customizable processing pipelines:
About 10-20 different pipelines needed
• Standardized processing pipeline &
Standardized results
7. Motivations & Goals
Accomplishments:
• removing complexity
• processing more patents
• improving speed & performance
• improving quality
8. How?
UIMA (Unstructured Information Management Architecture)
• A general framework for information processing –
factory approach
• Apache open source, modular, used by many
groups, many modules already available
• can process any kind of content (pictures, even
rock music); used by IBM Watson, for example
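The modular "factory approach" described above can be illustrated with a minimal sketch. It is written in plain Python and does not use the Apache UIMA API; the `Document` class, the two components and `run_pipeline` are hypothetical names chosen for illustration. The idea it shows is that each component receives a shared document object, adds to it, and hands it to the next component.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Shared object passed from component to component (illustrative only)."""
    text: str
    annotations: list = field(default_factory=list)


def normalize_whitespace(doc):
    """Preparation step: collapse runs of whitespace into single spaces."""
    doc.text = " ".join(doc.text.split())
    return doc


def make_dictionary_annotator(terms, label):
    """Build a component that marks case-insensitive dictionary hits."""
    def annotate(doc):
        lowered = doc.text.lower()
        for term in terms:
            needle = term.lower()
            start = lowered.find(needle)
            while start != -1:
                end = start + len(needle)
                doc.annotations.append((start, end, label, doc.text[start:end]))
                start = lowered.find(needle, start + 1)
        return doc
    return annotate


def run_pipeline(doc, components):
    """Apply each component in order to the same document."""
    for component in components:
        doc = component(doc)
    return doc


if __name__ == "__main__":
    pipeline = [normalize_whitespace, make_dictionary_annotator(["acetone"], "chem")]
    result = run_pipeline(Document("Acetone  and\nacetonitrile; acetone again"), pipeline)
    for annotation in result.annotations:
        print(annotation)
```

Because every component has the same shape (document in, document out), a new pipeline is only a different list of components, which is one way to read the requirement for 10-20 customizable pipelines.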
10. BI UIMA Pipeline Manager
Location for Input Data
From a prototype, “personal” KNIME process towards a cloud-based UIMA pipeline
Step 1: Cloud-based SFTP location for input data and processed results
14. Pipeline Setup
Disambiguation Components
• Preparation and normalization of text
Reader can read
XML, PDF, TXT, HTML and Office files
21. Pipeline Setup
Running Processes Manager
Cloud-based performance
• depends on task & computer
• from 1 PDF patent / sec
• up to 60 XML patents / sec
• distributed processing possible
• multi-user
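To put the throughput figures above into perspective, the following sketch converts a documents-per-second rate into wall-clock hours. It assumes a constant rate and, for the distributed case, perfectly linear scaling across workers; both are simplifying assumptions, and `estimated_hours` is a hypothetical helper, not part of the BI pipeline.

```python
def estimated_hours(num_documents, docs_per_second, workers=1):
    """Rough processing time in hours, assuming constant throughput
    and ideal linear scaling over `workers` (an optimistic assumption)."""
    if docs_per_second <= 0 or workers < 1:
        raise ValueError("docs_per_second must be > 0 and workers >= 1")
    return num_documents / (docs_per_second * workers) / 3600


if __name__ == "__main__":
    corpus = 1_000_000  # hypothetical corpus size
    print(f"PDF at 1/sec:  {estimated_hours(corpus, 1):.1f} h")
    print(f"XML at 60/sec: {estimated_hours(corpus, 60):.1f} h")
```

For a hypothetical corpus of one million patents, this gives roughly 278 hours at the PDF rate and under 5 hours at the XML rate, which shows how strongly the input format drives the total run time.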
23. Processing & Producing Results
Step 3: Processing & Producing Results
24. Processing & Producing Results
.csv format
#name | prefName | nameSource | domain | source | sourceSection | confidence | mentionCount
ethylacetate | acetic acid ethyl ester | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.6 | 8
Pd(OAc)2 |  | chemFormula | inorgmat | US5428149.pdf | Body | 0.5 | 8
oligonucleotides | Oligonucleotide | dict | polymers | US5428149.pdf | Body | 0.76 | 8
carbon | carbon | dict | chem | US5428149.pdf | Body | 0.65 | 7
nucleotides | nucleotides | dict | chem | US5428149.pdf | Body | 0.76 | 7
tin |  | n2sOpsin | chem | US5428149.pdf | Body | 0.07 | 7
2'-deoxyuridine | 2'-Deoxyuridine | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.76 | 6
PPh3 | Triphenylphosphine | dict | chem | US5428149.pdf | Body | 0.73 | 6
triphos phates | triphosphate group | dict | chemGroup | US5428149.pdf | Body | 0.55 | 6
disulfide | dihydrogen disulfide | dict | chem | US5428149.pdf | Body | 0.6 | 5
CARBON-CARBON | carbon carbon | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.6 | 5
alkenyl | alkenyl group | dict | chemGroup | US5428149.pdf | Body | 0.74 | 5
Organometallics | organometallic group | dict | chemGroup | US5428149.pdf | Body | 0.71 | 5
Acetone | acetone | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.68 | 4
acetonitrile | acetonitrile | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.6 | 4
Tributyltinhydride | Tributyltin hydride | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.73 | 4
triazole | 1,2,3-triazole | dict | chem | US5428149.pdf | Body | 0.6 | 4
ribose | D-ribofuranose | dict | chem | US5428149.pdf | Body | 0.71 | 4
vinyl | Vinyl radical | dict;n2sOpsin | chem | US5428149.pdf | Body | 0.63 | 4
pyrimidine nucleosides | pyrimidine nucleosides | dict | chem | US5428149.pdf | Body | 0.73 | 4
aryl | aryl group | dict | chemGroup | US5428149.pdf | Body | 0.70 | 4
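A result file with the columns listed above could be post-processed along the following lines. This is a sketch only: it assumes the file is tab-delimited (the slide does not state the delimiter) and that an empty prefName is stored as an empty field. The two sample rows are taken from the slide; `read_annotations` and `filter_confident` are hypothetical helper names.

```python
import csv
import io

# Two rows from the slide, re-encoded with tabs (the delimiter is an assumption).
SAMPLE = (
    "#name\tprefName\tnameSource\tdomain\tsource\tsourceSection\tconfidence\tmentionCount\n"
    "PPh3\tTriphenylphosphine\tdict\tchem\tUS5428149.pdf\tBody\t0.73\t6\n"
    "tin\t\tn2sOpsin\tchem\tUS5428149.pdf\tBody\t0.07\t7\n"
)


def read_annotations(text, delimiter="\t"):
    """Parse the result table into a list of dicts with typed fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    header = [column.lstrip("#") for column in next(reader)]
    rows = []
    for fields in reader:
        row = dict(zip(header, fields))
        row["confidence"] = float(row["confidence"])
        row["mentionCount"] = int(row["mentionCount"])
        # nameSource may list several sources separated by ';'
        row["nameSource"] = row["nameSource"].split(";")
        rows.append(row)
    return rows


def filter_confident(rows, threshold=0.5):
    """Keep only rows whose confidence reaches the threshold."""
    return [row for row in rows if row["confidence"] >= threshold]


if __name__ == "__main__":
    for row in filter_confident(read_annotations(SAMPLE)):
        print(row["name"], row["prefName"], row["confidence"])
```

Filtering on the confidence column is one simple way to drop doubtful hits such as "tin" (0.07); the threshold of 0.5 is an arbitrary illustration, not a value from the talk.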
25. Processing & Producing Results
Extracted Compounds (SMILES format)
26. Processing & Producing Results
XML Format with inline annotation
27. Processing & Producing Results
Creating Lucene index for search engines
28. Processing & Producing Results
GUI: BI Miner
• ontology based semantic searching: e.g.
“Steroids” (and all children)
30. Excerpts from our mission (Leitbild) statement
We are dedicated to serving people through researching diseases and developing new medications and treatment approaches.
In order to achieve our goals, we need to be both financially successful and open to new ideas and developments.
Research and development are of central importance to our future success.
We concentrate our efforts on diseases that are currently not able to be treated satisfactorily.
As an employer, we attract the best minds and promote diversity in the workplace.
Our organization is characterized by openness, innovation, collaboration, and mutual respect.
Corporate Standard Presentation 2017 – short version
Performance/Quantity vs. Quality
31. • Dialectical Materialism (based on Hegel's principle of negation)
The law of the transformation of quantity into quality and vice versa.
For our purposes, this can be expressed as: qualitative changes can only occur
through the quantitative addition or subtraction of matter or motion.
We did not run into the quantity / quality issue
Conclusions - Performance/Quantity vs. Quality
https://en.wikipedia.org/wiki/Dialectical_materialism
https://en.wikipedia.org/wiki/Georg_Wilhelm_Friedrich_Hegel
We could process more patents and
extract more information,
with better quality
33. • Process Improvement
– Optimization of more processes
– Transforming more prototypes into high-performance pipelines
• Graphs & Visualizations
– Implementing graphics and visualization for knowledge workers & data scientists
• Using the BI Pipeline Manager for data & content merging
– Joining and merging different Data Sources
– Collaboration Boehringer Ingelheim, Deep Search 9 & OntoChem
Outlook