AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insights. Wolfgang Thielemann (Bayer, Germany)

What if there were a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data were harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform, we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!

  1. Integrated Data Platform at Bayer: Turning bits into insights. Wolfgang Thielemann
  2. Agenda: What platform did we build? What does it look like? Why did we build it? Architecture and data enrichment. Challenges. Plans for the future. /// AI-SDV 2022 // Integrated Data Platform at Bayer
  3. Section 1: What platform did we build?
  4. Our platform semantically integrates terabytes of external scientific textual data to support insight generation along the R&D value chain.
  5. Big data platform. This platform is: • A semantically integrated and harmonized big data hub containing major external, text-rich, life-science-related data sources • Enriched with FAIR meta-data generated by using NLP to extract the key information (e.g., molecular targets, medical conditions, active ingredients, technologies) • An analysis-ready platform for end users (GUI access) and data scientists (API access)
  6. The users: scientific end users, data scientists, developers of digital products
  7. The users. End-user GUIs (more power and precision for scientific search): project leaders, R&D scientists, tech scouts & Co. Find relevant information: alerts, analysis, filter & review. Expert APIs (provide structured data for insight generation): data scientists, computational scientists, information professionals, bioinformaticians. Generate insights: find new targets & treatments, support pipeline decisions, build predictive models.
  8. Section 2: What does it look like?
  9. Example: Liver cancer. Google-like search interface
  10. Example: Liver cancer. Interactive analysis and filtering
  11. Example: Liver cancer. Result overview
  12. Example: Liver cancer. Record view
  13. Section 3: Why did we build it?
  14. Big data platform: 6 reasons why building it made and makes sense. Richness of data sources; flexibility; costs; scalability; FAIR meta-data; full transparency and control
  15. Why did we build it? 1. Breadth and richness of data sources. Scientific sources in our platform vs. platforms limited to publicly available data
  16. Why did we build it? 2. Maximum flexibility to analyze the data and to integrate it into our Bayer data ecosystem. Existing platforms often come with limited or pre-defined analysis options and limited integrability.
  17. Why did we build it? 3. Full scalability. Our platform is built on a scalable cloud infrastructure for big data analysis and allows you to analyze millions of records in one go.
  18. Why did we build it? 4. Costs. This platform allowed us to save money and reduce complexity by replacing various proprietary legacy platforms.
  19. Why did we build it? 5. One terminology across the entire content, with the option to adjust it to our needs. Individual sources and platforms typically have their own standards and terminologies; here, one terminology covers the entire platform.
  20. Why did we build it? 6. Comprehensiveness and quality of meta-data. Since we built on 20 years of thesauri and NLP algorithms optimized to Bayer’s needs, our terminologies cover the real-life use of science much better than established terminologies. MeSH: [figure]
  21. Why did we build it? 6. Comprehensiveness and quality of meta-data. Proprietary disease thesaurus: [figure]
  22. Section 4: Architecture & data enrichment
  23. Platform architecture. DATA: conference abstracts, literature abstracts, literature full texts, patents, patent chemistry, clinical trials, pipeline information, market reports, company websites, industry news, research grants, tech transfer offers. PROCESS: automated data acquisition (Kafka technology); data engineering: normalization, deduplication, classification, etc. (Kafka Streams); semantic enrichment: targets, organisms, sequences, drugs, active ingredients, companies/organizations, analytics, etc.; index, search, and API services (Elastic). DELIVER: APIs & data science; end-user products (cross-search GUI, advanced literature GUI, advanced patent GUI); system/application integrations (other proprietary platforms and workflows use this platform as a source).
  24. Semantic data integration at large: resolve all flavours of heterogeneity to make textual data FAIR. Structural heterogeneity: same facts expressed in different schemata; missing or additional attributes. Technical heterogeneity: data formats (JSON vs. XML), communication protocols (REST vs. ODBC), query languages (SQL vs. SPARQL). Data model heterogeneity: relational vs. semi-structured, tuples vs. graphs, … Syntactic heterogeneity: different presentation of the same fact (Unicode or ASCII, EUR or €, …). Semantic heterogeneity: the same concept is named differently (e.g., lung cancer ➢ pulmonary carcinoma ➢ neoplasm of the lung ➢ …); different concepts are named the same (e.g., GSK).
  25. Section 5: Challenges
  26. Challenges: Data ingestion. Heterogeneous formats. Heterogeneous update schedules: hourly, daily, weekly, monthly
  27. Challenges: Data ingestion. Changes in record structure. Changes in volume over time
  28. Challenges: Data ingestion. De-duplication
  29. Challenges: Semantic enrichment. Lack of a universally accepted identifier for an entity class. Human gene: NCBI Gene ID. Chemical compound: INN name, IUPAC, CAS-Nr, PubChem CID, canonical SMILES. Disease: MeSH ID, UMLS ID, Snomed ID, NCIT ID, Orphanet ID, Mondo ID, ICD-10 ID, MedDRA ID, DO ID, …
  30. Challenges: Semantic enrichment. Identifying different entities requires different technologies: ➢ Terminology-based NLP (e.g., disease names) ➢ ML-based NLP (e.g., for ambiguous acronyms like cell lines, gene acronyms, etc.) ➢ Rule/pattern-based extraction (e.g., IUPAC chemical names, gene mutations), example: “A lamp-snp assay detecting c580y mutation in pfkelch13 gene from clinically dried blood spot samples” ➢ Image/graph processing (e.g., image2mol), example: C1=CC=C(C(=C1)CC(=O)[O-])NC2=C(C=CC=C2Cl)Cl.[Na+]
  31. Section 6: Status quo & plans for the future
  32. Are we now living in a fairytale where everything is perfect?
  33. Are we now living in a fairytale where everything is perfect? There is still a lot to do… ➢ Terminology is constantly evolving (new companies, new technologies, etc.) ➢ Development of scalable algorithms for complex entities ➢ Finding the most relevant information in the ocean of data ➢ Advanced visualization and analytics ➢ Further standardization ➢ …
  34. What can you do to help us in our endeavour? Vendors / publishers / database producers: • Data quality • FAIRification • Using generally available standards & IDs • Consistency • Collecting scattered data • Harmonization
  35. Big data platform: plans for the future. Sources (e.g., drug labels, guidelines); usability; thesauri; automation (e.g., alerting); chemistry; analysis features
  36. Thank you! Special thanks to my colleagues on the team
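
Slide 24 describes semantic heterogeneity (the same concept appearing under different names), and slide 30 lists terminology-based NLP as one way to recognize disease names. The following is only a minimal sketch of that idea, not the platform's actual method: the `THESAURUS` dictionary, the concept IDs such as `DISEASE:0001`, and the function names are all hypothetical and invented for illustration.

```python
import re

# Hypothetical mini-thesaurus mapping synonyms to one concept ID.
# The IDs are made up for this sketch; they are not MeSH or Bayer IDs.
THESAURUS = {
    "pulmonary carcinoma": "DISEASE:0001",
    "neoplasm of the lung": "DISEASE:0001",
    "lung cancer": "DISEASE:0001",
    "liver cancer": "DISEASE:0002",
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def annotate(text):
    """Return the set of concept IDs whose synonyms occur in the text."""
    haystack = normalize(text)
    found = set()
    for term, concept_id in THESAURUS.items():
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", haystack):
            found.add(concept_id)
    return found


if __name__ == "__main__":
    print(sorted(annotate("Neoplasm of the  lung in adults")))
    print(sorted(annotate("Pulmonary carcinoma and liver cancer")))
```

Because every synonym resolves to the same concept ID, a search for one name can also retrieve records that use another. A real thesaurus would be far larger and would also need to handle ambiguous terms, which a plain dictionary lookup cannot do.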

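Slide 30 also names rule/pattern-based extraction for entities such as gene mutations and quotes a paper title containing "c580y". As a rough sketch of what such a rule could look like (an assumption for illustration, not the rules used in the platform), a regular expression can capture the common "reference residue, position, variant residue" notation for amino-acid substitutions. The names `MUTATION` and `extract_mutations` are hypothetical.

```python
import re

# Single-letter codes of the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"

# Reference residue, position, variant residue, e.g. "C580Y".
# Deliberately naive: it will also match unrelated tokens of the same
# shape, so a real rule set would need additional context checks.
MUTATION = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b", re.IGNORECASE)


def extract_mutations(text):
    """Return (reference, position, variant) tuples, upper-cased."""
    return [
        (m.group(1).upper(), int(m.group(2)), m.group(3).upper())
        for m in MUTATION.finditer(text)
    ]


if __name__ == "__main__":
    title = (
        "A lamp-snp assay detecting c580y mutation in pfkelch13 gene "
        "from clinically dried blood spot samples"
    )
    print(extract_mutations(title))  # [('C', 580, 'Y')]
```

The gene name "pfkelch13" is not picked up because it lacks a trailing residue letter and does not start at a word boundary, which shows why mutations and gene names call for different rules.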