Big Data in Practice
Creation of an Agile and Scalable Data Science Platform
to Increase Information Find-ability and Accessibility in
Research and Development
John Koch
Merck & Co.
2
Overview
• The Problem
• The Approach
• Information Flow Modeling
• Big Data in Practice
3
R&D decisions rely on high quality information to steer
programs and the pipeline
R&D Information Landscape
>27,000 entities and 70,000 relationships defined
• Knowledge Assets: “Target validation plan”
• Business Groups: “Early Development team”
• People: “John Smith”
• Information Types: “Clinical Trial Name”
• Organization Units: “Analytical Chemistry”
• Sources: “Electronic Lab Notebook”
• Business Processes: “Integrative assessment of liver toxicity”
• Activities: “Refine model”
• Roles: “Statistician”
• Decisions / Gateways: “Determine Patient Stratification Biomarkers”
• Capabilities: “Biomarker Validation”
• Feedback: Surveys, VoC
The volume and sophistication of internal information, and of that available through external sources, continue to grow at a rapid and accelerating rate.
The ability to readily find, access, and use information is absolutely critical.
4
The Problem
• 1000s of people
• 100s of teams
• 1000s of information types
• 1000s of repositories
• 100s of decisions
• 100,000s of knowledge assets
Figure labels: $, Information Complexity
Scannell et al. 2012, Nature Rev. Drug Disc. 11, 191
5
Knowledge | Information | Data
• Combine internal and external data
• Integrate & Analyze
• Present
• Decide
Culture of Single Use
6
Improving R&D Decision Making
Timeline: Today | Next 2-3 Years | Beyond
Axis labels: Information Management Maturity; Decision Making Quality; Decision Quality; Adoption, Maturity
Milestones:
• Culture of Single Use
• Fragmented tools, processes
• “Find & Access”
• IM Challenges Characterized
• Information Flows Modeled
• Effective Search
• Systematic categorization of data
• Vocabulary Management
• Embedded Stewardship
• Integrated Information Architecture
• Organization-Wide Information Re-Use
As knowledge workers understand and embrace improved information management practices, better decision making can be enabled by better access to information
Better Information Management → Better Decision Making: better analysis, more transparency and collaboration, better workflow management, faster decisions
7
Engaging the business: Focus Area Key Questions
Search framework (Morville & Callender, 2010, Search Patterns): Users, Interface (Query, Results), Engine, Content, Creators
1. What are the critical business processes? What major decisions are associated with those processes?
2. What information is required to make those decisions? Who needs that information? How do they use that information to make those decisions?
3. How is that information created? Who creates it? Where is that information stored?
4. How is that information accessed (searched for, found, displayed)?
5. What challenges are associated with accessing and using that information?
6. How can access to and use of that information be improved? What value will those improvements deliver to the business?
8
Improving Information Management requires capabilities to enhance information search, access, and architecture
Information Management Capabilities (Search, Access, Architecture)
IM Capability: Description
• Scientific Search: Search tools that enable users to locate scientific information across various sources, both structured and unstructured, in various formats and across functional groups
• Expertise Location: Capability for users to identify colleagues with specific skills, expertise, or tacit knowledge through a search tool and / or standardized profiles or tagging
• Access: System of access policies that prudently permits access to information and has clear procedures for granting or restricting access
• Data Stewardship: Shared practices for creating, storing, sharing, and maintaining explicit and tacit information
• Data Structuring: Organization of critical data sources to make them more conducive to search, retrieval, analysis and re-use through techniques including tagging and indexing
• Key Data Assets: Well-maintained record of critical information and data sources across the organization, including how the information is used or linked to other sources
9
ILLUSTRATIVE
Leaders in Search & Information Management
BioPharma and other industry players have demonstrated innovative, peer-leading Search, Access, and Architecture capabilities
• Indexing of complex hierarchical relationships from relational database tables
• Multi-faceted, interactive filtering of search results based on document metadata
• Implementing solutions for searching non-text information (e.g., enterprise video search)
• Advanced search analytics
• Integration with social media
• Highly scalable / extensible Service-Oriented Architecture
• Seamless information flow between departments / sites
• Includes a data services and exchange layer
• Reusable and configurable code modules
• Closed-loop data flow via integrated data sources across the product life cycle
• Consistent, personalized, real-time access for internal and external users
• Enterprise-wide technology to capture, create, and share knowledge / best practices
• Data stewardship standards and processes that ensure consistency of data quality, storage, and exchange
Capability Maturity Stages: 1 Basic, 2 Developing, 3 Functional, 4 Advanced, 5 World-class
Capabilities rated (Architecture, Search, Access): Open Access, Data Stewardship, Data Structuring, Key Data Assets, Scientific Search, Expertise Location
11
Enterprise Business Analysis and IT Resource planning tend to focus on organizational domains separately
• Clinical Development
• Consumer Care
• Research
• Formulation
• Safety
• Regulatory
• Manufacturing
12
“Google Street View” for information flows…
13
Merck
Analysts need a way to collaborate on mapping information flows
from different domains without explicit coordination
http://www.dwalls.com/Nature/Nature-World-Travel/Aerial+View+of+Downtown+Boston
14
Semantic Information Flow Modeling (sIFM) is a method of documenting and modeling the flow of information through an enterprise that allows both targeted and holistic analysis across the information continuum.
Merck domains shown: Sales & Marketing, MCC, Regulatory, R&D, Manufacturing
15
Model types of things (entities) and the types of relationships that we are trying to understand using a common semantic framework
Entity types: Data Sources, Organizations, Business Processes, Decisions, KM Artifacts, Initiatives / Projects, Use Cases, People, Roles, Information Types, Capabilities, Business Groups, Activities
16
Collaboration without Coordination
The use of an information modeling ontology allows multiple informatics and
business analysts to collaborate on the same model without explicit coordination
Analyst 1
• Compound Structure → ELN
• Program Biologist uses ELN
• Toxicologist uses ELN
Analyst 2
• Medicinal Chemist uses ChemCart
• Compound Structure → ChemCart
• Medicinal Chemist member-of Lead optimization team
Analyst 3
• Pharm Sci uses ELN
• Active Pharmaceutical Ingredient → ELN
Compound Structure ELN
17
Leveraging the Information Flow Modeling to enable
Search and Analytics
By encoding this knowledge in a searchable semantic knowledgebase, we can discover, on the fly, details about Merck’s information landscape that were previously difficult to uncover.
Example question: What are the types of information and data sources associated with Translational PK/PD Modeling?
Entity types in the model: Project, Information Types, Data Sources, KM Artifacts
Query pattern: Translational PK/PD Modeling, linked through “includes” and “flow” relationships to ?Information Types, ?KM Artifacts, and ?Data Sources
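The question above can be read as a small graph-pattern query: start from the project, follow "includes" to its KM artifacts, then follow "flow" edges back to information types and data sources. The sketch below shows that traversal over an in-memory list of triples. Every instance name in `TRIPLES`, the predicate name `flows-to`, and the direction of the flow edges are assumptions made for illustration, since the slide names only the entity types and the relationship labels.

```python
# Hypothetical triples; instance names are invented for illustration.
TRIPLES = [
    ("Translational PK/PD Modeling", "includes", "PK/PD model report"),
    ("Plasma concentration", "flows-to", "PK/PD model report"),
    ("Biomarker response", "flows-to", "PK/PD model report"),
    ("Electronic Lab Notebook", "flows-to", "Plasma concentration"),
    ("Assay database", "flows-to", "Biomarker response"),
    # An unrelated project, to show that the query filters correctly.
    ("Lead optimization", "includes", "Screening summary"),
    ("IC50", "flows-to", "Screening summary"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the model."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the model."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def landscape_for(project, triples=TRIPLES):
    """Collect KM artifacts, information types, and data sources for a project."""
    artifacts = objects(triples, project, "includes")
    info_types = set()
    for artifact in artifacts:
        info_types |= subjects(triples, "flows-to", artifact)
    sources = set()
    for info_type in info_types:
        sources |= subjects(triples, "flows-to", info_type)
    return {
        "km_artifacts": sorted(artifacts),
        "information_types": sorted(info_types),
        "data_sources": sorted(sources),
    }


print(landscape_for("Translational PK/PD Modeling"))
```

In a real semantic knowledgebase the same pattern would more likely be expressed in a graph query language; the plain-Python version only makes the two-hop traversal explicit.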
18
Within Life Sciences, there has been a lot of discussion about
the potential of Big Data
Personalized Medicine
Genomics
Evidence Based Medicine
Health Outcomes
Customer Insights
Patient Enrollment
Supply Chain Management
Predictive Modeling
Clinical Trial Monitoring
New Drug Discovery
Collaboration
Connected Health
Volume
Variety
Velocity
Revenue
Cost
19
Not a volume or velocity problem… yet, it’s about variety
Inputs: Internal, External
Pipeline stages: Target ID/Val, Lead ID, Lead Opt, Preclinical, Phase 1, Phase 2, Phase 3, Reg / Market
Phases: Discovery, Early Development, Late Dev, Market
Outputs: Innovative research & breakthrough therapeutics
“culture of single use”
20
Big Data tools are well suited to address this classic data integration problem
Scalable
• NoSQL can handle significant increases of data
• Many best-of-breed technologies are open-source products
• Strong compatibility with cloud infrastructures allows for rapid scale up/scale down
Agile
• Not Only SQL (NoSQL) enables agile access to data with lightweight, use-case driven models.
• Structured and unstructured data can be readily integrated
• Design and implementation are not dependent on the up-front, comprehensive knowledge of data
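The "Agile" points can be made concrete with a schema-light document store: records from different sources keep whatever fields they arrive with, and a search runs across all of them without a shared data model being designed first. The sketch below is a minimal in-memory stand-in, not the platform described in the deck. `DocumentStore`, the source names, and all record contents are hypothetical.

```python
class DocumentStore:
    """Minimal in-memory stand-in for a schema-light (NoSQL-style) store."""

    def __init__(self):
        self.docs = []

    def ingest(self, source, record):
        # No schema is declared: each record keeps its own fields,
        # plus a tag recording which source it came from.
        self.docs.append({"_source": source, **record})

    def search(self, term):
        # Case-insensitive substring match over every field value.
        needle = term.lower()
        return [
            doc for doc in self.docs
            if any(needle in str(value).lower()
                   for key, value in doc.items() if key != "_source")
        ]


store = DocumentStore()
# Hypothetical records: structured and unstructured, with different fields.
store.ingest("eln", {"title": "Synthesis of compound X-001", "author": "J. Smith"})
store.ingest("chem_registration", {"compound_id": "X-001", "mol_weight": 312.4})
store.ingest("documents", {"filename": "tox_report.pdf",
                           "text": "No findings for X-001 at low dose"})

hits = store.search("x-001")
sources = sorted(doc["_source"] for doc in hits)
print(sources)
```

Adding a fourth source with entirely new fields would require no change to `ingest` or `search`, which is the sense in which design does not depend on up-front, comprehensive knowledge of the data.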
21
Merck is using these tools and design patterns to create a
Data Science platform for research and development
R&D Information Landscape Big data platform
22
New design patterns allow rapid data integration into a scalable platform
Current Design Patterns
• Requirements (1-2 months): Data Types, Data Structure, Use Cases
• Design (2-3 months): Data Model, Data Mapping
• Development (4-6 months): Data Migration, Data Validation, Data Cleanup
• Overall: 1 year, increased cost, limited flexibility
New Design Patterns
• Requirements, Design, Development
• Overall: 4-5 months, agile, iterative
23
Enabling scientists by shifting the paradigm with Big Data tools and design patterns
Manual Assembly of Data
• Time spent finding: 75%
• Time spent analyzing: 25%
Integrated, Cross Data Set Analysis
• Time spent finding: 25%
• Time spent analyzing: 75%
Analysis → Insights
24
We began with a current problem facing Merck scientists…
… but kept the larger goal of changing the information management and access paradigm in mind.
Today: 53% unable to find and access information they need; $131M productivity gap
• Can Find & Access: 47%
• Cannot Find or Cannot Access: 53%
Tomorrow: -like Search Application
…the future:
… a platform capable of handling new and additional data (Volume, Velocity)
… and capable of providing more complex analytics
25
First “analytic” built on the data science platform = Search
• 2014: focus on 3000 discovery and preclinical
scientists
• 100’s of use cases prioritized
• 30+ features developed
• Ingested 8 of top 10 content sources
• Agile design and development pattern
82% of feature-by-feature
feedback has been positive
Electronic Lab Notebooks
Documents
Compound Information
Chemical Registration
26
MRL Search (MRL Data Science Platform)
28
Building on data and platform, rapidly built next analytic: QUICK (Quantitative Pharmacology Knowledgebase)
QUICK = a single, authoritative portal for access to definitive, integrated data sets of clinical and pre-clinical metabolism and in vivo pharmacology experimental results
Data capture + Stewardship + Analytics + Common platform
• Reduce time to collate definitive data sets by ~95%
• 50-75% increase in efficiency of analysis
• Improved cross-departmental collaboration through stewardship
29
Future = include more data and more analytics within Merck
Research Labs
… in addition, other business units are already leveraging this data
sciences platform… and we are poised for volume and velocity
The key to our current and future success has been the
development of a flexible & scalable technology platform…
Input: Problem Statement, Use case, User Story, Question, Pain Point…
1. Identify a Problem Statement (User Story)
2. Estimate Impact & Benefits (Prioritize)
3. Develop New Capability (Use Case(s), Feature(s))
4. Verify Impact & Benefits (Feedback, Refine)
5. Extend target user group (Referrals)
User feedback helps prioritize features
2 weeks
31
…combined with integrated, multi-disciplinary teams
People: engaged scientists, integrated agile team
Process: maturing process for data source access, ingestion and
feature development
Technology: extensible & flexible platform
Software
Engineers
Math &
Stats
Analysts
Domain
Experts
32
Acknowledgments
Merck IT - Scientific Information Architecture & Search
Merck IT – Informatics, Scientific Computing, Cloud COE
Merck Research Laboratories
Booz Allen Hamilton

Weitere ähnliche Inhalte

Was ist angesagt?

A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer ProteogenomicsA FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
Brett Tully
 
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014cVojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech Huser
 
Sharon Dawes (CTG Albany) Open data quality: a practical view
Sharon Dawes (CTG Albany) Open data quality: a practical viewSharon Dawes (CTG Albany) Open data quality: a practical view
Sharon Dawes (CTG Albany) Open data quality: a practical view
Open City Foundation
 

Was ist angesagt? (20)

DDOD for FOIA organizations
DDOD for FOIA organizationsDDOD for FOIA organizations
DDOD for FOIA organizations
 
Health data mining
Health data miningHealth data mining
Health data mining
 
Impact of DDOD on Data Quality - White House 2016
Impact of DDOD on Data Quality -  White House 2016Impact of DDOD on Data Quality -  White House 2016
Impact of DDOD on Data Quality - White House 2016
 
Data Science in Biomedicine - Where Are We Headed?
Data Science in Biomedicine - Where Are We Headed?Data Science in Biomedicine - Where Are We Headed?
Data Science in Biomedicine - Where Are We Headed?
 
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer ProteogenomicsA FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
 
Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...Data Governance in two different data archives: When is a federal data reposi...
Data Governance in two different data archives: When is a federal data reposi...
 
Industry Uses of HHS Data
Industry Uses of HHS DataIndustry Uses of HHS Data
Industry Uses of HHS Data
 
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
IMPACT OF DIFFERENT SELECTION STRATEGIES ON PERFORMANCE OF GA BASED INFORMATI...
 
effective data sharing for a learning healthcare system
effective data sharing for a learning healthcare systemeffective data sharing for a learning healthcare system
effective data sharing for a learning healthcare system
 
Sharing and standards christopher hart - clinical innovation and partnering...
Sharing and standards   christopher hart - clinical innovation and partnering...Sharing and standards   christopher hart - clinical innovation and partnering...
Sharing and standards christopher hart - clinical innovation and partnering...
 
Access Lab 2020: Context aware unified institutional knowledge services
Access Lab 2020: Context aware unified institutional knowledge servicesAccess Lab 2020: Context aware unified institutional knowledge services
Access Lab 2020: Context aware unified institutional knowledge services
 
Digital transformation to enable a FAIR approach for health data science
Digital transformation to enable a FAIR approach for health data scienceDigital transformation to enable a FAIR approach for health data science
Digital transformation to enable a FAIR approach for health data science
 
How BrackenData Leverages Data on Over 250,000 Clinical Trials
How BrackenData Leverages Data on Over 250,000 Clinical TrialsHow BrackenData Leverages Data on Over 250,000 Clinical Trials
How BrackenData Leverages Data on Over 250,000 Clinical Trials
 
Low Hanging Fruit Breakout Discussion #2
Low Hanging Fruit Breakout Discussion #2 Low Hanging Fruit Breakout Discussion #2
Low Hanging Fruit Breakout Discussion #2
 
Developing A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product DataDeveloping A Universal Approach to Cleansing Customer and Product Data
Developing A Universal Approach to Cleansing Customer and Product Data
 
Ez36937941
Ez36937941Ez36937941
Ez36937941
 
Martone grethe
Martone gretheMartone grethe
Martone grethe
 
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014cVojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
Vojtech huser-data-warehouse-evaluation-2010-04-idr-snapshot014c
 
Sharon Dawes (CTG Albany) Open data quality: a practical view
Sharon Dawes (CTG Albany) Open data quality: a practical viewSharon Dawes (CTG Albany) Open data quality: a practical view
Sharon Dawes (CTG Albany) Open data quality: a practical view
 
Data management plan (important components and best practices) final v 1.0
Data management plan (important components and best practices) final v 1.0Data management plan (important components and best practices) final v 1.0
Data management plan (important components and best practices) final v 1.0
 

Ähnlich wie BigDataInPractice_EXLPHARMA_KOCH

SIAS Bio-IT Conference_FINAL
SIAS Bio-IT Conference_FINALSIAS Bio-IT Conference_FINAL
SIAS Bio-IT Conference_FINAL
John Koch
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINAL
John Koch
 
Choosing an Analytics Solution in Healthcare
Choosing an Analytics Solution in HealthcareChoosing an Analytics Solution in Healthcare
Choosing an Analytics Solution in Healthcare
Dale Sanders
 
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Aridhia Informatics Ltd
 

Ähnlich wie BigDataInPractice_EXLPHARMA_KOCH (20)

SIAS Bio-IT Conference_FINAL
SIAS Bio-IT Conference_FINALSIAS Bio-IT Conference_FINAL
SIAS Bio-IT Conference_FINAL
 
BigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINALBigDataAnalytics_Talk_KOCH_FINAL
BigDataAnalytics_Talk_KOCH_FINAL
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
The Role of Data Lakes in Healthcare
The Role of Data Lakes in HealthcareThe Role of Data Lakes in Healthcare
The Role of Data Lakes in Healthcare
 
How to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in PharmaHow to Create a Big Data Culture in Pharma
How to Create a Big Data Culture in Pharma
 
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
Rubbish in Rubbish out: applying good data governance techniques to gain maxi...
 
Nicolson
NicolsonNicolson
Nicolson
 
What Are the Challenges and Opportunities in Big Data Analytics.pdf
What Are the Challenges and Opportunities in Big Data Analytics.pdfWhat Are the Challenges and Opportunities in Big Data Analytics.pdf
What Are the Challenges and Opportunities in Big Data Analytics.pdf
 
A Big Picture in Research Data Management
A Big Picture in Research Data ManagementA Big Picture in Research Data Management
A Big Picture in Research Data Management
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
 
dissertation proposal writing service
dissertation proposal writing servicedissertation proposal writing service
dissertation proposal writing service
 
Choosing an Analytics Solution in Healthcare
Choosing an Analytics Solution in HealthcareChoosing an Analytics Solution in Healthcare
Choosing an Analytics Solution in Healthcare
 
Challenges in Clinical Research: Aridhia Disrupts Technology Approach to Rese...
Challenges in Clinical Research: Aridhia Disrupts Technology Approach to Rese...Challenges in Clinical Research: Aridhia Disrupts Technology Approach to Rese...
Challenges in Clinical Research: Aridhia Disrupts Technology Approach to Rese...
 
The Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for EnterprisesThe Role of Community-Driven Data Curation for Enterprises
The Role of Community-Driven Data Curation for Enterprises
 
The future of scholarly publishing under digital transformation data, ai an...
The future of scholarly publishing under digital transformation   data, ai an...The future of scholarly publishing under digital transformation   data, ai an...
The future of scholarly publishing under digital transformation data, ai an...
 
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
Challenges in Clinical Research: Aridhia's Disruptive Technology Approach to ...
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Yale Day of Data
Yale Day of Data Yale Day of Data
Yale Day of Data
 
Clinical research innovation hub walking deck v12
Clinical research innovation hub walking deck v12Clinical research innovation hub walking deck v12
Clinical research innovation hub walking deck v12
 

BigDataInPractice_EXLPHARMA_KOCH

  • 1. Big Data in Practice Creation of an Agile and Scalable Data Science Platform to Increase Information Find-ability and Accessibility in Research and Development John Koch Merck & Co.
  • 2. 2 Overview • The Problem • The Approach • Information Flow Modeling • Big Data in Practice
  • 3. 3 R&D decisions rely on high quality information to steer programs and the pipeline Knowledge Assets “Target validation plan” Business Groups “Early Development team” People “John Smith” Information Types “Clinical Trial Name” Organization Units “Analytical Chemistry” Sources “Electronic Lab Notebook” Business Processes “Integrative assessment of liver toxicity” Activities “Refine model” Roles “Statistician” Decisions/ Gateways “Determine Patient Stratification Biomarkers” R&D Information Landscape >27,000 entities and 70,000 relationships defined The volume and sophistication of internal information and that available through external sources continues to grow at a rapid and accelerating rate The ability to readily find, access, and use information is absolutely critical Capabilities “Biomarker Validation” Feedback Surveys, VoC
  • 5. 5 KnowledgeInformationData Combine internal and external data Integrate & Analyze Present Decide Culture of Single Use
  • 6. 6 5 Today Next 2-3 Years Beyond Culture of Single Use “Find & Access” DecisionMaking Quality Vocabulary Management Embedded Stewardship Information Flows Modeled Effective Search Integrated Information Architecture IM Challenges Characterized Fragmented tools, processes Systematic categorization of data Information ManagementMaturity As knowledge workers understand and embrace improved information management practices, better decision making can be enabled by better access to information Organization-Wide Information Re-Use ? Better Information Management  Better Decision Making: Better analysis, more transparency and collaboration, better workflow management, faster decisions DecisionQualityAdoption,Maturity Improving R&D Decision Making Information Flows Modeled
  • 7. 7 5 Engaging the business: Focus Area Key Questions User Interface Engine Content Creators Creators ContentEngineQuery Results Interface What information is required to make those decisions? Who needs that information? How do they use that information used to make those decisions?2 What are the critical business processes? What major decisions are associated with those processes?1 How is that information created? Who creates it? Where is that information stored?3 How is that information accessed (searched for, found, displayed)?4 What challenges are associated with accessing and using that information?5 How can access to and use of that information be improved? What value will those improvements deliver to the business? 6 Users Morville & Callendar. 2010 Search Patterns
  • 8. 8 Information Management CapabilitiesArchitectureSearchAccess IM Capabilities Description Search tools that enable users to locate scientific information across various sources, both structured and unstructured, in various formats and across functional groups Capability for users to identify colleagues with specific skills, expertise, or tacit knowledge through a search tool and / or standardized profiles or tagging System of access policies that prudently permits access to information and has clear procedures for granting or restricting access Shared practices for creating, storing, sharing, and maintaining explicit and tacit information Organization of critical data sources to make them more conducive to search, retrieval, analysis and re-use through techniques including tagging and indexing Well-maintained record of critical information and data sources across the organization, including how the information is used or linked to other sources Improving Information Management requires capabilities to enhance information search, access, and architecture Expertise Location Access Data Stewardship Data Structuring Key Data Assets Scientific Search
  • 9. 9 ILLUSTRATIVE Leaders in Search & Information Management  Indexing of complex hierarchical relationships from relational database tables  Multi-faceted, interactive filtering of search results based on document metadata  Implementing solutions for searching non-text information (e.g., enterprise video search)  Advanced search analytics  Integration with social media  Highly scalable / extensible Service-Oriented Architecture  Seamless information flow between departments / sites  Includes a data services and exchange layer  Reusable and configurable code modules  Closed-loop data flow via integrated data sources across the product life cycle  Consistent, personalized, real-time access for internal and external users  Enterprise-wide technology to capture, create, and share knowledge / best practices  Data stewardship standards and processes that ensure consistency of data quality, storage, and exchange BioPharma and other industry players have demonstrated innovative, peer-leading Search, Access, and Architecture capabilities Capability Maturity Stages Basic Developing Functional Advanced World-class 1 2 3 4 5 Open Access Data Stewardship Data Structuring Key Data Assets Scientific Search Expertise Location ArchitectureSearch Access
  • 10. 10 5 Today Next 2-3 Years Beyond Culture of Single Use “Find & Access” DecisionMaking Quality Vocabulary Management Embedded Stewardship Information Flows Modeled Effective Search Integrated Information Architecture IM Challenges Characterized Fragmented tools, processes Systematic categorization of data Information ManagementMaturity As knowledge workers understand and embrace improved information management practices, better decision making can be enabled by better access to information Organization-Wide Information Re-Use ? Better Information Management  Better Decision Making: Better analysis, more transparency and collaboration, better workflow management, faster decisions DecisionQualityAdoption,Maturity Improving R&D Decision Making Information Flows Modeled
  • 11. 11 Clinical Development Consumer Care Research Formulation Safety Regulatory Manufacturing Enterprise Business Analysis and IT Resource planning tends to focus on organizational domains separately
  • 12. 12 “Google Street View” for information flows…
  • 13. 13 Merck Analysts need a way to collaborate on mapping information flows from different domains without explicit coordination http://www.dwalls.com/Nature/Nature-World-Travel/Aerial+View+of+Downtown+Boston
  • 14. 14 Is a method of documenting and modeling the flow of information through an enterprise that allows both targeted and holistic analysis across the information continuum. Sales & Marketing MCC •Regulatory R&D Manufacturing Merck Semantic Information Flow Modeling (sIFM)… Regulatory
  • 15. 15 Data Sources Organizations Business Processes Decisions KM Artifacts Initiatives / Projects Use Cases People Roles Information Types Capabilities Business Groups Activities Model types of things (entities) and the types of relationships that we are trying to understand using a common semantic framework
  • 16. 16 Collaboration without Coordination The use of an information modeling ontology allows multiple informatics and business analysts to collaborate on the same model without explicit coordination Analyst 1 Analyst 2 Analyst 3 Compound structure  ELN Medicinal Chemist uses ChemCart Pharm Sci uses ELN Program Biologist uses ELN Compound Structure  ChemCart Active Pharmaceutical Ingredient  ELN Toxicologist uses ELN Medicinal Chemist member-of Lead optimization team Compound Structure ELN
  • 17. 17 Leveraging the Information Flow Modeling to enable Search and Analytics By encoding this knowledge in a searchable semantic knowledgebase, we can discover details about Merck’s information landscape on the fly, that were previously difficult to uncover. Project Information Types Data Sources KM Artifacts Translational PK/PD Modeling ?Information Types ?KM Artifacts ?Data Sources includes flow flow What are the types of information and data sources associated with Translational PKPD Modeling?
  • 18. Within Life Sciences, there has been a lot of discussion about the potential of Big Data. Application areas: Personalized Medicine, Genomics, Evidence Based Medicine, Health Outcomes, Customer Insights, Patient Enrollment, Supply Chain Management, Predictive Modeling, Clinical Trial Monitoring, New Drug Discovery, Collaboration, Connected Health. Dimensions: Volume, Variety, Velocity. Business impact: Revenue, Cost.
  • 19. Not a volume or velocity problem… yet; it’s about variety. Inputs: Internal, External. Pipeline stages: Target ID/Val, Lead ID, Lead Opt, Preclinical, Phase 1, Phase 2, Phase 3, Reg / Market, grouped as Discovery, Early Development, Late Dev, and Market. Outputs: innovative research & breakthrough therapeutics. Obstacle: the “culture of single use”.
  • 20. Big Data tools are well suited to address this classic data integration problem. Scalable: NoSQL can handle significant increases of data; many best-of-breed technologies are open source products; strong compatibility with cloud infrastructures allows for rapid scale up/scale down. Agile: Not Only SQL (NoSQL) enables agile access to data with lightweight, use-case driven models; structured and unstructured data can be readily integrated; design and implementation are not dependent on up-front, comprehensive knowledge of the data.
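The "Agile" points on slide 20 describe what is often called schema-on-read: records are stored as they arrive, and each use case projects out only the fields it needs at query time. The sketch below shows that pattern with plain JSON documents and the standard library. It is an illustration under stated assumptions, not the platform's design: the record fields, the compound identifier `C-001`, and the `search_view` and `sources_for_compound` helpers are all hypothetical.

```python
import json

# Hypothetical raw records from three sources, stored as-is with no shared schema.
RAW_RECORDS = [
    '{"source": "ELN", "compound_id": "C-001", "page": 12, "text": "assay notes"}',
    '{"source": "Chemical Registration", "compound_id": "C-001", "structure": "CCO"}',
    '{"source": "Documents", "title": "Study report", "body": "summary text"}',
]


def search_view(raw: str) -> dict:
    """Use-case driven model for search: keep only what search needs.

    Missing fields are tolerated, so a new source can be ingested
    before anyone has modeled it comprehensively.
    """
    doc = json.loads(raw)
    return {
        "source": doc.get("source", "unknown"),
        "compound_id": doc.get("compound_id"),
        "text": doc.get("text") or doc.get("body") or "",
    }


def sources_for_compound(records, compound_id):
    """List the sources holding any record for the given compound."""
    views = (search_view(r) for r in records)
    return [v["source"] for v in views if v["compound_id"] == compound_id]


print(sources_for_compound(RAW_RECORDS, "C-001"))
```

A second use case, such as an analytics feature, would define its own view function over the same raw records instead of forcing a change to a shared data model.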
  • 21. Merck is using these tools and design patterns to create a Data Science platform for research and development. Diagram labels: R&D Information Landscape; Big data platform.
  • 22. New design patterns allow rapid data integration into a scalable platform. Current Design Patterns: Requirements, 1-2 months (Data Types, Data Structure, Use Cases); Design, 2-3 months (Data Model, Data Mapping); Development, 4-6 months (Data Migration, Data Validation, Data Cleanup); overall 1 year, increased cost, limited flexibility. New Design Patterns: Requirements, Design, and Development; overall 4-5 months, agile, iterative.
  • 23. Enabling scientists by shifting the paradigm with Big Data tools and design patterns. Manual Assembly of Data: time spent finding 75%, time spent analyzing 25%. Integrated, Cross Data Set Analysis: time spent finding 25%, time spent analyzing 75%, leading to insights.
  • 24. We began with a current problem facing Merck scientists… but kept the larger goal of changing the information management and access paradigm in mind. Today: 53% unable to find and access information they need (Can Find & Access 47%; Cannot Find or Cannot Access 53%); $131M productivity gap; -like Search Application. Tomorrow… the future: a platform capable of handling new and additional data (Volume, Velocity)… and capable of providing more complex analytics.
  • 25. First “analytic” built on the data science platform = Search. 2014: focus on 3,000 discovery and preclinical scientists; 100s of use cases prioritized; 30+ features developed; ingested 8 of top 10 content sources; agile design and development pattern. 82% of feature-by-feature feedback has been positive. Content sources shown: Electronic Lab Notebooks, Documents, Compound Information, Chemical Registration.
  • 26. MRL Search (MRL Data Science Platform)
  • 27. Leveraging the Information Flow Modeling to enable Search and Analytics: by encoding this knowledge in a searchable semantic knowledgebase, we can discover details about Merck’s information landscape on the fly that were previously difficult to uncover. Example question: What are the types of information and data sources associated with Translational PK/PD Modeling? Query pattern: the Project (Translational PK/PD Modeling) "includes" ?Information Types, which are linked by "flow" relationships to ?KM Artifacts and ?Data Sources.
  • 28. Building on the data and platform, we rapidly built the next analytic: QUICK (Quantitative Pharmacology Knowledgebase). QUICK = a single, authoritative portal for access to definitive, integrated data sets of clinical and pre-clinical metabolism and in vivo pharmacology experimental results. Components: Data capture + Stewardship + Common platform + Analytics. Benefits: reduce time to collate definitive data sets by ~95%; 50-75% increase in efficiency of analysis; improved cross-departmental collaboration through stewardship.
  • 29. Future = include more data and more analytics within Merck Research Labs… In addition, other business units are already leveraging this data science platform… and we are poised for volume and velocity.
  • 30. The key to our current and future success has been the development of a flexible & scalable technology platform… Inputs: Problem Statement, Use Case, User Story, Question, Pain Point… Cycle steps: Identify a Problem Statement (User Story); Estimate Impact & Benefits (Prioritize); Develop New Capability (Use Case(s), Feature(s)); Verify Impact & Benefits (Feedback, Refine); Extend target user group (Referrals). User feedback helps prioritize features. Other diagram labels: “No”; “2 weeks”.
  • 31. …combined with integrated, multi-disciplinary teams. People: engaged scientists, integrated agile team. Process: maturing process for data source access, ingestion and feature development. Technology: extensible & flexible platform. Team composition: Software Engineers, Math & Stats Analysts, Domain Experts.
  • 32. Acknowledgments: Merck IT – Scientific Information Architecture & Search; Merck IT – Informatics, Scientific Computing, Cloud COE; Merck Research Laboratories; Booz Allen Hamilton.