How Machine Learning and AI
can support the fight against COVID-19
Francesca Lazzeri, PhD
Principal Cloud Advocate Manager, Microsoft
@frlazzeri
Dmitry Soshnikov, PhD
Senior Cloud Advocate, Microsoft
@shwars
Problem
Around 30,000 scientific papers related to COVID appear monthly
CORD Papers Dataset
Data Source
https://allenai.org/data/cord-19
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
CORD-19 Dataset
• Contains over 400,000 scholarly articles about COVID-19 and the coronavirus family of viruses, for use by the global research community
• 200,000 articles with full text
Natural Language Processing
Common tasks for NLP:
• Intent Classification
• Named Entity Recognition (NER)
• Keyword Extraction
• Text Summarization
• Question Answering
• Open Domain Question Answering
Language Models:
• Recurrent Neural Network (LSTM, GRU)
• Transformers
• GPT-2
• BERT
• Microsoft Turing-NLG
• GPT-3
Microsoft Learn Module:
Introduction to NLP with PyTorch
aka.ms/pytorch_nlp
docs.microsoft.com/en-us/learn/paths/pytorch-fundamentals/
How BERT Works (Simplified)
Masked Language Model + Next Sentence Prediction
During holidays, I like to ______ with my dog. It is so cute.
0.85 Play
0.05 Sleep
0.09 Fight
0.80 YES
0.20 NO
BERT-large contains about 340 million parameters, which makes it very difficult to train from scratch. In most cases it makes sense to use a pre-trained language model.
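The word probabilities on this slide can be pictured as a softmax over per-word scores. The sketch below is a toy illustration only: the scores are made up and the `softmax` helper is hypothetical, not BERT's actual output layer.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the blank in "I like to ______ with my dog."
logits = {"play": 4.0, "fight": 1.8, "sleep": 1.2}
probs = softmax(logits)

# The model's prediction is the candidate with the highest probability.
best = max(probs, key=probs.get)
print(best, round(probs[best], 2))
```

A real masked language model computes such a distribution over its whole vocabulary rather than three hand-picked words.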
Main Idea
Use NLP tools to extract semi-structured data from papers, to enable
semantically rich queries over the paper corpus.
[Diagram components: CORD Corpus; Text Analytics for Health (NER, Relations); Extracted JSON; Cosmos DB Database; Power BI Dashboard; SQL Queries; Azure Semantic Search]
Part 1: Extracting Entities and Relations
Dataset
Kaggle Medical NER:
• ~40 papers
• ~300 entities
Generic BC5CDR Dataset:
• 1500 papers
• 5000 entities
• Disease / Chemical
Base Language Model
• Generic BERT model
• Pre-training BERT on medical texts
• PubMedBERT, a pre-trained model by Microsoft Research
Hugging Face Transformers library: https://huggingface.co/
6794356|t|Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn
infant.
6794356|a|A newborn with massive tricuspid regurgitation, atrial flutter, congestive
heart failure, and a high serum lithium level is described. This is the first patient
to initially manifest tricuspid regurgitation and atrial flutter, and the 11th
described patient with cardiac disease among infants exposed to lithium compounds in
the first trimester of pregnancy. Sixty-three percent of these infants had tricuspid
valve involvement. Lithium carbonate may be a factor in the increasing incidence of
congenital heart disease when taken during early pregnancy. It also causes neurologic
depression, cyanosis, and cardiac arrhythmia when consumed prior to delivery.
6794356 0 29 Tricuspid valve regurgitation Disease D014262
6794356 34 51 lithium carbonate Chemical D016651
6794356 52 60 toxicity Disease D064420
6794356 105 128 tricuspid regurgitation Disease D014262
6794356 130 144 atrial flutter Disease D001282
6794356 146 170 congestive heart failure Disease D006333
6794356 189 196 lithium Chemical D008094
6794356 265 288 tricuspid regurgitation Disease D014262
6794356 293 307 atrial flutter Disease D001282
6794356 345 360 cardiac disease Disease D006331
6794356 386 393 lithium Chemical D008094
6794356 511 528 Lithium carbonate Chemical D016651
6794356 576 600 congenital heart disease Disease D006331
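The annotation lines above can be read back into structured records. This is a minimal sketch, assuming the fields are tab-separated in the original file (as is usual for this PubTator-style format; the tabs are lost in the slide text). The `Annotation` type and `parse_annotation` helper are hypothetical names.

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    pmid: str
    start: int
    end: int
    text: str
    label: str
    concept_id: str

def parse_annotation(line: str) -> Annotation:
    """Parse one tab-separated annotation line into a typed record."""
    pmid, start, end, text, label, concept_id = line.rstrip("\n").split("\t")
    return Annotation(pmid, int(start), int(end), text, label, concept_id)

title = ("Tricuspid valve regurgitation and lithium carbonate toxicity "
         "in a newborn infant.")
lines = [
    "6794356\t0\t29\tTricuspid valve regurgitation\tDisease\tD014262",
    "6794356\t34\t51\tlithium carbonate\tChemical\tD016651",
    "6794356\t52\t60\ttoxicity\tDisease\tD064420",
]
annotations = [parse_annotation(line) for line in lines]

# Offsets are character positions, so slicing the text recovers each mention.
mentions = [title[a.start:a.end] for a in annotations]
print(mentions)
```

Checking that each slice equals the annotated text is a cheap way to catch offset errors before training.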
NER as Token Classification
Tricuspid valve regurgitation and lithium
carbonate toxicity in a newborn infant.
Tricuspid B-DIS
valve I-DIS
regurgitation I-DIS
and O
lithium B-CHEM
carbonate I-CHEM
toxicity B-DIS
in O
a O
newborn O
infant O
. O
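Going the other way, per-token tags can be grouped back into entity mentions. A minimal sketch, assuming whitespace-separated tokens and the B-/I-/O scheme shown on the slide; `bio_to_spans` is a hypothetical helper name.

```python
def bio_to_spans(tokens, tags):
    """Group B-/I- tagged tokens into (entity_text, label) pairs."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; flush the open one first.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # "O" (or a stray I- tag) closes any open entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans

tokens = ("Tricuspid valve regurgitation and lithium carbonate "
          "toxicity in a newborn infant .").split()
tags = ["B-DIS", "I-DIS", "I-DIS", "O", "B-CHEM", "I-CHEM",
        "B-DIS", "O", "O", "O", "O", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```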
PubMedBert, Microsoft Research
from transformers import (
    AutoTokenizer,
    BertForTokenClassification,
    Trainer,
)

# PubMedBERT checkpoint on the Hugging Face hub
mname = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
tokenizer = AutoTokenizer.from_pretrained(mname)

# unique_tags: the distinct BIO labels found in the training data
model = BertForTokenClassification.from_pretrained(
    mname, num_labels=len(unique_tags))

# training_args, train_dataset and val_dataset are prepared elsewhere
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset)
trainer.train()
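The training code relies on `unique_tags` to size the classification head. One possible way to build it, together with the label-to-id mappings, is sketched below; the sample tag sequences and the names `tag2id`, `id2tag` and `encoded` are illustrative assumptions, not taken from the talk.

```python
# Hypothetical tag sequences, one list per sentence.
tagged_sentences = [
    ["B-DIS", "I-DIS", "I-DIS", "O", "B-CHEM", "I-CHEM", "B-DIS", "O"],
    ["O", "B-CHEM", "O"],
]

# Sorting makes the label ids stable from run to run.
unique_tags = sorted({tag for sent in tagged_sentences for tag in sent})
tag2id = {tag: i for i, tag in enumerate(unique_tags)}
id2tag = {i: tag for tag, i in tag2id.items()}

# Integer labels in the form a token classifier is trained on.
encoded = [[tag2id[tag] for tag in sent] for sent in tagged_sentences]
print(unique_tags, encoded[1])
```

With these mappings, `len(unique_tags)` is the value passed as `num_labels`.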
Azure Machine Learning
Enterprise grade service to build and deploy models at scale
• Notebooks, Automated ML, UX Designer
• Reproducibility, Automation, Deployment, Re-training
• CPU, GPU, FPGAs, IoT Edge
Training NER Model Using PubMedBert on Azure ML
Describe Dataset (data_bc5cdr.yml):
    name: bc5cdr
    version: 1
    local_path: BC5_data.txt
Upload to Azure ML:
    $ az ml data create -f data_bc5cdr.yml
Describe Environment (transformers-env.yml):
    name: transformers-env
    version: 1
    docker:
      image: mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
    conda_file:
      file: ./transformers_conda.yml
Conda dependencies (transformers_conda.yml):
    channels:
      - pytorch
    dependencies:
      - python=3.8
      - pytorch
      - pip
      - pip:
        - transformers
Create the environment:
    $ az ml environment create -f transformers-env.yml
Training NER Model Using PubMedBert on Azure ML
Describe Experiment (job.yml):
    experiment_name: nertrain
    code:
      local_path: .
    command: >-
      python train.py --data {inputs.corpus}
    environment: azureml:transformers-env:1
    compute:
      target: azureml:AzMLGPUCompute
    inputs:
      corpus:
        data: azureml:bc5cdr:1
        mode: download
Create Compute:
    $ az ml compute create -n AzMLGPUCompute --size Standard_NC6 --max-node-count 2
Submit Job:
    $ az ml job create -f job.yml
Result
• COVID-19 is not recognized, because the dataset is old
• Some other categories would be helpful (pharmacokinetics, biologic fluids, etc.)
• Common entities are also needed (quantity, temperature, etc.)
Get trained model:
    $ az ml job download -n $ID --outputs
Text Analytics for Health (Preview)
• Currently in Preview
• Gated service, need to apply for usage (apply at https://aka.ms/csgate)
• Should not be implemented or deployed in any production use
• Can be used through Web API or Container Service
• Supports:
    • Named Entity Recognition (NER)
    • Relation Extraction
    • Entity Linking (Ontology Mapping)
    • Negation Detection
Entity Extraction
+ Entity Linking, Negation Detection
Relation Extraction
Using Text Analytics for Health
Pip install the Azure Text Analytics SDK:
    pip install azure-ai-textanalytics==5.1.0b5
Create the client:
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import TextAnalyticsClient
    # endpoint and key come from your Text Analytics resource
    client = TextAnalyticsClient(endpoint=endpoint,
        credential=AzureKeyCredential(key), api_version="v3.1-preview.3")
Do the call:
    documents = ["I have not been administered any aspirin, just 300 mg or favipiravir daily."]
    poller = client.begin_analyze_healthcare_entities(documents)
    result = poller.result()
Analysis Result
Input: I have not been administered any aspirin, just 300 mg or favipiravir daily.
HealthcareEntity(text=300 mg, category=Dosage, subcategory=None, length=6, offset=47, confidence_score=1.0,
data_sources=None, related_entities={HealthcareEntity(text=favipiravir, category=MedicationName, subcategory=None, length=11, offset=57,
confidence_score=1.0, data_sources=[HealthcareEntityDataSource(entity_id=C1138226, name=UMLS), HealthcareEntityDataSource(entity_id=J05AX27,
name=ATC), HealthcareEntityDataSource(entity_id=DB12466, name=DRUGBANK), HealthcareEntityDataSource(entity_id=398131, name=MEDCIN),
HealthcareEntityDataSource(entity_id=C462182, name=MSH), HealthcareEntityDataSource(entity_id=C81605, name=NCI),
HealthcareEntityDataSource(entity_id=EW5GL2X7E0, name=NCI_FDA)], related_entities={}): 'DosageOfMedication'})
aspirin (C0004057) [MedicationName]
300 mg [Dosage] --DosageOfMedication--> favipiravir (C1138226) [MedicationName]
favipiravir (C1138226) [MedicationName]
daily [Frequency] --FrequencyOfMedication--> favipiravir (C1138226) [MedicationName]
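The compact listing above can be produced from entity records with a few lines of formatting code. The sketch below uses plain dictionaries with hypothetical field names (`umls`, `related`) rather than the SDK's own result objects, so it only illustrates the formatting step.

```python
def describe(entity):
    """Render one entity, with its UMLS id when one is known."""
    umls = f" ({entity['umls']})" if entity.get("umls") else ""
    return f"{entity['text']}{umls} [{entity['category']}]"

# Hand-written records mirroring the example sentence (not SDK output).
favipiravir = {"text": "favipiravir", "category": "MedicationName",
               "umls": "C1138226"}
entities = [
    {"text": "aspirin", "category": "MedicationName", "umls": "C0004057"},
    {"text": "300 mg", "category": "Dosage",
     "related": [("DosageOfMedication", favipiravir)]},
    favipiravir,
    {"text": "daily", "category": "Frequency",
     "related": [("FrequencyOfMedication", favipiravir)]},
]

lines = []
for entity in entities:
    line = describe(entity)
    # Append one arrow per relation that starts at this entity.
    for relation, target in entity.get("related", []):
        line += f" --{relation}--> {describe(target)}"
    lines.append(line)
print("\n".join(lines))
```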
Analyzing CORD Abstracts
• All abstracts are contained in the CSV metadata file
    • Id, Title, Journal, Authors, Publication Date
• Split 400k papers into chunks of 500
    • Shuffle by date in order to get a representative sample in each chunk
• Enrich each JSON file with Text Analytics data
    • Entities, Relations
• Parallel processing using Azure ML
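The chunking step can be sketched as follows. This assumes the shuffle is a plain random shuffle, so that each chunk mixes papers from all publication dates; the `make_chunks` helper and the synthetic paper ids are hypothetical.

```python
import random

def make_chunks(records, chunk_size, seed=0):
    """Shuffle records, then cut them into lists of at most chunk_size."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs repeatable
    return [shuffled[i:i + chunk_size]
            for i in range(0, len(shuffled), chunk_size)]

# Synthetic ids standing in for rows of the metadata file.
paper_ids = [f"paper-{n}" for n in range(1200)]
chunks = make_chunks(paper_ids, chunk_size=500)
print([len(chunk) for chunk in chunks])
```

Each chunk can then be enriched and stored on its own, which is what makes the work easy to spread across nodes.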
Parallel Sweep Job in Azure ML
[Diagram components: CORD Dataset (metadata.csv); Azure ML Cluster; Output storage (Database)]
sweepjob.yml:
    experiment_name: cog-sweep
    algorithm: grid
    type: sweep_job
    search_space:
      number:
        type: choice
        values: [0, 1]
    trial:
      command: >-
        python process.py
        --number {search_space.number}
        --nodes 2
        --data {inputs.metacord}
      inputs:
        metacord:
          data: azureml:metacord:1
          mode: download
    max_total_trials: 2
    max_concurrent_trials: 2
    timeout_minutes: 10000
Submit the sweep job:
    $ az ml job create -f sweepjob.yml
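process.py:
    import argparse
    import pandas as pd

    # Parse command-line
    parser = argparse.ArgumentParser()
    parser.add_argument("--number", type=int)
    parser.add_argument("--nodes", type=int)
    parser.add_argument("--data")
    args = parser.parse_args()

    df = pd.read_csv(args.data)
    for i, (idx, row) in enumerate(df.iterrows()):
        if i % args.nodes == args.number:
            # Process the record
            # Store the result
            pass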
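The `i % args.nodes == args.number` test is what splits the work: each sweep trial takes every `nodes`-th record, starting at its own `number`. A small sketch with a hypothetical helper shows that the resulting partitions are disjoint and together cover every record.

```python
def records_for_node(n_records, nodes, number):
    """Indices handled by worker `number` out of `nodes` workers."""
    return [i for i in range(n_records) if i % nodes == number]

# Seven records split across the two trials of the sweep (number = 0 and 1).
parts = [records_for_node(7, nodes=2, number=k) for k in range(2)]
print(parts)

# Every record is processed exactly once across all workers.
covered = sorted(i for part in parts for i in part)
print(covered == list(range(7)))
```

Because the assignment depends only on the record index, the workers need no coordination with each other.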
Results of Text Analytics Processing
{
  "gh690dai": {
    "id": "gh690dai",
    "title": "Beef and Pork Marketing Margins and Price Spreads during COVID-19",
    "authors": "Lusk, Jayson L.; Tonsor, Glynn T.; Schulz, Lee L.",
    "journal": "Appl Econ Perspect Policy",
    "abstract": "...",
    "publish_time": "2020-10-02",
    "entities": [
      {
        "offset": 0,
        "length": 16,
        "text": "COVID-19-related",
        "category": "Diagnosis",
        "confidenceScore": 0.79,
        "isNegated": false
      }, ...],
    "relations": [
      {
        "relationType": "TimeOfTreatment",
        "bidirectional": false,
        "source": {
          "uri": "#/documents/0/entities/15",
          "text": "previous year",
          "category": "Time",
          "isNegated": false,
          "offset": 704
        },
        "target": {
          "uri": "#/documents/0/entities/13",
          "text": "beef",
          "category": "TreatmentName",
          "isNegated": false,
          "offset": 642
        }
      }]},
  …
Storing Semi-Structured Data into Cosmos DB
Cosmos DB – NoSQL universal solution
Querying semi-structured data with SQL-like language
[Diagram: a Collection of Paper documents, each with Entity and Relation items]
Cosmos DB & Azure Data Solutions
• Real-time access with fast read and write latencies globally, and throughput and consistency all backed by SLAs
• Multi-region writes and data distribution to any Azure region with the click of a button.
• Independently and elastically scale storage and throughput across any Azure region, even during unpredictable traffic bursts, for unlimited scale worldwide.
Cosmos DB SQL Queries
Get mentioned dosages of a particular medication and the papers they are mentioned in:
SELECT p.title, r.source.text
FROM papers p JOIN r IN p.relations
WHERE r.relationType='DosageOfMedication'
AND CONTAINS(r.target.text,'hydro')
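To make the JOIN semantics concrete, the same selection can be written as a loop over in-memory documents: every relation inside a paper is paired with that paper, then filtered. The sample papers below are made up for illustration and `dosages_of` is a hypothetical helper.

```python
def dosages_of(papers, medication_fragment):
    """Yield (title, dosage_text) for matching DosageOfMedication relations."""
    for paper in papers:                        # FROM papers p
        for rel in paper.get("relations", []):  # JOIN r IN p.relations
            if (rel["relationType"] == "DosageOfMedication"
                    and medication_fragment in rel["target"]["text"]):
                yield paper["title"], rel["source"]["text"]

# Made-up documents in the same shape as the stored JSON.
papers = [
    {"title": "Hypothetical paper A",
     "relations": [
         {"relationType": "DosageOfMedication",
          "source": {"text": "400 mg"},
          "target": {"text": "hydroxychloroquine"}},
         {"relationType": "TimeOfTreatment",
          "source": {"text": "5 days"},
          "target": {"text": "hydroxychloroquine"}},
     ]},
    {"title": "Hypothetical paper B",
     "relations": [
         {"relationType": "DosageOfMedication",
          "source": {"text": "300 mg"},
          "target": {"text": "favipiravir"}},
     ]},
]
matches = list(dosages_of(papers, "hydro"))
print(matches)
```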
Further Exploration: Jupyter in Cosmos DB
SQL in Cosmos DB is somewhat limited
Good strategy: run the query in Cosmos DB, export the result to a Pandas DataFrame, and do the final exploration in Python
Jupyter support is built into Cosmos DB
This makes exporting query results to a DataFrame easy!
%%sql --database CORD --container Papers --output meds
SELECT e.text, e.isNegated, p.title, p.publish_time,
ARRAY (SELECT VALUE l.id FROM l IN e.links
WHERE l.dataSource='UMLS')[0] AS umls_id
FROM papers p
JOIN e IN p.entities
WHERE e.category = 'MedicationName'
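Once the rows are in Python, one simple exploration is counting medication mentions per month. The sketch below works on made-up rows with the same fields as the query output; skipping negated mentions and lower-casing names are assumptions of this example, not steps stated in the talk.

```python
from collections import Counter

# Made-up rows shaped like the query result (text, isNegated, publish_time).
rows = [
    {"text": "hydroxychloroquine", "isNegated": False, "publish_time": "2020-03-15"},
    {"text": "Hydroxychloroquine", "isNegated": False, "publish_time": "2020-03-28"},
    {"text": "remdesivir", "isNegated": False, "publish_time": "2020-05-02"},
    {"text": "remdesivir", "isNegated": True, "publish_time": "2020-05-09"},
]

# Key each mention by (YYYY-MM, lower-cased medication name).
monthly = Counter(
    (row["publish_time"][:7], row["text"].lower())
    for row in rows
    if not row["isNegated"]
)
for (month, medication), count in sorted(monthly.items()):
    print(month, medication, count)
```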
How Medication Strategies Change
Term Relations
Terms Co-occurrence (Treatment, Medicine)
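Term co-occurrence can be counted by pairing up the distinct terms found in each paper. This is a minimal sketch on made-up term lists; `cooccurrence` is a hypothetical helper and the talk does not show how its charts were computed.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(papers_terms):
    """Count how often two distinct terms appear in the same paper."""
    counts = Counter()
    for terms in papers_terms:
        # Sort so that each pair has one canonical (a, b) ordering.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Made-up terms extracted from three papers.
papers_terms = [
    ["remdesivir", "fever", "cough"],
    ["fever", "cough"],
    ["remdesivir", "fever"],
]
counts = cooccurrence(papers_terms)
print(counts.most_common())
```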
Power BI and No Code / Low Code Data Visualization
• Connect to data, including multiple data sources.
• Shape the data with queries that build insightful, compelling data models.
• Use the data models to create visualizations and reports.
• Share your report files for others to leverage, build upon, and share.
Exploration: PowerBI
Conclusions
❶ Text mining of medical texts can be a very valuable resource for gaining insights into a large text corpus.
❷ A range of Microsoft technologies can be used to effectively make this a reality:
• Azure ML for custom NER training / parallel sweep jobs
• Text Analytics for Health to do NER and ontology mapping
• Cosmos DB to store and query semi-structured data
• Power BI to explore the data interactively to gain insights
• Cosmos DB Jupyter Notebooks to do a deep dive into the data with Python
Resources
• Article: https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/
• Text Analytics for Health
• Azure Machine Learning
• Cosmos DB
• Power BI
• Jupyter Notebooks on Azure Machine Learning
• MS LEARN
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
Thank You!

Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Kürzlich hochgeladen (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

How Machine Learning and AI Can Support the Fight Against COVID-19

  • 1. How Machine Learning and AI can support the fight against COVID-19 Francesca Lazzeri, PhD Principal Cloud Advocate Manager, Microsoft @frlazzeri Dmitry Soshnikov, PhD Senior Cloud Advocate, Microsoft @shwars
  • 2. Problem Around 30,000 scientific papers related to COVID appear monthly
  • 3. CORD Papers Dataset
      Data sources:
        https://allenai.org/data/cord-19
        https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
      CORD-19 Dataset: contains over 400,000 scholarly articles about COVID-19 and the coronavirus family of viruses, for use by the global research community. 200,000 articles have full text.
  • 4. Natural Language Processing
      Common tasks for NLP:
        • Intent Classification
        • Named Entity Recognition (NER)
        • Keyword Extraction
        • Text Summarization
        • Question Answering
        • Open Domain Question Answering
      Language Models:
        • Recurrent Neural Network (LSTM, GRU)
        • Transformers
        • GPT-2
        • BERT
        • Microsoft Turing-NLG
        • GPT-3
      Microsoft Learn Module: Introduction to NLP with PyTorch
        aka.ms/pytorch_nlp
        docs.microsoft.com/en-us/learn/paths/pytorch-fundamentals/
  • 5. How BERT Works (Simplified)
      Masked Language Model + Next Sentence Prediction
      During holidays, I like to ______ with my dog. It is so cute.
      Masked word prediction: 0.85 Play, 0.05 Sleep, 0.09 Fight
      Next sentence prediction: 0.80 YES, 0.20 NO
      BERT-large contains about 340 million parameters, so it is very difficult to train from scratch! In most cases it makes sense to use a pre-trained language model.
  • 6. Main Idea Use NLP tools to extract semi-structured data from papers, to enable semantically rich queries over the paper corpus. Extracted JSON Cosmos DB Database Power BI Dashboard SQL Queries Azure Semantic Search NER Relations Text Analytics for Health CORD Corpus
  • 7. Part 1: Extracting Entities and Relations Base Language Model Dataset Kaggle Medical NER: • ~40 papers • ~300 entities Generic BC5CDR Dataset • 1500 papers • 5000 entities • Disease / Chemical Generic BERT Model Pre-training BERT on Medical texts PubMedBERT pre-trained model by Microsoft Research Huggingface Transformer Library: https://huggingface.co/
  • 8. 6794356|t|Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.
      6794356|a|A newborn with massive tricuspid regurgitation, atrial flutter, congestive heart failure, and a high serum lithium level is described. This is the first patient to initially manifest tricuspid regurgitation and atrial flutter, and the 11th described patient with cardiac disease among infants exposed to lithium compounds in the first trimester of pregnancy. Sixty-three percent of these infants had tricuspid valve involvement. Lithium carbonate may be a factor in the increasing incidence of congenital heart disease when taken during early pregnancy. It also causes neurologic depression, cyanosis, and cardiac arrhythmia when consumed prior to delivery.
      6794356 0 29 Tricuspid valve regurgitation Disease D014262
      6794356 34 51 lithium carbonate Chemical D016651
      6794356 52 60 toxicity Disease D064420
      6794356 105 128 tricuspid regurgitation Disease D014262
      6794356 130 144 atrial flutter Disease D001282
      6794356 146 170 congestive heart failure Disease D006333
      6794356 189 196 lithium Chemical D008094
      6794356 265 288 tricuspid regurgitation Disease D014262
      6794356 293 307 atrial flutter Disease D001282
      6794356 345 360 cardiac disease Disease D006331
      6794356 386 393 lithium Chemical D008094
      6794356 511 528 Lithium carbonate Chemical D016651
      6794356 576 600 congenital heart disease Disease D006331
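The record on slide 8 follows a PubTator-style layout: a title line (`id|t|...`), an abstract line (`id|a|...`), then one annotation per row with start offset, end offset, mention, entity type and concept ID. Here is a minimal parsing sketch. It assumes the annotation rows are tab-separated in the real file (the separators are lost in the slide text), the helper name `parse_pubtator` is hypothetical, and the abstract in the sample is shortened for brevity.

```python
def parse_pubtator(record: str) -> dict:
    """Parse one PubTator-style record into title, abstract and annotations."""
    doc = {"id": None, "title": "", "abstract": "", "annotations": []}
    for line in record.strip().splitlines():
        if "|t|" in line:
            doc["id"], doc["title"] = line.split("|t|", 1)
        elif "|a|" in line:
            doc["abstract"] = line.split("|a|", 1)[1]
        else:
            # Assumed column order: id, start, end, mention, type, concept ID
            _, start, end, mention, etype, concept = line.split("\t")
            doc["annotations"].append(
                {"start": int(start), "end": int(end), "text": mention,
                 "type": etype, "concept": concept})
    return doc


# Shortened sample: the full title, but only the start of the abstract
sample = (
    "6794356|t|Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.\n"
    "6794356|a|A newborn with massive tricuspid regurgitation.\n"
    "6794356\t0\t29\tTricuspid valve regurgitation\tDisease\tD014262\n"
    "6794356\t34\t51\tlithium carbonate\tChemical\tD016651\n"
    "6794356\t52\t60\ttoxicity\tDisease\tD064420\n"
    "6794356\t105\t128\ttricuspid regurgitation\tDisease\tD014262\n"
)
doc = parse_pubtator(sample)

# Offsets index into the title and abstract joined by a single character
text = doc["title"] + " " + doc["abstract"]
for ann in doc["annotations"]:
    assert text[ann["start"]:ann["end"]] == ann["text"]
    print(ann["type"], ann["text"])
```

The offset check in the loop is a cheap sanity test worth keeping when loading the whole corpus, since a single misaligned span would produce wrong training labels.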
  • 9. NER as Token Classification
      Tricuspid valve regurgitation and lithium carbonate toxicity in a newborn infant.
      Tricuspid B-DIS
      valve I-DIS
      regurgitation I-DIS
      and O
      lithium B-CHEM
      carbonate I-CHEM
      toxicity B-DIS
      in O
      a O
      newborn O
      infant O
      . O
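The labeling on slide 9 can be derived from the character spans on slide 8. The sketch below assumes a simple word-and-punctuation tokenizer; a real BERT pipeline uses WordPiece subwords, so the labels would additionally need to be aligned to subword tokens. The function name `bio_tags` is hypothetical.

```python
import re


def bio_tags(text, spans):
    """Assign B-/I-/O tags to tokens given (start, end, label) character spans."""
    tagged = []
    for m in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if start <= m.start() and m.end() <= end:
                # First token of a span gets B-, the rest get I-
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tagged.append((m.group(), tag))
    return tagged


title = ("Tricuspid valve regurgitation and lithium carbonate "
         "toxicity in a newborn infant.")
spans = [(0, 29, "DIS"), (34, 51, "CHEM"), (52, 60, "DIS")]
tagged = bio_tags(title, spans)
for token, tag in tagged:
    print(token, tag)
```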
  • 10. PubMedBert, Microsoft Research
      from transformers import AutoTokenizer, BertForTokenClassification, Trainer
      # unique_tags, training_args, train_dataset and val_dataset must be defined beforehand
      mname = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
      tokenizer = AutoTokenizer.from_pretrained(mname)
      model = BertForTokenClassification.from_pretrained(mname, num_labels=len(unique_tags))
      trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=val_dataset)
      trainer.train()
  • 11. Notebooks Automated ML UX Designer Reproducibility Automation Deployment Re-training CPU, GPU, FPGAs IoT Edge Azure Machine Learning Enterprise grade service to build and deploy models at scale
  • 12. Training NER Model Using PubMedBert on Azure ML
      Describe Dataset (bc5cdr.yml):
        name: bc5cdr
        version: 1
        local_path: BC5_data.txt
      Upload to Azure ML:
        $ az ml data create -f data_bc5cdr.yml
      Describe Environment (transformers-env.yml):
        name: transformers-env
        version: 1
        docker:
          image: mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04
        conda_file: file: ./transformers_conda.yml
      transformers_conda.yml:
        channels:
          - pytorch
        dependencies:
          - python=3.8
          - pytorch
          - pip
          - pip:
            - transformers
      $ az ml environment create -f transformers-env.yml
  • 13. Training NER Model Using PubMedBert on Azure ML
      Describe Experiment (job.yml):
        experiment_name: nertrain
        code:
          local_path: .
        command: >-
          python train.py --data {inputs.corpus}
        environment: azureml:transformers-env:1
        compute:
          target: azureml:AzMLGPUCompute
        inputs:
          corpus:
            data: azureml:bc5cdr:1
            mode: download
      Create Compute:
        $ az ml compute create -n AzMLGPUCompute --size Standard_NC6 --max-node-count 2
      Submit Job:
        $ az ml job create -f job.yml
  • 14. Result • COVID-19 not recognized, because dataset is old • Some other categories would be helpful (pharmacokinetics, biologic fluids, etc.) • Common entities are also needed (quantity, temperature, etc.) Get trained model: $ az ml job download -n $ID --outputs
  • 15. Text Analytics for Health (Preview)
      • Currently in Preview
      • Gated service; you need to apply for usage (apply at https://aka.ms/csgate)
      • Should not be implemented or deployed in any production use
      • Can be used through a Web API or a Container Service
      • Supports:
        • Named Entity Recognition (NER)
        • Relation Extraction
        • Entity Linking (Ontology Mapping)
        • Negation Detection
  • 16. Entity Extraction + Entity Linking, Negation Detection
  • 18. Using Text Analytics for Health
      Pip install the Azure Text Analytics SDK:
        pip install azure.ai.textanalytics==5.1.0b5
      Create the client:
        from azure.core.credentials import AzureKeyCredential
        from azure.ai.textanalytics import TextAnalyticsClient
        client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key), api_version="v3.1-preview.3")
      Do the call:
        documents = ["I have not been administered any aspirin, just 300 mg or favipiravir daily."]
        poller = client.begin_analyze_healthcare_entities(documents)
        result = poller.result()
  • 19. Analysis Result
      I have not been administered any aspirin, just 300 mg or favipiravir daily.
      HealthcareEntity(text=300 mg, category=Dosage, subcategory=None, length=6, offset=47, confidence_score=1.0, data_sources=None,
        related_entities={
          HealthcareEntity(text=favipiravir, category=MedicationName, subcategory=None, length=11, offset=57, confidence_score=1.0,
            data_sources=[
              HealthcareEntityDataSource(entity_id=C1138226, name=UMLS),
              HealthcareEntityDataSource(entity_id=J05AX27, name=ATC),
              HealthcareEntityDataSource(entity_id=DB12466, name=DRUGBANK),
              HealthcareEntityDataSource(entity_id=398131, name=MEDCIN),
              HealthcareEntityDataSource(entity_id=C462182, name=MSH),
              HealthcareEntityDataSource(entity_id=C81605, name=NCI),
              HealthcareEntityDataSource(entity_id=EW5GL2X7E0, name=NCI_FDA)],
            related_entities={}): 'DosageOfMedication'})
      Summary of entities and relations:
        aspirin (C0004057) [MedicationName]
        300 mg [Dosage] --DosageOfMedication--> favipiravir (C1138226) [MedicationName]
        favipiravir (C1138226) [MedicationName]
        daily [Frequency] --FrequencyOfMedication--> favipiravir (C1138226) [MedicationName]
  • 20. Analyzing CORD Abstracts
      • All abstracts are contained in the CSV metadata file
        • Id, Title, Journal, Authors, Publication Date
      • Split 400k papers into chunks of 500
        • Shuffle by date in order to get a representative sample in each chunk
      • Enrich each JSON file with text analytics data
        • Entities, Relations
      • Parallel processing using Azure ML
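The chunking step on slide 20 can be sketched as follows. This is one reading of "shuffle by date", namely a random shuffle so that every chunk mixes publication dates. The function name `make_chunks`, the fixed seed and the synthetic records are illustrative assumptions, not the talk's actual code.

```python
import random


def make_chunks(records, chunk_size=500, seed=0):
    """Shuffle records reproducibly, then cut them into fixed-size chunks."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + chunk_size]
            for i in range(0, len(shuffled), chunk_size)]


# Synthetic stand-in for rows of the metadata CSV (hypothetical values)
records = [{"id": f"paper{i}", "publish_year": 2020 + i // 600}
           for i in range(1200)]
chunks = make_chunks(records)
print(len(chunks), [len(c) for c in chunks])
```

Shuffling before splitting means a partial run over only the first few chunks still covers the whole date range rather than just the oldest papers.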
  • 21. Parallel Sweep Job in Azure ML
      Diagram labels: CORD Dataset (metadata.csv), Azure ML Cluster, Output storage (Database)
      sweepjob.yml:
        experiment_name: cog-sweep
        algorithm: grid
        type: sweep_job
        search_space:
          number:
            type: choice
            values: [0, 1]
        trial:
          command: >-
            python process.py --number {search_space.number} --nodes 2 --data {inputs.metacord}
          inputs:
            metacord:
              data: azureml:metacord:1
              mode: download
        max_total_trials: 2
        max_concurrent_trials: 2
        timeout_minutes: 10000
      $ az ml job create -f sweepjob.yml
      process.py:
        …
        # Parse command-line
        df = pd.read_csv(args.data)
        for i,(id,x) in enumerate(df.iterrows()):
            if i%args.nodes == args.number:
                # Process the record
                # Store the result
  • 22. Results of Text Analytics Processing
      {
        "gh690dai": {
          "id": "gh690dai",
          "title": "Beef and Pork Marketing Margins and Price Spreads during COVID-19",
          "authors": "Lusk, Jayson L.; Tonsor, Glynn T.; Schulz, Lee L.",
          "journal": "Appl Econ Perspect Policy",
          "abstract": "...",
          "publish_time": "2020-10-02",
          "entities": [
            { "offset": 0, "length": 16, "text": "COVID-19-related", "category": "Diagnosis", "confidenceScore": 0.79, "isNegated": false },
            ..
          ],
          "relations": [
            {
              "relationType": "TimeOfTreatment",
              "bidirectional": false,
              "source": { "uri": "#/documents/0/entities/15", "text": "previous year", "category": "Time", "isNegated": false, "offset": 704 },
              "target": { "uri": "#/documents/0/entities/13", "text": "beef", "category": "TreatmentName", "isNegated": false, "offset": 642 }
            }
          ]
        },
        …
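The sweep on slide 21 gives each trial a different `--number`, and `process.py` keeps only rows whose index satisfies `i % nodes == number`. A self-contained sketch of that round-robin split, using a plain list in place of the pandas DataFrame (the function name `records_for_worker` is hypothetical):

```python
def records_for_worker(records, number, nodes):
    """Yield (index, record) pairs assigned to worker `number` out of `nodes`."""
    for i, rec in enumerate(records):
        if i % nodes == number:
            yield i, rec


records = [f"paper-{i}" for i in range(7)]

# Simulate the two trials of the sweep (number = 0 and number = 1)
shards = [[i for i, _ in records_for_worker(records, n, 2)] for n in range(2)]
print(shards)
```

Every record lands in exactly one shard, so the trials can run concurrently without coordinating, at the cost of each one reading the whole metadata file.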
  • 23. Storing Semi-Structured Data into Cosmos DB
      Cosmos DB: a universal NoSQL solution
      Querying semi-structured data with an SQL-like language
      Diagram labels: Collection, Paper, Entity, Relation
  • 24. Cosmos DB & Azure Data Solutions • Real-time access with fast read and write latencies globally, and throughput and consistency all backed by SLAs • Multi-region writes and data distribution to any Azure region with the click of a button. • Independently and elastically scale storage and throughput across any Azure region – even during unpredictable traffic bursts – for unlimited scale worldwide.
  • 25. Cosmos DB SQL Queries
      Get the mentioned dosages of a particular medication and the papers they are mentioned in:
        SELECT p.title, r.source.text
        FROM papers p JOIN r IN p.relations
        WHERE r.relationType='DosageOfMedication' AND CONTAINS(r.target.text,'hydro')
  • 26. Further Exploration: Jupyter in Cosmos DB
      • SQL in Cosmos DB is somewhat limited
      • Good strategy: run the query in Cosmos DB, export to a Pandas DataFrame, do the final exploration in Python
      • Jupyter support is built into Cosmos DB, which makes exporting query results to a DataFrame easy!
        %%sql --database CORD --container Papers --output meds
        SELECT e.text, e.isNegated, p.title, p.publish_time,
               ARRAY (SELECT VALUE l.id FROM l IN e.links WHERE l.dataSource='UMLS')[0] AS umls_id
        FROM papers p JOIN e IN p.entities
        WHERE e.category = 'MedicationName'
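To make the semantics of `JOIN r IN p.relations` on slide 25 concrete: the join is within each document, pairing a paper with each element of its own `relations` array. The plain-Python sketch below mimics that query over in-memory dictionaries. The function name `dosages_for` and the sample papers are made up for illustration and are not data from the corpus.

```python
def dosages_for(papers, name_fragment):
    """Mimic the slide's query: one output row per matching relation."""
    rows = []
    for p in papers:                      # FROM papers p
        for r in p.get("relations", []):  # JOIN r IN p.relations
            if (r["relationType"] == "DosageOfMedication"
                    and name_fragment in r["target"]["text"]):
                rows.append({"title": p["title"], "text": r["source"]["text"]})
    return rows


# Hypothetical documents shaped like the JSON on slide 22
papers = [
    {"title": "Paper A", "relations": [
        {"relationType": "DosageOfMedication",
         "source": {"text": "400 mg"}, "target": {"text": "hydroxychloroquine"}},
        {"relationType": "TimeOfTreatment",
         "source": {"text": "5 days"}, "target": {"text": "hydroxychloroquine"}},
    ]},
    {"title": "Paper B", "relations": [
        {"relationType": "DosageOfMedication",
         "source": {"text": "300 mg"}, "target": {"text": "favipiravir"}},
    ]},
]
rows = dosages_for(papers, "hydro")
print(rows)
```

Like the substring test here, `CONTAINS` is case-sensitive unless told otherwise, so a fragment such as 'hydro' would miss mentions written as "Hydroxychloroquine".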
  • 31. Power BI and No Code / Low Code Data Visualization • Connect to data, including multiple data sources. • Shape the data with queries that build insightful, compelling data models. • Use the data models to create visualizations and reports. • Share your report files for others to leverage, build upon, and share.
  • 34. Conclusions
      ❶ Text mining of medical texts can be a very valuable resource for gaining insights into a large text corpus.
      ❷ A range of Microsoft technologies can be used to make this a reality:
        • Azure ML for custom NER training / parallel sweep jobs
        • Text Analytics for Health to do NER and ontology mapping
        • Cosmos DB to store and query semi-structured data
        • Power BI to explore the data interactively to gain insights
        • Cosmos DB Jupyter Notebooks to do a deep dive into the data with Python
  • 35. Resources
      • Article: https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/
      • Text Analytics for Health
      • Azure Machine Learning
      • Cosmos DB
      • Power BI
      • Jupyter Notebooks on Azure Machine Learning
      • MS LEARN
  • 36. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
  • 37. Thank You! Francesca Lazzeri, PhD Principal Cloud Advocate Manager, Microsoft @frlazzeri Dmitry Soshnikov, PhD Senior Cloud Advocate, Microsoft @shwars