Creating Knowledge Bases from Text in the Absence of Training Data
Sanghamitra Deb
Accenture Technology Laboratory
Phil Rogers, Jana Thompson, Hans Li
Typical Business Process
Hours of knowledge curation by experts → Executive Summary → Business Decisions
The Generalized Approach to Extracting Text: Parsing
Tokenization → Normalization → Parsing → Lemmatization
Tokenization: separating sentences and words, removing special characters, detecting phrases.
Normalization: lowercasing words, word-sense disambiguation.
Parsing: detecting parts of speech (nouns, verbs, etc.).
Lemmatization: reducing plurals and other word forms to a single dictionary form.
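The four steps can be illustrated without any external library. This is a minimal sketch, assuming regex-based sentence splitting and tokenization and a toy suffix-stripping lemmatizer; all helper names are hypothetical, POS parsing and word-sense disambiguation are omitted, and a real pipeline would use a library such as nltk (listed in the requirements.txt later in the deck).

```python
import re


def split_sentences(text):
    # Naive sentence splitting: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Keep alphanumeric words (allowing internal hyphens) and drop special characters.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)


def normalize(tokens):
    # Normalization reduced to lowercasing; word-sense disambiguation is not attempted.
    return [t.lower() for t in tokens]


def lemmatize(token):
    # Toy plural stripping (hypothetical); a dictionary-based lemmatizer handles far more forms.
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("sses"):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text):
    # Sentences -> tokens -> normalized tokens -> lemmas.
    return [[lemmatize(t) for t in normalize(tokenize(s))] for s in split_sentences(text)]


print(preprocess("It treats allergies. Seizures were reduced!"))
# [['it', 'treat', 'allergy'], ['seizure', 'were', 'reduced']]
```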
The Generalized Approach to Extracting Text: ML
Extract sentences that contain the specific attribute.
POS tag and extract unigrams, bigrams and trigrams centered on nouns.
Extract features: words around nouns (bag of words/word vectors), position of the noun, and length of the sentence.
Map training data to create a balanced positive and negative training set.
Train a machine learning model to predict which unigrams, bigrams or trigrams satisfy the specific relationship, for example the drug-disease treatment relationship.
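The candidate and feature steps of this pipeline might look like the following minimal sketch. It assumes the sentence has already been POS-tagged with Penn Treebank tags, reads "centered on nouns" as the noun alone, the noun with its left or right neighbor, and the noun with both neighbors, and uses a hypothetical window of two words for the bag-of-words features; the function names are illustrative only.

```python
# Penn Treebank noun tags (assumed tagset from an upstream POS tagger).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_centered_ngrams(tagged):
    """Yield (ngram, noun_index) for unigrams, bigrams and trigrams built around each noun."""
    words = [w for w, _ in tagged]
    for i, (_, tag) in enumerate(tagged):
        if tag not in NOUN_TAGS:
            continue
        # noun alone, left neighbor + noun, noun + right neighbor, left + noun + right
        for start, end in [(i, i + 1), (i - 1, i + 1), (i, i + 2), (i - 1, i + 2)]:
            if start >= 0 and end <= len(words):
                yield tuple(words[start:end]), i


def features(tagged, noun_index, window=2):
    """Bag of words around the noun, the noun's position, and the sentence length."""
    words = [w.lower() for w, _ in tagged]
    lo, hi = max(0, noun_index - window), min(len(words), noun_index + window + 1)
    feats = {"bow=" + w: 1 for j, w in enumerate(words[lo:hi], start=lo) if j != noun_index}
    feats["noun_position"] = noun_index
    feats["sentence_length"] = len(words)
    return feats


sentence = [("treats", "VBZ"), ("bipolar", "JJ"), ("disorder", "NN")]
for ngram, idx in noun_centered_ngrams(sentence):
    print(ngram, features(sentence, idx))
```

The feature dictionaries can then be vectorized and paired with labels; producing those labels is exactly the open question of this approach.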
How do we generate this training data?
A Different Approach
Snorkel (Stanford): replaces hand-labeled training data by encoding domain knowledge
The Snorkel Approach to Entity Extraction
Extract sentences that contain the specific attribute.
POS tag and extract unigrams, bigrams and trigrams centered on nouns.
Write rules: encode your domain knowledge into rules.
Validate rules: coverage, conflicts, accuracy.
Run learning: logistic regression, LSTM, …
Iterate:
• Examine a random set of candidates and create new rules.
• Observe the rules with the lowest accuracy (highest conflict) and edit them.
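Validating rules on coverage, conflicts, and accuracy reduces to a few counts over a label matrix. This sketch assumes each rule returns +1 (positive), -1 (negative), or 0 (abstain) and that a small hand-labeled dev set is available for the accuracy column; `rule_stats` is a hypothetical helper, not Snorkel's API.

```python
def rule_stats(L, gold=None):
    """Coverage, conflict rate and (optionally) accuracy for each rule.

    L is a label matrix: one row per candidate, one column per rule, entries in {-1, 0, 1}
    where 0 means the rule abstains. gold holds dev-set labels in {-1, 1}, or 0 if unlabeled.
    """
    n_rules = len(L[0])
    stats = []
    for j in range(n_rules):
        col = [row[j] for row in L]
        covered = [i for i, v in enumerate(col) if v != 0]
        # A conflict: some other rule votes on the same candidate with the opposite label.
        conflicts = sum(1 for i in covered if any(v != 0 and v != col[i] for v in L[i]))
        entry = {
            "coverage": len(covered) / len(L),
            "conflict": conflicts / len(covered) if covered else 0.0,
        }
        if gold is not None:
            judged = [i for i in covered if gold[i] != 0]
            correct = sum(1 for i in judged if col[i] == gold[i])
            entry["accuracy"] = correct / len(judged) if judged else None
        stats.append(entry)
    return stats


L = [[1, 1, 0], [1, -1, 0], [0, 0, -1], [1, 0, 0]]
gold = [1, -1, -1, 1]
for j, s in enumerate(rule_stats(L, gold)):
    print("rule", j, s)
```

Sorting the rules by accuracy (lowest first) or conflict rate (highest first) gives the candidates for editing in the next iteration.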
[Figure: Training Data | Rules, illustrated with planetary orbits]
How Does Snorkel Work Without Training Data?
• Write rules: encode your domain knowledge into rules.
• The rules are modeled as a Naive Bayes model, which assumes that the rules are conditionally independent given the true label. Even though this assumption usually does not hold, in practice it produces a reasonably good training set, with a probability of each candidate belonging to either class.
• These probabilities are fed into a machine learning algorithm (logistic regression in the simplest case) to create a model used to make future predictions.
http://arxiv.org/pdf/1512.06474v2.pdf
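Both stages can be sketched in plain Python. This is a simplified illustration, not Snorkel's implementation: the rule accuracies are taken as given (in data programming they are estimated from the rules' agreements and disagreements without ground truth), the class prior is assumed to be 0.5, and the discriminative step is an ordinary logistic regression trained by gradient descent on the cross-entropy against the probabilistic labels. The candidate features are made up for the example.

```python
import math


def nb_posterior(votes, accuracies, prior=0.5):
    """P(y = +1 | votes) when rules are conditionally independent given y (Naive Bayes).

    votes are in {-1, 0, 1} (0 = abstain); accuracies[j] is the assumed probability
    that rule j is right when it votes.
    """
    log_odds = math.log(prior / (1 - prior))
    for v, a in zip(votes, accuracies):
        if v != 0:
            log_odds += v * math.log(a / (1 - a))
    return 1 / (1 + math.exp(-log_odds))


def train_soft_logreg(X, targets, lr=0.5, epochs=2000):
    """Logistic regression fitted to probabilistic labels by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, targets):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            # Gradient of the cross-entropy against a soft target t is (sigmoid(z) - t) * x.
            err = 1 / (1 + math.exp(-z)) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    return 1 / (1 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))


# Rule votes for three candidates and assumed accuracies of three rules.
votes = [[1, 1, 0], [-1, 0, -1], [1, -1, 0]]
acc = [0.8, 0.7, 0.9]
soft_labels = [nb_posterior(v, acc) for v in votes]
# Hypothetical two-dimensional features for the same candidates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w, b = train_soft_logreg(X, soft_labels)
print([round(p, 3) for p in soft_labels], predict(w, b, [1.0, 0.0]))
```

Training on the probabilities rather than on hard 0/1 labels lets candidates on which the rules disagree contribute less confidently to the final model.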
Data Dive: FDA Drug Labels
"It is indicated for treating respiratory disorder caused due to allergy."
"For the relief of symptoms of depression."
"Evidence supporting efficacy of carbamazepine as an anticonvulsant was derived from active drug-controlled studies that enrolled patients with the following seizure types:"
"When oral therapy is not feasible and the strength, dosage form, and route of administration of the drug reasonably lend the preparation to the treatment of the condition"
Candidate Extraction
Using domain knowledge and language structure, collect a set of candidates with high recall and low precision. Typically this set should have about 80% recall and 20% precision.
A candidate filter with 60% accuracy is too specific and needs to be made more general.
A candidate filter with 30% accuracy looks fine.
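A deliberately loose candidate extractor, together with the precision/recall check against a few annotated pairs, could look like this. The cue phrases, the three-word span limit, the example sentence, and the function names are hypothetical, chosen only to resemble the FDA label sentences above.

```python
import re

# Hypothetical cue phrases for the "indication" relationship in drug labels.
INDICATION_CUES = ("indicated for", "treatment of", "relief of")


def extract_candidates(drug, sentences, max_n=3):
    """Pair the drug with every 1- to max_n-word span after a cue phrase (high recall, low precision)."""
    candidates = set()
    for sentence in sentences:
        low = sentence.lower()
        for cue in INDICATION_CUES:
            idx = low.find(cue)
            if idx == -1:
                continue
            words = re.findall(r"[a-z]+", low[idx + len(cue):])
            for n in range(1, max_n + 1):
                for i in range(len(words) - n + 1):
                    candidates.add((drug, " ".join(words[i:i + n])))
    return candidates


def precision_recall(candidates, gold):
    """Precision and recall of a candidate set against hand-annotated (drug, disease) pairs."""
    cands, gold = set(candidates), set(gold)
    tp = len(cands & gold)
    precision = tp / len(cands) if cands else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


drug = "Lithium Carbonate"
cands = extract_candidates(drug, ["It is indicated for bipolar disorder."])
print(sorted(cands))
print(precision_recall(cands, {(drug, "bipolar disorder")}))
```

Over a realistic sample, a filter whose precision stays near the 20% target while recall stays high is the kind of candidate set the slide recommends.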
Automated features: POS tags, context, dependency tree, character offsets
Rule Functions
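Rule functions are ordinary functions that vote on a candidate. The sketch below assumes a candidate is a (sentence, drug, disease) triple and uses three hypothetical rules: a cue-phrase rule, a stop list of generic words such as "individual" and "maintenance" (which appear as rejected candidates in the results table), and a negation rule. The example sentence is illustrative rather than quoted from a label, and Snorkel's own candidate objects and automated features are not modeled here.

```python
import re

# Hypothetical stop list of generic words that are never diseases.
GENERIC_WORDS = {"individual", "maintenance", "patients", "symptoms"}


def rf_cue_phrase(sentence, drug, disease):
    # +1 if the disease appears after "indicated for" or "treatment of" in the sentence.
    pattern = r"(indicated for|treatment of)\b.*" + re.escape(disease.lower())
    return 1 if re.search(pattern, sentence.lower()) else 0


def rf_generic_word(sentence, drug, disease):
    # -1 for candidates that are generic words rather than conditions.
    return -1 if disease.lower() in GENERIC_WORDS else 0


def rf_negated(sentence, drug, disease):
    # -1 if the indication is negated, e.g. "not indicated for".
    return -1 if "not indicated" in sentence.lower() else 0


RULES = [rf_cue_phrase, rf_generic_word, rf_negated]


def label_matrix(candidates, rules=RULES):
    # One row per (sentence, drug, disease) candidate, one column per rule.
    return [[rule(*c) for rule in rules] for c in candidates]


s = "Lithium Carbonate is indicated for the maintenance treatment of individuals with bipolar disorder."
cands = [(s, "Lithium Carbonate", d) for d in ("bipolar disorder", "maintenance", "individual")]
print(label_matrix(cands))
```

The cue-phrase rule fires on all three candidates, so the generic-word rule produces conflicts on "maintenance" and "individual"; resolving such conflicts is the job of the model over the rules.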
Testing Rule Functions:
[Figures: Generation of training data, distribution of candidate labels (-1, 0, 1) with one, two, three, four, and 20 rules]
Results and performance.
Drug name | Disease candidate | Candidate | Snorkel
Lithium Carbonate | bipolar disorder | 1 | 1
Lithium Carbonate | individual | 1 | 0
Lithium Carbonate | maintenance | 1 | 0
Lithium Carbonate | manic episode | 1 | 1
Precision and recall ~90%
[Figure: Evolution of F1-score with sample size]
Relationship Extraction
• Is person X married to person Y?
• Does drug X cure disease Y?
• Does software X (example: snorkel) run on programming language Y (example: python3)?
Define filters for candidate extraction for a pair (X, Y), for example: (snorkel, python2.7), (snorkel, python3.1), …
Once you have the pairs, examine them using an annotation tool.
Write rules → observe their performance against annotated data.
Iterate.
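A candidate filter for a generic pair (X, Y) can be as simple as sentence-level co-occurrence of two dictionaries. This sketch assumes case-insensitive substring matching, which is deliberately loose (for example, "python3" would also match inside "python3.1"), in keeping with the high-recall goal; the function name and example sentences are hypothetical.

```python
from itertools import product


def cooccurrence_pairs(sentences, xs, ys):
    """Candidate (X, Y) pairs: every X and Y mentioned in the same sentence (high recall)."""
    pairs = []
    for sentence in sentences:
        low = sentence.lower()
        found_x = [x for x in xs if x.lower() in low]
        found_y = [y for y in ys if y.lower() in low]
        for x, y in product(found_x, found_y):
            pairs.append((x, y, sentence))
    return pairs


sentences = [
    "Can snorkel be installed on python2.7?",
    "Notes on running snorkel with python3.1.",
    "Python is a popular language.",
]
pairs = cooccurrence_pairs(sentences, xs=["snorkel"], ys=["python2.7", "python3.1"])
for x, y, s in pairs:
    print(x, y, "|", s)
```

Each (X, Y, sentence) triple is what gets examined in the annotation tool and scored by the rules.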
Crowdsourced Training Data
In some cases, training data is generated on the same dataset by multiple people.
In Snorkel, each source can be incorporated as a separate rule function.
The model over the rules figures out the relative weight of each person and creates cleaner training data.
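Treating each annotator as a rule function can be sketched as below. As a simplification of Snorkel's generative model, each person's accuracy is estimated by Laplace-smoothed agreement with the unweighted majority vote, and the labels are then recombined with log-odds weights; the annotators, items, and labels are made up for illustration.

```python
import math


def annotator_rule(labels):
    # Wrap one person's labels as a rule function: their vote, or 0 (abstain) if unlabeled.
    return lambda item: labels.get(item, 0)


def sign(x):
    return (x > 0) - (x < 0)


def estimate_accuracies(items, rules):
    # Accuracy of each person = smoothed agreement with the unweighted majority vote.
    majority = {i: sign(sum(r(i) for r in rules)) for i in items}
    accs = []
    for r in rules:
        judged = [i for i in items if r(i) != 0 and majority[i] != 0]
        agree = sum(1 for i in judged if r(i) == majority[i])
        accs.append((agree + 1) / (len(judged) + 2))  # Laplace smoothing keeps 0 < acc < 1
    return accs


def weighted_label(item, rules, accs):
    # Log-odds weights: reliable people count more, people worse than chance count negatively.
    score = sum(r(item) * math.log(a / (1 - a)) for r, a in zip(rules, accs))
    return sign(score)


items = ["s1", "s2", "s3", "s4"]
people = [
    annotator_rule({"s1": 1, "s2": 1, "s3": -1, "s4": -1}),
    annotator_rule({"s1": 1, "s2": 1, "s3": -1, "s4": 1}),
    annotator_rule({"s1": -1, "s2": -1, "s3": 1, "s4": 1}),
]
accs = estimate_accuracies(items, people)
print([round(a, 3) for a in accs])
print({i: weighted_label(i, people, accs) for i in items})
```

The third annotator disagrees with the majority on three of four items, so the vote on "s4" is decided mainly by the most reliable annotator.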
Why Docker?
• Portability: develop here, run there (internal clusters, AWS, Google Cloud, etc.); reusable by the team and clients
• Isolation: the host OS and the container are isolated from each other's bugs
• Fast
• Easy virtualization without hardware emulation or a fully virtualized OS
• Lightweight
Python stack on docker
# Pinned to 16.04 (assumption): the Python 2 packages below are not available on current ubuntu:latest
FROM ubuntu:16.04
# MAINTAINER Sanghamitra Deb <sangha123@gmail.com>
# Accenture Tech Labs Scientific Python Environment
# Refresh the package lists before the first install, otherwise apt-get cannot find packages
RUN apt-get update && apt-get upgrade -y
RUN apt-get install python -y
RUN apt-get install curl -y
RUN apt-get install emacs -y
# The default get-pip.py no longer supports Python 2.7; use the 2.7-specific bootstrap script
RUN curl -O https://bootstrap.pypa.io/pip/2.7/get-pip.py
RUN python get-pip.py
RUN rm get-pip.py
RUN echo "export PATH=~/.local/bin:$PATH" >> ~/.bashrc
RUN apt-get install python-setuptools build-essential python-dev -y
RUN apt-get install gfortran swig -y
RUN apt-get install libatlas-dev liblapack-dev -y
RUN apt-get install libfreetype6 libfreetype6-dev -y
RUN apt-get install libxft-dev -y
# -y is required: docker build has no terminal to answer the confirmation prompt
RUN apt-get install libxml2-dev libxslt-dev zlib1g-dev -y
RUN apt-get install python-numpy -y
ADD requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt -q
Dockerfile
scipy
matplotlib
ipython
jupyter
pandas
Bottleneck
patsy
pymc
statsmodels
scikit-learn
BeautifulSoup
seaborn
gensim
fuzzywuzzy
xmltodict
untangle
nltk
flask
enum34
requirements.txt
docker build -t sangha/python .
docker run -it -p 1108:1108 -p 1106:1106 --name pharmaExtraction0.1 -v /location/in/hadoop/ sangha/python bash
docker exec -it pharmaExtraction0.1 bash
docker exec -d pharmaExtraction0.1 python /root/pycodes/rest_api.py
Building the Dockerfile
Typical ML Pipeline vs. Snorkel
(1) Candidate extraction
(2) Rule functions
(3) Hyperparameter tuning
Snorkel:
Pros:
• Very little training data necessary
• No need to think about feature generation
• No need for deep knowledge of machine learning
• Convenient UI for data annotation
• Creates structured databases from unstructured text
Cons:
• The code is new, so it may not be robust to all situations.
• Online prediction is difficult.
• Not much transparency into the internal workings.
Applicability across a variety of industries and use cases
• Banks: loan approval
• Paleontology
• Design of clinical trials
• Legal investigation
• Market research reports
• Human trafficking
• Skills extraction from resumes
• Content marketing
• Product descriptions and reviews
• Pharmaceutical industry
Where to get it?
https://github.com/HazyResearch/snorkel
http://arxiv.org/pdf/1512.06474v2.pdf

Ā 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
Ā 
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...chandigarhentertainm
Ā 

KĆ¼rzlich hochgeladen (20)

nagpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
nagpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meetnagpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
nagpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Call Girls Service Anantapur šŸ“² 6297143586 Book Now VIP Call Girls in Anantapur
Call Girls Service Anantapur šŸ“² 6297143586 Book Now VIP Call Girls in AnantapurCall Girls Service Anantapur šŸ“² 6297143586 Book Now VIP Call Girls in Anantapur
Call Girls Service Anantapur šŸ“² 6297143586 Book Now VIP Call Girls in Anantapur
Ā 
neemuch Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
neemuch Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meetneemuch Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
neemuch Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Nanded Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Nanded Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetNanded Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Nanded Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Call Girls in Udaipur Girija Udaipur Call Girl āœ” VQRWTO ā¤ļø 100% offer with...
Call Girls in Udaipur  Girija  Udaipur Call Girl  āœ” VQRWTO ā¤ļø 100% offer with...Call Girls in Udaipur  Girija  Udaipur Call Girl  āœ” VQRWTO ā¤ļø 100% offer with...
Call Girls in Udaipur Girija Udaipur Call Girl āœ” VQRWTO ā¤ļø 100% offer with...
Ā 
coimbatore Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
coimbatore Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meetcoimbatore Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
coimbatore Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Ā 
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Hyderabad Just Call 9907093804 Top Class Call Girl Service Available
Ā 
dehradun Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
dehradun Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meetdehradun Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
dehradun Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Hubli Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Hubli Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetHubli Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Hubli Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
palanpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
palanpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meetpalanpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
palanpur Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Call Girl in Bangalore 9632137771 {LowPrice} ā¤ļø (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ā¤ļø (Navya) Bangalore Call Girls ...Call Girl in Bangalore 9632137771 {LowPrice} ā¤ļø (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ā¤ļø (Navya) Bangalore Call Girls ...
Ā 
Mathura Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Mathura Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetMathura Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Mathura Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Patna Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Patna Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetPatna Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Patna Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Ernakulam Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetErnakulam Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Russian Call Girls Kota * 8250192130 Service starts from just ā‚¹9999 āœ…
Russian Call Girls Kota * 8250192130 Service starts from just ā‚¹9999 āœ…Russian Call Girls Kota * 8250192130 Service starts from just ā‚¹9999 āœ…
Russian Call Girls Kota * 8250192130 Service starts from just ā‚¹9999 āœ…
Ā 
Call Now ā˜Ž 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ā˜Ž 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.Call Now ā˜Ž 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ā˜Ž 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Ā 
Ozhukarai Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real MeetOzhukarai Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls šŸ‘™ 6297143586 šŸ‘™ Genuine WhatsApp Number for Real Meet
Ā 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Ā 
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...
ā¤ļøCall girls in Jalandhar ā˜Žļø9876848877ā˜Žļø Call Girl service in Jalandharā˜Žļø Jal...
Ā 

Data Day 2017

  • 1. Creating knowledge bases from text in the absence of training data. Sanghamitra Deb, Accenture Technology Laboratory; Phil Rogers, Jana Thompson, Hans Li
  • 3. The generalized approach to extracting text: parsing. Tokenization → Normalization → Parsing → Lemmatization
    • Tokenization: separating sentences and words, removing special characters, detecting phrases.
    • Normalization: lowercasing words, word-sense disambiguation.
    • Parsing: detecting parts of speech (nouns, verbs, etc.).
    • Lemmatization: reducing plurals and other word forms to a single dictionary form.
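The four stages can be sketched with the Python standard library alone. This is a toy under stated assumptions: the function names are hypothetical, POS tagging (the parsing step) is left to a real tagger and omitted, and `lemmatize` is a crude suffix rule rather than the dictionary lookup the slide describes.

```python
import re

# Toy versions of the preprocessing stages, standard library only.
# Assumption: a real pipeline would call an NLP library; POS tagging is omitted,
# and lemmatize() is a crude suffix rule, not a dictionary-based lemmatizer.

def split_sentences(text):
    # Tokenization (sentences): split after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Tokenization (words): keep alphanumeric runs (hyphenated words stay whole),
    # which drops punctuation and other special characters.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", sentence)

def normalize(tokens):
    # Normalization: lowercase only; word-sense disambiguation is out of scope.
    return [t.lower() for t in tokens]

def lemmatize(token):
    # Lemmatization stand-in: strip common -ies/-s endings
    # (plurals, and crudely also third-person verb forms).
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def preprocess(text):
    return [[lemmatize(t) for t in normalize(tokenize(s))] for s in split_sentences(text)]

print(preprocess("It treats Seizures. For the relief of symptoms of depression!"))
# -> [['it', 'treat', 'seizure'], ['for', 'the', 'relief', 'of', 'symptom', 'of', 'depression']]
```

The suffix rule will mangle words such as "this"; it only shows where lemmatization sits in the pipeline.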
  • 4. The generalized approach to extracting text: ML
    • Extract sentences that contain the specific attribute.
    • POS-tag them and extract unigrams, bigrams and trigrams centered on nouns.
    • Extract features: words around the nouns (bag of words or word vectors), position of the noun, and length of the sentence.
    • Map training data to create a balanced positive and negative training set.
    • Train a machine learning model to predict which unigrams, bigrams or trigrams satisfy the specific relationship, for example the drug-disease treatment relationship.
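A minimal sketch of the candidate and feature steps, assuming each sentence has already been POS-tagged with Penn Treebank tags (for example by NLTK's tagger, not shown). `candidate_spans`, `features` and the two-word context window are illustrative choices, not the talk's actual code.

```python
from collections import Counter

# Assumption: tokens arrive as (word, Penn Treebank tag) pairs from some tagger.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def candidate_spans(tagged):
    """(start, end) spans, end exclusive, of the unigram, bigrams and trigram around each noun."""
    n = len(tagged)
    spans = set()
    for i, (_, tag) in enumerate(tagged):
        if tag not in NOUN_TAGS:
            continue
        # unigram, bigram to the left, bigram to the right, trigram centered on the noun
        for start, end in [(i, i + 1), (i - 1, i + 1), (i, i + 2), (i - 1, i + 2)]:
            if start >= 0 and end <= n:
                spans.add((start, end))
    return sorted(spans)

def features(tagged, span, window=2):
    """Bag of words around the candidate, plus its position and the sentence length."""
    start, end = span
    words = [w.lower() for w, _ in tagged]
    context = words[max(0, start - window):start] + words[end:end + window]
    feats = Counter("ctx=" + w for w in context)
    feats["position"] = start
    feats["sentence_length"] = len(words)
    return feats

tagged = [("Lithium", "NNP"), ("treats", "VBZ"), ("bipolar", "JJ"), ("disorder", "NN")]
for span in candidate_spans(tagged):
    print(" ".join(w for w, _ in tagged[span[0]:span[1]]), dict(features(tagged, span)))
```

The feature dictionaries could then be vectorized (for instance with scikit-learn's `DictVectorizer`) before training a classifier; that step is left out.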
  • 5. The generalized approach to extracting text: ML (same steps as slide 4). How do we generate this training data?
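One simple way to obtain the balanced positive and negative set from the ML pipeline is to downsample the majority class. This sketch assumes examples are (features, label) pairs with labels 1 and 0 already mapped from some existing training data; downsampling is our choice here, and reweighting or oversampling would also work.

```python
import random

def balance(examples, seed=0):
    """Downsample the larger class so the training set has equal numbers of
    positive (label 1) and negative (label 0) examples."""
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    k = min(len(pos), len(neg))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(balanced)
    return balanced
```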
  • 6. A different approach (Stanford): replace training data by encoding domain knowledge.
  • 7. The Snorkel approach to entity extraction
    • Extract sentences that contain the specific attribute.
    • POS-tag them and extract unigrams, bigrams and trigrams centered on nouns.
    • Write rules: encode your domain knowledge into rules.
    • Validate rules: coverage, conflicts, accuracy.
    • Run learning: logistic regression, LSTM, …
    • Examine a random set of candidates and create new rules.
    • Observe the lowest-accuracy (highest-conflict) rules and edit them.
    • Iterate.
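The "validate rules" step can be illustrated with three hypothetical rules for drug-disease candidates and a from-scratch computation of coverage, overlap, conflict and (on an annotated sample) accuracy. The vote convention (+1, -1, 0 for abstain), the rule names and the candidate dictionaries are assumptions for this sketch; Snorkel provides its own rule-analysis tooling, and this only shows what the numbers mean.

```python
import re

# Hypothetical rules ("labeling functions") for drug-disease treatment candidates.
# Vote convention assumed here: +1 = treats, -1 = does not treat, 0 = abstain.
GENERIC_TERMS = {"individual", "maintenance", "patients"}

def lf_indicated_for(c):
    return 1 if "indicated for" in c["sentence"].lower() else 0

def lf_relief_of(c):
    pattern = r"relief of .*" + re.escape(c["disease"])
    return 1 if re.search(pattern, c["sentence"], re.IGNORECASE) else 0

def lf_generic_term(c):
    return -1 if c["disease"].lower() in GENERIC_TERMS else 0

def label_matrix(candidates, lfs):
    """One row per candidate, one column per rule."""
    return [[lf(c) for lf in lfs] for c in candidates]

def _other_votes(row, j):
    return [v for k, v in enumerate(row) if k != j and v != 0]

def rule_stats(L, gold=None):
    """Coverage, overlap and conflict per rule, plus accuracy when gold labels are given."""
    n = len(L)
    stats = []
    for j in range(len(L[0])):
        fired = [(row, i) for i, row in enumerate(L) if row[j] != 0]
        s = {
            "coverage": len(fired) / n,
            "overlap": sum(1 for row, _ in fired if _other_votes(row, j)) / n,
            "conflict": sum(1 for row, _ in fired
                            if any(v != row[j] for v in _other_votes(row, j))) / n,
        }
        if gold is not None:
            s["accuracy"] = (sum(row[j] == gold[i] for row, i in fired) / len(fired)
                             if fired else None)
        stats.append(s)
    return stats
```

With `L = label_matrix(candidates, [lf_indicated_for, lf_relief_of, lf_generic_term])`, the rule with the lowest accuracy or highest conflict in `rule_stats(L, gold)` is the natural one to edit first.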
  • 8. Training Data | Rules [figure: Planetary Orbits]
  • 9. How does Snorkel work without training data?
    • Write rules: encode your domain knowledge into rules.
    • The rules are modeled as a Naive Bayes model, which assumes that the rules are conditionally independent. Even though this is usually not true, in practice it generates a pretty good training set, with a probability of belonging to either class for each candidate.
    • These probabilities are fed into a machine learning algorithm (logistic regression in the simplest case) to create a model used to make future predictions.
    Reference: http://arxiv.org/pdf/1512.06474v2.pdf
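The naive Bayes step and the logistic regression step can be sketched in plain Python. Under conditional independence, a rule with accuracy a adds ±log(a / (1 - a)) to the log-odds of the positive class. In Snorkel those accuracies are learned from the agreements and disagreements between rules without ground truth; here they are simply assumed known (hypothetical values), and the logistic regression is a small gradient-descent fit to the resulting soft labels, standing in for whatever learner is used downstream.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probabilistic_labels(L, accuracies, prior=0.5):
    """Naive Bayes combination of rule votes (+1 / -1 / 0 = abstain).

    Assumes the rules are conditionally independent given the true class, that
    abstaining carries no information about the class, and that each rule's
    accuracy is known. Rule j then adds v_j * log(a_j / (1 - a_j)) to the log-odds.
    """
    weights = [math.log(a / (1 - a)) for a in accuracies]
    bias = math.log(prior / (1 - prior))
    return [sigmoid(bias + sum(v * w for v, w in zip(row, weights))) for row in L]

def train_logreg(X, p, lr=0.5, epochs=500):
    """Logistic regression fit to soft targets p in [0, 1] by full-batch gradient
    descent on the cross-entropy loss; a constant 1.0 feature acts as the bias."""
    X = [x + [1.0] for x in X]
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, target in zip(X, p):
            err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - target
            for k, xi in enumerate(x):
                grad[k] += err * xi / len(X)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x + [1.0])))
```

A single rule with accuracy 0.7 voting +1 yields a probability of exactly 0.7; two rules that disagree pull the probability toward whichever is more accurate.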
  • 10. Data Dive: FDA Drug Labels
  • 11. Data Dive: FDA Drug Labels. Example sentences from drug labels:
    • "It is indicated for treating respiratory disorder caused due to allergy."
    • "For the relief of symptoms of depression."
    • "Evidence supporting efficacy of carbamazepine as an anticonvulsant was derived from active drug-controlled studies that enrolled patients with the following seizure types:"
    • "When oral therapy is not feasible and the strength, dosage form, and route of administration of the drug reasonably lend the preparation to the treatment of the condition"
  • 12. Candidate Extraction: using domain knowledge and language structure, collect a high-recall, low-precision set of candidates. Typically this set should have about 80% recall and 20% precision.
    • 60% accuracy: too specific; the filter needs to be made more general.
    • 30% accuracy: this looks fine.
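Whether a candidate set hits the recall/precision profile of candidate extraction can be checked against a small annotated sample. A minimal sketch; the item names below are made up, and the same helper also returns the F1-score.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of a predicted set of items against an annotated gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical check: 20 candidates that contain 4 of the 5 annotated true relations.
gold = {f"true{i}" for i in range(5)}
candidates = {f"true{i}" for i in range(4)} | {f"noise{i}" for i in range(16)}
print(precision_recall_f1(candidates, gold))
```

This example gives precision 0.2 and recall 0.8, the 20% / 80% profile described for a good candidate set.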
  • 16. Generation of training data: one rule [figure; x-axis from -1 to 1]
  • 17. Generation of training data: two rules [figure; x-axis from -1 to 1]
  • 18. Generation of training data: three rules [figure; x-axis from -1 to 1]
  • 19. Generation of training data: four rules [figure; x-axis from -1 to 1]
  • 20. Generation of training data: 20 rules [figure; x-axis from -1 to 1]
  • 21. Results and performance
        drug-name          | disease candidate | Candidates | snorkel
        Lithium Carbonate  | bipolar disorder  | 1          | 1
        Lithium Carbonate  | individual        | 1          | 0
        Lithium Carbonate  | maintenance       | 1          | 0
        Lithium Carbonate  | manic episode     | 1          | 1
    Precision and recall: ~90%.
  • 22. Evolution of F1-score with sample size
  • 23. Relationship extractions
    • Is person X married to person Y?
    • Does drug X cure disease Y?
    • Does software X (example: snorkel) run on programming language Y (example: python3)?
    Define filters for candidate extraction for a pair (X, Y), for example (snorkel, python2.7), (snorkel, python3.1), … Once you have the pairs, examine them using an annotation tool. Write rules, observe their performance against the annotated data, and iterate.
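Candidate pairs for a relationship can be generated by a simple co-occurrence filter: every X term and Y term found in the same sentence form a pair to annotate and write rules against. This sketch assumes dictionaries of X and Y terms and uses substring matching to favour recall; the sentence and term lists are made up for illustration.

```python
from itertools import product

def candidate_pairs(sentence, xs, ys):
    """All (x, y) pairs where a term from xs and a term from ys both occur in the
    sentence. Plain case-insensitive substring matching favours recall over precision."""
    text = sentence.lower()
    found_x = [x for x in xs if x.lower() in text]
    found_y = [y for y in ys if y.lower() in text]
    # x != y drops self-pairs when both sides come from the same list (e.g. people)
    return [(x, y) for x, y in product(found_x, found_y) if x != y]

drugs = ["lithium carbonate", "carbamazepine"]
diseases = ["bipolar disorder", "manic episode", "depression"]
sentence = "Lithium carbonate is indicated in the treatment of manic episodes of bipolar disorder."
print(candidate_pairs(sentence, drugs, diseases))
# -> [('lithium carbonate', 'bipolar disorder'), ('lithium carbonate', 'manic episode')]
```

Substring matching lets "manic episode" match "manic episodes"; a word-boundary regex would be stricter but would miss that plural.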
  • 24. Crowdsourced training data: in some cases, training data is generated on the same dataset by multiple people. In Snorkel, each source can be incorporated as a separate rule function. The model for the rules figures out the relative weights for each person and creates cleaner training data.
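A rough illustration of weighting annotators: treat each person as a rule and give them a log-odds weight based on how often they agree with the plain majority vote. This one-step heuristic is a stand-in chosen for brevity, not Snorkel's generative model, which estimates the weights jointly; the function names and vote convention (+1, -1, 0 for no label) are assumptions.

```python
import math

def annotator_weights(L, smoothing=1.0):
    """Log-odds weight per annotator from agreement with the unweighted majority vote.

    L has one row per item and one column per annotator, with votes +1 / -1 / 0 (no label).
    Laplace smoothing keeps every weight finite."""
    majority = []
    for row in L:
        total = sum(row)
        majority.append(1 if total > 0 else -1 if total < 0 else 0)
    weights = []
    for j in range(len(L[0])):
        votes = [(row[j], m) for row, m in zip(L, majority) if row[j] != 0 and m != 0]
        agree = sum(v == m for v, m in votes)
        acc = (agree + smoothing) / (len(votes) + 2 * smoothing)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def weighted_labels(L, weights):
    """Probability that each item is positive under the weighted vote."""
    return [1 / (1 + math.exp(-sum(v * w for v, w in zip(row, weights)))) for row in L]
```

An annotator who mostly disagrees with the majority gets a negative weight, so their votes are effectively flipped; clipping such weights to zero is an alternative.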
  • 25. Why Docker? (Python stack on Docker)
    • Portability: develop here, run there (internal clusters, AWS, Google Cloud, etc.); reusable by the team and by clients.
    • Isolation: the host OS and the Docker container are isolated from each other's bugs.
    • Fast.
    • Easy virtualization (compared with hardware emulation or a virtualized OS).
    • Lightweight.
  • 26. Building the Dockerfile
    Dockerfile:

        FROM ubuntu:16.04
        # Pinned release: ubuntu:latest no longer packages the Python 2 tools installed below.
        LABEL maintainer="Sanghamitra Deb <sangha123@gmail.com>"
        # RUN prints at build time; the original CMD would only run when a container starts.
        RUN echo Installing Accenture Tech Labs Scientific Python Enviro
        # Refresh the package index before the first install; -y keeps the build non-interactive.
        RUN apt-get update && apt-get upgrade -y && apt-get install -y \
            python python-dev python-setuptools python-numpy build-essential \
            curl emacs gfortran swig libatlas-dev liblapack-dev \
            libfreetype6 libfreetype6-dev libxft-dev \
            libxml2-dev libxslt-dev zlib1g-dev
        # The Python 2.7 build of get-pip.py is served from the /pip/2.7/ path.
        RUN curl -O https://bootstrap.pypa.io/pip/2.7/get-pip.py && python get-pip.py && rm get-pip.py
        RUN echo "export PATH=~/.local/bin:$PATH" >> ~/.bashrc
        ADD requirements.txt /tmp/requirements.txt
        RUN pip install -r /tmp/requirements.txt -q

    requirements.txt:

        scipy
        matplotlib
        ipython
        jupyter
        pandas
        Bottleneck
        patsy
        pymc
        statsmodels
        scikit-learn
        BeautifulSoup
        seaborn
        gensim
        fuzzywuzzy
        xmltodict
        untangle
        nltk
        flask
        enum34

    Build, run and attach:

        docker build -t sangha/python .
        docker run -it -p 1108:1108 -p 1106:1106 --name pharmaExtraction0.1 -v /location/in/hadoop/ sangha/python bash
        docker exec -it pharmaExtraction0.1 bash
        docker exec -d pharmaExtraction0.1 python /root/pycodes/rest_api.py
  • 27. Typical ML pipeline vs. Snorkel: (1) candidate extraction, (2) rule functions, (3) hyperparameter tuning.
  • 28. Snorkel
    Pros:
    • Very little training data necessary.
    • No need to think about feature generation.
    • No need for deep knowledge of machine learning.
    • Convenient UI for data annotation.
    • Creates structured databases from unstructured text.
    Cons:
    • The code is new, so it may not be robust in all situations.
    • Online prediction is difficult.
    • Not much transparency into the internal workings.
  • 29. Applicability across a variety of industries and use cases: banks (loan approval), paleontology, design of clinical trials, legal investigation, market research reports, human trafficking, skills extraction from resumes, content marketing, product descriptions and reviews, pharmaceutical industry.
  • 30. Where to get it? https://github.com/HazyResearch/snorkel http://arxiv.org/pdf/1512.06474v2.pdf