Why Bacterial exotoxin identification?
   Major cause of disease, leading to symptoms and lesions
   during infection
   • It is therefore important to study their mechanisms in order to fight them

   These toxins are specific to a species

   • So species-specific information is needed

   Exotoxins in particular, though completely neutralized in
   vivo, are only partially inhibited in vitro
   • This implies that they are also regulated by environmental signals, so the study of
     properties that interact with the environment becomes important

   Most bacteria become resistant to antibiotics through
   mutation or genetic recombination
   • This requires the identification of new sequences

   Further, inactivated exotoxins that form toxoids while still retaining their
   antigenic properties can be used to immunize against certain diseases
Support Vector Machine?
Introduced by Vapnik in 1992.

Set of related supervised learning methods that analyze and
recognize patterns

Used for classification and regression analysis

Non-probabilistic binary linear classifier

Based on statistical learning and optimization theories

Can handle multiple continuous as well as categorical variables
Principle
        • Examples are represented as points in space
        • They are mapped such that examples of separate
          categories are divided by a gap as wide as
          possible

        • Constructs a hyperplane or a set of hyperplanes
          in a high- or infinite-dimensional space
        • Such that the hyperplane is at maximum
          distance from the nearest data point of either of
          the classes
Working:
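Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i \in \mathbb{R}^n$
and $y_i \in \{1, -1\}$:

• Maximize the margin m, the distance from the separating hyperplane to the nearest
  data points of either class. With w and b scaled so that $y_i (w^T x_i + b) = 1$ for those
  nearest points, $m = 1 / \|w\|$.
• The original problem in a finite-dimensional space may not be linearly separable, so the
  data are mapped to a higher-dimensional space.
• A kernel function is introduced to make computations in the higher-dimensional
  space easier.

[Figure: separating hyperplane $w^T x + b = 0$ with unit normal $w / \|w\|$, margin $m$, and example points $(x_1, 1)$ and $(x_n, -1)$ on either side]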
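As a rough illustration of the margin, the sketch below computes the signed distance of each point from a hyperplane and takes the smallest value over the training set. The weight vector, offset, and points are made-up toy values, not data from the slides.

```python
import math

def signed_distance(w, b, x):
    """Signed geometric distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def geometric_margin(w, b, points, labels):
    """Smallest value of y_i * distance(x_i) over the training set."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Hypothetical toy data: two points per class in the plane.
w, b = [1.0, 1.0], -3.0
points = [[3.0, 2.0], [2.0, 2.0], [1.0, 1.0], [0.0, 1.0]]
labels = [1, 1, -1, -1]
margin = geometric_margin(w, b, points, labels)
```

In this toy case the nearest points satisfy y (w.x + b) = 1, so the computed margin equals 1/||w||.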
Optimization problem
Training an SVM requires the solution of the following optimization problem:

$$\min_{w,\,b,\,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i$$
subject to $y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i$, $\xi_i \ge 0$, where

  $\phi$: function mapping from the input space to the feature space
  $C > 0$: penalty parameter of the error term
  $\xi_i$: error (slack) term introduced for the ith training example

The dual solution of the optimization problem, found using Lagrange
multipliers, depends only on the inner products of the support vectors
with the new vector x when determining its class.

The kernel function, given by $K(x, z) = \phi(x) \cdot \phi(z)$, lets the SVM learn in
the high-dimensional feature space without having to explicitly
calculate $\phi(x)$.
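A small sketch may help show why the kernel avoids computing φ(x). It assumes a 2-D input and the homogeneous degree-2 polynomial kernel, chosen here only because its feature map is short enough to write out; neither choice comes from the slides.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly2_kernel(x, z):
    """K(x, z) = (x.z)^2, evaluated without building phi."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
explicit = dot(phi(x), phi(z))   # inner product computed in feature space
implicit = poly2_kernel(x, z)    # same quantity computed in input space
```

Both routes give the same number up to rounding, but the kernel route never constructs the feature vectors.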
Kernel Function
A valid kernel function must satisfy Mercer's theorem, which requires that the
corresponding kernel matrix K be symmetric positive semi-definite ($z^T K z \ge 0$ for all $z$).
The following kernel functions are commonly used:

linear: $K(x_i, x_j) = x_i^T x_j$

polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$

radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$

sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

 The effectiveness of an SVM depends on the selection of the kernel, the kernel parameters
 and the soft-margin parameter C.
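The four kernels can be written out directly. The following is a minimal sketch in plain Python; the default parameter values and the sample points are arbitrary choices for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(x, z):
    return dot(x, z)

def polynomial(x, z, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(x, z) + r) ** d

def rbf(x, z, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid(x, z, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, z) + r)

def gram_matrix(kernel, points, **params):
    """Kernel matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q, **params) for q in points] for p in points]

# Hypothetical sample points.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(rbf, points, gamma=0.5)
```

The resulting matrix is symmetric with ones on the diagonal for the RBF kernel. Note that the sigmoid kernel is not positive semi-definite for every choice of γ and r, so it does not always satisfy the Mercer condition.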
Data Collection
To build an SVM model that separates human pathogenic bacterial toxins from nontoxins, 2 major
databases were compiled: one of bacterial toxins and one of nontoxins.

294 bacterial toxin sequences were taken from the Bacterial Toxin Database at
http://www.hpppi/iicb.res.in/btox

It contained representative protein sequences, in FASTA format, from 24 different genera of
human pathogenic bacteria.

This database was created by evaluating and processing over 4750 toxin sequences from 24
different genera, retrieved from NCBI (www.ncbi.nlm.nih.gov), to remove redundancies
and obtain the representatives.
Next, 2940 nontoxin sequences were manually assembled from NCBI,

selecting protein sequences significant to metabolic processes and others,

and then removing sequences with more than 90% sequence identity using CD-HIT.

Of the 294 toxin (positive samples) and 2940 nontoxin (negative samples) sequences,
44 toxin and 440 nontoxin sequences were set apart for testing, leaving
250 toxin and 2500 nontoxin feature vectors.
Feature Extraction
 Twelve physicochemical properties have been employed to describe
 each protein

  • These include Hydrophobicity, Contact Features, Absolute Entropy, Hydration
    Potential, Isoelectric Point, Net Charge, Normalised Flexibility Parameters, Relative
    Mutability, Side Chain Orientational Preference, Occurrence Frequency, pK-a(RCOOH), and
    Polarity

 The ith feature in the feature vector of the jth protein sequence, for i = 1, 2,
 ..., 12, is given by
 $F_j(i) = \frac{1}{N} \sum_{k=1}^{20} \mathrm{prp}_k(i) \, N_k$, where
  • $\mathrm{prp}_k(i)$: ith property of the kth amino acid, for k = 1, 2, ..., 20
  • $N_k$: number of residues of the kth amino acid in the sequence
  • $N$: length of the sequence

 Dipeptide and tripeptide compositions are also used; to reduce the dimensionality
 of the feature space, amino acids are grouped according to their properties into 11
 groups:
  • FWY, R, K, DE, H, M, QN, ST, C, and AGILVP
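The two kinds of features can be sketched as follows. This is an illustrative reading of the formula and grouping above, not the authors' code: the group table uses the ten groups that are legible on the slide, the property scale is a made-up placeholder, and sequences are assumed to contain only the 20 standard residues.

```python
from collections import Counter
from itertools import product

# Groups as listed on the slide (ten are recoverable from the text).
GROUPS = ["FWY", "R", "K", "DE", "H", "M", "QN", "ST", "C", "AGILVP"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}

def property_feature(seq, prop):
    """F_j(i) = sum_k prp_k(i) * N_k / N for one property table `prop`."""
    counts = Counter(seq)
    return sum(prop[aa] * n for aa, n in counts.items()) / len(seq)

def grouped_dipeptide_composition(seq):
    """Fraction of each ordered pair of groups among adjacent residues."""
    reduced = [GROUP_OF[aa] for aa in seq]
    pairs = Counter(zip(reduced, reduced[1:]))
    total = len(reduced) - 1
    return {pair: pairs.get(pair, 0) / total
            for pair in product(range(len(GROUPS)), repeat=2)}

# Hypothetical property values for illustration only (not a published scale).
toy_scale = {aa: 0.0 for aa in GROUP_OF}
toy_scale.update({"A": 1.0, "K": -2.0})
value = property_feature("AAKA", toy_scale)
dipep = grouped_dipeptide_composition("AAKA")
```

Grouping shrinks the dipeptide feature space from 400 residue pairs to one entry per pair of groups; tripeptides can be handled the same way with triples.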
LIBSVM tool
 svmtrain: prepares models (classifiers) trained from training sets
 svmpredict: predicts the class of the test or experimental samples

 Steps followed before applying the svmtrain module:

 • Run checkdata.py from the tools folder in the package to check that the data
 instances are in an acceptable format.
 • Apply subset.py from the tools folder to split the data instances
 into training (80%) and testing (remaining 20%) sets
 • Scale the data using svmscale
 • Apply grid.py from the tools folder to select optimal
 values for the kernel parameter g and the penalty parameter C
   The exponents for g and C were incremented stepwise (step 1) through a
 combination of:
     powers of 2 from -11 through +3 for g, and
     powers of 2 from -9 to +5 for C,
 with grid.py using 5-fold cross-validation accuracy to select the optimal parameter
 set.
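The search space described above can be enumerated in a few lines. The sketch below only builds the grid and picks the best pair; it does not call LIBSVM, and the `cv_accuracy` callback is a hypothetical stand-in for whatever routine runs the 5-fold cross-validation.

```python
from itertools import product

def log2_grid(begin, end, step=1):
    """Values 2**begin, 2**(begin+step), ..., up to 2**end (inclusive)."""
    return [2.0 ** e for e in range(begin, end + 1, step)]

gammas = log2_grid(-11, 3)   # g: 2^-11 ... 2^3
costs = log2_grid(-9, 5)     # C: 2^-9 ... 2^5
candidates = list(product(costs, gammas))

def best_parameters(candidates, cv_accuracy):
    """Return the (C, g) pair with the highest cross-validation accuracy.

    `cv_accuracy` is a caller-supplied function (hypothetical here), e.g. one
    that runs 5-fold cross-validation of an RBF-kernel SVM for the given pair.
    """
    return max(candidates, key=lambda cg: cv_accuracy(*cg))
```

With a step of 1 in the exponent, each parameter takes 15 values, so 225 (C, g) pairs are cross-validated in total.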
LIBSVM also provides a tool, fselect.py, to remove possibly redundant
features from the original feature set.

fselect.py ranks the features by assigning each an F-score value.
The higher the value, the more significant the feature is in predicting the classes.
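For intuition, an F-score of this kind can be computed as between-class separation divided by within-class sample variance. The sketch below assumes that common definition; whether fselect.py matches it in every detail is an assumption, and the data are invented.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one feature from its values in positive and negative samples."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)   # sample variances (n - 1)
    return between / within

def rank_features(pos_rows, neg_rows):
    """Feature indices sorted from highest to lowest F-score, plus the scores."""
    n_features = len(pos_rows[0])
    scores = [f_score([r[i] for r in pos_rows], [r[i] for r in neg_rows])
              for i in range(n_features)]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
pos_rows = [[1.0, 0.2], [1.2, 0.9], [0.8, 0.5]]
neg_rows = [[0.0, 0.4], [0.2, 0.8], [-0.2, 0.3]]
order, scores = rank_features(pos_rows, neg_rows)
```

A feature that is constant within both classes would make the denominator zero, so a real implementation needs a guard for that case.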


Performance Evaluation


· Accuracy = (TP + TN)/(TP + TN + FP + FN)
· Balanced Accuracy, BAC = (Specificity + Sensitivity)/2, where
◦ Specificity = TN/(TN + FP)
◦ Sensitivity = TP/(TP + FN)
· AUC: area under the curve of sensitivity against (1 − specificity)
· Matthews correlation coefficient [1],
$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TN + FN)(TN + FP)(TP + FP)(TP + FN)}}$
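These measures follow directly from the four confusion-matrix counts. The sketch below uses the standard definition of specificity, TN/(TN + FP), and the counts passed in are hypothetical, not results from the slides.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, BAC and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    bac = (sensitivity + specificity) / 2
    denom = math.sqrt((tn + fn) * (tn + fp) * (tp + fp) * (tp + fn))
    # Convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "bac": bac, "mcc": mcc}

# Hypothetical counts for a 1:10 toxin-to-nontoxin test set.
metrics = evaluate(tp=40, tn=430, fp=10, fn=4)
```

On imbalanced sets like this one, BAC and MCC are more informative than plain accuracy, because a classifier that labels everything as nontoxin would still score high accuracy.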
Result
• 92.27% average accuracy and an area under the curve (AUC) of 0.998 were
obtained when all 298 features were used, whereas
• 91.16% accuracy and 0.94 AUC were achieved with an optimized set of 114
features (supplementary file 2).
• Much higher accuracies (98.13% and 97.92% for 298 and 114
features, respectively) were achieved when a completely separate test set consisting of
39 toxins and 390 non-toxins (1:10 ratio) was used for testing.


Conclusion
 The top-ranked features can be studied to identify the important functionalities of
 toxic proteins.

 The approach is effective in identifying bacterial toxins without being computationally
 intensive.
Thank You

Svm dbeth

  • 1.
  • 2. Why Bacterial exotoxin identification? Major cause of diseases, leading to symptoms and lesions during infection • Becomes important to study there mechanism to fight against There toxins are specific to a species • So species specific information is needed Exotoxins in particular, though completely neutralized in vivo, are only partialy inhibited in vitro • Implying they are regulated by environmental signals as well, study of properties that interact with the environment becomes important Most bacteria become resistant to antibiotics because of mutation or genetic recombination • Requires identification of new sequences Futher inactive exotoxins that form toxoids, still reatining the antigenic properties can be used to cure cartain disesases
  • 3. Support Vector Machine? Introduced by Vapnik, in 1992. Set of related supervised learning methods that analyze and recognize patterns Used for classification and regression analysis Non-probablistic binary linear classifier Based on statistical learning and optimization theories Can handle multiple, continuous as well as categorical data
  • 4. Principle • Representation of examples as points in space • Mapped such that examples of separate categories are divided by a gap as wide as possible • Constructs a hyperplane or a set of hyperplane in high or infinite dimensional space • Such that the hyperplane is at maximum distance from nearest data point of either of the classes
  • 5. Working: Given a training set of instance-label pairs (xi , yi ), i = 1, . . . , n , where xi ∈ Rn and yi ∈ {1, −1} as below: Maximize the margin (from the nearest data points of either classes), m = yi (wTxi + b) = 1 /||w|| w/||w|| (x1, 1) Original problem in finite dimensional space may not be linearly separable , so mapped to higher dimensional space m (xn, -1) wTx + b = 0 Intoduction of kernel function to make computations in higher dimenional space easier.
  • 6. Optimization problem require the solution of the following optimization problem: min w,b,ξ (1/2)wTw+C Σξi, subject to yi (wT φ(xi ) + b) ≥ 1 − ξi , ξi ≥ 0, where φ – function mapping from input space to feature space C > 0 is the penalty parameter of the error term. ξi - error term introduced The dual solution of the optimization problem found using Lagrange’s theorem , depends only on the inner product of the support vectors and the new vector x, to determine its class. Kernel Function, given by K(x,z) = φ(x). φ(z) makes SVM to learn in the high dimensional feature space without having to explicitly calculate φ(x).
  • 7. Kernel Function A valid kernel function must satisfy Mercer Theorem which defines that the corresponding kernel matrix be symmetric positive semi-definite (zTKz >= 0). Following are commonly used kernel functions: linear: K(xi , xj ) = xT xj polynomial: K(xi , xj ) = (γxi T xj + r)d , γ > 0 radial basis function (RBF): K(xi , xj ) = exp(−γ|xi − xj|2 ), γ > 0 sigmoid: K(xi , xj ) = tanh(γxi T xj + r). Effectivenss of SVM depends on the selection of kernel, kernel parameters and the soft margin paarmeter C.
  • 8. Data Collection: To train an SVM to classify human pathogenic bacterial toxins from nontoxins, two major datasets were compiled, one of bacterial toxins and one of nontoxins. 294 bacterial toxin sequences were taken from the Bacterial Toxin Database at http://www.hpppi/iicb.res.in/btox, which contains representative protein sequences from 24 different genera of human pathogenic bacteria in FASTA format. This database was created by evaluating and processing over 4750 toxin sequences from the 24 genera, retrieved from NCBI (www.ncbi.nlm.nih.gov), to remove redundancies and obtain the representatives.
  • 9. Next, 2940 nontoxin sequences were manually assembled from NCBI by selecting protein sequences significant to metabolic and other processes, then removing sequences with more than 90% sequence identity using CD-HIT. Of the 294 toxin (positive samples) and 2940 nontoxin (negative samples) sequences, 44 toxin and 440 nontoxin sequences were set apart for testing, leaving the remaining 250 toxin and 2500 nontoxin feature vectors.
  • 10. Feature Extraction: Twelve physicochemical properties were employed to describe each protein, • including Hydrophobicity, Contact Features, Absolute Entropy, Hydration Potential, Isoelectric Point, Net Charge, Normalised Flexibility Parameters, Relative Mutability, Side Chain Orientational Preference, Occurrence Frequency, pKa(RCOOH), and Polarity. The ith feature in the feature vector of the jth protein sequence, for i = 1, 2, ..., 12, is given by F_j(i) = Σ_{k=1}^{20} prp_k(i) · N_k / N, where • prp_k(i): ith property of the kth amino acid, ∀ k = 1, 2, ..., 20 • N_k: number of residues of the kth amino acid in the sequence • N: length of the sequence. Dipeptide and tripeptide compositions were also used; to reduce the dimensionality of the feature space, amino acids were grouped according to their properties into 11 groups: • FWY, R, K, DE, H, M, QN, ST, C, and AGILVP
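A minimal sketch of the two feature types described on this slide follows. The property table is an assumption: the Kyte-Doolittle hydropathy scale stands in for the hydrophobicity property, since the slide does not say which scale was used. The groups are copied exactly as listed above, and the short test sequences are made up.

```python
from collections import Counter
from itertools import product

# Assumption: Kyte-Doolittle hydropathy values as a stand-in property table.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Amino-acid groups exactly as listed on the slide.
GROUPS = ["FWY", "R", "K", "DE", "H", "M", "QN", "ST", "C", "AGILVP"]
GROUP_OF = {aa: idx for idx, members in enumerate(GROUPS) for aa in members}

def property_feature(sequence, prp):
    """F_j(i) = sum_k prp_k(i) * N_k / N for one property table prp."""
    counts = Counter(sequence)  # N_k for every residue type present
    return sum(prp[aa] * n_k for aa, n_k in counts.items()) / len(sequence)

def grouped_dipeptide_composition(sequence):
    """Fraction of each ordered group pair among the adjacent residue pairs."""
    encoded = [GROUP_OF[aa] for aa in sequence]
    pairs = Counter(zip(encoded, encoded[1:]))
    total = len(encoded) - 1
    n = len(GROUPS)
    return [pairs[(a, b)] / total for a, b in product(range(n), repeat=2)]

hydrophobicity = property_feature("ACDA", KYTE_DOOLITTLE)
dipeptides = grouped_dipeptide_composition("FWRK")
print(hydrophobicity, len(dipeptides))
```

Grouping shrinks the dipeptide space from 20 × 20 ordered residue pairs to one entry per ordered group pair; tripeptide composition can be built the same way with three-residue windows.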
  • 11. LIBSVM tool: svmtrain: for preparing models (classifiers) trained from training sets. svmpredict: predicts the class of the test or experimental samples. Steps followed before applying the svmtrain module: • Run checkdata.py from the tools folder in the package to check that the data instances are in an acceptable format. • Apply subset.py from the tools folder to split the data instances into 80% and the remaining 20%, as training and testing sets. • Scale the data using svmscale. • Apply grid.py from the tools folder to select optimal values for the kernel function parameter g and the parameter C. The values for g and C were incremented stepwise (step 1) through combinations of powers of 2 with exponents from −11 to +3 for g and from −9 to +5 for C, using grid.py, which used 5-fold cross-validation accuracy to select the optimal parameter set.
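The parameter grid described on this slide can be enumerated as below. This sketch does not call grid.py or LIBSVM; `toy_cv_accuracy` is a hypothetical stand-in for the 5-fold cross-validation accuracy that a real run would obtain by training a model for each (C, g) pair.

```python
import math
from itertools import product

def power_of_two_grid(g_exponents=(-11, 3), c_exponents=(-9, 5), step=1):
    """All (C, g) pairs with C = 2**c and g = 2**e over the inclusive ranges."""
    g_values = [2.0 ** e for e in range(g_exponents[0], g_exponents[1] + 1, step)]
    c_values = [2.0 ** e for e in range(c_exponents[0], c_exponents[1] + 1, step)]
    return list(product(c_values, g_values))

def select_best(grid, cv_accuracy):
    """Return the (C, g) pair with the highest cross-validation score."""
    return max(grid, key=lambda pair: cv_accuracy(*pair))

def toy_cv_accuracy(c, g):
    """Hypothetical score surface peaking at C = 2**1 and g = 2**-3."""
    return -((math.log2(c) - 1) ** 2 + (math.log2(g) + 3) ** 2)

grid = power_of_two_grid()
best_c, best_g = select_best(grid, toy_cv_accuracy)
print(len(grid), best_c, best_g)
```

With a step of 1 both exponent ranges contain 15 values, so the search evaluates 225 combinations.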
  • 12. LIBSVM also provides a tool, fselect.py, to remove possibly redundant features from the original feature set. fselect.py ranks the features by assigning each an F-score value; the higher the value, the more significant the feature is in predicting the classes. Performance Evaluation · Accuracy = (TP + TN)/(TP + TN + FP + FN) · Balanced Accuracy, BAC = (Specificity + Sensitivity)/2, where ◦ Specificity = TN/(TN + FP) ◦ Sensitivity = TP/(TP + FN) · AUC: area under the curve of sensitivity against (1 − specificity) · Matthews correlation coefficient [1], MCC = (TP*TN − FP*FN)/((TN+FN)*(TN+FP)*(TP+FP)*(TP+FN))^(1/2)
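The evaluation measures can be computed from the four confusion-matrix counts as sketched below. Specificity is taken as TN/(TN + FP), the usual definition and the one that the AUC axis (1 − specificity) relies on. The counts in the example are made up for illustration and are not results from the study.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, BAC and MCC from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    bac = (sensitivity + specificity) / 2
    denominator = math.sqrt((tn + fn) * (tn + fp) * (tp + fp) * (tp + fn))
    # MCC is undefined when a marginal count is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "bac": bac,
        "mcc": mcc,
    }

# Hypothetical counts for a 1:10 toxin to nontoxin test set.
scores = evaluate(tp=40, tn=430, fp=10, fn=4)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On imbalanced data like this, BAC and MCC are more informative than plain accuracy, because a classifier that labels everything as nontoxin would still reach about 91% accuracy.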
  • 13. Result • 92.27% average accuracy and 0.998 area under the curve (AUC) were obtained when all 298 features were used, whereas • 91.16% accuracy and 0.94 AUC were achieved with an optimized set of 114 features (supplementary file 2). • Much higher accuracies (98.13% and 97.92% for 298 and 114 features, respectively) were achieved when an entirely separate test set consisting of 39 toxins and 390 non-toxins (1:10 ratio) was used for testing. Conclusion: The top features can be studied to identify the important functionalities of the toxic proteins. The method is effective in identifying bacterial toxins without being computationally intensive.