Final Project: Transcription Factor DNA Binding Prediction

UT, San Antonio

Transcription Factor-DNA binding
prediction
Tahmina Ahmed
Prosunjit Biswas
Iffat Sharmin Chowdhury
Badri Sampath

Motivation
• Label unlabeled DNA sequences with a model built from
labeled DNA sequences, and gain hands-on experience with
real-world machine learning problems.

Approaches
• K-mer based
  - Fixed-length K-mer
  - K-mer with mismatches
  - Using regular expressions
• PWM based
  - MEME and MAST
• Combined model
  - Unites both models

K-mer Approach Based on Regular Expressions
Motivation
2-mers are the k-mers that appear most often in the sequences, so this
approach concentrates on 2-mers.

Strategy
- For any two 2-mers X and Y, generate the regular expressions
X(.*)Y and Y(.*)X.
- Use these regular expressions as candidate attributes.
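
The strategy above can be sketched in a few lines of Python. This is only an illustration, not the project's actual script: the function names are hypothetical, the alphabet is assumed to be A, C, G, T, and X and Y are assumed to be distinct 2-mers. Each pattern becomes one binary attribute that is 1 when the pattern matches somewhere in the sequence.

```python
import itertools
import re

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, no ambiguity codes


def two_mers():
    """Return all 16 possible 2-mers over the alphabet."""
    return ["".join(p) for p in itertools.product(ALPHABET, repeat=2)]


def candidate_patterns():
    """Build X(.*)Y and Y(.*)X for every pair of distinct 2-mers X, Y."""
    patterns = []
    for x, y in itertools.combinations(two_mers(), 2):
        patterns.append(f"{x}(.*){y}")
        patterns.append(f"{y}(.*){x}")
    return patterns


def regex_features(sequence, patterns):
    """Return one 0/1 attribute per pattern: 1 if it matches anywhere."""
    return [1 if re.search(p, sequence) else 0 for p in patterns]


patterns = candidate_patterns()
features = regex_features("ACGTTGCA", patterns)
print(len(patterns), sum(features))
```

With 16 possible 2-mers this produces 240 candidate attributes, so some attribute selection step is usually needed before training a classifier.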

Classifier Selection

Fig : 9 classifiers applied to the TF data set
The algorithms are numbered as follows -
(1) Logistic (2) SMO (3) NaiveBayes (4) BayesianLogisticRegression (5) Kstar (6) Bagging
(7) LogitBoost (8) RandomForest (9) J48
Summary -
* 9 classifiers were applied to 10 data sets; 3 of these are shown
* choosing a single best classifier is not a trivial task
* the same classifier behaves differently on different data sets

Change in Accuracy due to Different Classifiers

Fig : The performance of different types of classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_3 data set
Fig : The performance of different types of classifiers (Logistic, J48, RandomForest, NaiveBayes) on the TF_5 data set

Summary -
* the choice of classifier has a large effect on accuracy
* classifiers must be chosen with care

Change in Accuracy due to Different K-mer Length

Fig : The performance of different K-mer lengths (4-mer, 5-mer, 6-mer) on the TF_3 data set

Summary -
* K-mer length also affects accuracy
* finding the single best length is not trivial
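
To make the role of the K-mer length concrete, here is a minimal sketch of fixed-length K-mer count features. The function names and the example sequence are hypothetical, and the alphabet is assumed to be A, C, G, T. It shows how the attribute vector grows as 4^k, which is one reason the choice of k matters.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Return counts for all possible k-mers in a fixed (sorted product) order."""
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


# The vector length is 4**k: 256, 1024 and 4096 for k = 4, 5, 6.
for k in (4, 5, 6):
    print(k, len(kmer_vector("ACGTACGTAC", k)))
```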

Attribute Space Selection

Fig : The performance of different numbers of selected k-mers on the TF_4 data set

Summary -
* the number of attributes considered also affects accuracy
* accuracy increases as more attributes are considered, but beyond a
saturation point it decreases
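
The slides do not say how the k-mers were ranked, so the following is only one possible sketch of attribute selection. It assumes a simple score, the absolute difference between the fraction of bound and unbound sequences that contain a k-mer, and keeps the top n. All names are hypothetical.

```python
def presence_rate(kmer, sequences):
    """Fraction of sequences that contain the k-mer at least once."""
    if not sequences:
        return 0.0
    return sum(kmer in s for s in sequences) / len(sequences)


def select_top_kmers(kmers, positives, negatives, n):
    """Keep the n k-mers whose presence differs most between the classes.

    The scoring rule is an assumption for illustration, not the
    criterion used in the project.
    """
    ranked = sorted(
        kmers,
        key=lambda k: abs(presence_rate(k, positives) - presence_rate(k, negatives)),
        reverse=True,
    )
    return ranked[:n]


positives = ["ACGTAA", "ACGTCC"]   # toy bound sequences
negatives = ["TTTTGG", "GGGGTT"]   # toy unbound sequences
print(select_top_kmers(["AA", "ACGT"], positives, negatives, 1))
```

Sweeping n and measuring cross-validated accuracy at each value is one way to locate the saturation point mentioned in the summary.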

PWM based Analysis on Accuracy
(TF_1 data set)

Fig : J48, minW 6 - maxW 15, no. of sites 10
Fig : J48, minW 6 - maxW 15, no. of motifs 5
Summary -
* accuracy increases with more motifs when the no. of sites is fixed
* accuracy increases with more sites when the no. of motifs is fixed
* what happens when we increase both?

PWM based Analysis

Fig : Accuracy varies with no. of motifs and no. of sites

* the 1st bar shows the no. of sites
* the 2nd bar shows the no. of motifs
* the 3rd bar shows the accuracy
* the point is that accuracy decreases when we increase both the no. of motifs and the no. of sites.
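
For readers unfamiliar with PWMs, the sketch below shows the general idea behind a PWM-based attribute. It is not MEME or MAST and does not reproduce their output. It assumes a set of aligned sites of equal width, a uniform background of 0.25 per base, a pseudocount of 1, and sequences containing only A, C, G, T. The function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm


def best_score(sequence, pwm):
    """Slide the PWM along the sequence and return the best window score."""
    width = len(pwm)
    scores = [
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else float("-inf")


pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
print(best_score("TTACGTTT", pwm) > best_score("TTTTTTTT", pwm))
```

The best score per motif can then be used as one numeric attribute per sequence, so more motifs means more attributes for the classifier.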

Extra Work for TF_20

[Flow diagram boxes: K-mer + PWM; Sequences identified by both models; Sequences identified differently; Biased 2-mer model; Newly labeled sequences; The new model for TF_20]

Fig : Flow diagram of building the new model for TF_20

Summary -
* we have done some extra work for TF_20

AUC based on the Feedback (bonus model)

Fig : AUC of 10 data sets based on last submission

* accuracy improved compared with the first submission
* the PWM-based model did not give good results
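
Since the feedback is reported as AUC rather than plain accuracy, a minimal sketch of how AUC can be computed from labels and prediction scores may help. It uses the pairwise (rank) definition: the fraction of positive/negative pairs in which the positive example gets the higher score, with ties counted as half. The function name and the toy data are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


print(auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))
```

The quadratic loop is fine for small data sets; a sort-based version would be preferable for large ones.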

Participation
Member | Background Study | Working with Tools | Working with Models | Parameter Tuning | Automation
Badri Sampath | DNA, RNA, protein, motif | AlignAce, MEME, MAST | PWM | K-mer | Arff writer, MAST output writer
Iffat Sharmin Chowdhury | Protein, motif, transcription | Weka, AlignAce, ScanAce | K-mer | PWM | Script for FASTA, Weka
Prosunjit Biswas | DNA, transcription, K-mer | MEME, MAST | K-mer | PWM | Script for RE, for new model
Tahmina Ahmed | MEME, MAST, PWM | MEME, MAST, Weka | PWM | K-mer | Script for MEME, MAST

Acknowledgment

Questions?
