Using a Google Cloud virtual machine for
Prosit peptide MS/MS and retention time prediction
Tobias Kind
UC Davis Genome Center
2019
General steps and URLs
1) VM Setup
2) Utility installation
3) Prosit installation
4) Benchmarks
Estimated time for setup: about 20 min
• Prosit
https://github.com/kusterlab/prosit
• Prosit data files
https://figshare.com/projects/Prosit/35582
• Google Cloud Console
https://console.cloud.google.com
• Nvidia GPU Cloud Image
https://console.cloud.google.com/marketplace/details/nvidia-ngc-public/nvidia_gpu_cloud_image
VM setup
Visit the Google Cloud Console (billing has to be enabled)
https://console.cloud.google.com
Search for the “NVIDIA GPU Cloud Image for Deep Learning, Data Science, and HPC”
In the search bar type:
NVIDIA GPU Cloud Image
Launch the NVIDIA VM on Compute Engine
Select the machine size (CPUs and memory) and the GPU type
GPU recommendations for Prosit
Tesla P100: $941.77 per month - Effective hourly rate $1.29
Tesla V100: $1,462.99 per month - Effective hourly rate $2.004
Source: https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-deep-learning-nvidia-p100-vs-v100-gpu/
During the prediction phase Prosit does not use the GPU heavily;
the cheapest GPU option is therefore recommended.
For training new models the faster GPU is recommended.
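The effective hourly rates quoted above can be reproduced from the monthly prices. This is a minimal sketch, assuming 730 billable hours per month (365 days × 24 h / 12); that divisor and the function name hourly_rate are assumptions, not taken from the slides. Prices are passed without the thousands separator.

```shell
# Convert a monthly price (USD) to an effective hourly rate.
# Assumption: 730 billable hours per month (365 * 24 / 12).
hourly_rate() {
  awk -v monthly="$1" 'BEGIN { printf "%.2f\n", monthly / 730 }'
}

hourly_rate 941.77    # Tesla P100 instance
hourly_rate 1462.99   # Tesla V100 instance
```

Rounded to two decimals, the results should match the rates listed for both instances.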
CPU and memory recommendations for Prosit
Memory rule of thumb: a CSV input file with 100k peptides requires about 100 GByte RAM
CPU rule of thumb: 8 CPU cores minimum; 16 is better
During the prediction phase the initial calculation is single-threaded;
later processes use multiple threads. Benchmark if needed.
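Once the VM is running, the rule of thumb above can be checked from the SSH terminal. The following is a rough sketch; the function name meets_minimum is hypothetical, and because free -g rounds down, a nominal 100 GByte machine may report slightly less.

```shell
# Rule-of-thumb thresholds from the slide: 8 cores minimum, 100 GByte RAM.
# $1 = number of CPU cores, $2 = RAM in GByte
meets_minimum() {
  [ "$1" -ge 8 ] && [ "$2" -ge 100 ]
}

# Read the actual values of this machine (fall back to 0 if a tool is missing).
cores=$(nproc 2>/dev/null || echo 0)
mem_gb=$(free -g 2>/dev/null | awk '/^Mem:/ { print $2 }')
mem_gb=${mem_gb:-0}

if meets_minimum "$cores" "$mem_gb"; then
  echo "OK: $cores cores, $mem_gb GByte RAM"
else
  echo "Below rule of thumb: $cores cores, $mem_gb GByte RAM"
fi
```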
Deploy the image after selecting proper memory and CPU options
Wait until the machine is fully deployed
(2 min)
Go back to the side panel and select the “Compute Engine” menu
Open a terminal via SSH
Finish setup
Finish setup by pressing Y
Finished installation
Utility installations
htop is for monitoring CPU use
nvidia-smi is for monitoring GPU use
mc (Midnight Commander) is for file-based operations
Each should be run in its own window
Go back to the console, open another SSH window, then type: htop
Go back to the console, open another SSH window, then type: nvidia-smi -l
If the GPU shows up, the engine has been deployed successfully with CUDA 10.1,
TensorFlow, and all other related dependencies
Install mc (Midnight Commander)
1) on console type: sudo apt install mc
2) on console type: sudo mc (files copied will have root permission)
Add user to the docker group
See: https://cloud.google.com/container-registry/docs/troubleshooting
1) run the following command in console: sudo usermod -a -G docker ${USER}
2) restart VM!
3) Check if docker is executable without sudo: docker ps
(Cloud Console: STOP the VM, then START it again)
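Before running docker ps, it can help to confirm that the new group membership is active in the current session. This is a small sketch; the helper name has_group is hypothetical, while id -nG is the standard command that lists the group names of the current user.

```shell
# Succeeds if the space-separated group list ($1) contains the group ($2).
has_group() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_group "$(id -nG)" docker; then
  echo "current session is in the docker group"
else
  echo "not in the docker group yet (restart the VM or log in again)"
fi
```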
Prosit installation
Installation of Prosit from GitHub
1) Go to https://github.com/kusterlab/prosit
2) Copy: https://github.com/kusterlab/prosit.git
3) In the Google Cloud SSH terminal type: git clone https://github.com/kusterlab/prosit.git
4) Check that the prosit directory was created: ls -l
5) Type: cd prosit && ls -l && pwd
Upload of trained model files from Figshare
1) Go to https://figshare.com/projects/Prosit/35582
and download the RT and MS/MS prediction models
a) RT models: https://figshare.com/articles/Prosit_-_Model_-_iRT/6965801
b) MS/MS models: https://figshare.com/articles/Prosit_-_Model_-_Fragmentation/6965753
It is recommended to unpack and repack those locally and then move
a ZIP file to the cloud VM, or to follow the next slides to extract them directly on the VM.
Make sure the config.yml and model.yml files and the weight hdf5 files are present.
Load and install trained models for MS/MS from Figshare
On cloud command line type:
1) cd prosit
2) ls -l
3) mkdir prosit-msms
4) cd prosit-msms
5) wget https://ndownloader.figshare.com/files/13687205 -O msms.zip
6) unzip msms.zip
7) cp prosit1/* .
8) ls -l
9) Check the three files config.yml, model.yml and weight_32_0.10211.hdf5
10) rm msms.zip
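The manual check of the three model files can also be scripted. This is a sketch only; the function name check_model_dir is hypothetical, and it merely tests that config.yml, model.yml and at least one weight_*.hdf5 file exist, not that their contents are valid.

```shell
# Verify that a Prosit model directory holds the three expected files.
# $1 = model directory, for example ~/prosit/prosit-msms
check_model_dir() {
  dir=$1
  [ -f "$dir/config.yml" ] || { echo "missing config.yml in $dir"; return 1; }
  [ -f "$dir/model.yml" ]  || { echo "missing model.yml in $dir"; return 1; }
  # Expand the weight file pattern; if nothing matches, $1 stays a literal pattern.
  set -- "$dir"/weight_*.hdf5
  [ -f "$1" ] || { echo "missing weight_*.hdf5 in $dir"; return 1; }
  echo "model directory $dir looks complete"
}
```

Usage, assuming the directory layout from the steps above: check_model_dir ~/prosit/prosit-msms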
Load and install trained models for retention times (RT) from Figshare
On cloud command line type:
We need to go back to the prosit main directory, so type: cd .. && pwd && ls -l
1) mkdir prosit-iRT
2) cd prosit-iRT/
3) wget https://ndownloader.figshare.com/files/13698893 -O iRT.zip
4) unzip iRT.zip
5) cp model_irt_prediction/* .
6) ls -l
7) Check the three files config.yml, model.yml and weight_66_0.00796.hdf5
8) If the model file is named model.yaml (wrong), rename it to model.yml with: mv model.yaml model.yml
9) rm iRT.zip
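The rename can be made safe to repeat, so that it does nothing when model.yml already exists. A minimal sketch, with the hypothetical function name fix_model_name:

```shell
# Rename model.yaml to model.yml inside the given directory, if needed.
# $1 = model directory, for example ~/prosit/prosit-iRT
fix_model_name() {
  dir=$1
  if [ -f "$dir/model.yaml" ] && [ ! -f "$dir/model.yml" ]; then
    mv "$dir/model.yaml" "$dir/model.yml"
    echo "renamed model.yaml to model.yml"
  fi
}
```

Usage, assuming the directory layout from the steps above: fix_model_name ~/prosit/prosit-iRT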
Run the Docker Prosit installation
1) Change back to the prosit main directory (cd ..)
2) on command line type: make build
The process includes pulling the Docker image from the Docker repository and
can take up to 5 minutes
Run the Prosit example
Template:
make server MODEL_SPECTRA=/home/user/prosit/prosit-msms/ MODEL_IRT=/home/user/prosit/prosit-iRT/
In my case the user name is tkind, so this becomes:
make server MODEL_SPECTRA=/home/tkind/prosit/prosit-msms/ MODEL_IRT=/home/tkind/prosit/prosit-iRT/
1) the make command requires the model directories as absolute paths
2) check your username and current directory (whoami and pwd)
3) replace user with your username on your VM
4) save the command for future use in run-prosit.sh using the nano editor:
call nano on the command line and then copy/paste your command: nano run-prosit.sh
5) change the file mode to executable: chmod +x run-prosit.sh
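Instead of editing the username by hand, the command can be assembled from the location of the prosit checkout. This is a sketch under the assumption that the repository was cloned into $HOME/prosit as in the steps above; the function name server_command is hypothetical.

```shell
# Print the "make server" command with absolute model paths.
# $1 = absolute path of the prosit checkout (assumed: $HOME/prosit)
server_command() {
  printf 'make server MODEL_SPECTRA=%s/prosit-msms/ MODEL_IRT=%s/prosit-iRT/\n' "$1" "$1"
}

# Show the command for the current user so it can be reviewed first.
server_command "$HOME/prosit"
```

The printed line can be pasted into run-prosit.sh, or written there directly with: server_command "$HOME/prosit" > run-prosit.sh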
Start the Prosit server
1) use the newly created file: ./run-prosit.sh
2) or run the following command with your own user name and file locations
make server MODEL_SPECTRA=/home/tkind/prosit/prosit-msms/ MODEL_IRT=/home/tkind/prosit/prosit-iRT/
3) Open another SSH terminal and execute jobs from there. The server window above
stays open as a logging window and for viewing potential errors
Run the Prosit example
1) in the new SSH terminal change to the prosit directory and type: cat README.md
2) then copy/paste:
curl -F "peptides=@examples/peptidelist.csv" http://127.0.0.1:5000/predict/generic
3) The server should receive the data and report its progress
4) The output window should list all the data
5) Prosit is ready to roll!
WARNING
Cloud VMs can accumulate costs very fast; it is imperative to stop the VM
in the cloud console when it is not in use. For the small P100 instance, costs of
around $1.50 per hour ($36 per day), including traffic, are incurred.
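The daily figure follows directly from the hourly rate, and the VM can also be stopped from the command line instead of the console. A sketch only: the instance name prosit-vm and the zone us-west1-b are placeholders that do not come from the slides, and stop_vm requires an installed and authenticated gcloud CLI.

```shell
# Cost of leaving a VM running for 24 hours at a given hourly rate (USD).
daily_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 24 }'
}

daily_cost 1.50

# Stop a VM with the gcloud CLI.
# $1 = instance name, $2 = zone (both placeholders, replace with your own).
stop_vm() {
  gcloud compute instances stop "${1:-prosit-vm}" --zone "${2:-us-west1-b}"
}
```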
General use
General use and preparation of digested FASTA peptide data
using EncyclopeDIA
1) Download EncyclopeDIA (current version 0.9.0): encyclopedia-0.9.0-executable.jar
2) Link: https://bitbucket.org/searleb/encyclopedia/downloads/
3) Download FASTA files from UniProt via: https://www.uniprot.org/proteomes/
4) In the EncyclopeDIA menu choose: Convert >> Create Prosit CSV from FASTA
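Before sending a newly created CSV to the server, its header line can be compared with the example file shipped in the Prosit repository. This is a rough sketch; the function names are hypothetical, and the check only compares the first line of both files (ignoring Windows line endings), nothing more.

```shell
# Print the first line of a CSV file without a trailing carriage return.
csv_header() {
  head -n 1 "$1" | tr -d '\r'
}

# Succeeds if both CSV files have the same header line.
same_header() {
  [ "$(csv_header "$1")" = "$(csv_header "$2")" ]
}
```

Usage from the prosit directory, with my-peptides.csv as a placeholder file name: same_header my-peptides.csv examples/peptidelist.csv && echo "header matches"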
Upload and download of own data
1) via the upload button of the Google SSH window
2) via upload/download from the terminal
3) other ways (Cloud Storage, FileZilla)
Recommendation:
All data should be compressed
with zip and unpacked with unzip
Link: https://www.cloudbooklet.com/6-ways-to-transfer-files-in-google-cloud-platform/
Benchmark of 100k CSV data set
(100,000 digested peptides)
curl -F "peptides=@examples/prosit-100k.csv" http://127.0.0.1:5000/predict/generic > "prosit-100k-out.csv"
Tesla P100 instance (8 CPU cores): 10 min
Tesla V100 instance (16 CPU cores): 14 min
Results are inconclusive, probably due to single-thread CPU speed: the V100 instance used
a lower clock speed Xeon @ 2.20 GHz. Very little time is spent on the GPU for prediction.
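To repeat such a benchmark, the run time of the curl call can be measured with a small wrapper. A minimal sketch with the hypothetical name run_timed; the elapsed time goes to stderr so that it does not end up in the redirected output file. The shell's own time keyword would work as well.

```shell
# Run a command and report its wall-clock time in seconds on stderr.
# The exit status of the wrapped command is passed through.
run_timed() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start)) s" >&2
  return "$status"
}
```

Usage with the benchmark command above: run_timed curl -F "peptides=@examples/prosit-100k.csv" http://127.0.0.1:5000/predict/generic > "prosit-100k-out.csv"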
File size and compute considerations
• A FASTA input file containing 40,000 proteins will be around 150 KByte in size.
• The tryptic digest file with one voltage (z=2,3) will have 4 million lines and will be 100 kByte in size.
• The Prosit prediction file will have 130 million lines and a size of 18 GByte.
• Around 100 GByte of main memory is needed on the Prosit server, or the input file has to be split.
• 100k tryptic digests are processed in 10 min; 1 million tryptic digests can be processed in 100 min.
• Processing 1 million tryptic digests will cost $2.25 on the cloud; it is easy to deploy and easy to scale.
• A new local Linux PC with a fast CPU, an 8 GByte GPU and 100 GByte RAM costs ~ $2,200.
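Where memory is the limit, splitting the input file as mentioned above could look like the following sketch. The function name split_csv and the output naming scheme are hypothetical; the header line is copied into every piece so that each piece is a complete CSV file on its own. Stale pieces from an earlier run with the same prefix should be removed first.

```shell
# Split a CSV file into pieces of at most N data rows, repeating the header.
# $1 = input CSV, $2 = data rows per piece, $3 = output prefix
split_csv() {
  input=$1; lines=$2; prefix=$3
  header=$(head -n 1 "$input")
  # Split everything after the header line into prefix.part.aa, .ab, ...
  tail -n +2 "$input" | split -l "$lines" - "$prefix.part."
  for part in "$prefix".part.*; do
    [ -f "$part" ] || continue
    # Prepend the header and store the piece as prefix.part.XX.csv
    { printf '%s\n' "$header"; cat "$part"; } > "$part.csv"
    rm "$part"
  done
}
```

Usage, with prosit-input.csv as a placeholder name: split_csv prosit-input.csv 100000 chunk creates chunk.part.aa.csv, chunk.part.ab.csv and so on, each of which can then be sent to the server with its own curl call.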
The end
