File handling
     Karin Lagesen

karin.lagesen@bio.uio.no
Homework
●   ATCurve.py
      ●   take an input string from the user
      ●   check if the sequence only contains DNA – if
          not, prompt for new sequence.
      ●   calculate a running average of AT content
          along the sequence. Window size should be
          3, and the step size should be 1. Print one
          value per line.
●   Note: you need to include several runtime
    examples to show that all parts of the code
    work.
ATCurve.py - thinking
●   Take input from user:
     ●   raw_input
●   Check for the presence of !ATCG
     ●   use sets – very easy
●   Calculate AT – window = 3, step = 1
     ●   iterate over string in slices of three
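The plan above can be sketched as two small functions. This is a minimal sketch in Python 3 syntax (the slides themselves use Python 2); the names is_dna and at_curve are hypothetical helpers, not part of the course code.

```python
def is_dna(sequence):
    """Return True if the sequence contains only A, T, G and C."""
    # Set difference leaves only the characters that are not DNA letters.
    return len(set(sequence.upper()) - set("ATGC")) == 0


def at_curve(sequence, window=3, step=1):
    """Return the AT fraction of each window along the sequence."""
    sequence = sequence.upper()
    values = []
    # The last full window starts at len(sequence) - window.
    for i in range(0, len(sequence) - window + 1, step):
        chunk = sequence[i:i + window]
        at_count = chunk.count("A") + chunk.count("T")
        values.append(at_count / float(window))
    return values


if is_dna("ATGCA"):
    for value in at_curve("ATGCA"):
        print(value)
```

Slicing out each window first keeps the counting step simple: count() then only ever sees the three characters of the current window.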
ATCurve.py
# variable valid is used to see if the string is ok or not.
valid = False
while not valid:
    # prompt user for input using raw_input() and store in string,
    # convert all characters into uppercase
    test_string = raw_input("Enter string: ")
    upper_string = test_string.upper()

    # Figure out if anything other than ATGC is present
    dnaset = set(list("ATGC"))
    upper_string_set = set(list(upper_string))

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your string, try again"
    else:
        valid = True

if valid:
    # window size 3, step size 1: the last window starts at len - 3
    for i in range(0, len(upper_string) - 3 + 1, 1):
        at_sum = 0.0
        # count() excludes the end index, so i+3 covers three characters
        at_sum += upper_string.count("A", i, i + 3)
        at_sum += upper_string.count("T", i, i + 3)
        print at_sum / 3
Homework
●   CodonFrequency.py
     ●   take an input string from the user
     ●   if the sequence only contains DNA
           –   find a start codon in your string
           –   if a start codon is present
                  ●   count the occurrences of each three-mer from start
                      codon and onwards
                  ●   print the results
CodonFrequency.py - thinking
●   First part – same as earlier
●   Find start codon: locate index of AUG
      ●   Note, can simplify and find ATG
●   If start codon is found:
      ●   create dictionary
      ●   for slice of three in input[StartCodon:]:
            –   get codon
            –   if codon is in dict:
                    ●   add to count
            –   if not:
                    ●   create key-value pair in dict
CodonFrequency.py
input = raw_input("Type a piece of DNA here: ")

if len(set(input) - set(list("ATGC"))) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        # Step through the string three bases at a time from the start codon.
        # Note: the bound len(input)-3 leaves out a codon that ends exactly
        # at the end of the string; len(input)-2 would include it.
        for i in xrange(atg, len(input) - 3, 3):
            codon = input[i:i+3]
            if codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] += 1

        for codon in codondict:
            print codon, codondict[codon]
CodonFrequency.py w/ stop codon
input = raw_input("Type a piece of DNA here: ")

if len(set(input) - set(list("ATGC"))) > 0:
    print "Not a valid DNA sequence"
else:
    atg = input.find("ATG")
    if atg == -1:
        print "Start codon not found"
    else:
        codondict = {}
        for i in xrange(atg, len(input) - 3, 3):
            codon = input[i:i+3]
            # Note: these are RNA stop codons and UAG is listed twice.
            # A DNA string never matches them, so the loop never stops
            # here. For DNA input the stop codons are TAG, TAA and TGA.
            if codon in ['UAG', 'UAA', 'UAG']:
                break
            elif codon not in codondict:
                codondict[codon] = 1
            else:
                codondict[codon] += 1

        for codon in codondict:
            print codon, codondict[codon]
Results

[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATG
ATG 1
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin% python CodonFrequency2.py
Type a piece of DNA here: ATGATTATTTAAATGT
ATG 2
ATT 2
TAA 1
[karinlag@freebee]/projects/temporary/cees-python-course/Karin%
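In the session above, TAA is counted like any other codon, and the final ATG of the 15-base input is left out. A variant that stops at the first DNA stop codon and also includes a codon ending exactly at the end of the string could be sketched as follows. It is written in Python 3 syntax, and count_codons is a hypothetical name, not the course solution.

```python
def count_codons(dna):
    """Count codons from the first ATG up to, but not including, a stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    counts = {}
    if start == -1:
        # No start codon: nothing to count.
        return counts
    # len(dna) - 2 lets the last complete codon be included.
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in ("TAG", "TAA", "TGA"):
            break
        counts[codon] = counts.get(codon, 0) + 1
    return counts


for codon, count in count_codons("ATGATTATTTAAATG").items():
    print(codon, count)
```

Using counts.get(codon, 0) replaces the if/else test for whether the codon is already a key in the dictionary.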
Working with files
●   Reading – get info into your program
●   Parsing – processing file contents
●   Writing – get info out of your program
Reading and writing
●   Three-step process
     ●   Open file
           –   create file handle – reference to file
     ●   Read or write to file
     ●   Close file
           –   will be closed automatically when the program ends,
               but it is bad form not to close it
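The three steps can be sketched as below, followed by the same read done with a with statement, which closes the file for you. This is a minimal sketch in Python 3 syntax; the temporary file and its contents are made up so that the example is self-contained.

```python
import os
import tempfile

# A throwaway file path so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# Step 1: open the file (this creates the file handle).
fh = open(path, "w")
# Step 2: write to the file.
fh.write("ATGC\n")
# Step 3: close the file.
fh.close()

# The with statement closes the file when the block ends,
# even if an error occurs inside it.
with open(path, "r") as fh:
    contents = fh.read()

print(repr(contents))
print(fh.closed)
```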
Opening files
●   Opening modes:
     ●   "r" - read file
     ●   "w" - write file (an existing file is overwritten)
     ●   "a" - append to end of file
●   fh = open("filename", "mode")
●   fh = filehandle, reference to a file, NOT the
    file itself
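The difference between the three modes shows up when the same file is opened several times. A minimal sketch in Python 3 syntax, using a made-up temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

fh = open(path, "w")      # "w" creates the file, or empties an existing one
fh.write("first\n")
fh.close()

fh = open(path, "a")      # "a" adds to the end of the file
fh.write("second\n")
fh.close()

fh = open(path, "r")      # "r" reads; the file must already exist
after_append = fh.read()
fh.close()

fh = open(path, "w")      # opening with "w" again discards the old contents
fh.write("third\n")
fh.close()

fh = open(path, "r")
after_rewrite = fh.read()
fh.close()

print(repr(after_append))
print(repr(after_rewrite))
```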
Reading a file
●   Three ways to read
     ●   read([n]) - n = bytes to read, default is all
     ●   readline() - read one line, incl. newline
     ●   readlines() - read file into a list, one element
         per line, including newline
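What each of the three methods returns can be compared side by side. A minimal sketch in Python 3 syntax; the file name and contents are invented for the example.

```python
import os
import tempfile

# Build a small fasta-like file to read from.
path = os.path.join(tempfile.mkdtemp(), "example.fsa")
fh = open(path, "w")
fh.write(">name\nATGC\nGGTA\n")
fh.close()

fh = open(path, "r")
everything = fh.read()        # the whole file as one string
fh.close()

fh = open(path, "r")
first_line = fh.readline()    # one line, newline included
second_line = fh.readline()   # the next call returns the next line
fh.close()

fh = open(path, "r")
all_lines = fh.readlines()    # a list with one string per line
fh.close()

print(repr(everything))
print(repr(first_line), repr(second_line))
print(all_lines)
```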
Reading example
●   Log on to freebee, and go to your area
●   do cp ../Karin/fastafile.fsa .
●   open python
       >>> fh = open("fastafile.fsa", "r")
       >>> fh
       <open file 'fastafile.fsa', mode 'r' at 0x...>
●   Q: what does the response mean?
Read example
●   Use all three methods to read the file. Print
    the results.
     ●   read
     ●   readlines
     ●   readline
●   Q: what happens after you have read the
    file?
●   Q: What is the difference between the
    three?
Read example
>>> fh = open("fastafile.fsa", "r")
>>> withread = fh.read()
>>> withread
'>This is the description line\nATGCGCTTAGGATCGATAGCGATTTAGA\nTTAGCGGA\n'
>>> withreadlines = fh.readlines()
>>> withreadlines
[]
>>> fh = open("fastafile.fsa", "r")
>>> withreadlines = fh.readlines()
>>> withreadlines
['>This is the description line\n', 'ATGCGCTTAGGATCGATAGCGATTTAGA\n', 'TTAGCGGA\n']
>>> fh = open("fastafile.fsa", "r")
>>> withreadline = fh.readline()
>>> withreadline
'>This is the description line\n'
>>>
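The empty list in the session above appears because read() leaves the file position at the end of the file, so nothing is left for readlines() to return. Reopening the file resets the position; the file method seek(0) does the same without reopening. A minimal sketch in Python 3 syntax, with a made-up temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.fsa")
fh = open(path, "w")
fh.write(">name\nATGC\n")
fh.close()

fh = open(path, "r")
first_read = fh.read()     # reads everything; the position is now at the end
second_read = fh.read()    # nothing left, so this is an empty string
fh.seek(0)                 # move back to the start of the file
lines = fh.readlines()     # now the lines are available again
fh.close()

print(repr(second_read))
print(lines)
```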
Parsing
●   Getting information out of a file
●   Commonly used string methods
      ●   split([character]) – default is whitespace
      ●   replace("in string", "put into instead")
      ●   "string character".join(list)
            –   joins all elements in the list with string
                character as a separator
            –   common construction: ''.join(list)
      ●   slicing
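The string methods listed above can be tried on a single line of text. A minimal sketch in Python 3 syntax; the example line is invented.

```python
line = ">gene1 Homo sapiens\n"

# Slicing: drop the leading ">" and the trailing newline.
description = line[1:-1]

# split() with no argument splits on whitespace.
fields = description.split()

# replace() returns a new string; the original is left unchanged.
no_newline = line.replace("\n", "")

# join() glues the list elements together with the given separator.
sequence = "".join(["ATG", "CGC", "TTA"])
tabbed = "\t".join(fields)

print(description)
print(fields)
print(sequence)
```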
Type conversions
●   Everything that comes on the command
    line or from a file is a string
●   Conversions:
     ●   int(X)
           –   a string with a decimal point cannot be converted directly
           –   floats are truncated toward zero
     ●   float(X)
     ●   str(X)
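The conversions behave as follows on a few values. A minimal sketch in Python 3 syntax; the variable names are chosen only for illustration.

```python
window = int("3")          # string -> integer
fraction = float("0.25")   # string -> float
label = str(42)            # number -> string

# int() on a float drops the fractional part (truncates toward zero).
from_positive = int(2.7)
from_negative = int(-2.7)

# A string with a decimal point cannot go straight to int().
try:
    int("2.7")
    failed = False
except ValueError:
    failed = True

# Converting via float() first works.
via_float = int(float("2.7"))

print(window, fraction, label)
print(from_positive, from_negative, failed, via_float)
```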
Parsing example
●   Continue using fastafile.fsa
●   Print only the description line to screen
●   Print the whole DNA string
    >>> fh = open("fastafile.fsa", "r")
    >>> firstline = fh.readline()
    >>> print firstline[1:-1]
    This is the description line
    >>> sequence = ''
    >>> for line in fh:
    ...     sequence += line.replace("\n", "")
    ...
    >>> print sequence
    ATGCGCTTAGGATCGATAGCGATTTAGATTAGCGGA
    >>>
Accepting input from command line
●   Need to be able to specify file name on
    command line
●   Command line parameters stored in list
    called sys.argv – program name is 0
●   Usage:
      ●   python pythonscript.py arg1 arg2 arg3....
●   In script:
      ●   at the top of the file, write import sys
      ●   arg1 = sys.argv[1]
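Picking the arguments out of the list can be sketched as a small function that takes a list shaped like sys.argv. This is a sketch in Python 3 syntax; parse_args is a hypothetical helper, and the example list stands in for what sys.argv would hold after running python pythonscript.py fastafile.fsa 3.

```python
import sys


def parse_args(argv):
    """Return (filename, windowsize) from a list shaped like sys.argv."""
    # argv[0] is the program name; the real arguments start at index 1.
    if len(argv) < 3:
        raise ValueError("usage: python pythonscript.py filename windowsize")
    filename = argv[1]
    windowsize = int(argv[2])   # arguments always arrive as strings
    return filename, windowsize


# Stand-in list for illustration.
# In a real script: filename, windowsize = parse_args(sys.argv)
print(parse_args(["pythonscript.py", "fastafile.fsa", "3"]))
```

Checking the length of the list first gives a readable message instead of an IndexError when an argument is missing.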
Batch example
●   Read fastafile.fsa with all three methods
●   Per method, print method, name and
    sequence
●   Remember to close the file at the end!
Batch example
import sys
filename = sys.argv[1]
#using readline
fh = open(filename, "r")
firstline = fh.readline()
name = firstline[1:-1]
sequence = ''
for line in fh:
    sequence += line.replace("\n", "")
fh.close()
print "Readline", name, sequence

#using readlines()
fh = open(filename, "r")
inputlines = fh.readlines()
fh.close()
name = inputlines[0][1:-1]
sequence = ''
for line in inputlines[1:]:
    sequence += line.replace("\n", "")
print "Readlines", name, sequence

#using read
fh = open(filename, "r")
inputlines = fh.read()
fh.close()
name = inputlines.split("\n")[0][1:]
sequence = "".join(inputlines.split("\n")[1:])
print "Read", name, sequence
Classroom exercise
●   Modify ATCurve.py script so that it accepts
    the following input on the command line:
      ●   fasta filename
      ●   window size
●   Let the user input an alternate filename if the
    file contains anything other than ATGC
●   Print results to screen
ATCurve2.py
import sys
# Define filename and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])

# variable valid is used to see if the string is ok or not.
valid = False
while not valid:
    fh = open(filename, "r")
    inputlines = fh.readlines()
    fh.close()
    name = inputlines[0][1:-1]
    sequence = ''
    for line in inputlines[1:]:
        sequence += line.replace("\n", "")
    upper_string = sequence.upper()

    # Figure out if anything other than ATGC is present
    dnaset = set(list("ATGC"))
    upper_string_set = set(list(upper_string))

    if len(upper_string_set - dnaset) > 0:
        print "Non-DNA present in your file, try again"
        filename = raw_input("Type in filename: ")
    else:
        valid = True

if valid:
    for i in range(0, len(upper_string)-windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A",i,i+windowsize)
        at_sum += upper_string.count("T",i,i+windowsize)
        print i + 1, at_sum/windowsize
Writing to files
●   Similar procedure as for reading
     ●   Open file, mode is "w" or "a"
     ●   fh.write(string)
           –   Note: one single string
           –   No newlines are added
     ●   fh.close()
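Because write() takes a single string and adds no newline, numbers have to be converted with str() and the newline appended by hand. A minimal sketch in Python 3 syntax; the values and the temporary output file are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")
values = [0.667, 0.333, 1.0]

fh = open(path, "w")
for position, value in enumerate(values):
    # Build one string per line and end it with "\n" explicitly.
    fh.write(str(position + 1) + " " + str(value) + "\n")
fh.close()

# Read the file back to see what was written.
fh = open(path, "r")
written = fh.read()
fh.close()

print(written)
```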
ATCurve3.py
●   Modify previous script so that you have the
    following on the command line
     ●   fasta filename for input file
     ●   window size
     ●   output file
●   Output should be in the format
     ●   number, AT content
     ●   number is the 1-based position of the first
         nucleotide in the window
ATCurve3.py

import sys
# Define filenames and window size
filename = sys.argv[1]
windowsize = int(sys.argv[2])
outputfile = sys.argv[3]

# ... read and validate the sequence as in ATCurve2.py ...

if valid:
    fh = open(outputfile, "w")
    for i in range(0, len(upper_string)-windowsize + 1, 1):
        at_sum = 0.0
        at_sum += upper_string.count("A",i,i+windowsize)
        at_sum += upper_string.count("T",i,i+windowsize)
        fh.write(str(i + 1) + " " + str(at_sum/windowsize) + "\n")
    fh.close()
Homework: TranslateProtein.py
●   Input files are in
    /projects/temporary/cees-python-course/Karin
      ●   translationtable.txt - tab separated
      ●   dna31.fsa
●   Script should:
      ●   Open the translationtable.txt file and read it into a
          dictionary
      ●   Open the dna31.fsa file and read the contents
      ●   Translate the DNA into protein using the dictionary
      ●   Print the translation in fasta format to the file
          TranslateProtein.fsa. Each protein line should be 60
          characters long.
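Two building blocks for this homework can be sketched separately: turning tab-separated lines into a dictionary, and cutting a long string into lines of 60 characters. This is a sketch in Python 3 syntax, not a full solution. The names read_table and wrap are hypothetical, and the column layout (codon, tab, amino acid) is an assumption about translationtable.txt that should be checked against the real file.

```python
def read_table(lines):
    """Build a codon -> amino acid dictionary from tab-separated lines.

    Assumes each line looks like "ATG<TAB>M"; the real file may differ.
    """
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            table[fields[0]] = fields[1]
    return table


def wrap(text, width=60):
    """Split text into pieces of at most width characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# Invented example data, standing in for lines read from a file.
table = read_table(["ATG\tM\n", "TAA\t*\n"])
print(table["ATG"])
for piece in wrap("M" * 130):
    print(len(piece))
```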
