SlideShare ist ein Scribd-Unternehmen logo
1 von 39
Downloaden Sie, um offline zu lesen
Programming for Evolutionary Biology
        March 17th - April 1st 2012
            Leipzig, Germany




Introduction to Unix systems
    Part 4 – awk & make
        Giovanni Marco Dall'Olio
        Universitat Pompeu Fabra
            Barcelona (Spain)
Schedule
   9.30 – 10.30: “What is Unix?”
   11.00 – 12.00: Introducing the terminal
   13:30 – 16:30: Grep, & Unix philosophy
   17:00 – 18:00: awk, make, and question time
awk
   awk is the Unix command to work with tabular 
    data
awk
   Things you can do with awk:
          Extract all the lines of a file that match a pattern, 
           and print only some of the columns (instead of 
           “grep | cut”)
          Add a prefix/suffix to all the element of a column 
           (instead of “cut | paste”)
          Sum values of different columns
awk and gawk
   Today we will use the GNU version of awk, 
    called gawk
Awk exercise
   Let's go to the “bed” directory
   cd leipzig_course/unix_intro/bed
The BED format
   The BED format is used to store annotations on 
    genomic coordinates
   Example:
          Annotate gene position
          Annotate Trascription Factor binding sites
          Annotate SNP genotypes
          Annotate Gene Expression
Example awk usage
   Select only the lines matching chromosome 7, 
    and print only the second column, summing 100 
    to it
   awk '$1 ~ chr7 {print $2+100}' annotations.bed
          $0 ~ AAC → select all the lines that contain AAC
          {print} → for each line that matches the previous 
           expression, print it
Example of BED file
Columns in a BED file:
   Chromosome
   Start
   End               chr7   127471196   127472363   gene1   0     +
                      chr7   127472363   127473530   gene2   2     +
   Label
                      chr7   127473530   127474697   gene3   20    +
   Score             chr7   127474697   127475864   gene4   3     +
                      chr7   127475864   127477031   gene5   100   -
   Strand (+ or ­)   chr7   127477031   127478198   gene6   3     -
                      chr7   127478198   127479365   gene7   5     -
   ….                chr7   127479365   127480532   gene8   1     +
                      chr7   127480532   127481699   gene9   3     -
Basic awk usage
   awk '<pattern to select lines> {instructions to be 
    executed on each line}' 
Selecting columns in awk
   In awk, each column can be accessed by 
    $<column­number> 
   For example, 
          $1 → first column of the file
          $2 → second column
          $NF → last column
   $0 matches all the columns of the file
Printing columns in awk
   The basic awk usage is to print specific columns
   The syntax is the following:
          awk '{print $1, $2, $3}' annotations.bed → prints 
           the first three columns
Printing columns in awk
   The basic awk usage is to print specific columns
   The syntax is the following:
          awk '{print $1, $2, $3}' annotations.bed → prints 
           the first three columns
          awk '{print $0}' annotations.bed → print all the 
           columns
Adding a prefix to a column
         with awk
    A common awk usage is to add a prefix or suffix 
     to all the entries of a column
    Example: 
           awk '{print $2 “my_prefix”$2}' annotations.bed
Summing columns in awk
   If two columns contain numeric values, we can 
    use awk to sum them
   Usage:
          awk '{print $2 + $3}' annotations.bed → sums 
           columns 2 and 3
Filter lines and select
        columns with awk
   Awk can apply a filter and print only the lines 
    matching a pattern
          Like grep, but more powerful
   Print all the lines that have “chr7” in their first 
    column:
          awk '$1 ~ “chr7” {print $0}' myfile.txt 
Advanced: redirecting to
         files
   The following will split the contents of 
    annotations.bed into two files, according to the 
    1st column:
   awk '{print $0 > $1".file"}' annotations.bed
Advanced: print something
  before reading the file
    The BEGIN statement can be used to execute 
     commands before reading the file:
    awk 'BEGIN {print "position"} {print $2}' 
     annotations.bed
More on awk
   awk is a complete programming language
   It is the equivalent of a spreadsheet for the 
    command line
   If you want to know more, check the book “Gawk 
    effective AWK Programming” at 
    http://www.gnu.org/software/gawk/manual
GNU/make
   make is a tool used by programmer to define how 
    software should be compiled and installed
   It is also used as a way to store a set of 
    commands, to recall them later
Simplest Makefile example
   The simplest Makefile contains just the name of a task 
    and the commands associated with it:




   print_hello is a makefile 'rule': it stores the commands 
    needed to say 'Hello, world!' to the screen.
Simplest Makefile example




                                             Makefile 
Target of                                    rule
the rule


                      Commands associated 
   This is a          with the rule
   tabulation (not 
   8 spaces)
Simplest Makefile example
    Create a file in your 
     computer and save it as 
     'Makefile'.
    Write these instructions in 
     it:

     print_hello:
         echo 'Hello, world!!'
                                     This is a tabulation 
                                     (<Tab> key)
    Then, open a terminal and 
     type:
      make ­f Makefile print_hello
Simplest Makefile example
Simplest Makefile example –

                  explanation




    When invoked, the program 'make' looks for a file in the 
     current directory called 'Makefile'
    When we type 'make print_hello', it executes any 
     procedure (target) called 'print_hello' in the makefile
    It then shows the commands executed and their output
A sligthly longer example
   You can add as many 
    commands you like 
    to a rule
   For example, this 
    'print_hello' rule 
    contains 5 commands
   Note: ignore the '@' 
    thing, it is only to 
    disable verbose mode 
    (explained later)
A more complex example
The target syntax
   The target of a rule can be either a title for the task, or a 
    file name.
   Everytime you call a make rule (example: 'make all'), the 
    program looks for a file called like the target name (e.g. 
    'all', 'clean', 'inputdata.txt', 'results.txt')
   The rule is executed only if that file doesn't exists.
Filename as target names
                     In this
                      makefile, we
                      have two rules:
                      'testfile.txt' and
                      'clean'
Filename as target names
                     In this
                      makefile, we
                      have two rules:
                      'testfile.txt' and
                      'clean'

                     When we call
                      'make
                      testfile.txt',
                      make checks if
                      a file called
                      'testfile.txt'
                      already exists.
Filename as target names




                  The commands
                  associated with the
                  rule 'testfile.txt' are
                  executed only if
                  that file doesn't
                  exists already
Real Makefile-rule syntax
   Complete syntax for a Makefile rule:
       <target>: <list of prerequisites>
          <commands associated to the rule>


   Example:
       result1.txt: data1.txt data2.txt
          cat data1.txt data2.txt > result1.txt
          @echo 'result1.txt' has been calculated'


   Prerequisites are files (or rules) that need to exists already 
    in order to create the target file.
   If 'data1.txt' and 'data2.txt' don't exist, the rule 'result1.txt' 
    will exit with an error (no rule to create them)
Piping Makefile rules
           together
   You can pipe two Makefile rules together by 
    defining prerequisites
Piping Makefile rules
           together
   The rule 'result1.txt' depends on the rule 
    'data1.txt', which should be executed first
Conditional execution by
   modification date
   We have seen how make can be used to create a 
    file, if it doesn't exists.

    file.txt:
        # if file.txt doesn't exists, then create it:
        echo 'contents of file.txt' > file.txt

   We can do better: create or update a file only if it 
    is newer than its prerequisites
Conditional execution by
   modification date
   Let's have a better look at this example:

    result1.txt: data1.txt calculate_result.py
      python calculate_result.txt --input
    data1.txt
   A great feature of make is that it execute a rule not 
    only if the target file doesn't exist, but also if it has a 
    'last modification date' earlier than all of its 
    prerequisites
Conditional execution by
   modification date
    result1.txt: data1.txt
      @sed 's/b/B/i' data1.txt > result1.txt
      @echo 'result1.txt has been calculated'
   In this example, result1.txt will be recalculated 
    every time 'data1.txt' is modified
    $: touch data1.txt calculate_result.py
    $: make result1.txt
    result1.txt has been calculated
    $: make result1.txt
    result1.txt is already up-to-date
    $: touch data1.txt
    $: make result1.txt
    result1.txt has been calculated
Make - advantages
   Make allows you to save shell commands along 
    with their parameters and re­execute them;
   It allows you to use command­line tools which 
    are more flexible;
   Combined with a revision control software, it 
    makes possible to reproduce all the operations 
    made to your data;
Resume of the day
   Unix → an operating system from the '70s, which 
    was successful for its philosophy and technical 
    novelties
   The terminal → a way to execute commands by 
    typing
   How to get help → man, info, ­­help, internet
   Grep, awk → search patterns in a file
   Other Unix tools → each specialized on a single 
    data manipulation task

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Html
HtmlHtml
Html
 
Html
HtmlHtml
Html
 
File Handling Python
File Handling PythonFile Handling Python
File Handling Python
 
Introduction to css
Introduction to cssIntroduction to css
Introduction to css
 
Coding conventions
Coding conventionsCoding conventions
Coding conventions
 
PythonOOP
PythonOOPPythonOOP
PythonOOP
 
Basic HTML CSS Slides
Basic HTML CSS Slides Basic HTML CSS Slides
Basic HTML CSS Slides
 
Introduction to Makefile
Introduction to MakefileIntroduction to Makefile
Introduction to Makefile
 
Css
CssCss
Css
 
Document Object Model (DOM)
Document Object Model (DOM)Document Object Model (DOM)
Document Object Model (DOM)
 
Linux Basic commands and VI Editor
Linux Basic commands and VI EditorLinux Basic commands and VI Editor
Linux Basic commands and VI Editor
 
File handling in Python
File handling in PythonFile handling in Python
File handling in Python
 
cascading style sheet ppt
cascading style sheet pptcascading style sheet ppt
cascading style sheet ppt
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
CSS notes
CSS notesCSS notes
CSS notes
 
Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache Jackrabbi
 
File handling & regular expressions in python programming
File handling & regular expressions in python programmingFile handling & regular expressions in python programming
File handling & regular expressions in python programming
 
Practices of the Python Pro: coding best practices
Practices of the Python Pro: coding best practicesPractices of the Python Pro: coding best practices
Practices of the Python Pro: coding best practices
 
Creating Excel files with Python and XlsxWriter
Creating Excel files with Python and XlsxWriterCreating Excel files with Python and XlsxWriter
Creating Excel files with Python and XlsxWriter
 
Introduction to Javascript
Introduction to JavascriptIntroduction to Javascript
Introduction to Javascript
 

Andere mochten auch

Learning sed and awk
Learning sed and awkLearning sed and awk
Learning sed and awkYogesh Sawant
 
Unix Operating System
Unix Operating SystemUnix Operating System
Unix Operating Systemsubhsikha
 
Unix command-line tools
Unix command-line toolsUnix command-line tools
Unix command-line toolsEric Wilson
 
Intro to linux performance analysis
Intro to linux performance analysisIntro to linux performance analysis
Intro to linux performance analysisChris McEniry
 
History of L0phtCrack
History of L0phtCrackHistory of L0phtCrack
History of L0phtCrackcwysopal
 
Machine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and FutureMachine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and FutureData Science London
 
Nigerian design and digital marketing agency
Nigerian design and digital marketing agencyNigerian design and digital marketing agency
Nigerian design and digital marketing agencySamson Aligba
 
VideoLan VLC Player App Artifact Report
VideoLan VLC Player App Artifact ReportVideoLan VLC Player App Artifact Report
VideoLan VLC Player App Artifact ReportAziz Sasmaz
 
脆弱性診断って何をどうすればいいの?(おかわり)
脆弱性診断って何をどうすればいいの?(おかわり)脆弱性診断って何をどうすればいいの?(おかわり)
脆弱性診断って何をどうすればいいの?(おかわり)脆弱性診断研究会
 
Open Source Security Testing Methodology Manual - OSSTMM by Falgun Rathod
Open Source Security Testing Methodology Manual - OSSTMM by Falgun RathodOpen Source Security Testing Methodology Manual - OSSTMM by Falgun Rathod
Open Source Security Testing Methodology Manual - OSSTMM by Falgun RathodFalgun Rathod
 
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Giovanni Marco Dall'Olio
 
Nmap not only a port scanner by ravi rajput comexpo security awareness meet
Nmap not only a port scanner by ravi rajput comexpo security awareness meet Nmap not only a port scanner by ravi rajput comexpo security awareness meet
Nmap not only a port scanner by ravi rajput comexpo security awareness meet Ravi Rajput
 

Andere mochten auch (20)

Learning sed and awk
Learning sed and awkLearning sed and awk
Learning sed and awk
 
Linux intro 5 extra: makefiles
Linux intro 5 extra: makefilesLinux intro 5 extra: makefiles
Linux intro 5 extra: makefiles
 
Linux intro 2 basic terminal
Linux intro 2   basic terminalLinux intro 2   basic terminal
Linux intro 2 basic terminal
 
Linux intro 5 extra: awk
Linux intro 5 extra: awkLinux intro 5 extra: awk
Linux intro 5 extra: awk
 
Linux intro 1 definitions
Linux intro 1  definitionsLinux intro 1  definitions
Linux intro 1 definitions
 
Linux intro 3 grep + Unix piping
Linux intro 3 grep + Unix pipingLinux intro 3 grep + Unix piping
Linux intro 3 grep + Unix piping
 
Unix Operating System
Unix Operating SystemUnix Operating System
Unix Operating System
 
Unix command-line tools
Unix command-line toolsUnix command-line tools
Unix command-line tools
 
Samsung mobile root
Samsung mobile rootSamsung mobile root
Samsung mobile root
 
Intro to linux performance analysis
Intro to linux performance analysisIntro to linux performance analysis
Intro to linux performance analysis
 
History of L0phtCrack
History of L0phtCrackHistory of L0phtCrack
History of L0phtCrack
 
Machine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and FutureMachine Learning and Hadoop: Present and Future
Machine Learning and Hadoop: Present and Future
 
Nigerian design and digital marketing agency
Nigerian design and digital marketing agencyNigerian design and digital marketing agency
Nigerian design and digital marketing agency
 
VideoLan VLC Player App Artifact Report
VideoLan VLC Player App Artifact ReportVideoLan VLC Player App Artifact Report
VideoLan VLC Player App Artifact Report
 
脆弱性診断って何をどうすればいいの?(おかわり)
脆弱性診断って何をどうすればいいの?(おかわり)脆弱性診断って何をどうすればいいの?(おかわり)
脆弱性診断って何をどうすればいいの?(おかわり)
 
Open Source Security Testing Methodology Manual - OSSTMM by Falgun Rathod
Open Source Security Testing Methodology Manual - OSSTMM by Falgun RathodOpen Source Security Testing Methodology Manual - OSSTMM by Falgun Rathod
Open Source Security Testing Methodology Manual - OSSTMM by Falgun Rathod
 
Dangerous google dorks
Dangerous google dorksDangerous google dorks
Dangerous google dorks
 
How to Setup A Pen test Lab and How to Play CTF
How to Setup A Pen test Lab and How to Play CTF How to Setup A Pen test Lab and How to Play CTF
How to Setup A Pen test Lab and How to Play CTF
 
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to...
 
Nmap not only a port scanner by ravi rajput comexpo security awareness meet
Nmap not only a port scanner by ravi rajput comexpo security awareness meet Nmap not only a port scanner by ravi rajput comexpo security awareness meet
Nmap not only a port scanner by ravi rajput comexpo security awareness meet
 

Ähnlich wie Linux intro 4 awk + makefile

import java.io.BufferedReader;import java.io.BufferedWriter;.docx
import java.io.BufferedReader;import java.io.BufferedWriter;.docximport java.io.BufferedReader;import java.io.BufferedWriter;.docx
import java.io.BufferedReader;import java.io.BufferedWriter;.docxwilcockiris
 
Linux basic commands with examples
Linux basic commands with examplesLinux basic commands with examples
Linux basic commands with examplesabclearnn
 
Basic shell programs assignment 1_solution_manual
Basic shell programs assignment 1_solution_manualBasic shell programs assignment 1_solution_manual
Basic shell programs assignment 1_solution_manualKuntal Bhowmick
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxcargillfilberto
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxdrandy1
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxmonicafrancis71118
 
The Ring programming language version 1.10 book - Part 14 of 212
The Ring programming language version 1.10 book - Part 14 of 212The Ring programming language version 1.10 book - Part 14 of 212
The Ring programming language version 1.10 book - Part 14 of 212Mahmoud Samir Fayed
 
LOSS_C11- Programming Linux 20221006.pdf
LOSS_C11- Programming Linux 20221006.pdfLOSS_C11- Programming Linux 20221006.pdf
LOSS_C11- Programming Linux 20221006.pdfThninh2
 
Course 102: Lecture 12: Basic Text Handling
Course 102: Lecture 12: Basic Text Handling Course 102: Lecture 12: Basic Text Handling
Course 102: Lecture 12: Basic Text Handling Ahmed El-Arabawy
 
4 b file-io-if-then-else
4 b file-io-if-then-else4 b file-io-if-then-else
4 b file-io-if-then-elseMalik Alig
 
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docxtarifarmarie
 
Love Your Command Line
Love Your Command LineLove Your Command Line
Love Your Command LineLiz Henry
 
intro unix/linux 02
intro unix/linux 02intro unix/linux 02
intro unix/linux 02duquoi
 
Can someone put this code in a zip file. I tried running it last tim.pdf
Can someone put this code in a zip file. I tried running it last tim.pdfCan someone put this code in a zip file. I tried running it last tim.pdf
Can someone put this code in a zip file. I tried running it last tim.pdffedosys
 

Ähnlich wie Linux intro 4 awk + makefile (20)

Makefiles Bioinfo
Makefiles BioinfoMakefiles Bioinfo
Makefiles Bioinfo
 
import java.io.BufferedReader;import java.io.BufferedWriter;.docx
import java.io.BufferedReader;import java.io.BufferedWriter;.docximport java.io.BufferedReader;import java.io.BufferedWriter;.docx
import java.io.BufferedReader;import java.io.BufferedWriter;.docx
 
Linux basic commands with examples
Linux basic commands with examplesLinux basic commands with examples
Linux basic commands with examples
 
Basic shell programs assignment 1_solution_manual
Basic shell programs assignment 1_solution_manualBasic shell programs assignment 1_solution_manual
Basic shell programs assignment 1_solution_manual
 
Examples -partII
Examples -partIIExamples -partII
Examples -partII
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
 
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docxCOMM 166 Final Research Proposal GuidelinesThe proposal should.docx
COMM 166 Final Research Proposal GuidelinesThe proposal should.docx
 
The Ring programming language version 1.10 book - Part 14 of 212
The Ring programming language version 1.10 book - Part 14 of 212The Ring programming language version 1.10 book - Part 14 of 212
The Ring programming language version 1.10 book - Part 14 of 212
 
LOSS_C11- Programming Linux 20221006.pdf
LOSS_C11- Programming Linux 20221006.pdfLOSS_C11- Programming Linux 20221006.pdf
LOSS_C11- Programming Linux 20221006.pdf
 
50 Most Frequently Used UNIX Linux Commands -hmftj
50 Most Frequently Used UNIX  Linux Commands -hmftj50 Most Frequently Used UNIX  Linux Commands -hmftj
50 Most Frequently Used UNIX Linux Commands -hmftj
 
Course 102: Lecture 12: Basic Text Handling
Course 102: Lecture 12: Basic Text Handling Course 102: Lecture 12: Basic Text Handling
Course 102: Lecture 12: Basic Text Handling
 
4 b file-io-if-then-else
4 b file-io-if-then-else4 b file-io-if-then-else
4 b file-io-if-then-else
 
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx
1 CMPS 12M Data Structures Lab Lab Assignment 1 .docx
 
Love Your Command Line
Love Your Command LineLove Your Command Line
Love Your Command Line
 
intro unix/linux 02
intro unix/linux 02intro unix/linux 02
intro unix/linux 02
 
Introduction to Makefile
Introduction to MakefileIntroduction to Makefile
Introduction to Makefile
 
Ch05
Ch05Ch05
Ch05
 
15. text files
15. text files15. text files
15. text files
 
Can someone put this code in a zip file. I tried running it last tim.pdf
Can someone put this code in a zip file. I tried running it last tim.pdfCan someone put this code in a zip file. I tried running it last tim.pdf
Can someone put this code in a zip file. I tried running it last tim.pdf
 

Mehr von Giovanni Marco Dall'Olio (17)

Fehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal ClubFehrman Nat Gen 2014 - Journal Club
Fehrman Nat Gen 2014 - Journal Club
 
Agile bioinf
Agile bioinfAgile bioinf
Agile bioinf
 
Version control
Version controlVersion control
Version control
 
Wagner chapter 5
Wagner chapter 5Wagner chapter 5
Wagner chapter 5
 
Wagner chapter 4
Wagner chapter 4Wagner chapter 4
Wagner chapter 4
 
Wagner chapter 3
Wagner chapter 3Wagner chapter 3
Wagner chapter 3
 
Wagner chapter 2
Wagner chapter 2Wagner chapter 2
Wagner chapter 2
 
Wagner chapter 1
Wagner chapter 1Wagner chapter 1
Wagner chapter 1
 
Hg for bioinformatics, second part
Hg for bioinformatics, second partHg for bioinformatics, second part
Hg for bioinformatics, second part
 
Hg version control bioinformaticians
Hg version control bioinformaticiansHg version control bioinformaticians
Hg version control bioinformaticians
 
The true story behind the annotation of a pathway
The true story behind the annotation of a pathwayThe true story behind the annotation of a pathway
The true story behind the annotation of a pathway
 
Plotting data with python and pylab
Plotting data with python and pylabPlotting data with python and pylab
Plotting data with python and pylab
 
Pycon
PyconPycon
Pycon
 
biopython, doctest and makefiles
biopython, doctest and makefilesbiopython, doctest and makefiles
biopython, doctest and makefiles
 
Web 2.0 e ricerca scientifica - Web 2.0 and scientific research
Web 2.0 e ricerca scientifica - Web 2.0 and scientific researchWeb 2.0 e ricerca scientifica - Web 2.0 and scientific research
Web 2.0 e ricerca scientifica - Web 2.0 and scientific research
 
Perl Bioinfo
Perl BioinfoPerl Bioinfo
Perl Bioinfo
 
(draft) perl e bioinformatica - presentazione per ipw2008
(draft) perl e bioinformatica - presentazione per ipw2008(draft) perl e bioinformatica - presentazione per ipw2008
(draft) perl e bioinformatica - presentazione per ipw2008
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

Linux intro 4 awk + makefile

  • 1. Programming for Evolutionary Biology March 17th - April 1st 2012 Leipzig, Germany Introduction to Unix systems Part 4 – awk & make Giovanni Marco Dall'Olio Universitat Pompeu Fabra Barcelona (Spain)
  • 2. Schedule  9.30 – 10.30: “What is Unix?”  11.00 – 12.00: Introducing the terminal  13:30 – 16:30: Grep, & Unix philosophy  17:00 – 18:00: awk, make, and question time
  • 3. awk  awk is the Unix command to work with tabular  data
  • 4. awk  Things you can do with awk:  Extract all the lines of a file that match a pattern,  and print only some of the columns (instead of  “grep | cut”)  Add a prefix/suffix to all the element of a column  (instead of “cut | paste”)  Sum values of different columns
  • 5. awk and gawk  Today we will use the GNU version of awk,  called gawk
  • 6. Awk exercise  Let's go to the “bed” directory  cd leipzig_course/unix_intro/bed
  • 7. The BED format  The BED format is used to store annotations on  genomic coordinates  Example:  Annotate gene position  Annotate Trascription Factor binding sites  Annotate SNP genotypes  Annotate Gene Expression
  • 8. Example awk usage  Select only the lines matching chromosome 7,  and print only the second column, summing 100  to it  awk '$1 ~ chr7 {print $2+100}' annotations.bed  $0 ~ AAC → select all the lines that contain AAC  {print} → for each line that matches the previous  expression, print it
  • 9. Example of BED file Columns in a BED file:  Chromosome  Start  End chr7 127471196 127472363 gene1 0 + chr7 127472363 127473530 gene2 2 +  Label chr7 127473530 127474697 gene3 20 +  Score chr7 127474697 127475864 gene4 3 + chr7 127475864 127477031 gene5 100 -  Strand (+ or ­) chr7 127477031 127478198 gene6 3 - chr7 127478198 127479365 gene7 5 -  …. chr7 127479365 127480532 gene8 1 + chr7 127480532 127481699 gene9 3 -
  • 10. Basic awk usage  awk '<pattern to select lines> {instructions to be  executed on each line}' 
  • 11. Selecting columns in awk  In awk, each column can be accessed by  $<column­number>   For example,   $1 → first column of the file  $2 → second column  $NF → last column  $0 matches all the columns of the file
  • 12. Printing columns in awk  The basic awk usage is to print specific columns  The syntax is the following:  awk '{print $1, $2, $3}' annotations.bed → prints  the first three columns
  • 13. Printing columns in awk  The basic awk usage is to print specific columns  The syntax is the following:  awk '{print $1, $2, $3}' annotations.bed → prints  the first three columns  awk '{print $0}' annotations.bed → print all the  columns
  • 14. Adding a prefix to a column with awk  A common awk usage is to add a prefix or suffix  to all the entries of a column  Example:   awk '{print $2 “my_prefix”$2}' annotations.bed
  • 15. Summing columns in awk  If two columns contain numeric values, we can  use awk to sum them  Usage:  awk '{print $2 + $3}' annotations.bed → sums  columns 2 and 3
  • 16. Filter lines and select columns with awk  Awk can apply a filter and print only the lines  matching a pattern  Like grep, but more powerful  Print all the lines that have “chr7” in their first  column:  awk '$1 ~ “chr7” {print $0}' myfile.txt 
  • 17. Advanced: redirecting to files  The following will split the contents of  annotations.bed into two files, according to the  1st column:  awk '{print $0 > $1".file"}' annotations.bed
  • 18. Advanced: print something before reading the file  The BEGIN statement can be used to execute  commands before reading the file:  awk 'BEGIN {print "position"} {print $2}'  annotations.bed
  • 19. More on awk  awk is a complete programming language  It is the equivalent of a spreadsheet for the  command line  If you want to know more, check the book “Gawk  effective AWK Programming” at  http://www.gnu.org/software/gawk/manual
  • 20. GNU/make  make is a tool used by programmer to define how  software should be compiled and installed  It is also used as a way to store a set of  commands, to recall them later
  • 21. Simplest Makefile example  The simplest Makefile contains just the name of a task  and the commands associated with it:  print_hello is a makefile 'rule': it stores the commands  needed to say 'Hello, world!' to the screen.
  • 22. Simplest Makefile example Makefile  Target of  rule the rule Commands associated  This is a  with the rule tabulation (not  8 spaces)
  • 23. Simplest Makefile example  Create a file in your  computer and save it as  'Makefile'.  Write these instructions in  it: print_hello: echo 'Hello, world!!' This is a tabulation  (<Tab> key)  Then, open a terminal and  type: make ­f Makefile print_hello
  • 25. Simplest Makefile example – explanation  When invoked, the program 'make' looks for a file in the  current directory called 'Makefile'  When we type 'make print_hello', it executes any  procedure (target) called 'print_hello' in the makefile  It then shows the commands executed and their output
  • 26. A sligthly longer example  You can add as many  commands you like  to a rule  For example, this  'print_hello' rule  contains 5 commands  Note: ignore the '@'  thing, it is only to  disable verbose mode  (explained later)
  • 27. A more complex example
  • 28. The target syntax  The target of a rule can be either a title for the task, or a  file name.  Everytime you call a make rule (example: 'make all'), the  program looks for a file called like the target name (e.g.  'all', 'clean', 'inputdata.txt', 'results.txt')  The rule is executed only if that file doesn't exists.
  • 29. Filename as target names  In this makefile, we have two rules: 'testfile.txt' and 'clean'
  • 30. Filename as target names  In this makefile, we have two rules: 'testfile.txt' and 'clean'  When we call 'make testfile.txt', make checks if a file called 'testfile.txt' already exists.
  • 31. Filename as target names The commands associated with the rule 'testfile.txt' are executed only if that file doesn't exists already
  • 32. Real Makefile-rule syntax  Complete syntax for a Makefile rule: <target>: <list of prerequisites> <commands associated to the rule>  Example: result1.txt: data1.txt data2.txt cat data1.txt data2.txt > result1.txt @echo 'result1.txt' has been calculated'  Prerequisites are files (or rules) that need to exists already  in order to create the target file.  If 'data1.txt' and 'data2.txt' don't exist, the rule 'result1.txt'  will exit with an error (no rule to create them)
  • 33. Piping Makefile rules together  You can pipe two Makefile rules together by  defining prerequisites
  • 34. Piping Makefile rules together  The rule 'result1.txt' depends on the rule  'data1.txt', which should be executed first
  • 35. Conditional execution by modification date  We have seen how make can be used to create a  file, if it doesn't exists. file.txt: # if file.txt doesn't exists, then create it: echo 'contents of file.txt' > file.txt  We can do better: create or update a file only if it  is newer than its prerequisites
  • 36. Conditional execution by modification date  Let's have a better look at this example: result1.txt: data1.txt calculate_result.py python calculate_result.txt --input data1.txt  A great feature of make is that it execute a rule not  only if the target file doesn't exist, but also if it has a  'last modification date' earlier than all of its  prerequisites
  • 37. Conditional execution by modification date result1.txt: data1.txt @sed 's/b/B/i' data1.txt > result1.txt @echo 'result1.txt has been calculated'  In this example, result1.txt will be recalculated  every time 'data1.txt' is modified $: touch data1.txt calculate_result.py $: make result1.txt result1.txt has been calculated $: make result1.txt result1.txt is already up-to-date $: touch data1.txt $: make result1.txt result1.txt has been calculated
  • 38. Make - advantages  Make allows you to save shell commands along  with their parameters and re­execute them;  It allows you to use command­line tools which  are more flexible;  Combined with a revision control software, it  makes possible to reproduce all the operations  made to your data;
  • 39. Resume of the day  Unix → an operating system from the '70s, which  was successful for its philosophy and technical  novelties  The terminal → a way to execute commands by  typing  How to get help → man, info, ­­help, internet  Grep, awk → search patterns in a file  Other Unix tools → each specialized on a single  data manipulation task