1. Programming for Evolutionary Biology
March 17th - April 1st 2012
Leipzig, Germany
Introduction to Unix systems
Extra: awk and gawk
Giovanni Marco Dall'Olio
Universitat Pompeu Fabra
Barcelona (Spain)
2. awk
“awk” is a “Swiss Army knife” command-line tool for
manipulating tabular files
Things you can do with awk:
Extract all the lines of a file that match a pattern, and
print only some of their columns (instead of “grep |
cut”)
Add a prefix/suffix to all the elements of a column
(instead of “cut | paste”)
Sum the values of different columns
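For example, the “grep | cut” case can be compared side by side. This is a minimal sketch on three hypothetical tab-separated lines generated with printf rather than a real file; both commands should print the same two lines.

```shell
# Hypothetical toy input: three tab-separated columns
# grep + cut: keep the lines containing AAC, then keep columns 1 and 3
# (cut splits on tabs by default)
printf 'AAC\t10\tx\nGGT\t20\ty\nTAAC\t30\tz\n' | grep 'AAC' | cut -f 1,3

# awk: the same selection and extraction in a single command
# (-F '\t' tells awk that the columns are separated by tabs)
printf 'AAC\t10\tx\nGGT\t20\ty\nTAAC\t30\tz\n' | awk -F '\t' '/AAC/ {print $1 "\t" $3}'
```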
3. awk and gawk
In these slides we will be talking about awk
In practice, we will use gawk, the free implementation
of awk developed by the GNU project
4. Basic awk usage
“awk '<pattern to select lines> {instructions to be
executed on each selected line}' <input file>”
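A small sketch of the pattern/action structure, using hypothetical lines generated with printf. Either part may be left out: with only a pattern, awk prints the matching lines; with only an action, the action runs on every line.

```shell
# Pattern only: the default action is to print the matching lines
printf 'AAC 1\nGGT 2\nAAC 3\n' | awk '/AAC/'

# Action only: the instructions run on every line (here: print column 2)
printf 'AAC 1\nGGT 2\nAAC 3\n' | awk '{print $2}'

# Pattern and action: the instructions run only on the matching lines
printf 'AAC 1\nGGT 2\nAAC 3\n' | awk '/AAC/ {print $2}'
```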
5. Example awk usage
“awk '$0 ~ /AAC/ {print}' sample_vcf.vcf”
$0 ~ /AAC/ → select all the lines that contain AAC
{print} → print each line that matches the previous
expression
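The text to match should be written as a regular expression between slashes, /AAC/ (or as a quoted string, "AAC"). Written bare, AAC is read as an uninitialized awk variable, i.e. an empty regular expression, which matches every line. A sketch on two hypothetical VCF-like lines:

```shell
# Regular expression between slashes: prints only the line containing AAC
printf 'chr1\t100\tAAC\nchr1\t200\tGGT\n' | awk '$0 ~ /AAC/ {print}'

# A quoted string is also treated as a regular expression: same result
printf 'chr1\t100\tAAC\nchr1\t200\tGGT\n' | awk '$0 ~ "AAC" {print}'

# !~ selects the lines that do NOT match: prints only the GGT line
printf 'chr1\t100\tAAC\nchr1\t200\tGGT\n' | awk '$0 !~ /AAC/ {print}'
```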
6. Column names in awk
awk assumes that you are working on tabular files
Each column of the file can be accessed by
$<column number>. For example, $2 is the second
column of the file
$0 refers to the whole line (all the columns together)
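By default awk splits each line into columns on runs of spaces and tabs. Since VCF files are tab-separated, passing -F '\t' is a safer choice when a field may contain spaces. The sketch below uses hypothetical lines; NF is the built-in variable holding the number of columns of the current line, so $NF is the last column.

```shell
# Default splitting: the space inside "my id" starts a new column, prints "my"
printf 'chr1\t100\tmy id\n' | awk '{print $3}'

# Tab splitting: the third column is "my id"
printf 'chr1\t100\tmy id\n' | awk -F '\t' '{print $3}'

# NF = number of columns (5), $NF = last column (G)
printf 'chr1\t100\trs1\tA\tG\n' | awk -F '\t' '{print NF, $NF}'
```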
7. Accessing columns in awk
“awk '{print $1, $2, $3}' sample_vcf.vcf” → prints
the first three columns
“awk '{print $0}' sample_vcf.vcf” → prints the whole
line, i.e. all the columns
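In print $1, $2, $3 the commas insert the output separator OFS, which is a space by default, so tab-separated input comes out space-separated. A sketch (hypothetical input line) of setting OFS to a tab to keep the tabular layout:

```shell
# Default OFS: columns joined by a single space -> "chr1 100 rs1"
printf 'chr1\t100\trs1\tA\tG\n' | awk -F '\t' '{print $1, $2, $3}'

# OFS set to a tab: output stays tab-separated like the input
printf 'chr1\t100\trs1\tA\tG\n' | awk -F '\t' -v OFS='\t' '{print $1, $2, $3}'
```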
8. Adding a prefix to a column with awk
A common awk usage is to add a prefix or suffix to
all the entries of a column
Example (prints the second column, with my_prefix
prepended to each entry):
awk '{print "my_prefix" $2}' myfile.txt
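To add a prefix or suffix while keeping the other columns of each line, one option is to assign a new value to the field and then print the whole line; awk rebuilds $0 using OFS. The "chr" prefix and "_reads" suffix below are only illustrative, on hypothetical tab-separated input.

```shell
# Prefix: "1" -> "chr1" in column 1, the other columns are kept
printf '1\t100\n2\t200\n' | awk -F '\t' -v OFS='\t' '{$1 = "chr" $1; print}'

# Suffix: "5" -> "5_reads" in column 2
printf 'geneA\t5\ngeneB\t7\n' | awk -F '\t' -v OFS='\t' '{$2 = $2 "_reads"; print}'
```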
9. Summing columns in awk
If two columns contain numeric values, we can use
awk to sum them, line by line
Usage:
“awk '{print $1 + $2}' myfile.txt”
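The command above sums two columns within each line. To sum a single column down all the lines, a running total can be accumulated and printed in an END block, which runs after the last line has been read. A sketch on hypothetical numbers:

```shell
# Row-wise sum of columns 1 and 2: prints 3, 7, 11
printf '1\t2\n3\t4\n5\t6\n' | awk '{print $1 + $2}'

# Total of column 1 over all lines: prints 9
printf '1\t2\n3\t4\n5\t6\n' | awk '{total += $1} END {print total}'
```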
10. Selecting lines by column with awk
awk can be used to select lines based on the content
of a specific column
It is like grep, but more powerful, because it lets you
specify which column must contain the match
This example prints all the lines that have
AAC in their first column:
“awk '$1 ~ /AAC/ {print}' myfile.txt”
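The ~ operator tests whether a column contains a match for a regular expression; to require an exact value, == compares the whole column as a string, and conditions can be combined with &&. A sketch on hypothetical two-column input:

```shell
# Regex match: column 1 contains AAC (prints the AAC and TAAC lines)
printf 'AAC\t5\nTAAC\t8\nGGT\t9\n' | awk '$1 ~ /AAC/'

# Exact match: column 1 is exactly AAC (prints only the AAC line)
printf 'AAC\t5\nTAAC\t8\nGGT\t9\n' | awk '$1 == "AAC"'

# Combined: column 1 contains AAC and column 2 is greater than 6 (TAAC line)
printf 'AAC\t5\nTAAC\t8\nGGT\t9\n' | awk '$1 ~ /AAC/ && $2 > 6'
```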
11. More on awk
awk is a complete programming language
It is the equivalent of a spreadsheet for the
command line
If you want to know more, check the book “GAWK:
Effective AWK Programming” at
http://www.gnu.org/software/gawk/manual
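As a small taste of awk as a programming language, the sketch below uses an associative array and an END block to count how many times each value appears in the first column, a bit like a spreadsheet pivot table. The input is hypothetical; sort is added only because the order of a for-in loop over an array is unspecified.

```shell
# count[value]++ increments a per-value counter; END prints one line per value
printf 'A\nG\nA\nT\nA\n' | awk '{count[$1]++} END {for (base in count) print base, count[base]}' | sort
```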