1. Essential UNIX Skills for Biologists
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/14/2009
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
2. The Bioresearch Informationist: At Your Service
Yannick Pouliot, PhD, Lane Medical Library & Knowledge Management Center
Bioresearch Informationist ≈ computational biologist in residence
Lane Library service
Closely coordinated with CMGM
Role: Support laboratory researchers regarding biocomputational resources and their use
…especially postdocs
Contact: lanebioresearch@stanford.edu
3. Goals
Deliver basic understanding of core UNIX commands
Tips on running UNIX on Mac and Windows
… and on a procedural note, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …
4. But First: LaneConnex -- Your Key to Finding Resources Quickly
5. So, Why UNIX?
UNIX is good for:
1. performing complex operations with very few keystrokes
2. operating on large numbers of objects, e.g.,
searching file contents very specifically
renaming files
moving/copying files
UNIX is fast…
Fast running and fast to invoke
LINUX (≈ UNIX) is free and runs on nearly everything
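To make "operating on large numbers of objects" concrete, here is a minimal sketch of renaming many files with a single loop. The file names and the .seq/.fasta extensions are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input files.
touch sample1.seq sample2.seq sample3.seq

# Rename every .seq file to .fasta; ${f%.seq} strips the old extension.
for f in *.seq; do
    mv -- "$f" "${f%.seq}.fasta"
done

# Count the renamed files.
fasta_count=$(ls -1 *.fasta | wc -l | tr -d ' ')
echo "renamed files: $fasta_count"
```

The same loop works unchanged whether the directory holds three files or three thousand.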
6. UNIX Trip-Ups
UNIX is capitalization-sensitive
ls ≠ Ls
What you type is what you get
no mistyping!
mind those commands
e.g., rm -fr * = delete everything in the current directory and its subdirectories, without confirmation! → DON’T DO THIS AT HOME!
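One cautious habit, sketched below under the assumption that you want to delete files matching a pattern: list what the pattern matches first, then delete with exactly the same pattern. The file names are hypothetical and the sketch runs in a throwaway directory.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch old1.tmp old2.tmp results.txt

# Step 1: preview what the pattern matches.
ls -1 *.tmp

# Step 2: only then delete, using the same pattern.
rm -- *.tmp

# Check what is left.
remaining=$(ls -1)
echo "left over: $remaining"
```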
7. So How Does One Access UNIX?
Mac: UNIX underlies Mac’s graphical interface
Applications → Utilities → Terminal
Windows: Must install code (more later)
9. Key UNIX Concepts
UNIX is command-line based (no cute icons).
There are flavors of UNIX
“Mac” UNIX ≈ Linux ≈ UNIX
“Shell” = command line interface
different shells exist (e.g., bash, tcsh), all with very similar basic functionality
Anything you can imagine, UNIX can do
… but you may have to think about it…
In UNIX, anything can be done in at least three different ways…
UNIX has:
commands (built-in) → most of today’s workshop
utilities
≈ “super-commands”, e.g., grep, for parsing text
not built-in but usually there
10. Concept: Redirection ***
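Redirection operators
“>”: send output to a file (overwrites the file)
“>>”: append output to a file (doesn’t overwrite)
“<”: read input from a file
(“<<” is not an append operator: it starts a “here-document”, i.e., input typed inline)
Applies to both input and output
prog.exe < file.txt
prog.exe > file.txt
prog.exe < file.txt > file1.txt
prog.exe >> file.txt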
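A minimal sketch of the three most common redirections, using sort as a stand-in for prog.exe. The gene names and file names are illustrative only.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# ">" creates the file, or overwrites it if it already exists.
printf 'TP53\nBRCA1\nEGFR\n' > genes.txt

# "<" feeds genes.txt to sort as input; ">" captures sort's output.
sort < genes.txt > sorted.txt

# ">>" adds to the end of sorted.txt without overwriting it.
printf 'MYC\n' >> sorted.txt

line_count=$(wc -l < sorted.txt | tr -d ' ')
first_line=$(head -n 1 sorted.txt)
last_line=$(tail -n 1 sorted.txt)
echo "$line_count lines, first: $first_line, last: $last_line"
```

Note that the appended line lands at the end of the file: ">>" does not re-sort anything.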
11. Concept: Metacharacters ***
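“*” = 0 or more characters of any kind (in file name patterns)
‘?’ (file name patterns) or ‘.’ (regular expressions) = exactly one character of any kind
Exact character depends on the tool…
In regular expressions (e.g., in grep), “*” means “0 or more of the preceding character”, so “.*” = 0 or more characters of any kind
Metacharacters can be used with nearly any other command, e.g.,
ls file?.txt
ls file*.txt
ls *.*
more *.txt
grep '.*omics' *.txt (quote the pattern so the shell doesn’t expand it)
NB: There are lots of other kinds of metacharacters…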
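The difference between “?” and “*” in file name patterns can be checked with a small sketch like this one. The file names are hypothetical.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1.txt file2.txt file10.txt notes.txt

# "?" stands for exactly one character: matches file1.txt and file2.txt only.
one_char_count=$(ls -1 file?.txt | wc -l | tr -d ' ')

# "*" stands for zero or more characters: also matches file10.txt.
any_chars_count=$(ls -1 file*.txt | wc -l | tr -d ' ')

echo "file?.txt matches $one_char_count files; file*.txt matches $any_chars_count files"
```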
12. Concept: Stringing Commands Together Using Pipes
“|” = pipe, e.g.:
ls -1 | more
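Pipes can be chained: each command reads the output of the one before it. The sketch below, with made-up file names, counts files by piping a listing through grep and wc.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.fasta b.fasta notes.txt

# Two commands: list the files, then count the lines of the listing.
total=$(ls -1 | wc -l | tr -d ' ')

# Three commands: list, keep only names ending in .fasta, then count.
fasta=$(ls -1 | grep '\.fasta$' | wc -l | tr -d ' ')

echo "$fasta of $total files are FASTA files"
```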
13. Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
15. ls [options] [names] ****
Lists contents of directories, including directories themselves
Basically, lists files…
When names are provided, lists the files contained in the named directories or matching the given file names.
names can include filename metacharacters.
The options display information in different formats. The most useful
options include -F, -R, -l, and -s.
Examples
1. list all details of all files in current directory
ls -l
2. list just the filenames
ls -1
3. create a file that contains a list of the filenames
ls -1 > mylist.txt
4. List files whose names consist of “example” followed by a single character, e.g., example1.txt
ls -1 example?.txt
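The sketch below tries examples 3 and 4 on hypothetical files. It also shows a small surprise: because the shell creates the output file before ls runs, a listing redirected into the current directory includes the output file itself.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch example1.txt example2.txt example10.txt

# Example 4 combined with redirection: save only the single-character matches.
ls -1 example?.txt > mylist.txt
listed=$(wc -l < mylist.txt | tr -d ' ')

# Example 3: the list file is created first, so it appears in its own listing.
ls -1 > everything.txt
self_listed=$(grep -c '^everything\.txt$' everything.txt)

echo "mylist.txt holds $listed names; everything.txt lists itself $self_listed time(s)"
```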
16. cat/more/head/tail
→ commands to look at content of files
cat: returns everything
more: same but one page at a time ****
head: returns top x lines
tail: returns bottom x lines
all can operate on multiple files
Examples
1. show contents of all txt files
cat *.txt
2. show first 100 lines of file
head -n 100 file.txt
3. show first 1000 lines of file and paginate:
head -n 1000 file.txt | more
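A small sketch of head and tail on a hypothetical ten-line file, including the common trick of combining them to pull out one specific line.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a ten-line file, one number per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > numbers.txt

# First line and last line.
first_line=$(head -n 1 numbers.txt)
last_line=$(tail -n 1 numbers.txt)

# Line 3 only: take the first 3 lines, then keep the last of those.
third_line=$(head -n 3 numbers.txt | tail -n 1)

echo "first=$first_line last=$last_line third=$third_line"
```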
17. grep: Searching File Contents Using “Regular Expressions” ****
grep [options] pattern [files]
Very powerful: Searches file contents for presence of a string
grep protein *.txt
about a million options…
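Also searches using regular expressions
Definition: a pattern that describes the characteristics of one or more strings, e.g.:
te?xt (matches “txt” or “text”; needs grep -E)
[a-z]*omics (matches “omics” preceded by any number of lowercase letters)
Examples
1. Find all text files whose contents contain words ending in “omics” (“genomics”, “proteomics”, “transcriptomics”):
grep -ilw '[a-z]*omics' *.txt
(-i: ignore case; -l: list matching file names only; -w: match whole words)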
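A minimal sketch of searching for “omics” words in two hypothetical text files. It assumes a grep that supports -o (print only the matching part), which both GNU and BSD grep do.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'genomics and proteomics were combined\n' > paper1.txt
printf 'we measured enzyme kinetics only\n' > paper2.txt

# -l prints only the names of files that contain a match.
matching_files=$(grep -l 'omics' *.txt)

# -o prints each matching word on its own line, so wc -l counts the words.
omics_words=$(grep -o '[a-z]*omics' paper1.txt | wc -l | tr -d ' ')

echo "files with matches: $matching_files ($omics_words omics words)"
```

The pattern is quoted so that the shell passes it to grep untouched instead of trying to expand it as a file name pattern.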
18. Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. Need coffee
19. uniq [options] [file] **
Very handy for listing unique (or duplicate) lines in a file
Has options to…
ignore the first n fields delimited by tabs or spaces (-f n), or the first n characters (-s n)
compare only the first n characters (-w n, in GNU uniq)
Compares ONLY adjacent lines, so sort the file first
Examples
1. List unique lines using unsorted file
sort test1.txt | uniq
2. Count occurrences of each distinct line using sorted file
uniq -c test2.txt
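The sketch below shows why sorting matters, using a made-up list of species. Run on unsorted input, uniq removes nothing here because no two identical lines are next to each other.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'mouse\nhuman\nmouse\nyeast\nhuman\nmouse\n' > species.txt

# Without sorting: no adjacent duplicates, so all 6 lines survive.
unsorted_count=$(uniq species.txt | wc -l | tr -d ' ')

# With sorting: duplicates become adjacent and collapse to 3 distinct lines.
distinct_count=$(sort species.txt | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each line with its count; sort -rn puts the largest first.
most_common=$(sort species.txt | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "unsorted=$unsorted_count distinct=$distinct_count most common=$most_common"
```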
20. find [pathnames] [conditions] ***
Very powerful: can specify anything, including exclusions and
negations
Descends the directory tree beginning at each pathname and
locates files that meet the specified conditions. The default
pathname is the current directory.
Most useful conditions are -name and -type (for general use)
Can search very large numbers of file names, if slowly…
Examples
1. List all files named chapter1 in the /work directory:
find /work -name chapter1 -print
2. Look for filenames in the current directory (and below) that don't begin with a capital letter
find . ! -name '[A-Z]*' -print
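Both examples can be tried on a small hypothetical directory tree, as sketched below. Setting LC_ALL=C is an added precaution, not part of the original examples: it makes [A-Z] mean exactly the uppercase ASCII letters regardless of the system's language settings.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p work/draft
touch work/chapter1 work/draft/chapter1 work/Notes.txt work/readme.txt

# Example 1: find descends into subdirectories, so both chapter1 files turn up.
chapter_count=$(find work -name chapter1 -type f | wc -l | tr -d ' ')

# Example 2: "!" negates the test that follows; -name takes a file name
# pattern (not a regular expression), so '[A-Z]*' means "starts with a capital".
lowercase_count=$(LC_ALL=C find work -type f ! -name '[A-Z]*' | wc -l | tr -d ' ')

echo "chapter1 files: $chapter_count; files not starting with a capital: $lowercase_count"
```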
21. UNIX on Windows
Easy: UnxUtils
= UNIX “light”
Excellent for most tasks
Not a complete emulation of UNIX
Download here; make sure to follow installation instructions
Hard: Cygwin
More later…
difficult to make it behave perfectly
can run in parallel with Windows
Easier: create a dual boot
Provides ability to boot either Windows or Linux
Requires reboot to switch…
23. Everything You Need to Know About UNIX in Short Form: eBooks from Lane
• The ultimate quick reference for LINUX
• More than you typically need, but you can zoom into what you need
24. UnxUtils Installation: The MiniMe of UNIX
Download
Installation instructions
→ Let’s do it together if you have a PC and want it