9. Repetition
? 0 or 1 time
* 0 or more times
+ 1 or more times
*? ungreedy *
+? ungreedy +
{m} m times
{m,n} m up to n times
{m,n}? ungreedy {m,n}
/^\w{4}G\w{2}AA$/
/^\d{1,2}\d{1,2}\d{2,4}$/
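A quick way to see what "ungreedy" means is to run the same pattern with and without the trailing `?`. This is only a sketch: the sequence is made up, and it assumes GNU grep, whose `-P` flag enables Perl-style regexes and whose `-o` flag prints just the matched part.

```shell
# Toy sequence, made up for illustration.
seq='AAGTTGCCGTT'

# Greedy: .* grabs as much as it can, so the match runs to the LAST "GTT".
greedy=$(printf '%s\n' "$seq" | grep -oP 'A.*GTT')

# Ungreedy: .*? grabs as little as it can, so the match stops at the FIRST "GTT".
lazy=$(printf '%s\n' "$seq" | grep -oP 'A.*?GTT')

echo "greedy: $greedy"   # greedy: AAGTTGCCGTT
echo "lazy:   $lazy"     # lazy:   AAGTT
```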
10. Grouping
[ABC] any of these characters
(AB|BC|CA) any of these expressions
(THIS!) save this
[A-Za-z0-9] ranges
/^[ACTG]{4}G[ACTG]{2}AA$/
/^(0?[1-9]|[0-2]\d|3[01])(0?\d|1[0-2])(\d{2}|\d{4})$/
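To get a feel for how the grouped date pattern behaves, it can be tried on a few strings. This is a sketch under assumptions: the test strings are invented, the variable names are hypothetical, and GNU grep with `-P` is assumed so that `\d` is understood.

```shell
# Day, month, then a 2- or 4-digit year, with no separators.
date_re='^(0?[1-9]|[0-2]\d|3[01])(0?\d|1[0-2])(\d{2}|\d{4})$'

# Made-up test strings, one per line; grep keeps only the lines that match.
matches=$(printf '%s\n' 31121999 010203 32011999 abc | grep -P "$date_re")

echo "$matches"
# 31121999
# 010203
```

`32011999` is rejected because no day alternative accepts `32` and the remaining digits cannot be split into a valid month and year. The pattern is still permissive (a month written as `00` would pass), so it is a format check rather than a calendar check.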
15. E.g.: demultiplexing fastq
1. Barcode
2. Primer
3. Random nucleotides
grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq |
  grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
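The two-step filter can be tried on a toy file first. The sketch below invents three FASTQ records (read names, sequences and qualities are all made up) and writes them to a temporary directory; it assumes GNU grep for `-P` and `--no-group-separator`.

```shell
# Scratch directory so no real data is overwritten.
tmp=$(mktemp -d)

# Three made-up records: right barcode and primer, right barcode but wrong
# primer, and wrong barcode.
cat > "$tmp/multiplex_R1.fastq" <<'EOF'
@read1 1:N:0:ACTGGTT
ACGTCCCATGAGATAGGG
+
IIIIIIIIIIIIIIIIII
@read2 1:N:0:ACTGGTT
ACGTGGGATGAGATAGGG
+
IIIIIIIIIIIIIIIIII
@read3 1:N:0:TTTTTTT
ACGTCCCATGAGATAGGG
+
IIIIIIIIIIIIIIIIII
EOF

# Step 1: keep whole records (header + 3 lines) carrying the barcode.
# Step 2: keep records whose sequence starts with 4 random nucleotides
#         followed by the primer; -B1 restores the header, -A2 the "+"
#         line and the qualities.
grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator "$tmp/multiplex_R1.fastq" |
  grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator \
  > "$tmp/deplexed_R1.fq"

# Each FASTQ record is 4 lines, so lines / 4 = reads kept.
n_reads=$(( $(wc -l < "$tmp/deplexed_R1.fq") / 4 ))
echo "reads kept: $n_reads"   # reads kept: 1
```

Only `read1` survives: `read2` fails the primer pattern and `read3` fails the barcode.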
16. E.g.: paper figures!
From the subset of unique sequences that span the
entire region under study, how many unique
sequences are matched by each primer combination?
17. Sed: find & replace
“Are you gonna talk about vim regexes?”
“Sed regexes are weird”
My workaround: use ranges
[0-9]
[A-Z]
[a-z]
[A-Za-z]
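As a small illustration of the range workaround, here are two substitutions written with explicit ranges instead of Perl shorthands such as `\d`. The sample name is made up, and `-E` (extended regexes, accepted by both GNU and BSD sed) is assumed so that `+` and `( )` need no backslashes.

```shell
# Made-up sample name.
name='sample42_runB7'

# Mask every run of digits; [0-9] stands in for Perl's \d.
masked=$(printf '%s\n' "$name" | sed -E 's/[0-9]+/#/g')

# Save two groups and swap them: leading lowercase letters, then digits.
swapped=$(printf '%s\n' "$name" | sed -E 's/^([a-z]+)([0-9]+)/\2\1/')

echo "$masked"    # sample#_runB#
echo "$swapped"   # 42sample_runB7
```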
18. Sed: find & replace
E.g.:
“Oh noes, Americans don't know how to separate decimals!”
sed 's/\./,/g' hisfile.tab > myfile.tab
“Oh noes, this bloody file was edited in Windows!”
sed 's/\r$//' theirfile.tab > decentfile.tab
“Oh noes, CASAVA 1.6 has a slash in it!”
sed 's,/1, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
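The same kinds of substitution can be checked on one-line inputs before touching real files. This is a sketch with made-up input strings and hypothetical variable names; the carriage-return example assumes GNU sed, which understands `\r`.

```shell
# An unescaped dot means "any character", so every character gets replaced.
oops=$(printf '3.14\n' | sed 's/./,/g')

# Escaped, the dot is literal: only decimal points become commas.
commas=$(printf '3.14\t2.5\n' | sed 's/\./,/g')

# Strip the carriage return that Windows leaves at the end of each line.
unix=$(printf 'a\tb\r\n' | sed 's/\r$//')

# "," as the delimiter avoids having to escape the "/" in the pattern.
header=$(printf '@read1/1\n' | sed 's,/1, 1:N:0:NNNNNN,')

echo "$oops"     # ,,,,
echo "$commas"
echo "$unix"
echo "$header"   # @read1 1:N:0:NNNNNN
```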