2. What is regex? “Regular expressions provide a concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.” (Wikipedia: http://en.wikipedia.org/wiki/Regex) In plain English: regex is a text-searching “language.”
23. Literals Every character matches its literal self except the following: “[\^$.|?*+()”. These must be escaped to be matched as literals. Example: [string_findregexp('LDC is fun!',-find='fun')] LP8: array: (fun) L9: array(fun)
24. Literals (cont) By default, regex is case-sensitive. Use the (?i) switch to make it case-insensitive. Examples: [string_findregexp('ABC abc',-find='abc')] LP8: array: (abc) L9: array(abc) [string_findregexp('ABC abc',-find='(?i)abc')] LP8: array: (ABC), (abc) L9: array(ABC, abc)
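The case-sensitivity behavior above can be cross-checked outside Lasso. The following is a minimal sketch in Python, which is an assumption for illustration only (the slides use Lasso's string_findregexp); `re.findall` serves as a rough analogue and the variable names are hypothetical.

```python
import re

text = "ABC abc"

# Case-sensitive by default: only the lowercase run matches.
sensitive = re.findall(r"abc", text)

# The inline (?i) switch makes the whole pattern case-insensitive.
insensitive = re.findall(r"(?i)abc", text)

print(sensitive)    # ['abc']
print(insensitive)  # ['ABC', 'abc']
```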
25. Escaping Characters In regular expressions, various characters have special meaning depending on the context. To match such a character literally, you must escape it with a backslash (“\”). Because the backslash also has special meaning in Lasso strings, you must double the backslashes in Lasso (“\\”).
26. Escaping Characters (cont) Example: [string_findregexp('[date] returns the date', -find='\\[date\\]')] LP8: array: ([date]) L9: array([date]) [string_findregexp('[date] returns the date', -find='[date]')] LP8: array:(d),(a),(t),(e),(e),(t),(t),(e),(d),(a),(t),(e) L9: array(d, a, t, e, e, t, t, e, d, a, t, e)
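To see the difference between escaped and unescaped brackets in isolation, here is a small sketch in Python (an assumption; the slides use Lasso). A Python raw string keeps single backslashes, whereas a Lasso string literal needs them doubled, so the pattern text differs even though the regex is the same.

```python
import re

text = "[date] returns the date"

# Escaped brackets match the literal text "[date]".
literal = re.findall(r"\[date\]", text)

# Unescaped, [date] is a character class: any one of d, a, t, e.
char_class = re.findall(r"[date]", text)

print(literal)              # ['[date]']
print("".join(char_class))  # dateettedate
```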
27. Dot A dot (aka period symbol “.”) will match any single character except line returns. Use the switch “(?s)” to turn on matching line returns too. Example: [string_findregexp('LDC is fun! Turn on a fan.', -find='f.n')] LP8: array: (fun), (fan) L9: array(fun, fan)
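The effect of the (?s) switch is easier to see with a string that actually contains a line break. This is a hedged sketch in Python rather than Lasso; the two-line sample string is made up for illustration.

```python
import re

single_line = "LDC is fun! Turn on a fan."
dot_matches = re.findall(r"f.n", single_line)

# "f" and "n" separated only by a newline.
two_lines = "f\nn"

# Without (?s) the dot refuses to match the newline.
without_s = re.findall(r"f.n", two_lines)

# With (?s) the dot matches the newline too.
with_s = re.findall(r"(?s)f.n", two_lines)

print(dot_matches)  # ['fun', 'fan']
print(without_s)    # []
print(with_s)       # ['f\nn']
```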
29. White Space To find white space, use the Lasso equivalents: Return = \r Newline = \n Tab = \t Example: [string_findregexp('1\n2\n3',-find='\n')] LP8: array: ( ), ( ) L9: array( , )
30. Character Classes A character class matches any one character from the set listed within square brackets “[ … ]”. The order of characters within the class does not matter (i.e. [abc] == [cba]). Reserved characters inside a class are “\^-]”. Example: [string_findregexp('New Years Eve is 2009-12-31', -find='[123ae]')] LP8:array: (e), (e), (a), (e), (2), (1), (2), (3), (1) L9: array(e, e, a, e, 2, 1, 2, 3, 1)
31. Character Classes (cont) Hyphen denotes a range (e.g. “[0-9]” means 0,1,2,..,9 and [a-z] means a,b,c,...,z). Example: [string_findregexp('abcdef',-find='[b-d]')] LP8: array: (b), (c), (d) L9: array(b, c, d)
32. Character Classes (cont) A caret immediately after the opening square bracket negates the class: it matches any character that is not listed. Example: [string_findregexp('abcdef',-find='[^b-d]')] LP8: array: (a), (e), (f) L9: array(a, e, f)
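The three character-class forms (set, range, negation) can be compared side by side. The sketch below uses Python's `re` module as a stand-in for Lasso, which is an assumption; the inputs are the ones from the slides.

```python
import re

# A set: any one of the listed characters (uppercase "E" is not listed).
set_matches = re.findall(r"[123ae]", "New Years Eve is 2009-12-31")

# A range: b, c, or d.
range_matches = re.findall(r"[b-d]", "abcdef")

# A negated range: anything except b, c, or d.
negated_matches = re.findall(r"[^b-d]", "abcdef")

print("".join(set_matches))  # eeae21231
print(range_matches)         # ['b', 'c', 'd']
print(negated_matches)       # ['a', 'e', 'f']
```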
33. Shorthand Character Classes \d = [0-9] \D = [^0-9] \w ≈ [a-zA-Z0-9_] \W ≈ [^a-zA-Z0-9_] \s ≈ [ \t\r\n] \S ≈ [^ \t\r\n] Example: [string_findregexp('1a2b3c',-find='\\d')] LP8: array: (1), (2), (3) L9: array(1, 2, 3) [string_findregexp('1a2b3c',-find='\\D')] LP8: array: (a), (b), (c) L9: array(a, b, c)
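A quick way to get a feel for the shorthand classes is to run each against a small sample. This sketch assumes Python's `re` module instead of Lasso; the whitespace and word samples are invented for illustration, and Python's shorthands are Unicode-aware, so the bracket equivalents are approximate there as well.

```python
import re

text = "1a2b3c"

digits = re.findall(r"\d", text)      # digit characters
non_digits = re.findall(r"\D", text)  # everything that is not a digit

# \s picks out the space, the tab, and the newline.
spaces = re.findall(r"\s", "a b\tc\nd")

# \w+ grabs runs of letters, digits, and underscores.
words = re.findall(r"\w+", "snake_case, 42!")

print(digits)      # ['1', '2', '3']
print(non_digits)  # ['a', 'b', 'c']
print(spaces)      # [' ', '\t', '\n']
print(words)       # ['snake_case', '42']
```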
35. Positional Matching “^” matches the beginning of the text, “$” matches the end of the text, and the (?m) switch makes ^ and $ match the beginning and end of each line. Example: [string_findregexp('1\n2\n3',-find='^\\d')] LP8: array: (1) L9: array(1) [string_findregexp('1\n2\n3',-find='(?m)^\\d')] LP8: array: (1), (2), (3) L9: array(1, 2, 3)
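The anchors only show their effect on multi-line input. Here is a minimal sketch, assuming the three digits sit on separate lines and using Python's `re` module in place of Lasso (both are assumptions made for illustration).

```python
import re

text = "1\n2\n3"

# Without (?m), ^ matches only at the very start of the text.
start_only = re.findall(r"^\d", text)

# With (?m), ^ also matches at the start of every line.
each_line_start = re.findall(r"(?m)^\d", text)

# The same applies to $ at the end of the text versus each line.
end_only = re.findall(r"\d$", text)
each_line_end = re.findall(r"(?m)\d$", text)

print(start_only)       # ['1']
print(each_line_start)  # ['1', '2', '3']
print(end_only)         # ['3']
print(each_line_end)    # ['1', '2', '3']
```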
36. Positional Matching (cont) “\b” matches a word boundary (the position between a word character and a non-word character or start/end of line). Example: [string_findregexp('cape and ape',-find='\\bape')] LP8: array: (ape) L9: array(ape) [string_findregexp('cape and ape',-find='ape')] LP8: array: (ape), (ape) L9: array(ape, ape)
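Word boundaries are zero-width, so they change which matches are found without adding characters to them. A short sketch in Python (an assumption; the slides use Lasso), with one extra made-up sentence to show a boundary on both sides:

```python
import re

text = "cape and ape"

# \b before "ape" rejects the "ape" inside "cape".
bounded = re.findall(r"\bape", text)
unbounded = re.findall(r"ape", text)

# A boundary on both sides matches "ape" only as a whole word.
whole_word = re.findall(r"\bape\b", "ape apes grape ape")

print(bounded)     # ['ape']
print(unbounded)   # ['ape', 'ape']
print(whole_word)  # ['ape', 'ape']
```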
37. Alternation The vertical bar (“|”) is the OR operator for regex. Example: [string_findregexp('cat and rat',-find='cat|rat')] LP8: array: (cat), (rat) L9: array(cat, rat)
38. Quantifiers A quantifier specifies how many times the preceding item must occur: * = 0 or more + = 1 or more ? = 0 or 1 {n} = n times {n,m} = min n, max m times {n,} = min n, no max Example: [string_findregexp('123aaabbb', -find='0*1+2?3{1}a{1,2}ab{2,}')] LP8: array: (123aaabbb) L9: array(123aaabbb)
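The combined pattern above packs every quantifier into one expression, which can be hard to read. The sketch below runs it as-is and then tries a few quantifiers one at a time. It uses Python's `re` module rather than Lasso (an assumption), and the smaller sample strings are invented for illustration.

```python
import re

# The slide's pattern: 0* matches nothing, 1+ "1", 2? "2", 3{1} "3",
# a{1,2} "aa", a "a", b{2,} "bbb".
combined = re.findall(r"0*1+2?3{1}a{1,2}ab{2,}", "123aaabbb")

# ? makes the "u" optional.
optional = re.findall(r"colou?r", "color colour")

# {2,} needs at least two b's, so plain "ab" is skipped.
at_least_two = re.findall(r"ab{2,}", "ab abb abbb")

# {1,2} allows at most two a's, so "aaa" as a whole does not fit.
too_many = re.fullmatch(r"a{1,2}", "aaa")

print(combined)      # ['123aaabbb']
print(optional)      # ['color', 'colour']
print(at_least_two)  # ['abb', 'abbb']
print(too_many)      # None
```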
39. Grouping Round brackets “( )” group the regex together, allowing quantifiers to be used on the group or to perform AND/OR with regex. They also create backreferences, which we won't cover in this session, but know that Lasso returns the group match in addition to the overall match. Example: [string_findregexp('cat and rat',-find='(c|r)at')] LP8: array: (cat), (c), (rat), (r) L9: array(cat, c, rat, r)
40. Grouping (cont) There is an option for non-capturing groups: “(?: … regex here...)” Example: [string_findregexp('cat and rat',-find='(?:c|r)at')] LP8: array: (cat), (rat) L9: array(cat, rat)
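Capturing versus non-capturing groups can be compared in a few lines. This is a sketch in Python, not Lasso, and the two behave differently in one respect: Python's `re.findall` returns only the group text when a pattern has a capturing group, so `re.finditer` is used here to get the overall match and the group together, similar to what the slides show for Lasso.

```python
import re

text = "cat and rat"

# Capturing group: each match carries the whole match and the group.
captured = [(m.group(0), m.group(1)) for m in re.finditer(r"(c|r)at", text)]

# Non-capturing group: only the whole match is available.
non_capturing = re.findall(r"(?:c|r)at", text)

# For comparison: findall with a capturing group returns just the groups.
groups_only = re.findall(r"(c|r)at", text)

print(captured)       # [('cat', 'c'), ('rat', 'r')]
print(non_capturing)  # ['cat', 'rat']
print(groups_only)    # ['c', 'r']
```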
42. When using regular expressions obtained from outside sources, you'll need to double up the backslashes (“\”) for Lasso (e.g. “\d+” becomes “\\d+”).
43. User input used as part of a regular expression must be encoded first, so that any special characters it contains are matched as literals (http://tagswap.net/lp_regexp_encode)
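Why user input needs encoding is clearest with an input that happens to contain regex metacharacters. The sketch below uses Python's `re.escape` as a stand-in for the Lasso tag linked above (an assumption; the two are separate tools with the same purpose), and the sample input is made up.

```python
import re

user_input = "1+1=2?"
haystack = "Is 1+1=2? Yes."

# Used raw, "+" and "?" act as quantifiers, so the pattern looks for
# two or more 1s followed by "=", which the text does not contain.
unescaped = re.findall(user_input, haystack)

# Escaped, every character in the input is matched literally.
escaped = re.findall(re.escape(user_input), haystack)

print(unescaped)  # []
print(escaped)    # ['1+1=2?']
```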
45. Often, there are several ways to match. If one approach doesn't work, try another.