Mahzad.Zahedi@rcisp.com
May 2014
REGULAR EXPRESSIONS (REGEX)
REFERENCE
The Complete Tutorial (Temp folder)
TABLE OF CONTENTS
1. Introduction
2. Literal Characters
3. First Look at How a Regex Engine Works
4. Character Classes or Character Sets
5. The Dot Matches (Almost) Any Character
6. Start of String and End of String Anchors
7. Word Boundaries
8. Alternation with the Vertical Bar or Pipe Symbol
9. Optional Items
10. Repetition with Star and Plus
11. Use Round Brackets for Grouping
INTRODUCTION
• A regular expression (regex or regexp) is a special text string that describes a search pattern; it is used to search text and to validate data.
• A regular expression "engine" is a piece of software that can process regular expressions.
• Many software applications and programming languages support regular expressions, each with its own flavor:
.NET 1.0–4.5
Java 4–8
Perl 5.8–5.18
PCRE C library = Perl Compatible Regular Expressions
PHP
Delphi
R
JavaScript
VBScript
XRegExp
Python
Ruby
Tcl ARE
POSIX BRE
• similar to the one used by the traditional UNIX grep command
• most metacharacters require a backslash to take on their special meaning: «a\{1,2\}» matches "a" or "aa"
POSIX ERE
• similar to the one used by the UNIX egrep command
• quantifiers ?, +, {n}, {n,m} and {n,} are written without backslashes
• alternation with |
• backreferences are not part of the ERE standard, although some implementations add them
GNU BRE
GNU ERE
Oracle
XML
XPath
EditPad Pro
INTRODUCTION (CONT.)
• Advantages:
  • Reduces development time for the programmer
  • Fast execution
• Example: «reg(ular expressions?|ex(p|es)?)» matches:
  • regular expression
  • regular expressions
  • regex
  • regexp
  • regexes
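To make the example concrete, here is a minimal Python sketch that checks a handful of candidate strings against the pattern from the slide. The candidate list is an assumption chosen for illustration; `re.fullmatch` is used so that the whole string has to match.

```python
import re

# Pattern taken from the slide; fullmatch() requires the entire string to match.
PATTERN = re.compile(r"reg(ular expressions?|ex(p|es)?)")

# Illustrative candidates (not from the slide): the last one should be rejected.
candidates = [
    "regular expression",
    "regular expressions",
    "regex",
    "regexp",
    "regexes",
    "regular",
]

results = {text: PATTERN.fullmatch(text) is not None for text in candidates}

for text, ok in results.items():
    print(f"{text!r}: {'match' if ok else 'no match'}")
```

Note that "regular" on its own is not matched, because the first alternative requires the word "expression" to follow.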
INTRODUCTION (CONT.)
• Pattern matching is an essential problem
• Many applications need to "parse" input:
1) URLs
   http://first.dk/index.php?id=141&view=details
   protocol: http, host: first.dk, path: /index.php, query string (a list of key-value pairs): id=141&view=details
2) Log files
   13/02/2010 66.249.65.107 get /support.html
   20/02/2010 42.116.32.64 post /search.html
3) XML
   <article>
   <title>Three Models for the...</title>
   <author>Noam Chomsky</author>
   <year>1956</year>
   </article>
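As a rough sketch of what "parsing" such input with a regex can look like, the following Python snippet splits one of the log lines above into fields. The field names (`date`, `ip`, `method`, `path`) and the helper `parse_log_line` are hypothetical; the layout is only inferred from the two sample lines.

```python
import re

# Hypothetical field names; the layout (date, IP, method, path) is inferred
# from the two sample log lines shown on the slide.
LOG_LINE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>\w+) "
    r"(?P<path>\S+)"
)

def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.fullmatch(line.strip())
    return match.groupdict() if match else None

entry = parse_log_line("13/02/2010 66.249.65.107 get /support.html")
print(entry)
```

Named groups keep the extraction readable; a line that does not follow the assumed layout simply yields `None`.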
LITERAL CHARACTERS & SPECIAL CHARACTERS
• Literals
  • A single literal character, e.g. «a», matches the first occurrence of that character: in "Jack is a boy" it matches the "a" in "Jack"
  • A sequence of literal characters: applying «cat» to "He captured a catfish for his cat." matches the "cat" in "catfish"
• Non-printable characters
  • «\t» for the tab character (ASCII 0x09)
  • «\r» for carriage return (0x0D)
  • «\n» for line feed (0x0A)
  • …
• How the engine applies «cat» to "He captured a catfish for his cat.":
1. The engine tries to match the first regex token at the start of the string: «c» against "H" fails.
2. There is no other permutation of the regex to try, so the engine moves on to the next character in the string.
3. At the fourth character, «c» matches and the engine moves on to the next token.
4. At the sixth character, «t» fails to match "p"; the engine concludes that the attempt starting at the fourth character was unsuccessful and continues from the fifth character.
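The walk-through can be reproduced with a short Python sketch. It only uses the sentence and the regex from the slide; the variable names are illustrative.

```python
import re

text = "He captured a catfish for his cat."

# search() returns the leftmost match: the "cat" inside "catfish".
first = re.search(r"cat", text)
print(first.start(), first.group())

# finditer() resumes after each match and also finds the stand-alone "cat".
all_starts = [m.start() for m in re.finditer(r"cat", text)]
print(all_starts)
```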
LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
• Regex flavors reserve certain characters (metacharacters) for special use:
Meaning                          Char    Name
Beginning of string              ^       caret
End of string                    $       dollar sign
Any character except newline     .       dot
Match 0 or more                  *       star
Match 1 or more                  +       plus
Match 0 or 1                     ?       question mark
Alternative                      |       pipe symbol
Grouping; "store"                ( )     parentheses
Special (escape)                 \       backslash
Start of a character class       [       opening square bracket
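LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
• To use any of these characters as a literal in a regex, escape it with a backslash.
• If you forget to escape a special character, the regex takes on another meaning or becomes invalid:
  • «1\+1=2» matches "1+1=2" literally
  • «1+1=2» has another meaning: it matches "111=2" in "123+111=234"
  • «+1=2» is an error, because the plus has nothing to repeat
NOTE!
• Most regular expression flavors treat the brace «{» as a literal character, unless it is part of a repetition operator like «M{1,3}».
• An exception to this rule is java.util.regex, which requires literal braces to be escaped.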
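The three cases can be tried out in Python. This is a minimal sketch using the standard `re` module; `re.escape` is Python's helper for escaping every metacharacter in a piece of literal text, and the variable names are illustrative.

```python
import re

# Escaped plus: the regex matches the text "1+1=2" literally.
literal = re.search(r"1\+1=2", "1+1=2")

# Unescaped plus: "one or more 1s", so the match is found inside "123+111=234".
quantified = re.search(r"1+1=2", "123+111=234")

# A leading plus has nothing to repeat, so compiling the regex fails.
try:
    re.compile(r"+1=2")
    compile_error = None
except re.error as exc:
    compile_error = exc

# re.escape() builds a regex that matches the given text literally.
escaped = re.escape("1+1=2")

print(literal.group(), quantified.group(), compile_error, escaped)
```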
LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.)
• The \Q...\E escape sequence treats everything between \Q and \E as literal text
  • E.g. «\Q*\d+*\E» matches the literal text "*\d+*".
• Special characters and programming languages
  • The compiler turns each escaped backslash in the source code into a single backslash in the string that is passed on to the regex library (source code → compiler → string → regex library)
  • The regex «1\+1=2» is written as "1\\+1=2"
  • The regex «c:\\temp» is written as "c:\\\\temp"
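The double layer of escaping can be observed directly in Python, which also offers raw string literals (`r"..."`) to avoid it. The sketch below is an illustration only; note that Python's `re` module does not support \Q...\E, so `re.escape` is the usual substitute there.

```python
import re

# Ordinary string literal: each backslash must be doubled in the source code.
plus_plain = "1\\+1=2"
# Raw string literal: the source text is passed to the regex library unchanged.
plus_raw = r"1\+1=2"

# The regex c:\\temp (which matches the text c:\temp), written both ways.
path_plain = "c:\\\\temp"
path_raw = r"c:\\temp"

subject = "c:\\temp"  # the text c:\temp, with a single backslash
found = re.search(path_raw, subject)

print(plus_plain == plus_raw, path_plain == path_raw, found.group())
```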
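FIRST LOOK AT HOW A REGEX ENGINE WORKS INTERNALLY
• There are two kinds of regular expression engines:
  • text-directed (DFA)
  • regex-directed (NFA)
• awk, egrep, flex, lex and MySQL are text-directed
  • A few versions of these tools are regex-directed
• The regex-directed engine is more powerful: features such as lazy quantifiers and backreferences can only be implemented in regex-directed engines.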
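A common way to probe which kind of engine a tool uses is to apply «regex|regex not» to the string "regex not". The sketch below assumes Python's standard `re` module, which is a backtracking (regex-directed) engine; a text-directed engine would be expected to return the longer match instead.

```python
import re

# A regex-directed engine tries the alternatives from left to right and stops
# at the first one that succeeds, so it reports "regex" rather than "regex not".
probe = re.search(r"regex|regex not", "regex not").group()
print(probe)
```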
CHARACTER CLASSES OR CHARACTER SETS
• A character class matches only one out of several characters
  • «gr[ae]y» matches "gray" or "grey" (the American and the British spelling)
• Use a hyphen inside a character class to specify a range of characters
  • «[0-9a-fA-F]» matches a single hexadecimal digit
• Useful applications
  • Find a word, even if it is misspelled: «sep[ae]r[ae]te»
  • Find an identifier: «[A-Za-z_][A-Za-z_0-9]*»
  • Find a C-style hexadecimal number: «0[xX][A-Fa-f0-9]+»
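The character classes above can be exercised with a few lines of Python. The input strings other than the "grey or gray" sentence are made up for illustration.

```python
import re

# One character out of a set: both spellings are found.
colors = re.findall(r"gr[ae]y", "Is his hair grey or gray?")

# C-style hexadecimal numbers; "0xZZ" has no hex digit after the prefix.
hex_numbers = re.findall(r"0[xX][A-Fa-f0-9]+", "masks: 0x1F, 0XABCD, 0xZZ")

# Identifiers must not start with a digit, so "42" is skipped.
identifiers = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", "x1 = _tmp + 42")

print(colors, hex_numbers, identifiers)
```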
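NEGATED CHARACTER CLASSES
• «[^…]» matches any character that is not in the character class.
• «q[^u]» means "a q followed by a character that is not a u". In "Iraq is a country" it matches "q " (the q and the space after it).
• A negated character class must still match a character: «q[^u]» does not match the q at the end of the string "Iraq". Applied to "fdgha", «[^abc]» matches "f", "d", "g" and "h", but not "a".
• Unlike the dot, negated character classes also match line break characters.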
METACHARACTERS INSIDE CHARACTER CLASSES
• The only metacharacters inside a character class, shown with their escaped forms, are:
  • The closing bracket (]): «[\]]»
  • The backslash (\): «[\\]»
  • The caret (^): «[\^]»
  • The hyphen (-): «[\-]»
• All other metacharacters are literals inside a class: «[+*]» equals «[\+\*]», and the extra backslashes only reduce readability
• Alternative to escaping: place them in a position where they do not take on their special meaning
  • Closing bracket right after the opening bracket: «[]x]»
  • Caret anywhere except right after the opening bracket: «[x^]»
  • Hyphen anywhere except the middle: «[-x]» or «[x-]»
METACHARACTERS INSIDE CHARACTER CLASSES (CONT.)
• Non-printable characters can be used in character classes just like outside of character classes.
  • E.g. «[$\u20AC]» matches a dollar or euro sign
• Perl and PCRE also support the \Q...\E sequence inside character classes
  • E.g. «[\Q[-]\E]» matches "[", "-" or "]".
• POSIX regular expressions treat the backslash as a literal character inside character classes.
  • You cannot use backslashes to escape metacharacters there
  • Instead, place the metacharacters in a position where they lose their special meaning
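The positional tricks for unescaped metacharacters can be verified with Python's `re` module, which accepts them in the same way. This is a sketch only; the short test strings are invented for the example.

```python
import re

# Closing bracket right after the opening bracket is a literal "]".
bracket = re.findall(r"[]x]", "x]y")

# A caret that is not the first character of the class is a literal "^".
caret = re.findall(r"[x^]", "x^y")

# A hyphen at the start (or end) of the class is a literal "-".
hyphen = re.findall(r"[-x]", "x-y")

# The backslash itself always has to be escaped.
backslash = re.findall(r"[\\x]", "x\\y")

# Escapes for specific characters work inside classes too: dollar or euro sign.
currency = re.findall(r"[$\u20AC]", "$5 or \u20ac5")

print(bracket, caret, hyphen, backslash, currency)
```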
SHORTHAND CHARACTER CLASSES
• Shorthand classes can be used both inside and outside square brackets
• Example with the string "1 + 2 = 3":
  • «\s\d» matches a whitespace character followed by a digit: " 2"
  • «[\s\d]» matches a single character that is either whitespace or a digit: each digit and each space in "1 + 2 = 3"
Class   Meaning
\w      Word character, [a-zA-Z0-9_]
\d      Digit character, [0-9]
\s      Whitespace character, [ \t\r\n\f]
\W      Non-word character, [^a-zA-Z0-9_] = [^\w]
\D      Non-digit character, [^0-9] = [^\d]
\S      Non-whitespace character, [^ \t\r\n\f] = [^\s]
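NEGATED SHORTHAND CHARACTER CLASSES
• «[\D\S]» is not the same as «[^\d\s]».
• «[^\d\s]» matches any character that is neither a digit nor whitespace: ¬(a ∪ b)
  • In "123 5]" it matches only "]"
• «[\D\S]» matches any character that is either not a digit or not whitespace: ¬a ∪ ¬b
  • In "123 5]" it matches every character, because no character is both a digit and whitespace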
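Both the shorthand classes and the difference between the two negated forms can be checked with a small Python sketch that uses the example strings "1 + 2 = 3" and "123 5]".

```python
import re

# Sequence vs. class: whitespace followed by a digit, or either one of them.
sequence = re.findall(r"\s\d", "1 + 2 = 3")
either = re.findall(r"[\s\d]", "1 + 2 = 3")

# Neither digit nor whitespace: only the bracket is left.
neither = re.findall(r"[^\d\s]", "123 5]")

# Not a digit OR not whitespace: every character qualifies.
not_both = re.findall(r"[\D\S]", "123 5]")

print(sequence, either, neither, not_both)
```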
REPEATING CHARACTER CLASSES
• Repeat a character class with the «?», «*» or «+» operators
  • «[0-9]+» matches "833337", "222", …
• To repeat the matched character, rather than the class, use a backreference
  • «([0-9])\1+»
  • matches "222"
  • matches "3333" in "833337"
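A short Python sketch of the difference between repeating the class and repeating the matched character, using the strings from the slide:

```python
import re

# Repeating the class: any run of digits.
any_digits = re.search(r"[0-9]+", "833337").group()

# Repeating the matched character: \1 must equal what group 1 captured.
same_digit = re.search(r"([0-9])\1+", "833337").group()
triple = re.fullmatch(r"([0-9])\1+", "222")

print(any_digits, same_digit, triple.group())
```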
LOOKING INSIDE THE REGEX ENGINE
• The order of the characters inside a character class does not matter
• Example: «gr[ae]y» applied to "Is his hair grey or gray?"
1. «g» fails to match at each of the first 12 characters
2. «g» matches "g" at the 13th character
3. The «r» token in the regex matches "r" in the text
4. «[ae]» first tries "a", which fails to match "e"
5. The engine tries the other permutation of the character class, and "e" matches
6. The last regex token, «y», matches "y" in the text
7. The leftmost match is returned: "grey"
THE DOT MATCHES (ALMOST) ANY CHARACTER
• The dot is the most commonly misused metacharacter.
• By default the dot does not match a newline character, because the first regex tools worked line by line. It is short for:
  • «[^\n]» (UNIX regex flavors)
  • «[^\r\n]» (Windows regex flavors)
• In Perl, the mode where the dot also matches newlines is called "single-line mode"
• In the .NET framework: Regex.Match("string", "regex", RegexOptions.Singleline)
• VBScript and older JavaScript engines have no option to make the dot match line break characters; use «[\s\S]» instead (JavaScript gained the s flag in ECMAScript 2018)
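In Python the corresponding option is `re.DOTALL`. The sketch below compares the default behaviour, the option, and the portable «[\s\S]» workaround; the two-line test string is an assumption made for the example.

```python
import re

text = "line one\nline two"

# By default the dot refuses to cross the line break.
default = re.search(r"one.line", text)

# With DOTALL (Perl's "single-line mode") the dot matches the newline too.
dotall = re.search(r"one.line", text, re.DOTALL)

# Portable workaround without any option: whitespace or non-whitespace.
portable = re.search(r"one[\s\S]line", text)

print(default, dotall.group() == portable.group())
```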
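USE THE DOT SPARINGLY
• The dot is a very powerful regex metacharacter
• It allows you to be lazy. Example: matching a date in mm/dd/yy format
• Solutions, from sloppy to strict:
  • «\d\d.\d\d.\d\d» also matches "02512703"
  • «\d\d[- /.]\d\d[- /.]\d\d» still matches "99/99/99"
  • «(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d» restricts months and days, but still matches "09/31/2079"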
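The three date patterns can be compared side by side. This is a minimal Python sketch; the names `sloppy`, `better` and `strict` are illustrative, and the last test string "13/01/2014" is an added example of a month the strict pattern rejects.

```python
import re

sloppy = re.compile(r"\d\d.\d\d.\d\d")
better = re.compile(r"\d\d[- /.]\d\d[- /.]\d\d")
strict = re.compile(r"(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d")

# The dot accepts digits as "separators", so a plain number passes.
sloppy_accepts_number = sloppy.fullmatch("02512703") is not None
better_rejects_number = better.fullmatch("02512703") is None

# Real separators, but nonsensical month and day values still pass.
better_accepts_nonsense = better.fullmatch("99/99/99") is not None

# Month and day ranges are checked, yet September 31 is still accepted.
strict_accepts_sep_31 = strict.fullmatch("09/31/2079") is not None
strict_rejects_month_13 = strict.fullmatch("13/01/2014") is None

print(sloppy_accepts_number, better_accepts_nonsense, strict_accepts_sep_31)
```

Checking that a date actually exists on the calendar is better left to date-handling code than to the regex.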
USE NEGATED CHARACTER SETS INSTEAD OF THE DOT
• The star is greedy
• Example: we have a problem with "string one" and "string two"
  • Regex «".*"» matches too much: "string one" and "string two" as a single match
  • Regex «"[^"\r\n]*"» matches "string one" and "string two" as two separate matches
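A quick Python sketch of the quoted-string example shows the difference between the greedy dot and the negated character class:

```python
import re

text = 'we have a problem with "string one" and "string two"'

# Greedy dot-star runs to the last quote in the line: one oversized match.
greedy = re.findall(r'".*"', text)

# The negated class cannot run past a quote: two separate matches.
negated = re.findall(r'"[^"\r\n]*"', text)

print(greedy)
print(negated)
```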
START OF STRING AND END OF STRING ANCHORS
• Literals and character classes match a character
• Anchors do not match any character at all. Instead, they match a position
  • Caret: «^a» matches the "a" in "abc"
  • Dollar sign: «c$» matches the "c" in "abc"
• Useful application
  • Anchors are very important when validating user input
  • In Perl, if ($input =~ m/\d+/) accepts "qsdf4ghjk" because the string contains a digit; «^\d+$» rejects "qsdf4ghjk" and accepts only all-digit input such as "44467"
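The validation pitfall looks like this in Python. The helper names `contains_digits` and `is_all_digits` are hypothetical; `re.fullmatch` is used as the anchored check because Python's «$» also matches just before a trailing newline.

```python
import re

def contains_digits(value):
    """Unanchored check: True as soon as any digit appears somewhere."""
    return re.search(r"\d+", value) is not None

def is_all_digits(value):
    """Anchored check: the whole input must consist of digits."""
    return re.fullmatch(r"\d+", value) is not None

for candidate in ["qsdf4ghjk", "44467"]:
    print(candidate, contains_digits(candidate), is_all_digits(candidate))
```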
USING ^ AND $ AS START OF LINE AND END OF LINE ANCHORS
• A string can consist of multiple lines, e.g. "first line\nsecond line"
• In tools such as EditPad Pro, which work with entire files, «^» and «$» match at the start and end of each line
• In programming languages, this behavior usually has to be switched on explicitly
  • Perl calls it "multi-line mode": m/^regex$/m
PERMANENT START OF STRING AND END OF STRING ANCHORS
• «\A» only ever matches at the start of the string (the whole file in a text editor)
• «\Z» only ever matches at the end of the string (in many flavors also before a final line break)
• Anchors match at a position, rather than matching a character
• Anchors can result in a zero-length match.
  • Since the match does not include any characters, nothing is deleted in a replacement
  • In VB.NET, this prefixes every line with "> ":
    Dim Quoted As String = Regex.Replace(Original, "^", "> ", RegexOptions.Multiline)
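For comparison, here is a Python sketch of multi-line mode, the permanent anchor «\A», and the zero-length replacement from the VB.NET example. The two-line sample text is the one used on the earlier slide; the variable names are illustrative.

```python
import re

original = "first line\nsecond line"

# Without MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", original)

# With MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^\w+", original, flags=re.MULTILINE)

# \A is not affected by MULTILINE: it stays tied to the start of the string.
starts_permanent = re.findall(r"\A\w+", original, flags=re.MULTILINE)

# Zero-length match at each line start: text is inserted, nothing is deleted.
quoted = re.sub(r"^", "> ", original, flags=re.MULTILINE)
print(quoted)
```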
WORD BOUNDARIES
• The metacharacter «\b» is an anchor like «^» and «$»
• The match is zero-length.
• Simply put: «\b» allows you to perform a "whole words only" search
  • «\b4\b» matches a "4" that stands on its own, but not the 4s in "44" or "a4"
• «\b» matches at these positions:
  • Before the first character and after the last character of the string, if that character is a word character
  • Between a word character and a non-word character
LOOKING INSIDE THE REGEX ENGINE
• Example: «\bis\b» applied to the string "This island is beautiful".
  • «\b» matches at the position before "T"
  • The engine moves on to the next token: the literal «i»
  • The engine does not advance to the next character in the string, because the previous regex token was zero-length; «i» does not match "T"
  • The engine restarts at the next position, but «\b» cannot match between the "T" and the "h"
  • …
• The POSIX standard does not define word boundaries.
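Both word-boundary examples can be run in Python. In this sketch the test string "4 44 a4" is assembled from the three cases named on the slide.

```python
import re

# Only the stand-alone "4" is surrounded by word boundaries.
fours = [m.start() for m in re.finditer(r"\b4\b", "4 44 a4")]

# "is" inside "This" and "island" is skipped; the separate word is found.
whole_word = re.search(r"\bis\b", "This island is beautiful")

print(fours, whole_word.start(), whole_word.group())
```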
ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL
• Alternation is similar to a character class, but where a character class matches a single character out of several, alternation matches a single regex out of several
• Remember that the regex engine is eager
  • It stops searching as soon as it finds a valid match.
• Regex: «Get|GetValue|Set|SetValue»
• String: "SetValue"
1. The first token, «G», is tried against the first character and fails.
2. The next alternative is tried and fails as well.
3. In the next alternative, the token «S» matches "S" in the string, and matching continues successfully up to «t».
4. Because the regex engine is eager, "Set" is returned.
• What are the solutions?
ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL (CONT.)
• Solutions:
  • Change the order of the options
    «GetValue|Get|SetValue|Set»
  • Use the greediness of the question mark
    «Get(Value)?|Set(Value)?»
  • Use word boundaries
    «\b(Get|GetValue|Set|SetValue)\b»
• The POSIX standard mandates that the longest match be returned, regardless of whether the regex engine uses an NFA or a DFA algorithm.
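The eager behaviour and the three fixes can be reproduced with Python's `re` module, which is a regex-directed engine. The variable names below are illustrative.

```python
import re

subject = "SetValue"

# Eager engine: "Set" is listed before "SetValue", so the shorter one wins.
eager = re.search(r"Get|GetValue|Set|SetValue", subject).group()

# Fix 1: list the longer alternatives first.
reordered = re.search(r"GetValue|Get|SetValue|Set", subject).group()

# Fix 2: make the suffix optional; the greedy ? tries to match it first.
optional = re.search(r"Get(Value)?|Set(Value)?", subject).group()

# Fix 3: word boundaries force the engine to keep trying until a whole word fits.
bounded = re.search(r"\b(Get|GetValue|Set|SetValue)\b", subject).group()

print(eager, reordered, optional, bounded)
```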
OPTIONAL ITEMS
• «?» makes the preceding token in the regular expression optional.
• You can make several tokens optional by grouping them together using round brackets
  • «Feb(ruary)? 23(rd)?»
  • matches "February 23rd", "February 23", "Feb 23rd" and "Feb 23".
• Important regex concept: greediness
  • The question mark is greedy: the engine always tries to match the optional part first. Only if this causes the entire regular expression to fail will it try ignoring the part the question mark applies to.
LOOKING INSIDE THE REGEX ENGINE
• Example: «colou?r» applied to "The colonel likes the color green".
1. «c» matches at the 5th character, and «o», «l» and «o» match the characters that follow
2. The engine checks whether «u» matches "n", which fails
  • Because of the question mark, failing is acceptable.
3. The next token, «r», fails to match "n".
4. The engine starts again, trying to match «c» at the first "o" in "colonel".
5. …
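A small Python sketch of both optional-item examples; the extra string "colour" is added to show that the optional «u» is used when it is present.

```python
import re

# Optional groups: all four spellings of the date are accepted.
date_pattern = re.compile(r"Feb(ruary)? 23(rd)?")
dates = ["February 23rd", "February 23", "Feb 23rd", "Feb 23"]
all_dates_match = all(date_pattern.fullmatch(d) for d in dates)

# "colonel" is abandoned after the failed attempt; "color" is the match.
color = re.search(r"colou?r", "The colonel likes the color green")

# Greedy question mark: the optional "u" is matched when it is there.
colour = re.search(r"colou?r", "colour").group()

print(all_dates_match, color.start(), color.group(), colour)
```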
REPETITION WITH STAR AND PLUS
• Matching a valid HTML tag
  • «<[A-Za-z][A-Za-z0-9]*>» requires the tag name to start with a letter
  • «<[A-Za-z0-9]+>» is too loose: it also matches the invalid tag "<1>"
Quantifier   Meaning
*            0 or more
+            1 or more
?            0 or 1
{3}          Exactly 3
{3,}         3 or more
{3,5}        3, 4 or 5
• Add a «?» to a quantifier to make it lazy ("ungreedy").
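The two tag patterns and a counted quantifier can be compared in Python. This is a sketch; the tag list and the runs of "a" are made-up inputs.

```python
import re

strict = re.compile(r"<[A-Za-z][A-Za-z0-9]*>")
loose = re.compile(r"<[A-Za-z0-9]+>")

tags = ["<EM>", "<h1>", "<1>"]
strict_ok = [t for t in tags if strict.fullmatch(t)]
loose_ok = [t for t in tags if loose.fullmatch(t)]

# {3,5} accepts runs of 3, 4 or 5 characters and nothing else.
runs = ["aa", "aaa", "aaaaa", "aaaaaa"]
counted = [r for r in runs if re.fullmatch(r"a{3,5}", r)]

print(strict_ok, loose_ok, counted)
```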
WATCH OUT FOR THE GREEDINESS!
• Example: matching an HTML tag
  • String: "This is a <EM>first</EM> test"
  • Regex: «<.+>»
1. The first token in the regex is «<», which matches the first "<" in the string.
2. The next token is the dot, which matches any character except newlines.
3. The dot is repeated by the plus. The plus is greedy, so the dot keeps matching up to the end of the string.
4. The dot fails when the engine has reached the void after the end of the string.
5. The engine continues with the next token, «>», which cannot match there.
6. The engine remembers that the plus has repeated the dot more often than is required, so it backtracks.
7. The dot-plus is reduced to "EM>first</EM> tes"; the next token in the regex is still «>», which fails to match "t", so the engine backtracks further.
8. It continues until «>» matches; because the engine is eager, it returns this first valid match: "<EM>first</EM>".
LAZINESS INSTEAD OF GREEDINESS
• Lazy quantifiers are sometimes also called "ungreedy"
  • String: "This is a <EM>first</EM> test"
  • Regex: «<.+?>»
1. «<» matches the first "<" in the string.
2. The next token is the dot, this time repeated by a lazy plus.
  • This tells the regex engine to repeat the dot as few times as possible (minimum 1)
  • The dot matches "E"
3. «>» is tried against "M" and fails.
  • This time, backtracking forces the lazy plus to expand rather than shrink: the dot also matches "M", and «>» then matches ">"
4. The match "<EM>" is returned; continuing after it, the next match is "</EM>".
LOOKING INSIDE THE REGEX ENGINE
• Regex: «<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>»
• String: "Testing <B><I>bold italic</I></B> text"
1. «<» matches at the first "<".
2. «[A-Z]» matches "B"; the engine advances to «[A-Z0-9]*» and to ">" in the string.
3. This match fails. However, because of the star, that is perfectly fine.
4. The round brackets store what was matched inside them: "B" is stored in the first backreference.
5. The regex advances to «[^>]*» while the string remains at ">"; this match fails too, which is again fine because of the star (as in step 3).
6. «>» matches ">".
7. The next token is a dot, repeated by a lazy star.
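The outcome of this walk-through can be checked in Python, which supports the same lazy star and «\1» backreference syntax. A minimal sketch with the regex and string from the slide:

```python
import re

pattern = re.compile(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>")
text = "Testing <B><I>bold italic</I></B> text"

tag = pattern.search(text)

# Group 1 holds the tag name; \1 makes the closing tag repeat exactly that name,
# so the lazy star expands past "</I>" until it reaches "</B>".
print(tag.group(1), tag.group())
```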
AN ALTERNATIVE TO LAZINESS
• Instead of making the plus lazy and relying on backtracking, use a greedy plus and a negated character class
  • String: "<EM>first</EM>"
  • Regex: «<[^>]+>»
• Backtracking slows down the regex engine
  • With the negated character class the engine does not need to backtrack for each character, which can save plenty of CPU cycles
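The greedy, lazy and negated-class variants can be put next to each other in a short Python sketch, using the sentence from the preceding slides:

```python
import re

text = "This is a <EM>first</EM> test"

# Greedy plus: runs to the end, then backtracks to the last ">".
greedy = re.findall(r"<.+>", text)

# Lazy plus: expands one character at a time until ">" fits.
lazy = re.findall(r"<.+?>", text)

# Negated class: greedy, but it can never run past the first ">".
negated = re.findall(r"<[^>]+>", text)

print(greedy, lazy, negated)
```

The lazy and the negated-class versions return the same matches here; they differ in how much work the engine does to find them.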
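REPEATING THE \Q...\E ESCAPE SEQUENCE
• «\Q*\d+*\E+» applied to "*\d+**\d+*"
  • In Perl, only the last asterisk is repeated: the match is "*\d+**"
  • In Java, the whole sequence is repeated: the match is "*\d+**\d+*"
• If you want Java to return the same match as Perl
  • «\Q*\d+\E\*+»
• If you want Perl to repeat the whole sequence like Java does
  • «(\Q*\d+*\E)+»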
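USE ROUND BRACKETS FOR GROUPING
• Round brackets group part of the regular expression together, so that a regex operator can be applied to the whole group
• Round brackets also create a backreference
  • A backreference reuses part of the regex match
  • Creating it slows down the regex engine
  • If you do not need the backreference, optimize the regular expression with a non-capturing group: «Set(?:Value)?»
• How to use backreferences
  • «([abc])+5\1$»: \1 holds the last character the group captured (a, b or c), so the regex matches "abbc5c" but not "abc5abc"
  • «<([a-z]*)>.*</\1>» matches "<div>hello</div>"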
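REPETITION AND BACKREFERENCES
• Apply «([abc]+)» and «([abc])+» to the string "cab": both match "cab", but they store different backreferences
  • «([abc]+)»: "cab" is stored
  • «([abc])+»: "b" is stored, because each repetition of the group overwrites the previous one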
USE ROUND BRACKETS FOR GROUPING (CONT.)
• The same backreference can be reused more than once.
  • «([a-c])x\1x\1» matches "axaxa", "bxbxb" and "cxcxc"
• A backreference cannot be used inside the group it refers to.
  • «([abc]\1)» does not work
• Round brackets and backreferences cannot be used inside character classes, where they are not metacharacters.
  • In «(a)[\1b]», the \1 is not a backreference
• Useful example: checking for doubled words
  • «\b(\w+)\s+\1\b»
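The backreference rules from the grouping slides can be tried out with Python's `re` module. This is a sketch; the sentence with the doubled word is invented for the example.

```python
import re

# Group around the quantifier vs. quantifier around the group.
whole = re.search(r"([abc]+)", "cab").group(1)
last = re.search(r"([abc])+", "cab").group(1)

# The same backreference used twice.
triple = re.fullmatch(r"([a-c])x\1x\1", "bxbxb") is not None
mixed = re.fullmatch(r"([a-c])x\1x\1", "axbxa") is not None

# The closing tag must repeat whatever the group captured.
div = re.fullmatch(r"<([a-z]*)>.*</\1>", "<div>hello</div>").group(1)

# Doubled words: the same word twice, separated by whitespace.
doubled = re.search(r"\b(\w+)\s+\1\b", "This is is a test").group()

print(whole, last, triple, mixed, div, doubled)
```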
POSIX CLASSES
POSIX       ASCII                                    Description
[:alnum:]   [A-Za-z0-9]                              All alphanumeric characters
[:alpha:]   [A-Za-z]                                 Uppercase and lowercase letters
[:blank:]   [ \t]                                    Space and tab
[:digit:]   [0-9]                                    Digits
[:punct:]   [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]    Punctuation marks
EXAMPLE 1
• Example 1. Beginning of line (^)
  • grep "^Nov 10" messages.1
• Example 2. End of line ($)
  • grep "terminating.$" messages
• Sample log lines:
Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s
Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0)
Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13
Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.
Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating
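For readers without grep at hand, the same two filters can be sketched in Python. This is an approximation of what grep does (selecting the lines that contain a match), not a replacement for it; the log lines are the samples from the slide, with the second one written as LOCAL(0).

```python
import re

log_lines = [
    "Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s",
    "Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0)",
    "Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13",
    "Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.",
    "Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating",
]

# Like grep "^Nov 10": keep the lines that start with "Nov 10".
nov_10 = [line for line in log_lines if re.search(r"^Nov 10", line)]

# Like grep "terminating.$": "terminating", any one character, end of line.
terminated = [line for line in log_lines if re.search(r"terminating.$", line)]

print(len(nov_10), len(terminated))
```

The last sample line has nothing after "terminating", so the unescaped dot has no character to match and that line is not selected.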
EXAMPLE 2
• Example 3. Quantifiers (*), (+), (?)
  • «[hc]*at» (0 or more) matches "cchat", "hcat", "hhhat" and "at"
  • «[hc]+at» (1 or more) matches "ccchat" and "hcat", but not "at"
  • «[hc]?at» (0 or 1) matches "hat", "cat" and "at"
• Example 4. Escaping the special character (\)
  • grep "127\.0\.0\.1" /var/log/messages.4
Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo, 127.0.0.1#123 Enabled
EXAMPLE 3
• Example 5. Excluding specific characters
a) «[^b]og»
  • Matches the text "hog"
  • Matches the text "dog"
  • Skips the text "bog"
b) «[^abc]» matches a character other than a, b or c
  • "abccc": no match
  • "adb": matches "d"
  • "gh": matches "g" and "h"
EXAMPLE 4
• Example 6. Composite syntax
• The log entry is as follows:
  Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK LOGIN: Client "192.168.1.1"
• Regex:
  ^\w+\s\w+\s\d+ \S+ \d+ \[pid \d+\]\s\[(\w+)\] OK LOGIN: Client "(\d+\.\d+\.\d+\.\d+)"$
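As a rough check of the composite pattern, the following Python sketch applies it to the log entry, written with straight double quotes and single spaces between fields (an assumption about the original line). The variable names `user` and `client_ip` are illustrative labels for the two capturing groups.

```python
import re

LOGIN = re.compile(
    r'^\w+\s\w+\s\d+ \S+ \d+ \[pid \d+\]\s\[(\w+)\] OK LOGIN: '
    r'Client "(\d+\.\d+\.\d+\.\d+)"$'
)

line = 'Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK LOGIN: Client "192.168.1.1"'

match = LOGIN.search(line)
# Group 1 is the name in the second pair of brackets, group 2 the client address.
user, client_ip = match.groups()
print(user, client_ip)
```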
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 

Kürzlich hochgeladen (20)

1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 

Regular Expression

  • 3. TABLE OF CONTENTS: 1. Introduction; 2. Literal Characters; 3. First Look at How a Regex Engine Works; 4. Character Classes or Character Sets; 5. The Dot Matches (Almost) Any Character; 6. Start of String and End of String Anchors; 7. Word Boundaries; 8. Alternation with the Vertical Bar or Pipe Symbol; 9. Optional Items; 10. Repetition with Star and Plus; 11. Use Round Brackets for Grouping
  • 4. INTRODUCTION A regular expression (regex or regexp) is a special text string that describes a search pattern; it is used to search text and to validate data. A regular expression “engine” is a piece of software that can process regular expressions. Many software applications and programming languages support regular expressions.
  • 5. Regex flavors: .NET 1.0–4.5; Java 4–8; Perl 5.8–5.18; PCRE C library (Perl Compatible Regular Expressions); PHP; Delphi; R; JavaScript; VBScript; XRegExp; Python; Ruby; Tcl ARE; GNU BRE; GNU ERE; Oracle; XML; XPath; EditPrO
      POSIX BRE: similar to the flavor used by the traditional UNIX grep command; most metacharacters require a backslash to give the metacharacter its special meaning, e.g. «a\{1,2\}» matches „a” or „aa”
      POSIX ERE: similar to the flavor used by the UNIX egrep command; quantifiers ?, +, {n}, {n,m} and {n,}; backreferences; alternation
  • 6. INTRODUCTION (CONT.) Advantages: reduces development time for the programmer; executes fast. Example: «reg(ular expressions?|ex(p|es)?)» matches „regular expressions”, „regular expression”, „regex”, „regexp” and „regexes”.
  • 7. INTRODUCTION (CONT.) Pattern matching is an essential problem: many applications need to "parse" an input.
      1) URLs: http://first.dk/index.php?id=141&view=details (protocol, host, path, and a query string that is a list of key-value pairs)
      2) Log files: 13/02/2010 66.249.65.107 get /support.html; 20/02/2010 42.116.32.64 post /search.html
      3) XML: <article> <title>Three Models for the...</title> <author>Noam Chomsky</author> <year>1956</year> </article>
  • 8. LITERAL CHARACTERS & SPECIAL CHARACTERS Literals: a single literal character, e.g. «a», matches the first “a” in “Jack is a boy”. A sequence of literal characters: apply «cat» to “He captured a catfish for his cat.” Non-printable characters: «\t» for a tab character (ASCII 0x09), «\r» for carriage return (0x0D), «\n» for line feed (0x0A), …
      1. The engine tries to match the first regex token against the string: «c» against “H”, which fails.
      2. There is no other permutation of the regex to try, so the engine moves to the next character in the string.
      3. At the fourth character the match succeeds, and the engine moves on to the next token.
      4. The match fails at the sixth character; the engine concludes that the attempt starting at the fourth character was unsuccessful and continues from the fifth character.
  • 9. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.) Certain characters are reserved for special use:
      ^    caret: beginning of string
      $    dollar sign: end of string
      .    dot: any character except newline
      *    star: match 0 or more
      +    plus: match 1 or more
      ?    question mark: match 0 or 1
      |    pipe symbol: alternative
      ( )  parentheses: grouping; "store"
      \    backslash: special
      [    opening square bracket
  • 10. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.) To use any of these characters as a literal in a regex, escape it with a backslash: «1\+1=2» matches the literal text „1+1=2”. If you forget to escape a special character, it keeps its other meaning: «1+1=2» matches „111=2” in “123+111=234”, and «+1=2» is an error. NOTE: most regular expression flavors treat the brace «{» as a literal character unless it is part of a repetition operator like «M{1,3}»; java.util.regex is an exception to this rule.
  • 11. LITERAL CHARACTERS & SPECIAL CHARACTERS (CONT.) The \Q...\E escape sequence: e.g. «\Q*\d+*\E» matches the literal text „*\d+*”. Special characters and programming languages: the compiler turns each escaped backslash in the source code into a single backslash in the string that is passed on to the regex library (source code → compiler → regex library). The regex «1\+1=2» is written as "1\\+1=2"; the regex «c:\\temp» is written as "c:\\\\temp".
  • 12. FIRST LOOK AT HOW A REGEX ENGINE WORKS INTERNALLY There are two kinds of regular expression engines: text-directed (DFA) and regex-directed (NFA). awk, egrep, flex, lex and MySQL are text-directed; a few versions of these tools are regex-directed. Regex-directed engines are more powerful: features such as lazy quantifiers and backreferences require them.
  • 14. CHARACTER CLASSES OR CHARACTER SETS A character class matches only one out of several characters: «gr[ae]y» matches „gray” or „grey” (covering both American and British English). Use a hyphen inside a character class to specify a range of characters: «[0-9a-fA-F]». Useful applications: find a word even if it is misspelled, «sep[ae]r[ae]te»; find an identifier, «[A-Za-z_][A-Za-z_0-9]*»; find a C-style hexadecimal number, «0[xX][A-Fa-f0-9]+».
  • 15. NEGATED CHARACTER CLASSES «[^…]» matches any character that is not in the character class. «q[^u]» means “a q followed by a character that is not a u”: it matches the „q” and the following space in “Iraq is a country”. A negated character class still must match a character, so «q[^u]» does not match the q at the end of “Iraq”. «[^abc]» matches „f”, „d”, „g” and „h” in “fdgha”, but not the “a”. Unlike the dot, negated character classes also match line break characters.
  • 16. METACHARACTERS INSIDE CHARACTER CLASSES The only metacharacters inside a character class are the closing bracket «]», the backslash «\», the caret «^» and the hyphen «-». They can be escaped with a backslash: «[\]]», «[\\]», «[\^]», «[\-]». Other metacharacters need no escape inside a class: «[+*]» equals «[\+\*]», and the extra backslashes only reduce readability. Another solution is to place the metacharacters where they do not take on their special meaning: the closing bracket right after the opening bracket, «[]x]»; the caret anywhere except right after the opening bracket, «[x^]»; the hyphen anywhere except in the middle, «[-x]».
  • 17. METACHARACTERS INSIDE CHARACTER CLASSES (CONT.) All non-printable characters can be used in character classes just like outside of character classes, e.g. «[$\u20AC]» matches a dollar or euro sign. Perl and PCRE also support the \Q...\E sequence inside character classes, e.g. «[\Q[-]\E]» matches „[”, „-” or „]”. POSIX regular expressions treat the backslash as a literal character inside character classes, so backslashes cannot be used to escape there; place the metacharacters in a position where they are literal instead.
  • 18. SHORTHAND CHARACTER CLASSES Shorthand classes can be used both inside and outside square brackets. Example with the string “1 + 2 = 3”: «\s\d» matches whitespace followed by a digit („ 2”); «[\s\d]» matches a single character that is whitespace or a digit (the digits and spaces of “1 + 2 = 3”).
      \w  word character, [a-zA-Z0-9_]
      \d  digit character, [0-9]
      \s  whitespace character, [ \t\r\n\f]
      \W  non-word character, [^a-zA-Z0-9_] = [^\w]
      \D  non-digit character, [^0-9] = [^\d]
      \S  non-whitespace character, [^ \t\r\n\f] = [^\s]
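  • 19. NEGATED SHORTHAND CHARACTER CLASSES «[\D\S]» is not the same as «[^\d\s]». «[^\d\s]» matches any character that is neither a digit nor whitespace, i.e. ¬(a ∪ b); in “123 5]” it matches only „]”. «[\D\S]» matches any character that is either not a digit or not whitespace, i.e. ¬a ∪ ¬b; since no character is both a digit and whitespace, it matches every character of “123 5]”.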
  • 20. REPEATING CHARACTER CLASSES A character class can be repeated with the «?», «*» or «+» operators: «[0-9]+» matches „833337”, „222”, … This repeats the class, not the character it matched. To repeat the matched character we need backreferences: «([0-9])\1+» matches „222”, and matches „3333” in “833337”.
  • 21. LOOKING INSIDE THE REGEX ENGINE The order of the characters inside a character class does not matter. Example: «gr[ae]y» applied to “Is his hair grey or gray?” 1. «g» fails to match at each of the first 12 characters. 2. «g» matches „g” at the 13th character. 3. The «r» token in the regex matches “r” in the text. 4. The option «a» of the character class fails to match “e”. 5. The engine tries the other option of the class, «e», which matches. 6. The last regex token «y» matches “y” in the text. 7. The leftmost match is returned: „grey”.
  • 23. THE DOT MATCHES (ALMOST) ANY CHARACTER The dot is the most commonly misused metacharacter. By default the dot does not match a newline character (why?): it is equivalent to «[^\n]» in UNIX regex flavors and to «[^\r\n]» in Windows regex flavors. In Perl, the mode where the dot also matches newlines is called "single-line mode". In the .NET framework: Regex.Match("string", "regex", RegexOptions.Singleline). JavaScript and VBScript do not have an option to make the dot match line break characters; use «[\s\S]» instead.
  • 24. USE THE DOT SPARINGLY The dot is a very powerful regex metacharacter, and it allows you to be lazy. Example: matching a date in mm/dd/yy format.
      «\d\d.\d\d.\d\d» also matches „02512703”
      «\d\d[- /.]\d\d[- /.]\d\d» still matches „99/99/99”
      «(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d» is stricter, but still matches „09/31/2079”
  • 25. USE NEGATED CHARACTER SETS INSTEAD OF THE DOT The star is greedy. Example text: we have a problem with "string one" and "string two". The regex «".*"» matches „"string one" and "string two"” as a single match. The regex «"[^"\r\n]*"» matches „"string one"” and „"string two"” separately.
  • 27. START OF STRING AND END OF STRING ANCHORS Literals and character classes match a character. Anchors do not match any character at all; instead, they match a position. Caret «^»: «^a» matches „a” in “abc”. Dollar sign «$»: «c$» matches „c” in “abc”. Useful application: anchors are very important when validating user input. The Perl test if ($input =~ m/\d+/) accepts “qsdf4ghjk” because the string merely contains a digit; «^\d+$» rejects “qsdf4ghjk” and accepts only all-digit input such as “44467”.
  • 28. USING ^ AND $ AS START OF LINE AND END OF LINE ANCHORS For a string consisting of multiple lines, e.g. “first line\nsecond line”, the anchors can be made to match at the start and end of each line. Tools such as EditPad Pro, which work with entire files, do this. In programming languages it is an option; in Perl it is called "multi-line mode": m/^regex$/m.
  • 29. PERMANENT START OF STRING AND END OF STRING ANCHORS «\A» only ever matches at the start of the string (or file); «\Z» only ever matches at the end of the string (or file). Anchors match at a position rather than matching a character, so they can result in a zero-length match. Since such a match does not include any characters, nothing is deleted in a replacement. In VB.NET: Dim Quoted as String = Regex.Replace(Original, "^", "> ", RegexOptions.Multiline)
  • 31. WORD BOUNDARIES The metacharacter «\b» is an anchor like «^» and «$»; its match is zero-length. Simply put, «\b» allows you to perform a “whole words only” search: «\b4\b» matches a “4” that stands alone, but not the 4s in “44” or “a4”. It matches at these positions: before the first and after the last character of the string, if that character is a word character; and between a word character and a non-word character.
  • 32. LOOKING INSIDE THE REGEX ENGINE Example: «\bis\b» applied to the string “This island is beautiful”. «\b» matches at the position before “T”. The engine then tries the next token, the literal «i». It does not advance to the next character in the string, because the previous regex token was zero-length; «i» does not match “T”. «\b» cannot match at the position between the “T” and the “h”. … POSIX does not support word boundaries at all.
  • 34. ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL Just as a character class matches one out of several characters, alternation matches one out of several regular expressions. Remember that the regex engine is eager: it stops searching as soon as it finds a valid match. Regex: «Get|GetValue|Set|SetValue»; string: “SetValue”. What are the solutions?
      1. The first token «G» is tried against the first character and fails.
      2. The next alternative is tried and fails as well.
      3. The next token «S» matches “S” in the string, and matching continues successfully through «t».
      4. Because the regex engine is eager, „Set” is returned.
  • 35. ALTERNATION WITH THE VERTICAL BAR OR PIPE SYMBOL (CONT.) Solutions: change the order of the options, «GetValue|Get|SetValue|Set»; use the greediness of the question mark, «Get(Value)?|Set(Value)?»; use word boundaries, «\b(Get|GetValue|Set|SetValue)\b». The POSIX standard mandates that the longest match be returned, regardless of whether the regex engine uses an NFA or a DFA algorithm.
  • 37. OPTIONAL ITEMS «?» makes the preceding token in the regular expression optional. You can make several tokens optional by grouping them together using round brackets: «Feb(ruary)? 23(rd)?» matches „February 23rd”, „February 23”, „Feb 23rd” and „Feb 23”. Important regex concept: greediness. The engine always tries to match the optional part first. Only if this causes the entire regular expression to fail will it try ignoring the part the question mark applies to.
  • 38. LOOKING INSIDE THE REGEX ENGINE Example: «colou?r» applied to “The colonel likes the color green”. 1. «c» first matches at the 5th character, and the tokens up to the second «o» match “colo”. 2. The engine checks whether «u» matches “n”, which fails; because of the question mark, failing is acceptable. 3. The next token, «r», fails to match “n”. 4. The engine starts again, trying to match «c» to the first “o” in “colonel”. 5. …
  • 40. REPETITION WITH STAR AND PLUS Matching a valid HTML tag: «<[A-Za-z][A-Za-z0-9]*>». The simpler «<[A-Za-z0-9]+>» would also match the invalid tag „<1>”.
      *      0 or more
      +      1 or more
      ?      0 or 1
      {3}    exactly 3
      {3,}   3 or more
      {3,5}  3, 4 or 5
      Add a «?» to a quantifier to make it ungreedy.
  • 41. WATCH OUT FOR THE GREEDINESS! Example: matching an HTML tag in “This is a <EM>first</EM> test” with «<.+>». 1. The first token in the regex is «<». 2. The next token is the dot, which matches any character except newlines. 3. The dot is repeated by the plus, and the plus is greedy. 4. The dot fails when the engine has reached the void after the end of the string. 5. The engine continues with the next token, «>», which cannot match there. 6. The engine remembers that the plus has repeated the dot more often than required, so it backtracks. 7. The match for «.+» is reduced to „EM>first</EM> tes”, and the next token in the regex is still «>». 8. Backtracking continues until «>» matches; the engine is eager and returns this first valid match, „<EM>first</EM>”.
  • 42. LAZINESS INSTEAD OF GREEDINESS Lazy quantifiers are sometimes also called “ungreedy”. Example: “This is a <EM>first</EM> test” with «<.+?>». 1. «<» matches the first „<” in the string. 2. The next token is the dot, this time repeated by a lazy plus, which tells the regex engine to repeat the dot as few times as possible (minimum 1); the dot matches “E”. 3. «>» is tried against “M” and fails; this time, backtracking forces the lazy plus to expand rather than shrink, so the dot also matches “M”. 4. «>» now matches, and „<EM>” is returned; continuing the search finds „</EM>”.
  • 43. LOOKING INSIDE THE REGEX ENGINE Example: «<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>» applied to “Testing <B><I>bold italic</I></B> text”. 1. «<» matches at the first „<”. 2. «[A-Z]» matches „B”; the engine advances to «[A-Z0-9]» and to “>” in the string. 3. This match fails; however, because of the star, that is perfectly fine. 4. The round brackets store what was matched inside them: „B” is stored. 5. The regex advances to «[^>]» while the string remains at “>”; this fails too, which is again fine because of the star. 6. «>» matches “>”. 7. The next token is a dot, repeated by a lazy star.
  • 44. AN ALTERNATIVE TO LAZINESS Instead of making the plus lazy and relying on backtracking, use a greedy plus with a negated character class: «<[^>]+>» matches „<EM>” in “<EM>first</EM>”. Backtracking slows down the regex engine, so such a regex saves plenty of CPU cycles.
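  • 45. REPEATING THE \Q...\E ESCAPE SEQUENCE Regex «\Q*\d+*\E+» applied to “*\d+**\d+*”: in Perl the plus repeats only the last asterisk, so it matches „*\d+**”; in Java the plus repeats the whole sequence, so it matches „*\d+**\d+*”. If you want Java to return the same match as Perl, use «\Q*\d+\E\*+». If you want Perl to repeat the whole sequence like Java does, use «(\Q*\d+*\E)+».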
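  • 47. USE ROUND BRACKETS FOR GROUPING Round brackets group part of the regular expression together so that a regex operator can be applied to the whole group. They also create a backreference, which reuses part of the regex match but slows down the regex engine. If the backreference is not needed, optimize a regex such as «Set(Value)?» into «Set(?:Value)?». How to use backreferences: in «([abc])+5\1$» the backreference \1 holds only the last character matched by the group (a, b or c), so the regex matches „abbc5c” but not “abc5abc”; «<([a-z]*)>.*</\1>» matches „<div>hello</div>”.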
  • 48. REPETITION AND BACKREFERENCES Applying «([abc]+)» and «([abc])+» to the string “cab”: with «([abc]+)», „cab” is stored in the backreference; with «([abc])+», only „b” is stored, because each repetition overwrites the previous one.
  • 49. USE ROUND BRACKETS FOR GROUPING (CONT.) The same backreference can be reused more than once: «([a-c])x\1x\1» matches „axaxa”, „bxbxb” and „cxcxc”. A backreference cannot be used inside the group it refers to: «([abc]\1)». Round brackets cannot be used as metacharacters inside character classes, and neither can backreferences: «(a)[\1b]». Useful example, checking for doubled words: «\b(\w+)\s+\1\b».
  • 50. POSIX CLASSES (POSIX class, ASCII equivalent, description)
      [:alnum:]  [A-Za-z0-9]  all alphanumeric characters
      [:alpha:]  [A-Za-z]     uppercase and lowercase letters
      [:blank:]  [ \t]        space and tab
      [:digit:]  [0-9]        digits
      [:punct:]  [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]  punctuation marks
  • 51. EXAMPLE 1
      Example 1. Beginning of line (^): grep "^Nov 10" messages.1
        Nov 10 01:12:55 gs123 ntpd[2241]: time reset +0.177479 s
        Nov 10 01:17:17 gs123 ntpd[2241]: synchronized to LOCAL(0)
        Nov 10 01:18:49 gs123 ntpd[2241]: synchronized to 15.1.13.13
      Example 2. End of line ($): grep "terminating.$" messages
        Jul 12 17:01:09 cloneme kernel: Kernel log daemon terminating.
        Oct 28 06:29:54 cloneme kernel: Kernel log daemon terminating.
  • 52. EXAMPLE 2
      Example 3. Quantifiers (*), (+), (?)
        «[hc]*at» (0 or more) matches cchat, hcat, hhhat, at
        «[hc]+at» (1 or more) matches ccchat, hcat, but not at
        «[hc]?at» (0 or 1) matches hat, cat, at
      Example 4. Escaping the special character (\): grep "127\.0\.0\.1" /var/log/messages.4
        Oct 28 06:31:10 btovm871 ntpd[2241]: Listening on interface lo, 127.0.0.1#123 Enabled
  • 53. EXAMPLE 3
      Example 5. Excluding specific characters
        (a) «[^b]og» matches the text hog and the text dog, and skips the text bog.
        (b) «[^abc]» matches any character other than a, b or c; applied to “abccc adb gh” it matches every character except the a's, b's and c's.
  • 54. EXAMPLE 4
      Example 6. Composite syntax. The log data is as follows:
        Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK LOGIN: Client "192.168.1.1"
      Regex:
        ^\w+\s\w+\s\d+ \S+ \d+ \[pid \d+\]\s\[(\w+)\] OK LOGIN: Client "(\d+\.\d+\.\d+\.\d+)"$
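The composite log-line pattern of Example 6 can be tried out with a short script. The following is a minimal sketch in Python (the slides themselves do not use Python); the names LOGIN_RE and parse_login are hypothetical, and the separators are normalized to «\s+» as an assumption so that the pattern tolerates variable spacing.

```python
import re

# Assumption: separators are written as \s+ instead of single spaces,
# so the pattern tolerates variable spacing between fields.
LOGIN_RE = re.compile(
    r'^\w+\s+\w+\s+\d+\s+\S+\s+\d+\s+'           # weekday, month, day, time, year
    r'\[pid \d+\]\s+'                            # process id in literal brackets
    r'\[(\w+)\]\s+'                              # group 1: user name
    r'OK LOGIN: Client "(\d+\.\d+\.\d+\.\d+)"$'  # group 2: dotted client address
)

def parse_login(line):
    """Return (user, client_ip) for a successful login line, else None."""
    m = LOGIN_RE.match(line)
    return m.groups() if m else None

line = 'Sun Jun 4 22:08:39 2006 [pid 21611] [dcid] OK LOGIN: Client "192.168.1.1"'
print(parse_login(line))              # ('dcid', '192.168.1.1')
print(parse_login("not a log line"))  # None
```

The escaped brackets and dots are matched literally, while the two pairs of round brackets capture the user name and the client address for later use.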

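Several engine behaviors described in the slides above (eager alternation, greedy versus lazy quantifiers, and backreferences) can be observed directly. The sketch below uses Python's re module, which is a regex-directed engine; the helper first_match and the sample sentence for doubled words are illustrative assumptions, not part of the slides.

```python
import re

def first_match(pattern, text):
    """Return the text of the leftmost match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Eager alternation: the first alternative that succeeds wins.
eager = first_match(r"Get|GetValue|Set|SetValue", "SetValue")
# Word boundaries force the engine to try the longer alternative.
bounded = first_match(r"\b(?:Get|GetValue|Set|SetValue)\b", "SetValue")

html = "This is a <EM>first</EM> test"
greedy = first_match(r"<.+>", html)       # greedy plus runs to the last ">"
lazy = first_match(r"<.+?>", html)        # lazy plus stops at the first ">"
negated = first_match(r"<[^>]+>", html)   # negated class, no lazy quantifier needed

# A backreference finds a doubled word.
doubled = re.search(r"\b(\w+)\s+\1\b", "Paris in the the spring")

print(eager, bounded)           # Set SetValue
print(greedy, lazy, negated)    # <EM>first</EM> <EM> <EM>
print(doubled.group(0))         # the the
```

The negated character class returns the same tag as the lazy quantifier here while avoiding the step-by-step expansion that laziness relies on.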
Editor's notes

  1. Portable Operating System Interface for uniX
  2. When applying «cat» to “He captured a catfish for his cat.”, the engine will try to match the first token in the regex «c» to the first character in the match “H”. This fails. There are no other possible permutations of this regex, because it merely consists of a sequence of literal characters. So the regex engine tries to match the «c» with the “e”. This fails too, as does matching the «c» with the space. Arriving at the 4th character in the match, «c» matches „c”. The engine will then try to match the second token «a» to the 5th character, „a”. This succeeds too. But then, «t» fails to match “p”. At that point, the engine knows the regex cannot be matched starting at the 4th character in the match. So it will continue with the 5th: “a”. Again, «c» fails to match here and the engine carries on. At the 15th character in the match, «c» again matches „c”. The engine then proceeds to attempt to match the remainder of the regex at character 15 and finds that «a» matches „a” and «t» matches „t”. The entire regular expression could be matched starting at character 15. The engine is "eager" to report a match. It will therefore report the first three letters of catfish as a valid match. The engine never proceeds beyond this point to see if there are any “better” matches
  3. If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If you want to match „1+1=2”, the correct regex is «1\+1=2». Otherwise, the plus sign will have a special meaning. Note that «1+1=2», with the backslash omitted, is a valid regex. So you will not get an error message. But it will not match “1+1=2”. It would match „111=2” in “123+111=234”, due to the special meaning of the plus character. If you forget to escape a special character where its use is not allowed, such as in «+1», then you will get an error message.
  4. See the section on POSIX bracket expressions for more information.
  5. Older tools read a file line by line and applied the regex to each line separately, so a newline was never matched; later, modern tools took in the whole file…
  6. In the date-matching example, we improved our regex by replacing the dot with a character class. Here, we will do the same. Our original definition of a double-quoted string was faulty. We do not want any number of any character between the quotes. We want any number of characters that are not double quotes or newlines between the quotes. So the proper regex is «"[^"\r\n]*"».
  7. Thus far, I have explained literal characters and character classes. In both cases, putting one in a regex will cause the regex engine to try to match a single character. Anchors are a different breed. They do not match any character at all. Instead, they match a position before, after or between characters. They can be used to “anchor” the regex match at a certain position. The caret «^» matches the position before the first character in the string. Applying «^a» to “abc” matches „a”. «^b» will not match “abc” at all, because the «b» cannot be matched right after the start of the string, matched by «^». See below for the inside view of the regex engine. Similarly, «$» matches right after the last character in the string. «c$» matches „c” in “abc”, while «a$» does not match at all.