Introduction to Boost.Regex
Yongqiang Li
Boost Libs
• Boost libraries are intended to be widely useful, and usable across
a broad spectrum of applications.
• Boost works on almost any modern operating system, including
UNIX and Windows variants.
• The latest version (at the time of writing) is 1.34.1.
• Boost.Regex is a C++ library that can be used to parse text
and decide whether it matches a regular expression we
define.
• Boost.Regex was written by Dr. John Maddock.
Installation
• Step 1: Download boost_1_34_1.zip
http://sourceforge.net/project/showfiles.php?group_id=7586
• Step 2: Unzip the files to a suitable directory.
• Step 3: Use the “Visual Studio .NET 2003 Command Prompt” to
open a command-line window.
• Step 4: Go to %BOOST%/libs/regex/build
• Step 5: Compile and install the library:
• nmake -fvc71.mak
• nmake -fvc71.mak install
• Step 6: Add the Boost include directory to Visual Studio.
• Note:
• If you want to be able to get “repeated captures”, uncomment
BOOST_REGEX_MATCH_EXTRA in boost/regex/user.hpp before
compiling.
• If the version you downloaded is 1.34.1, you may need to rename
the libraries after installing: the filename should be
“***34_1.lib”, not “***34.lib”. The default VC library directory is
“<install partition>/Program Files/Microsoft Visual Studio .NET
2003/Vc7/lib”
Main classes and typedefs
• boost::basic_regex
• It stores a regular expression.
• It is very closely modeled on std::string.
• typedef basic_regex<char> regex;
• typedef basic_regex<wchar_t> wregex;
• boost::match_results
• It stores the matching result.
• typedef match_results<const char*> cmatch;
• typedef match_results<const wchar_t*> wcmatch;
• typedef match_results<std::string::const_iterator> smatch;
• typedef match_results<std::wstring::const_iterator> wsmatch;
Note: all of them are declared in <boost/regex.hpp>.
• boost::regex_iterator
typedef regex_iterator<const char*> cregex_iterator;
typedef regex_iterator<std::string::const_iterator> sregex_iterator;
typedef regex_iterator<const wchar_t*> wcregex_iterator;
typedef regex_iterator<std::wstring::const_iterator> wsregex_iterator;
• boost::regex_token_iterator
typedef regex_token_iterator<const char*> cregex_token_iterator;
typedef regex_token_iterator<std::string::const_iterator> sregex_token_iterator;
typedef regex_token_iterator<const wchar_t*> wcregex_token_iterator;
typedef regex_token_iterator<std::wstring::const_iterator> wsregex_token_iterator;
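To show how the char-pointer and std::string flavours of these typedefs differ in use, here is a minimal sketch. It is written with std::regex (C++11), which was standardized from Boost.Regex and uses the same names for these calls, so it builds without Boost. Changing std:: to boost:: and including <boost/regex.hpp> should give the Boost version. The helper names matches_cstr and matches_string are hypothetical.

```cpp
#include <regex>
#include <string>

// cmatch = match_results<const char*>: results point into the char array.
bool matches_cstr(const char* text, const std::regex& re)
{
    std::cmatch what;
    return std::regex_match(text, what, re);
}

// smatch = match_results<std::string::const_iterator>: results hold
// iterators into `text`, so the string must outlive `what`.
bool matches_string(const std::string& text, const std::regex& re)
{
    std::smatch what;
    return std::regex_match(text, what, re);
}
```

The rule of thumb: pick the match_results typedef whose iterator type matches the text you are searching.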
How to define a regular expression?
• boost::basic_regex constructor:
explicit basic_regex(const basic_string<charT, ST, SA>& p,
                     flag_type f = regex_constants::normal);
• Example:
boost::regex ip_re("^(\\d{1,2}|1\\d\\d|2[0-4]\\d|25[0-5])\\."
                   "(\\d{1,2}|1\\d\\d|2[0-4]\\d|25[0-5])\\."
                   "(\\d{1,2}|1\\d\\d|2[0-4]\\d|25[0-5])\\."
                   "(\\d{1,2}|1\\d\\d|2[0-4]\\d|25[0-5])$");
boost::regex credit_re("(\\d{4}[- ]){3}\\d{4}");
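A minimal sketch of using an ip_re-style expression to validate dotted-quad addresses. It uses std::regex (C++11, standardized from Boost.Regex with the same call names) so it builds without Boost. The helper is_ipv4 is hypothetical, and the pattern is assembled from one octet sub-pattern rather than written out four times.

```cpp
#include <regex>
#include <string>

// One octet: 0-99, 100-199, 200-249 or 250-255 (same alternatives as ip_re).
static const std::string octet = "(\\d{1,2}|1\\d\\d|2[0-4]\\d|25[0-5])";

// Hypothetical helper: true only if the whole string is a dotted-quad address.
bool is_ipv4(const std::string& s)
{
    static const std::regex ip_re("^" + octet + "\\." + octet + "\\." +
                                  octet + "\\." + octet + "$");
    return std::regex_match(s, ip_re);
}
```

Note the doubled backslashes: "\\d" in a C++ string literal is the regex \d.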
• Boost.Regex supports many different ways to interpret the
regular expression string. The type syntax_option_type is an
implementation-specific bitmask type that selects the
syntax we want to use, for example:
static const syntax_option_type normal;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type perl = normal;
static const syntax_option_type basic;
static const syntax_option_type sed = basic;
…
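To show that the syntax option really changes how the same pattern text is read, here is a small sketch. It uses std::regex (C++11, standardized from Boost.Regex; the ECMAScript, basic and icase flags exist under the same names in boost::regex). The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// ECMAScript (= normal/perl in Boost): "(ab)+" is a repeated group.
bool ecma_matches(const std::string& s)
{
    static const std::regex re("(ab)+", std::regex::ECMAScript);
    return std::regex_match(s, re);
}

// POSIX basic (= sed): "(", ")" and "+" are ordinary characters.
bool basic_matches(const std::string& s)
{
    static const std::regex re("(ab)+", std::regex::basic);
    return std::regex_match(s, re);
}

// Options are bitmask values and can be OR-ed, e.g. case-insensitive matching.
bool icase_matches(const std::string& s)
{
    static const std::regex re("boost", std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(s, re);
}
```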
How to do the match?
• bool boost::regex_match(…)
template <class BidirectionalIterator, class Allocator, class charT, class traits>
bool regex_match(BidirectionalIterator first,
                 BidirectionalIterator last,
                 match_results<BidirectionalIterator, Allocator>& m,
                 const basic_regex<charT, traits>& e,
                 match_flag_type flags = match_default);
• What to pass in:
• What to match (a string, a char*, or an iterator range)
• Where to put the result (cmatch, smatch)
• The regular expression (regex, wregex)
• How the expression is matched (match flags)
• Note that regex_match returns true only if the expression
matches the whole input sequence. If you want to
search for an expression somewhere within the sequence,
use regex_search instead.
• Sample:
std::string credit_num("1111-2222-3333-4444");
boost::regex credit_re("(\\d{4}[- ]){3}\\d{4}");
boost::smatch what;
…
if (boost::regex_match(credit_num, what, credit_re, boost::match_default))
…
else
…
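The whole-input rule is easiest to see by running regex_match and regex_search side by side on the same text. This sketch uses std::regex (C++11, standardized from Boost.Regex with the same function names) so it builds without Boost. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// credit_re from the sample: four groups of 4 digits separated by '-' or ' '.
static const std::regex credit_re("(\\d{4}[- ]){3}\\d{4}");

// regex_match succeeds only if the *whole* input matches.
bool whole_number_matches(const std::string& s)
{
    std::smatch what;
    return std::regex_match(s, what, credit_re);
}

// regex_search succeeds if the expression matches *anywhere* in the input.
bool contains_number(const std::string& s)
{
    std::smatch what;
    return std::regex_search(s, what, credit_re);
}
```

With a prefix such as "card: ", only the search variant succeeds.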
Understanding Captures
• Captures are the iterator ranges that are "captured" by
marked sub-expressions as a regular expression gets matched.
• Each marked sub-expression can result in more than one
capture, if it is matched more than once.
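Marked sub-expression
• Every time a Perl regular expression contains a parenthesized
group (), it produces an extra field, known as a marked
sub-expression. For example, in the expression
(\w+)\W+(\w+)
the first (\w+) is $1, the second (\w+) is $2, and the whole match is $&.
• Likewise, each of the four parenthesized groups of ip_re is a marked
sub-expression, numbered $1 to $4 from left to right:
^(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.    $1
(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.     $2
(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.     $3
(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$      $4
• So if the expression (\w+)\W+(\w+) is searched for within "@abc def--":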
Perl    Boost.Regex    Text found
$`      m.prefix()     "@"
$&      m[0]           "abc def"
$1      m[1]           "abc"
$2      m[2]           "def"
$'      m.suffix()     "--"
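The table can be reproduced directly with regex_search. This sketch uses std::regex (C++11, standardized from Boost.Regex; prefix(), suffix() and str() have the same names in Boost). The helper describe_match is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns {$`, $&, $1, $2, $'} for the first match of (\w+)\W+(\w+),
// or an empty vector if there is no match.
std::vector<std::string> describe_match(const std::string& text)
{
    static const std::regex re("(\\w+)\\W+(\\w+)");
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return {};
    return {m.prefix().str(), m[0].str(), m[1].str(), m[2].str(), m.suffix().str()};
}
```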
Unmatched Sub-Expressions
• When a regular expression match is found, there is no need for
all of the marked sub-expressions to have participated in the
match. For example, the expression
(abc)|(def)
can match either $1 or $2, but never both at the same time.
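A sub-match's matched member tells you which alternative took part. This sketch uses std::regex (C++11, standardized from Boost.Regex; sub_match::matched exists in Boost too). The helper which_alternative is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns 1 or 2 for the sub-expression of (abc)|(def) that matched, 0 for no match.
int which_alternative(const std::string& s)
{
    static const std::regex re("(abc)|(def)");
    std::smatch m;
    if (!std::regex_match(s, m, re))
        return 0;
    // m.size() is 3 either way; only one of m[1], m[2] has matched == true.
    return m[1].matched ? 1 : 2;
}
```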
Repeated Captures
• When a marked sub-expression is repeated, the sub-expression
gets "captured" multiple times, but normally only the final
capture is available. For example, if
(?:(\w+)\W*)+
is matched against
one fine day
then $1 will contain the string "day", and all the previous captures
will have been forgotten.
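You can check that only the last repetition survives with a short sketch. It uses std::regex (C++11, standardized from Boost.Regex), which, like Boost without match_extra, keeps just the final capture. The helper last_capture is hypothetical.

```cpp
#include <regex>
#include <string>

// With a repeated group, $1 holds only the capture from the last repetition.
std::string last_capture(const std::string& text)
{
    static const std::regex re("(?:(\\w+)\\W*)+");
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return "";
    return m[1].str();
}
```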
What can we get from match_results?
• If regex_match returns true, the match_results object m (what in the samples) contains:
Element              Value
m.size()             1 + the number of marked sub-expressions in e
m.empty()            false
m.prefix().first     first
m.prefix().second    first
m.prefix().matched   false
m.suffix().first     last
m.suffix().second    last
m.suffix().matched   false
m[0].first           first
m[0].second          last
m[0].matched         true if a full match was found, false if it was
                     a partial match
m[n].first           for all n < m.size(): the start of the sequence that
                     matched sub-expression n, or last if sub-expression n
                     did not participate in the match
m[n].second          for all n < m.size(): the end of the sequence that
                     matched sub-expression n, or last if sub-expression n
                     did not participate in the match
m[n].matched         for all n < m.size(): true if sub-expression n
                     participated in the match, false otherwise
• Note: If the function returns false, the contents of m are undefined.
• Example:
• Method: use a for loop over m[0] to m[m.size() - 1] to read each sub-match.
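The for-loop method can be sketched like this. It uses std::regex (C++11, standardized from Boost.Regex; size(), operator[] and str() are the same in boost::smatch). The helper all_submatches is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collect $0, $1, ... from a successful match with a plain for loop.
// A sub-expression that did not participate yields an empty string.
std::vector<std::string> all_submatches(const std::string& text, const std::regex& re)
{
    std::vector<std::string> out;
    std::smatch what;
    if (std::regex_match(text, what, re))
    {
        for (std::size_t i = 0; i < what.size(); ++i)
            out.push_back(what[i].str());
    }
    return out;
}
```

Applied to the credit-card sample, m[1] is "3333-": the group (\d{4}[- ]) repeats three times and only its last capture is kept.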
What about repeated captures?
• Unfortunately, enabling this feature has an impact on
performance (even if you don't use it), and a much bigger
impact if you do use it. Therefore, to use this feature you need
to:
• Define BOOST_REGEX_MATCH_EXTRA for all translation units,
including the library source (the best way to do this is to
uncomment this define in boost/regex/user.hpp and then rebuild
everything).
• Pass the match_extra flag to the particular algorithms where you
actually need the captures information (regex_search,
regex_match, or regex_iterator).
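• Example:
boost::regex e("^(?:(\\w+)|(?>\\W+))*$");
std::string text("now is the time for all good men to come to the aid of the party");
…
if (boost::regex_match(text, what, e, boost::match_extra))
{
    // get the information on all captures here
}
else
{
    …
}
• Method: what.captures(i) returns the sequence of all captures of
sub-expression i; its size() tells how many repeated captures there
were, and indexing it (what.captures(i)[j]) gets them out.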
Other match flags…
• There are many match flags which control how a regular
expression is matched against a character sequence.
• Some examples:
Element           Effect if set
match_not_bob     Specifies that the expressions "\A" and "\`" should not
                  match against the sub-sequence [first,first).
match_not_eob     Specifies that the expressions "\'", "\z" and "\Z" should
                  not match against the sub-sequence [last,last).
match_not_null    Specifies that the expression cannot be matched against
                  an empty sequence.
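match_not_null matters for patterns that can match nothing at all, such as "a*". This sketch uses std::regex (C++11, standardized from Boost.Regex; match_not_null also exists in Boost). The helper finds_a_run and its reject_empty parameter are hypothetical.

```cpp
#include <regex>
#include <string>

// "a*" matches the empty string, so a plain search always succeeds.
// With match_not_null, empty matches are rejected and at least one 'a' is needed.
bool finds_a_run(const std::string& text, bool reject_empty)
{
    static const std::regex re("a*");
    std::smatch m;
    std::regex_constants::match_flag_type flags =
        reject_empty ? std::regex_constants::match_not_null
                     : std::regex_constants::match_default;
    return std::regex_search(text, m, re, flags);
}
```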
Partial Matches
• The match flag match_partial can be passed to the
algorithms regex_match and regex_search, and used with the
iterator regex_iterator.
• When used, it indicates that partial as well as full matches
should be found. A partial match is one that matched one or
more characters at the end of the text input, but did not
match all of the regular expression.
• Partial matches are typically used either when validating data
input, or when searching text that is too long to load
into memory at once.
• We can use match_normal | match_partial.
Outcome         Result   m[0].matched   m[0].first               m[0].second
No match        false    undefined      undefined                undefined
Partial match   true     false          start of partial match   end of partial match
Full match      true     true           start of full match      end of full match
Others…
• bool boost::regex_search(…)
template <class BidirectionalIterator, class Allocator, class charT, class traits>
bool regex_search(BidirectionalIterator first,
                  BidirectionalIterator last,
                  match_results<BidirectionalIterator, Allocator>& m,
                  const basic_regex<charT, traits>& e,
                  match_flag_type flags = match_default);
It is almost the same as regex_match(). The difference is that
regex_search does not require the expression to match the
whole input sequence, for example:
std::string regstr = "(\\d+)";
boost::regex expression(regstr);
std::string testString = "192.168.4.1";
boost::smatch what;
if (boost::regex_search(testString, expression))
{
    std::cout << "Have digit" << std::endl;
}
• To find every match, repeat the search from the end of the previous one:
std::string regstr = "(\\d+)";
boost::regex expression(regstr);
std::string testString = "192.168.4.1";
boost::smatch what;
std::string::const_iterator start = testString.begin();
std::string::const_iterator end = testString.end();
while (boost::regex_search(start, end, what, expression))
{
    std::cout << "Have digit : ";
    std::string msg(what[1].first, what[1].second);
    std::cout << msg << std::endl;
    start = what[0].second;   // continue after the current match
}
• boost::regex_replace()
The algorithm regex_replace searches through a string finding
all the matches to the regular expression: for each match it
then calls match_results::format to format the string and
sends the result to the output iterator.
template <class OutputIterator, class BidirectionalIterator, class traits, class charT>
OutputIterator regex_replace(OutputIterator out,
                             BidirectionalIterator first,
                             BidirectionalIterator last,
                             const basic_regex<charT, traits>& e,
                             const basic_string<charT>& fmt,
                             match_flag_type flags = match_default);
Example:
static const boost::regex e("\\A(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");
…
std::string machine_readable_card_number(const std::string& s)
{
    return boost::regex_replace(s, e, machine_format,
                                boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(const std::string& s)
{
    return boost::regex_replace(s, e, human_format,
                                boost::match_default | boost::format_sed);
}
• Result:
Input                    machine_format       human_format
"0000111122223333"       0000111122223333     0000-1111-2222-3333
"0000 1111 2222 3333"    0000111122223333     0000-1111-2222-3333
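A self-contained version of the card-number example, written with std::regex (C++11, standardized from Boost.Regex; regex_replace and regex_constants::format_sed exist in Boost too). One assumption differs from the slide: std::regex's ECMAScript grammar has no \A or \z, so ^ and $ anchor the whole string instead.

```cpp
#include <regex>
#include <string>

// ^...$ stand in for Boost's \A...\z, which std::regex does not support.
static const std::regex card_re("^(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

std::string machine_readable_card_number(const std::string& s)
{
    // sed-style format string: \1 .. \4 insert the marked sub-expressions
    return std::regex_replace(s, card_re, "\\1\\2\\3\\4",
                              std::regex_constants::format_sed);
}

std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_re, "\\1-\\2-\\3-\\4",
                              std::regex_constants::format_sed);
}
```

Without format_sed, the default ECMAScript format would use $1 to $4 instead of \1 to \4.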
• boost::regex_iterator
The iterator type regex_iterator will enumerate all of the
regular expression matches found in some sequence:
dereferencing a regex_iterator yields a reference to
a match_results object.
• Example:
…
boost::sregex_iterator m1(text.begin(), text.end(),
expression);
boost::sregex_iterator m2;
std::for_each(m1, m2, &regex_callback);
…
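The elided example above passes each match to a callback. The same enumeration can be written with an explicit loop. This sketch uses std::regex (C++11, standardized from Boost.Regex; sregex_iterator has the same name in Boost). The helper all_matches is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Enumerate every match of `expression` in `text`; dereferencing the
// iterator yields the match_results (smatch) for one match.
std::vector<std::string> all_matches(const std::string& text, const std::regex& expression)
{
    std::vector<std::string> found;
    std::sregex_iterator m1(text.begin(), text.end(), expression);
    std::sregex_iterator m2;  // default-constructed: end of sequence
    for (; m1 != m2; ++m1)
        found.push_back((*m1)[0].str());
    return found;
}
```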
• boost::regex_token_iterator
The template class regex_token_iterator is an iterator
adapter; that is to say it represents a new view of an existing
iterator sequence, by enumerating all the occurrences of a
regular expression within that sequence, and presenting one
or more character sequences for each match found.
• regex_token_iterator is much like regex_iterator, but it can also
be used to list every sequence that does not match the regular
expression (for example, the text between separators).
• Example 1:
boost::regex re("\\s+");
boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
boost::sregex_token_iterator j; unsigned count = 0;
while(i != j)
{
cout << *i++ << endl;
count++;
}
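Passing -1 as the sub-match index is what turns the token iterator into a splitter. This sketch uses std::regex (C++11, standardized from Boost.Regex; sregex_token_iterator has the same name in Boost). The helper split_on_whitespace is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sub-match index -1 asks for the text *between* matches of "\s+",
// so the result is the list of words.
std::vector<std::string> split_on_whitespace(const std::string& s)
{
    static const std::regex re("\\s+");
    std::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    std::sregex_token_iterator j;
    std::vector<std::string> words;
    for (; i != j; ++i)
        words.push_back(i->str());
    return words;
}
```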
• Example 2:
boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
               boost::regex::normal | boost::regbase::icase);
…
const int subs[] = {1, 0,};
boost::sregex_token_iterator i(s.begin(), s.end(), e, subs);
boost::sregex_token_iterator j;
while(i != j)
{
std::cout << *i++ << std::endl;
}
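With subs = {1, 0}, each match yields two tokens in order: the URL (sub-expression 1), then the whole tag (sub-expression 0). This sketch shows that. It uses std::regex (C++11, standardized from Boost.Regex; the array-of-indices constructor also exists in Boost). The helper links_and_tags is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// For every <a href="..."> tag, emit sub-expression 1 (the URL),
// then sub-expression 0 (the whole matched tag).
std::vector<std::string> links_and_tags(const std::string& s)
{
    static const std::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
                              std::regex::ECMAScript | std::regex::icase);
    const int subs[] = {1, 0};
    std::sregex_token_iterator i(s.begin(), s.end(), e, subs);
    std::sregex_token_iterator j;
    std::vector<std::string> out;
    for (; i != j; ++i)
        out.push_back(i->str());
    return out;
}
```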
What’s more?
• Thread Safety
• Performance
References
• http://www.boost.org
• Beyond the C++ Standard Library: An Introduction to Boost -- Library 5.2 U
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

RubyMiniGuide-v1.0_0
RubyMiniGuide-v1.0_0RubyMiniGuide-v1.0_0
RubyMiniGuide-v1.0_0
tutorialsruby
 

Was ist angesagt? (19)

Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python Let’s Learn Python An introduction to Python
Let’s Learn Python An introduction to Python
 
Functions, List and String methods
Functions, List and String methodsFunctions, List and String methods
Functions, List and String methods
 
RubyMiniGuide-v1.0_0
RubyMiniGuide-v1.0_0RubyMiniGuide-v1.0_0
RubyMiniGuide-v1.0_0
 
Erlang kickstart
Erlang kickstartErlang kickstart
Erlang kickstart
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
Strings in Python
Strings in PythonStrings in Python
Strings in Python
 
Learn Python The Hard Way Presentation
Learn Python The Hard Way PresentationLearn Python The Hard Way Presentation
Learn Python The Hard Way Presentation
 
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i
/Regex makes me want to (weep|give up|(╯°□°)╯︵ ┻━┻)\.?/i
 
python.ppt
python.pptpython.ppt
python.ppt
 
DEFUN 2008 - Real World Haskell
DEFUN 2008 - Real World HaskellDEFUN 2008 - Real World Haskell
DEFUN 2008 - Real World Haskell
 
Python 101++: Let's Get Down to Business!
Python 101++: Let's Get Down to Business!Python 101++: Let's Get Down to Business!
Python 101++: Let's Get Down to Business!
 
Python Basics
Python BasicsPython Basics
Python Basics
 
Python basics
Python basicsPython basics
Python basics
 
Regular Expressions in PHP
Regular Expressions in PHPRegular Expressions in PHP
Regular Expressions in PHP
 
BayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore HaskellBayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore Haskell
 
Maxbox starter20
Maxbox starter20Maxbox starter20
Maxbox starter20
 
Python revision tour i
Python revision tour iPython revision tour i
Python revision tour i
 
Intro to Functions Python
Intro to Functions PythonIntro to Functions Python
Intro to Functions Python
 
Python strings
Python stringsPython strings
Python strings
 

Ähnlich wie Introduction to Boost regex

Regular expressions
Regular expressionsRegular expressions
Regular expressions
Raghu nath
 
Using Rhino Mocks for Effective Unit Testing
Using Rhino Mocks for Effective Unit TestingUsing Rhino Mocks for Effective Unit Testing
Using Rhino Mocks for Effective Unit Testing
Mike Clement
 
PYTHON -Chapter 2 - Functions, Exception, Modules and Files -MAULIK BOR...
PYTHON -Chapter 2 - Functions,   Exception, Modules  and    Files -MAULIK BOR...PYTHON -Chapter 2 - Functions,   Exception, Modules  and    Files -MAULIK BOR...
PYTHON -Chapter 2 - Functions, Exception, Modules and Files -MAULIK BOR...
Maulik Borsaniya
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
Kushaal Singla
 
Csharp4 strings and_regular_expressions
Csharp4 strings and_regular_expressionsCsharp4 strings and_regular_expressions
Csharp4 strings and_regular_expressions
Abed Bukhari
 
Introduction to Intermediate Java
Introduction to Intermediate JavaIntroduction to Intermediate Java
Introduction to Intermediate Java
Philip Johnson
 

Ähnlich wie Introduction to Boost regex (20)

Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
lab4_php
lab4_phplab4_php
lab4_php
 
lab4_php
lab4_phplab4_php
lab4_php
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
2.regular expressions
2.regular expressions2.regular expressions
2.regular expressions
 
Using Rhino Mocks for Effective Unit Testing
Using Rhino Mocks for Effective Unit TestingUsing Rhino Mocks for Effective Unit Testing
Using Rhino Mocks for Effective Unit Testing
 
PYTHON -Chapter 2 - Functions, Exception, Modules and Files -MAULIK BOR...
PYTHON -Chapter 2 - Functions,   Exception, Modules  and    Files -MAULIK BOR...PYTHON -Chapter 2 - Functions,   Exception, Modules  and    Files -MAULIK BOR...
PYTHON -Chapter 2 - Functions, Exception, Modules and Files -MAULIK BOR...
 
Modern C++
Modern C++Modern C++
Modern C++
 
php&mysql with Ethical Hacking
php&mysql with Ethical Hackingphp&mysql with Ethical Hacking
php&mysql with Ethical Hacking
 
BITM3730 10-17.pptx
BITM3730 10-17.pptxBITM3730 10-17.pptx
BITM3730 10-17.pptx
 
Php
PhpPhp
Php
 
Generic Programming
Generic ProgrammingGeneric Programming
Generic Programming
 
C interview-questions-techpreparation
C interview-questions-techpreparationC interview-questions-techpreparation
C interview-questions-techpreparation
 
7986-lect 7.pdf
7986-lect 7.pdf7986-lect 7.pdf
7986-lect 7.pdf
 
Csharp4 strings and_regular_expressions
Csharp4 strings and_regular_expressionsCsharp4 strings and_regular_expressions
Csharp4 strings and_regular_expressions
 
JavaScript.pptx
JavaScript.pptxJavaScript.pptx
JavaScript.pptx
 
Introduction to Intermediate Java
Introduction to Intermediate JavaIntroduction to Intermediate Java
Introduction to Intermediate Java
 
Python basics
Python basicsPython basics
Python basics
 
Python basics
Python basicsPython basics
Python basics
 
Python basics
Python basicsPython basics
Python basics
 

Mehr von Yongqiang Li (8)

Why Kotlin?
Why Kotlin?Why Kotlin?
Why Kotlin?
 
How to Recognize Henry's Face
How to Recognize Henry's FaceHow to Recognize Henry's Face
How to Recognize Henry's Face
 
Let's talk about java class loader
Let's talk about java class loaderLet's talk about java class loader
Let's talk about java class loader
 
Brief introduction to domain-driven design
Brief introduction to domain-driven designBrief introduction to domain-driven design
Brief introduction to domain-driven design
 
Let's talk about java class file
Let's talk about java class fileLet's talk about java class file
Let's talk about java class file
 
Eclipse GEF (Part I)
Eclipse GEF (Part I)Eclipse GEF (Part I)
Eclipse GEF (Part I)
 
Garbage Collection of Java VM
Garbage Collection of Java VMGarbage Collection of Java VM
Garbage Collection of Java VM
 
Let's talk about jni
Let's talk about jniLet's talk about jni
Let's talk about jni
 

Kürzlich hochgeladen

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 

Kürzlich hochgeladen (20)

WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 

Introduction to Boost regex

  • 2. Boost Libs • Boost libraries are intended to be widely useful, and usable across a broad spectrum of applications. • Boost works on almost any modern operating system, including UNIX and Windows variants. • Latest version is 1.34.1 . • Boost.Regex is a C++ library which can be used to parse the text or strings and decide whether they match the regular expression we defined. • Boost.Regex was written by Dr. John Maddock.
  • 3. Installation • Step 1: Download boost_1_34_1.zip http://sourceforge.net/project/showfiles.php?group_id=7586 • Step 2: Unzip the files to proper directory. • Step 3: Use “Visual Studio .NET 2003 Command Prompt” to open a command line window. • Step 4: Go the %BOOST%/libs/regex/build • Step 5: Compile and install the lib • nmake –fvc71.mak • namke –fvc71.mak install • Step 6: Add include directory to VStudio.
  • 4. • Note: • If you want to have the feature of getting “repeated captures”, you should uncomment BOOST_REGEX_MATCH_EXTRA in boost/regex/user.hpp before compile. • If the version you download is 1.34.1, you may change the filename of libs after install. The filename should be “***34_1.lib”, not “***34.lib”. Default lib directory of VC is “partition_you_install/Program Files/Microsoft Visual Studio .NET 2003/Vc7/lib”
  • 5. Main classes and typedefs • boost::base_regex • It stores a regular expression. • It is very closely modeled on std::string. • typedef basic_regex<char> regex; • typedef basic_regex<wchar_t> wregex; • boost::match_results • It stores the matching result. • typedef match_results<const char*> cmatch; • typedef match_results<const wchar_t*> wcmatch; • typedef match_results<string::const_iterator> smatch; • typedef match_results<wstring::const_iterator> wsmatch; Note: all of them are included in <boost/regex.hpp>.
  • 6. • boost::regex_iterator typedef regex_iterator<const char*> cregex_iterator; typedef regex_iterator<std::string::const_iterator> sregex_iterator; typedef regex_iterator<const wchar_t*> wcregex_iterator; typedef regex_iterator<std::wstring::const_iterator> wsregex_iterator; • boost::regex_token_iterator typedef regex_token_iterator<const char*> cregex_token_iterator; typedef regex_token_iterator<std::string::const_iterator> sregex_token_iterator; typedef regex_token_iterator<const wchar_t*> wcregex_token_iterator; typedef regex_token_iterator<<std::wstring::const_iterator> wsregex_token_iterator;
  • 7. How to define a regular expression? • boost::basic_regex constructor: explicit basic_regex(const basic_string<charT, ST, SA>& p, flag_type f = regex_constants::normal); • Example: boost::regex ip_re("^(d{1,2}|1dd|2[0-4]d|25[0-5])." "(d{1,2}|1dd|2[0-4]d|25[0-5])." "(d{1,2}|1dd|2[0-4]d|25[0-5])." "(d{1,2}|1dd|2[0-4]d|25[0-5])$"); boost::regex credit_re(“(d{4}[- ]){3}d{4}”);
  • 8. • Boost.regex supports many different ways to interprete the regular expression string. Type syntax_option_type is an implementation specific bitmask type that controls the method we want to use, for example: static const syntax_option_type normal; static const syntax_option_type ECMAScript = normal; static const syntax_option_type JavaScript = normal; static const syntax_option_type JScript = normal; static const syntax_option_type perl = normal; static const syntax_option_type basic; static const syntax_option_type sed = basic; …
  • 9. How to do the match? • bool boost::regex_match(…) template <class BidirectionalIterator, class Allocator, class charT, class traits> bool regex_match( BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const basic_regex <charT, traits>& e, match_flag_type flags = match_default);
  • 10. • What to give: • What to be matched (strings, char*, or the range) • Where the result to be put(cmatch, smatch) • The RE defined(regex, wregex) • How the expression is matched(some match flags) • Note that regex_match’s result is true only if the expression matches the whole of the input sequence. If you want to search for an expression somewhere within the sequence then use regex_search.
  • 11. • Sample: std::string credit_num(“1111-2222-3333-4444”); boost::regex credit_re(“(d{4}[- ]){3}d{4}”); boost::smatch what; … if (regex_match(credit_num, what, credit_re, boost::match_default) … else …
  • 12. Understanding Captures • Captures are the iterator ranges that are "captured" by marked sub-expressions as a regular expression gets matched. • Each marked sub-expression can result in more than one capture, if it is matched more than once.
  • 13. Marked sub-expression • Every time a Perl regular expression contains a parenthesis group (), it spits out an extra field, known as a marked sub- expression, for example the expression: (w+)W+(w+) $1 $2 $&
  • 15. • So if the above expression is searched for within "@abc def--“ Perl Boost.Regex Text found $` m.prefix() “@” $& m[0] “abc def” $1 m[1] “abc” $2 m[2] “def” $’ m.suffix() “--”
  • 16. • When a regular expression match is found there is no need for all of the marked sub-expressions to have participated in the match, for example the expression: (abc)|(def) can match either $1 or $2, but never both at the same time. Unmatched Sub-Expressions
  • 17. • When a marked sub-expression is repeated, then the sub-expression gets "captured" multiple times, however normally only the final capture is available, for example if (?:(w+)W+)+ is matched against one fine day Then $1 will contain the string "day", and all the previous captures will have been forgotten. Repeated CapturesRepeated Captures
  • 18. What can we get from match_result? • If the function “regex_match” returns true, Element Value what.size() e.mark_count() what.empty() false what.prefix().first first what.prefix().last first what.prefix().matched false what.suffix().first last
  • 19. m.suffix().last last m.suffix().matched false m[0].first first m[0].second last m[0].matched true if a full match was found, and false if it was a partial match.
  • 20. m[n].first For all integers n < m.size(), the start of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. m[n].second For all integers n < m.size(), the end of the sequence that matched sub-expression n. Alternatively, if sub-expression n did not participate in the match, then last. m[n].matched For all integers n < m.size(), true if sub-expression n participated in the match, false otherwise.
  • 21. • Note: if regex_match returns false, the contents of its match_results argument are undefined.
  • 22. • Method: use a for loop over the sub-matches, from m[0] up to m[m.size() - 1].
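Here is a minimal for-loop sketch. It assumes std::regex in place of boost::regex, and list_submatches is a hypothetical helper. It checks matched before reading each slot, so that groups that did not participate can be told apart from groups that matched an empty string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Walks every sub-match of a successful regex_match with a for loop,
// recording "<unmatched>" for groups that did not participate.
std::vector<std::string> list_submatches(const std::string& text, const std::regex& e) {
    std::vector<std::string> out;
    std::smatch m;
    if (!std::regex_match(text, m, e))
        return out;
    for (std::size_t i = 0; i < m.size(); ++i)
        out.push_back(m[i].matched ? m[i].str() : std::string("<unmatched>"));
    return out;
}
```

For the pattern (\w+)@(\w+)(\.com)? and the input "user@example", the loop yields the whole match, "user", "example", and "<unmatched>" for the optional third group.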
  • 23. What about repeated captures? • Unfortunately, enabling this feature has an impact on performance (even if you don't use it), and a much bigger impact if you do use it. To use this feature you need to: • Define BOOST_REGEX_MATCH_EXTRA for all translation units, including the library source (the best way to do this is to uncomment this define in boost/regex/user.hpp and then rebuild everything). • Pass the match_extra flag to the particular algorithms where you actually need the capture information (regex_search, regex_match, or regex_iterator).
  • 24. • Example:
    boost::regex e("^(?:(\\w+)|(?>\\W+))*$");
    std::string text("now is the time for all good men to come to the aid of the party");
    …
    if(boost::regex_match(text, what, e, boost::match_extra))
    {
        // use the capture information: what.captures(i) holds every capture of sub-expression i
    }
    else
    {
        …
    }
  • 27. Other match flags… • There are many match flags that control how a regular expression is matched against a character sequence. • Some examples:
    Element | Effect if set
    match_not_bob | Specifies that the expressions "\A" and "\`" should not match against the sub-sequence [first,first).
    match_not_eob | Specifies that the expressions "\'", "\z" and "\Z" should not match against the sub-sequence [last,last).
    match_not_null | Specifies that the expression cannot be matched against an empty sequence.
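Match flags are passed as the last argument of the search algorithms. Here is a sketch using std::regex, whose ECMAScript grammar has no \A or \` buffer anchors. It therefore demonstrates match_not_bol (the line-level counterpart of match_not_bob, which Boost also provides) with ^, and match_not_null. The search_with helper is hypothetical.

```cpp
#include <regex>
#include <string>

// Searches text for pattern, optionally passing extra match flags.
bool search_with(const std::string& text, const std::string& pattern,
                 std::regex_constants::match_flag_type flags =
                     std::regex_constants::match_default) {
    return std::regex_search(text, std::regex(pattern), flags);
}
```

With match_not_bol, "^abc" no longer matches at the start of "abc". With match_not_null, "a*" can no longer succeed by matching the empty string inside "bbb".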
  • 28. Partial Matches • The match flag match_partial can be passed to the algorithms regex_match and regex_search, and used with the iterator regex_iterator. • When used, it indicates that partial as well as full matches should be found. A partial match is one that matched one or more characters at the end of the text input, but did not match all of the regular expression. • Partial matches are typically used either when validating data input, or when searching texts that are too long to load into memory or are of indeterminate length (for example, text read from a socket). • Combine it with the default flags: match_default | match_partial.
  • 29. Results when match_partial is set:
    Outcome | Result | m[0].matched | m[0].first | m[0].second
    No match | false | undefined | undefined | undefined
    Partial match | true | false | start of partial match | end of partial match
    Full match | true | true | start of full match | end of full match
  • 30. Others… • bool boost::regex_search(…) template <class BidirectionalIterator, class Allocator, class charT, class traits> bool regex_search( BidirectionalIterator first, BidirectionalIterator last, match_results<BidirectionalIterator, Allocator>& m, const basic_regex<charT, traits>& e, match_flag_type flags = match_default);
  • 31. It's almost the same as regex_match(). The difference is that regex_search does not require the expression to match the whole of the input sequence, for example:
    std::string regstr = "(\\d+)";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    if( boost::regex_search(testString, expression) )
    {
        std::cout << "Have digit" << std::endl;
    }
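The difference between the two algorithms fits in two one-line functions. This sketch assumes std::regex as a stand-in for boost::regex, and the function names are hypothetical. The pattern \d+ does not cover all of "192.168.4.1", so regex_match fails, while regex_search succeeds on the first run of digits.

```cpp
#include <regex>
#include <string>

// regex_match needs the pattern to cover the whole input.
bool whole_is_number(const std::string& s) {
    static const std::regex digits(R"(\d+)");
    return std::regex_match(s, digits);
}

// regex_search is satisfied by any matching substring.
bool contains_number(const std::string& s) {
    static const std::regex digits(R"(\d+)");
    return std::regex_search(s, digits);
}
```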
  • 32. • To visit every match, call regex_search repeatedly, starting each new search where the previous match ended:
    std::string regstr = "(\\d+)";
    boost::regex expression(regstr);
    std::string testString = "192.168.4.1";
    boost::smatch what;
    std::string::const_iterator start = testString.begin();
    std::string::const_iterator end = testString.end();
    while( boost::regex_search(start, end, what, expression) )
    {
        std::cout << "Have digit : ";
        std::string msg(what[1].first, what[1].second);
        std::cout << msg << std::endl;
        start = what[0].second;
    }
  • 33. • boost::regex_replace() The algorithm regex_replace searches through a string finding all the matches to the regular expression: for each match it then calls match_results::format to format the string and sends the result to the output iterator. template <class OutputIterator, class BidirectionalIterator, class traits, class charT> OutputIterator regex_replace(OutputIterator out, BidirectionalIterator first, BidirectionalIterator last, const basic_regex<charT, traits>& e, const basic_string<charT>& fmt, match_flag_type flags = match_default);
  • 34. Example:
    static const boost::regex e("\\A(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
    const std::string machine_format("\\1\\2\\3\\4");
    const std::string human_format("\\1-\\2-\\3-\\4");
    …
    std::string machine_readable_card_number(const std::string& s)
    {
        return boost::regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
    }
    std::string human_readable_card_number(const std::string& s)
    {
        return boost::regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
    }
  • 35. • Result:
    std::string s[] = { "0000111122223333", "0000 1111 2222 3333" };
    machine_format: "0000111122223333", "0000111122223333"
    human_format: "0000-1111-2222-3333", "0000-1111-2222-3333"
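The card-number example can be reproduced with std::regex, which also offers regex_replace and the format_sed flag. ECMAScript has no \A or \z, so this sketch anchors the pattern with ^ and $, which match only at the ends of the input when multiline mode is off. Under format_sed, \n in the format string stands for sub-expression n.

```cpp
#include <regex>
#include <string>

namespace {
// ^ and $ replace Boost's \A and \z here.
const std::regex card(R"(^(\d{4})[- ]?(\d{4})[- ]?(\d{4})[- ]?(\d{4})$)");
const std::string machine_format = "\\1\\2\\3\\4";    // sed-style back-references
const std::string human_format   = "\\1-\\2-\\3-\\4";
}

std::string machine_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card, machine_format,
                              std::regex_constants::match_default |
                              std::regex_constants::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card, human_format,
                              std::regex_constants::match_default |
                              std::regex_constants::format_sed);
}
```

Inputs that do not match the pattern are returned unchanged, because regex_replace copies non-matching text through by default.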
  • 36. • boost::regex_iterator The iterator type regex_iterator will enumerate all of the regular expression matches found in some sequence: dereferencing a regex_iterator yields a reference to a match_results object. • Example: … boost::sregex_iterator m1(text.begin(), text.end(), expression); boost::sregex_iterator m2; std::for_each(m1, m2, &regex_callback); …
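Here is a minimal sketch of walking a regex_iterator to its end. It assumes std::sregex_iterator, the standard counterpart of boost::sregex_iterator, and all_matches is a hypothetical helper. Each dereference yields a match_results, so [0] is the whole match. A default-constructed iterator marks the end, as m2 does in the slide.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of e in text by walking a sregex_iterator
// from the first match to the default-constructed end iterator.
std::vector<std::string> all_matches(const std::string& text, const std::regex& e) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
        found.push_back((*it)[0].str());  // *it is a match_results (smatch)
    return found;
}
```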
  • 37. • boost::regex_token_iterator The template class regex_token_iterator is an iterator adapter; that is to say, it represents a new view of an existing iterator sequence, by enumerating all the occurrences of a regular expression within that sequence, and presenting one or more character sequences for each match found. • regex_token_iterator is much like regex_iterator, but it can also list every sequence that does not match the regular expression (by passing -1 as the sub-expression index), which makes it useful for splitting text.
  • 38. • Example 1: split a string on runs of whitespace
    boost::regex re("\\s+");
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;
    unsigned count = 0;
    while(i != j)
    {
        std::cout << *i++ << std::endl;
        count++;
    }
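The same split can be written as a reusable function. This is a sketch using std::sregex_token_iterator in place of the Boost type, and split_ws is a hypothetical name. Passing -1 asks the iterator for the text between matches rather than the matches themselves.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits s on runs of whitespace. Sub-match index -1 selects the text
// *between* matches rather than the matches themselves.
std::vector<std::string> split_ws(const std::string& s) {
    static const std::regex re(R"(\s+)");
    std::vector<std::string> parts;
    std::sregex_token_iterator i(s.begin(), s.end(), re, -1), j;
    for (; i != j; ++i)
        parts.push_back(i->str());
    return parts;
}
```

If s begins with whitespace, the first token is an empty string, because the text before the first match is reported even when it is empty.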
  • 40. • Example 2: for each link, print sub-expression 1 (the URL) and then sub-expression 0 (the whole match)
    boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", boost::regex::normal | boost::regbase::icase);
    …
    const int subs[] = {1, 0,};
    boost::sregex_token_iterator i(s.begin(), s.end(), e, subs);
    boost::sregex_token_iterator j;
    while(i != j)
    {
        std::cout << *i++ << std::endl;
    }
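Passing a single sub-expression index is enough to extract only the URLs. This sketch uses std::regex with the icase option in place of the Boost flags, and hrefs is a hypothetical helper. The pattern sits in a raw string with a custom delimiter because it contains the sequence )". Unlike the slide's {1, 0} array, this version returns only sub-expression 1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Pulls the quoted href value out of every <a ...> tag, case-insensitively.
// Sub-match 1 selects only the URL inside each match.
std::vector<std::string> hrefs(const std::string& html) {
    static const std::regex e(R"re(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")re",
                              std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> links;
    std::sregex_token_iterator i(html.begin(), html.end(), e, 1), j;
    for (; i != j; ++i)
        links.push_back(i->str());
    return links;
}
```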
  • 42. What’s more? • Thread Safety • Performance
  • 43. References • http://www.boost.org • Beyond the C++ Standard Library: An Introduction to Boost -- Library 5.2 U