Regular Expressions –
SAS® (RX) vs. Perl (PRX)
     Mark Tabladillo Ph.D.
        April 10, 2005



     © 2005, markTab Consulting, All Rights Reserved
Motivation
The SAS System Version 9 introduces Perl
regular expressions (PRX)
Earlier software versions already had SAS
regular expressions (RX)




Purpose
This presentation will compare and
contrast the two types of regular
expressions (RX and PRX) from both the
functionality and performance viewpoints
The goal: Offer recommendations on
when to use the two types
Application: Two generic examples will
illustrate the recommended strategy

Outline
Background
Similarities between SAS (RX) and Perl
Regular Expressions (PRX)
Unique Perl Regular Expression (PRX)
Capabilities
Recommended Strategy for SAS (RX) and
Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Vocabulary
Pattern matching enables you to search for and
extract multiple matching patterns from a character
string in one step, as well as to make several
substitutions in a string in one step.
Regular expressions are a pattern language which
provides fast tools for parsing large amounts of text.
Metacharacters are special combinations of
alphanumeric and/or symbolic characters which have
specific meaning in defining a regular expression.
Character classes are single characters or combinations of
alphanumeric and/or symbolic characters which
represent themselves.

Is “One Step” Realistic?
Practical uses of regular expressions use
more than one step
Regular expressions provide a powerful
parsimonious syntax for string
manipulation




When to Use Regular Expressions
Anything done in regular expressions
could be coded another way
Many people do not use metacharacters in
(for example) Google® searches
High-volume or complex string processing
(such as in a data step) provides excellent
potential


Why Regular Expressions can be Confusing
Regular expressions are a combination of:
– Alphanumeric and/or symbolic characters
  representing themselves (character classes)
– Special combinations of alphanumeric and/or
  symbolic characters (metacharacters) representing
  zero or more combinations of alphanumeric and/or
  symbolic characters
– Specially flagged combinations of alphanumeric
  and/or symbolic characters which would normally be
  interpreted as metacharacters, but instead represent
  themselves (character classes)
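The three cases above can be sketched in a few lines. This is an illustration in Python, whose `re` module uses Perl-style syntax, and not SAS code; the helper name `matches` and the sample strings are made up for the example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

# 1. Ordinary characters represent themselves.
literal = matches(r"abc", "xxabcxx")

# 2. The metacharacter "." stands for any single character.
wildcard = matches(r"a.c", "xxabcxx")

# 3. A backslash flags the metacharacter, so "\." means a literal period.
escaped_hit = matches(r"a\.c", "a.c")
escaped_miss = matches(r"a\.c", "abc")

print(literal, wildcard, escaped_hit, escaped_miss)
```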

Similarity One: Parse Function
PARSE is the core function for creating a
regular expression in memory using
metacharacters, and assigning this regular
expression to a numeric SAS variable,
called the regular expression ID.
The term ID refers to identification;
SAS assigns the result of every PARSE call a
different, unique numeric value, and
tracks those values automatically.

Similarity One: Parse Function
The programming challenge is to create a
regular expression which generically
describes a character string pattern
Metacharacters for SAS (RX) and Perl
(PRX) regular expressions are usually
different, but either method can be used
to create a similar if not identical result


Similarity One: Example
In this first example (SAS Institute, 2003), the
goal is to find a pattern that matches (XXX) XXX-XXXX
or XXX-XXX-XXXX for phone numbers in
the United States.
– The first three digits are the area code, and by
  standardized rules, the area code cannot start with a
  zero or a one.
– The fourth through sixth digits are the prefix, and
  again by standard rules, the prefix also cannot start
  with a zero or one.
– The suffix may have any digit, including zero or one,
  in any of the four places.

Phone Number: Perl (PRX)
paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";
dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";
regexp = "/(" || paren || ")|(" || dash || ")/";
See the Paper for the full code and
explanation
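As a rough check of the area-code and prefix rules, the same two building blocks can be tried outside SAS. The sketch below is Python, not the paper's code; the function name `has_phone` and the sample numbers are assumptions for illustration.

```python
import re

# Same two building blocks as the slide, written as Python raw strings.
PAREN = r"\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d"
DASH = r"[2-9]\d\d-[2-9]\d\d-\d\d\d\d"

# Join the two alternatives, as the slide does with "|".
PHONE = re.compile("(" + PAREN + ")|(" + DASH + ")")

def has_phone(text):
    """Return True if the text contains a number in either format."""
    return PHONE.search(text) is not None

print(has_phone("(919) 677-8000"))  # parenthesized format
print(has_phone("919-677-8000"))    # dashed format
print(has_phone("(119) 677-8000"))  # area code starts with a one
```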


Phone Number: SAS (RX)
paren = "'('$'2-9'$d$d')'[' ']$'2-9'$d$d'-'$d$d$d$d";
dash = "$'2-9'$d$d'-'$'2-9'$d$d'-'$d$d$d$d";
regexp = paren || "|" || dash;
See the Paper for the full code and
explanation


Comparing the Methods
A SAS Macro was created to compare the
methods
One iteration did not show a difference, so
the iterations were increased to 500
SAS (RX) wins at 3.69 seconds compared
to Perl (PRX) at 3.80 seconds
Point: If speed is an issue, you may try
the two methods to see which wins

Similarity Two: Matching
The matching function uses the regular
expression to determine a specific numeric
position in a string
The return from a match function is a
number representing a character position
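A minimal sketch of the idea in Python (not SAS): the hypothetical helper `match_position` returns a one-based character position and zero when nothing matches, which is the convention assumed here for the SAS match functions.

```python
import re

def match_position(pattern, text):
    """Return the 1-based position of the first match, or 0 if none."""
    m = re.search(pattern, text)
    # Python counts from zero, so add one to get a character position.
    return m.start() + 1 if m else 0

print(match_position(r"[2-9]\d\d-\d\d\d\d", "Call 677-8000 today"))
```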




Similarity Three: Substring
The substring routine allows for inputting
a regular expression and string, and
outputting a position and length
Routines (unlike functions) can have
variable numbers of inputs and outputs,
as in the substring routine
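The two outputs can be pictured as a pair. In this Python sketch (not SAS) the hypothetical helper `substring_span` returns both values at once; returning (0, 0) when nothing matches is a convention chosen for the example.

```python
import re

def substring_span(pattern, text):
    """Return (position, length) of the first match; (0, 0) if none."""
    m = re.search(pattern, text)
    if m is None:
        return (0, 0)
    # 1-based start position, and the number of characters matched.
    return (m.start() + 1, m.end() - m.start())

position, length = substring_span(r"\d\d\d-\d\d\d\d", "Call 677-8000 today")
print(position, length)
```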



Similarity Four: Change
The change routine allows for inputting a
regular expression, a maximum number of
times to replace, and an old string, and
outputs a new string
Both SAS (RX) and Perl (PRX) allow for
changing a string in place
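A sketch of a limited replacement in Python (not SAS). The helper `change` and its `max_times` parameter are hypothetical names; `None` is used here to mean "replace every match".

```python
import re

def change(pattern, replacement, text, max_times=None):
    """Replace up to max_times matches; None replaces all of them."""
    if max_times is None:
        return re.sub(pattern, replacement, text)
    if max_times <= 0:
        # Nothing to replace; return the old string unchanged.
        return text
    return re.sub(pattern, replacement, text, count=max_times)

print(change(r"\d", "#", "a1b2c3", max_times=2))
print(change(r"\d", "#", "a1b2c3"))
```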



Similarity Five: Free
The free routine releases the memory
allocation for the regular expression
It is recommended to always include a
FREE routine to prevent problems




Capture Buffers
Perl (PRX) regular expressions can use
capture buffers, defined as part of a
match explicitly specified in the Perl
regular expression
The capture buffers are collectively a
one-dimensional numbered array of results
(starting at one, not zero)
Example: Parts of a phone number
More than one step is required

Unique Feature One: PRXPOSN Routine
The PRXPOSN routine finds the start
position and length of a numbered capture
buffer
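To make "start position and length of a numbered capture buffer" concrete, here is a Python sketch (not SAS). The pattern, the helper `buffer_span`, and the sample number are assumptions for illustration; Python's capture groups are likewise numbered from one.

```python
import re

# Three capture buffers: area code, prefix, suffix.
PHONE = re.compile(r"\(([2-9]\d\d)\) ?([2-9]\d\d)-(\d\d\d\d)")

def buffer_span(text, n):
    """Return (position, length) of capture buffer n; (0, 0) if unset."""
    m = PHONE.search(text)
    if m is None or m.start(n) < 0:
        return (0, 0)
    return (m.start(n) + 1, m.end(n) - m.start(n))

for n in (1, 2, 3):
    print(n, buffer_span("(919) 677-8000", n))
```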




Unique Feature Two: PRXPOSN Function
The PRXPOSN Function uses the positional
capture buffer number to return the actual
string in the capture buffer
This function is probably more useful than
the PRXPOSN routine
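Returning the text itself is usually the shorter path. A Python sketch (not SAS), with a hypothetical helper `buffer_text` and an assumed phone pattern:

```python
import re

# Three capture buffers: area code, prefix, suffix.
PHONE = re.compile(r"\(([2-9]\d\d)\) ?([2-9]\d\d)-(\d\d\d\d)")

def buffer_text(text, n):
    """Return the string held in capture buffer n, or '' if no match."""
    m = PHONE.search(text)
    if m is None:
        return ""
    return m.group(n) or ""

area, prefix, suffix = (buffer_text("Call (919) 677-8000", n) for n in (1, 2, 3))
print(area, prefix, suffix)
```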




Unique Feature Three: PRXPAREN
The PRXPAREN function assumes that the
capture buffer was an ordered hierarchical
array, and will return the highest
non-missing capture buffer number
See the paper for an example
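One way to picture this is a pattern made of alternatives, where the buffer number tells you which alternative matched. The Python sketch below (not SAS) uses the match object's `lastindex` attribute as an approximate analogue; the pattern, the helper `last_buffer`, and the return value of 0 for "no match" are assumptions for the example.

```python
import re

# Each alternative sits in its own numbered capture buffer.
MEDIA = re.compile(r"(magazine)|(book)|(newspaper)")

def last_buffer(text):
    """Return the number of the last capture buffer that matched, or 0."""
    m = MEDIA.search(text)
    if m is None or m.lastindex is None:
        return 0
    return m.lastindex

print(last_buffer("a book on the table"))
```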




Unique Feature Four: PRXNEXT
Similar to PRXMATCH, the PRXNEXT
routine will iteratively search a string for
matches
Not based on the capture buffer
Useful when a string can have multiple,
even overlapping, matches
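The iteration can be sketched as a loop that restarts the search after each hit. This is Python, not SAS; the helper `all_matches` and its `overlapping` switch are hypothetical, and how the restart position is chosen for overlapping matches is an assumption of this sketch rather than a description of PRXNEXT.

```python
import re

def all_matches(pattern, text, overlapping=False):
    """Return a list of (position, length) pairs for successive matches."""
    regex = re.compile(pattern)
    results = []
    start = 0
    while start <= len(text):
        m = regex.search(text, start)
        if m is None:
            break
        results.append((m.start() + 1, m.end() - m.start()))
        if overlapping or m.end() == m.start():
            # Restart one character after the start of this match.
            start = m.start() + 1
        else:
            # Restart where this match ended.
            start = m.end()
    return results

print(all_matches(r"[cms]at", "the cat sat on the mat"))
print(all_matches(r"aba", "ababa", overlapping=True))
```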



Unique Feature Five: PRXDEBUG
The PRXDEBUG routine writes debugging
messages to the log
Provides insight into how regular
expression functions and routines search
through specific strings
Debugging works best when smaller
pieces are checked first, building toward
the whole regular expression

Recommended Strategy
Use the type which has the desired
functionality
If you don’t know either, start with Perl
regular expressions (PRX)
If you are looking at performance or
speed issues, test both ways (RX and
PRX)


Example One: Printer Names
The Universal Naming Convention
describes printers as:
  \\computer_name\printer_shared_name
The SYSPRINT option returns or sets the
UNC printer name




Example One: Printer Name
Problem: A variety of legal UNC formats:
– \\computer_name\printer_shared_name
– (\\computer_name\printer_shared_name)
– (“\\computer_name\printer_shared_name’)
12 printers * 3 formats = 36 combinations
SAS (RX) could be used with 3 separate
regular expressions
Perl (PRX) capture buffer used

Example One: PRX
'/([-w]+|[-w]+)/'
The regular expression will extract the
printer name, without the braces, or
brackets, or quotation marks
See the paper for explanation
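The idea of one capture buffer covering all three formats can be sketched in Python (not SAS). The pattern below is an assumption written for this example, not the expression from the paper: two backslashes, a name, one backslash, a name. The helper `printer_name` and the sample printer are also made up.

```python
import re

# Assumed pattern: \\computer_name\printer_shared_name in one capture buffer.
UNC = re.compile(r"(\\\\[-\w]+\\[-\w]+)")

def printer_name(raw):
    """Return the bare UNC name, ignoring any wrapping characters."""
    m = UNC.search(raw)
    return m.group(1) if m else None

for raw in (r"\\server1\laser_2", r"(\\server1\laser_2)", r'("\\server1\laser_2")'):
    print(printer_name(raw))
```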




Example Two: Windows Subdirectory
Get the subdirectory from the longer
string which started with the drive name
and ended with a specific filename:
– X:\Sub_Directory_1\Sub_Directory_2\...\Sub_Directory_N\Filename.Extension
As in the previous example, the original
string includes the backslash, which is a
Perl delimiting metacharacter

Example Two: Regular Expression
'/([A-Za-z]:[. -w]+)([. -w]+)([. -w]+)/'
The regular expression creates three
capture buffers, with the second capture
buffer containing the string of interest
See the paper for a full explanation
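A three-buffer layout of this kind can be sketched in Python (not SAS). The pattern here is an assumption written for the example and is not the paper's expression: buffer one holds the drive, buffer two the subdirectory path, and buffer three the filename. The helper `subdirectory` and the sample path are hypothetical.

```python
import re

# Assumed layout: (drive)(subdirectories)\(filename)
PATH = re.compile(r"^([A-Za-z]:)((?:\\[-.\w]+)*)\\([-.\w]+)$")

def subdirectory(path):
    """Return the subdirectory part (second capture buffer), or None."""
    m = PATH.match(path)
    return m.group(2) if m else None

print(subdirectory(r"X:\Sub_Directory_1\Sub_Directory_2\Filename.Extension"))
```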



Conclusion
With version 9, SAS programmers have
two regular expression choices: SAS (RX)
and Perl (PRX)
The presentation described similarities and
differences, and offered a recommended
strategy
The paper contains three detailed
examples, and an annotated bibliography

         © 2005, markTab Consulting, All Rights Reserved

Weitere Àhnliche Inhalte

Was ist angesagt?

Lec 12. Multidimensional Arrays / Passing Arrays to Functions
Lec 12. Multidimensional Arrays / Passing Arrays to FunctionsLec 12. Multidimensional Arrays / Passing Arrays to Functions
Lec 12. Multidimensional Arrays / Passing Arrays to FunctionsRushdi Shams
 
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic TaggersJoint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic TaggersCSCJournals
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoJoel Falcou
 
Boost.Dispatch
Boost.DispatchBoost.Dispatch
Boost.DispatchJoel Falcou
 
(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel Architectures(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel ArchitecturesJoel Falcou
 
Naming Conventions
Naming ConventionsNaming Conventions
Naming ConventionsPubudu Bandara
 
HDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel ArchitecturesHDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel ArchitecturesJoel Falcou
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionProf. Wim Van Criekinge
 
2.regular expressions
2.regular expressions2.regular expressions
2.regular expressionsPraveen Gorantla
 
Access Control via Belnap Logic
Access Control via Belnap LogicAccess Control via Belnap Logic
Access Control via Belnap LogicAndrada Astefanoaie
 
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position EmbeddingRoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embeddingtaeseon ryu
 
7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependenciesKumar
 
3. Lexical analysis
3. Lexical analysis3. Lexical analysis
3. Lexical analysisSaeed Parsa
 
Regular expressionfunction
Regular expressionfunctionRegular expressionfunction
Regular expressionfunctionADARSH BHATT
 

Was ist angesagt? (20)

Lec 12. Multidimensional Arrays / Passing Arrays to Functions
Lec 12. Multidimensional Arrays / Passing Arrays to FunctionsLec 12. Multidimensional Arrays / Passing Arrays to Functions
Lec 12. Multidimensional Arrays / Passing Arrays to Functions
 
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic TaggersJoint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
Joint Alignment of Segmentation and Labelling for Arabic Morphosyntactic Taggers
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Chapter05
Chapter05Chapter05
Chapter05
 
Chtp409
Chtp409Chtp409
Chtp409
 
Designing Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.ProtoDesigning Architecture-aware Library using Boost.Proto
Designing Architecture-aware Library using Boost.Proto
 
Boost.Dispatch
Boost.DispatchBoost.Dispatch
Boost.Dispatch
 
(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel Architectures(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel Architectures
 
Naming Conventions
Naming ConventionsNaming Conventions
Naming Conventions
 
HDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel ArchitecturesHDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel Architectures
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introduction
 
2.regular expressions
2.regular expressions2.regular expressions
2.regular expressions
 
Access Control via Belnap Logic
Access Control via Belnap LogicAccess Control via Belnap Logic
Access Control via Belnap Logic
 
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position EmbeddingRoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
 
7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies7 relational database design algorithms and further dependencies
7 relational database design algorithms and further dependencies
 
Adv. python regular expression by Rj
Adv. python regular expression by RjAdv. python regular expression by Rj
Adv. python regular expression by Rj
 
3. Lexical analysis
3. Lexical analysis3. Lexical analysis
3. Lexical analysis
 
Spsl II unit
Spsl   II unitSpsl   II unit
Spsl II unit
 
Regular expressionfunction
Regular expressionfunctionRegular expressionfunction
Regular expressionfunction
 
RegexCat
RegexCatRegexCat
RegexCat
 

Ähnlich wie SAS RX vs Perl PRX - A Comparison of Regular Expression Capabilities

New features in abap
New features in abapNew features in abap
New features in abapSrihari J
 
Maxbox starter20
Maxbox starter20Maxbox starter20
Maxbox starter20Max Kleiner
 
Don't Fear the Regex - CapitalCamp/GovDays 2014
Don't Fear the Regex - CapitalCamp/GovDays 2014Don't Fear the Regex - CapitalCamp/GovDays 2014
Don't Fear the Regex - CapitalCamp/GovDays 2014Sandy Smith
 
Perl_Part4
Perl_Part4Perl_Part4
Perl_Part4Frank Booth
 
RegEx Parsing
RegEx ParsingRegEx Parsing
RegEx ParsingAnjali Rao
 
Regular Expressions in PHP
Regular Expressions in PHPRegular Expressions in PHP
Regular Expressions in PHPAndrew Kandels
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in PythonSujith Kumar
 
Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsDanny Bryant
 
JavaScript - Chapter 9 - TypeConversion and Regular Expressions
 JavaScript - Chapter 9 - TypeConversion and Regular Expressions  JavaScript - Chapter 9 - TypeConversion and Regular Expressions
JavaScript - Chapter 9 - TypeConversion and Regular Expressions WebStackAcademy
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracleLogan Palanisamy
 
Javascriptæ­Łćˆ™èĄšèŸŸćŒ
Javascriptæ­Łćˆ™èĄšèŸŸćŒJavascriptæ­Łćˆ™èĄšèŸŸćŒ
Javascriptæ­Łćˆ™èĄšèŸŸćŒji guang
 
Perl Presentation
Perl PresentationPerl Presentation
Perl PresentationSopan Shewale
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeBioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeProf. Wim Van Criekinge
 
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...Ken Borowiak
 
Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)FrescatiStory
 
Plunging Into Perl While Avoiding the Deep End (mostly)
Plunging Into Perl While Avoiding the Deep End (mostly)Plunging Into Perl While Avoiding the Deep End (mostly)
Plunging Into Perl While Avoiding the Deep End (mostly)Roy Zimmer
 
Chapter 3: Introduction to Regular Expression
Chapter 3: Introduction to Regular ExpressionChapter 3: Introduction to Regular Expression
Chapter 3: Introduction to Regular Expressionazzamhadeel89
 
Don't Fear the Regex - Northeast PHP 2015
Don't Fear the Regex - Northeast PHP 2015Don't Fear the Regex - Northeast PHP 2015
Don't Fear the Regex - Northeast PHP 2015Sandy Smith
 

Ähnlich wie SAS RX vs Perl PRX - A Comparison of Regular Expression Capabilities (20)

New features in abap
New features in abapNew features in abap
New features in abap
 
Maxbox starter20
Maxbox starter20Maxbox starter20
Maxbox starter20
 
Don't Fear the Regex - CapitalCamp/GovDays 2014
Don't Fear the Regex - CapitalCamp/GovDays 2014Don't Fear the Regex - CapitalCamp/GovDays 2014
Don't Fear the Regex - CapitalCamp/GovDays 2014
 
Bioinformatica p2-p3-introduction
Bioinformatica p2-p3-introductionBioinformatica p2-p3-introduction
Bioinformatica p2-p3-introduction
 
Perl_Part4
Perl_Part4Perl_Part4
Perl_Part4
 
RegEx Parsing
RegEx ParsingRegEx Parsing
RegEx Parsing
 
Regular Expressions in PHP
Regular Expressions in PHPRegular Expressions in PHP
Regular Expressions in PHP
 
Regular expressions in Python
Regular expressions in PythonRegular expressions in Python
Regular expressions in Python
 
Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular Expressions
 
JavaScript - Chapter 9 - TypeConversion and Regular Expressions
 JavaScript - Chapter 9 - TypeConversion and Regular Expressions  JavaScript - Chapter 9 - TypeConversion and Regular Expressions
JavaScript - Chapter 9 - TypeConversion and Regular Expressions
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracle
 
Javascriptæ­Łćˆ™èĄšèŸŸćŒ
Javascriptæ­Łćˆ™èĄšèŸŸćŒJavascriptæ­Łćˆ™èĄšèŸŸćŒ
Javascriptæ­Łćˆ™èĄšèŸŸćŒ
 
Perl Presentation
Perl PresentationPerl Presentation
Perl Presentation
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeBioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
 
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...
A SAS<sup>Âź</sup> Users Guide to Regular Expressions When the Data Resi...
 
Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)
 
Plunging Into Perl While Avoiding the Deep End (mostly)
Plunging Into Perl While Avoiding the Deep End (mostly)Plunging Into Perl While Avoiding the Deep End (mostly)
Plunging Into Perl While Avoiding the Deep End (mostly)
 
Chapter 3: Introduction to Regular Expression
Chapter 3: Introduction to Regular ExpressionChapter 3: Introduction to Regular Expression
Chapter 3: Introduction to Regular Expression
 
Don't Fear the Regex - Northeast PHP 2015
Don't Fear the Regex - Northeast PHP 2015Don't Fear the Regex - Northeast PHP 2015
Don't Fear the Regex - Northeast PHP 2015
 

Mehr von Mark Tabladillo

How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006How to find low-cost or free data science resources 202006
How to find low-cost or free data science resources 202006Mark Tabladillo
 
Microsoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMicrosoft Build 2020: Data Science Recap
Microsoft Build 2020: Data Science RecapMark Tabladillo
 
201909 Automated ML for Developers
201909 Automated ML for Developers201909 Automated ML for Developers
201909 Automated ML for DevelopersMark Tabladillo
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0201906 01 Introduction to ML.NET 1.0
201906 01 Introduction to ML.NET 1.0Mark Tabladillo
 
201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019201906 04 Overview of Automated ML June 2019
201906 04 Overview of Automated ML June 2019Mark Tabladillo
 
201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusML201906 03 Introduction to NimbusML
201906 03 Introduction to NimbusMLMark Tabladillo
 
201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
 
201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine Learning201905 Azure Databricks for Machine Learning
201905 Azure Databricks for Machine LearningMark Tabladillo
 
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...201905 Azure Certification DP-100: Designing and Implementing a Data Science ...
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...Mark Tabladillo
 
Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904Big Data Advanced Analytics on Microsoft Azure 201904
Big Data Advanced Analytics on Microsoft Azure 201904Mark Tabladillo
 
Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Managing Enterprise Data Science 201904
Managing Enterprise Data Science 201904Mark Tabladillo
 
Training of Python scikit-learn models on Azure
Training of Python scikit-learn models on AzureTraining of Python scikit-learn models on Azure
Training of Python scikit-learn models on AzureMark Tabladillo
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808Advanced Analytics with Power BI 201808
Advanced Analytics with Power BI 201808Mark Tabladillo
 
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Mark Tabladillo
 
Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Machine learning services with SQL Server 2017
Machine learning services with SQL Server 2017Mark Tabladillo
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Mark Tabladillo
 
How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610How Big Companies plan to use Our Big Data 201610
How Big Companies plan to use Our Big Data 201610Mark Tabladillo
 
Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016Georgia Tech Data Science Hackathon September 2016
Georgia Tech Data Science Hackathon September 2016Mark Tabladillo
 

SAS RX vs Perl PRX - A Comparison of Regular Expression Capabilities

  • 1. Regular Expressions – SAS® (RX) vs. Perl (PRX). Mark Tabladillo, Ph.D. April 10, 2005. © 2005, markTab Consulting, All Rights Reserved
  • 2. Motivation: The SAS System Version 9 introduces Perl regular expressions (PRX). Earlier software versions already had SAS regular expressions (RX).
  • 3. Purpose: This presentation will compare and contrast the two types of regular expressions (RX and PRX) from both the functionality and performance viewpoints. The goal: offer recommendations on when to use each type. Application: two generic examples will illustrate the recommended strategy.
  • 4. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 5. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 6. Vocabulary: Pattern matching enables you to search for and extract multiple matching patterns from a character string in one step, as well as to make several substitutions in a string in one step. Regular expressions are a pattern language which provides fast tools for parsing large amounts of text. Metacharacters are special combinations of alphanumeric and/or symbolic characters which have specific meaning in defining a regular expression. Character classes are single characters or combinations of alphanumeric and/or symbolic characters which represent themselves.
  • 7. Is “One Step” Realistic? Practical uses of regular expressions use more than one step. Regular expressions provide a powerful, parsimonious syntax for string manipulation.
  • 8. When to Use Regular Expressions: Anything done in regular expressions could be coded another way. Many people do not use metacharacters in (for example) Google® searches. High-volume or complex string processing (such as in a data step) provides excellent potential.
  • 9. Why Regular Expressions Can Be Confusing: Regular expressions are a combination of: – Alphanumeric and/or symbolic characters representing themselves (character classes) – Special combinations of alphanumeric and/or symbolic characters (metacharacters) representing zero or more combinations of alphanumeric and/or symbolic characters – Specially flagged combinations of alphanumeric and/or symbolic characters which would normally be interpreted as metacharacters, but instead represent themselves (character classes)
  • 10. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 11. Similarity One: Parse Function. PARSE is the core function for creating a regular expression in memory using metacharacters and assigning this regular expression to a numeric SAS variable, called the regular expression ID. The term ID refers to identification: SAS assigns every PARSE function call a different, unique numeric value and tracks those values automatically.
  • 12. Similarity One: Parse Function. The programming challenge is to create a regular expression which generically describes a character string pattern. Metacharacters for SAS (RX) and Perl (PRX) regular expressions are usually different, but either method can be used to create a similar if not identical result.
  • 13. Similarity One: Example. In this first example (SAS Institute, 2003), the goal is to find a pattern that matches (XXX) XXX-XXXX or XXX-XXX-XXXX for phone numbers in the United States. – The first three digits are the area code, and by standardized rules, the area code cannot start with a zero or a one. – The fourth through sixth digits are the prefix, and again by standard rules, the prefix also cannot start with a zero or one. – The suffix may have any digit, including zero or one, in any of the four places.
  • 14. Phone Number: Perl (PRX). paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d"; dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d"; regexp = "/(" || paren || ")|(" || dash || ")/"; See the paper for the full code and explanation.
  • 15. Phone Number: SAS (RX). paren = "'('$'2-9'$d$d')'[' ']$'2-9'$d$d'-'$d$d$d$d"; dash = "$'2-9'$d$d'-'$'2-9'$d$d'-'$d$d$d$d"; regexp = paren || "|" || dash; See the paper for the full code and explanation.
  • 16. Comparing the Methods: A SAS macro was created to compare the methods. One iteration did not show a difference, so the iterations were increased to 500. SAS (RX) wins at 3.69 seconds compared to Perl (PRX) at 3.80 seconds. Point: if speed is an issue, you may try the two methods to see which wins.
  • 17. Similarity Two: Matching. The matching function uses the regular expression to determine a specific numeric position in a string. The return from a match function is a number representing a character position.
  • 18. Similarity Three: Substring. The substring routine allows for inputting a regular expression and string, and outputting a position and length. Routines (unlike functions) can have variable numbers of inputs and outputs, as in the substring routine.
  • 19. Similarity Four: Change. The change routine takes a regular expression, a maximum number of times to replace, and an old string as inputs, and outputs a new string. Both SAS (RX) and Perl (PRX) allow for changing a string in place.
  • 20. Similarity Five: Free. The free routine releases the memory allocation for the regular expression. It is recommended to always include a FREE routine to prevent problems.
  • 21. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 22. Capture Buffers: Perl (PRX) regular expressions can use capture buffers, defined as part of a match explicitly specified in the Perl regular expression. The capture buffers are collectively a one-dimensional numbered array of results (starting at one, not zero). Example: parts of a phone number. More than one step is required.
  • 23. Unique Feature One: PRXPOSN Routine. The PRXPOSN routine finds the start position and length of a numbered capture buffer.
  • 24. Unique Feature Two: PRXPOSN Function. The PRXPOSN function uses the positional capture buffer number to return the actual string in the capture buffer. This function is probably more useful than the PRXPOSN routine.
  • 25. Unique Feature Three: PRXPAREN. The PRXPAREN function assumes that the capture buffers form an ordered hierarchical array, and will return the highest non-missing capture buffer number. See the paper for an example.
  • 26. Unique Feature Four: PRXNEXT. Similar to PRXMATCH, the PRXNEXT routine will iteratively search a string for matches. Not based on the capture buffer. Useful when a string can have multiple, even overlapping, matches.
  • 27. Unique Feature Five: PRXDEBUG. The PRXDEBUG routine writes debugging messages to the log. Provides insight into how regular expression functions and routines search through specific strings. Debugging works best when smaller pieces are checked first, building toward the whole regular expression.
  • 28. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 29. Recommended Strategy: Use the type which has the desired functionality. If you don’t know either, start with Perl regular expressions (PRX). If you are looking at performance or speed issues, try tests both ways (RX and PRX).
  • 30. Outline: Background; Similarities between SAS (RX) and Perl Regular Expressions (PRX); Unique Perl Regular Expression (PRX) Capabilities; Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX); Two Examples of Recommended Strategy.
  • 31. Example One: Printer Names. The Universal Naming Convention (UNC) describes printers as: \\computer_name\printer_shared_name. The SYSPRINT option returns or sets the UNC printer name.
  • 32. Example One: Printer Name. Problem: a variety of legal UNC formats: – \\computer_name\printer_shared_name – (\\computer_name\printer_shared_name) – (“\\computer_name\printer_shared_name’) 12 printers * 3 formats = 36 combinations. SAS (RX) could be used with 3 separate regular expressions. Perl (PRX) capture buffer used.
  • 33. Example One: PRX. '/([-w]+|[-w]+)/' The regular expression will extract the printer name without the braces, brackets, or quotation marks. See the paper for explanation.
  • 34. Example Two: Windows Subdirectory. Get the subdirectory from the longer string which started with the drive name and ended with a specific filename: – X:\Sub_Directory_1\Sub_Directory_2\...\Sub_Directory_N\Filename.Extension As in the previous example, the original string includes the backslash, which is a Perl delimiting metacharacter.
  • 35. Example Two: Regular Expression. '/([A-Za-z]:[. -w]+)([. -w]+)([. -w]+)/' The regular expression creates three capture buffers, with the second capture buffer containing the string of interest. See the paper for a full explanation.
  • 36. Conclusion: With Version 9, SAS programmers have two regular expression choices: SAS (RX) and Perl (PRX). The presentation described similarities and differences, and offered a recommended strategy. The paper contains three detailed examples and an annotated bibliography.
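
The phone-number pattern and the capture-buffer idea from the slides can be tried outside SAS. The sketch below is not the author's code. It uses Python's `re` module as a stand-in, on the assumption that it accepts the same Perl-style syntax for this particular pattern. The parentheses around the area code, prefix, and suffix, and the helper name `find_phones`, are illustrative additions. Compiling the pattern plays roughly the role of PRXPARSE, iterating over matches resembles PRXNEXT, and reading groups resembles the PRXPOSN function.

```python
import re

# Perl-style phone pattern from the slides, with capture groups added
# (an assumption made here) so each part of the number can be extracted.
PAREN = r"\(([2-9]\d\d)\) ?([2-9]\d\d)-(\d\d\d\d)"
DASH = r"([2-9]\d\d)-([2-9]\d\d)-(\d\d\d\d)"

# Compile once and reuse, much as a parsed regular expression ID is reused.
PHONE = re.compile(f"(?:{PAREN})|(?:{DASH})")


def find_phones(text):
    """Return (1-based position, matched text, (area, prefix, suffix)) per match."""
    results = []
    for match in PHONE.finditer(text):
        # Only one alternative matches, so drop the unused (None) groups.
        parts = tuple(g for g in match.groups() if g is not None)
        # SAS reports 1-based character positions; Python is 0-based.
        results.append((match.start() + 1, match.group(0), parts))
    return results


if __name__ == "__main__":
    sample = "Call (404) 555-1234 or 770-555-9876; not 123-456-7890."
    for position, number, parts in find_phones(sample):
        print(position, number, parts)
```

The last number in the sample is skipped because its area code starts with a one, which the `[2-9]` class excludes.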