How to check valid email?
Find using RegEx(p?)
Not only in Ruby
Brought to DRUG by Piotr Wasiak, 20.02.2023
Agenda
1. RegEx overview
2. Recommendations
3. Ruby quirks / amenities
4. Tools / Resources
5. Advanced RE(2)
6. Ruby 3.2 RE changes
Who am I?
Piotr Wasiak
Ruby, Rails developer
Current PRUG organiser
Interests:
● climbing, hiking, squash
● contract bridge, chess
● ruby, programming, crypto
Regular Expression
A regular expression is a sequence of characters that defines a search pattern.
Its purposes are to:
● validate a string against the pattern
● extract parts of the content (e.g. find, or find-and-replace in text editors)
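Both purposes map onto everyday Ruby calls. The sketch below is a minimal illustration only. It uses the Polish postal code format (`NN-NNN`) as an example pattern, and the helper names are hypothetical:

```ruby
# Illustrative pattern: Polish postal code, two digits, a hyphen, three digits.
POSTAL_CODE = /\d{2}-\d{3}/

# Validation: \A and \z anchor the pattern to the whole string.
VALID_POSTAL_CODE = /\A#{POSTAL_CODE.source}\z/

def valid_postal_code?(text)
  VALID_POSTAL_CODE.match?(text)
end

# Extraction: scan returns every match found in the text.
def postal_codes_in(text)
  text.scan(POSTAL_CODE)
end

# Find and replace: gsub rewrites every match.
def mask_postal_codes(text)
  text.gsub(POSTAL_CODE, "XX-XXX")
end

p valid_postal_code?("50-001")                          # true
p postal_codes_in("Wrocław 50-001, Poznań 60-101")      # ["50-001", "60-101"]
p mask_postal_codes("Send to 50-001")                   # "Send to XX-XXX"
```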
RegEx history
● The concept of regular languages arose in the 1950s
● Different syntaxes (1980+):
○ POSIX (Basic or Extended Regular Expressions)
○ Perl (adopted by other languages via PCRE in 1997 and PCRE2 in 2015)
RegEx as a state machine
Statement validation: /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/g
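Here is the same pattern in Ruby, as a sketch. Ruby has no `/g` flag, which is regex101/JavaScript notation. Instead, `match` finds the first occurrence and `scan` finds all of them. The sample statements and the helper name are made up for illustration:

```ruby
STATEMENT = /(?<name>ADAM|PIOTR)\s?[=><]{1,2}\s*"(?:PIENIĄDZ|KUKU)"/

# match returns MatchData (or nil); named groups are read with [:name].
def statement_owner(text)
  match = STATEMENT.match(text)
  match && match[:name]
end

p statement_owner('PIOTR >= "KUKU"')   # "PIOTR"
p statement_owner('ADAM="PIENIĄDZ"')   # "ADAM"
p statement_owner('ANNA = "KUKU"')     # nil: ANNA is not in the alternation

# With capture groups, scan returns one array of captures per match.
p 'ADAM < "KUKU"; PIOTR <> "PIENIĄDZ"'.scan(STATEMENT) # [["ADAM"], ["PIOTR"]]
```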
Basics
Find RegEx
In a replacement we can use the whole matched phrase or individual groups.
Groups are numbered in the order of their opening brackets, and numbered references are limited to 1-9.
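Ruby replacement strings use the same idea. `\0` is the whole match, `\1` to `\9` are numbered groups, and `\k<name>` refers to a named group. This is a minimal sketch with hypothetical helper names. Single quotes keep the backslashes intact for `sub`/`gsub`:

```ruby
# Numbered groups, counted by opening bracket: the first (\w+) is 1, the second is 2.
def last_name_first(full_name)
  full_name.sub(/(\w+) (\w+)/, '\2, \1')
end

# \0 is the whole match.
def bracket_numbers(text)
  text.gsub(/\d+/, '[\0]')
end

# Named groups avoid counting brackets altogether.
def last_name_first_named(full_name)
  full_name.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')
end

# Nested groups: group 1 opens first, so it wraps group 2.
def outer_and_inner(text)
  m = text.match(/((\d+)-\d+)/)
  [m[1], m[2]]
end

p last_name_first("Piotr Wasiak")   # "Wasiak, Piotr"
p bracket_numbers("a1b22")          # "a[1]b[22]"
p outer_and_inner("tel 12-34")      # ["12-34", "12"]
```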
Valid email (1/3)
Solution from a popular Rails gem:
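The gem's code is not reproduced here. For comparison, Ruby's standard library ships its own deliberately practical (not RFC-complete) pattern, `URI::MailTo::EMAIL_REGEXP`. The wrapper method below is a hypothetical helper:

```ruby
require "uri"

# Practical check using the pattern bundled with Ruby's URI library.
def plausible_email?(text)
  URI::MailTo::EMAIL_REGEXP.match?(text)
end

p plausible_email?("piotr@example.com") # true
p plausible_email?("piotr@")            # false
p plausible_email?("piotr@localhost")   # true: a dot in the domain is not required
```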
Valid email (2/3)
Email validation:
/(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/g
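Using this pattern in Ruby needs three changes:

- `\A`/`\z` anchors. Otherwise the pattern only has to match somewhere inside the string.
- The `i` flag. The pattern lists only lowercase letters.
- `%r{}` delimiters. With them, the `/` inside the character class needs no escaping.

A sketch under those assumptions, with a hypothetical helper name:

```ruby
# RFC 5322-style pattern from the slide, anchored and made case-insensitive.
RFC5322_EMAIL = %r{\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z}i

def rfc5322_email?(text)
  RFC5322_EMAIL.match?(text)
end

p rfc5322_email?("John.Doe@Example.com")    # true
p rfc5322_email?("john..doe@example.com")   # false: nothing between the two dots
p rfc5322_email?('"odd..dots"@example.com') # true: quoted local part
p rfc5322_email?("user@[192.168.0.1]")      # true: IP address literal
p rfc5322_email?("user@example")            # false: this pattern requires a dot in the domain
```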
Valid email (3/3)
2. Recommendations
Simplify valid email
original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?\.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])}
alnum_with_hyphen = /[a-z0-9-]/.source # POSIX alternative: /[-[:alnum:]]/
ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source
common_parts = /[\x01-\x08\x0b\x0c\x0e-\x1f\]-\x7f]/.source
username_without_backslash_prepended_set = /[#{common_parts}!#-\x5b]/.source
domain_port_unescaped_set = /[#{common_parts}!-Z]/.source
domain_port_escaped_chars_set = /[#{common_parts}\x09\x0e-\x7f]/.source # same set as [\x01-\x09\x0b\x0c\x0e-\x7f]
non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source
final_with_variables = /(?:#{non_ending_chars}(?:\.#{non_ending_chars})*|"(?:#{username_without_backslash_prepended_set}|\\#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?\.)+[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?|\[(?:(?:#{ip_number_type})\.){3}(?:#{ip_number_type}|#{alnum_with_hyphen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}|\\#{domain_port_escaped_chars_set})+)\])/
Simplify valid email (more ruby version)
original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?\.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])}
alnum_with_hyphen = /[a-z0-9-]/.source # POSIX alternative: /[-[:alnum:]]/
ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source
ascii_wo_tabs_cr_nl = /[[:ascii:]&&[^\x09-\x0a\x0d]]/.source
domain_port_escaped_chars_set = /[#{ascii_wo_tabs_cr_nl}\x09\x20"]/.source
domain_port_unescaped_set = /[#{ascii_wo_tabs_cr_nl}&&[^\x20]]/.source
username = /[#{domain_port_unescaped_set}&&[^"]]/.source
non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source
final_with_variables = /(?:#{non_ending_chars}(?:\.#{non_ending_chars})*|"(?:#{username}|\\#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?\.)+[[:alnum:]](?:#{alnum_with_hyphen}*[[:alnum:]])?|\[(?:(?:#{ip_number_type})\.){3}(?:#{ip_number_type}|#{alnum_with_hyphen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}|\\#{domain_port_escaped_chars_set})+)\])/
/x comments mode
original_regexp = %r{ # there is no heredoc for regexp
  (?: # strings with some special chars, but not ending with .
    [a-z0-9!#$%&'*+/=?^_`{|}~-]+
    (?:
      \.[a-z0-9!#$%&'*+/=?^_`{|}~-]+
    )*
  |
    "
    (?: # special chars inside quotes
      [\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]
    |
      \\ # prepended with backslash, here escaped
      [\x01-\x09\x0b\x0c\x0e-\x7f] # more special chars
    )*
    " # closing quote
  )
  @ # the most crucial at sign
  (?: # domain regexp
    (?: # at least one subdomain joined and finished with .
      [[:alnum:]]
      (?:
        [a-z0-9-]* # subdomain can have many alphanumeric chars or - inside
        [[:alnum:]] # subdomain has to finish with an alphanumeric char
      )?
      \. # dot separator
    )+
    [[:alnum:]] # domain has to start with an alphanumeric char
    (?:
      [a-z0-9-]* # domain can have many alphanumeric chars or - inside
      [[:alnum:]] # domain has to finish with an alphanumeric char
    )?
  | # or a direct IP address: 3 numbers with . suffix and some special use cases
    \[ # enclosed in square brackets
    (?:
      (?: # numbers are quite complex in RegEx
        25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
      )\. # . suffix
    ){3} # 3 times
    (?:
      25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
    | # or some special use cases
      [a-z0-9-]* # alnums, also starting with -
      [[:alnum:]] # finishing without -
      :
      (?:
        [\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f] # many chars
      |
        \\ # more ASCII chars prefixed with backslash
        [\x01-\x09\x0b\x0c\x0e-\x7f]
      )+
    )
    \] # closing square bracket
  )
}x # switch to treat spaces/new lines and `# ` suffixes as comments
Do not overuse regular expressions (1/2)
Ruby's simple string methods are usually faster and more meaningful:
● .start_with? / .end_with?
● .include?('some substring')
● .chomp
● .strip
● .lines
● .split(' ') # without regexp
● .tr(' !?', '1-9')
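The sketch below shows these methods doing jobs that are often handed to a regex. The helper names and sample strings are illustrative, and each comment names the regex equivalent the method replaces:

```ruby
def greeting?(line)
  line.strip.start_with?("Hello") # instead of /\A\s*Hello/
end

def shouting?(line)
  line.strip.end_with?("!") # instead of /!\s*\z/
end

def mentions_drug?(line)
  line.include?("DRUG") # instead of /DRUG/
end

def words(text)
  # split(" ") is special-cased: it skips leading whitespace and
  # splits on runs of whitespace, with no regex involved
  text.split(" ")
end

def without_newline(line)
  line.chomp # removes a trailing line break
end

def translate(word)
  word.tr("el", "ip") # character-by-character: e -> i, l -> p
end

p words("  a b\t\tc\n")  # ["a", "b", "c"]
p translate("hello")     # "hippo"
```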
Do not overuse regular expressions (2/2)
Libraries and gems for common concepts:
● URI(url)
+ .host / .path / .query / .fragment
● File.dirname(path_to_file) / File.basename(path_to_file) / File.extname(path_to_file)
● Nokogiri::HTML(URI.open('https://nokogiri.org/')) # URI.open needs require 'open-uri'
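Here is what `URI` and `File` return for each part. The URL and path below are made-up examples. Nokogiri is left out so the sketch runs without gems or network access:

```ruby
require "uri"

# Hypothetical URL and path, just to show which part each method returns.
URL = URI("https://example.com/docs/regexp.html?lang=ruby#syntax")
PATH = "/home/piotr/app/models/user.rb"

p URL.host     # "example.com"
p URL.path     # "/docs/regexp.html"
p URL.query    # "lang=ruby"
p URL.fragment # "syntax"

p File.dirname(PATH)         # "/home/piotr/app/models"
p File.basename(PATH)        # "user.rb"
p File.extname(PATH)         # ".rb"
p File.basename(PATH, ".rb") # "user"
```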
Do not use RegEx as a language parser
Programming languages are built from grammar nodes and syntax trees, not flat text patterns.
A regex will always stumble on some exceptions and different coding styles.
In Ruby, use Ripper or other tools to decompose Ruby code into pieces.
Markup languages (and data formats such as JSON) can be parsed more easily and more securely with gems such as Nokogiri, Ox or Oj.
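A small sketch of the difference, using Ripper from Ruby's standard library. The sample source and both helper names are hypothetical. The Ripper version only handles plain `def name` (not `def self.name`), so treat it as an illustration rather than a complete extractor:

```ruby
require "ripper"

SOURCE = <<~RUBY
  # def commented_out; end
  def greet(name)
    "def fake_method inside a string"
  end
RUBY

# A naive regex finds "def" everywhere, including comments and strings.
def regex_method_names(code)
  code.scan(/def\s+(\w+)/).flatten
end

# Ripper tokenizes real Ruby, so only an actual `def` keyword counts.
# Each token looks like [[line, column], type, text, state].
def ripper_method_names(code)
  tokens = Ripper.lex(code).reject { |_pos, type, _tok, _state| type == :on_sp }
  tokens.each_cons(2).filter_map do |(_, type, tok, _), (_, next_type, next_tok, _)|
    next_tok if type == :on_kw && tok == "def" && next_type == :on_ident
  end
end

p regex_method_names(SOURCE)  # ["commented_out", "greet", "fake_method"]
p ripper_method_names(SOURCE) # ["greet"]
```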
Clear RegEx
● extract common parts out of alternations
● put the alternatives most likely to match first
● use comments and whitespace with the /x modifier
● name captured groups, and use non-capturing groups where no capture is needed
● split the pattern into smaller logical pieces
● lint the code with ruby -w to see warnings
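A sketch applying a few of these tips: a factored alternation, named groups, a non-capturing group and `/x` comments. The patterns and the date format are hypothetical examples:

```ruby
# Common prefix extracted: both patterns accept exactly the same words,
# but the factored one checks "get_user" only once.
VERBOSE = /\A(?:get_user|get_users|get_user_id)\z/
FACTORED = /\Aget_user(?:s|_id)?\z/

# Named groups plus /x comments; (?:...) groups without capturing.
DATE = /
  \A
  (?<year>\d{4})     # four-digit year
  (?<sep>[-.])       # separator: hyphen or dot
  (?<month>\d{2})    # two-digit month
  \k<sep>            # the same separator again
  (?<day>\d{2})      # two-digit day
  (?:T\d{2}:\d{2})?  # optional time, not captured
  \z
/x

m = DATE.match("2023-02-20")
p [m[:year], m[:month], m[:day]]   # ["2023", "02", "20"]
p DATE.match?("2023.02.20T18:00")  # true
p DATE.match?("2023-02.20")        # false: separators differ
```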
3. Ruby quirks / flavor
mix flags and interpolation of RegEx
MULTILINE (m)
IGNORECASE (i)
EXTENDED (x)
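The sketch below shows how these flags appear in `Regexp#to_s` and what happens to them when one regex is interpolated into another. The pattern names are illustrative. It also shows why interpolating `.source`, as the earlier email slides do, drops the flags:

```ruby
CASE_INSENSITIVE = /drug/i

# Regexp#to_s embeds the options in an inline group: enabled flags
# before the "-", disabled ones after it, in m, i, x order.
puts CASE_INSENSITIVE.to_s # (?i-mx:drug)
puts(/drug/.to_s)          # (?-mix:drug)

# Interpolating a Regexp uses to_s, so its flags travel with it...
WITH_FLAGS = /#{CASE_INSENSITIVE} meetup/
p WITH_FLAGS.match?("DRUG meetup")    # true

# ...while .source is only the pattern text, so the flags are lost.
WITHOUT_FLAGS = /#{CASE_INSENSITIVE.source} meetup/
p WITHOUT_FLAGS.match?("DRUG meetup") # false

# The same options as Integer constants, combinable with |.
COMBINED = Regexp.new("drug # x allows this comment", Regexp::IGNORECASE | Regexp::EXTENDED)
p COMBINED.match?("DRUG")             # true
```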
Joke
Scrabble: what is the longest word you can make from the combined RE switch letters?
I M N O X
Quirks in Ruby RegEx engine (1/3)
- In most other engines, "dot matches line breaks" mode is turned on with the s flag; Ruby uses its m flag for this.
- In Ruby, ^ and $ always match at the start and end of every line.
If you want to match the beginning of the string, use \A.
For the very end of the string, use \z (or \Z, which also allows a final line break).
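This quirk matters most for validation. The sketch below uses a made-up input whose first line is not an address at all. A line-anchored check still accepts it, because the second line looks valid:

```ruby
INPUT = "<script>\nvalid@example.com"

# ^ and $ match at every line boundary: this passes because line 2 looks valid.
LINE_ANCHORED = /^[^@\s]+@[^@\s]+$/
# \A and \z anchor to the whole string.
STRING_ANCHORED = /\A[^@\s]+@[^@\s]+\z/

p LINE_ANCHORED.match?(INPUT)   # true
p STRING_ANCHORED.match?(INPUT) # false

# \Z also allows one trailing newline; \z does not.
p(/\Avalid\Z/.match?("valid\n")) # true
p(/\Avalid\z/.match?("valid\n")) # false

# Ruby's m flag lets . match "\n" (the s flag in most other engines).
p(/a.b/.match?("a\nb"))  # false
p(/a.b/m.match?("a\nb")) # true
```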
Quirks in Ruby RegEx engine (2/3)
Ruby does not allow
● a look-ahead
● a negative look-behind
inside a look-behind, such as:
Quirks in Ruby RegEx engine (3/3)
Character class operators
- Intersection […&&[…]]
- Subtraction […&&[^…]]
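A short sketch of both operators. The character sets are arbitrary examples. Subtraction is written as an intersection with a negated set:

```ruby
# Intersection: only characters present in both sets, here a-f.
HEX_LETTERS = /\A[a-z&&[a-f0-9]]+\z/

# Subtraction: a-z without vowels.
CONSONANT = /[a-z&&[^aeiou]]/

p HEX_LETTERS.match?("cafe")                 # true
p HEX_LETTERS.match?("coffee")               # false: "o" is not in a-f
p "regular expression".scan(CONSONANT).join  # "rglrxprssn"
```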
Ruby amenities (1/3)
Ruby amenities (2/3)
Ruby amenities (3/3)
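The slide contents are not reproduced here. As a hedged illustration, these are a few regex conveniences that Ruby offers. The helper names are hypothetical:

```ruby
# A regex *literal* with named groups on the left of =~ creates local variables.
def parse_version(text)
  if /(?<major>\d+)\.(?<minor>\d+)/ =~ text
    [major.to_i, minor.to_i]
  end
end

# String#[] with a regex and a group name (or index) returns just that part.
def domain_of(email)
  email[/@(?<domain>.+)\z/, "domain"]
end

# String#match? returns true/false without setting $~, so it is cheaper
# when the captures are not needed.
def has_digit?(text)
  text.match?(/\d/)
end

p parse_version("ruby 3.2.1")       # [3, 2]
p domain_of("piotr@example.com")    # "example.com"
p has_digit?("no digits here")      # false
```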
4. Tools / Resources
Tools / Websites
● regex101.com/ nicest editor, explanation on hover, cheat sheet, performance analysis
● www.debuggex.com/ visualized graphs with a cheat sheet
● Visualization plugins for Visual Studio Code
● rubocop and rubocop-performance have some rules for regex
● rubular.com/ check whether a RegEx works in Ruby 2.5 (others with 2.1)
● rubyapi.org/3.1/o/regexp good Ruby docs
5. Advanced RE(2)
Backtracking problem
/\d-\d+$/g
Catastrophic backtracking case: /a?ⁿaⁿ/ =~ aⁿ (n optional a's followed by n required a's, matched against a string of n a's)
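The pathological case can be built and timed directly. The helper names below are hypothetical. On a classic backtracking engine the time grows exponentially with n. On Ruby 3.2+, the memoization described in section 6 may keep it fast, so the numbers depend on the Ruby version:

```ruby
# Builds /a?a?...a?aa...a/ (n of each) and the subject string "a" * n.
def pathological(n)
  [Regexp.new("a?" * n + "a" * n), "a" * n]
end

# Times one match in seconds using a monotonic clock.
def time_match(regexp, subject)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  regexp.match?(subject)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Every a? first tries to consume an "a", so a backtracking engine may try
# about 2**n combinations before reaching the one where all a? are empty.
(10..18).step(2) do |n|
  regexp, subject = pathological(n)
  puts format("n=%2d  %.5fs", n, time_match(regexp, subject))
end
```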
PCRE-like solutions
“Most modern engines are regex-directed because this is the only way to implement useful features such as lazy quantifiers and backreferences; and atomic grouping and possessive quantifiers that give extra control to backtracking.”
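Ruby supports both of the backtracking controls mentioned in the quote. The sketch below uses toy patterns to show that an atomic group `(?>...)` or a possessive quantifier (`*+`) never gives characters back:

```ruby
# Greedy a* gives characters back when the rest of the pattern fails...
GREEDY = /\Aa*a\z/
# ...an atomic group (?>...) never gives them back...
ATOMIC = /\A(?>a*)a\z/
# ...and the possessive quantifier *+ is shorthand for the same thing.
POSSESSIVE = /\Aa*+a\z/

p GREEDY.match?("aaa")     # true: a* backs off one "a" for the final a
p ATOMIC.match?("aaa")     # false
p POSSESSIVE.match?("aaa") # false

# Practical use: [^"]* can never consume the closing quote anyway, so making it
# atomic changes no results but stops useless retries when a match fails.
QUOTED = /"(?>[^"]*)"/
p QUOTED.match('say "hi" now')[0] # "\"hi\""
```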
Back to Finite Automaton - (D/N) FA
/abb*a/
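A DFA can be written directly as a transition table. This is a minimal sketch for /abb*a/, assuming the whole string must match (as with `\A...\z`). The table, state numbering and helper names are made up for this example, and the output is checked against Ruby's own regex:

```ruby
# States: 0 start, 1 after "a", 2 after "ab" or "abb...", 3 after the final "a".
# A missing entry means there is no transition (an implicit dead state).
ABB_STAR_A = {
  0 => { "a" => 1 },
  1 => { "b" => 2 },
  2 => { "b" => 2, "a" => 3 },
  3 => {}
}.freeze
ACCEPTING = [3].freeze

# Reads each character exactly once: no backtracking, linear time.
def dfa_match?(table, accepting, input)
  state = 0
  input.each_char do |char|
    state = table.fetch(state)[char]
    return false if state.nil?
  end
  accepting.include?(state)
end

%w[aba abba abbbba ab aa abab].each do |word|
  puts format("%-7s dfa=%-5s regexp=%s", word,
              dfa_match?(ABB_STAR_A, ACCEPTING, word),
              /\Aabb*a\z/.match?(word))
end
```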
RegEx to Deterministic Finite Automaton
What RegEx is it?
RegEx to Deterministic Finite Automaton
/(100?)*1/ matches: [1010101, 1, 10101, 1001001]
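The same technique works for /(100?)*1/, i.e. the language (10|100)*1 when the whole string must match. Below is a hand-built sketch of one possible DFA (state numbering and names are my own). It is checked against Ruby's regex for every binary string up to length 8:

```ruby
# DFA for /\A(100?)*1\z/.
# State 0: start; 1: just read a "1" (accepting);
# 2: read "10" (a complete group that may grow into "100"); 3: read "100".
ONE_ZERO_ZERO_DFA = {
  0 => { "1" => 1 },
  1 => { "0" => 2 },
  2 => { "0" => 3, "1" => 1 },
  3 => { "1" => 1 }
}.freeze

def accepts?(input)
  final = input.each_char.reduce(0) do |state, char|
    next_state = ONE_ZERO_ZERO_DFA.fetch(state)[char]
    break nil if next_state.nil? # dead state: stop reading
    next_state
  end
  final == 1
end

BINARY_REGEXP = /\A(100?)*1\z/

all_agree = (0..8).all? do |length|
  %w[0 1].repeated_permutation(length).all? do |chars|
    word = chars.join
    accepts?(word) == BINARY_REGEXP.match?(word)
  end
end
p all_agree                                               # true
p %w[1010101 1 10101 1001001].all? { |w| accepts?(w) }   # true
```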
RE2
PCRE2
6. Ruby 3.2 RE changes
ReDoS improvements (1/2)
Regexp improvements against ReDoS
It is known that Regexp matching may take unexpectedly long. If your code attempts to match a possibly inefficient Regexp against an untrusted input, an attacker may exploit it for efficient Denial of Service.
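Ruby 3.2 also added match timeouts. The sketch below assumes Ruby 3.2 or newer. The time limits and the e-mail-like pattern are arbitrary examples, and `safe_email?` is a hypothetical helper:

```ruby
# Requires Ruby >= 3.2.
# Global limit (in seconds) for every Regexp match; nil (the default) means no limit.
Regexp.timeout = 1.0

# A per-Regexp limit overrides the global one.
EMAIL_LIKE = Regexp.new('\A[^@\s]+@[^@\s]+\z', timeout: 0.1)

# Treats a match that runs out of time as a failed validation.
def safe_email?(input)
  EMAIL_LIKE.match?(input)
rescue Regexp::TimeoutError
  false
end

p EMAIL_LIKE.timeout               # 0.1
p safe_email?("piotr@example.com") # true
p safe_email?("not an email")      # false
```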
ReDoS improvements (2/2)
Improved Regexp matching algorithm using a memoization technique
Sources
● devopedia.org/regex-engines
● patshaughnessy.net/2012/4/3/ (...) rubys-regular-expression-algorithm
● github.com/google/re2/wiki/Syntax
● hyperscan, a high-performance regex matching library
● wiki/Determinizacja_automatu_skonczonego (Polish Wikipedia: determinization of a finite automaton)
● regular-expressions.info/refrepeat.html
● rexegg.com/regex-optimizations.html
● bugs.ruby-lang.org/issues/19104 selective memoization
Thanks for listening
What’s your question?

Weitere ähnliche Inhalte

Ähnlich wie How to check valid Email? Find using regex.

Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Data Con LA
 
Stop overusing regular expressions!
Stop overusing regular expressions!Stop overusing regular expressions!
Stop overusing regular expressions!Franklin Chen
 
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Alexandre Morgaut
 
Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Guillaume Laforge
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009spierre
 
Hacking Go Compiler Internals / GoCon 2014 Autumn
Hacking Go Compiler Internals / GoCon 2014 AutumnHacking Go Compiler Internals / GoCon 2014 Autumn
Hacking Go Compiler Internals / GoCon 2014 AutumnMoriyoshi Koizumi
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]RootedCON
 
Build Your Own Tools
Build Your Own ToolsBuild Your Own Tools
Build Your Own ToolsShugo Maeda
 
Go 1.10 Release Party - PDX Go
Go 1.10 Release Party - PDX GoGo 1.10 Release Party - PDX Go
Go 1.10 Release Party - PDX GoRodolfo Carvalho
 
Perly Parsing with Regexp::Grammars
Perly Parsing with Regexp::GrammarsPerly Parsing with Regexp::Grammars
Perly Parsing with Regexp::GrammarsWorkhorse Computing
 
Specialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingSpecialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingPositive Hack Days
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Guillaume Laforge
 
Going to Mars with Groovy Domain-Specific Languages
Going to Mars with Groovy Domain-Specific LanguagesGoing to Mars with Groovy Domain-Specific Languages
Going to Mars with Groovy Domain-Specific LanguagesGuillaume Laforge
 
Adventurous Merb
Adventurous MerbAdventurous Merb
Adventurous MerbMatt Todd
 
Iron Languages - NYC CodeCamp 2/19/2011
Iron Languages - NYC CodeCamp 2/19/2011Iron Languages - NYC CodeCamp 2/19/2011
Iron Languages - NYC CodeCamp 2/19/2011Jimmy Schementi
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manualSami Said
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchAndrew Lowe
 
Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScriptJorg Janke
 

Ähnlich wie How to check valid Email? Find using regex. (20)

Go. Why it goes
Go. Why it goesGo. Why it goes
Go. Why it goes
 
Ruby on Rails Presentation
Ruby on Rails PresentationRuby on Rails Presentation
Ruby on Rails Presentation
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Stop overusing regular expressions!
Stop overusing regular expressions!Stop overusing regular expressions!
Stop overusing regular expressions!
 
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
Lint, coverage, doc, autocompletion, transpilation, minification... powered b...
 
Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007Groovy Update - JavaPolis 2007
Groovy Update - JavaPolis 2007
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009
 
Hacking Go Compiler Internals / GoCon 2014 Autumn
Hacking Go Compiler Internals / GoCon 2014 AutumnHacking Go Compiler Internals / GoCon 2014 Autumn
Hacking Go Compiler Internals / GoCon 2014 Autumn
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
 
Build Your Own Tools
Build Your Own ToolsBuild Your Own Tools
Build Your Own Tools
 
Go 1.10 Release Party - PDX Go
Go 1.10 Release Party - PDX GoGo 1.10 Release Party - PDX Go
Go 1.10 Release Party - PDX Go
 
Perly Parsing with Regexp::Grammars
Perly Parsing with Regexp::GrammarsPerly Parsing with Regexp::Grammars
Perly Parsing with Regexp::Grammars
 
Specialized Compiler for Hash Cracking
Specialized Compiler for Hash CrackingSpecialized Compiler for Hash Cracking
Specialized Compiler for Hash Cracking
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008
 
Going to Mars with Groovy Domain-Specific Languages
Going to Mars with Groovy Domain-Specific LanguagesGoing to Mars with Groovy Domain-Specific Languages
Going to Mars with Groovy Domain-Specific Languages
 
Adventurous Merb
Adventurous MerbAdventurous Merb
Adventurous Merb
 
Iron Languages - NYC CodeCamp 2/19/2011
Iron Languages - NYC CodeCamp 2/19/2011Iron Languages - NYC CodeCamp 2/19/2011
Iron Languages - NYC CodeCamp 2/19/2011
 
Lex tool manual
Lex tool manualLex tool manual
Lex tool manual
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScript
 

Kürzlich hochgeladen

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 

Kürzlich hochgeladen (20)

(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 

How to check valid Email? Find using regex.

  • 1. How to check valid email? Not only in Ruby brought to DRUG by Piotr Wasiak 20.02.2023 Find using RegEx(p?)
  • 2. Agenda 2 1. RegEx overview 2. Recommendations 3. Ruby quirks / amenities 4. Tools / Resources 5. Advanced RE(2) 6. Ruby 3.2 RE changes
  • 3. Who am I? Piotr Wasiak Ruby, Rails developer Current PRUG organiser 3 Interests: ● climbing, hiking, squash ● contract bridge, chess ● ruby, programming, crypto
  • 4. Regular Expression is a character sequence, that defines a search pattern The purpose is: ● validate the string by the pattern ● get parts of the content (e.g. find or find_and_replace in text editors) 4
  • 5. RegEx history ● Concept of language arose in the 1950s ● Different syntaxes (1980+): ○ POSIX (Basic - or Extended Regular Expressions) ○ Perl (influenced/imported to other languages as PCRE 1997, PCRE2 2015) 5
  • 6. RegEx as a state machine 6 Statement validation: /(?<name>ADAM|PIOTR)s?[=><]{1,2}s*"(?:PIENIĄDZ|KUKU)"/g
  • 8. Find RegEx In replace we can use matched whole phrase or groups. Group number is ordered by starting bracket index and is limited to 1 - 9 8
  • 9. Valid email (1/3) Rails popular gem solution: 9
  • 10. Valid email (2/3) 10 Email validation: /(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" (?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|[x01-x09x0bx0c x0e-x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9] (?:[a-z0-9-]*[a-z0-9])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[x01-x08x0 bx0cx0e-x1fx21-x5ax5d-x7f]|[x01-x09x0bx0cx0e-x7f])+)])/g
  • 11. Valid email (3/3) 11 Email validation: /(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|" (?:[x01-x08x0bx0cx0e-x1fx21x23-x5bx5d-x7f]|[x01-x09x0bx0c x0e-x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?.)+[a-z0-9] (?:[a-z0-9-]*[a-z0-9])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[x01-x08x0 bx0cx0e-x1fx21-x5ax5d-x7f]|[x01-x09x0bx0cx0e-x7f])+)])/g
  • 13. original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[x01-x08x0bx0cx0e-x1f!#-x5b]-x7f]|[x01-x09x0bx0cx0e-x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9 -]*[[:alnum:]])?.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[x01-x08x0bx 0cx0e-x1f!-Z]-x7f]|[x01-x09x0bx0cx0e-x7f])+)])} alnum_with_hypen = /[a-z0-9-]/.source # posix alternative /[-[:alnum:]]/ ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source common_parts = /[x01-x08x0bx0cx0e-x1f]-x7f]/.source username_without_backslash_prepended_set = /[#{common_parts}!#-x5b]/.source domain_port_unescaped_set = /[#{common_parts}!-Z]/.source domain_port_escaped_chars_set = /[#{common_parts}x0e-x7f]/.source non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source final_with_variables = /(?:#{non_ending_chars}(?:.#{non_ending_chars})*|"(?:#{username_without_backslash _prepended_set}|#{domain_port_escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum _with_hypen}*[[:alnum:]])?.)+[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?|[(? :(?:#{ip_number_type}).){3}(?:#{ip_number_type}|#{alnum_with_hypen}*[[:alnum:]]:( ?:#{domain_port_unescaped_set}|#{domain_port_escaped_chars_set})+)])/ 13 Simplify valid email
  • 14. original_regexp = %r{(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[x01-x08x0bx0cx0e-x1f!#-x5b]-x7f]|[x01-x09x0bx0cx0e-x7f])*")@(?:(?:[[:alnum:]](?:[a-z0-9 -]*[[:alnum:]])?.)+[[:alnum:]](?:[a-z0-9-]*[[:alnum:]])?|[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[[:alnum:]]:(?:[x01-x08x0bx 0cx0e-x1f!-Z]-x7f]|[x01-x09x0bx0cx0e-x7f])+)])} alnum_with_hypen = /[a-z0-9-]/.source # posix alternative /[-[:alnum:]]/ ip_number_type = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/.source ascii_wo_tabs_cr_nl = /[[:ascii:]&&[^x09-x0ax0d]]/.source domain_port_escaped_chars_set = /[#{ascii_wo_tabs_cr_nl}x09x20"]/.source domain_port_unescaped_set = /[#{ascii_wo_tabs_cr_nl}&&[^x20]]/.source username = /[#{domain_port_unescaped_set}&&[^"]]/.source non_ending_chars = %r{[a-z0-9!#$%&'*+/=?^_`{|}~-]+}.source final_with_variables = /(?:#{non_ending_chars}(?:.#{non_ending_chars})*|"(?:#{username}|#{domain_port_ escaped_chars_set})*")@(?:(?:[[:alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?.)+[[ :alnum:]](?:#{alnum_with_hypen}*[[:alnum:]])?|[(?:(?:#{ip_number_type}).){3}(?:# {ip_number_type}|#{alnum_with_hypen}*[[:alnum:]]:(?:#{domain_port_unescaped_set}| #{domain_port_escaped_chars_set})+)])/ 14 Simplify valid email (more ruby version)
  • 15. /x comments mode
    original_regexp = %r{ # there is no heredoc for regexp
      (?: # strings with some special chars, but not ending with .
        [a-z0-9!#$%&'*+/=?^_`{|}~-]+
        (?:
          \.[a-z0-9!#$%&'*+/=?^_`{|}~-]+
        )*
      |
        "
        (?: # special chars allowed inside the quotes
          [\x01-\x08\x0b\x0c\x0e-\x1f!#-\x5b\]-\x7f]
        | # or a char prepended with a backslash (the backslash itself is escaped here)
          \\[\x01-\x09\x0b\x0c\x0e-\x7f] # more special chars
        )*
        " # closing quote
      )
      @ # the most crucial at sign
      (?: # domain regexp
        (?: # at least one subdomain, each finished with .
          [[:alnum:]]
          (?:
            [a-z0-9-]*  # subdomain can have many alphanumeric chars or - inside
            [[:alnum:]] # subdomain has to finish with an alphanumeric char
          )?
          \. # dot separator
        )+
        [[:alnum:]] # domain has to start with an alphanumeric char
        (?:
          [a-z0-9-]*  # domain can have many alphanumeric chars or - inside
          [[:alnum:]] # domain has to finish with an alphanumeric char
        )?
      | # or an address literal: 3 numbers with a . suffix, then a 4th number or a tag
        \[ # enclosed in square brackets
        (?:
          (?: # numbers are quite complex in RegEx
            25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
          )\. # . suffix
        ){3} # 3 times
        (?:
          25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]? # 0-255
        | # or a tag followed by : and some content
          [a-z0-9-]*  # alphanumerics or -, may also start with -
          [[:alnum:]] # but must not end with -
          :
          (?:
            [\x01-\x08\x0b\x0c\x0e-\x1f!-Z\]-\x7f] # allowed unescaped chars
          | # or ASCII chars prefixed with a backslash
            \\[\x01-\x09\x0b\x0c\x0e-\x7f]
          )+
        )
        \] # closing square bracket
      )
    }x # x: ignore whitespace and treat text after # as a comment
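Before maintaining a pattern like this one, it may be worth comparing it with what Ruby already ships. A minimal sketch, assuming only the standard `uri` library: `URI::MailTo::EMAIL_REGEXP` is anchored with `\A` and `\z` and is deliberately simpler than the RFC-style pattern on these slides (no quoted local parts, no bracketed IP literals). `plausible_email?` is a hypothetical helper name and the sample addresses are made up.

```ruby
require "uri"

# URI::MailTo::EMAIL_REGEXP is anchored (\A ... \z), so it checks the whole
# string instead of finding an email-like substring somewhere inside it.
def plausible_email?(address)
  URI::MailTo::EMAIL_REGEXP.match?(address)
end

samples = ["user@example.com", "first.last+tag@sub.example.org",
           "user@@example.com", "user@-example.com", "no-at-sign"]

samples.each do |address|
  puts format("%-32s %s", address, plausible_email?(address))
end
```

A regexp only checks the shape of an address; only delivering a message shows that the mailbox actually exists.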
  • 16. Do not overuse regular expressions (1/2)
    Plain Ruby String methods are usually faster and more meaningful:
    ● .start_with? / .end_with?
    ● .include?('some substring')
    ● .chomp
    ● .strip
    ● .lines
    ● .split(' ') # without regexp
    ● .tr(' !?', '1-9')
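A small side-by-side sketch of the methods listed above; the sample strings are arbitrary, and the regexp lines are only there for comparison.

```ruby
line = "  Hello, world!\n"

# Prefix, suffix and substring checks without any regexp
p line.strip.start_with?("Hello")   # => true
p line.strip.end_with?("!")         # => true
p line.include?("world")            # => true

# The regexp equivalents, for comparison
p line.strip.match?(/\AHello/)      # => true
p line.strip.match?(/!\z/)          # => true

p line.chomp                        # => "  Hello, world!"
p "a b  c".split(" ")               # => ["a", "b", "c"] (a single space splits on any whitespace run)
p "one\ntwo\n".lines                # => ["one\n", "two\n"]
p "a b!c?".tr(" !?", "1-9")         # => "a1b2c3" (" " -> 1, "!" -> 2, "?" -> 3)
```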
  • 17. Do not overuse regular expressions (2/2)
    Libraries and gems for common concepts:
    ● URI(url) + .host / .path / .query / .fragment
    ● Pathname(path_to_file) + .dirname / .basename / .extname (or File.dirname / File.basename / File.extname)
    ● Nokogiri::HTML(URI.open('https://nokogiri.org/')) # URI.open needs require 'open-uri'
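A minimal sketch of the standard-library part of this slide (Nokogiri is a separate gem and is left out); the URL and file path are made-up examples.

```ruby
require "uri"
require "pathname"

# URI(...) parses the string once; no hand-written regexp for URLs needed
uri = URI("https://example.com/docs/page.html?lang=en#intro")
p uri.host      # => "example.com"
p uri.path      # => "/docs/page.html"
p uri.query     # => "lang=en"
p uri.fragment  # => "intro"

# Pathname(...) gives the same path helpers as File, as methods on an object
path = Pathname("/var/www/app/config/database.yml")
p path.dirname.to_s   # => "/var/www/app/config"
p path.basename.to_s  # => "database.yml"
p path.extname        # => ".yml"

# File class methods; basename can also strip a known extension
p File.basename("/var/www/app/config/database.yml", ".yml") # => "database"
```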
  • 18. Do not use REGEX as a language parser
    Programming languages are better described as trees of syntax nodes. A regex will always trip over some exception or a different coding style.
    In Ruby, use Ripper or other tools to decompose Ruby code into pieces.
    Markup and data formats are easier and safer to parse with gems such as Nokogiri (HTML/XML), Ox (XML) or Oj (JSON).
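A small sketch of the difference, assuming the standard `ripper` library: a naive comment-stripping regexp mistakes a `#` inside a string literal for a comment, while `Ripper.lex` tokenizes real Ruby syntax. The sample line of code is made up.

```ruby
require "ripper"

code = 's = "a # b" # real comment'

# Naive approach: everything from the first "#" is treated as a comment,
# which wrongly starts inside the string literal.
naive = code[/#.*/]

# Ripper.lex yields [[line, column], event, token, state] entries;
# only genuine comments come out as :on_comment.
comments = Ripper.lex(code)
                 .select { |_pos, event, _tok, _state| event == :on_comment }
                 .map { |_pos, _event, tok, _state| tok.strip }

p naive     # => "# b\" # real comment"
p comments  # => ["# real comment"]
```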
  • 19. Clear RegEx
    ● extract common parts of alternatives
    ● put the alternatives most likely to match first
    ● use comments and whitespace with the /x modifier
    ● name captured groups, and use non-capturing groups where no capture is needed
    ● split the pattern into smaller logical pieces
    ● lint code with ruby -w to see warnings
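A short sketch applying several of these tips at once. The log-line format is hypothetical, and the claim that INFO is the most frequent level is an assumption made only to illustrate ordering alternatives.

```ruby
# Common prefix extracted: "AD" is matched once instead of once per alternative.
names = /\bAD(?:AM|RIAN)\b/

# Named groups, a non-capturing group, and /x whitespace plus comments.
log_line = /
  \A
  (?<level>INFO|WARN|ERROR)   # assumed most frequent level first
  \s+
  (?<code>\d{3})              # three-digit status code
  (?:\s+-\s+)                 # separator, not captured
  (?<message>.+)
  \z
/x

m = log_line.match("WARN 404 - page not found")
p m[:level]                       # => "WARN"
p m[:code]                        # => "404"
p m[:message]                     # => "page not found"
p "ADRIAN met ADAM".scan(names)   # => ["ADRIAN", "ADAM"]
```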
  • 20. 3. Ruby quirks / flavor
  • 21. mix? Interpolation of RegEx
    Flags: m = Regexp::MULTILINE, i = Regexp::IGNORECASE, x = Regexp::EXTENDED
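A sketch of where the "mix" letters show up when interpolating: a Regexp object interpolated into another one is embedded via its `to_s`, which wraps it in an inline-options group such as `(?-mix:...)` and keeps its own flags. Interpolating `.source` instead drops those flags, which is presumably why the slide-13 code builds its pieces with `.source`.

```ruby
inner = /a/i

# Interpolation uses inner.to_s, which carries the inner flags along.
p inner.to_s                     # => "(?i-mx:a)"
with_flags = /#{inner}b/
p with_flags.source              # => "(?i-mx:a)b"
p with_flags.match?("Ab")        # => true  (the "a" part stays case-insensitive)
p with_flags.match?("AB")        # => false (the "b" part is case-sensitive)

# Interpolating .source drops the inner flags.
plain = /#{inner.source}b/
p plain.match?("Ab")             # => false

# The same flags as constants, when building a Regexp at runtime
flags = Regexp::IGNORECASE | Regexp::EXTENDED
p Regexp.new("a b # comment", flags).match?("AB")   # => true
```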
  • 22. Joke Scrabble: what is the longest word made from the combined RE switch letters? I M N O X
  • 24. Quirks in Ruby RegEx engine (1/3)
    - In most other flavors, "dot matches line breaks" mode is turned on with the s flag; Ruby uses the m flag for it.
    - In Ruby, ^ and $ always match at every line. To anchor at the beginning of the string, use \A. For the very end of the string, use \z (or \Z, which also matches before a final line break).
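A sketch of these quirks, including why they matter for validation: with `^...$` a second line can slip through, while `\A...\z` checks the whole string. The simplified email pattern `[^@\s]+@[^@\s]+` is only a stand-in for illustration.

```ruby
text = "first line\nsecond line\n"

# ^ and $ are line anchors in Ruby, so ^ also matches after the "\n".
p text.match?(/^second/)       # => true
# \A only matches at the very start of the string.
p text.match?(/\Asecond/)      # => false

# \z is the absolute end; \Z also allows one trailing newline.
p text.match?(/line\z/)        # => false
p text.match?(/line\Z/)        # => true

# Ruby's /m makes "." match "\n" (what most other flavors call /s).
p "a\nb".match?(/a.b/)         # => false
p "a\nb".match?(/a.b/m)        # => true

# Validation pitfall: $ lets a second line through.
input = "user@example.com\n<script>"
p input.match?(/^[^@\s]+@[^@\s]+$/)    # => true (only the first line was checked)
p input.match?(/\A[^@\s]+@[^@\s]+\z/)  # => false
```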
  • 25. Quirks in Ruby RegEx engine (2/3)
    Inside a look-behind, Ruby does not allow:
    ● look-ahead
    ● negative look-behind
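The restriction is only about nesting; top-level look-arounds work as usual. A sketch, where `compiles?` is a hypothetical helper that reports whether Ruby accepts a pattern (invalid patterns raise `RegexpError`). No expected output is claimed for the nested cases, so you can check them on your own Ruby version.

```ruby
price = "total: 100 USD"

# Top-level look-ahead and look-behind are fine.
p price[/\d+(?= USD)/]         # => "100"
p price[/(?<=total: )\d+/]     # => "100"

# Returns true if Ruby can compile the pattern, false on RegexpError.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p compiles?('(?<=a)b')         # => true
# The nestings the slide lists as not allowed:
p compiles?('(?<=a(?=b))c')    # look-ahead inside a look-behind
p compiles?('(?<=a(?<!b))c')   # negative look-behind inside a look-behind
```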
  • 26. Quirks in Ruby RegEx engine (3/3): character class operators
    - Intersection […&&[…]]
    - Subtraction […&&[^…]]
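A sketch of both operators, including the set used on slide 14. Note another Ruby quirk used here: `\h` means a hexadecimal digit in Ruby, not horizontal whitespace as in PCRE.

```ruby
# Subtraction: lowercase letters except vowels
consonant = /[a-z&&[^aeiou]]/
p "hello world".scan(consonant).join   # => "hllwrld"

# Intersection: characters that are both hex digits (\h) and letters
hex_letter = /[\h&&[[:alpha:]]]/
p "0xBEEF42".scan(hex_letter).join     # => "BEEF"

# The set from slide 14: ASCII without tab, line feed and carriage return
ascii_wo_tabs_cr_nl = /[[:ascii:]&&[^\x09-\x0a\x0d]]/
p ascii_wo_tabs_cr_nl.match?("\t")     # => false
p ascii_wo_tabs_cr_nl.match?("~")      # => true
p ascii_wo_tabs_cr_nl.match?("é")      # => false (not ASCII)
```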
  • 30. 4. Tools / Resources
  • 31. Tools / Websites
    ● regex101.com: the nicest editor, explanations on hover, cheat sheet, performance analysis
    ● www.debuggex.com: visualized graphs with a cheat sheet
    ● visualization plugins for Visual Studio Code
    ● RuboCop and rubocop-performance have some regex-related cops
    ● rubular.com: check whether a RegEx works in Ruby (runs Ruby 2.5; others run 2.1)
    ● rubyapi.org/3.1/o/regexp: good Ruby docs
  • 35. Catastrophic backtracking case
    /a?ⁿaⁿ/ =~ aⁿ (a? repeated n times followed by a repeated n times, matched against a string of n "a" characters)
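A sketch to reproduce the case on your own machine, assuming the pattern is anchored with `\A` and `\z` for a whole-string match; `pathological` and `time_match` are hypothetical helper names. Timings depend heavily on the Ruby version (the memoization mentioned on slide 47 is expected to keep this case fast on Ruby 3.2+), so no numbers are claimed here.

```ruby
# Builds /\Aa?a?...a?aa...a\z/ with n optional and n required "a"s.
def pathological(n)
  Regexp.new("\\A" + "a?" * n + "a" * n + "\\z")
end

# Returns [matched?, elapsed seconds] for matching against n "a"s.
def time_match(n)
  regexp = pathological(n)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  matched = regexp.match?("a" * n)
  [matched, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

[5, 10, 15, 20].each do |n|
  matched, seconds = time_match(n)
  puts format("n=%2d matched=%-5s %.4f s", n, matched, seconds)
end
```

The match always succeeds (every `a?` matches empty), but a backtracking engine tries the greedy choices first, so on older Rubies the time should grow roughly exponentially with n.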
  • 36. PCRE-like solutions
    “Most modern engines are regex-directed because this is the only way to implement useful features such as lazy quantifiers and backreferences; and atomic grouping and possessive quantifiers that give extra control to backtracking.”
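A sketch of the four features named in the quote, all supported by Ruby's regex-directed engine; the sample strings are arbitrary.

```ruby
html = "<b>bold</b>"

# Greedy vs lazy quantifier
p html[/<.+>/]       # => "<b>bold</b>"
p html[/<.+?>/]      # => "<b>"

# Backreference: \1 repeats whatever group 1 captured
p "hello"[/(\w)\1/]  # => "ll"

# Possessive quantifier and atomic group never give characters back,
# so the final "a" has nothing left to match.
p "aaa".match?(/a+a/)      # => true  (a+ backtracks by one "a")
p "aaa".match?(/a++a/)     # => false
p "aaa".match?(/(?>a+)a/)  # => false
```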
  • 39. Back to finite automata: DFA / NFA
    /abb*a/
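A minimal sketch of a hand-built DFA for `/abb*a/`, assuming a whole-string match; the state numbers and constant names are illustrative. Each input character is read exactly once and there is no backtracking, which is what makes automaton-based matching linear in the input length.

```ruby
# 0 --a--> 1 --b--> 2 --a--> 3 (accepting)
#                   2 --b--> 2 (the b* loop)
# Any missing transition means the input is rejected (dead state).
ABB_STAR_A_DFA = {
  [0, "a"] => 1,
  [1, "b"] => 2,
  [2, "b"] => 2,
  [2, "a"] => 3
}.freeze
ABB_STAR_A_ACCEPTING = [3].freeze

def dfa_accepts?(input)
  state = 0
  input.each_char do |ch|
    state = ABB_STAR_A_DFA[[state, ch]]
    return false if state.nil? # fell into the dead state
  end
  ABB_STAR_A_ACCEPTING.include?(state)
end

%w[aba abba abbba aa abab ab].each do |s|
  puts format("%-6s dfa=%-5s regexp=%s", s, dfa_accepts?(s), s.match?(/\Aabb*a\z/))
end
```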
  • 40. RegEx to Deterministic Finite Automaton
    What RegEx is it?
  • 41. RegEx to Deterministic Finite Automaton
    /(100?)*1/ matches: [1010101, 1, 10101, 1001001]
  • 42. RegEx to Deterministic Finite Automaton
    /(100?)*1/
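One possible DFA for `/(100?)*1/` as a whole-string match, i.e. the language (10|100)*1, sketched with hypothetical state names; it is not necessarily the automaton drawn on the slides. The cross-check compares it with Ruby's own engine on every binary string up to length 10.

```ruby
# :start - nothing read yet
# :one   - just read a "1"; accepting, since the string may end here
# :z1    - read "10"; next is "1" or a second "0"
# :z2    - read "100"; next must be "1"
TRANSITIONS = {
  [:start, "1"] => :one,
  [:one, "0"] => :z1,
  [:z1, "1"] => :one,
  [:z1, "0"] => :z2,
  [:z2, "1"] => :one
}.freeze

def accepts?(input)
  state = :start
  input.each_char do |ch|
    state = TRANSITIONS[[state, ch]]
    return false unless state # missing transition: dead state
  end
  state == :one
end

regexp = /\A(100?)*1\z/
all_inputs = (0..10).flat_map { |len| %w[0 1].repeated_permutation(len).map(&:join) }
mismatches = all_inputs.reject { |s| accepts?(s) == s.match?(regexp) }

p mismatches                                            # => []
p %w[1010101 1 10101 1001001].all? { |s| accepts?(s) }  # => true
```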
  • 45. 6. Ruby 3.2 RE changes: Regexp improvements against ReDoS
    It is known that Regexp matching may take unexpectedly long. If your code attempts to match a possibly inefficient Regexp against an untrusted input, an attacker may exploit it for efficient Denial of Service.
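A minimal sketch of the timeout API that Ruby 3.2 added against ReDoS: `Regexp.timeout=` sets a process-wide limit in seconds, the `timeout:` keyword of `Regexp.new` sets a per-pattern limit, and an exceeded limit raises `Regexp::TimeoutError`. The pattern and input are illustrative; whether this particular match actually times out depends on the Ruby version and on the memoization optimization also introduced in 3.2.

```ruby
# Regexp.timeout= and the timeout: keyword exist only on Ruby 3.2+.
if Regexp.respond_to?(:timeout=)
  Regexp.timeout = 1.0 # process-wide default, in seconds

  # A per-Regexp timeout overrides the global one.
  guarded = Regexp.new('\A(?:a|a)*\z', timeout: 0.5)
  p guarded.timeout # => 0.5

  begin
    guarded.match?("a" * 40 + "b")
  rescue Regexp::TimeoutError => e
    warn "gave up: #{e.message}"
  end
else
  warn "Regexp timeouts need Ruby 3.2+ (running #{RUBY_VERSION})"
end
```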
  • 47. ReDoS improvements (2/2)
    Improved Regexp matching algorithm using a memoization technique
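To make the idea concrete, here is a toy backtracking matcher that, in the spirit of the memoization described in bugs.ruby-lang.org/issues/19104, remembers which (pattern position, input position) pairs have already failed. This is not Onigmo's implementation: the `ToyMatcher` class and its token format (literal characters marked `:one` or `:opt` for `x?`) are hypothetical simplifications, run on the a?ⁿaⁿ case from slide 35.

```ruby
# Toy backtracking matcher for patterns made of literal chars,
# each either required (:one) or optional (:opt, i.e. "x?").
class ToyMatcher
  attr_reader :steps

  def initialize(tokens, memoize:)
    @tokens = tokens
    @memoize = memoize
  end

  # Whole-string match; resets the step counter and the memo on each call.
  def full_match?(input)
    @input = input
    @steps = 0
    @failed = {} # (token index, input position) pairs known to fail
    try(0, 0)
  end

  private

  def try(ti, pos)
    @steps += 1
    return pos == @input.length if ti == @tokens.length
    return false if @memoize && @failed[[ti, pos]]

    char, quantifier = @tokens[ti]
    # Greedy like a backtracking engine: consume the char first,
    # then (only for "x?") try skipping it.
    ok = (@input[pos] == char && try(ti + 1, pos + 1)) ||
         (quantifier == :opt && try(ti + 1, pos))
    @failed[[ti, pos]] = true if @memoize && !ok
    ok
  end
end

n = 16
tokens = Array.new(n) { ["a", :opt] } + Array.new(n) { ["a", :one] } # a?ⁿaⁿ
plain = ToyMatcher.new(tokens, memoize: false)
memo = ToyMatcher.new(tokens, memoize: true)

p plain.full_match?("a" * n)  # => true
p memo.full_match?("a" * n)   # => true
puts "steps without memo: #{plain.steps}, with memo: #{memo.steps}"
```

Without the memo the number of steps grows exponentially with n; with it, each of the at most (2n + 1) × (n + 1) pairs is fully explored once, so the work stays polynomial.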
  • 48. Sources
    ● devopedia.org/regex-engines
    ● patshaughnessy.net/2012/4/3/ (...) rubys-regular-expression-algorithm
    ● github.com/google/re2/wiki/Syntax
    ● Hyperscan (high-performance regex engine)
    ● wiki/Determinizacja_automatu_skonczonego (Polish Wikipedia: determinization of a finite automaton)
    ● regular-expressions.info/refrepeat.html
    ● rexegg.com/regex-optimizations.html
    ● bugs.ruby-lang.org/issues/19104 (selective memoization)
  • 49. Thanks for listening
    What’s your question?