Some people, when faced with a problem think,
         “I know, I’ll use regular expressions”.
         Now they have two problems.




         I’d rather have one problem.




         Treetop • Roland Swingler • LRUG May 2009

Tuesday, 19 May 2009

This quotation is used a lot in presentations, normally before the presenter delves into some
gnarly regexps. I’m looking for a better way.
Example 1




I run a film listing site: http://filmli.st. All the data is scraped from other sites - getting the
data is easy with net/http or httparty or similar and then parsing the html with nokogiri or
hpricot, but...
<span>
               Fri/Sun-Tue 10.45 12.30 (Tue) 12.40 (not Tue)
               4.00 7.00 9.30; Wed 3.00 7.30 9.00
               </span>





... you still need to turn a text string like this into a list of Times so you can do interesting
things with it. Regexps? No. That way lies madness.
Example 2




Chatroom bots need to be able to distinguish between messages that they should take
actions on and those which they should ignore. How should we define what messages they
should listen out for?
/^\s*whereis\s+(.+?)(?:\s+(?:on\s+)?(.+?))?\s*$/

Regular expressions? Pretty confusing.
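
To see what that pattern actually captures, here is a quick check in plain Ruby. It assumes the bare `s` characters in the exported slide were `\s` (whitespace) escapes, and the chat messages are made up for illustration.

```ruby
# The bot pattern, assuming the backslashes were lost in the slide export.
WHEREIS = /^\s*whereis\s+(.+?)(?:\s+(?:on\s+)?(.+?))?\s*$/

# Hypothetical chat messages; captures can only be addressed by position.
with_day    = WHEREIS.match("whereis roland on tuesday")
without_day = WHEREIS.match("whereis roland")
ignored     = WHEREIS.match("good morning everyone")

puts with_day[1]            # the person
puts with_day[2]            # the day
puts without_day[2].inspect # nil, because the optional part was absent
puts ignored.inspect        # nil, because the bot should ignore this message
```

It works, but you have to remember that group 1 is the person and group 2 is the day, and nothing in the pattern says so.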
whereis <person> [[on] <day>]





Much nicer to have a simpler language.
Example 3



Scenario: producing human-readable tests
                 Given I have non-technical stakeholders
                 When I write some integration tests
                 Then they should be understandable by everyone





Wouldn’t it be great if someone had written a library like this?

They have! Cucumber. Cucumber’s implementation got me started looking into...

Treetop. A Ruby library for Parsing Expression Grammars (PEGs). Basically a parser generator, but really simple.
What is a parser?




A parser determines whether strings are syntactically valid according to a set of rules known
as a grammar.
Yes / No




From a theoretical viewpoint, parsers just say true or false, depending on whether the string
is valid or not.
Syntax Tree




Not so useful, so instead we get back a syntax tree we can do useful things with.
whereis <person> [on <day>]





Let's try building a tree for this example. You can consider a string to be a list of characters,
but to start getting meaning from it, you need a tree.
words          words

                        whereis <person> [on <day>]





We have some words...
words   variable   words   variable

                       whereis <person>    [on     <day>]





variables...
optional part

                        words      variable      words       variable

                       whereis <person>            [on        <day>]





an optional part of an expression (enclosed with square brackets)
expression

                                             optional part

                       words     variable   words   variable

                       whereis <person>      [on     <day>]





and a root node for the whole expression
grammar Message
end

Let's build that up in Treetop. Each of those four types of node in the tree is going to have a
rule. We write these rules in a grammar, which you can think of as being like a Ruby module.
grammar Message
  rule expression
    (words / variable / optional_part)+
  end
end

The first rule for the whole expression. Lots of things should be familiar from regular
expressions - ‘+’ for one or more, brackets for grouping, and ‘/’ is like the regexp ‘|’ for
alternation. So this says an expression is one or more words, variables or optional parts, in
any order.
grammar Message
  rule expression
    (words / variable / optional_part)+
  end

  rule words
    [^><\[\]]+
  end
end

words - character classes, just like regexps
grammar Message
  rule expression
    (words / variable / optional_part)+
  end

  rule words
    [^><\[\]]+
  end

  rule variable
    '<' identifier:( [a-zA-Z_] [a-zA-Z_0-9]* ) '>'
  end
end

variables are enclosed with angle brackets, can be any valid ruby identifier string, and are
labeled so we can use part of the text later.
grammar Message
  rule expression
    (words / variable / optional_part)+
  end

  rule words
    [^><\[\]]+
  end

  rule variable
    '<' identifier:( [a-zA-Z_] [a-zA-Z_0-9]* ) '>'
  end

  rule optional_part
    "[" expression "]"
  end
end

optional parts are enclosed with square brackets. Here we see that rules can be recursive -
which makes the parser significantly more powerful than regular expressions.
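
To make the recursion concrete, here is a hand-written recursive-descent sketch of the same four rules in plain Ruby. This is not the code Treetop generates: the method names and the array-shaped nodes are invented for illustration, and it only uses StringScanner from the standard library.

```ruby
require 'strscan'

# expression <- (words / variable / optional_part)+
def parse_expression(scanner)
  nodes = []
  loop do
    # Ordered choice: try each alternative in turn, left to right.
    node = parse_words(scanner) || parse_variable(scanner) || parse_optional(scanner)
    break unless node
    nodes << node
  end
  nodes.empty? ? nil : [:expression, nodes]
end

# words <- [^><\[\]]+
def parse_words(scanner)
  text = scanner.scan(/[^><\[\]]+/)
  text && [:words, text]
end

# variable <- '<' identifier '>'
def parse_variable(scanner)
  scanner.scan(/<([a-zA-Z_][a-zA-Z_0-9]*)>/) && [:variable, scanner[1]]
end

# optional_part <- '[' expression ']'
# This calls back into parse_expression, which is the recursive step.
def parse_optional(scanner)
  start = scanner.pos
  inner = scanner.scan(/\[/) && parse_expression(scanner)
  if inner && scanner.scan(/\]/)
    [:optional_part, inner]
  else
    scanner.pos = start # backtrack so another alternative can be tried
    nil
  end
end

scanner = StringScanner.new("whereis <person> [on <day>]")
tree = parse_expression(scanner)
p tree
p scanner.eos? # the whole string must be consumed for the parse to count
```

Because parse_optional calls parse_expression, brackets can nest to any depth, which a single regular expression of this kind cannot check.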
$ tt message.treetop





We compile the grammar with the tt command-line tool - you can also load grammars
dynamically.
require 'message'

parser = MessageParser.new
tree = parser.parse("whereis <person>...")

this gives us a parser we can call from ruby code
require 'message'

parser = MessageParser.new
tree = parser.parse("whereis <person>...")

tree.elements[0].text_value
#=> "whereis "

tree.elements[1].identifier.text_value
#=> "person"

each node knows about its children and its text_value. The label we defined earlier provides
sugar methods to access particular subnodes.
Fri/Sun-Tue 4.00 7.00





Another example. This time we’ll think about the tree in a top down fashion rather than
bottom up. This is closer to how treetop will actually evaluate an expression.
expression




                       Fri/Sun-Tue 4.00 7.00




expression

                         days                       times




                       Fri/Sun-Tue                4.00 7.00




expression

                                 days                             times

                       day        day range                time           time



                       Fri   /     Sun-Tue                 4.00           7.00




expression

                               days                               times

                       day      day range                 time                time

                               day     day           hrs       mins   hrs          mins

                       Fri /   Sun -   Tue            4    .     00       7    .     00




rule expression
  days " " times
end
rule times
  time (" " time)+
end

rule time
  hours "." minutes
end

rule hours
  "1" [0-2] / [0-9]
end

rule minutes
  [0-5] [0-9]
end
rule days
  (day !"-" / day_range) ("/" days)?
end

rule day_range
  day "-" day
end

rule day
  "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" / "Sun"
end

The !"-" after day (highlighted in red on the slide) is a negative lookahead assertion. We need
this because Treetop evaluates alternatives from left to right - if we didn't have the assertion
then Sun-Tue would match Sun as a day, not a day_range, and we'd be left with "-Tue", which isn't valid.
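
The same ordered-choice-plus-lookahead logic can be sketched by hand in plain Ruby with StringScanner. This is an illustration only, not what Treetop generates: the method names are made up, and expanding a range such as Sun-Tue into individual days is an extra step added here so the result is easy to inspect.

```ruby
require 'strscan'

DAYS = %w[Mon Tue Wed Thu Fri Sat Sun].freeze
DAY  = /Mon|Tue|Wed|Thu|Fri|Sat|Sun/

# day_range <- day "-" day (expanded here, wrapping around the week)
def parse_day_range(scanner)
  start = scanner.pos
  from = scanner.scan(DAY)
  to = from && scanner.scan(/-/) && scanner.scan(DAY)
  unless to
    scanner.pos = start
    return nil
  end
  i = DAYS.index(from)
  range = [from]
  until range.last == to
    i = (i + 1) % DAYS.size
    range << DAYS[i]
  end
  range
end

# days <- (day !"-" / day_range) ("/" days)?
def parse_days(scanner)
  start = scanner.pos
  day = scanner.scan(DAY)
  if day && !scanner.check(/-/) # check peeks without consuming: the lookahead
    days = [day]
  else
    scanner.pos = start         # first alternative failed, try day_range
    days = parse_day_range(scanner)
    return nil unless days
  end

  before_slash = scanner.pos
  if scanner.scan(%r{/})
    rest = parse_days(scanner)
    if rest
      days += rest
    else
      scanner.pos = before_slash # the optional group failed as a whole
    end
  end
  days
end

scanner = StringScanner.new("Fri/Sun-Tue")
p parse_days(scanner)

# Without the lookahead, a plain day match succeeds too early:
naive = StringScanner.new("Sun-Tue")
naive.scan(DAY)
p naive.rest # the leftover text that nothing else in the grammar accepts
```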
Enriching Nodes




Adding in some semantics
rule time
  hours "." minutes
end


irb> aTimeNode.text_value #=> "9.00"
irb> aTimeNode.elements.size #=> 3
irb> aTimeNode.hours.text_value #=> "9"
rule time
  hours "." minutes {
    def to_seconds
      # Syntax nodes expose their matched text via text_value
      hours.text_value.to_i * 60 * 60 + minutes.text_value.to_i * 60
    end
  }
end


irb> aTimeNode.text_value #=> "9.00"
irb> aTimeNode.to_seconds #=> 32400

We can add in methods inline in the grammar. This is just like a module scope, and we can
do any ruby we like in here.
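
The arithmetic in to_seconds can be checked in plain Ruby without a parser. In this sketch the string split stands in for the hours and minutes nodes, so the free-standing to_seconds method here is a hypothetical helper rather than the node method from the grammar.

```ruby
# Hypothetical stand-in: the text either side of "." plays the role of
# the hours and minutes nodes' text_value.
def to_seconds(text)
  hours, minutes = text.split(".")
  hours.to_i * 60 * 60 + minutes.to_i * 60
end

puts to_seconds("9.00")  # 9 * 3600 + 0 * 60
puts to_seconds("12.40") # 12 * 3600 + 40 * 60
```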
# in film_time.treetop
rule time
  hours "." minutes <TimeNode>
end

# in another .rb file
class TimeNode < Treetop::Runtime::SyntaxNode
  def to_seconds
    # Syntax nodes expose their matched text via text_value
    hours.text_value.to_i * 60 * 60 + minutes.text_value.to_i * 60
  end
end

Cleaner in my mind to split these out into actual subclasses of SyntaxNode - keeps the
grammar more readable. In some cases you need to have modules rather than subclasses.
Interpretation & Compilation

We’re going to build up a regular expression for the bot example. Each node will be
responsible for building a different part of the regexp.
expression

                                            optional part

                        words   variable   words   variable

                       whereis <person>    [on      <day>]
                        /^whereis (.+?)(?:\s+on (.+?))?$/
Interpreter Pattern




The name is confusing - it comes from the GoF (Gang of Four) book, but what we're actually doing
here is compilation. Each node gets an interpret method - you treat the syntax tree as a composite.
# expression
def interpret
  children = elements.map { |node| node.interpret }
  Regexp.compile("^" + children.join + "$")
end
# words
def interpret
  Regexp.escape(text_value)
end
# variable
def interpret
  "(.+?)"
end
# optional_part
def interpret
  children = elements.map { |node| node.interpret }
  # Single quotes keep \s as a regexp escape rather than a literal space
  '(?:\s+' + children.join + ')?'
end
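
Putting the four interpret methods together, here is a self-contained sketch that runs without Treetop. The Struct classes are hypothetical stand-ins for the syntax nodes, and the tree is built by hand rather than by a parser; only the interpret bodies follow the slides.

```ruby
# Hypothetical stand-ins for the four kinds of syntax node.
Words = Struct.new(:text_value) do
  def interpret
    Regexp.escape(text_value)
  end
end

Variable = Struct.new(:name) do
  def interpret
    '(.+?)'
  end
end

OptionalPart = Struct.new(:children) do
  def interpret
    '(?:\s+' + children.map(&:interpret).join + ')?'
  end
end

Expression = Struct.new(:children) do
  def interpret
    Regexp.compile('^' + children.map(&:interpret).join + '$')
  end
end

# Hand-built tree for: whereis <person> [on <day>]
tree = Expression.new([
  Words.new('whereis '),
  Variable.new('person'),
  OptionalPart.new([Words.new('on '), Variable.new('day')])
])

matcher = tree.interpret
match = matcher.match('whereis roland on tuesday')
puts match[1]
puts match[2]
```

Each node only knows how to produce its own fragment; the root joins the fragments and anchors the result.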




Adding context




For anything more than a simple language, you’ll need to pass around context as you
interpret the tree.
# expression
def interpret(context=[])
  children = elements.map do |node|
    node.interpret(context)
  end
  matcher = Regexp.new("^" + children.join + "$")
  ...

In our case we just want to record the list of variable names, so an Array will suffice. Each
interpret method now needs to take this context.
# variable
def interpret(context)
  context << identifier.text_value.to_sym
  "(.+?)"
end
# expression
def interpret(context=[])
  children = elements.map do |node|
    node.interpret(context)
  end
  matcher = Regexp.new("^" + children.join + "$")

  # class << matcher starts a new scope that cannot see the local
  # `context`, so define the singleton method with a block instead.
  matcher.define_singleton_method(:variables) do
    context
  end
  matcher
end

We decorate the regular expression with a list of the variables. In the real code, the returned
match objects are also decorated so you have methods for each variable and don’t have to
remember the captured groups by position.
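
As a rough illustration of why the variable list is worth keeping, here is a small plain-Ruby sketch that pairs a regexp with its variable names. It is not the real bot code: NamedMatcher is a hypothetical class, and it returns a Hash rather than decorating the match object.

```ruby
# Hypothetical wrapper pairing a compiled regexp with the variable names
# collected in the context while interpreting the tree.
class NamedMatcher
  attr_reader :regexp, :variables

  def initialize(regexp, variables)
    @regexp = regexp
    @variables = variables
  end

  # Returns a Hash of variable name => captured text, or nil on no match.
  def match(message)
    found = regexp.match(message)
    return nil unless found
    variables.zip(found.captures).to_h
  end
end

matcher = NamedMatcher.new(/^whereis\ (.+?)(?:\s+on\ (.+?))?$/, [:person, :day])
result = matcher.match('whereis roland on tuesday')
puts result[:person]
puts result[:day]
```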
Other Options




You can also build external interpreters / compilers that use the tree
Complications?



# We want to write:
hello [world]

# We actually mean:
hello[ world]

Whitespace shuffling. In the real code, the grammar is more complicated - most of the
complication comes from dealing with edge cases here.
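
A small plain-Ruby sketch of why the space has to move. The two regexps are written by hand to stand for the two readings, and shuffle_whitespace is a hypothetical helper that covers only the simplest case, not the edge cases the real grammar deals with.

```ruby
# Hypothetical helper: move whitespace that precedes "[" inside the brackets.
def shuffle_whitespace(pattern)
  pattern.gsub(/(\s+)\[/) { "[#{Regexp.last_match(1)}" }
end

puts shuffle_whitespace("hello [world]")

# Hand-written regexps standing in for the two readings of the pattern.
as_written = /^hello\ (?:world)?$/ # the space is mandatory
as_meant   = /^hello(?:\ world)?$/ # the space belongs to the optional part

p as_written.match("hello").nil?   # a bare "hello" is rejected
p as_meant.match("hello").nil?     # a bare "hello" is accepted
p as_meant.match("hello world").nil?
```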
# We should optimize:
hello [[[world]]]

# To this:
hello [world]

This isn’t done in the real code, but should be.
# Left recursion without consuming input BAD:
rule infinity_and_beyond
  infinity_and_beyond / "foo"
end
Problems?




Slow.
Other libraries




Racc - accepts yacc grammars. The Racc runtime is part of the Ruby standard distribution, so once
you’ve built your parser there is no extra dependency. Ragel - used by Mongrel and Thin.
Thanks!

         Twitter: @knaveofdiamonds

         XMPP bot:
         http://github.com/knaveofdiamonds/harken

         Film listings for London’s indie cinemas:
         http://filmli.st


         Treetop:
         http://github.com/nathansobo/treetop
         http://treetop.rubyforge.org



Tuesday, 19 May 2009

Weitere ähnliche Inhalte

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Empfohlen

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings

Treetop - I'd rather have one problem

  • 1. Some people, when faced with a problem think, “I know, I’ll use regular expressions”. Now they have two problems. I’d rather have one problem.
    Treetop • Roland Swingler • LRUG May 2009
    Tuesday, 19 May 2009
    This quotation is used a lot in presentations, normally before the presenter delves into some gnarly regexps. I’m looking for a better way.
  • 3. I run a film listing site: http://filmli.st. All the data is scraped from other sites - getting the data is easy with net/http or httparty or similar and then parsing the html with nokogiri or hpricot, but...
  • 4. <span> Fri/Sun-Tue 10.45 12.30 (Tue) 12.40 (not Tue) 4.00 7.00 9.30; Wed 3.00 7.30 9.00 </span>
    ... you still need to turn a text string like this into a list of Times so you can do interesting things with it. Regexps? No. That way lies madness.
  • 6. Chatroom bots need to be able to distinguish between messages that they should take actions on and those which they should ignore. How should we define what messages they should listen out for?
  • 7. /^\s*whereis\s+(.+?)(?:\s+(?:on\s+)?(.+?))?\s*$/
    Regular expressions? Pretty confusing.
  • 8. whereis <person> [[on] <day>]
    Much nicer to have a simpler language.
  • 10.
        Scenario: producing human-readable tests
          Given I have non-technical stakeholders
          When I write some integration tests
          Then they should be understandable by everyone
    Wouldn’t it be great if someone had written a library like this?
  • 11. They have! Cucumber. Cucumber’s implementation got me started looking into...
  • 12. Treetop. A ruby Parsing Expression Grammar. Basically a parser generator, but really simple.
  • 13. What is a parser?
    A parser determines whether strings are syntactically valid according to a set of rules known as a grammar.
  • 14. Yes / No
    From a theoretical viewpoint, parsers just say true or false, depending on whether the string is valid or not.
  • 15. Syntax Tree
    That’s not so useful, so instead we get back a syntax tree we can do useful things with.
  • 16. whereis <person> [on <day>]
    Let’s try building a tree for this example. You can consider a string to be a list of characters, but to start getting meaning from it, you need a tree.
  • 17. whereis <person> [on <day>] (diagram: “whereis” and “on” labelled as words)
    We have some words...
  • 18. whereis <person> [on <day>] (diagram adds: “<person>” and “<day>” labelled as variable)
    variables...
  • 19. whereis <person> [on <day>] (diagram adds: “[on <day>]” labelled as optional part)
    an optional part of an expression (enclosed in square brackets)
  • 20.
        expression
          words           whereis
          variable        <person>
          optional part   [on <day>]
            words         on
            variable      <day>
    and a root node for the whole expression
  • 21.
        grammar Message
        end
    Let’s build that up in treetop. Each of those four types of node in the tree is going to have a rule. We write these rules in a grammar - you can think of it like a ruby module.
  • 22.
        grammar Message
          rule expression
            (words / variable / optional_part)+
          end
        end
    The first rule is for the whole expression. Lots of things should be familiar from regular expressions - ‘+’ for one or more, brackets for grouping, and ‘/’ is like the regexp ‘|’ for alternation. So this says an expression is one or more words, variables or optional parts, in any order.
  • 23.
        grammar Message
          rule expression
            (words / variable / optional_part)+
          end

          rule words
            [^><\[\]]+
          end
        end
    words - character classes, just like regexps
  • 24.
        grammar Message
          rule expression
            (words / variable / optional_part)+
          end

          rule words
            [^><\[\]]+
          end

          rule variable
            '<' identifier:( [a-zA-Z_] [a-zA-Z_0-9]* ) '>'
          end
        end
    variables are enclosed in angle brackets, can be any valid ruby identifier string, and are labelled so we can use part of the text later.
  • 25.
        grammar Message
          rule expression
            (words / variable / optional_part)+
          end

          rule words
            [^><\[\]]+
          end

          rule variable
            '<' identifier:( [a-zA-Z_] [a-zA-Z_0-9]* ) '>'
          end

          rule optional_part
            "[" expression "]"
          end
        end
    optional parts are enclosed in square brackets. Here we see that rules can be recursive - which makes the parser significantly more powerful than regular expressions.
  • 26. $ tt message.treetop
    We compile the grammar with the tt command-line tool - you can also load grammars dynamically.
  • 27.
        require "message"
        parser = MessageParser.new
        tree = parser.parse("whereis <person>...")
    This gives us a parser we can call from ruby code.
  • 28.
        require "message"
        parser = MessageParser.new
        tree = parser.parse("whereis <person>...")
        tree.elements[0].text_value            #=> "whereis "
        tree.elements[1].identifier.text_value #=> "person"
    Each node knows about its children and its text_value. The label we defined earlier provides sugar methods to access particular subnodes.
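To make the shape of that tree concrete without depending on the gem, here is a minimal hand-written sketch of the same four rules in plain Ruby, using the standard library's StringScanner. It is an illustration of what a parser for this grammar has to do, not Treetop's generated code: the names `Node` and `MessagePatternParser` are hypothetical, and the node objects are simple structs rather than Treetop syntax nodes.

```ruby
require 'strscan'

# Hypothetical node type for this sketch (not Treetop's SyntaxNode).
Node = Struct.new(:type, :text, :children)

# Hand-written recursive descent over the four rules from the slides.
class MessagePatternParser
  def parse(input)
    scanner = StringScanner.new(input)
    tree = expression(scanner)
    # Only succeed if the whole input was consumed.
    tree && scanner.eos? ? tree : nil
  end

  private

  # expression <- (words / variable / optional_part)+
  def expression(scanner)
    children = []
    # Ordered choice: try each alternative in turn until none matches.
    while (child = words(scanner) || variable(scanner) || optional_part(scanner))
      children << child
    end
    return nil if children.empty?
    Node.new(:expression, children.map(&:text).join, children)
  end

  # words <- [^><\[\]]+
  def words(scanner)
    text = scanner.scan(/[^><\[\]]+/)
    text && Node.new(:words, text, [])
  end

  # variable <- '<' identifier '>'
  def variable(scanner)
    text = scanner.scan(/<[a-zA-Z_][a-zA-Z_0-9]*>/)
    text && Node.new(:variable, text, [])
  end

  # optional_part <- "[" expression "]"  (recursive)
  def optional_part(scanner)
    start = scanner.pos
    if scanner.scan(/\[/) && (inner = expression(scanner)) && scanner.scan(/\]/)
      Node.new(:optional_part, "[#{inner.text}]", [inner])
    else
      scanner.pos = start # backtrack when the alternative fails
      nil
    end
  end
end

tree = MessagePatternParser.new.parse("whereis <person> [on <day>]")
p tree.children.map(&:type) # => [:words, :variable, :words, :optional_part]
```

The extra `:words` node is the single space between `<person>` and `[`; unbalanced input such as `"whereis [on"` returns `nil` because the scanner cannot reach the end of the string.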
  • 29. Fri/Sun-Tue 4.00 7.00
    Another example. This time we’ll think about the tree in a top down fashion rather than bottom up. This is closer to how treetop will actually evaluate an expression.
  • 30. Fri/Sun-Tue 4.00 7.00 (tree diagram: expression)
  • 31. Fri/Sun-Tue 4.00 7.00 (tree diagram: expression; days, times)
  • 32. Fri / Sun-Tue 4.00 7.00 (tree diagram: expression; days, times; day, day range, time, time)
  • 33.
        expression
          days
            day           Fri
            day range     Sun - Tue
              day         Sun
              day         Tue
          times
            time          4.00
              hrs         4
              mins        00
            time          7.00
              hrs         7
              mins        00
  • 34.
        rule expression
          days " " times
        end
  • 35.
        rule times
          time (" " time)+
        end

        rule time
          hours "." minutes
        end

        rule hours
          "1" [0-2] / [0-9]
        end

        rule minutes
          [0-5] [0-9]
        end
  • 36.
        rule days
          (day !"-" / day_range) ("/" days)?
        end

        rule day_range
          day "-" day
        end

        rule day
          "Mon"/"Tue"/"Wed"/"Thu"/"Fri"/"Sat"/"Sun"
        end
    The !"-" after day (highlighted in red on the slide) is a negative lookahead assertion. We need this because treetop evaluates alternatives from left to right - if we didn’t have the assertion then Sun-Tue would match Sun as a Day, not a DayRange, and we’d be left with “-Tue” which isn’t valid.
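The effect of the lookahead can be reproduced in a few lines of plain Ruby. The sketch below hand-codes the three day rules with StringScanner and a `lookahead:` switch so the two behaviours can be compared; the method names and the switch are assumptions made for the illustration, and this is not how Treetop implements ordered choice internally.

```ruby
require 'strscan'

DAY = /Mon|Tue|Wed|Thu|Fri|Sat|Sun/

# day !"-"   (the lookahead is only applied when the switch is on)
def single_day(scanner, lookahead)
  start = scanner.pos
  day = scanner.scan(DAY)
  return nil unless day
  if lookahead && scanner.check(/-/)
    scanner.pos = start # a "-" follows, so this alternative fails
    return nil
  end
  day
end

# day_range <- day "-" day
def day_range(scanner)
  start = scanner.pos
  from = scanner.scan(DAY)
  if from && scanner.scan(/-/) && (to = scanner.scan(DAY))
    "#{from}-#{to}"
  else
    scanner.pos = start
    nil
  end
end

# days <- (day !"-" / day_range) ("/" days)?
def days(scanner, lookahead)
  # Ordered choice: the first alternative that succeeds wins.
  first = single_day(scanner, lookahead) || day_range(scanner)
  return nil unless first
  start = scanner.pos
  if scanner.scan(%r{/}) && (rest = days(scanner, lookahead))
    [first] + rest
  else
    scanner.pos = start
    [first]
  end
end

def parse_days(text, lookahead: true)
  scanner = StringScanner.new(text)
  result = days(scanner, lookahead)
  result && scanner.eos? ? result : nil
end

p parse_days("Fri/Sun-Tue")                   # => ["Fri", "Sun-Tue"]
p parse_days("Fri/Sun-Tue", lookahead: false) # => nil ("-Tue" is left over)
```

An alternative that avoids the lookahead altogether would be to try `day_range` before `day`, since the longer alternative then gets the first chance to match.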
  • 37. Enriching Nodes
    Adding in some semantics
  • 38.
        rule time
          hours "." minutes
        end

        irb> aTimeNode.text_value       #=> "9.00"
        irb> aTimeNode.elements.size    #=> 3
        irb> aTimeNode.hours.text_value #=> "9"
  • 39.
        rule time
          hours "." minutes {
            def to_seconds
              # hours and minutes are nodes, so convert their matched text
              hours.text_value.to_i * 60 * 60 + minutes.text_value.to_i * 60
            end
          }
        end

        irb> aTimeNode.text_value #=> "9.00"
        irb> aTimeNode.to_seconds #=> 32400
    We can add in methods inline in the grammar. This is just like a module scope, and we can do any ruby we like in here.
  • 40.
        # in film_time.treetop
        rule time
          hours "." minutes <TimeNode>
        end

        # in another .rb file
        class TimeNode < Treetop::Runtime::SyntaxNode
          def to_seconds
            hours.text_value.to_i * 60 * 60 + minutes.text_value.to_i * 60
          end
        end
    To my mind it is cleaner to split these out into actual subclasses of SyntaxNode - it keeps the grammar more readable. In some cases you need to have modules rather than subclasses.
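The arithmetic in to_seconds can be checked outside a grammar. The sketch below uses hypothetical stand-in classes (`FakeNode`, `FakeTimeNode`) that expose only `text_value`, `hours` and `minutes`; they mimic the accessors used on the slides but are not Treetop classes.

```ruby
# Hypothetical stand-in for a parsed child node: it only knows its text.
FakeNode = Struct.new(:text_value)

# Hypothetical stand-in for the node produced by the `time` rule.
class FakeTimeNode
  attr_reader :hours, :minutes

  def initialize(text)
    hours_text, minutes_text = text.split(".")
    @hours = FakeNode.new(hours_text)
    @minutes = FakeNode.new(minutes_text)
  end

  # Same arithmetic as the slide: seconds since the start of the day.
  def to_seconds
    hours.text_value.to_i * 60 * 60 + minutes.text_value.to_i * 60
  end
end

puts FakeTimeNode.new("9.00").to_seconds  # 9 * 3600           => 32400
puts FakeTimeNode.new("12.40").to_seconds # 12 * 3600 + 40 * 60 => 45600
```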
  • 41. Interpretation & Compilation
    We’re going to build up a regular expression for the bot example. Each node will be responsible for building a different part of the regexp.
  • 42-46. whereis <person> [on <day>] (tree diagram: expression; optional part; words, variable, words, variable)
    /^whereis (.+?)(?:\s+on (.+?))?$/
  • 47. Interpreter Pattern
    The name is confusing - it comes from the GoF book. Actually we’re doing compilation here. Each node gets an interpret method - you treat the syntax tree as a composite.
  • 48.
        # expression
        def interpret
          children = elements.map {|node| node.interpret }
          Regexp.compile("^" + children.join + "$")
        end
  • 49.
        # words
        def interpret
          Regexp.escape(text_value)
        end
  • 50.
        # variable
        def interpret
          "(.+?)"
        end
  • 51.
        # optional_part
        def interpret
          children = elements.map {|node| node.interpret }
          '(?:\s+' + children.join + ')?'
        end
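The four interpret methods can be tried end to end without a parser by building the tree by hand. In the sketch below the classes `Words`, `Variable`, `OptionalPart` and `Expression` are hypothetical plain-Ruby stand-ins for the syntax nodes, and the tree is written out with the whitespace already moved inside the optional part, which is an assumption about what the parsed tree would look like.

```ruby
# Hypothetical plain-Ruby stand-ins for the four node types.
class Words
  def initialize(text)
    @text = text
  end

  # Literal text is escaped so it matches itself.
  def interpret
    Regexp.escape(@text)
  end
end

class Variable
  def initialize(name)
    @name = name
  end

  # Every variable becomes a lazy capture group.
  def interpret
    '(.+?)'
  end
end

class OptionalPart
  def initialize(*children)
    @children = children
  end

  # Non-capturing optional group; single quotes keep \s as a regexp class.
  def interpret
    '(?:\s+' + @children.map(&:interpret).join + ')?'
  end
end

class Expression
  def initialize(*children)
    @children = children
  end

  # The root anchors the pattern and compiles the joined fragments.
  def interpret
    Regexp.compile('^' + @children.map(&:interpret).join + '$')
  end
end

tree = Expression.new(
  Words.new('whereis '),
  Variable.new('person'),
  OptionalPart.new(Words.new('on '), Variable.new('day'))
)
matcher = tree.interpret

p matcher.match('whereis roland on tuesday').captures # => ["roland", "tuesday"]
p matcher.match('whereis roland').captures            # => ["roland", nil]
```

Each node only knows how to produce its own fragment; the composite structure does the rest, which is why the tree can be walked with a single `interpret` call on the root.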
  • 52. Adding context
    For anything more than a simple language, you’ll need to pass around context as you interpret the tree.
  • 53.
        # expression
        def interpret(context=[])
          children = elements.map do |node|
            node.interpret(context)
          end
          matcher = Regexp.new("^" + children.join + "$")
          ...
    In our case we just want to record the list of variable names, so an Array will suffice. Each interpret method now needs to take this context.
  • 54.
        # variable
        def interpret(context)
          context << identifier.text_value.to_sym
          "(.+?)"
        end
  • 55.
        # expression
        def interpret(context=[])
          children = elements.map do |node|
            node.interpret(context)
          end
          matcher = Regexp.new("^" + children.join + "$")
          # define_singleton_method takes a block, so it can close over context
          matcher.define_singleton_method(:variables) { context }
          matcher
        end
    We decorate the regular expression with a list of the variables. In the real code, the returned match objects are also decorated so you have methods for each variable and don’t have to remember the captured groups by position.
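A self-contained sketch of the context-passing version follows. As before, the node classes are hypothetical plain-Ruby stand-ins and the tree is built by hand. The `capture_hash` helper is also an assumption: it pairs names with captures in a Hash, which is a simpler substitute for the per-variable methods the real code is said to add to match objects.

```ruby
# Hypothetical stand-ins for the node types; each interpret takes a context.
class Words
  def initialize(text)
    @text = text
  end

  def interpret(_context)
    Regexp.escape(@text)
  end
end

class Variable
  def initialize(name)
    @name = name
  end

  # Record the variable name in the shared context, in capture order.
  def interpret(context)
    context << @name.to_sym
    '(.+?)'
  end
end

class OptionalPart
  def initialize(*children)
    @children = children
  end

  def interpret(context)
    '(?:\s+' + @children.map { |child| child.interpret(context) }.join + ')?'
  end
end

class Expression
  def initialize(*children)
    @children = children
  end

  def interpret(context = [])
    source = @children.map { |child| child.interpret(context) }.join
    matcher = Regexp.new('^' + source + '$')
    # Decorate this one Regexp object with the collected names.
    matcher.define_singleton_method(:variables) { context }
    matcher
  end
end

# Hypothetical helper: pair each variable name with its captured text.
def capture_hash(matcher, message)
  match = matcher.match(message)
  match && matcher.variables.zip(match.captures).to_h
end

matcher = Expression.new(
  Words.new('whereis '),
  Variable.new('person'),
  OptionalPart.new(Words.new('on '), Variable.new('day'))
).interpret

p matcher.variables                                  # => [:person, :day]
p capture_hash(matcher, 'whereis roland on tuesday') # => {:person=>"roland", :day=>"tuesday"}
```

Because the names are appended in the same left-to-right order in which the capture groups are emitted, zipping the two lists is enough to line them up.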
  • 56. Other Options
    You can also build external interpreters / compilers that use the tree.
  • 58.
        # We want to write:
        hello [world]
        # We actually mean:
        hello[ world]
    Whitespace shuffling. In the real code, the grammar is more complicated - most of the complication comes from dealing with edge cases here.
  • 59.
        # We should optimize:
        hello [[[world]]]
        # To this:
        hello [world]
    This isn’t done in the real code, but should be.
  • 60.
        # Left recursion without consuming input
        # BAD:
        rule infinity_and_beyond
          infinity_and_beyond / "foo"
        end
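Why this rule is a problem is easiest to see by transcribing it into a recursive function. In the sketch below `left_recursive` mirrors the rule on the slide, with a depth limit standing in for the stack overflow a real run would end in. `right_recursive` is my own illustration of the usual fix for list-like rules (consume some input first, then recurse); it is not taken from the slides and it matches one or more "foo", which is not the same language as the slide's rule.

```ruby
require 'strscan'

# Transcription of: infinity_and_beyond <- infinity_and_beyond / "foo"
# The first alternative calls itself before any input is consumed, so the
# scanner never advances. The depth limit is an assumption of this sketch.
def left_recursive(scanner, depth = 0)
  raise 'no progress: recursing without consuming input' if depth > 50
  left_recursive(scanner, depth + 1) || scanner.scan(/foo/)
end

# Illustrative rewrite of a list rule: foos <- "foo" foos?
# Input is consumed before the recursive call, so every call makes progress.
def right_recursive(scanner)
  head = scanner.scan(/foo/)
  return nil unless head
  [head] + (right_recursive(scanner) || [])
end

begin
  left_recursive(StringScanner.new('foo'))
rescue RuntimeError => e
  puts e.message
end

p right_recursive(StringScanner.new('foofoofoo')) # => ["foo", "foo", "foo"]
```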
  • 62. Other libraries
    Racc - accepts yacc grammars. The Racc runtime is part of the ruby standard distribution, so once you’ve built your parser there is no extra dependency.
    Ragel - used by mongrel/thin.
  • 63. Thanks!
    Twitter: @knaveofdiamonds
    XMPP bot: http://github.com/knaveofdiamonds/harken
    Film listings for London’s indie cinemas: http://filmli.st
    Treetop: http://github.com/nathansobo/treetop http://treetop.rubyforge.org