The document discusses parsing and Scala parser combinators. It provides an example of using parser combinators to define a parser that parses a line of text into a WordFreq case class with a string field and an integer field. The parser combinators approach lets you define small parsing functions and combine them to parse more complex structures, yielding parsers that are robust yet easier to write than alternatives such as hand-written parsers.
2. Why? How?
- DSLs Everywhere
- Also parsing in general
- Internal vs. External
Lots of options for parsing
- String.split
- RegEx
- Hand-Written
- Parser Generators
From simple and fragile to
robust and complex
Goal: Robust, but easier to create (and in Scala…)
3. Parser Combinators
The “functional way” of writing a parser
A parsing function:
Character Stream → Result[Character Stream, T]
Combine functions into more complicated patterns
- Sequence
- Choice
- ...
Not only in Scala; but we’ll focus on the standard Scala Parser Combinators.
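To make the function type above concrete, here is a minimal hand-rolled sketch (not the actual Scala Parser Combinators API, and all names here are invented for illustration): a parser is a function from the remaining input to an optional result plus the rest of the input, and sequence and choice are ordinary higher-order functions over such parsers.

```scala
object MiniCombinators {
  // A parser consumes a prefix of the input and returns the parsed
  // value plus the remaining input, or None on failure.
  type P[T] = String => Option[(T, String)]

  // Match a literal string at the start of the input.
  def lit(s: String): P[String] =
    in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None

  // Sequence: run p, then run q on the leftover input.
  def seq[A, B](p: P[A], q: P[B]): P[(A, B)] =
    in => p(in).flatMap { case (a, rest) =>
      q(rest).map { case (b, rest2) => ((a, b), rest2) }
    }

  // Choice: try p; if it fails, try q on the same input.
  def choice[A](p: P[A], q: P[A]): P[A] =
    in => p(in).orElse(q(in))

  def main(args: Array[String]): Unit = {
    val ab = seq(lit("a"), lit("b"))
    println(ab("abc"))    // Some(((a,b),c))
    val aOrB = choice(lit("a"), lit("b"))
    println(aOrB("bz"))   // Some((b,z))
  }
}
```

The real library wraps the same idea in a `Parser[T]` class with operators like `~` (sequence) and `|` (choice), plus error reporting instead of a bare `Option`.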
4. Scala Parser Combinators - Simple Example
Input: A line consisting of a word (string) and its count (number), e.g. “johnny 12”
case class WordFreq(word: String, count: Int) {
override def toString = "Word <" + word + "> " +
"occurs with frequency " + count
}
And we’d like to use it in a program:
object TestSimpleParser extends SimpleParser {
def main(args: Array[String]): Unit = {
parse(freq, "johnny 12") match {
case Success(matched,_) => println(matched)
case Failure(msg,_) => println("FAILURE: " + msg)
case Error(msg,_) => println("ERROR: " + msg)
}
}
}
5. Scala Parser Combinators - Simple Example
Then a possible parser is:
class SimpleParser extends RegexParsers {
def word: Parser[String] = """[a-z]+""".r ^^ { _.toString }
def number: Parser[Int] = """(0|[1-9]\d*)""".r ^^ { _.toInt }
def freq: Parser[WordFreq] = word ~ number ^^ { case wd ~ fr => WordFreq(wd,fr) }
}
The basic pattern:
- Defining functions for parsing simple strings
- Map matched strings into more meaningful object model
- Combine results into more complex structures
6. Actimize Profiling Language
Context:
- A data profiling engine
- Aggregations, functions, metadata
- Highly customizable by clients,
professional services, etc.
- Existing interface: XML
configuration files
10. Summary
Where? Parsing – DSLs and Beyond
Why? Building a robust parser is complicated
How? Scala Parser Combinators
Editor's Notes
Whether we know it or not, we use DSLs in all sorts of places. For example, the SBT tool has a DSL for the domain of building code, rule engines often have DSLs, etc.
This is not so much the topic of this talk, but I am pointing it out to clarify the motivation.
Also, parsing in general is useful beyond the context of some user-facing DSL. We sometimes need to parse messages, for example, that are not necessarily in some well known format.
One prominent example of this that I know of is parsing financial protocols, but there are more.
One important distinction we need to make in this context is the distinction between Internal and External DSLs.
Internal DSLs are basically DSLs that are defined in the syntax of some host language. Meaning: syntactically, they are a subset of some other, usually more general purpose language.
The language used to define build in SBT is one such example.
Ruby and Scala are popular choices as hosts for such languages, given their flexible syntax.
External DSLs are languages defined in a way that’s decoupled from any host language - the syntax is usually defined from scratch, and needs to be parsed on its own.
The pros and cons of each approach are a debate in their own right, which we don't have time to dive into right now; I'm sure you can think of advantages to each.
In this talk I'm focusing on external DSLs - those that require specialized parsing.
----
Parsing is of course the problem of turning a stream of characters into something more meaningful to the program at hand.
We have all sorts of ways to parse strings, as you can see here (not an exhaustive list).
Some of these ways are fairly simple to write, but then less robust.
The more robust options, e.g. ANTLR or hand-written parsers, are significantly more complex to write and/or integrate with our system.
Note: this isn't to say these methods can't or shouldn't be used - they are all good in some circumstances.
What I would like to present here is another way to do this, one that's naturally available in Scala as well, and that I believe provides a fair tradeoff between robustness and complexity.
The idea of parser combinators isn’t actually very new or unique to Scala, or very complicated.
The idea is fairly simple: take a function that identifies a certain string (parses it), and combine it with other functions for other strings, to create a more complete parser.
As I said, this isn't unique to Scala, but it was part of the Scala standard library until 2.11, when it was moved into a separate module. Here we'll focus on this library.
This is a very simple example.
Our task here is basically to parse a file of simple one-line entries, where each line is a word and a number - the word and how many times it appears somewhere.
We have a simple model class here - WordFreq. Basically a tuple of the word and its count.
And we see here how this parser is used - that’s the bold part here.
The “matched” variable here is bound to an instance of “WordFreq”.
And this is how the parser is actually implemented using parser combinators.
Each parser function is defined as a function in the parser class.
We define here two simple functions, parsing a word and a number using simple regular expressions.
The third method - 'freq' - is created using a sequence combinator, essentially building a new parser function out of the other two, matching when their patterns appear in sequence in the input.
Note that the output type of that method is in fact the model class we defined earlier.
This very simple parser already illustrates the basic pattern:
Defining functions for parsing strings
Map the matched strings into a more meaningful object model (String, Int, WordFreq in our case).
Combine the results of each function into more complex structures.
Note how in this case the ‘fr’ is in fact of type Int - the result of parsing, as defined by the ‘number’ function.
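The talk's example parses a single line, but the task is a whole file of such lines. The library covers this with a repetition combinator (`rep`); as a stdlib-only sketch of the same idea (the `WordFreqLines` object and its helpers are invented here, not the talk's code), we can apply the one-line rule repeatedly and collect typed results:

```scala
object WordFreqLines {
  case class WordFreq(word: String, count: Int)

  // One line: a lowercase word, whitespace, then a number
  // (same shape as the talk's 'word' and 'number' regexes).
  private val Line = """([a-z]+)\s+(0|[1-9]\d*)""".r

  // Repetition in miniature: apply the single-line rule to every
  // line, collecting WordFreq values; a malformed line yields a Left.
  def parseAll(input: String): Either[String, List[WordFreq]] = {
    val results = input.linesIterator.filter(_.nonEmpty).map {
      case Line(w, n) => Right(WordFreq(w, n.toInt))
      case bad        => Left(s"malformed line: $bad")
    }.toList
    results.collectFirst { case Left(msg) => msg }
      .toLeft(results.collect { case Right(wf) => wf })
  }

  def main(args: Array[String]): Unit =
    println(parseAll("johnny 12\nmary 3"))
}
```

With the actual library, the equivalent would be roughly `def freqs: Parser[List[WordFreq]] = rep(freq)`, keeping the typed-result property: each combinator's output feeds the next as real model objects, not strings.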
And now to a more interesting case.
Just to set the context: at Actimize we deal with a lot of data profiling.
As a result, we developed a fairly robust profiling engine that allows us to express rather complicated profiles, including customization by clients, etc.
The engine works pretty well. The problem is that its interface isn’t great - it’s basically huge XML files.
We wanted to achieve something like this, where in ~32 lines of code we define the same profile, also in a way that’s a lot more convenient to read and write.
And this is an example of the model classes used in defining the profile.
This should be the result of the parsing, like the ‘WordFreq’ class in the previous example. And from these we can do more interesting stuff, for example generating the necessary SQL statements.
This has classes for the different metadata elements, defining mappings, filters, etc.
In our initial implementation we simply generate the same XML file and let the existing engine work as-is, but we can of course skip that step later.
And this is the actual parse code.
The whole parser is roughly 200 lines of code.
We can see here that it’s a simple class, extending one of the Scala Parser Combinators classes, and adding the definitions for the different parsers.
We start with simple keyword definitions, but then move on to combine them into complete statements, and map them into the concrete parsed results.
Given this, the compiler implementation is fairly straightforward - just serialize these classes into XML; in this case I used JAXB with an existing schema to generate the XML.
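The talk's compiler used JAXB against the existing schema, which isn't shown here. As a hypothetical illustration of the "parse to model classes, then serialize" step, here is a toy model and a string-building serializer (all class names, fields, and the XML shape are invented, not Actimize's actual schema):

```scala
object ProfileXml {
  // Hypothetical, simplified model classes standing in for the
  // talk's real profile model produced by the parser.
  case class Aggregation(name: String, function: String, field: String)
  case class Profile(name: String, aggregations: List[Aggregation])

  // A minimal "compiler": serialize the parsed model back to XML.
  // (The real implementation used JAXB with an existing schema.)
  def toXml(p: Profile): String = {
    val aggs = p.aggregations.map { a =>
      s"""  <aggregation name="${a.name}" function="${a.function}" field="${a.field}"/>"""
    }.mkString("\n")
    s"""<profile name="${p.name}">\n$aggs\n</profile>"""
  }

  def main(args: Array[String]): Unit =
    println(toXml(Profile("txnStats", List(Aggregation("total", "sum", "amount")))))
}
```

Because the parser already yields typed model objects, swapping this XML back-end for direct SQL generation (or anything else) would not touch the parser at all.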