Matt Ellis

@citizenmatt
How to parse a file
DON’T
@citizenmatt
Why would we write a parser?
• Speed, efficiency
• Reduce dependencies
• Custom or simple formats
• Things that aren’t files - DSLs

Command line options, HTTP headers, stdout, natural language commands

E.g. YouTrack queries
• When we’re just as interested in the structure of a file as its contents
Matt Ellis
Developer advocate

JetBrains

@citizenmatt
JetBrains IDE architecture (kinda)
[Diagram: layers labelled PSI, Features, Project Model and Base Platform]
@citizenmatt
Unity and ShaderLab
@citizenmatt
What are we trying to build?
@citizenmatt
How to parse a file for an IDE
@citizenmatt
Hand rolled parser
var c = ReadChar();
switch (c) {
  case 's':
    c = ReadChar();
    switch (c) {
      case 'h':
        // Parse rest of "Shader", then sub-elements, …
        // Create syntax tree node(s) …
        break;
      default:
        SyntaxError();
        break;
    }
    break;
  case 'p':
    // Parse rest of "Properties", then sub-elements, …
    // Create syntax tree node(s) …
    break;
}
@citizenmatt
Compiler pipeline
Front end: Lexical analysis → Syntactic analysis → Semantic analysis
Back end: Code optimisation → Code generation
@citizenmatt
IDE pipeline
Lexical analysis → Syntactic analysis → Semantic analysis
@citizenmatt
IDE pipeline
Lexer → Parser → Program structure
@citizenmatt
Lexers
@citizenmatt
What is a lexer (aka scanner)?
• Performs lexical analysis

Lexical - relating to the words or vocabulary of a language
• Converts a string into a stream of tokens

Identifier, comment, string literal, braces, parentheses, whitespace, etc.
• Tokens are lightweight - typically integer values

(ReSharper uses singleton object instances)
• Parser pattern matches over tokens

Integer or object reference comparisons
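A minimal sketch of the idea in Python, assuming a handful of token names modelled on the ones in this deck (the rule set is illustrative, not ReSharper's real lexer). Python's `re` picks the first alternative that matches rather than the longest one, so the keyword rule is listed before the identifier rule.

```python
import re

# Hypothetical token rules; order matters because the first matching alternative wins.
TOKEN_SPEC = [
    ("END_OF_LINE_COMMENT", r"//[^\r\n]*"),
    ("NEW_LINE",            r"\r\n|\n|\r"),
    ("WHITESPACE",          r"[ \t]+"),
    ("STRING_LITERAL",      r'"[^"\r\n]*"'),
    ("SHADER_KEYWORD",      r"Shader\b"),
    ("IDENTIFIER",          r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LBRACE",              r"\{"),
    ("RBRACE",              r"\}"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(text):
    """Yield (offset, token_type, token_text) for every token, whitespace included."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Unknown character: emit it as its own token and keep going.
            yield pos, "BAD_CHARACTER", text[pos]
            pos += 1
        else:
            yield pos, m.lastgroup, m.group()
            pos = m.end()

tokens = list(lex('Shader "MyShader"\r\n{\r\n}'))
```

Every character of the input ends up in exactly one token, which is what later lets an IDE map tokens back to editor offsets.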
@citizenmatt
Lexer output
// Colored vertex lighting
Shader "MyShader"
{
  // a single color property
  Properties {
    _Color ("Main Color", Color) = (1, .5,.5,1)
  }
  // define one subshader
  SubShader
  {
    // a single pass in our subshader
    Pass
    {
      Material
      {
        Diffuse [_Color]
      }
      Lighting On
    }
  }
}

0000: END_OF_LINE_COMMENT '// Colored vertex lighting'
0026: NEW_LINE '\r\n'
0028: SHADER_KEYWORD 'Shader'
0034: WHITESPACE ' '
0035: STRING_LITERAL '"MyShader"'
0045: NEW_LINE '\r\n'
0047: LBRACE '{'
0048: NEW_LINE '\r\n'
0050: WHITESPACE '  '
0052: END_OF_LINE_COMMENT '// a single color property'
0078: NEW_LINE '\r\n'
0080: WHITESPACE '  '
0082: PROPERTIES_KEYWORD 'Properties'
0092: WHITESPACE ' '
0093: LBRACE '{'
0094: NEW_LINE '\r\n'
0096: WHITESPACE '    '
0100: IDENTIFIER '_Color'
0106: WHITESPACE ' '
0107: LPAREN '('
0108: STRING_LITERAL '"Main Color"'
0120: COMMA ','
0121: WHITESPACE ' '
0122: COLOR_KEYWORD 'Color'
0127: RPAREN ')'
0128: WHITESPACE ' '
0129: EQUALS '='
0130: WHITESPACE ' '
0131: LPAREN '('
…
@citizenmatt
Lexers are a solved problem
Use a lexer generator

lex (1975), flex, CsLex, FsLex, JFlex, etc.
@citizenmatt
Anatomy of a lexer input file
User code (e.g. using directives)
%%	
directives

set up namespaces, class names, interfaces

declare regex macros

declare states
%%	
rules and actions

<state> rule { action }
@citizenmatt
ShaderLab lexer
Demo
@citizenmatt
How does it work?
• Lexer generates source code
• Rules (regexes) converted into single Finite State Machine

All regexes combined, matched at same time
• Encoded in state transition tables
• Lookup based on state and input char
• Very fast
• Not very maintainable

Seriously
@citizenmatt
a(b|c)d*e+
Pete Jinks - http://www.cs.man.ac.uk/~pjj/cs211/ho/node6.html
@citizenmatt
Rule: a(b|c)d*e+
State  ‘a’   ‘b’   ‘c’   ‘d’   ‘e’   other
0      m(1)  E     E     E     E     E
1      E     m(2)  m(2)  E     E     E
2      E     E     E     m(2)  m(3)  E
3      a     a     a     a     m(3)  a
m(x) - match, move to state x
a - accept
E - error
Pete Jinks - http://www.cs.man.ac.uk/~pjj/cs211/ho/node6.html
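The table above can be walked with a few lines of code. This is a hand-written Python sketch of the lookup "based on state and input char", not the output of any real lexer generator; the function name `match_length` is made up for illustration.

```python
# Transition table for the rule a(b|c)d*e+ ; a missing entry means "error".
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 2, "c": 2},
    2: {"d": 2, "e": 3},
    3: {"e": 3},
}
ACCEPTING = {3}

def match_length(text, start=0):
    """Return the length of the longest match beginning at `start`, or None."""
    state, pos, last_accept = 0, start, None
    while pos < len(text):
        next_state = TRANSITIONS[state].get(text[pos])
        if next_state is None:
            break                      # no transition: stop and report what we had
        state, pos = next_state, pos + 1
        if state in ACCEPTING:
            last_accept = pos - start  # remember the longest accepted prefix
    return last_accept
```

For example, `match_length("acex")` stops at the `x` and reports a match of length 3, while `match_length("abd")` never reaches the accepting state and returns `None`.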
@citizenmatt
It gets better
Rules: a(b|c)d*e+ and [0-9]+
State  ‘a’   ‘b’   ‘c’   ‘d’   ‘e’   [0-9]  other
0      m(1)  E     E     E     E     m(4)   E
1      E     m(2)  m(2)  E     E     E      E
2      E     E     E     m(2)  m(3)  E      E
3      a     a     a     a     m(3)  a      a
4      a     a     a     a     a     m(4)   a
@citizenmatt
Parsing
@citizenmatt
What is a parser?
• Performs syntactic analysis

Verifies and matches syntax of a file
• Pattern matching on stream of tokens from lexer

Can look at token offsets and text, too
• Syntax is described by a grammar
• Grammar is represented as a recursive hierarchy of rules

Top level is the whole file, composing down to structures and tokens
@citizenmatt
Example grammar
shaderFile:
  SHADER_KEYWORD
  STRING_LITERAL
  LBRACE
  propertiesBlock?
  tagsBlock?
  …
  RBRACE
;
propertiesBlock:
  PROPERTIES_KEYWORD
  LBRACE
  property*
  RBRACE
;
tagsBlock:
  TAGS_KEYWORD
  LBRACE
  tag*
  RBRACE
;

Shader "MyShader"
{
  Properties { … }
  Tags { … }
  …
}
@citizenmatt
Parsing is NOT a solved problem
Well, it is, kinda. There are just lots of solutions
@citizenmatt
Types of parsers
• Top down/recursive descent

Match the root of the tree, recursively split up into child elements
• Bottom up/recursive ascent

Start with matching the leaves of the tree, combine into larger
constructs as you go
@citizenmatt
Top down parser
parseShaderLabFile()
  parseShaderCommand()
    match(SHADER_KEYWORD)
    parseShaderValue()
      parseShaderValueName()
        match(STRING_LITERAL)
      match(LBRACE)
      if (tokenType == PROPERTIES_KEYWORD)
        parsePropertiesCommand()
      …
      match(RBRACE)
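The call tree above maps almost one-to-one onto code. Here is a rough Python sketch, assuming the input is a list of `(token_type, text)` pairs with whitespace already removed. The class and method names are invented for illustration, and block contents (`property*`, `tag*`) are deliberately left out.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (token_type, text), whitespace already filtered
        self.pos = 0

    def token_type(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def match(self, expected):
        """Consume the current token if it has the expected type, else fail."""
        if self.token_type() != expected:
            raise ParseError(f"expected {expected}, got {self.token_type()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return (expected, text)

    def parse_shader_file(self):
        children = [self.match("SHADER_KEYWORD"),
                    self.match("STRING_LITERAL"),
                    self.match("LBRACE")]
        # Optional rules: look at the current token to decide which rule to enter.
        if self.token_type() == "PROPERTIES_KEYWORD":
            children.append(self.parse_block("propertiesBlock", "PROPERTIES_KEYWORD"))
        if self.token_type() == "TAGS_KEYWORD":
            children.append(self.parse_block("tagsBlock", "TAGS_KEYWORD"))
        children.append(self.match("RBRACE"))
        return ("shaderFile", children)

    def parse_block(self, rule_name, keyword):
        # Simplification: an empty block only; real contents are not modelled here.
        return (rule_name, [self.match(keyword), self.match("LBRACE"), self.match("RBRACE")])
```

Each grammar rule becomes one method, and each method returns a tree node, which is why this style is described as mechanical to build and easy to step through in a debugger.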
@citizenmatt
Bottom up parser
Shift/Reduce algorithm
Match token

Shift token onto stack (e.g. INTEGER, OP_PLUS, INTEGER)

Reduce larger construct (e.g. INTEGER + INTEGER becomes EXPRESSION)
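A toy version of shift/reduce in Python. It assumes a two-rule grammar chosen for this sketch (an INTEGER is first reduced to an EXPRESSION on its own), and it decides when to reduce by simply comparing the top of the stack against each rule. Generated bottom up parsers make that decision from precomputed tables instead.

```python
# Grammar assumed for this sketch:
#   EXPRESSION -> EXPRESSION OP_PLUS EXPRESSION | INTEGER
RULES = [
    (("EXPRESSION", "OP_PLUS", "EXPRESSION"), "EXPRESSION"),
    (("INTEGER",), "EXPRESSION"),
]

def shift_reduce(token_types):
    """Return (final stack, trace of shift/reduce steps)."""
    stack, trace = [], []
    for token in token_types:
        stack.append(token)                      # shift
        trace.append(("shift", tuple(stack)))
        reduced = True
        while reduced:                           # reduce while a rule matches the stack top
            reduced = False
            for rhs, lhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(("reduce", tuple(stack)))
                    reduced = True
                    break
    return stack, trace

stack, trace = shift_reduce(["INTEGER", "OP_PLUS", "INTEGER"])
```

A successful parse leaves a single EXPRESSION on the stack; anything else left over indicates a syntax error.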
@citizenmatt
Building a parser
• Hand rolled

Mechanical process to build. Easy to understand

Usually top down/recursive descent

Can use grammar to build syntax tree classes
• Parser generators

yacc/bison, ANTLR, etc.

Usually bottom up. Can be hard to debug - table driven
• ReSharper mostly uses top-down procedural parsers

Generated and hand rolled

Mainly historical. Easier to maintain, easier error recovery, etc.
@citizenmatt
Parser combinators
• Build a parser by combining other, simpler parsers
• Monads!

Think linq - similar idea, similar ease of use, similar cost
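To make "combining simpler parsers" concrete, here is a tiny combinator kit in Python. It is a sketch written for this deck rather than a port of any particular library: a parser is just a function from `(text, pos)` to `(value, new_pos)` or `None`, and the helper names (`either`, `sequence`, `mapped`, `regex`) are made up.

```python
import re

def pstring(expected):
    def parse(text, pos):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parse

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse

def either(*parsers):
    """Try each parser in turn at the same position; first success wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def sequence(*parsers):
    """Run parsers one after another, collecting their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def mapped(parser, fn):
    """Transform the value of a successful parse."""
    def parse(text, pos):
        result = parser(text, pos)
        return None if result is None else (fn(result[0]), result[1])
    return parse

spaces1 = regex(r"\s+")
number = mapped(regex(r"\d+"), int)
forward = mapped(sequence(either(pstring("forward"), pstring("fd")), spaces1, number),
                 lambda v: ("Forward", v[2]))
left = mapped(sequence(either(pstring("left"), pstring("lt")), spaces1, number),
              lambda v: ("Left", v[2]))
command = either(forward, left)
```

The cost hinted at above shows up here too: every combinator allocates closures and tuples, which is convenient but slower than a table-driven or hand rolled parser.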
@citizenmatt
FParsec for F#
// pstring - parse a string
// pfloat - parse a float
// spaces1 - parse one or more whitespace chars

let pforward = (pstring "fd" <|> pstring "forward") >>. spaces1 >>. pfloat
               |>> fun n -> Forward(int n)
let pleft = (pstring "left" <|> pstring "lt") >>. spaces1 >>. pfloat
            |>> fun x -> Left(int -x)
let pright = (pstring "right" <|> pstring "rt") >>. spaces1 >>. pfloat
             |>> fun x -> Right(int x)
let pcommand = pforward <|> pleft <|> pright
Phil Trelford - http://trelford.com/blog/post/FParsec.aspx
@citizenmatt
Sprache for C#
Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());
var id = identifier.Parse(" abc123  ");
Assert.AreEqual("abc123", id);
@citizenmatt
Problem #1
Whitespace and comments
@citizenmatt
We’d expect this to work:
shaderBlock:
  SHADER_KEYWORD
  STRING_LITERAL
  LBRACE
  …
  RBRACE
;

Shader "MyShader"
{
  …
}
@citizenmatt
But this is the actual input…
Shader\r\n
··········"MyShader"\r\n
·······\n
/* Cool shader! */\n
{···…········}\r\n
@citizenmatt
Which lexes as…
SHADER_KEYWORD
NEW_LINE
WHITESPACE
STRING_LITERAL
NEW_LINE
WHITESPACE
NEW_LINE
COMMENT
NEW_LINE
LBRACE
WHITESPACE
…
WHITESPACE
RBRACE

Shader\r\n
··········"MyShader"\r\n
·······\n
/* Cool shader! */\n
{···…········}\r\n
@citizenmatt
Which doesn’t match the grammar
shaderBlock:
  SHADER_KEYWORD
  STRING_LITERAL
  LBRACE
  …
  RBRACE
;

SHADER_KEYWORD
NEW_LINE
WHITESPACE
STRING_LITERAL
NEW_LINE
WHITESPACE
NEW_LINE
COMMENT
NEW_LINE
LBRACE
WHITESPACE
…
WHITESPACE
RBRACE
@citizenmatt
Filtering lexers
• Filter whitespace and comments from the stream of tokens

ReSharper’s tokens have IsFiltered property
• Decorator pattern

Wrap original lexer, swallow filtered tokens
Lexer → Filtering lexer → Parser → Program structure
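A filtering lexer is small enough to sketch in full. This Python version assumes tokens are `(token_type, text)` pairs and that the set of filtered types is known up front; ReSharper asks each token via its IsFiltered property instead, which is not modelled here.

```python
FILTERED = {"WHITESPACE", "NEW_LINE", "COMMENT"}

class FilteringLexer:
    """Decorator: wraps any iterable of (token_type, text) and swallows filtered tokens."""

    def __init__(self, inner, filtered=FILTERED):
        self.inner = inner
        self.filtered = filtered

    def __iter__(self):
        # The parser iterates over this object exactly as it would over the raw lexer.
        return (token for token in self.inner if token[0] not in self.filtered)

raw = [("SHADER_KEYWORD", "Shader"), ("NEW_LINE", "\r\n"), ("WHITESPACE", "  "),
       ("STRING_LITERAL", '"MyShader"'), ("COMMENT", "/* Cool shader! */"),
       ("LBRACE", "{"), ("WHITESPACE", " "), ("RBRACE", "}")]
significant = [token_type for token_type, _ in FilteringLexer(raw)]
```

Because the wrapper exposes the same interface as the lexer it wraps, the parser never needs to know that filtering is happening.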
@citizenmatt
What are we building?
Is it safe to lose the whitespace?
@citizenmatt
IDE requirements, Part 1
• Code editor features

Syntax highlighting, code folding, etc.
• Syntax error highlighting
• Inspections
• Refactoring
• Formatting
• Etc.
@citizenmatt
IDE requirements, Part 1
• Need to work with the contents and structure of a file
• Contents give us semantic information
• Structure allows us to report inspections, refactor, etc.

Map the semantics back to the file
• Need to represent the structure of the file
• Syntax tree is obvious choice

Inspections walk the tree, refactorings rewrite the tree
@citizenmatt
Abstract Syntax Trees
[Diagram: abstract syntax trees for addition expressions, with + nodes over integer operands]
@citizenmatt
Concrete Parse Trees
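[Diagram: concrete parse tree for an addition expression that also keeps whitespace (WS), new line (NL) and comment (// …) tokens as nodes]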
@citizenmatt
Side problem #1
No guidance for designing parse trees!
@citizenmatt
Back to Filtering Lexers
• If we filter tokens out, we have to add them back again
• We need a Missing Tokens Inserter to add whitespace and comments back into parse tree
Lexer → Filtering lexer → Parser → Missing tokens inserter → Concrete parse tree
@citizenmatt
Missing Tokens Inserter
• Walk leaf elements of tree

Tokens
• Advances (cached) lexer for each leaf element
• Check current lexer token has same offset as leaf
element
• If not, create leaf element and insert into tree
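The steps above can be sketched in Python as follows. The tree shape is an assumption made for this example: leaves are dicts with `type`, `offset` and `text`, inner nodes also carry `children`, and `all_tokens` is the cached, unfiltered token list. Where a missing token lands (inside or outside a neighbouring node) is a design choice; this sketch attaches it just before the next leaf.

```python
def insert_missing_tokens(node, all_tokens, index=0):
    """Re-insert filtered tokens below `node` in place; return the next unread token index."""
    new_children = []
    for child in node["children"]:
        if "children" in child:
            # Composite node: recurse, continuing from the current lexer position.
            index = insert_missing_tokens(child, all_tokens, index)
        else:
            # Leaf: any cached token before this leaf's offset was filtered out.
            while all_tokens[index]["offset"] != child["offset"]:
                new_children.append(dict(all_tokens[index]))
                index += 1
            index += 1  # step over the cached token that corresponds to this leaf
        new_children.append(child)
    node["children"] = new_children
    return index

def reinsert(root, all_tokens):
    index = insert_missing_tokens(root, all_tokens)
    # Tokens after the last leaf (e.g. a trailing new line) go at the end of the root.
    root["children"].extend(dict(token) for token in all_tokens[index:])
    return root

def leaf_text(node):
    """Concatenate the text of all leaves; after re-insertion this is the original source."""
    if "children" not in node:
        return node["text"]
    return "".join(leaf_text(child) for child in node["children"])
```

A useful sanity check is that the text of all leaves, read left to right, reproduces the source file exactly once the inserter has run.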
@citizenmatt
Problem #2
What about significant whitespace?
@citizenmatt
How do we parse this?
There are no end of scope markers!

And we’ve filtered out the whitespace!
let ArraySample() =
  let numLetters = 26
  let results = Array.create numLetters 0
  let data = "The quick brown fox"
  for i = 0 to data.Length - 1 do
    let c = data.Chars(i)
    let c = Char.ToUpper(c)
    if c >= 'A' && c <= 'Z' then
      let i = Char.code c - Char.code 'A'
      results.[i] <- results.[i] + 1
  printf "done!\n"
@citizenmatt
Insert zero-width tokens
• Another lexer decorator
• Keeps track of whitespace before it’s filtered
• Inserts “invisible” tokens into token stream

indicating indent/outdent or block start/end

Possibly also token to indicate invalid indentation
• Token is zero-width. Doesn’t affect parse tree
• Parser can match these invisible tokens in grammar
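A possible shape for such a decorator, sketched in Python. The token names INDENT and DEDENT are assumptions, tokens are `(token_type, text)` pairs, and invalid indentation (dedenting to a level that was never opened) is not reported in this sketch.

```python
def with_indent_tokens(tokens):
    """Yield all tokens, adding zero-width INDENT/DEDENT tokens when indentation changes."""
    indents = [0]                      # stack of indentation widths of the open blocks
    at_line_start, width = True, 0
    for token_type, text in tokens:
        if token_type == "NEW_LINE":
            at_line_start, width = True, 0
        elif at_line_start and token_type == "WHITESPACE":
            width = len(text)          # remember leading whitespace before it is filtered
        elif at_line_start:
            # First significant token on the line: compare with the enclosing block.
            if width > indents[-1]:
                indents.append(width)
                yield ("INDENT", "")
            while width < indents[-1]:
                indents.pop()
                yield ("DEDENT", "")
            at_line_start = False
        yield (token_type, text)
    for _ in indents[1:]:              # close any blocks still open at end of file
        yield ("DEDENT", "")
```

The whitespace tokens still pass through unchanged, so a filtering lexer can be stacked on top and the parser sees only the significant tokens plus the invisible block markers.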
@citizenmatt
Lexer flexibility
It’s just nice to say
@citizenmatt
Altering tokens
• F# example: 2. and [2..0] ambiguous
• Original lexer matches 2. as FLOAT 

and 2.. as INT_DOT_DOT
• Another lexer decorator

Augment generated rules with custom code
• Decorator recognises INT_DOT_DOT 

Splits into two tokens for parser
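As an illustration, the splitting decorator could look like this in Python. INT_DOT_DOT comes from the slide; the names of the two resulting tokens (INTEGER and DOT_DOT) are guesses made for this sketch.

```python
def split_int_dot_dot(tokens):
    """Pass tokens through, replacing INT_DOT_DOT ('2..') with INTEGER ('2') and DOT_DOT ('..')."""
    for token_type, text in tokens:
        if token_type == "INT_DOT_DOT":
            yield ("INTEGER", text[:-2])
            yield ("DOT_DOT", "..")
        else:
            yield (token_type, text)

fixed = list(split_int_dot_dot([("LBRACKET", "["), ("INT_DOT_DOT", "2.."),
                                ("INTEGER", "0"), ("RBRACKET", "]")]))
```

The generated lexer rules stay simple, and the ambiguity is resolved in a few lines of ordinary code before the parser ever sees the tokens.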
@citizenmatt
When regexes aren’t enough
• ShaderLab nested comments
• Not possible to match with regex

Don’t even try
• Rule to match start of comment - /*

Finish lexing by hand, counting start and end comment chars

Ignore START_COMMENT and return different token - COMMENT
• It doesn’t have to be completely machine generated
/* This /* is */ valid */
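The hand-written part is just a depth counter. A Python sketch, assuming the generated rule has already matched the opening `/*` at offset `start` and now hands over to custom code (the function name is made up):

```python
def lex_nested_comment(text, start):
    """Return the end offset of the (possibly nested) comment that begins at `start`."""
    if not text.startswith("/*", start):
        raise ValueError("not at the start of a comment")
    depth, pos = 0, start
    while pos < len(text):
        if text.startswith("/*", pos):
            depth += 1
            pos += 2
        elif text.startswith("*/", pos):
            depth -= 1
            pos += 2
            if depth == 0:
                return pos             # matching end of the outermost comment
        else:
            pos += 1
    return len(text)                   # unterminated comment: consume the rest of the input

source = "/* This /* is */ valid */ Shader"
end = lex_nested_comment(source, 0)
```

The lexer then returns one COMMENT token covering `source[0:end]` and carries on with the generated rules from `end`.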
@citizenmatt
Problem #3
Pre-processor tokens
@citizenmatt
Pre-processor tokens
• Pre-processor tokens can appear anywhere
• How do you add them to the grammar/parser?
• ShaderLab has CGPROGRAM and CGINCLUDE which are essentially pre-processor tokens
• (Also nested language - Cg)
@citizenmatt
Parsing pre-processor tokens
• Two pass parsing
• First pass parses pre-processor tokens
• Filtering lexer strips pre-processor tokens
• Parse normally
• Parsed pre-processor tree nodes inserted as missing
tokens
[Diagram: pipeline of Lexer (including pre-processor tokens), Filtering lexer, Pre-processor parser, Filtering lexer, Parser, Missing tokens inserter and Concrete parse tree]
@citizenmatt
Problem #4
IDEs impose constraints
@citizenmatt
IDE Requirements, Part 2
• Error highlighting

The code is broken every time you type
• Incremental lexing + parsing

Performance
• Version tolerance

E.g. multiple versions of C#
• Nested/composable languages
@citizenmatt
Problem #5
Error handling
@citizenmatt
Error handling
@citizenmatt
Error handling is more of an art than a science
@citizenmatt
What happens when there’s an error?
• The parser adds an error element into the tree
• Error element spans whatever has been parsed so far

Might just be unexpected token, or incorrect element construct
• Highlighting the error in the editor is trivial

Inspection simply looks for error element, adds highlight
@citizenmatt
How do we find an error?
• Error start is obvious

mismatched rule, unexpected token
• Where does the error stop?

Off by one token could affect rest of file
• IDE must try to recover

How?
@citizenmatt
Error recovery
• Panic mode

Eat tokens until finds a “follows” token
• Token insertion/removal/substitution
• Error rules in grammar
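Panic mode is the easiest of the three to sketch. The Python below assumes a made-up rule, `property: IDENTIFIER EQUALS NUMBER`, and treats IDENTIFIER and RBRACE as the "follows" tokens; neither is taken from the real ShaderLab grammar.

```python
FOLLOWS = {"IDENTIFIER", "RBRACE"}            # tokens that may legally follow a property
PROPERTY = ["IDENTIFIER", "EQUALS", "NUMBER"]  # hypothetical rule used for this sketch

def parse_properties(token_types):
    """Parse property* and return ('property' | 'error', tokens) nodes."""
    nodes, pos = [], 0
    while pos < len(token_types) and token_types[pos] != "RBRACE":
        start = pos
        for expected in PROPERTY:
            if pos < len(token_types) and token_types[pos] == expected:
                pos += 1
                continue
            # Panic mode: consume at least one token, then eat until a "follows" token.
            pos = max(pos, start + 1)
            while pos < len(token_types) and token_types[pos] not in FOLLOWS:
                pos += 1
            nodes.append(("error", token_types[start:pos]))
            break
        else:
            nodes.append(("property", token_types[start:pos]))
    return nodes

nodes = parse_properties(["IDENTIFIER", "EQUALS", "NUMBER",
                          "IDENTIFIER", "COMMA", "COMMA", "NUMBER",
                          "IDENTIFIER", "EQUALS", "NUMBER", "RBRACE"])
```

The error element swallows everything from the broken property up to the next plausible starting point, so the properties on either side still parse normally.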
@citizenmatt
Panic mode
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color) = (1,1,1,1)
    _PropName SyntaxErrorPanicMode = (1,1,1,1)
    _Recovered("Real2", Color) = (1, 1, 1, 1)
  }
}

Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color) = (1,1,1,1)
    _PropName SyntaxError _AttemptedRecovery = (1,1,1,1)
    _Recovered("Real2", Color) = (1, 1, 1, 1)
  }
}
@citizenmatt
Token insertion
• Expected RPAREN got EQUALS

Assume RPAREN missing (insert it), EQUALS matches, continue
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color = (1,1,1,1)
  }
}
@citizenmatt
Token removal
• Token insertion fails

Inserting EQUALS doesn’t sync back up
• Expected EQUALS got extra RPAREN

Skip RPAREN (remove it), EQUALS matches, continue
Shader "MyShader" {
  Properties {
    _RealProperty1("Real1", Color)) = (1,1,1,1)
  }
}
@citizenmatt
Error production rules
• Create a rule that anticipates an error
• E.g. consume any tokens that shouldn’t be there
emptyBlock:
  LBRACE
  errorElementWithoutRBrace*
  RBRACE
;
@citizenmatt
Problem #6
Incremental lexing and parsing
@citizenmatt
What’s the problem?
• Don’t parse entire file on every change
• Only reparse smallest subtree that encloses change

Block nodes (method bodies, classes, etc. Not if, for, etc.)
• Avoid re-lexing the entire file, too
@citizenmatt
Incremental lexing
• Requires a cache of the original token stream

Token type, offsets and state of lexer (int)
• Copy cached tokens up to change position
• Restart lexer at change position with known state from
cache
• Lex until we can match tail of cached tokens
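A simplified Python sketch of those steps. It assumes a stateless toy lexer (so the cached lexer state is left out), tokens stored as `(type, start, end)`, and an edit that replaced the old range `[change_start, old_end)` with the new range `[change_start, new_end)`. All names are invented for illustration.

```python
import re

TOKEN_RE = re.compile(r"(?P<WORD>\w+)|(?P<WHITESPACE>\s+)|(?P<PUNCT>[^\w\s])")

def lex_from(text, pos):
    """Yield (type, start, end) tokens from `pos` to the end of `text`."""
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        yield (m.lastgroup, m.start(), m.end())
        pos = m.end()

def relex(old_tokens, new_text, change_start, old_end, new_end):
    """Return (new token list, number of tokens that actually had to be re-lexed)."""
    delta = new_end - old_end
    # 1. Copy cached tokens that end before the change (a token touching it may have grown).
    head = [t for t in old_tokens if t[2] < change_start]
    # Cached tokens after the change, shifted to their offsets in the new text.
    tail = [(ty, s + delta, e + delta) for ty, s, e in old_tokens if s >= old_end]
    tail_index = {t: i for i, t in enumerate(tail)}
    # 2. Restart the lexer where the copied tokens end.
    restart = head[-1][2] if head else 0
    fresh = []
    for token in lex_from(new_text, restart):
        # 3. Stop as soon as a fresh token past the change lines up with a cached one.
        if token[1] >= new_end and token in tail_index:
            return head + fresh + tail[tail_index[token]:], len(fresh)
        fresh.append(token)
    return head + fresh, len(fresh)

old_text, new_text = "foo bar baz", "foo quux baz"   # "bar" (4..7) replaced by "quux" (4..8)
old_tokens = list(lex_from(old_text, 0))
new_tokens, relexed = relex(old_tokens, new_text, 4, 7, 8)
```

In this example only two tokens are lexed again; the rest are reused from the cache with adjusted offsets, and the result is identical to lexing the whole new text.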
@citizenmatt
Incremental parsing
• Walk up syntax tree, find nearest element that can
reparse and that encompasses change

E.g. method/class body
• Find start of block

E.g. opening LBRACE ‘{‘
• Use updated cached lexer to find end of block

E.g. closing RBRACE ‘}’
• Parse block, add new element into tree

Uses custom entry point into parser
@citizenmatt
Problem #7
Composable languages
@citizenmatt
Three types
• Injected languages

E.g. self-contained islands in a string literal (regex)
• Inherited languages

E.g. TypeScript is a superset of JavaScript
• Nested languages

E.g. JavaScript/CSS nested inside HTML. Razor and C#
@citizenmatt
Injected languages
• Build a parse tree for the contents of another node

E.g. ShaderLab CG_PROGRAM, regular expressions, …
• Provides syntax highlighting, code completion, etc.
• Attaches a new parse tree to the node of another tree
• Changes to injected tree persisted to string and pushed
as change to the owning tree
• Changes to owning tree cause full reparse of injected
language
@citizenmatt
Inherited languages
• E.g. TypeScript is a superset of JavaScript
• TypeScriptParser derives from JavaScriptParser

Share a lexer
• Custom hand rolled parsers

Recursive descent
• Easier to inherit and override key methods

Gang of Four Template pattern
• Also XamlParser, MSBuildParser, WebConfigParser

Custom XML parsers
@citizenmatt
Nested languages
• E.g. .aspx, .cshtml - HTML superset, with C# “islands”
• ReSharper parses .aspx/.cshtml file

Builds parse tree for ASPX/Razor syntax
• HTML superset requires lexer superset
• HtmlCompoundLexer lexes “outer” language’s tokens

When encounters HTML, switches to standard HTML lexer
• How to handle C# islands?
@citizenmatt
Secondary documents
• ASPX/Razor - C# islands
• Create secondary in-memory C# file

Mirrors what gets generated when .aspx file is compiled
• Maps C# islands in .aspx to in-memory C# file
• Inspections, code completion, etc. work through the
mapping
@citizenmatt
How do you parse a file?
@citizenmatt
DON’T
@citizenmatt
Links
https://github.com/JetBrains/resharper-unity
Generating Fast, Error Recovering Parsers

http://www.dtic.mil/dtic/tr/fulltext/u2/a196581.pdf
Effective and Comfortable Error Recovery in Recursive Descent Parsers

http://www.cocolab.com/products/cocktail/doc.pdf/ell.pdf
The Definitive ANTLR 4 Reference - Terence Parr

Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

How to Parse a File (DDD North 2017)