LearnDay@Xoxzo is a monthly online seminar initiated by the Xoxzo team, which is open to the public. We will have speakers from the team or guest speakers who will talk for 20 minutes each, on a subject of their choice. For this LearnDay, we have 2 speakers:
1. What do we know about vaccines against COVID-19? by Afiza
2. Reading CPython by Akira
** Disclaimer ** The views and opinions expressed in this slide do not necessarily reflect the official policy or position of Xoxzo Inc. The video for our LearnDays can be viewed at https://youtu.be/Hdmt3sy2v8g
If you're interested in joining us or presenting your own material during our LearnDay, send us an email at empower@xoxzo.com and follow us on Twitter at https://twitter.com/xoxzocom/
LearnDay@Xoxzo #30 2021-05-28
5. Build
1. ./configure
2. make (took 3 min on my 2012 Mac)
3. ./python.exe
4. Python 3.10.0b1 (tags/v3.10.0b1:ba4217537c, May 25 2021, 10:09:16) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
14. Parser
A parser is a software component that takes input data
(frequently text) and builds a data structure – often some
kind of parse tree, abstract syntax tree or other
hierarchical structure, giving a structural representation
of the input while checking for correct syntax.
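As a small illustration of the definition above, the standard library's ast module exposes CPython's parser from Python code. This is only a sketch: the sample source strings and the helper name is_valid are made up for this example.

```python
import ast

# Parse a source string into an abstract syntax tree.
source = "x = 1 + 2"
tree = ast.parse(source)

# Show the hierarchical structure the parser built.
print(ast.dump(tree))


def is_valid(src):
    """Return True if src is syntactically correct Python (hypothetical helper)."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


print(is_valid("x = 1 + 2"))  # syntax check passes
print(is_valid("x = = 1"))    # syntax check fails
```

The tree is built and the syntax is checked in the same step, which is the "while checking for correct syntax" part of the definition.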
15. Parser for CPython
1. Parser/parser.c
2. Do not try to read it!
3. $ wc -l parser.c
32831 parser.c
4. It is NOT hand written!
5. $ head -1 parser.c
// @generated by pegen.py from ./Grammar/python.gram
16. Parser Generator
•Input: Grammar/python.gram
•Output: Parser/parser.c
•Parser Generator
• Tools/peg_generator/pegen/
• This is a Python module.
• So, in order to regenerate the parser for Python, an existing Python is required.
• PEP 617 -- New PEG parser for CPython
• https://www.python.org/dev/peps/pep-0617/
17. PEG parser
•Introduced in 3.9
•In 3.9, both the classic parser and the new PEG parser coexist in the
source tree.
•You can switch back to the classic parser using a
command line switch (-X oldparser)
•In 3.10, all the classic parser code is removed. Thus, the
code is much cleaner and easier to read.
18. Tokenizer
•tok_get() in tokenizer.c
•Reads text from the input and returns a token.
•Tokens (sometimes called terminal symbols) are
•Keywords
•Variable names
•Numbers
•Etc.
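To get a feel for what a token stream looks like, the standard library's tokenize module can be used from Python. Note the assumption here: tokenize is a separate module from tok_get() in tokenizer.c, so this only illustrates the kinds of tokens, not the C code itself. The helper name token_names and the sample line are made up for this sketch.

```python
import io
import tokenize


def token_names(source):
    """Return (token type name, token text) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


pairs = token_names("if x: y = 42\n")
for name, text in pairs:
    print(f"{name:10} {text!r}")
```

In this module, keywords such as if are reported with the same NAME type as variable names; telling them apart is left to the parser.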
19. Demo: Changing Grammar
1. vi Grammar/python.gram
2. make regen-pegen
3. make
4. ./python.exe
5. demo
21. Python Indentation
1. Most programming languages, such as C, C++, and Java,
use braces { } to define a block of code. Python, however,
uses indentation.
2. A code block (body of a function, loop, etc.) starts with
indentation and ends with the first unindented line.
3. The amount of indentation is up to you, but it must be
consistent throughout that block.
4. https://www.programiz.com/python-programming/statement-indentation-comments
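The consistency rule above can be tried out with the built-in compile(). This is a minimal sketch; the helper name check and the sample snippets are made up for the example.

```python
def check(source):
    """Compile source and report whether its indentation is accepted."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:
        return f"IndentationError: {exc.msg}"


# Four spaces or eight spaces: both are fine, as long as the block is consistent.
four_spaces = "for i in range(2):\n    a = i\n    b = i\n"
eight_spaces = "for i in range(2):\n        a = i\n        b = i\n"

# The second line of the block is indented deeper than the first one.
inconsistent = "for i in range(2):\n    a = i\n      b = i\n"

print(check(four_spaces))
print(check(eight_spaces))
print(check(inconsistent))
```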
22. Tokenizer
•Stateful, line-oriented.
•struct tok_state in tokenizer.h
•Remembers every indentation column position on the indent stack.
•When the indent gets deeper, it pushes the column position onto the stack and
returns the virtual token INDENT.
•When the indent gets shallower, it pops the stack and returns the virtual token
DEDENT.
•https://github.com/python/cpython/blob/28be3191a9db2769ed05e55c6bcbccdd029656dd/Parser/tokenizer.c#L1205
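The push/pop behaviour described above can be sketched in a few lines of Python. This is not the code in tokenizer.c: it is a simplified model that assumes spaces only (no tabs), ignores brackets and line continuations, and uses a made-up LINE token to stand in for the real tokens of each line. The function name indent_tokens is also hypothetical.

```python
def indent_tokens(lines):
    """Return a list of (kind, text) pairs using an indent stack."""
    stack = [0]  # column 0 is always at the bottom of the stack
    out = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the indentation
        col = len(line) - len(line.lstrip(" "))
        if col > stack[-1]:
            # Deeper: push the new column and emit one INDENT.
            stack.append(col)
            out.append(("INDENT", ""))
        else:
            # Shallower: pop once per closed level, emitting DEDENT each time.
            while col < stack[-1]:
                stack.pop()
                out.append(("DEDENT", ""))
            if col != stack[-1]:
                raise IndentationError("unindent does not match any outer level")
        out.append(("LINE", line.strip()))
    # At end of input, close every block that is still open.
    while len(stack) > 1:
        stack.pop()
        out.append(("DEDENT", ""))
    return out


sample = ["if a:", "    if b:", "        c", "d"]
kinds = [kind for kind, _ in indent_tokens(sample)]
print(kinds)
# ['LINE', 'INDENT', 'LINE', 'INDENT', 'LINE', 'DEDENT', 'DEDENT', 'LINE']
```

Going from column 8 back to column 0 pops two entries, so two DEDENT tokens come out for a single line.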
23. struct tok_state
•Holds the state of the tokenizer
•Line buffer
•Indent stack
•etc.
•https://github.com/python/cpython/blob/28be3191a9db2769ed05e55c6bcbccdd029656dd/Parser/tokenizer.h#L31
24. Block definition in PEG
block[asdl_stmt_seq*] (memo):
    | NEWLINE INDENT a=statements DEDENT { a }
    | simple_stmts
    | invalid_block
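The first two alternatives of the block rule can be observed from Python with the ast module. In this sketch, the indented form should go through NEWLINE INDENT statements DEDENT and the one-line form through simple_stmts; the sample snippets are made up, and the mapping to the alternatives is my reading of the rule rather than something the parser reports.

```python
import ast

# Body on its own indented line: NEWLINE INDENT statements DEDENT
indented = "if x:\n    y = 1\n"

# Body on the same line as the colon: simple_stmts
one_line = "if x: y = 1\n"

# ast.dump() leaves out line and column numbers by default,
# so the two dumps can be compared directly.
same = ast.dump(ast.parse(indented)) == ast.dump(ast.parse(one_line))
print(same)
```

Both spellings give the same tree shape, since either alternative just supplies the list of statements for the block.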