Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes

E
ESUGESUG
Threaded-Execution and CPS Provide Smooth Switching Between Execution
Modes
Dave Mason
Toronto Metropolitan University
©2023 Dave Mason
Execution Models
Source Interpretation
Bytecode Interpretation
Threaded Execution
Hardware Interpretation
Timing of Fibonacci
Thread CPS Object NativePharo JIT
169
278
347
1,094
559
186
304
391
1,918
591
136
332
564
2,010
695
Apple M1
Apple M2 Pro
Intel i7 2.8GHz
Threaded is 2.3-4.7 times faster than bytecode.
Zag Smalltalk
supports 2 models: threaded and native CPS
seamless transition
threaded is smaller and fully supports step-debugging
native is 3-5 times faster and can fallback to full debugging after a send
Threaded Execution
sequence of addresses of native code
like an extensible bytecode
each word passes control to the next
associated with Forth, originally in PDP-11 FORTRAN compiler, was used in BrouHaHa
fibonacci
1 fibonacci
2 self <= 2 ifTrue: [ ↑ 1 ].
3 ↑ (self - 1) fibonacci + (self - 2) fibonacci
Threaded fibonacci
1 verifySelector,
2 ":recurse",
3 dup, // self
4 pushLiteral, Object.from(2),
5 p5, // <=
6 ifFalse,"label3",
7 drop, // self
8 pushLiteral1,
9 returnNoContext,
10 ":label3",
11 pushContext,"^",
12 pushLocal0, // self
13 pushLiteral1,
14 p2, // -
15 callRecursive, "recurse",
16 pushLocal0, //self
17 pushLiteral2,
18 p2, // -
19 callRecursive, "recurse",
20 p1, // +
21 returnTop,
Some control words
1 pub fn drop(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 tailcall pc[0].prim(pc+1, sp+1, process, context, selector, cache);
3 }
4 pub fn dup(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
5 const newSp = sp-1;
6 newSp[0] = newSp[1];
7 tailcall pc[0].prim(pc+1, newSp, process, context, selector, cache);
8 }
9 pub fn ifFalse(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
10 const v = sp[0];
11 if (False.equals(v)) tailcall branch(pc, sp+1, process, context, selector, cache );
12 if (True.equals(v)) tailcall pc[1].prim(pc+2, sp+1, process, context, selector, cache );
13 @panic("non boolean");
14 }
15 pub fn p1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
16 if (!Sym.@"+".selectorEquals(selector)) tailcall dnu(pc, sp, process, context, selector);
17 sp[1] = inlines.p1(sp[1], sp[0])
18 catch tailcall pc[0].prim(pc+1, sp, process, context, selector, cache);
19 tailcall context.npc(context.tpc, sp+1, process, context, selector, cache);
20 }
Continuation Passing Style
continuation is the rest of the program
comes from optimization of functional languages (continuation was a closure)
no implicit stack frames - passed explicitly
like the original Smalltalk passing Context (maybe not obvious that Context is a special kind of closure)
Normal Style
1 pub fn fibNative(self: i64) i64 {
2 if (self <= 2) return 1;
3 return fibNative(self - 1) + fibNative(self - 2);
4 }
5 const one = Object.from(1);
6 const two = Object.from(2);
7 pub fn fibObject(self: Object) Object {
8 if (i.p5N(self,two)) return one;
9 const m1 = i.p2L(self, 1) catch @panic("int subtract failed in fibObject");
10 const fm1 = fibObject(m1);
11 const m2 = i.p2L(self, 2) catch @panic("int subtract failed in fibObject");
12 const fm2 = fibObject(m2);
13 return i.p1(fm1, fm2) catch @panic("int add failed in fibObject");
14 }
Continuation Passing Style
1 pub fn fibCPS(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 if (!fibSym.equals(selector)) tailcall dnu(pc,sp,process,context,selector);
3 if (inlined.p5N(sp[0],Object.from(2))) {
4 sp[0] = Object.from(1);
5 tailcall context.npc(context.tpc,sp,process,context,selector);
6 }
7 const newContext = context.push(sp,process,fibThread.asCompiledMethodPtr(),0,2,0);
8 const newSp = newContext.sp();
9 newSp[0]=inlined.p2L(sp[0],1)
10 catch tailcall pc[10].prim(pc+11,newSp+1,process,context,fibSym);
11 newContext.setReturnBoth(fibCPS1, pc+13); // after first callRecursive (line 15 above)
12 tailcall fibCPS(fibCPST+1,newSp,process,newContext,fibSym);
13 }
Continuation Passing Style
1 fn fibCPS1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, _:Object) Stack {
2 const newSp = sp.push();
3 newSp[0] = inlined.p2L(context.getTemp(0),2)
4 catch tailcall pc[0].prim(pc+1,newSp,process,context,fibSym));
5 context.setReturnBoth(fibCPS2, pc+3); // after 2nd callRecursive (line 19 above)
6 tailcall fibCPS(fibCPST+1,newSp,process,context,fibSym);
7 }
Continuation Passing Style
1 fn fibCPS2(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 const sum = inlined.p1(sp[1],sp[0])
3 catch tailcall pc[0].prim(pc+1,sp,process,context,fibSym);
4 const result = context.pop(process);
5 const newSp = result.sp;
6 newSp.put0(sum);
7 const callerContext = result.ctxt;
8 tailcall callerContext.npc(callerContext.tpc,newSp,process,callerContext,selector);
9 }
Implementation Decisions
Context must contain not only native return points, but also threaded return points;
CompiledMethods must facilitate seamless switching between execution modes;
the stack cannot reasonably be woven into the hardware stack with function calls, so no native stack;
as well as parameters, locals, and working space, stack is used to allocate Context and BlockClosure as
usually released in LIFO pattern
Conclusions
with proper structures, can easily switch between threaded and native code
threaded code is “good enough” for many purposes
this is preliminary work, so some open questions
many experiments to run to validate my intuitions
many more details in the paper
Questions?
@DrDaveMason dmason@torontomu.ca
https://github.com/dvmason/Zag-Smalltalk
Timing of Fibonacci
Thread CPS P-JIT P-Stack Z-Byte
347
1,094
559
5,033
4,730
391
1,918
591
3,527
4,820
564
2,010
695
3,568
4,609
Apple M1
Apple M2 Pro
Intel i7 2.8GHz
M2 P-Stack is presumed to be mis-configured
1 von 17

Recomendados

The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th... von
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...David Walker
200 views80 Folien
Rcpp11 von
Rcpp11Rcpp11
Rcpp11Romain Francois
1.5K views31 Folien
Tackling repetitive tasks with serial or parallel programming in R von
Tackling repetitive tasks with serial or parallel programming in RTackling repetitive tasks with serial or parallel programming in R
Tackling repetitive tasks with serial or parallel programming in RLun-Hsien Chang
25 views42 Folien
Sedna XML Database: Executor Internals von
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsIvan Shcheklein
1.8K views20 Folien
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov von
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovFwdays
190 views114 Folien
What's new in Python 3.11 von
What's new in Python 3.11What's new in Python 3.11
What's new in Python 3.11Henry Schreiner
246 views36 Folien

Más contenido relacionado

Similar a Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes

MPI von
MPIMPI
MPIZongYing Lyu
1.1K views40 Folien
Exploitation of counter overflows in the Linux kernel von
Exploitation of counter overflows in the Linux kernelExploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernelVitaly Nikolenko
330 views70 Folien
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go von
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoScyllaDB
691 views23 Folien
What's new with Apache Spark's Structured Streaming? von
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?Miklos Christine
2K views49 Folien
Process Address Space: The way to create virtual address (page table) of user... von
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
1.3K views47 Folien
To Infinity & Beyond: Protocols & sequences in Node - Part 2 von
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2Bahul Neel Upadhyaya
516 views16 Folien

Similar a Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes(20)

Exploitation of counter overflows in the Linux kernel von Vitaly Nikolenko
Exploitation of counter overflows in the Linux kernelExploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernel
Vitaly Nikolenko330 views
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go von ScyllaDB
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
ScyllaDB691 views
What's new with Apache Spark's Structured Streaming? von Miklos Christine
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
Miklos Christine2K views
Process Address Space: The way to create virtual address (page table) of user... von Adrian Huang
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang1.3K views
To Infinity & Beyond: Protocols & sequences in Node - Part 2 von Bahul Neel Upadhyaya
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2
Linux Process & CF scheduling von SangJung Woo
Linux Process & CF schedulingLinux Process & CF scheduling
Linux Process & CF scheduling
SangJung Woo1.7K views
Cpu高效编程技术 von Feng Yu
Cpu高效编程技术Cpu高效编程技术
Cpu高效编程技术
Feng Yu3K views
Столпы функционального программирования для адептов ООП, Николай Мозговой von Sigma Software
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай Мозговой
Sigma Software128 views
Asynchronous programming with java script and node.js von Timur Shemsedinov
Asynchronous programming with java script and node.jsAsynchronous programming with java script and node.js
Asynchronous programming with java script and node.js
Timur Shemsedinov1.3K views
CorePy High-Productivity CellB.E. Programming von Slide_N
CorePy High-Productivity CellB.E. ProgrammingCorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. Programming
Slide_N128 views
Server side JavaScript: going all the way von Oleg Podsechin
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
Oleg Podsechin10K views
Reverse Engineering Dojo: Enhancing Assembly Reading Skills von Asuka Nakajima
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Asuka Nakajima3.2K views
Using R on Netezza von Ajay Ohri
Using R on NetezzaUsing R on Netezza
Using R on Netezza
Ajay Ohri7K views
Direct Code Execution @ CoNEXT 2013 von Hajime Tazaki
Direct Code Execution @ CoNEXT 2013Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013
Hajime Tazaki2.3K views
Flink 0.10 @ Bay Area Meetup (October 2015) von Stephan Ewen
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen1.7K views

Más de ESUG

Workshop: Identifying concept inventories in agile programming von
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingESUG
12 views16 Folien
Technical documentation support in Pharo von
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in PharoESUG
31 views39 Folien
The Pharo Debugger and Debugging tools: Advances and Roadmap von
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapESUG
56 views44 Folien
Sequence: Pipeline modelling in Pharo von
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoESUG
86 views22 Folien
Migration process from monolithic to micro frontend architecture in mobile ap... von
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...ESUG
23 views35 Folien
Analyzing Dart Language with Pharo: Report and early results von
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsESUG
107 views30 Folien

Más de ESUG(20)

Workshop: Identifying concept inventories in agile programming von ESUG
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
ESUG12 views
Technical documentation support in Pharo von ESUG
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
ESUG31 views
The Pharo Debugger and Debugging tools: Advances and Roadmap von ESUG
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
ESUG56 views
Sequence: Pipeline modelling in Pharo von ESUG
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
ESUG86 views
Migration process from monolithic to micro frontend architecture in mobile ap... von ESUG
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
ESUG23 views
Analyzing Dart Language with Pharo: Report and early results von ESUG
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
ESUG107 views
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6 von ESUG
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
ESUG38 views
A Unit Test Metamodel for Test Generation von ESUG
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
ESUG53 views
Creating Unit Tests Using Genetic Programming von ESUG
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
ESUG48 views
Exploring GitHub Actions through EGAD: An Experience Report von ESUG
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
ESUG17 views
Pharo: a reflective language A first systematic analysis of reflective APIs von ESUG
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
ESUG57 views
Garbage Collector Tuning von ESUG
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
ESUG20 views
Improving Performance Through Object Lifetime Profiling: the DataFrame Case von ESUG
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
ESUG43 views
Pharo DataFrame: Past, Present, and Future von ESUG
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
ESUG43 views
thisContext in the Debugger von ESUG
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
ESUG36 views
Websockets for Fencing Score von ESUG
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
ESUG18 views
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript von ESUG
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ESUG46 views
Advanced Object- Oriented Design Mooc von ESUG
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
ESUG85 views
A New Architecture Reconciling Refactorings and Transformations von ESUG
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
ESUG28 views
BioSmalltalk von ESUG
BioSmalltalkBioSmalltalk
BioSmalltalk
ESUG415 views

Último

Understanding HTML terminology von
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminologyartembondar5
6 views8 Folien
Dapr Unleashed: Accelerating Microservice Development von
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice DevelopmentMiroslav Janeski
12 views29 Folien
FOSSLight Community Day 2023-11-30 von
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30Shane Coughlan
6 views18 Folien
nintendo_64.pptx von
nintendo_64.pptxnintendo_64.pptx
nintendo_64.pptxpaiga02016
5 views7 Folien
Flask-Python.pptx von
Flask-Python.pptxFlask-Python.pptx
Flask-Python.pptxTriloki Gupta
7 views12 Folien
Quality Engineer: A Day in the Life von
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the LifeJohn Valentino
7 views18 Folien

Último(20)

Understanding HTML terminology von artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar56 views
Dapr Unleashed: Accelerating Microservice Development von Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski12 views
Copilot Prompting Toolkit_All Resources.pdf von Riccardo Zamana
Copilot Prompting Toolkit_All Resources.pdfCopilot Prompting Toolkit_All Resources.pdf
Copilot Prompting Toolkit_All Resources.pdf
Riccardo Zamana16 views
JioEngage_Presentation.pptx von admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254556 views
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... von Marc Müller
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Marc Müller42 views
Software evolution understanding: Automatic extraction of software identifier... von Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports von Ra'Fat Al-Msie'deen
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsBushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
Fleet Management Software in India von Fleetable
Fleet Management Software in India Fleet Management Software in India
Fleet Management Software in India
Fleetable12 views
Airline Booking Software von SharmiMehta
Airline Booking SoftwareAirline Booking Software
Airline Booking Software
SharmiMehta7 views
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated... von TomHalpin9
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
Dev-HRE-Ops - Addressing the _Last Mile DevOps Challenge_ in Highly Regulated...
TomHalpin96 views
predicting-m3-devopsconMunich-2023.pptx von Tier1 app
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
Tier1 app7 views

Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes

  • 1. Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes Dave Mason Toronto Metropolitan University ©2023 Dave Mason
  • 2. Execution Models Source Interpretation Bytecode Interpretation Threaded Execution Hardware Interpretation
  • 3. Timing of Fibonacci Thread CPS Object NativePharo JIT 169 278 347 1,094 559 186 304 391 1,918 591 136 332 564 2,010 695 Apple M1 Apple M2 Pro Intel i7 2.8GHz Threaded is 2.3-4.7 times faster than bytecode.
  • 4. Zag Smalltalk supports 2 models: threaded and native CPS seamless transition threaded is smaller and fully supports step-debugging native is 3-5 times faster and can fallback to full debugging after a send
  • 5. Threaded Execution sequence of addresses of native code like an extensible bytecode each word passes control to the next associated with Forth, originally in PDP-11 FORTRAN compiler, was used in BrouHaHa
  • 6. fibonacci 1 fibonacci 2 self <= 2 ifTrue: [ ↑ 1 ]. 3 ↑ (self - 1) fibonacci + (self - 2) fibonacci
  • 7. Threaded fibonacci 1 verifySelector, 2 ":recurse", 3 dup, // self 4 pushLiteral, Object.from(2), 5 p5, // <= 6 ifFalse,"label3", 7 drop, // self 8 pushLiteral1, 9 returnNoContext, 10 ":label3", 11 pushContext,"^", 12 pushLocal0, // self 13 pushLiteral1, 14 p2, // - 15 callRecursive, "recurse", 16 pushLocal0, //self 17 pushLiteral2, 18 p2, // - 19 callRecursive, "recurse", 20 p1, // + 21 returnTop,
  • 8. Some control words 1 pub fn drop(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 tailcall pc[0].prim(pc+1, sp+1, process, context, selector, cache); 3 } 4 pub fn dup(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 5 const newSp = sp-1; 6 newSp[0] = newSp[1]; 7 tailcall pc[0].prim(pc+1, newSp, process, context, selector, cache); 8 } 9 pub fn ifFalse(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 10 const v = sp[0]; 11 if (False.equals(v)) tailcall branch(pc, sp+1, process, context, selector, cache ); 12 if (True.equals(v)) tailcall pc[1].prim(pc+2, sp+1, process, context, selector, cache ); 13 @panic("non boolean"); 14 } 15 pub fn p1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 16 if (!Sym.@"+".selectorEquals(selector)) tailcall dnu(pc, sp, process, context, selector); 17 sp[1] = inlines.p1(sp[1], sp[0]) 18 catch tailcall pc[0].prim(pc+1, sp, process, context, selector, cache); 19 tailcall context.npc(context.tpc, sp+1, process, context, selector, cache); 20 }
  • 9. Continuation Passing Style continuation is the rest of the program comes from optimization of functional languages (continuation was a closure) no implicit stack frames - passed explicitly like the original Smalltalk passing Context (maybe not obvious that Context is a special kind of closure)
  • 10. Normal Style 1 pub fn fibNative(self: i64) i64 { 2 if (self <= 2) return 1; 3 return fibNative(self - 1) + fibNative(self - 2); 4 } 5 const one = Object.from(1); 6 const two = Object.from(2); 7 pub fn fibObject(self: Object) Object { 8 if (i.p5N(self,two)) return one; 9 const m1 = i.p2L(self, 1) catch @panic("int subtract failed in fibObject"); 10 const fm1 = fibObject(m1); 11 const m2 = i.p2L(self, 2) catch @panic("int subtract failed in fibObject"); 12 const fm2 = fibObject(m2); 13 return i.p1(fm1, fm2) catch @panic("int add failed in fibObject"); 14 }
  • 11. Continuation Passing Style 1 pub fn fibCPS(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 if (!fibSym.equals(selector)) tailcall dnu(pc,sp,process,context,selector); 3 if (inlined.p5N(sp[0],Object.from(2))) { 4 sp[0] = Object.from(1); 5 tailcall context.npc(context.tpc,sp,process,context,selector); 6 } 7 const newContext = context.push(sp,process,fibThread.asCompiledMethodPtr(),0,2,0); 8 const newSp = newContext.sp(); 9 newSp[0]=inlined.p2L(sp[0],1) 10 catch tailcall pc[10].prim(pc+11,newSp+1,process,context,fibSym); 11 newContext.setReturnBoth(fibCPS1, pc+13); // after first callRecursive (line 15 above) 12 tailcall fibCPS(fibCPST+1,newSp,process,newContext,fibSym); 13 }
  • 12. Continuation Passing Style 1 fn fibCPS1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, _:Object) Stack { 2 const newSp = sp.push(); 3 newSp[0] = inlined.p2L(context.getTemp(0),2) 4 catch tailcall pc[0].prim(pc+1,newSp,process,context,fibSym)); 5 context.setReturnBoth(fibCPS2, pc+3); // after 2nd callRecursive (line 19 above) 6 tailcall fibCPS(fibCPST+1,newSp,process,context,fibSym); 7 }
  • 13. Continuation Passing Style 1 fn fibCPS2(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 const sum = inlined.p1(sp[1],sp[0]) 3 catch tailcall pc[0].prim(pc+1,sp,process,context,fibSym); 4 const result = context.pop(process); 5 const newSp = result.sp; 6 newSp.put0(sum); 7 const callerContext = result.ctxt; 8 tailcall callerContext.npc(callerContext.tpc,newSp,process,callerContext,selector); 9 }
  • 14. Implementation Decisions Context must contain not only native return points, but also threaded return points; CompiledMethods must facilitate seamless switching between execution modes; the stack cannot reasonably be woven into the hardware stack with function calls, so no native stack; as well as parameters, locals, and working space, stack is used to allocate Context and BlockClosure as usually released in LIFO pattern
  • 15. Conclusions with proper structures, can easily switch between threaded and native code threaded code is “good enough” for many purposes this is preliminary work, so some open questions many experiments to run to validate my intuitions many more details in the paper
  • 17. Timing of Fibonacci Thread CPS P-JIT P-Stack Z-Byte 347 1,094 559 5,033 4,730 391 1,918 591 3,527 4,820 564 2,010 695 3,568 4,609 Apple M1 Apple M2 Pro Intel i7 2.8GHz M2 P-Stack is presumed to be mis-configured