Design Principles for a High-Performance Smalltalk

  1. Design Principles for a High-Performance Smalltalk. Dave Mason, Toronto Metropolitan University. ©2022 Dave Mason
  2. Design Principles
      - Large memories
      - 64-bit and IEEE-754
      - Multi-core and threading
      - Fast execution
  6. Zag Smalltalk
      - from-scratch implementation
      - low level is implemented in Zig
      - goal is to support existing OpenSmalltalk systems: don’t want to rewrite userland!
  10. Immediate Values
      - 64-bit NaN-boxing
      - double floats, 51-bit SmallInteger, Booleans, nil, Unicode characters, Symbols
      - room for instances of any type with a single 32-bit value
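  14. Encoding of 64-bit values, as four 16-bit hex groups (S+E = group holding sign and exponent, F = fraction groups, x = any hex digit)

      | S+E | F | F | F | Type |
      |---|---|---|---|---|
      | 0000 | 0000 | 0000 | 0000 | double +0 |
      | 0000-7FEF | xxxx | xxxx | xxxx | double (positive) |
      | 7FF0 | 0000 | 0000 | 0000 | +inf |
      | 7FF0-7FFF | xxxx | xxxx | xxxx | NaN (unused) |
      | 8000 | 0000 | 0000 | 0000 | double -0 |
      | 8000-FFEF | xxxx | xxxx | xxxx | double (negative) |
      | FFF0 | 0000 | 0000 | 0000 | -inf |
      | FFF0-FFF5 | xxxx | xxxx | xxxx | NaN (currently unused) |
      | FFF6 | xxxx | xxxx | xxxx | heap object |
      | FFF7 | 0001 | xxxx | xxxx | reserved (tag = Object) |
      | FFF7 | 0002 | xxxx | xxxx | reserved (tag = SmallInteger) |
      | FFF7 | 0003 | xxxx | xxxx | reserved (tag = Double) |
      | FFF7 | 0004 | 0001 | 0000 | False |
      | FFF7 | 0005 | 0010 | 0001 | True |
      | FFF7 | 0006 | 0100 | 0002 | UndefinedObject |
      | FFF7 | 0007 | aaxx | xxxx | Symbol |
      | FFF7 | 0008 | 00xx | xxxx | Character |
      | FFF8-FFFF | xxxx | xxxx | xxxx | SmallInteger |
      | FFF8 | 0000 | 0000 | 0000 | SmallInteger minVal |
      | FFFC | 0000 | 0000 | 0000 | SmallInteger 0 |
      | FFFF | FFFF | FFFF | FFFF | SmallInteger maxVal |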
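
The encoding table can be exercised with a small sketch. The Python below is illustrative only: it takes the bit patterns exactly as listed in the table, treats every pattern outside the FFF6, FFF7 and FFF8-FFFF ranges as a double (including the NaN ranges the table marks unused, which is an assumption), and uses hypothetical helper names (`encode_double`, `encode_small_int`, `encode_char`, `classify`) rather than Zag's own API. It shows how a SmallInteger is stored with a bias so that FFFC 0000 0000 0000 means 0, which yields exactly the 51-bit range from minVal (FFF8 0000 0000 0000) to maxVal (FFFF FFFF FFFF FFFF).

```python
import struct

MASK64 = (1 << 64) - 1

# Constants taken from the encoding table.
HEAP_PREFIX = 0xFFF6                    # top 16 bits of a heap-object reference
IMMEDIATE_PREFIX = 0xFFF7               # top 16 bits of a tagged immediate
SMALLINT_BASE = 0xFFF8_0000_0000_0000   # SmallInteger minVal
SMALLINT_ZERO = 0xFFFC_0000_0000_0000   # SmallInteger 0
SMALLINT_MIN = -(1 << 50)               # 51-bit signed range
SMALLINT_MAX = (1 << 50) - 1

FALSE = 0xFFF7_0004_0001_0000
TRUE = 0xFFF7_0005_0010_0001
NIL = 0xFFF7_0006_0100_0002

# Class tags held in bits 32-47 of an FFF7 immediate (from the table).
IMMEDIATE_TAGS = {4: "False", 5: "True", 6: "UndefinedObject", 7: "Symbol", 8: "Character"}


def encode_double(x: float) -> int:
    """A double is stored as its raw IEEE-754 bits, unchanged."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def decode_double(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def encode_small_int(n: int) -> int:
    """Bias n so that 0 maps to FFFC 0000 0000 0000."""
    if not SMALLINT_MIN <= n <= SMALLINT_MAX:
        raise OverflowError("does not fit in a 51-bit SmallInteger")
    return (SMALLINT_ZERO + n) & MASK64


def decode_small_int(bits: int) -> int:
    return bits - SMALLINT_ZERO


def encode_immediate(tag: int, payload: int) -> int:
    """Pack a 16-bit class tag and a 32-bit payload under the FFF7 prefix."""
    return (IMMEDIATE_PREFIX << 48) | ((tag & 0xFFFF) << 32) | (payload & 0xFFFF_FFFF)


def encode_char(code_point: int) -> int:
    return encode_immediate(8, code_point)


def classify(bits: int) -> str:
    top = bits >> 48
    if bits >= SMALLINT_BASE:           # one unsigned compare covers FFF8-FFFF
        return "SmallInteger"
    if top == HEAP_PREFIX:
        return "heap object"
    if top == IMMEDIATE_PREFIX:
        return IMMEDIATE_TAGS.get((bits >> 32) & 0xFFFF, "reserved")
    return "Double"  # assumption: every other pattern, NaNs included, is a float
```

With this layout a single unsigned comparison against FFF8 0000 0000 0000 separates SmallIntegers from every other kind of value.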
  15. Multi-core support
      - only way to speed up applications
      - minimal blocking
      - computational/mutator threads: typically 1 per core
      - I/O threads: one per open “file”
      - global collector thread
  20. Memory management
      - mutator threads
          - copying collector
          - private nursery (includes stack)
          - 2 teen arenas: n copies before promotion
          - when prompted, finds the next 100 refs to the global stack, then blocks; repeat
          - then can proceed
      - I/O threads
          - maintain a list of current shared buffers while blocked on I/O
      - global collector thread
          - non-moving mark/sweep arena
          - periodically does mark
          - marks known shared structures (class table, symbol table, dispatch tables)
          - asks mutators for global roots
          - processes them until all roots have been found
          - then can proceed to sweep ...
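
The handshake between the global collector and the mutators can be modelled deterministically. This is a simplified sketch under stated assumptions, not Zag's implementation: each mutator is a Python generator, so suspending at `yield` stands in for "blocks" after handing over a batch of at most 100 references; the global arena is a dict from hypothetical object ids to the ids they reference; concurrency, write barriers and the nursery/teen copying are ignored. `Mutator`, `mark` and `sweep` are illustrative names.

```python
from typing import Dict, Iterator, List, Set

BATCH_SIZE = 100  # "finds the next 100 refs ... then blocks" (slide 20)


class Mutator:
    """Hypothetical mutator: holds references from its private nursery/teen
    arenas into the global mark/sweep arena."""

    def __init__(self, global_refs: List[int]):
        self.global_refs = global_refs

    def root_batches(self) -> Iterator[List[int]]:
        # Each yield hands over up to BATCH_SIZE refs and suspends ("blocks")
        # until the collector asks for the next batch.
        for start in range(0, len(self.global_refs), BATCH_SIZE):
            yield self.global_refs[start:start + BATCH_SIZE]


def mark(heap: Dict[int, List[int]], shared_roots: List[int],
         mutators: List[Mutator]) -> Set[int]:
    marked: Set[int] = set()
    stack: List[int] = list(shared_roots)  # class table, symbol table, dispatch tables

    def drain() -> None:
        # Transitively mark everything reachable from the refs on the stack.
        while stack:
            obj = stack.pop()
            if obj not in marked:
                marked.add(obj)
                stack.extend(heap.get(obj, []))

    drain()
    for mutator in mutators:               # ask each mutator for its global roots
        for batch in mutator.root_batches():
            stack.extend(batch)
            drain()                        # process the batch before resuming the mutator
    return marked                          # all roots found: safe to sweep


def sweep(heap: Dict[int, List[int]], marked: Set[int]) -> List[int]:
    """Non-moving sweep: unmarked objects are freed in place (here, deleted)."""
    dead = [obj for obj in heap if obj not in marked]
    for obj in dead:
        del heap[obj]
    return dead
```

Because the collector drains each batch before asking for the next, its mark stack stays bounded by the batch size plus whatever that batch makes reachable.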
  36. ... Memory management
      - global collector for non-moving mark-and-sweep uses a Fibonacci heap (similar to Mist)
      - large objects (e.g. 16 KiB) have separately mapped pages (allows mmap of large files) to minimize memory creep
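
The slide does not spell out what "Fibonacci heap" means here. Assuming it refers to Fibonacci-sized allocation classes (as in a Fibonacci buddy allocator) rather than the priority-queue data structure of the same name, request sizing might look like the sketch below. The 8-byte word, the exact 16 KiB cut-off, and the names `size_class` and `LARGE_OBJECT_BYTES` are assumptions for illustration. Successive Fibonacci sizes grow by a factor of about 1.618 rather than 2, so the worst-case space lost to rounding a request up is smaller than with power-of-two classes.

```python
from typing import List, Optional

WORD_BYTES = 8                   # assumption: 64-bit words
LARGE_OBJECT_BYTES = 16 * 1024   # "large objects (e.g. 16 KiB)"; exact cut-off assumed


def fibonacci_classes(limit_words: int) -> List[int]:
    """Block sizes in words: 1, 2, 3, 5, 8, ... up to the first one >= limit_words."""
    classes = [1, 2]
    while classes[-1] < limit_words:
        classes.append(classes[-1] + classes[-2])
    return classes


SIZE_CLASSES = fibonacci_classes(LARGE_OBJECT_BYTES // WORD_BYTES)


def size_class(n_bytes: int) -> Optional[int]:
    """Words allocated for an n_bytes object, or None if it gets its own mapped pages."""
    if n_bytes >= LARGE_OBJECT_BYTES:
        return None  # large object: separately mapped pages (e.g. via mmap)
    words = max(1, -(-n_bytes // WORD_BYTES))  # round up to whole words
    # The last class is >= the cut-off, so a fitting class always exists.
    return next(size for size in SIZE_CLASSES if size >= words)
```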
  39. Heap objects: header word (12 + 4 + 8 + 24 + 16 = 64 bits)

      | Bits | What | Characteristics |
      |---|---|---|
      | 12 | length | number of long-words beyond the header |
      | 4 | age | 0: nursery, 1-7: teen, 8+: global |
      | 8 | format | |
      | 24 | identityHash | |
      | 16 | classIndex | LSB |

      - recognize pointer-free instance variables and arrays separately
      - length of 4095: forwarding pointer (copying, become:, promoted)
      - format is somewhat similar to SPUR encoding: various-sized non-object arrays
      - strings stored in UTF-8
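
The header fields add up to exactly one 64-bit word. A sketch of packing and unpacking it follows; it assumes the fields are laid out in the order the table lists them, with `length` in the most significant bits and `classIndex` in the least significant (the table only states the latter). The function names are hypothetical.

```python
from typing import NamedTuple

# (field, width in bits), most significant first (assumed order); totals 64 bits.
FIELDS = (("length", 12), ("age", 4), ("format", 8), ("identity_hash", 24), ("class_index", 16))
FORWARDED = 4095  # length value marking a forwarding pointer (largest 12-bit value)


class Header(NamedTuple):
    length: int
    age: int
    format: int
    identity_hash: int
    class_index: int


def pack_header(h: Header) -> int:
    word = 0
    for (name, width), value in zip(FIELDS, h):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_header(word: int) -> Header:
    values = []
    for _name, width in reversed(FIELDS):   # peel fields off from the LSB end
        values.append(word & ((1 << width) - 1))
        word >>= width
    return Header(*reversed(values))


def region(age: int) -> str:
    """Age field: 0 nursery, 1-7 teen, 8+ global."""
    if age == 0:
        return "nursery"
    return "teen" if age <= 7 else "global"


def is_forwarded(h: Header) -> bool:
    return h.length == FORWARDED
```

Keeping `classIndex` in the low 16 bits means the class of a heap object can be read with a single mask, without any shift.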
  44. Unified dispatch
      - single level of hashing for method dispatch
      - each class's dispatch table has an entry for every message the class has been sent, regardless of where the method sits in the hierarchy
      - near-perfect hash using Φ hashing
      - standard SPUR/OpenVM optimizations don’t work well in multi-core environments
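
Φ (Fibonacci) hashing multiplies a key by ⌊2^64/φ⌋ and keeps the top bits of the 64-bit product. Below is a sketch of a flat per-class dispatch table filled lazily: on a miss, the method is found by walking the superclass chain once and then stored in the receiving class's own table, so later sends of that selector need a single hash probe whatever the method's position in the hierarchy. Selectors are represented as small integers, collisions are resolved by linear probing, and the table never grows; all three are simplifying assumptions, and `ClassInfo` and `phi_hash` are hypothetical names, not Zag's.

```python
from typing import Dict, List, Optional, Tuple

GOLDEN = 0x9E3779B97F4A7C15   # floor(2**64 / phi)
MASK64 = (1 << 64) - 1


def phi_hash(key: int, bits: int) -> int:
    """Fibonacci hashing: multiply by 2**64/phi, keep the top `bits` bits."""
    return ((key * GOLDEN) & MASK64) >> (64 - bits)


class ClassInfo:
    def __init__(self, name: str, superclass: Optional["ClassInfo"] = None,
                 methods: Optional[Dict[int, str]] = None, bits: int = 4):
        self.name = name
        self.superclass = superclass
        self.methods = dict(methods or {})   # selector -> method defined in this class
        self.bits = bits
        # One flat table per class: (selector, method) pairs for every selector sent.
        self.table: List[Optional[Tuple[int, str]]] = [None] * (1 << bits)

    def _find_in_hierarchy(self, selector: int) -> str:
        cls: Optional[ClassInfo] = self
        while cls is not None:
            if selector in cls.methods:
                return cls.methods[selector]
            cls = cls.superclass
        raise LookupError(f"{self.name} does not understand #{selector}")

    def lookup(self, selector: int) -> str:
        size = len(self.table)
        i = phi_hash(selector, self.bits)
        for _ in range(size):
            entry = self.table[i]
            if entry is None:                  # miss: resolve once, then cache here
                method = self._find_in_hierarchy(selector)
                self.table[i] = (selector, method)
                return method
            if entry[0] == selector:           # hit: single-level dispatch
                return entry[1]
            i = (i + 1) & (size - 1)           # linear probing (assumed)
        raise RuntimeError("dispatch table full; a real VM would grow and rehash")
```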
  48. High performance: inlining
      - sends to self / super are inlined
      - methods with a small number of implementations are inlined, rather than relying on a heuristic
      - prevents creation of many blocks
      - provides large compilation units for optimization
  52. Code Generation
      - no interpreter; 3 code-generation models
      - threaded execution
          - a method is a sequence of Zig function addresses
          - uses Zig tail-call elimination, passing pc, sp, hp, thread, context
          - primitives and control implementations
      - stand-alone generator
          - generates Zig code for methods
          - depends on Zig inlining and excellent code generation
      - JIT
          - future LLVM jitter
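
Threaded execution can be illustrated with a toy method. This sketch assumes only the calling convention named on the slide: a method is a list whose entries are operation functions interleaved with inline literal operands, and every operation takes the same five registers (pc, sp, hp, thread, context). Python does not guarantee tail-call elimination, so each operation returns the next operation and its arguments to a small trampoline loop; in Zig each operation would instead end with a guaranteed tail call to the next one, so neither the loop nor stack growth would be needed. The operation names (`push_literal`, `add`, `return_top`) and the `Thread`/`Context` classes are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# What every operation returns: the next operation and its arguments, or None to stop.
Step = Optional[Tuple[Callable[..., Any], Tuple[Any, ...]]]


class Thread:
    def __init__(self, depth: int = 64):
        self.stack: List[Any] = [None] * depth
        self.result: Any = None


class Context:
    """The running method activation; here it only holds the threaded code."""
    def __init__(self, code: List[Any]):
        self.code = code


def next_op(pc: int, sp: int, hp: int, thread: Thread, context: Context) -> Step:
    # Fetch the function address at pc; in Zig this would be a tail call.
    return context.code[pc], (pc + 1, sp, hp, thread, context)


def push_literal(pc, sp, hp, thread, context) -> Step:
    thread.stack[sp] = context.code[pc]      # operand is stored inline after the op
    return next_op(pc + 1, sp + 1, hp, thread, context)


def add(pc, sp, hp, thread, context) -> Step:
    thread.stack[sp - 2] += thread.stack[sp - 1]
    return next_op(pc, sp - 1, hp, thread, context)


def return_top(pc, sp, hp, thread, context) -> Step:
    thread.result = thread.stack[sp - 1]
    return None                              # leave the method


def run(code: List[Any]) -> Any:
    thread, context = Thread(), Context(code)
    step: Step = next_op(0, 0, 0, thread, context)   # hp (heap pointer) unused here
    while step is not None:                  # trampoline standing in for tail calls
        op, args = step
        step = op(*args)
    return thread.result
```

For example, `run([push_literal, 3, push_literal, 4, add, return_top])` evaluates 3 + 4 by stepping through the code array one function address at a time.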
  63. Conclusions
      - while the intent of this paper is to provide design principles, it also describes some implementation details
      - this is preliminary work, so some questions remain open
      - many experiments remain to be run to validate my intuitions
  66. Questions?
      - @DrDaveMason
      - dmason@ryerson.ca
      - https://github.com/dvmason/Zag-Smalltalk
