Pharo Virtual Machine: News from the Front


  1. 1. Pharo Virtual Machine: News from the Front. P. Tesone (@tesonep, pablo.tesone@inria.fr) and G. Polito (@guillep, guillermo.polito@univ-lille.fr). ESUG’22
  2. 2. 2022 VM+ Team 2
  3. 3. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 3
  4. 4. ARM64 Backend • ARM64 is now pervasive: • New Apple M1 • Raspberry Pi 4 • Microsoft Surface Pro X • PineBook Pro • … move r1 #1 move r2 #17 checkSmallInt r1 checkSmallInt r2 add r3 r1 r2 checkSmallInt r3 move r1 r3 ret 32bit x86 64bit x86_64 32bit ARMv5-7 64bit ARMv8 JIT compiler IR 4
  5. 5. Working Directly on Real Hardware • How to do a partial implementation, in an iterative way? • Hardware availability: did not have access to an Apple M1 yet • Slow Change-Compile-Test cycle • Bug reproduction is a demanding task 5
  6. 6. Simulation Environment Production VM (C) Simulation Environment (Pharo) Heap Native Code Cache Unicorn LLVM Disassembler VM Interpreter GC JIT Compiler Transpiled to Testing infrastructure Miranda et al. Two Decades of Smalltalk VM Development: Live VM Development through Simulation Tools. VMIL’18 6
  7. 7. Extending Simulation with Unit Tests Production VM (C) Simulation Environment (Pharo) Heap Native Code Cache Unicorn LLVM Disassembler VM Interpreter GC JIT Compiler Transpiled to Testing infrastructure 7
  8. 8. 8 Our testing infrastructure by example testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0)
  9. 9. Reusable test fixtures covering e.g., 9 Our testing infrastructure by example testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0) Reusable test fixtures covering e.g., - trampoline and stub compilation - heap initialization
  10. 10. Reusable test fixtures covering e.g., 10 Our testing infrastructure by example testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0) Reusable test fixtures covering e.g., - trampoline and stub compilation - heap initialization Compiler internal DSL
  11. 11. testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0) Compiler internal DSL JIT Execution helpers such as e.g., - run all code between two addresses - run until the PC hits an address 11 Our testing infrastructure by example Reusable test fixtures covering e.g., - trampoline and stub compilation - heap initialization
  12. 12. testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0) Blackbox testing Depend only on observable behaviour Reusable on different backends / Resistant to changes in the implementation 12
  13. 13. testPushConstantZeroBytecodePushesASmallIntegerZero self compile: [ compiler genPushConstantZeroBytecode ]. self runGeneratedCode. self assert: self popAddress equals: (memory integerObjectOf: 0) Cross-Compilation, Cross-Execution Hardware independent Parametrizable tests http://www.unicorn-engine 13
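Slides 8-13 all revolve around one representative test of this infrastructure. Below is the same test laid out with comments for readability; it is a reformatted copy of the code shown on the slides, not additional API (compile:, runGeneratedCode and popAddress are the fixture helpers named above):

	testPushConstantZeroBytecodePushesASmallIntegerZero
		"Compile a single bytecode through the compiler's internal DSL
		 (the fixture has already set up the heap, trampolines and stubs)."
		self compile: [ compiler genPushConstantZeroBytecode ].
		"Run the generated machine code in the Unicorn-based simulated backend."
		self runGeneratedCode.
		"Black-box check: the value left on the stack is the SmallInteger 0."
		self assert: self popAddress equals: (memory integerObjectOf: 0)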
  14. 14. There is no silver bullet • Simulators are cheap, but not 100% trustworthy • Full execution (simulated or on real HW) • more expensive to run • cannot unit-test it (less controllable) • Unit tests only exercise specific scenarios • Full executions exercise not yet covered scenarios 14
  15. 15. Our testing workflow • Simulate the execution less often than you run unit tests • Run the real app less often than you simulate • Go back and forth: • Turn full-execution failures into unit tests • Fix with the aid of the test: => unit tests are faster to run => easier to debug => detect regressions 15
  16. 16. Testing & TDDing the VM • No useful unit tests by ~06/2020 • Large manual testing e ff ort during 2020 while porting to ARM64bits • Extended VM simulation with a (TDD compatible) unit testing infrastructure • 450+ written tests on the interpreter and the garbage collector* • 580+ written tests on the JIT compiler* • Parametrisable for 32 and 64bits, ARM32, ARM64, x86, x86-64 MPLR’21 * Numbers by 05/2021 16
  17. 17. Testing & TDDing the VM • No useful unit tests by ~06/2020 • Large manual testing e ff ort during 2020 while porting to ARM64bits • Extended VM simulation with a (TDD compatible) unit testing infrastructure • 450+ written tests on the interpreter and the garbage collector* • 580+ written tests on the JIT compiler* • Parametrisable for 32 and 64bits, ARM32, ARM64, x86, x86-64 MPLR’21 * Numbers by 05/2021 1040+ tests, are they enough? 17
  18. 18. How can we automatically test VMs? cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C PLDI’22 18
  19. 19. Challenges of VM Test Generation cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C ? • Challenge 1: Do the generated tests cover different code regions/branches/paths? • Challenge 2: How do we determine what is the expected output of a generated test? PLDI’22 19
  20. 20. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C 20 Interpreter-Guided Automatic JIT Compiler Unit Testing
  21. 21. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C 21 PLDI’22 Interpreter-Guided Automatic JIT Compiler Unit Testing
  22. 22. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C 22 PLDI’22 Interpreter-Guided Automatic JIT Compiler Unit Testing
  23. 23. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C Interpreter-Guided Automatic JIT Compiler Unit Testing PLDI’22 23
  24. 24. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C Insight 1: Interpreters are Executable Semantics => Concolic Meta-Interpretation 24 PLDI’22 Interpreter-Guided Automatic JIT Compiler Unit Testing
  25. 25. PLDI’22 25 Pharo VM Example Interpreters are Executable Semantics
  26. 26. Interpreters are Executable Semantics Pharo VM Example If both operands are integers If their sum does not overflow Else, slow path => message send 26 PLDI’22
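Slide 26 illustrates this with the interpreter's integer-add bytecode. For readers of the transcript, here is a minimal Slang-style sketch of the logic described on the slide (integer check, overflow check, slow path via a message send); selector names approximate the StackInterpreter code and are illustrative, not a verbatim copy:

	bytecodePrimAdd
		"Sketch: add the two topmost stack values when both are SmallIntegers
		 and the result still fits in a SmallInteger; otherwise fall back to a
		 full #+ message send."
		| rcvr arg result |
		rcvr := self internalStackValue: 1.
		arg := self internalStackValue: 0.
		(objectMemory areIntegers: rcvr and: arg) ifTrue: [
			result := (objectMemory integerValueOf: rcvr) + (objectMemory integerValueOf: arg).
			(objectMemory isIntegerValue: result) ifTrue: [
				self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
				^ self fetchNextBytecode ] ].
		"Slow path: not both SmallIntegers, or the sum overflowed"
		messageSelector := self specialSelector: 0.
		argumentCount := 1.
		self normalSend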
  27. 27. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C Interpreter-Guided Automatic JIT Compiler Unit Testing Insight 1: Interpreters are Executable Semantics => Concolic Meta-Interpretation Insight 2: Interpreters and Compiler Share Semantics => Differential Testing 27 PLDI’22
  28. 28. 28 Pharo VM Example Interpreter VS Compiled Code PLDI’22
  29. 29. Interpreter VS Compiled Code Pharo VM Example PLDI’22 29
  30. 30. Concolic Testing through Meta-interpretation • Idea: Guide test generation by looking at the implementation Different cases if x > 100 or <= 100!! Different cases if x = 1023 or != 1023 Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 30 int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } }
  31. 31. x y constraints next? • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 31
  32. 32. x y constraints next? 0 0 x <= 100 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 32
  33. 33. x y constraints next? 0 0 x <= 100 x > 100 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 33
  34. 34. x y constraints next? 0 0 x <= 100 x > 100 101 0 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 34
  35. 35. x y constraints next? 0 0 x <= 100 x > 100 101 0 x > 100, y != 1023 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 35
  36. 36. x y constraints next? 0 0 x <= 100 x > 100 101 0 x > 100, y != 1023 x > 100, y == 1023 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 36
  37. 37. x y constraints next? 0 0 x <= 100 x > 100 101 0 x > 100, y != 1023 x > 100, y == 1023 101 1023 • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 37
  38. 38. x y constraints next? 0 0 x <= 100 x > 100 101 0 x > 100, y != 1023 x > 100, y == 1023 101 1023 x > 100, y != 1023 finished! • Concrete + Symbolic execution • Goal: automatically discover all execution paths int f(int x, int y){ if (x > 100){ if (y == 1023){ segfault(!!) } } } Godefroid et al. DART: Directed Automated Random Testing. PLDI’05 Sen et al. CUTE: a concolic unit testing engine for C. FSE’05 Concolic Testing by Example 38
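The table above is built by the standard DART/CUTE loop: execute with concrete inputs, record the branch constraints symbolically, negate one constraint, and ask a solver for inputs that reach the unexplored side. A minimal sketch of that loop in Pharo is shown below; runConcolic:with:, solve: and the negated message on constraint objects are hypothetical helpers introduced only for illustration, not the API of the paper's tool:

	exploreAllPathsOf: aProgram
		"DART-style exploration sketch: returns when every reachable
		 path prefix has been attempted."
		| worklist attempted |
		worklist := OrderedCollection with: Dictionary new.	"seed with arbitrary concrete inputs"
		attempted := Set new.
		[ worklist isEmpty ] whileFalse: [ | inputs path |
			inputs := worklist removeFirst.
			"Concrete + symbolic execution: ordered list of branch constraints taken"
			path := self runConcolic: aProgram with: inputs.
			1 to: path size do: [ :i | | candidate newInputs |
				"Keep constraints 1..i-1 and negate the i-th to target the other branch"
				candidate := (path copyFrom: 1 to: i - 1) copyWith: (path at: i) negated.
				(attempted includes: candidate) ifFalse: [
					attempted add: candidate.
					newInputs := self solve: candidate.	"constraint solver; nil if unsatisfiable"
					newInputs ifNotNil: [ worklist add: newInputs ] ] ] ]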
  39. 39. Some Numbers • 3 bytecode compilers + 1 native method compiler • 4928 tests generated • 478 differences 39
  40. 40. Analysis of Differences through Manual Inspection • 91 causes, 6 different categories • Errors both in the interpreter AND the compilers • 14 causes of segmentation faults! 40
  41. 41. Practical and Cheap • Test generation ~5 minutes • Total run time of ~10 seconds • Avg 30ms per instruction [Chart: generation time per compiler (Native Method, Stack-to-Register, Simple, Linear-Allocator), time (ms), log scale] 41
  42. 42. More in the PLDI article! • Discovered Bugs • Concolic Model • Testing Infrastructure PLDI’22 42
  43. 43. Is that all?
  44. 44. Ongoing RISCV64 Port • Currently under development: Real HW testing stage • Taking advantage of our harness test suite • Improving tests and scenarios • Collaboration with Q. Ducasse, P. Cortret, L. Lagadec from ENSTA Bretagne • Future work on: Hardware-based security enforcement 44
  45. 45. Single Instruction Multiple Data Extensions 45
  46. 46. SIMD Design Space • VM Primitives • Specialised • Faster, fewer checks • Vectorised Bytecode • Composable • Safe at the expense of speed 46
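As a purely illustrative sketch of the "VM primitive" corner of this design space (the selector, primitive name and SimdPlugin module below do not exist in today's Pharo and are assumptions): a specialised primitive does the vector work with fewer checks, while the plain Smalltalk fallback keeps the operation safe and composable whenever the primitive fails.

	FloatArray >> simdAddAll: other
		"Hypothetical vector addition: fast path in a SIMD primitive,
		 safe element-wise fallback in regular bytecode."
		<primitive: 'primitiveVectorAdd' module: 'SimdPlugin'>
		1 to: self size do: [ :i |
			self at: i put: (self at: i) + (other at: i) ]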
  47. 47. Tools for Debugging • Machine Code Debugger • Compiler IR Visualisations • Disassembler DSL • … 47
  48. 48. Pharo VM Manual Variable Localisation
	interpret
		self fetchNextBytecode.
		[ true ] whileTrue: [ self dispatchOn: currentBytecode in: BytecodeTable ].
		^ nil
	pushReceiverBytecode
		self fetchNextBytecode.
		self internalPush: self receiver
	pushBool: trueOrFalse
		<inline: true>
		self push: (objectMemory booleanObjectOf: trueOrFalse)
	internalAboutToReturn: resultOop through: aContext
		<inline: true>
		[…] self internalPush: resultOop […]
	internalPush: aValue
		localSP := localSP - bytesPerWord.
		self longAt: localSP put: aValue
	push: aValue
		stackPointer := stackPointer - bytesPerWord.
		self longAt: stackPointer put: aValue
	The developer must know the C generation semantics: inside interpret, localSP is a C local variable (so it can live in a register), while stackPointer is a global, hence the parallel internalPush:/push: variants. 48
  49. 49. Automatic Variable Localisation! [Chart: relative performance on Intel x86-64 for the benchmarks ArrayAccess, BinaryTrees, Chameleons, ChameneosRedux, ClassVarBinding and Compiler, comparing localisation groups 1-fp, 2-spfp, 3-ip, 4-ipfp, 5-ipsp, 6-sp, 7-ipfpsp. Averages of 100 iterations + stdev, relative to baseline (no optimisation), higher is better.] 49
  50. 50. Analysing Code Cache Behavior • Analysing events: red marks compaction events • We see thrashing in the code cache • We need to increase the size of the code cache • Occupation rate 50
  51. 51. Code Cache: Unexpected Results. Execution time for different Young Space sizes (1MB, 10MB, 100MB) and code cache sizes (1.44MB, 2.8MB, 5MB, 10MB) [Chart: time loading Moose vs. Young Space size and code cache size] 51
  52. 52. We are hiring! • We have • Engineer Positions • PhD Positions • Keywords: Compilers, Interpreters, Memory Management, Security • Come talk to us! 52
  53. 53. Pharo VM - News from the Front. Stay in Sync! @pharoproject pharo.org consortium-adm@pharo.org discord.gg/QewZMZa thepharo.dev guillermo.polito@univ-lille.fr. Roadmap: Permanent Space, New Image Format, Faster Startup / Saving, Ephemerons, Speculative Compilation 53
  54. 54. Conclusion • 478 differences found, 91 causes, 6 categories • Practical: • 4928 tests generated in ~8 minutes • 4928 tests run in ~40 seconds [Figure: test generation / execution / differential testing pipeline] Guille Polito - Pablo Tesone - Stéphane Ducasse guillermo.polito@univ-lille.fr @guillep 54
  55. 55. Improvements: Clean Up • V3 Support • Old Memory Format • Old Block Closures • Dead Code • ~ 65KLOC 55
  56. 56. Improvements: Sockets • Unified Implementation on all Platforms • Better Async Support • Unix Sockets (Under Work) • IPv6 Addresses (Under Work) 56
  57. 57. Improvements: Serial Port FFI • Pure FFI implementation • Working on all Platforms (Unix / Windows / OSX) • Migrating Plugins to FFI 57
  58. 58. Improvements: RISCV64 Ongoing Port • Currently under development: Real HW testing stage • Taking advantage of our harness test suite. • Improving tests and scenarios • Collaboration with Q. Ducasse, P. Cortret, L. Lagadec from ENSTA Bretagne • Future work on: Hardware-based security enforcement 58
  59. 59. Improvements: Open Build Service • Better Support for Linux Distributions • Building using the existing build system • Supporting multiple architectures • Initial targets: Arch / Manjaro, Debian, Fedora, Raspbian, Ubuntu, openSUSE 59
  60. 60. Improvements: Visual Studio Support Building & Debugging MSVC - No cygwin 60
  61. 61. Improvements: Windows ARM MSVC - No cygwin 61
  62. 62. Improvements: Raspbian 32/64 bits 62
  63. 63. Back to the Future Objectives for 2022 63
  64. 64. Permanent Space Problem • Many permanent objects • They have references from/to other objects • We traverse them on every GC • E.g., Classes, Methods, Literals, Resources Generates Long Pauses 64
  65. 65. Permanent Space Our Solution • New Object Space for permanent Objects • Minimise or Eliminate GC passes • Persisting them through executions 65
  66. 66. Permanent Space Our Solution • New Object Space for permanent Objects • Minimise or Eliminate GC passes • Persisting them through executions We need to put them in a 66
  67. 67. New Image Format Problem • Current Image format only supports a single object space • Not extensible: no new metadata nor new data • Cannot be Memory Mapped (it is modified before save/load) • Requires discarding all state of the running VM (slow saves) 67
  68. 68. New Image Format Problem • Current Image format only supports a single object space • Not extensible: no new metadata nor new data • Cannot be Memory Mapped (it is modified before save/load) • Requires discarding all state of the running VM (slow saves) Slow and Restricting 68
  69. 69. New Image Format Our Solution • New Image format based on directories / bundles • Many Elements of data and metadata • Metadata in User & Machine readable format (STON?) • Extensible format 69
  70. 70. Fast Snapshots / Loading Based on PermSpace & Image Format • Memory Mapped Image • Shareable State • Saving / Loading Warm State of the VM 70
  71. 71. • ARM64, RISCV64, Slang… • Lots of Tests! • Integration: Sockets, serial • Visual Studio, Open Build Service @pharoproject pharo.org
 consortium-adm@pharo.org discord.gg/QewZMZa thepharo.dev Permanent Space New Image Format Faster Startup / Saving Next Objectives Ephemerons Speculative
 Compilation 71
  72. 72. Guille Polito, Stéphane Ducasse, Pablo Tesone,
 Théo Rogliano, Pierre Misse-Chanabier, Carolina Hernandez, Luc Fabresse RMoD Team — Inria Lille Nord Europe — UMR9189 CRIStAL — CNRS Cross-ISA Testing of the Pharo VM Lessons learned while porting to ARMv8 64bits Tool Paper — MPLR’21 72
  73. 73. Context The Pharo VM 73
  74. 74. Some Numbers • 255 stack based bytecodes (77 different) + ~340 primitives/native methods • 146 different IR instructions • polymorphic inline caches • threaded code interpreter • generational scavenger GC Lots of combinations! 74
  75. 75. Objective: Implementing an ARM64 Backend • ARM64 is now pervasive: • New Apple M1 • Raspberry Pi 4 • Microsoft Surface Pro X • PineBook Pro • … move r1 #1 move r2 #17 checkSmallInt r1 checkSmallInt r2 add r3 r1 r2 checkSmallInt r3 move r1 r3 ret 32bit x86 64bit x86_64 32bit ARMv5-7 64bit ARMv8 JIT compiler IR 75
  76. 76. Case Study 1 Porting the Cogit JIT Compiler to ARM64 • Started with no tests and no hardware (main target Apple M1) • Incremental test development: bytecode, native methods, PICs, code patching • All tests run from the beginning on our four targets: x86, x86-64, ARM32 and ARM64 • Tests allowed safe modifications in the IR to support e.g., ARM64 multiplication overflow • ARM64 specific tests covered stack alignment, W+X … 76
  77. 77. Case Study 2 Ongoing Port to RISCV64 • Currently under development • Is our harness test suite enough to develop a new backend? • Are our tests general enough? • Collaboration with Q. Ducasse, P. Cortret, L. Lagadec from ENSTA Bretagne • Future work on: Hardware-based security enforcement 77
  78. 78. Case Study 3 Debugging and Testing Memory Corruptions • Bug report using Ephemerons
 https://github.com/pharo-project/pharo/issues/8153 • Starting the other way around • First reproducing the bug in real-hardware
 => long to execute (even longer in simulation)
 => required manual developer intervention • Then building a unit test from observations • Test becomes a part of the regression test suite 78
  79. 79. Future Perspectives Automatic VM Validation • Automatic (Unit?) Test Case Generation • Interpreter vs Compiler Differential Testing • VM Tailored Multi-level Debugging 79
  80. 80. Cross-ISA Testing of the Pharo VM. Lessons learned while porting to ARMv8 64bits
	                     | Real Hardware Execution | Full-System Simulation | Unit-Testing
	Feedback-cycle speed | Very low                | Low                    | High
	Availability         | Low                     | High                   | High
	Reproducibility      | Low                     | Low                    | High
	Precision            | High                    | Low                    | Low
	Debuggability        | Low                     | High                   | High
	80
  81. 81. Debugging a compiler Insights: build your own tools, based on needs, not desires Examples: • Machine code debugger • Bytecode-IR visualization • Disassembler DSL 81
  82. 82. Interpreter-Guided JIT Compiler Test Generation Validating the Pharo JIT compiler through concolic execution and differential testing Guille Polito - Pablo Tesone - Stéphane Ducasse guillermo.polito@univ-lille.fr
 @guillep PLDI’22 — San Diego 82
  83. 83. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 83
  84. 84. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 84
  85. 85. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 85
  86. 86. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 86
  87. 87. Virtual Machine Execution Engine cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 87
  88. 88. How can we automatically test VMs? cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 88
  89. 89. Challenges of VM Test Generation cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C ? • Challenge 1: Do the generated tests cover different code regions/branches/paths? • Challenge 2: How do we determine what is the expected output of a generated test? 89
  90. 90. Black Box Testing + Fuzzing cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C 90
  91. 91. Black Box Testing + Fuzzing cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory I C Drawbacks: • Slow • Coarse grained • Non-determinism • Requires multiple reference implementations 91
  92. 92. cold code hot spot detection hot code Interpreted Execution Compiled Execution Managed Memory Test Scenarios 1. Test Generation 2. Execution 2. Execution 3. Differential Testing I C Interpreter-Guided Automatic JIT Compiler Unit Testing 92
  93. 93. Implementation View Interpreter Instruction Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Concrete Input VM Frame Compiled Instruction Concrete Output VM Frame Differential Results 1. concolic exploration 2. compilation 3. concrete JIT test execution 4. validate 93
  94. 94. Implementation View Interpreter Instruction Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Concrete Input VM Frame Compiled Instruction Concrete Output VM Frame Differential Results 1. concolic exploration 2. compilation 3. concrete JIT test execution 4. validate 94
  95. 95. Implementation View Interpreter Instruction Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Concrete Input VM Frame Compiled Instruction Concrete Output VM Frame Differential Results 1. concolic exploration 2. compilation 3. concrete JIT test execution 4. validate 95
  96. 96. Implementation View Interpreter Instruction Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Concrete Input VM Frame Compiled Instruction Concrete Output VM Frame Differential Results 1. concolic exploration 2. compilation 3. concrete JIT test execution 4. validate 96
  97. 97. Implementation View Interpreter Instruction Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Path input constraints output constraints Concrete Input VM Frame Compiled Instruction Concrete Output VM Frame Differential Results 1. concolic exploration 2. compilation 3. concrete JIT test execution 4. validate 97
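Slides 93-97 repeat the same pipeline: concolic exploration of an interpreter instruction yields path constraints, a concrete input frame is solved for each path, the instruction is compiled, and the interpreter and JIT results are compared. As an illustration only, a generated differential test could have the following shape; the helper selectors frameWithOperands:, interpretInstruction:onFrame:, runGeneratedCodeOnFrame: and genAddBytecode are hypothetical, not the tool's real API:

	testAddOnTwoSmallIntegersMatchesInterpreter
		"One concrete input frame, chosen by the solver for the
		 'both operands are SmallIntegers, no overflow' path."
		| inputFrame interpreted jitted |
		inputFrame := self frameWithOperands: {
			memory integerObjectOf: 1.
			memory integerObjectOf: 17 }.
		"Reference result: run the instruction in the simulated interpreter."
		interpreted := self interpretInstruction: #bytecodePrimAdd onFrame: inputFrame copy.
		"Compile the same instruction and run the generated machine code."
		self compile: [ compiler genAddBytecode ].
		jitted := self runGeneratedCodeOnFrame: inputFrame copy.
		"Differential check: both executions must leave the same output frame."
		self assert: jitted equals: interpreted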
  98. 98. Concolic Meta-Interpretation Model • Models VM behaviour during concolic execution • Frame • Objects + types • Classes • Then flattened into SAT solver equations id class_object type value (if small integer) slots AbstractObject receiver method argument_size arguments operand_stack_size operand_stack AbstractVMFrame format class_id AbstractClass * class_object 98
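The class diagram flattened into the line above lists three model classes and their fields. A sketch of how they could be declared in Pharo is given below, using the instance-variable names from the slide; the class names come from the diagram, but the exact declarations are assumptions, not the paper's source code:

	Object subclass: #AbstractVMFrame
		instanceVariableNames: 'receiver method argumentSize arguments operandStackSize operandStack'
		classVariableNames: ''
		package: 'ConcolicModel'.

	Object subclass: #AbstractObject
		"value is only meaningful when type denotes a small integer"
		instanceVariableNames: 'id classObject type value slots'
		classVariableNames: ''
		package: 'ConcolicModel'.

	Object subclass: #AbstractClass
		instanceVariableNames: 'format classId'
		classVariableNames: ''
		package: 'ConcolicModel'.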
  99. 99. Experimental Context: The Pharo VM • Interpreted-compiled mixed execution • Some numbers: • 255 stack based bytecodes • ~340 primitives/native methods • 146 different IR instructions • x86, x86-64, ARMv7, ARMv8, RISC-V • Industrial consortium: • 28 International companies, 26 academic partners 99
  100. 100. Previous Manual Testing Effort • No useful unit tests by ~06/2020 • Large manual testing effort during 2020 while porting to ARM64bits • Extended VM simulation with a (TDD compatible) unit testing infrastructure • 450+ written tests on the interpreter and the garbage collector* • 580+ written tests on the JIT compiler* • Parametrisable for 32 and 64bits, ARM32, ARM64, x86, x86-64 Cross-ISA Testing of the Pharo VM. Lessons learned while porting to ARMv8 64bits. * Numbers by 05/2021 100
  101. 101. Evaluation • 3 bytecode compilers + 1 native method compiler • 4928 tests generated • 478 differences 101
  102. 102. Analysis of Differences through Manual Inspection • 91 causes, 6 different categories • Errors both in the interpreter AND the compilers • 14 causes of segmentation faults! 102
  103. 103. Characterising Concolic Execution: Paths per Instruction [Chart: number of explored paths per type of instruction (Bytecode vs. Native Method), log scale] • Native methods present on average more paths than bytecode instructions => longer time to explore => potentially more bugs 103
  104. 104. Practical and Cheap • Test generation ~5 minutes • Total run time of ~10 seconds • Avg 30ms per instruction [Chart: generation time per compiler (Native Method, Stack-to-Register, Simple, Linear-Allocator), time (ms), log scale] 104
  105. 105. More in the article! • Discovered Bugs • Concolic Model • Testing Infrastructure 105
  106. 106. Conclusion • 478 differences found, 91 causes, 6 categories • Practical: • 4928 tests generated in ~8 minutes • 4928 tests run in ~40 seconds [Figure: test generation / execution / differential testing pipeline] Guille Polito - Pablo Tesone - Stéphane Ducasse guillermo.polito@univ-lille.fr @guillep 106
  107. 107. Extras 107
  108. 108. Simulation + Testing Environment Production VM (C) Simulation Environment (Pharo) Heap Native Code Cache Unicorn LLVM Disassembler VM Interpreter GC JIT Compiler Transpiled to Testing infrastructure 108
  109. 109. Unit Testing Infrastructure Comparison
	                     | Real Hardware Execution | Full-System Simulation | Unit-Testing
	Feedback-cycle speed | Very low                | Low                    | High
	Availability         | Low                     | High                   | High
	Reproducibility      | Low                     | Low                    | High
	Precision            | High                    | Low                    | Low
	Debuggability        | Low                     | High                   | High
	109
  110. 110. [Figure: worked example of concolic exploration of the add instruction. Five concolic executions over abstract input/output frames (receiver = ?, method = ?, operand stack s1/s2), each with recorded path constraints (e.g. operand_stack_size <= 1 or > 1; s1 is small int; s2 is small int; s3 = s1 + s2 within or outside the small-int range, as with s1 = max small int, s2 = small int 1; s2 = obj; s1 = obj). The negated path constraints feed constraint solving + abstract frame construction for the next execution. Recorded exits: invalid frame, success, and three failures.] 110
