Porting NetBSD to the open source LatticeMico32 CPU

In this talk, given at the EHSM 2014 event (http://ehsm.eu), I explain what an MMU is and how it works. I then explain how I ported NetBSD (and EdgeBSD, a fork of NetBSD) to the open source LM32 CPU, to which I added an MMU.

  1. Porting NetBSD to the open source LatticeMico32 CPU • Yann Sionneau • M-Labs @ EHSM 2014
  2. About me • Yann Sionneau • Embedded software developer • Working at Sequans Communication • M-Labs contributor • @yannsionneau on Twitter • Email: yann.sionneau@gmail.com
  3. I’m going to talk about… How to run NetBSD and EdgeBSD on the Milkymist One
  4. Agenda • I) The hardware part: the MMU – What is an MMU and how does it work • II) The software part – How to port NetBSD to a new CPU
  5. Milkymist One?!
  6. Milkymist One?!
  7. Milkymist One?!
  8. The Milkymist One uses an FPGA
  9. What’s an FPGA?? • A chip
  10. FPGA internals
  11. Milkymist System-on-Chip
  12. LatticeMico32 CPU • 32-bit Harvard architecture RISC • Big endian • 6-stage pipeline • Fully bypassed • Optional configurable I/D caches – Direct mapped or – 2-way set associative • Wishbone on-chip bus
  13. LatticeMico32, good points • Small • Portable (works with several FPGA vendors) • Fast (~100 MHz on Spartan 6) • Actually works • GCC/Binutils/GDB/QEMU/uClinux/OpenWrt support • OPEN SOURCE
  14. LatticeMico32, bad points • No Memory Management Unit… yet!
  15. LatticeMico32, bad points • No Memory Management Unit… yet! Done
  16. Used in… • Closed source commercial ASICs • Open source projects • Can achieve 800 MHz in TSMC 90nm standard cell process
  17. LatticeMico32 pipeline
  18. What’s a pipeline? • « In computing, a pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. » -- Pipeline (computing), Wikipedia
  19. What’s a pipeline? [Diagram: Data processing element 1 -> Data processing element 2 -> Data processing element 3, each with an IN and an OUT]
  20. What’s a pipeline?
      $ cat .bash_history | grep 'cat' | wc -l
      6
  21. What’s a CPU pipeline?
  22. What’s a CPU pipeline?
  23. Pipelined instruction execution (pipeline stage of each instruction, clock cycles 1 to 7) • Instr. 1: A
  24. Pipelined instruction execution • Instr. 1: A F • Instr. 2: A
  25. Pipelined instruction execution • Instr. 1: A F D • Instr. 2: A F • Instr. 3: A
  26. Pipelined instruction execution • Instr. 1: A F D X • Instr. 2: A F D • Instr. 3: A F • Instr. 4: A
  27. Pipelined instruction execution • Instr. 1: A F D X M • Instr. 2: A F D X • Instr. 3: A F D • Instr. 4: A F
  28. Pipelined instruction execution • Instr. 1: A F D X M W • Instr. 2: A F D X M • Instr. 3: A F D X • Instr. 4: A F D
  29. Pipelined instruction execution
      Clock cycle   1  2  3  4  5  6  7
      Instr. 1      A  F  D  X  M  W
      Instr. 2         A  F  D  X  M  W
      Instr. 3            A  F  D  X  M
      Instr. 4               A  F  D  X
  30. Before [diagram labels: CPU Internal, Main Memory, PHYSICAL ADDRESS (PA) on both sides]
  31. After [diagram labels: CPU Internal, Main Memory, VIRTUAL ADDRESSES, PHYSICAL ADDRESSES, Raising exception]
  32. What’s the MMU’s job? • Translate « virtual addresses » into « physical addresses » • Memory protection against unwanted code execution or data writes (e.g. software bug or security issue) – Memory access rights management
  33. [Diagram labels: CPU pipeline, VA, PA, Main Memory] VA: Virtual Address • PA: Physical Address
  34. [Diagram labels: CPU pipeline, VA, PA, Main Memory] VA: Virtual Address • PA: Physical Address • How does the MMU know the VA->PA translation?
  35. [Diagram labels: CPU pipeline, VA, PA, Main Memory, Page Table] VA: Virtual Address • PA: Physical Address
  36. [Diagram labels: CPU pipeline, VA, PA, Main Memory, Page Table] VA: Virtual Address • PA: Physical Address • Why « PAGE »?
  37. Why « Page »? • 0x00000004 -> 0x10000000 • 0x00000005 -> 0x10000001 • 0x00000006 -> 0x10000002 Etc…
  38. Why « Page »? • 0x00000004 -> 0x10000000 • 0x00000005 -> 0x10000001 • 0x00000006 -> 0x10000002 Etc… This is WRONG!!!
  39. Why « Page »? • 0x00000*** -> 0x10000*** • 0x00001*** -> 0x10001*** • 0x00002*** -> 0x10002*** Etc…
  40. [Diagram labels: CPU pipeline, VA, PA, Main Memory, Page Table] VA: Virtual Address • PA: Physical Address
  41. [Diagram labels: CPU pipeline, VA, PA, Main Memory, Page Table, TLB] VA: Virtual Address • PA: Physical Address • TLB: Translation Lookaside Buffer
  42. [Diagram labels: CPU pipeline, VA, PA, Main Memory, Page Table, TLB, Operating System; arrow labels: « Updates the », « Gets information from the », « Updates the »] VA: Virtual Address • PA: Physical Address
  43. Features? • Page size – Only 4 kB • 32-bit physical address: xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx • How many bits of an address indicate the offset within a given page?
  44. Features? • Page size – Only 4 kB • 32-bit physical address: xxxxxxxx xxxxxxxx xxxx | xxxx xxxxxxxx • Page number [31:12]: 20 bits • Offset [11:0]: 12 bits
  45. Features? • 2 TLBs (Translation Lookaside Buffers) – ITLB – DTLB • Each TLB contains 1024 entries – How many bits are needed to index the TLB?
  46. Features? • 2 TLBs (Translation Lookaside Buffers) – ITLB – DTLB • Each TLB contains 1024 entries – How many bits are needed to index the TLB? 10 bits!
  47. Features? • No hardware page-tree walker – i.e. TLB is software assisted
  48. [Diagram labels: Virtual address • Load or store? • Instruction or Data? • Physical address • Access granted/denied]
  49. [Diagram labels: Virtual address • Load or store? • Instruction or Data? • Physical address • Access granted/denied • I don’t know!]
  50. Let’s have a look inside
  51. The TLB • VA = 0xA0001004
      Tag [10]   Physical page number [20]   Read-only [1]   Valid [1]
      0xABC      0xABC00                     0               0
      0x280      0xB0001                     1               1
      0x300      0x00001                     0               1
  52. The TLB (same table) • VA = 0xA0001 004 • 0xA0001: page number • 004: offset in the page
  53. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4
  54. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001
  55. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • VPN = 0xA0001 -> 1010 0000 0000 0000 0001
  56. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • VPN = 0xA0001 -> 1010 0000 00 | 00 0000 0001 • TLB index = 1 (low 10 bits of the VPN), used to select a TLB line
  57. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • VPN = 0xA0001 -> 1010 0000 00 | 00 0000 0001 • TLB index = 1 (low 10 bits of the VPN), used to select a TLB line
  58. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • TLB index = 1 • VPN = 0xA0001 -> 1010 0000 00 | 00 0000 0001 • Tag = 0x280 -> 1010 0000 00, compared (=) with the tag stored in the selected line
  59. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • TLB index = 1 • Tag = 0x280 -> 1010 0000 00, equal to the stored tag • Physical page number = 0xB0001
  60. The TLB (same table) • VA = 0xA0001 004 • Page offset = 4 • Virtual Page number = 0xA0001 • TLB index = 1 • Tag = 0x280 -> 1010 0000 00, equal to the stored tag • Physical page number = 0xB0001 • Physical Address = 0xB0001004
  61. Porting NetBSD • 1°) NetBSD cross compilation toolchain – build.sh – Makefiles here and there – Arch-specific directories • This makes it possible to run: $ ./build.sh -U -m lm32 tools
  62. Porting NetBSD • 2°) Support for built-ins in libkern – NetBSD kernel is • Not linked against libgcc • Linked against libkern – Need to implement the basic arithmetic helper functions that gcc emits calls to in object code – Implementation in sys/lib/libkern/arch/lm32
  63. Porting NetBSD • 3°) Building my first kernel – Create sys/arch/lm32 and sys/arch/milkymist – Populate • sys/arch/<cpu|soc>/include • sys/arch/<cpu|soc>/conf – Stub, stub, stub… • This makes it possible to run: $ ./build.sh -m milkymist -U kernel=GENERIC
  64. Porting NetBSD • 4°) Write basic console driver for early prints
      struct consdev milkymist_com_cons = {
          […]
          milkymist_com_cngetc, /* cn_getc: kernel getchar interface */
          milkymist_com_cnputc, /* cn_putc: kernel putchar interface */
          […]
      };
  65. Porting NetBSD • 5°) Implement exception handlers • 6°) Call milkymist_startup() C code – Initialize console driver • -> consinit() -> milkymist_uart_cnattach() • cn_tab = &milkymist_com_cons; – Initialize virtual memory subsystem • Call MD (machine-dependent) pmap_bootstrap() – Let the kernel boot • Call NetBSD MI (machine-independent) main()
  66. Porting NetBSD • 7°) Implement pmap(9): pmap -- machine-dependent portion of the virtual memory system – pmap_bootstrap() – pmap_init, pmap_create, pmap_destroy … – SW managed TLB? -> sys/uvm/pmap/ – used in (PowerPC Booke and LM32)
  67. Porting NetBSD • 8°) Implement copyin/copyout • 9°) Implement atomic operations – No atomic instruction -> RAS (Restartable Atomic Sequence) CAS (Compare And Swap) – Other atomic ops built around this CAS
  68. RAS CAS
      int _atomic_cas_32(volatile uint32_t *val, uint32_t old, uint32_t new);

      _atomic_cas_32:
      _atomic_cas_ras_start:
          lw   r4, (r1+0)    /* load *val into r4 */
          bne  r4, r2, 1f    /* compare r4 (*val) and old (r2) */
          sw   (r1+0), r3    /* equal: store new (r3) into *val */
      _atomic_cas_ras_end:
      1:
          mv   r1, r4        /* return (*val) */
          ret
  69. Porting NetBSD • 10°) Add support for interrupts – Write a function to register interrupt handlers • 11°) Have a running system clock – Write cpu_initclocks() – Write clock irq handler • Call hardclock()
  70. Other functions to write • Switch context from one thread to another – cpu_switchto(9) • Copy data and abort on page fault – kcopy(9) • Save current context – setfault() • Low level code to finish up fork() operation – cpu_lwp_fork(9)
  71. Other functions to write • Block interrupts to protect critical sections – spl(9) • Init CPU and print copyright message – cpu_startup(9) • Determine the root file system device – cpu_rootconf(9) • Etc…
  72. Porting NetBSD • To boot user space – Create dummy ramdisk with /sbin/init – Build kernel with MFS – Insert ramdisk with mdsetimage – Boot it!
  73. Porting NetBSD DEMO
  74. Thank you! Sébastien Bourdeauducq, Michael Walle, Robert Swindells, Stefan Kristiansson, Lars-Peter Clausen, Pierre Pronchery, Radoslaw Kujawa, Youri Mouton, Matt Thomas, tech-kern@, M-Labs mailing list, and many more
  75. Questions?
  76. NetBSD/milkymist Memory Layout [diagram labels: address range 0 to 0xffffffff • User space • Kernel space • 0xc0000000 • 0xc8000000 • RAM window • User stack • Kernel stack • DDR SDRAM: 128 MB]
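
The TLB lookup walked through on slides 51 to 60 can be sketched in C. This is a minimal software model under stated assumptions, not the LM32 hardware logic: the names `tlb_entry`, `tlb_lookup`, `tlb_result` and `load_slide_tlb` are hypothetical, and the three table rows from the slides are assumed to sit in TLB lines 0, 1 and 2.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT     12u                     /* 4 kB pages: 12 offset bits */
#define TLB_INDEX_BITS 10u                     /* 1024 entries: 10 index bits */
#define TLB_ENTRIES    (1u << TLB_INDEX_BITS)

/* One TLB line with the fields shown on the slides (hypothetical C layout). */
struct tlb_entry {
    uint16_t tag;        /* upper bits of the virtual page number */
    uint32_t ppn;        /* physical page number (20 bits) */
    bool     read_only;
    bool     valid;
};

enum tlb_result { TLB_HIT, TLB_MISS, TLB_WRITE_FAULT };

/* Translate va through a direct-mapped TLB; *pa is written only on a hit. */
static enum tlb_result
tlb_lookup(const struct tlb_entry *tlb, uint32_t va, bool is_store, uint32_t *pa)
{
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1u); /* bits [11:0] */
    uint32_t vpn    = va >> PAGE_SHIFT;               /* bits [31:12] */
    uint32_t index  = vpn & (TLB_ENTRIES - 1u);       /* low 10 bits of the VPN */
    uint32_t tag    = vpn >> TLB_INDEX_BITS;          /* high 10 bits of the VPN */
    const struct tlb_entry *e = &tlb[index];

    if (!e->valid || e->tag != tag)
        return TLB_MISS;              /* the "I don't know!" case */
    if (is_store && e->read_only)
        return TLB_WRITE_FAULT;       /* access denied */
    *pa = (e->ppn << PAGE_SHIFT) | offset;
    return TLB_HIT;
}

/* Clear the TLB, then fill in the three rows from the slides.
 * Assumption: they occupy lines 0, 1 and 2. */
static void
load_slide_tlb(struct tlb_entry *tlb)
{
    for (uint32_t i = 0; i < TLB_ENTRIES; i++)
        tlb[i] = (struct tlb_entry){ 0 };
    tlb[0] = (struct tlb_entry){ .tag = 0xABC, .ppn = 0xABC00, .read_only = false, .valid = false };
    tlb[1] = (struct tlb_entry){ .tag = 0x280, .ppn = 0xB0001, .read_only = true,  .valid = true  };
    tlb[2] = (struct tlb_entry){ .tag = 0x300, .ppn = 0x00001, .read_only = false, .valid = true  };
}
```

With the table loaded by `load_slide_tlb()`, a load from 0xA0001004 selects line 1, matches tag 0x280 and yields the physical address 0xB0001004, as on slide 60. A store to the same address is refused in this model because the entry is marked read-only.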

Notes

  • Say « Memory Management Unit »
  • Electronic device for generating artistic video performances at parties and concerts
  • You can capture live dancers, apply video effects such as rotations, zooms and translations, and project the result onto a screen or a wall
    It reacts in real time, synchronized to the audio input, and can be controlled via a MIDI keyboard or DMX (the protocol used to control stage lighting and effects)
  • Array of configurable logic blocks, linked together by a programmable switch matrix
  • Previously I said it’s slow to access main memory.
    Here the MMU accesses the page table (PT, in RAM) for every translation; aren’t we slowing our CPU down?
  • TLB: clever word for « cache for PVA -> PPA translations »
    1st time you wanna translate a page -> go to PT in RAM
    Next time you translate the same page -> TLB hit in 1 cycle
  • In LM32, like MIPS or PowerPC Booke, the MMU does not read the page table itself to refill the TLB (no hardware page tree walker).
    Instead the MMU raises an exception and lets the OS update the TLB.
    The TLB is entirely managed by SW.
  • kcopy: copies data like memcpy, aborts on page fault
    setfault: saves the current context so it can be restored later if we take a page fault
    cpu_lwp_fork() is the machine-dependent portion of fork1() which finishes a fork operation
  • cpu_startup: init cpu, print copyright message
    spl: raise and lower the interrupt priority level, used by kernel code to block interrupts in critical sections
    cpu_rootconf: determine the root file system device
  • Thank you for attending, and thanks to all those who helped with this work
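
The notes on the software-managed TLB (miss, exception, refill by the OS, then a hit on the next access) can be illustrated with a small simulation. This is a sketch under stated assumptions, not NetBSD or LM32 code: `mmu_sim`, `pte`, `tlb_probe`, `tlb_miss_handler` and `translate` are hypothetical names, the page table is a flat array searched linearly, and the exception is modeled as a plain function call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT     12u
#define TLB_INDEX_BITS 10u
#define TLB_ENTRIES    (1u << TLB_INDEX_BITS)

struct tlb_entry { uint32_t tag; uint32_t ppn; bool valid; };
struct pte       { uint32_t vpn; uint32_t ppn; };  /* hypothetical flat page table entry */

struct mmu_sim {
    struct tlb_entry  tlb[TLB_ENTRIES];
    const struct pte *page_table;   /* lives "in RAM", owned by the OS */
    size_t            npte;
    unsigned          misses;       /* number of simulated TLB miss exceptions */
};

/* "Hardware" side: look only at the TLB, never at the page table. */
static bool
tlb_probe(const struct mmu_sim *m, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    const struct tlb_entry *e = &m->tlb[vpn & (TLB_ENTRIES - 1u)];

    if (!e->valid || e->tag != (vpn >> TLB_INDEX_BITS))
        return false;
    *pa = (e->ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
    return true;
}

/* "OS" side: on a miss, walk the page table and refill the TLB line. */
static bool
tlb_miss_handler(struct mmu_sim *m, uint32_t va)
{
    uint32_t vpn = va >> PAGE_SHIFT;

    for (size_t i = 0; i < m->npte; i++) {
        if (m->page_table[i].vpn == vpn) {
            struct tlb_entry *e = &m->tlb[vpn & (TLB_ENTRIES - 1u)];
            e->tag   = vpn >> TLB_INDEX_BITS;
            e->ppn   = m->page_table[i].ppn;
            e->valid = true;
            return true;
        }
    }
    return false;   /* no mapping: a real kernel would handle a page fault here */
}

/* One memory access: probe, and on a miss let the "OS" refill, then retry. */
static bool
translate(struct mmu_sim *m, uint32_t va, uint32_t *pa)
{
    if (tlb_probe(m, va, pa))
        return true;
    m->misses++;
    if (!tlb_miss_handler(m, va))
        return false;
    return tlb_probe(m, va, pa);
}
```

In this model the first access to a page costs one miss and a page table walk; later accesses to the same page are served from the TLB, so `misses` does not grow.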