-
1.
Porting NetBSD
on
the open source
LatticeMico32 CPU
Yann Sionneau
M-Labs
@ EHSM 2014
-
2.
About me
• Yann Sionneau
• Embedded software developer
• Working at Sequans Communication
• M-Labs contributor
• @yannsionneau on twitter
• Email: yann.sionneau@gmail.com
-
3.
I’m going to talk about…
How to run NetBSD and
EdgeBSD on the
Milkymist One
-
4.
Agenda
• I) The hardware part: the MMU
–What an MMU is and how it works
• II) The software part
–How to port NetBSD to a new CPU
-
5.
Milkymist One?!
-
8.
The Milkymist One uses an FPGA
-
9.
What’s an FPGA??
• A chip
-
10.
FPGA internals
-
11.
Milkymist System-on-Chip
-
12.
LatticeMico32 CPU
• 32-bit Harvard architecture RISC
• Big Endian
• 6 stages
• Fully bypassed
• Optional configurable I/D caches
– Direct mapped or
– 2-way set associative
• Wishbone on-chip bus
-
13.
LatticeMico32, Good points
• Small
• Portable (works with several FPGA vendors)
• Fast (~100 MHz on Spartan 6)
• Actually works
• GCC/Binutils/GDB/QEMU/uClinux/OpenWrt
support
• OPEN SOURCE
-
14.
LatticeMico32, Bad points
• No Memory Management Unit… yet!
-
15.
LatticeMico32, Bad points
• No Memory Management Unit… yet!
Done
-
16.
Used in…
• Closed source commercial ASICs
• Open source projects
• Can achieve 800 MHz in TSMC 90nm
standard cell process
-
17.
LatticeMico32 pipeline
-
18.
What’s a pipeline?
• « In computing, a pipeline is a set of
data processing elements connected
in series, where the output of one
element is the input of the next
one. »
-- Pipeline (computing), Wikipedia
-
19.
What’s a pipeline?
IN -> [Data processing element 1] -> OUT/IN -> [Data processing element 2] -> OUT/IN -> [Data processing element 3] -> OUT
-
20.
What’s a pipeline?
$ cat .bash_history | grep 'cat' | wc -l
6
-
21.
What’s a CPU pipeline?
-
23.
Pipelined instruction execution
(slides 23 to 29 build this table one clock cycle at a time)
Pipeline stage occupied by each instruction, per clock cycle:
Clock cycle    1  2  3  4  5  6  7
Instr. 1       A  F  D  X  M  W
Instr. 2          A  F  D  X  M  W
Instr. 3             A  F  D  X  M
Instr. 4                A  F  D  X
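The table can be reproduced with a few lines of C. This is only a sketch of an ideal pipeline with no stalls, where instruction n enters the first stage at clock cycle n. The function name `stage_at` is made up for this example, and the expansion of the stage letters (Address, Fetch, Decode, eXecute, Memory, Writeback) follows the usual LM32 naming rather than anything stated on the slide.

```c
/* Stage letters in pipeline order, as used on the slide.
 * Assumed meaning: Address, Fetch, Decode, eXecute, Memory, Writeback. */
static const char STAGES[] = "AFDXMW";

/*
 * Return the stage occupied by instruction `instr` (1-based) during clock
 * cycle `cycle` (1-based), or '-' if the instruction is not in the pipeline.
 * Assumes an ideal pipeline: no stalls, one instruction issued per cycle.
 */
char stage_at(int instr, int cycle)
{
    int idx = cycle - instr;   /* how many stages the instruction has advanced */

    if (instr < 1 || idx < 0 || idx >= (int)(sizeof(STAGES) - 1))
        return '-';
    return STAGES[idx];
}
```

For example, `stage_at(4, 7)` gives the stage of instruction 4 during clock cycle 7, which is the last cell of the table.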
-
30.
Main Memory
CPU Internal
Before: both the CPU internals and main memory use physical addresses (PA)
-
31.
Main Memory
CPU Internal
Raising exception
After
VIRTUAL ADDRESSES PHYSICAL ADDRESSES
-
32.
What’s the MMU’s job?
• Translate « virtual addresses » into « physical
addresses »
• Protect memory against unwanted code
execution or data writes (e.g. caused by a
software bug or a security issue)
– Management of memory access rights
-
33.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
-
34.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
How does the MMU know the VA->PA
translation ?
-
35.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
Page Table
-
36.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
Page Table
Why « PAGE »?
-
37.
Why « Page »?
• 0x00000004 -> 0x10000000
• 0x00000005 -> 0x10000001
• 0x00000006 -> 0x10000002
Etc…
-
38.
Why « Page »?
• 0x00000004 -> 0x10000000
• 0x00000005 -> 0x10000001
• 0x00000006 -> 0x10000002
Etc…
This is WRONG!!!
-
39.
Why « Page »?
• 0x00000*** -> 0x10000***
• 0x00001*** -> 0x10001***
• 0x00002*** -> 0x10002***
Etc…
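A minimal C sketch of this page-granular mapping, assuming 4 kB pages (the three `*` hex digits are the 12-bit offset that is copied unchanged). The names `toy_page_map` and `toy_translate` are invented for the example, and the mapping "virtual page n -> physical page 0x10000 + n" is just the pattern shown on the slide.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* 4 kB pages: 12 offset bits */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)  /* 0xfff */

/* Toy mapping from the slide: virtual page n -> physical page 0x10000 + n. */
static uint32_t toy_page_map(uint32_t vpn)
{
    return 0x10000u + vpn;
}

/* Translate a virtual address: only the page number changes,
 * the offset inside the page is kept as-is. */
uint32_t toy_translate(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_SHIFT;
    uint32_t offset = va & PAGE_MASK;

    return (toy_page_map(vpn) << PAGE_SHIFT) | offset;
}
```

One table entry therefore covers 4096 consecutive addresses, instead of one entry per address.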
-
40.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
Page Table
-
41.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
Page Table
TLB
TLB : Translation
Lookaside Buffer
-
42.
Main Memory
CPU pipeline
VA PA
VA : Virtual Address
PA : Physical Address
Page Table
TLB
Operating
System
Updates the
Gets information from the
Updates the
-
43.
Features?
• Page size
–Only 4 kB
32-bit physical address:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
How many bits of an address indicate the
offset within a given page?
-
44.
Features?
• Page size
–Only 4 kB
32-bit physical address:
xxxxxxxx xxxxxxxx xxxx | xxxx xxxxxxxx
Page number [31:12]: 20 bits
Offset [11:0]: 12 bits
-
45.
Features?
• 2 TLBs (Translation Lookaside Buffers)
–ITLB
–DTLB
• Each TLB contains 1024 entries
–How many bits are needed to index the TLB?
-
46.
Features?
• 2 TLBs (Translation Lookaside Buffers)
–ITLB
–DTLB
• Each TLB contains 1024 entries
–How many bits are needed to index the TLB?
10 bits!
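The bit counts used in this part of the talk are simply base-2 logarithms. As a small sketch (the helper name `index_bits` is invented here, and it assumes its argument is a power of two):

```c
#include <stdint.h>

/* Number of bits needed to index `n` items, for n a power of two (n >= 1). */
unsigned index_bits(uint32_t n)
{
    unsigned bits = 0;

    while (n > 1u) {   /* count how many times n can be halved */
        n >>= 1;
        bits++;
    }
    return bits;
}
```

With 1024 TLB entries this gives 10 index bits; with 4096-byte pages it gives 12 offset bits, which leaves 32 - 12 = 20 bits for the page number.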
-
47.
Features?
• No hardware page-tree walker
– i.e. TLB is software assisted
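What "software assisted" means can be sketched as follows: on a TLB miss the hardware only raises an exception, and the operating system looks the page up in its own page table and writes the TLB entry. Everything below is a toy model; the structures, the linear-search "page table" and the name `toy_tlb_refill` are assumptions made for illustration, not the actual LM32 or NetBSD code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define TLB_ENTRIES 1024u

struct soft_tlb_entry { uint32_t vpn; uint32_t ppn; bool valid; };
struct soft_pte       { uint32_t vpn; uint32_t ppn; };  /* one OS mapping */

/*
 * Hypothetical TLB miss handler: called by the OS when the MMU raised a
 * miss exception for `fault_va`. Returns true if the TLB was refilled,
 * false if the OS has no mapping (a genuine page fault).
 */
bool toy_tlb_refill(struct soft_tlb_entry *tlb,
                    const struct soft_pte *page_table, size_t n_ptes,
                    uint32_t fault_va)
{
    uint32_t vpn = fault_va >> PAGE_SHIFT;

    for (size_t i = 0; i < n_ptes; i++) {
        if (page_table[i].vpn == vpn) {
            /* Direct-mapped TLB: the low bits of the VPN select the line. */
            struct soft_tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

            e->vpn   = vpn;
            e->ppn   = page_table[i].ppn;
            e->valid = true;
            return true;
        }
    }
    return false;
}
```

After a successful refill the faulting instruction is restarted and hits in the TLB.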
-
48.
Virtual address
Load or store?
Instruction or
Data?
Physical address
Access
granted/denied
-
49.
Virtual address
Load or store?
Instruction or
Data?
Physical address
Access
granted/denied
I don’t know!
-
50.
Let’s have a look inside
-
51.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001004
-
52.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004
Page number
Offset in the page
-
53.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
-
54.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001
-
55.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001
VPN = 0xA0001 1010 0000 0000 0000 0001
-
56.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001 TLB index = 1
VPN = 0xA0001 1010 0000 00 00 0000 0001
TLB index, used to
select a TLB line
-
58.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001 TLB index = 1
VPN = 0xA0001 1010 0000 00 00 0000 0001
Tag = 0x280 1010 0000 00
=
-
59.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001 TLB index = 1
VPN = 0xA0001 1010 0000 00 00 0000 0001
Tag = 0x280 1010 0000 00
=
Physical page number = 0xB0001
-
60.
Tag [10] Physical page number [20] Read-only [1] Valid [1]
0xABC 0xABC00 0 0
0x280 0xB0001 1 1
0x300 0x00001 0 1
The TLB
VA = 0xA0001 004 Page offset = 4
Virtual Page number = 0xA0001 TLB index = 1
VPN = 0xA0001 1010 0000 00 00 0000 0001
Tag = 0x280 1010 0000 00
=
Physical page number = 0xB0001
Physical Address = 0xB0001004
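The lookup walked through on these slides can be written as a short C function. This is a sketch under the parameters given in the talk (4 kB pages, 1024 direct-mapped entries, 10-bit tag, read-only and valid bits); the type names, the `tlb_lookup` function and the three result codes are invented for the example and do not describe the real hardware interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define TLB_ENTRIES 1024u
#define INDEX_BITS  10u

struct tlb_entry {
    uint16_t tag;        /* upper 10 bits of the virtual page number */
    uint32_t ppn;        /* 20-bit physical page number */
    bool     read_only;
    bool     valid;
};

enum tlb_result { TLB_HIT, TLB_MISS, TLB_WRITE_FAULT };

enum tlb_result tlb_lookup(const struct tlb_entry *tlb, uint32_t va,
                           bool is_store, uint32_t *pa)
{
    uint32_t vpn   = va >> PAGE_SHIFT;           /* 0xA0001 in the example */
    uint32_t index = vpn & (TLB_ENTRIES - 1u);   /* low 10 bits select a line */
    uint32_t tag   = vpn >> INDEX_BITS;          /* high 10 bits are compared */
    const struct tlb_entry *e = &tlb[index];

    if (!e->valid || e->tag != tag)
        return TLB_MISS;                         /* OS must refill the TLB */
    if (is_store && e->read_only)
        return TLB_WRITE_FAULT;                  /* access denied */

    *pa = (e->ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1u));
    return TLB_HIT;
}
```

With line 1 holding tag 0x280 and physical page number 0xB0001, a load from 0xA0001004 hits and yields 0xB0001004, while a store to the same address is refused because the entry is read-only.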
-
61.
Porting NetBSD
• 1°) NetBSD cross compilation toolchain
– build.sh
– Makefiles here and there
– Arch-specific directories
This makes it possible to run:
$ ./build.sh -U -m lm32 tools
-
62.
Porting NetBSD
• 2°) Support for built-ins in libkern
– NetBSD kernel is
• Not linked against libgcc
• Linked against libkern
– Need to implement basic arithmetic functions
emitted by gcc in object code
– Implementation in sys/lib/libkern/arch/lm32
-
63.
Porting NetBSD
• 3°) Building my first kernel
– Create sys/arch/lm32 and sys/arch/milkymist
– Populate
• sys/arch/<cpu|soc>/include
• sys/arch/<cpu|soc>/conf
– Stub, stub, stub…
This makes it possible to run:
$ ./build.sh -m milkymist -U kernel=GENERIC
-
64.
Porting NetBSD
• 4°) Write basic console driver for early prints
struct consdev milkymist_com_cons = {
[…]
milkymist_com_cngetc, /* cn_getc: kernel getchar interface */
milkymist_com_cnputc, /* cn_putc: kernel putchar interface */
[…]
};
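The idea behind this table of function pointers can be shown with a self-contained toy. The descriptor below is a simplified, hypothetical stand-in (the real struct consdev has more members and different signatures), and the "UART" is just a memory buffer so that the sketch runs anywhere.

```c
#include <stddef.h>

/* Toy stand-in for the UART: transmitted bytes are stored in a buffer. */
static char   fake_uart_tx[64];
static size_t fake_uart_len;

/* Simplified, hypothetical console descriptor. */
struct toy_consdev {
    int  (*cn_getc)(void);     /* kernel getchar interface */
    void (*cn_putc)(int c);    /* kernel putchar interface */
};

static int toy_cngetc(void)
{
    return -1;                 /* no input available in this sketch */
}

static void toy_cnputc(int c)
{
    if (fake_uart_len < sizeof(fake_uart_tx) - 1) {
        fake_uart_tx[fake_uart_len++] = (char)c;
        fake_uart_tx[fake_uart_len]   = '\0';
    }
}

static struct toy_consdev  toy_cons   = { toy_cngetc, toy_cnputc };
static struct toy_consdev *toy_cn_tab = &toy_cons;  /* selected console */

/* Early print helper: every character goes through the selected console. */
void toy_early_puts(const char *s)
{
    while (*s != '\0')
        toy_cn_tab->cn_putc(*s++);
}

/* Accessor so that the captured output can be inspected. */
const char *toy_uart_output(void)
{
    return fake_uart_tx;
}
```

A real driver would write each character to the UART transmit register and poll a status bit instead of appending to a buffer.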
-
65.
Porting NetBSD
• 5°) Implement exception handlers
• 6°) Call milkymist_startup() C code
– Initialize console driver
• -> consinit() -> milkymist_uart_cnattach()
• cn_tab = &milkymist_com_cons;
– Initialize virtual memory subsystem
• Call MD pmap_bootstrap()
– Let the kernel boot
• Call NetBSD MI main()
-
66.
Porting NetBSD
• 7°) Implement pmap.9
pmap -- machine-dependent portion of the virtual
memory system
– pmap_bootstrap()
– pmap_init, pmap_create, pmap_destroy …
– Software-managed TLB? -> use sys/uvm/pmap/
– (used by PowerPC Book E and LM32)
-
67.
Porting NetBSD
• 8°) Implement copyin/copyout
• 9°) Implement atomic operations
– No atomic instruction -> RAS (Restartable Atomic
Sequence) based CAS (Compare And Swap)
– Other atomic ops are built around this CAS
-
68.
RAS CAS
uint32_t _atomic_cas_32(volatile uint32_t *val, uint32_t old,
                        uint32_t new);

_atomic_cas_32:
_atomic_cas_ras_start:
        lw   r4, (r1+0)      /* load *val into r4 */
        bne  r4, r2, 1f      /* if *val != old (r2), skip the store */
        sw   (r1+0), r3      /* *val = new (r3) */
_atomic_cas_ras_end:
1:
        mv   r1, r4          /* return the value read from *val */
        ret
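Building other atomic operations on top of the CAS usually takes the form of a retry loop. The sketch below shows an "add and return new value" operation. `toy_cas_32` is a plain C stand-in that only mirrors the contract of the CAS (it is NOT atomic by itself), and both function names are invented for this example.

```c
#include <stdint.h>

/* Stand-in for the CAS primitive: returns the value found in *ptr and
 * stores new_val only if that value equals old.
 * NOT atomic here; in the kernel this would be the RAS-protected routine. */
static uint32_t toy_cas_32(volatile uint32_t *ptr, uint32_t old,
                           uint32_t new_val)
{
    uint32_t cur = *ptr;

    if (cur == old)
        *ptr = new_val;
    return cur;
}

/* Atomic add built around CAS: retry until nobody changed *ptr between
 * our read and our compare-and-swap. Returns the new value. */
uint32_t toy_atomic_add_32_nv(volatile uint32_t *ptr, uint32_t delta)
{
    uint32_t old;

    do {
        old = *ptr;
    } while (toy_cas_32(ptr, old, old + delta) != old);

    return old + delta;
}
```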
-
69.
Porting NetBSD
• 10°) Add support for interrupts
– Write a function to register interrupt handlers
• 11°) Have a running system clock
– Write cpu_initclocks()
– Write clock irq handler
• Call hardclock()
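A rough sketch of these two steps: a table where handlers are registered per interrupt line, a dispatcher that calls the handlers of all pending lines, and a clock handler. All names and the 32-line table are assumptions for the example; in the kernel the clock handler would call hardclock(), here it only counts ticks.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NIRQ 32u

typedef void (*toy_irq_handler_t)(void *arg);

static struct {
    toy_irq_handler_t fn;
    void             *arg;
} toy_irq_table[TOY_NIRQ];

/* Register a handler for an interrupt line; -1 if invalid or already taken. */
int toy_intr_establish(unsigned irq, toy_irq_handler_t fn, void *arg)
{
    if (irq >= TOY_NIRQ || toy_irq_table[irq].fn != NULL)
        return -1;
    toy_irq_table[irq].fn  = fn;
    toy_irq_table[irq].arg = arg;
    return 0;
}

/* Called from the interrupt exception handler with the pending-IRQ bitmask. */
void toy_intr_dispatch(uint32_t pending)
{
    for (unsigned irq = 0; irq < TOY_NIRQ; irq++) {
        if ((pending & (UINT32_C(1) << irq)) && toy_irq_table[irq].fn != NULL)
            toy_irq_table[irq].fn(toy_irq_table[irq].arg);
    }
}

/* Stand-in for the clock IRQ handler (the real one would call hardclock()). */
void toy_clock_intr(void *arg)
{
    unsigned *ticks = arg;

    (*ticks)++;
}
```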
-
70.
Other functions to write
• Switch context from one thread to another
– cpu_switchto(9)
• Copy data and abort on page fault
– kcopy(9)
• Save current context
– setfault()
• Low level code to finish up fork() operation
– cpu_lwp_fork(9)
-
71.
Other functions to write
• Block interrupts to protect critical sections
– spl(9)
• Init CPU and print copyright message
– cpu_startup(9)
• Determine the root file system device
– cpu_rootconf(9)
• Etc…
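The usual pattern for protecting a critical section with interrupt priority levels is "raise, work, restore". The toy model below keeps the current level in a variable; the names, the level value 7 and the `toy_irq_allowed` check are assumptions made for illustration and do not reflect the LM32 implementation.

```c
#define TOY_IPL_HIGH 7          /* hypothetical highest priority level */

static int toy_ipl;             /* current level; 0 means nothing is blocked */

/* Raise the priority level (never lowers it) and return the previous one. */
int toy_splraise(int level)
{
    int old = toy_ipl;

    if (level > toy_ipl)
        toy_ipl = level;
    return old;
}

/* Restore a level previously returned by toy_splraise(). */
void toy_splx(int old)
{
    toy_ipl = old;
}

/* An interrupt is delivered only if its level is above the current one. */
int toy_irq_allowed(int irq_level)
{
    return irq_level > toy_ipl;
}
```

A critical section then reads: `int s = toy_splraise(TOY_IPL_HIGH); /* ... */ toy_splx(s);`.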
-
72.
Porting NetBSD
• To boot user space
– Create dummy ramdisk with /sbin/init
– Build kernel with MFS
– Insert ramdisk with mdsetimage
– Boot it!
-
73.
Porting NetBSD
DEMO
-
74.
Thank you!
Sébastien Bourdeauducq, Michael Walle, Robert
Swindells, Stefan Kristiansson, Lars-Peter
Clausen, Pierre Pronchery, Radoslaw Kujawa,
Youri Mouton, Matt Thomas, tech-kern@, M-Labs
mailing list, and many more
-
75.
Questions?
-
76.
NetBSD/milkymist Memory Layout
Kernel
space
User space
0 0xffffffff
0xc0000000
0xc8000000
Ram window
User stack
Kernel
stack
DDR SDRAM :
128 MB
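One possible reading of this layout, written as C: user space below 0xc0000000, kernel space above it, and a RAM window from 0xc0000000 to 0xc8000000 that covers the 128 MB of DDR SDRAM. The diagram does not spell out every boundary, so the constants, the region names and the function names below are assumptions.

```c
#include <stdint.h>

#define KERNEL_BASE    UINT32_C(0xc0000000)  /* start of kernel space */
#define RAM_WINDOW_END UINT32_C(0xc8000000)  /* end of the RAM window */

enum region { REGION_USER, REGION_RAM_WINDOW, REGION_KERNEL_OTHER };

/* Classify a virtual address according to the assumed layout. */
enum region classify_va(uint32_t va)
{
    if (va < KERNEL_BASE)
        return REGION_USER;
    if (va < RAM_WINDOW_END)
        return REGION_RAM_WINDOW;
    return REGION_KERNEL_OTHER;
}

/* Size of the RAM window in bytes: 0x08000000, i.e. 128 MB. */
uint32_t ram_window_size(void)
{
    return RAM_WINDOW_END - KERNEL_BASE;
}
```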
Say « Memory Management Unit »
An electronic device for generating artistic video performances at parties and concerts
You can film live dancers, apply video effects such as rotation, zoom in/out and translation, and project the result onto a screen or a wall
It reacts in real time, synchronized to the audio input, and can be controlled via a MIDI keyboard or DMX (a protocol used to control stage lighting and effects)
Array of configurable logic blocks, linked together by a programmable switch matrix
Previously I said it’s slow to access main memory.
Here the MMU accesses the page table (in RAM) for every translation; aren’t we slowing our CPU down?
TLB: a clever word for « cache for PVA -> PPA translations »
The first time you translate a page -> go to the page table in RAM
The next time you translate the same page -> TLB hit in 1 cycle
On LM32, as on MIPS or PowerPC Book E, the MMU does not read the page table itself to refill the TLB (no hardware page-tree walker).
Instead, the MMU raises an exception and lets the OS update the TLB.
The TLB is entirely managed by software.
kcopy: copies data like memcpy, but aborts on a page fault
setfault: saves the current context so it can be restored later if we take a page fault
cpu_lwp_fork() is the machine-dependent portion of fork1(), which finishes a fork operation
cpu_startup: initializes the CPU and prints the copyright message
spl: raises and lowers the interrupt priority level; used by kernel code to block interrupts in critical sections
cpu_rootconf: determines the root file system device
Thank you for attending, and thanks to all those who helped with this work