Lanai, the mystery CPU architecture in LLVM.

Disclaimer: I have had access to some confidential information about some of the matters discussed on this page. However, everything written here is derived from publicly available sources, and references to these sources are also provided.

Some of my recent long-term projects revolve around a little-known CPU architecture called 'Lanai'. Unsurprisingly, very few people have heard of it, and even a determined Google search turns up very little. This page is a short summary of what I know, and should serve as a reference for future questions.

Myricom & the origins of Lanai

Myricom is a hardware company founded in 1994. One of their early products was a networking interface card family and protocol, Myrinet. I don't know much about it, other than that it did some funky stuff with wormhole routing.

As part of their network interface card design, they introduced data plane programmability with the help of a small RISC core they named LANai. It originally ran at 33MHz, the speed of the PCI bus on which the cards operated. These cores were quite well documented on the Myricom website, seemingly because end-user programmability was a selling point of their devices.

It's worth noting that multiple versions of LANai/Lanai have been released. The last version publicly documented on the old Myricom website is Lanai3/4. Apart from the documentation, sources for a gcc/binutils fork still exist to this day on Myricom's GitHub.

At some point, however, Myricom stopped publicly documenting the programmability of their network cards, although the documentation and SDK were still available on request. Some papers and research websites contain tutorials on getting up and running with the newest SDK versions of the time, and even document the differences between the last documented Lanai3/4 version and newer releases of the architecture/core.

Myricom closing down the Lanai core documentation didn't mean they stopped using the core in their subsequent cards. It made its way into their Ethernet offerings (after Myrinet basically died), like their 10GbE network cards. You can easily find these 10G cards on eBay, and they even have the word 'Lanai' written on their main ASIC package. Even more interestingly, Lanai binaries are shipped with Linux firmware packages, and can be chucked straight into a Lanai disassembler (e.g. the Myricom binutils fork's objdump).

Technical summary of Lanai3/4

32 registers, most of them general purpose, with special treatment for R0 (all zeroes), R1 (all ones), R2 (the program counter), R3 (status register), and some registers allocated for mode/context switching.
4-stage RISC-style pipeline: Calculate Address, Fetch, Compute, Memory
Delay slot based pipeline hazard resolution
No multiplication, no division. It's meant to route packets, not crunch numbers.
The world's best instruction mnemonic: PUNT, to switch between user and system contexts.
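Delay slot based hazard resolution means the instruction placed directly after a branch is executed before control actually transfers to the branch target. The toy interpreter below is a minimal sketch of that rule, assuming a single delay slot as in Lanai3/4 (Lanai11's delay slot semantics reportedly differ). The instruction tuples (`addi`, `br`, `halt`) are hypothetical and have nothing to do with real Lanai encodings.

```python
def run(program, max_steps=32):
    """Execute a toy program that has exactly one branch delay slot.

    Hypothetical instruction forms, for illustration only:
      ("addi", reg, imm)  -> regs[reg] += imm
      ("br", target)      -> jump to target, after one delay-slot instruction
      ("halt",)           -> stop
    Returns (registers, list of executed program counters).
    """
    regs = {}
    trace = []
    pc = 0
    delayed_target = None  # set by a branch, applied after the *next* instruction
    for _ in range(max_steps):
        insn = program[pc]
        trace.append(pc)
        if insn[0] == "halt":
            break
        # A branch executed on the previous step only redirects control now,
        # i.e. after the current (delay slot) instruction has run.
        redirect, delayed_target = delayed_target, None
        if insn[0] == "addi":
            regs[insn[1]] = regs.get(insn[1], 0) + insn[2]
        elif insn[0] == "br":
            delayed_target = insn[1]
        else:
            raise ValueError(f"unknown instruction {insn!r}")
        pc = redirect if redirect is not None else pc + 1
    return regs, trace


program = [
    ("br", 3),           # 0: branch to 3...
    ("addi", "x", 1),    # 1: ...but this delay-slot instruction still runs
    ("addi", "x", 100),  # 2: skipped
    ("halt",),           # 3
]
regs, trace = run(program)
print(regs, trace)  # {'x': 1} [0, 1, 3]
```

The compiler's job is then to fill that slot with useful work (or a no-op) rather than leaving the hardware to stall.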

Here's a sample of Lanai assembly:

000000f8 <main>:
      f8: 92 93 ff fc   st      %fp, [--%sp]
      fc: 02 90 00 08   add     %sp, 0x8, %fp
     100: 22 10 00 08   sub     %sp, 0x8, %sp
     104: 51 80 00 00   or      %r0, 0x0, %r3
     108: 04 81 40 01   mov     0x40010000, %r9
     10c: 54 a4 08 0c   or      %r9, 0x80c, %r9
     110: 06 01 11 11   mov     0x11110000, %r12
     114: 56 30 11 11   or      %r12, 0x1111, %r12
     118: 96 26 ff f4   st      %r12, -12[%r9]
     11c: 96 26 ff f8   st      %r12, -8[%r9]
     120: 86 26 13 f8   ld      5112[%r9], %r12

00000124 <.LBB3_1>:
     124: 46 8d 00 00   and     %r3, 0xffff, %r13
     128: 96 a4 00 00   st      %r13, 0[%r9]
     12c: 01 8c 00 01   add     %r3, 0x1, %r3
     130: e0 00 01 24   bt      0x124 <.LBB3_1>
     134: 96 24 00 00   st      %r12, 0[%r9]

The `add`/`sub`/`or` instructions have their destination on the right-hand side. `st` and `ld` are the memory store and load instructions, respectively. Note the lack of a 32-bit immediate load: a `mov` and an `or` instruction are used in tandem instead. That `mov` instruction isn't real either; it's a pseudo-instruction for `add %r0, 0x40010000, %r9`. Also note the branch delay slot at address 134: this instruction gets executed even if the branch at 130 is taken.
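To make the `mov`/`or` pairing concrete, here is a minimal sketch that decodes the register-immediate ALU words from the listing above, reading the bytes most-significant first. The field layout (opcode in bits 30-28, Rd in 27-23, Rs1 in 22-18, a flags bit, an H bit selecting the high half-word, and a 16-bit immediate) is taken from LLVM's Lanai backend and checks out against every ALU word in the listing. The ones-fill for `and` with H clear, the `mov` printing rule, and the helper names (`decode_ri`, `fmt`) are my own assumptions for illustration. Loads, stores, and branches use other formats and are not handled.

```python
from typing import NamedTuple

# Opcode 7 is used for shifts in RI form; it is not decoded further here.
ALU_OPS = ("add", "addc", "sub", "subb", "and", "or", "xor", "special")
REG_ALIASES = {4: "%sp", 5: "%fp"}  # as the listing prints r4 and r5


class RIInsn(NamedTuple):
    op: str
    rd: int
    rs1: int
    set_flags: bool
    imm: int  # effective 32-bit immediate operand


def reg(n: int) -> str:
    return REG_ALIASES.get(n, f"%r{n}")


def decode_ri(word: int) -> RIInsn:
    """Decode one 32-bit RI-format ALU instruction word."""
    if word >> 31:
        raise ValueError(f"{word:#010x} is not an RI-format ALU instruction")
    op = ALU_OPS[(word >> 28) & 0x7]
    rd = (word >> 23) & 0x1F
    rs1 = (word >> 18) & 0x1F
    set_flags = bool((word >> 17) & 1)
    high = bool((word >> 16) & 1)  # H: immediate goes in the high half-word
    imm16 = word & 0xFFFF
    if high:
        imm = imm16 << 16
        if op == "and":
            imm |= 0xFFFF  # AND leaves the untouched low half intact
    else:
        imm = imm16
        if op == "and":
            imm |= 0xFFFF0000  # assumption: mirror of the H=1 case
    return RIInsn(op, rd, rs1, set_flags, imm)


def fmt(insn: RIInsn) -> str:
    # An add from %r0 is shown as the `mov` pseudo-instruction, as in the listing.
    if insn.op == "add" and insn.rs1 == 0:
        return f"mov {insn.imm:#x}, {reg(insn.rd)}"
    return f"{insn.op} {reg(insn.rs1)}, {insn.imm:#x}, {reg(insn.rd)}"


# Replay the mov/or pair from addresses 108 and 10c to build a 32-bit constant.
regs = [0] * 32  # r0 reads as zero; only add/or are evaluated below
for word in (0x04814001, 0x54A4080C):
    insn = decode_ri(word)
    a = regs[insn.rs1]
    result = a + insn.imm if insn.op == "add" else a | insn.imm
    regs[insn.rd] = result & 0xFFFFFFFF
    print(fmt(insn))
print(hex(regs[9]))  # 0x4001080c
```

The high half arrives via `add %r0, hi << 16` and the low half is ORed in, which is why every 32-bit constant in the listing costs two instructions.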

The ISA is quite boring, and in my opinion that's a good thing. It makes core implementations easy and fast, and it generally feels like one of the RISC-iest cores I've dealt with. The only truly interesting thing about it is its dual-context execution system, but that unfortunately becomes irrelevant in later versions, as we'll see.

Google & the Lanai team

In the early 2010s, things weren't going great at Myricom. Due to financial and leadership difficulties, some of their products got canceled, and in 2013, core Myricom engineers were bought out by Google, bringing the Lanai intellectual property rights with them. The company still limps on, seemingly targeting the network security and fintech markets, and even continues to market its networking gear as programmable, but Lanai is nowhere to be seen in their new designs.

So what has Google done with the Lanai engineers and technology? The only thing we know is that in 2016 Google implemented and upstreamed a Lanai target in LLVM, and that it was to be used internally at Google. What is it used for? Only Google knows, and Google isn't saying.

The LLVM backend targets Lanai11. This is quite a few numbers higher than the last publicly documented Lanai3/4, and there are quite a few differences between them:

No more dual-context operation, no more PUNT instruction. The compiler/programmer can now make use of nearly all registers from r4 to r31.
No more dual-ALU (R-R-R) instructions. This was obviously slow, and was probably a combinatorial bottleneck in newer microarchitectural implementations.
Slightly different delay slot semantics, pointing at a new microarchitecture (likely having stepped away from a classic RISC pipeline into something more modern).
New additional instruction format and set of accompanying instructions: SPLS (special part-word load/store), SLI (special load immediate), and Special Instruction (containing amongst others popcount, of course).

Lanai Necromancy

As you can tell from this page, this architecture intrigued me. The fact that it's an LLVM target shipped with nearly every LLVM distribution, while no one has access to hardware that runs the emitted code, is just so spicy. Apart from writing this page, I have a few other Lanai-related projects, and I'd like to introduce them here:

I'm porting Rust to Lanai11. I have a working prototype, which required submitting some patches to upstream LLVM to deal with IR emitted by rustc. These have been upstreamed. My rustc patches are pending on...
I'm implementing LLD support for Lanai. In LLVM mailing list posts, Google mentions that they use a binutils ld forked off the Myricom binutils fork. I've instead opted to implement an LLD backend for Lanai, which currently only supports the simplest relocations. I haven't yet submitted a public LLVM change request for it, but it's on my shortlist of things to do; I first have to talk to the LLVM/Google folks about a maintenance plan.
I've implemented a simple Lanai11 core in Bluespec, as part of my qfc monorepo. 3-stage pipeline (merged addr/fetch stages), in-order. It's my first bit of serious Bluespec code, so it's not very good. I plan on implementing a better core at some point.
I've implemented a small Lanai-based microcontroller, qf105, which is due to be manufactured in 130nm as part of the OpenMPW5 shuttle. The shuttle is, notably, sponsored by Google :).

If you're interested in following or joining these efforts, hop onto ##q3k on libera.chat.

In addition to my effort piecing together information about Lanai and making use of it for my own needs, the TrueBit project also used it as a base for their smart contract system (in which they implemented a Lanai interpreter in Solidity).

Documentation

Useful resources, in no particular order:

Original Lanai3/4 docs from Myricom's website, archived.
Myrinet tutorial by James Otto, archived.
A Myrinet Firmware development experience by Marc Herbert.
Lanai per-generation ISA differences, as shown by GCC architecture/machine options.
