Disclaimer: I have had access to some confidential information about some of the matters discussed on this page. However, everything written here is derived from publicly available sources, and references to these sources are also provided.
Some of my recent long-term projects revolve around a little-known CPU architecture called 'Lanai'. Unsurprisingly, very few people have heard of it, and even their Googling skills don't come in handy. This page is a short summary of what I know, and should serve as a reference for future questions.
Myricom is a hardware company founded in 1994. One of their early products was a networking interface card family and protocol, Myrinet. I don't know much about it, other than it did some funky stuff with wormhole routing.
As part of their network interface card design, they introduced data plane programmability with the help of a small RISC core they named LANai. It originally ran at 33MHz, the speed of the PCI bus on which the cards were operating. These cores were quite well documented on the Myricom website, seemingly with the end-user programmability being a selling point of their devices.
It's worth noting that multiple versions of LANai/Lanai have been released. The last publicly documented version on the old Myricom website is Lanai3/4. Apart from the documentation, sources for a gcc/binutils fork exist to this day on Myricom's GitHub.
At some point, however, Myricom stopped publicly documenting the programmability of their network cards, but documentation/SDK was still available on request. Some papers and research websites actually contain tutorials on how to get running with the newest versions of the SDK at the time, and even document the differences between the last documented Lanai3/4 version and newer releases of the architecture/core.
This closing down of the Lanai core documentation by Myricom didn't mean they stopped using it in their subsequent cards. The core made its way into their Ethernet offerings (after Myrinet basically died), like their 10GbE network cards. You can easily find these 10G cards on eBay, and they even have the word 'Lanai' written on their main ASIC package. Even more interestingly, Lanai binaries are shipped with Linux firmware packages, and can be chucked straight into a Lanai disassembler (e.g. the Myricom binutils fork's objdump).
Here's a sample of Lanai assembly:
    000000f8 <main>:
          f8:  92 93 ff fc   st   %fp, [--%sp]
          fc:  02 90 00 08   add  %sp, 0x8, %fp
         100:  22 10 00 08   sub  %sp, 0x8, %sp
         104:  51 80 00 00   or   %r0, 0x0, %r3
         108:  04 81 40 01   mov  0x40010000, %r9
         10c:  54 a4 08 0c   or   %r9, 0x80c, %r9
         110:  06 01 11 11   mov  0x11110000, %r12
         114:  56 30 11 11   or   %r12, 0x1111, %r12
         118:  96 26 ff f4   st   %r12, -12[%r9]
         11c:  96 26 ff f8   st   %r12, -8[%r9]
         120:  86 26 13 f8   ld   5112[%r9], %r12

    00000124 <.LBB3_1>:
         124:  46 8d 00 00   and  %r3, 0xffff, %r13
         128:  96 a4 00 00   st   %r13, 0[%r9]
         12c:  01 8c 00 01   add  %r3, 0x1, %r3
         130:  e0 00 01 24   bt   0x124 <.LBB3_1>
         134:  96 24 00 00   st   %r12, 0[%r9]
The `add`/`sub`/`or` instructions have their destination on the right hand side. `st` and `ld` are memory store and load instructions respectively. Note the lack of a 32-bit immediate load (instead, a `mov` and an `or` instruction are used in tandem: the `mov` sets the upper 16 bits, the `or` fills in the lower 16). That `mov` instruction isn't real, either - it's a pseudo-instruction for an `add %r0, 0x40010000, %r9`, where `%r0` reads as zero. Also note the branch delay slot at address 134 (this instruction gets executed even if the branch at 130 is taken).
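To make the `mov`/`or` pairing a bit more concrete, here is a small Python sketch that decodes the register-immediate words from the listing above. The field layout used here (a 3-bit opcode, 5-bit destination and source registers, an `H` bit selecting the upper half-word, and a 16-bit constant) is my reading of the encodings and should be treated as an assumption: it is only checked against the handful of words shown on this page, not against any official documentation. The names `decode_ri` and `reg_name` are made up for this example.

```python
# Sketch only: decodes the register-immediate words seen in the listing.
# ASSUMED layout (checked only against the listing, not official docs):
#   bit 31 = 0 | opcode[30:28] | Rd[27:23] | Rs1[22:18] | F[17] | H[16] | const[15:0]

# Only the opcodes that actually appear in the listing are included.
OPCODES = {0b000: "add", 0b010: "sub", 0b100: "and", 0b101: "or"}


def reg_name(n):
    """Name a register; r4/r5 show up as %sp/%fp in the listing."""
    return {4: "%sp", 5: "%fp"}.get(n, f"%r{n}")


def decode_ri(word):
    """Return (mnemonic, source, immediate, destination) for one 32-bit word."""
    if word >> 31:
        raise ValueError("not a register-immediate word under the assumed layout")
    mnemonic = OPCODES[(word >> 28) & 0x7]
    rd = (word >> 23) & 0x1F
    rs1 = (word >> 18) & 0x1F
    high = (word >> 16) & 1
    const = word & 0xFFFF
    imm = const << 16 if high else const
    if mnemonic == "and":
        # Assumption: for `and`, the unused half-word is filled with ones,
        # which is what makes 46 8d 00 00 come out as a 0xffff mask.
        imm |= 0x0000FFFF if high else 0xFFFF0000
    return mnemonic, reg_name(rs1), imm, reg_name(rd)


def show(word):
    mnemonic, src, imm, dst = decode_ri(word)
    return f"{mnemonic} {src}, {imm:#x}, {dst}"


if __name__ == "__main__":
    # The two words at 108 and 10c build the constant 0x4001080c in %r9.
    for w in (0x04814001, 0x54A4080C):
        print(show(w))
```

Decoding the word at 108 this way gives an `add` with `%r0` as its source and the `H` bit set, which is exactly the "`mov` is really an `add`" trick described above; the following `or` then supplies the low 16 bits.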
The ISA is quite boring, and in my opinion that's a good thing. It makes core implementations easy and fast, and it generally feels like one of the RISC-iest cores I've dealt with. The only truly interesting thing about it is its dual-context execution system, but that unfortunately becomes irrelevant at some point, as we'll see later.
In the early 2010s, things weren't going great at Myricom. Due to financial and leadership difficulties, some of their products got canceled, and in 2013, core Myricom engineers were bought out by Google, and they transferred the Lanai intellectual property rights with them. The company still limps on, seemingly targeting the network security and fintech markets, and even continuing to market their networking gear as programmable, but Lanai is nowhere to be seen in their new designs.
So what has Google done with the Lanai engineers and technology? The only thing we know is that in 2016 Google implemented and upstreamed a Lanai target in LLVM, and that it was to be used internally at Google. What is it used for? Only Google knows, and Google isn't saying.
The LLVM backend targets Lanai11. This is quite a few numbers higher than the last publicly documented Lanai3/4, and there are quite a few differences between them:
As you can tell by this page, this architecture intrigued me. The fact that it's an LLVM target shipped with nearly every LLVM distribution while no-one has access to hardware which runs the emitted code is just so spicy. Apart from writing this page, I have a few other Lanai-related projects, and I'd like to introduce them here:
If you're interested in following or joining these efforts, hop on to ##q3k on libera.chat.
In addition to my effort piecing together information about Lanai and making use of it for my own needs, the TrueBit project also used it as a base for their smart contract system (in which they implemented a Lanai interpreter in Solidity).
Useful resources, in no particular order:
Copyright 2022 Serge Bazanski. This work is licensed under a Creative Commons Attribution 4.0 International License.