
Tinkering with a NPU

A first field note on the Snapdragon Elite X1E80100 Qualcomm Hexagon NPU, low-cost tokens, local inference, and what the hardware claims invite us to test.

Source note: Originally published on Eric's Advisory Hour Substack. Read the canonical post.

anime snapdragon NPU chip

The NPU

I’ve got this NPU and a severe need for low-cost tokens. This article kicks off a small series of posts around a very specific bit of hardware: the one sitting in the machine I typically do my writing on. It’s called the… and brace yourself: the Snapdragon Elite X1E80100 Qualcomm Hexagon NPU. It’s a ridiculous name. It even shows up like that in Task Manager on my Windows OS.

The claim is that one can load models of up to 13B parameters on the NPU, which is rated at up to 45 TOPS (trillions of operations per second). What that actually means in practice, though, is a bit more interesting. Up until now, I’ve only really experimented with various Phi models. But let’s dig into what this hardware actually provides.
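To put the 13B figure in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone would need at a few common precisions. The bit widths are my own illustrative choices, not something the spec sheet states, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a "13B parameter" model.
# ASSUMPTION: only the weights are counted; KV cache, activations,
# and runtime overhead are ignored, so real usage will be higher.

PARAMS = 13e9  # the "up to 13B parameters" claim


def weight_gib(params: float, bits_per_weight: int) -> float:
    """Return the size of the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # Illustrative precisions: fp16, int8, and 4-bit quantization.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_gib(PARAMS, bits):.1f} GiB")
```

The takeaway is that a 13B model realistically means a quantized 13B model: at 16 bits the weights alone would crowd out most of a typical laptop's memory.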


The X1E80100 (X1E-80100)

Bing AI tells me that the X1E-80100 is built on a 4nm process by TSMC, with 12 Oryon CPU cores, 42 MB L3 cache, and LPDDR5X-8448 memory support. LPDDR5X is better read, IMO, as LP-DDR5-X. LP stands for “low-power”, DDR is “double data rate”, and the “X” is just a cheap version identifier. It’s easier to read a big word when it’s broken up into segments like that, especially when the industry has taken to tacking on initials and assuming we’ll all be along for the ride.
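The “8448” in LPDDR5X-8448 is the data rate in mega-transfers per second, which is enough to sketch a peak memory bandwidth and, from that, a rough ceiling on token generation speed. Everything below is an estimate under stated assumptions: the 128-bit bus width is not given in this article and should be checked against your device, and the tokens-per-second ceiling assumes every weight is read once per generated token and that nothing else is competing for memory.

```python
# Peak-bandwidth and decode-ceiling estimate from the memory label.
# ASSUMPTIONS (not from the article): a 128-bit memory bus, and that
# generating one token reads every model weight exactly once.

MT_PER_S = 8448e6      # from "LPDDR5X-8448": transfers per second
BUS_WIDTH_BITS = 128   # ASSUMPTION: verify for your machine


def peak_bandwidth_gb_s(transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return transfers_per_s * bus_width_bits / 8 / 1e9


def decode_ceiling_tokens_per_s(
    bandwidth_gb_s: float, params: float, bits_per_weight: int
) -> float:
    """Upper bound on tokens/s if decoding is purely memory-bound."""
    weight_gb = params * bits_per_weight / 8 / 1e9
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(MT_PER_S, BUS_WIDTH_BITS)
    # Hypothetical workload: a 13B-parameter model quantized to 4 bits.
    ceiling = decode_ceiling_tokens_per_s(bw, 13e9, 4)
    print(f"peak bandwidth: {bw:.1f} GB/s")
    print(f"decode ceiling: {ceiling:.1f} tokens/s")
```

If these assumptions hold, memory bandwidth rather than the headline TOPS number is likely to be what limits generation speed on larger models, which is something worth measuring rather than taking on faith.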

Speaking of rides…

One of the interesting things about LPDDR5X is that it shows up in automotive applications, which was new to me. Samsung has a writeup on it that’s filled with more product names and buzzwords slung together than a drunken game of Scrabble.

The low-power design makes sense for automotive applications, and seeing it carried over to the Surface is quite interesting. I’d love one of these full systems-on-a-chip in a Raspberry Pi. Having an on-board inference engine for the Pi is something I keep an eye on. While the Jetson Nano series is interesting, it’s really hard to beat the Raspberry Pi ecosystem for DIY.

The NPU Claims

The benefits of the NPU are pretty enormous for agentic workloads, assuming it works, of course. The deckware for the NPU is very attractive, arguing that just about any use case you can imagine is fully unlocked with this hardware. This is good. I can imagine a whole lot!

Hexagon NPU: AI enablement. (Image Source: Qualcomm)

I’ll be going over my “hello world” code for writing to and reading from the NPU in the near future. While I enjoy a good amount of abstraction, I’d like to see a really simple example. How does one make use of these models without all the demo software that abounds? Can I do something interesting with them? Time to find out!


Original source

This local copy preserves the article text, source link, and inline media. Canonical Substack URL: https://advisoryhour.substack.com/p/tinkering-with-a-npu.