Structure and Interpretation of Computer Programmers

I make it easier and faster for you to write high-quality software.

Friday, November 13, 2020

Apple Silicon, Xeon Phi, and Amigas

The new M1 chip in the new Macs has 8-16GB of DRAM on the package, just like many mobile phones or single-board computers, but unlike most desktop, laptop or workstation computers (there are exceptions). In the first tranche of Macs using the chip, that’s all the addressable RAM they have (ignoring caches), just like many mobile phones or single-board computers. But what happens when Apple moves the Apple Silicon chips up the scale, to computers like the iMac or Mac Pro?

It’s possible that these models would have a few GB of memory on-package and access to memory modules connected via a conventional controller, for example DDR4 RAM. They almost certainly would if you could deploy multiple M1 (or successor) packages in a single system. Such a Mac would be a non-uniform memory access (NUMA) architecture, which (depending on how it’s configured) has implications for how software should be designed to make best use of the memory.

NUMA computing is of course not new. If you have a computer with a CPU and a discrete graphics processor, you have a NUMA computer: the GPU has access to RAM that the CPU doesn’t, and vice versa. Running GPU code involves copying data from CPU-memory to GPU-memory, doing GPU stuff, then copying the result from GPU-memory to CPU-memory.
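To make that round trip concrete, here’s a minimal sketch using the CUDA runtime’s C API, as one real example of the pattern. `launch_scale` is a hypothetical kernel wrapper defined elsewhere, and error handling is omitted:

```c
#include <cuda_runtime.h>  /* CUDA runtime C API: cudaMalloc, cudaMemcpy, ... */
#include <stddef.h>

extern void launch_scale(float *device_data, size_t n);  /* hypothetical kernel wrapper */

void gpu_roundtrip(float *host_data, size_t n)
{
    float *device_data;
    size_t bytes = n * sizeof(float);

    cudaMalloc((void **)&device_data, bytes);   /* RAM only the GPU can see */
    cudaMemcpy(device_data, host_data, bytes,
               cudaMemcpyHostToDevice);         /* CPU-memory -> GPU-memory */
    launch_scale(device_data, n);               /* do GPU stuff */
    cudaMemcpy(host_data, device_data, bytes,
               cudaMemcpyDeviceToHost);         /* GPU-memory -> CPU-memory */
    cudaFree(device_data);
}
```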

A hypothetical NUMA-because-Apple-Silicon Mac would not be like that. The GPU shares access to the integrated RAM with the CPU, a little like an Amiga. The situation on the Amiga was that there was “chip RAM” (which the CPU, the graphics chips, and the other peripheral chips could all access) and “fast RAM” (available only to the CPU). The fast RAM was faster because the CPU didn’t have to wait for the coprocessors to use it, whereas they all had to take turns accessing the chip RAM. Nonetheless, the CPU had access to all the RAM, and programmers had to tell `AllocMem` whether they wanted chip RAM, fast RAM, or didn’t care, as in the sketch below.
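For flavour, a minimal AmigaOS sketch of those three choices (the buffer sizes are arbitrary):

```c
#include <exec/types.h>
#include <exec/memory.h>
#include <proto/exec.h>

void allocate_buffers(void)
{
    /* Chip RAM: visible to the CPU and the custom chips, but contended. */
    UBYTE *bitplane = AllocMem(8000, MEMF_CHIP | MEMF_CLEAR);

    /* Fast RAM: CPU-only, so no waiting for the coprocessors. */
    UBYTE *scratch = AllocMem(8000, MEMF_FAST);

    /* Don't care: exec picks whatever is free. */
    UBYTE *anywhere = AllocMem(8000, MEMF_ANY);

    /* ... use the buffers ... */

    if (anywhere) FreeMem(anywhere, 8000);
    if (scratch)  FreeMem(scratch, 8000);
    if (bitplane) FreeMem(bitplane, 8000);
}
```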

A NUMA Mac would not be like that, either. It would share the property that a subset of the RAM is available for sharing with the GPU, but that memory would be faster than the off-chip memory, because of the closer integration and the lack of a (relatively) long communication bus. Apple has described the integrated RAM as “high bandwidth”, which probably means multiple access channels.

A better and more recent analogy for this setup is Intel’s discontinued supercomputer chip, Knights Landing (marketed as Xeon Phi). Like the M1, that chip has 16GB of high-bandwidth memory on the package. Like my hypothetical Mac Pro, it can also access external memory modules. Unlike the M1, it has 64 to 72 identical cores rather than four big and four little ones.

There are three ways to configure a Xeon Phi computer’s memory. You can fit no external memory at all, so the CPU uses only its on-package RAM. You can use cache mode, where software only “sees” the external memory and the high-bandwidth RAM acts as a transparent cache in front of it. Or you can go full NUMA (Intel call this “flat mode”), where programmers have to explicitly request memory in the high-bandwidth region in order to use it, like with the Amiga allocator.
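On Knights Landing, the usual way to make that explicit request is Intel’s memkind library and its hbwmalloc interface. A minimal sketch, assuming the machine is booted in flat mode and memkind is installed:

```c
#include <hbwmalloc.h>  /* memkind's high-bandwidth memory interface */
#include <stdlib.h>

double *make_hot_buffer(size_t n)
{
    /* In flat mode the on-package MCDRAM appears as its own NUMA node;
       hbw_malloc requests it explicitly, rather like MEMF_CHIP vs. MEMF_FAST. */
    if (hbw_check_available() == 0)
        return hbw_malloc(n * sizeof(double));  /* high-bandwidth, on-package RAM */

    return malloc(n * sizeof(double));          /* fall back to ordinary DDR4 */
}
```

(The caller then has to free with `hbw_free` or `free` to match whichever allocator succeeded.)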

People rarely go full NUMA. It’s hard to work out what split of allocations between the high-bandwidth and regular RAM yields the best performance, so people tend to just run in cache mode and hope that’s faster than not having any on-package memory at all.

And that makes me think that a Mac would either not go full NUMA, or would not have a public API for it. Maybe Apple would let the kernel and some OS processes have exclusive access to the on-package RAM, but even that seems overly complex (particularly where you have more than one M1 in a computer, so you need to specify core affinity for your memory allocations in addition to memory type). My guess is that an early workstation Mac with 16GB of M1 RAM and 64GB of DDR4 RAM would look like it has 64GB of RAM, with the on-package memory used for the GPU and as cache. NUMA APIs, if they come at all, would come later.
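If such an API did eventually appear, it would presumably look something like Linux’s libnuma, where each allocation names a node and work can be pinned near its memory. A sketch in that style (explicitly not a real macOS interface):

```c
#include <numa.h>    /* Linux libnuma, standing in for a hypothetical Mac API */
#include <stddef.h>

void *alloc_near(size_t bytes, int node)
{
    if (numa_available() < 0)
        return NULL;                        /* kernel has no NUMA support */

    numa_run_on_node(node);                 /* keep this thread near the memory... */
    return numa_alloc_onnode(bytes, node);  /* ...and allocate on that node */
}

/* Release with numa_free(ptr, bytes) rather than free(). */
```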

posted by Graham at 09:35  

Sunday, August 9, 2020

Nvidia and ARM

Nvidia’s ambitions are scarcely hidden. Once it owns Arm it will withdraw its licensing agreements from its competitors, notably Intel and Huawei, and after July next year take the rump of Arm to Silicon Valley.

This tech giant up for sale is a homegrown miracle – it must be saved for Britain

posted by Graham at 14:35  

Saturday, August 1, 2020

6502

On the topic of the Apple II, remember that MOS was owned by Commodore Business Machines, a competitor of Apple’s, throughout the lifetime of the computer. Something to bear in mind while waiting to see where ARM Holdings lands.

posted by Graham at 16:06  

Tuesday, June 16, 2020

Forearmed

In researching my piece for the upcoming de Programmatica Ipsum issue on cloud computing, I had thoughts about Apple, arm, and any upcoming transition that didn’t fit in the context of that article. So here’s a different post, about that. I’ve worked at both companies, so I don’t have a neutral point of view, but I’ve also been in bits of the companies far enough from their missions that I don’t have any insider insight into this transition.

So, let’s dismiss the Mac transition part of this thread straight away: it probably will happen, for the same reasons that the PowerPC->Intel transition happened (the things Apple needed from the parts – mostly lower power consumption for similar performance – weren’t the things the suppliers needed, and the business Apple brought wasn’t big enough to make the suppliers change their minds), and it probably will be easier, because Apple put the groundwork in during the Intel transition to make third-party devs aware of porting issues, and encouraged devs to use high-level frameworks and languages.

Whether you think the point is convergence (now your Catalyst apps are literally iPad IPAs that run on a Mac), or cost (Apple buy arm chipset licences, but have to buy whole chips from Intel, and don’t get the discount everybody else does for sticking the Intel Inside holographic sticker on the case), or just “betterer”, the arm CPUs can certainly provide. On the “betterer” argument, I don’t predict that will be a straightforward case of tomorrow’s arm Mac being faster than today’s Intel Mac. Partly because compilers: gcc certainly has better optimisations on Intel and I wouldn’t be surprised to find that llvm does too. Partly because workload, as iOS/watchOS/tvOS all keep the platform on guard rails that make the energy use/computing need expectations more predictable, and those guard rails are only slowly being added to macOS now.

On the other hand, it’s long been the case that computers have had controller chips for interfacing with the hardware, and that those chips are often things that could be considered CPUs of systems in their own right. Your Mac certainly already has arm chips in it if you bought it recently: you know what’s running the OS for the Touch Bar? Or the T2 security chip? (Actually, if you have an off-brand PC with an Intel-compatible-but-not-Intel chip, that’s probably an arm core running the x86-64 instructions in microcode.) If you beef one of those up so that it runs the OS too, then take a whole bunch of other chips and circuits off the board, you both reduce the power consumption and make more space for batteries. And Apple do love talking battery life when they sell you a computer.

OK, so that’s the Apple transition done. But now back to arm. They’re a great business, and they’ve only been expanding of late, but that expansion is currently coming at a cost. We don’t have up-to-date financial information on Arm Holdings themselves since they went private, but in the last year we have figures for they lost ¥31bn (I think about $300M). Since then, their corporate parent SoftBank Group has been doing well, but massive losses from its Vision Fund have led to questions about its direction, and particularly about Masayoshi Son’s judgement and vision.

arm (that’s how they style it) have, mostly through their partner network, fingers in many computing pies. From server and supercomputer chips made by manufacturers like Marvell to smart lightbulbs powered by Nordic Semiconductor, arm have tentacles everywhere. But their current interest is squarely on the IoT side. When I worked in their HPC group in 2017, Simon Segars described their traditional chip IP business as the “legacy engine” that would fund the “disruptive unit” he was really interested in, the new Internet of Things Business Unit. Now arm’s mission is to “enable a trillion connected devices”, and you can bet there isn’t a world market for a trillion Macs or Mac-like computers.

If some random software engineer on the internet can work this out, you can bet Apple’s exec team have worked it out, too. It seems apparent that (assuming it happens) Apple are transitioning the Mac platform to arm at the start of the (long, slow) exit arm are making from the traditional computing market, and still chose to do it. This suggests they have something else in mind (after all, Apple already designs its chips in-house, so why not have them design RISC-V or MIPS chips, or something entirely different?). A quick timetable of Mac CPU instruction sets:

  • m68k 1984 – 1996, 12 years (I exclude the Lisa)
  • ppc 1994 – 2006, 12 years
  • x86 and x86-64 2006 – 2021?, 15 years?
  • arm 2020? – 203x?, 1x years?

I think it likely that the Mac will wind down along with arm’s interest in traditional computing, and that arm will therefore be the last ever CPU/SoC architecture for computers called Macs; that the plan for the next decade is for Apple still to sit at the centre of a services-based, privacy-focused consumer electronics experience, but for the thing they sell you to no longer be a computer.

posted by Graham at 13:30  
