Latest Posts

Music production on Power: an adventure in porting


[Here's a guest post from taylor.fish on their porting work on music and audio software. I thought it made a good tutorial on porting and also is a great way to show off the diverse things people are doing on OpenPower. Like all guest and first-party posts on Talospace, this article remains the property of the original author and may be distributed under CC-BY-SA 4.0. -- Ed.]

For the past five years, I’ve used a Blackbird as my primary computing device. Prior to that I used x86 systems flashed with Libreboot, but aging hardware running unsupported firmware only gets you so far.

The switch to a truly owner-controlled Power ISA system was a welcome one, but it wasn’t without its growing pains. I could no longer assume any given piece of Linux-compatible software would successfully compile and run on my machine, due to the small but significant portion with architecture-specific issues. That didn’t deter me from continuing to use my Blackbird as my main device, but when I wanted to get back into music production, I knew I would have to confront this problem: my previous endeavors, although sticking entirely to free software, were all on x86.

Of particular importance in digital music production are plugins, which include instruments like synthesizers and samplers, effects like equalizers and reverb, and analysis tools like spectrograms and oscilloscopes. While there are some excellent plugins released as free software, they are vastly outnumbered by proprietary ones, and it takes dedication to commit to producing music only with free software. Not wanting that uphill battle to feel more like a cliff, when I discovered that some of those plugins wouldn’t run on Power, I was determined to change that fact rather than lose what few tools I had.

And so my porting adventure began. Over the past year or so I developed patches for every piece of free/libre audio software that I wanted to use but that didn’t work on Power. Some of those changes have been merged upstream, but for all the others I maintain a GitHub organization called PowerAudio that contains forked versions of each repository with ppc64le patches applied (along with some other improvements for use on GNU/Linux). If you just want working audio plugins on Power, visit that page, which has more information about each piece of ported software. If you want to know more about what the porting process entailed, however, read on…

The porting process

Each piece of software to be ported has its own unique set of problems, but there are some common themes. Here are some of the most frequent issues that prevent audio plugins from working on Power ISA systems:

1. Architecture-specific compiler options

This is the easiest issue to fix. Some projects pass architecture-specific options to the compiler (like -msse on x86) but don’t restrict those options only to the relevant architectures. In that case the fix is simply to perform an architecture check before applying those options, as in this change to tap-lv2:

@@ -10 +10,3 @@
+ifneq ($(findstring $(shell uname -m),x86_64 amd64 i386 i686),)
 CFLAGS += -mtune=generic -msse -msse2 -mfpmath=sse
+endif

Some projects handle specific architectures but then assume x86 as a fallback. In that case, it’s easiest to continue the pattern and add ppc as one of the cases, as in this change to Helm:

@@ -25,4 +25,8 @@ ifneq (,$(findstring aarch,$(MACHINE)))
 ifneq (,$(findstring arm,$(MACHINE)))
	SIMDFLAGS := -march=armv8-a -mtune=cortex-a53 -mfpu=neon-fp-armv8 -mfloat-abi=hard
+else
+ifneq (,$(findstring ppc,$(MACHINE)))
+	SIMDFLAGS :=
 else
	SIMDFLAGS := -msse2
+endif

2. Assumptions broken by Power

Some code appears to be cross-platform but contains assumptions that don’t apply to Power ISA systems. For example, the plugin framework DPF used the first part of gcc -dumpmachine to obtain the correct directory name for VST plugins. On 64-bit little-endian PowerPC, that yields powerpc64le. But the VST 3 specification says the directory name should match uname -m, which is ppc64le. These two identifiers happen to be identical for many architectures, but generally differ on PowerPC and Power ISA systems.

Simply using uname -m here would break cross-compilation, so the fix performs a text substitution to correct the discrepancy on Power:

@@ -691 +691,3 @@ ifeq ($(LINUX),true)
-VST3_BINARY_DIR = Contents/$(TARGET_PROCESSOR)-linux
+# This must match `uname -m`, which differs from `gcc -dumpmachine` on PowerPC.
+VST3_ARCHITECTURE := $(patsubst powerpc%,ppc%,$(TARGET_PROCESSOR))
+VST3_BINARY_DIR = Contents/$(VST3_ARCHITECTURE)-linux
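
For readers less familiar with Make's patsubst, here is a rough C sketch of the same prefix rewrite. The function name vst3_arch_name is made up for this illustration and is not part of DPF; it only assumes that the input is the CPU part of a compiler triplet.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not DPF code): turn the CPU name a compiler reports,
 * such as "powerpc64le", into the spelling `uname -m` uses, "ppc64le".
 * Names that do not start with "powerpc" are copied unchanged, mirroring
 * the Make pattern powerpc% -> ppc%. Returns `out` for convenience. */
const char *vst3_arch_name(const char *cpu, char *out, size_t out_size)
{
    static const char prefix[] = "powerpc";
    const size_t prefix_len = sizeof prefix - 1;

    if (strncmp(cpu, prefix, prefix_len) == 0)
        snprintf(out, out_size, "ppc%s", cpu + prefix_len);
    else
        snprintf(out, out_size, "%s", cpu);
    return out;
}
```

Because the rewrite works on the target name rather than on the build machine, it still gives the right answer when cross-compiling.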

A similar issue existed in JUCE, a popular framework for audio plugins. JUCE performs architecture detection with a chain of preprocessor conditionals that (somewhat hackily) use #error to emit the correct architecture identifier, which JUCE uses in places that expect the output of uname -m. While those conditionals do attempt to detect PowerPC, they don’t account for the fact that the different endiannesses of 64-bit PowerPC have different identifiers, and incorrectly classify ppc64le as ppc64. This can cause compilation to fail entirely when JUCE uses the incorrect architecture name but then runs a validator that expects the correct one.

A simple endianness check fixes this one:

@@ -64,3 +64,7 @@ #elif defined(__ppc__) || defined(__ppc) || ...
   #if defined(__ppc64__) || defined(__powerpc64__) || defined(__64BIT__)
-    #error JUCE_ARCH ppc64
+    #ifdef __LITTLE_ENDIAN__
+      #error JUCE_ARCH ppc64le
+    #else
+      #error JUCE_ARCH ppc64
+    #endif
   #else
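
Outside of JUCE's #error trick, the same decision can be sketched as an ordinary C function. This is only an illustration: the function name is invented, the returned strings are assumed to match uname -m on Linux, and it uses the __BYTE_ORDER__ macro that GCC and Clang predefine rather than the __LITTLE_ENDIAN__ check in the patch above.

```c
/* Illustrative only: not JUCE code. Returns a `uname -m` style name for
 * the architecture this file is being compiled for. */
const char *uname_style_arch(void)
{
#if defined(__powerpc64__) || defined(__ppc64__)
  /* 64-bit PowerPC has two names, depending on endianness. */
  #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return "ppc64le";
  #else
    return "ppc64";
  #endif
#elif defined(__powerpc__) || defined(__ppc__)
    return "ppc";
#elif defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#else
    return "unknown";
#endif
}
```

The important part is the nested check: deciding "64-bit PowerPC" is not enough, because ppc64 and ppc64le are distinct identifiers.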

3. Lack of inclusion in platform-specific code

Some projects contain truly platform-specific code that needs to be written separately for each architecture, but don’t include PowerPC as one of the handled cases. In these situations, the most straightforward (and sometimes only) fix is to add the necessary Power-specific code. For example, sfizz contained a copy of a low-level dependency that only supported x86 and ARM, but because that dependency had already added support for Power and other architectures upstream, all that was necessary for sfizz was to update the dependency.

Another example comes from DISTRHO Ports, a large collection of plugins ported to GNU/Linux, in which an architecture detection script required adding code to detect the various types of PowerPC:

@@ -42,4 +42,19 @@
     elif echo "${fileout}" | grep -q "x86-64"; then
         if [ "$(uname -m)" != "x86_64" ]; then
             MESON_EXE_WRAPPER="qemu-x86_64-static"
         fi
+
+    elif echo "${fileout}" | grep -q "64-bit LSB.*PowerPC"; then
+        if [ "$(uname -m)" != "ppc64le" ]; then
+            MESON_EXE_WRAPPER="qemu-ppc64le-static"
+        fi
+
+    elif echo "${fileout}" | grep -q "64-bit MSB.*PowerPC"; then
+        if [ "$(uname -m)" != "ppc64" ]; then
+            MESON_EXE_WRAPPER="qemu-ppc64-static"
+        fi
...

Although the purpose of this script is to aid cross-compilation, the lack of Power support prevented even native compilation.

4. Missing optional vector intrinsics

Hardware-specific vector intrinsic functions (SIMD) are often used to improve performance, but software that uses them must provide a separate implementation for each supported architecture, which rarely includes Power. However, some software is designed to use vector intrinsics only when such an implementation exists, falling back to a non-optimized, cross-platform approach otherwise. In practice, though, if non-optimized platforms don’t get much testing, bugs that unintentionally prevent compilation on these systems can go unnoticed.

This issue occurred in a dependency used by Wavetable. That dependency contained, but did not require, optimized SIMD code for x86-64 and ARM; on every other architecture it mistakenly tried to use SIMD implementations that didn’t exist, which caused compilation errors. Because SIMD was designed to be optional in this dependency, the fix simply adds an architecture check:

@@ -80,5 +80,7 @@
  #ifdef JUCE_32BIT
   #define GIN_HAS_SIMD 0
- #else
+ #elif defined(JUCE_INTEL) || defined(JUCE_ARM)
   #define GIN_HAS_SIMD 1
+ #else
+  #define GIN_HAS_SIMD 0
  #endif
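
To make the pattern concrete, here is a minimal C sketch of optional SIMD done safely. It is not Gin's code: DEMO_HAS_SIMD and apply_gain are hypothetical names, and the vector path uses SSE purely as an example of an architecture-specific implementation.

```c
#include <stddef.h>

/* Enable the vector path only where an implementation actually exists.
 * DEMO_HAS_SIMD is a made-up macro for this sketch. */
#if defined(__SSE__)
  #include <xmmintrin.h>
  #define DEMO_HAS_SIMD 1
#else
  #define DEMO_HAS_SIMD 0
#endif

/* Multiply every sample in `buf` by `gain`. */
void apply_gain(float *buf, size_t n, float gain)
{
    size_t i = 0;
#if DEMO_HAS_SIMD
    /* Process four samples at a time with SSE. */
    const __m128 g = _mm_set1_ps(gain);
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(buf + i, _mm_mul_ps(_mm_loadu_ps(buf + i), g));
#endif
    /* Portable path: finishes the tail on SIMD builds and handles the
     * whole buffer on every other architecture, including ppc64le. */
    for (; i < n; ++i)
        buf[i] *= gain;
}
```

The key design choice is that the "has SIMD" macro defaults to 0 and is switched on only by a positive check, so an unknown architecture gets the slower portable loop instead of a build failure.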

Another example comes, again, from JUCE. JUCE contains a copy of libpng, a PNG library that actually contains optimized VSX code for Power! But in a cruel twist of irony, JUCE excluded that optimized code from their copy, yet kept the code that tries to use it. The result? Linker failures that, because they occur in a helper tool almost always compiled early in the build process, prevent almost all software that uses JUCE from compiling on Power.

The fix for this one comes from libpng itself, which demonstrates the proper way of disabling optimizations, by defining certain macros instead of just deleting the implementations. So these macros simply need to be added to JUCE, which… already defines one of them?

#define PNG_ARM_NEON_OPT 0

Indeed, because JUCE’s copy of libpng excludes the optimized routines for all architectures, this issue presumably appeared on ARM at some point and was fixed (x86 happens not to exhibit the problem because libpng optimizations are opt-in on that architecture). That would have been a great time to include the other macros to disable optimizations, but instead, that task is accomplished by this fix:

@@ -268 +268,4 @@
   #define PNG_ARM_NEON_OPT 0
+  #define PNG_POWERPC_VSX_OPT 0
+  #define PNG_INTEL_SSE_OPT 0
+  #define PNG_MIPS_MSA_OPT 0

5. Missing required vector intrinsics

Finally, one of the most common sources of incompatibility with Power in audio software is the non-optional use of SIMD, necessitating separate implementations for each architecture. Unsurprisingly, support for Power is not typically included.

My preferred approach in this case is to use SIMDe, a cross-platform implementation of x86 (and ARM) vector intrinsics, optimized with the platform’s native vector operations when available. In the case of Vaporizer2, that looks something like this:

@@ -6,5 +6,8 @@
 #ifdef __aarch64__ //arm64
	#include "../../sse2neon.h"
-#else
+#elif defined JUCE_INTEL
	#include "immintrin.h"
+#else
+	#define SIMDE_ENABLE_NATIVE_ALIASES
+	#include <simde/x86/sse3.h>
 #endif

Something similar is done for Vitalium, the fork of Vital in DISTRHO Ports. However, when attempting to use SIMDe to implement the x86 intrinsics it uses, I encountered odd runtime errors with backtraces that involved SIMDe. More investigation is needed to determine the cause, but because Vitalium also provides implementations of its vector-optimized routines for ARM, an easier workaround was to configure SIMDe to implement ARM’s vector intrinsics (NEON) instead, which appears to avoid that issue.

@@ -33,8 +33,13 @@
 #else
-  static_assert(false, "No SIMD Intrinsics found which are necessary for compilation");
+  #warning "No native SIMD support; using SIMDe"
+  #define SIMDE_ENABLE_NATIVE_ALIASES
+  #include <simde/arm/neon.h>
+  #define VITAL_NEON 1
+  #define VITAL_SIMDE 1
 #endif

-#if VITAL_SSE2
+#if VITAL_SIMDE
+#elif VITAL_SSE2
   #include <immintrin.h>
 #elif VITAL_NEON
   #include <arm_neon.h>

[See also x86intrin.h -- Ed.]

Lastly, another example of this comes—yet again—from JUCE, in a form that prompts a different solution. JUCE contains its own set of SIMD functions, designed with an architecture-independent API but requiring architecture-specific implementations. Although Power is predictably unsupported, JUCE does contain cross-platform fallback implementations for almost all of its SIMD API; however, these are used only as part of the architecture-specific implementations, to implement operations that don’t have a precise native equivalent on a given platform.

Why not, then, provide a universal fallback implementation for all unsupported architectures? That’s exactly what this change does (with a diff too large to include here), fixing another source of compilation errors in projects that use JUCE.
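
To give a rough idea of what such a fallback looks like, here is a small sketch in C. It is not JUCE's API (JUCE is C++ and its SIMD layer is considerably larger); the vec4f type and function names are invented for illustration. Each operation is an ordinary loop over the lanes, which compiles on any architecture.

```c
#define VEC4_LANES 4

/* Hypothetical 4-lane float vector for targets with no native backend. */
typedef struct { float lane[VEC4_LANES]; } vec4f;

/* Fill every lane with the same value. */
vec4f vec4f_splat(float x)
{
    vec4f r;
    for (int i = 0; i < VEC4_LANES; ++i)
        r.lane[i] = x;
    return r;
}

/* Lane-wise addition. */
vec4f vec4f_add(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < VEC4_LANES; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Lane-wise multiplication. */
vec4f vec4f_mul(vec4f a, vec4f b)
{
    vec4f r;
    for (int i = 0; i < VEC4_LANES; ++i)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Horizontal sum of all lanes. */
float vec4f_hsum(vec4f v)
{
    float total = 0.0f;
    for (int i = 0; i < VEC4_LANES; ++i)
        total += v.lane[i];
    return total;
}
```

Code written against an interface like this keeps working when a native backend is missing, and a hand-tuned implementation can be added for a given architecture later without touching the callers.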

Conclusion

When ARM devices running desktop operating systems started to become more common, in particular due to Apple’s decision to move away from Intel, I had hoped that this would encourage x86-only software to become architecture-independent, benefiting Power in the process. Unfortunately, many projects have simply special-cased support for ARM, even when cross-platform alternatives exist (which don’t preclude optimized architecture-specific routines if desired). Maybe another architecture will eventually become the catalyst for this change, but until then, we’ll need patches.

I will, however, take a moment to complain again about JUCE. Despite making me sign their Contribution License Agreement, JUCE has ignored all of my pull requests (juce-pr1, juce-pr2, and juce-pr3), even the simplest two that would not require much review (yet are the most important fixes for Power). Because of this, every project that uses JUCE must, at a minimum, use a patched version in order to compile on Power.

Still, I think the biggest takeaway is that music production is absolutely possible on Power. The experience is undoubtedly rough around the edges, but I hope that PowerAudio’s ports can reduce the discrepancy in available tools compared to other architectures, and generally make this activity more accessible to Power ISA users.

Baseline JIT patches available for Firefox ESR128 on OpenPOWER


It's been a long hot summer at $DAYJOB and I haven't had time for much of anything, but I got granted some time this week to take care of an unrelated issue and seized the opportunity to get caught up.

The OpenPOWER Firefox JIT still crashes badly in Wasm and Ion for reasons I have yet to ascertain, but the Baseline Interpreter and Baseline Compiler stages of the JIT continue to work great and are significantly faster than the plain, non-JIT interpreter (even in a PGO-LTO build), so I did the needful and finally got them pulled up to the new Extended Support Release, which is Firefox 128.

I then spent the last two days bashing out crashes and bugs, including a regression from Firefox's new WebAssembly-based in-browser translation engine. The browser chrome now assumes that WebAssembly is always present, but on JIT-less tier-3 machines (or partially implemented JITs like ours, and possibly where Wasm is disabled in prefs) it isn't, so it hits an uncaught error which then blows up substantial portions of the browser UI like the stop-reload button and context menus. The Fedora official ppc64le build of Firefox 128.0.3 is affected as well; I filed bug 1912623 with a provisional fix. Separately all JIT and JavaScript tests completely pass in multiple permutations of Baseline Interpreter and Baseline Compiler, single- and multi-threaded.

As a sign of confidence I've been dogfooding it for the last 24 hours with my typical massive number of tabs and add-ons and can't get it to crash anymore, so I'm typing this blog post in it and using it to upload its own changesets to Github. Grab the ESR source from Mozilla (either pull a tree with Mercurial or just download an archive) and apply the changesets in numerical order, though after bug 1912623 is fixed you won't need #823094. The necessary .mozconfig for building an LTO-PGO build, which is what I'm using, is also in that issue; it's pretty much the same as earlier ones except for --enable-jit.

Little-endian POWER9 remains the officially supported architecture. This version has not been tested on POWER8 or big-endian POWER9, though the JIT should still statically disable itself even if compiled with it on, so the browser should still otherwise work normally. If this is not the case, I consider that a bug, and will accept a fix (I don't have a POWER8 system here to test against). There are no Power10 specific instructions, but I don't see any reason why it wouldn't work on a Power10 machine or on a SolidSilicon S1 whenever we get one of those.

Comments always solicited, though backtraces and reliable STRs are needed to diagnose any bug, of course. Meanwhile I've got more work cut out for me but at least we're back in the saddle for another go.

Chromium Power ISA patches ... from Solid Silicon


It appears that some of the issues observed by me and others with Chromium on Fedora ppc64le may in fact be due to an incomplete patch set; a fuller set is now available on Solid Silicon's Gitlab. If your distro doesn't support this, you now have an upstream to point them at, or you can build your own. The patches include the Ungoogled changes as well, even though I retain my philosophical objections to Chromium, and still use Firefox personally (I've got to get back on the horse and resume maintaining my personal builds now that I've got Plasma 6 back running on Xorg again).

Oh, yeah, it really is that Solid Silicon. You can make your own speculations from the commit log, though regardless of whether Solid Silicon is truly a separate concern or a Raptor subsidiary, it wouldn't be surprising that Raptor resources are assisting since they've kind of bet the store on the S1.

Timothy Pearson's comments in the Electron Github suggest that Google has been pretty resistant to incorporating support for architectures outside of their core platforms. This is not a wholly unreasonable position on Google's part but it's not a particularly charitable one, and unlike Mozilla, the Chrome team doesn't really have the concept of a tier-3 build nor any motivation to. That kind of behaviour is all the more reason not to encourage browser monocultures because it's not just the layout engine that causes vendor lock-in. Fortunately V8, the JavaScript engine, is maintained separately, and reportedly has been more accommodating presumably because of things like Node.js on IBM hardware (even IBM i via PASE!).

Mozilla is much more accepting of this as long as regressions aren't introduced. This is why TenFourFox patches were largely not upstreamed, since they would potentially cause problems with Cocoa widgets in later versions of macOS, though I did upstream what patches were generally applicable. The main reason I'm still maintaining the Firefox ppc64le JIT patches outside the tree is that I still can't solve these recent startup crashes deep within Wasm code, which largely limits me to Baseline Compiler and thus is not suitable for loading into the tree yet (we'd have to also upstream pref changes that would adversely affect tier-1 until this is fixed). I still intend to pull these patches up to the next ESR, especially since Github is glacially slow now without a JIT and it's affecting my personal ability to do other tasks. Maybe I should be working on something like rr for ppc64le at the same time because stepping through deeply layered code in gdb is a great way to go stark raving mad.

A RISC-V option for your Framework laptop (how about POWER next?)


Many of you have heard of the Framework laptop, a modular system that you can DIY from a mainboard and parts or purchase fully assembled. The designs are open-sourced on Github and Framework has actively been trying to develop an ecosystem around the product.

The part that's potentially most interesting is the mainboard. Framework actively advertises the notion that you can just replace components piecemeal to upgrade, including the logic board, yet keep the same display, port loadout, keyboard, battery and so on if they still work. You can even stick the old one in a case and use it for something else, which is not only environmentally conscious but very customer-friendly.

Now the first third-party Framework mainboard is coming, and it's not x86: it's RISC-V, and it fits in their 13" chassis. A RISC-V option is of course not new in portable computers; I reviewed the ClockworkPi RISC-V DevTerm a couple years ago, which can take either an RPi ARM compute module or an Allwinner D1 based on the 1GHz RV64IMAFDCVU XuanTie C906. However, the CPU on this board is more powerful than that: a quad-core StarFive JH7110 with four SiFive U74 cores. The new Framework mainboard is based on an existing DeepComputing laptop product called "Roma;" DeepComputing now sells a more advanced version in a laptop of their own based on the octocore SpacemiT K1. Combined with the generally well-regarded Framework loadout and creature comforts, this could definitely be a product to watch.

That said, much as I was disappointed with the performance of the RISC-V DevTerm, most people are going to be similarly unimpressed with the performance of this one. Phoronix's benchmarks placed it well below the Raspberry Pi 4 (and the Orange Pi 5 crushed it), and Framework is trying to set expectations low by saying, "The peripheral set and performance aren’t yet competitive with our Intel and AMD-powered Framework Laptop Mainboards." That would certainly be an understatement, and is yet another example of the self-licking RISC-V ice cream cone getting ahead of its skis on real-world throughput. Framework also apologetically notes that the board "has soldered memory and uses MicroSD cards and eMMC for storage, both of which are limitations of the processor." Still, it's (soon to be) available and functional, and it could be mounted in one of those small desktop cases, so if you want a sidecar RISC-V machine to play with you've got another option better than yet another SBC.

But more important than that: it proves that you can put really any architecture on such a board and take advantage of the Framework, uh, framework instead of reinventing the wheel completely. So, instead of these various attempts at building a PowerPC laptop, why isn't there a Power ISA Framework mainboard? Wouldn't that approach just make more sense?

A baby Power10, if you're desperate


Are you really desperate to have your own Power10 (libre issues notwithstanding) while we wait for S1? IBM historically releases "little" versions of their servers after the launch systems have exhausted their novelty and now it's time for this generation's. If you've got 2Us in your rack, a wad of money in your wallet and an IBM salesdroid in your Rolodex, in about a month the Power S1012 could be yours.

Based on the size of the board, no one would mistake this for a Blackbird, yet it's pretty much the IBM equivalent: a single socket supporting up to eight cores. It comes as either a rackmount or in IBM's mega-tower case with four RAM slots for up to 256GB of memory. Tape and RAID are options, and it boots Linux, AIX or IBM i. If you need more sockets, there's the S1022 with a second one in the same form factor, and if you need more capacity, the 4U S1014 has you covered — and is still tower-ready in the same way that Orson Welles was suit-ready.

IBM hasn't shown as much love for their baby towers recently, though. In fact, there wasn't an IBM 2U option at all in POWER9's generation (no doubt much to Raptor's relief); if you wanted Big Blue in a Littler Box, you had to buy the 4U S914 instead (or a leftover POWER8 S812). Also, it seems like the S1012 tower's power output is gimped somewhat: the spec sheet says the rackmount can put 240W through the single CPU socket but the tower manages "only" 195W, which limits your core count. In the glory days, though, we had things like this.

This is my long-trucking POWER6 p520, the 2U baby of the old POWER6 generation. You could get it with two sockets and the same CPUs as its larger siblings, and since the POWER6 was SMT-2, I've got four threads running on its single LPAR. It has RAID and an optical drive and 16GB of RAM, with more available if you were willing to do battle with IBM Capacity on Demand codes. All in all, not bad for 2009.

Of course, I'm being very facetious in this article, because naturally none of these towers are really workstation substitutes. The S1012 (and certainly the S1022) is undoubtedly as loud as the POWER6, and while the POWER6's back baffle reduces some of the noise, it correspondingly reduces ventilation. There's a reason, after all, that I gave the thing its own room with the other geriatric servers. Plus, IBM doesn't talk to us end users: you'll have to buy it through a VAR or authorized rep. That was why I said screw it to buying a brand-spanking new POWER7 back in the day and got the POWER6, because it was used, cheaper and actually available. Which reminds me — if you have to ask how much it is, you almost certainly can't afford it. Hope you've been saving your pennies for the S1.

Rocky Linux 9.4


Rocky Linux 9.4 is out, based on RHEL 9.4, but, you know, free. (Note that Rocky Linux 8.9 doesn't come in a ppc64le version, so Rocky 9.x is your only choice.) If you want the stability of RHEL but don't like the price tag and don't need the support, here's one of your options. As is typical for such point releases, this one primarily refreshes included software along with security updates. Boot, minimal and DVD ISOs are available for download.