Raptor, help us out here on the software side
Unfortunately, work on this bug is getting stymied by the fact that POWER9 has hardware watchpoints disabled in the Linux kernel. Trying to catch who's writing the faulty values is nearly impossible in an application of Firefox's complexity without them, because without hardware assistance gdb has to single-step through the code. When you don't even know what haystack to look for the needle in, it's slow going, as in (no exaggeration) hundreds of times slower. I left Firefox to try to initialize overnight in the debugger on my 8-core Talos II. By the morning, 8 hours later, it hadn't even launched its first thread.
The reason they are disabled is an erratum which causes watchpoints set on cache-inhibited memory (such as devices) to make the CPU halt with a checkstop. Arguably some sort of fault on this is correct behaviour, but a checkstop is catastrophic; it's the equivalent of stopping your car by driving it off the road into a wall. I don't fault the PowerPC kernel maintainers for taking this interim approach because without it an unprivileged user could instantly halt the machine, even inadvertently. Even if the kernel could detect that the watchpoint was pointing at a cache-inhibited address and return an error, a tricksy user could potentially set up the watchpoint on "good" memory and then change the memory mapping.
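To make that last point concrete, here is a toy model of the check-then-remap problem. It is only a sketch: `ToyAddressSpace`, `ToyWatchpointUnit` and `demo` are hypothetical names invented for illustration and do not correspond to any real kernel or hardware interface. It assumes the kernel validates the address once, when the watchpoint is set, and never again.

```python
# Toy model only: none of these names correspond to real kernel interfaces.
from dataclasses import dataclass, field


@dataclass
class ToyAddressSpace:
    """Maps page numbers to a cache-inhibited flag (True = device-like memory)."""
    cache_inhibited: dict = field(default_factory=dict)

    def map_page(self, page: int, inhibited: bool) -> None:
        self.cache_inhibited[page] = inhibited

    def is_cache_inhibited(self, page: int) -> bool:
        return self.cache_inhibited.get(page, False)


class ToyWatchpointUnit:
    """Checks the mapping only once, at the moment the watchpoint is set."""

    def __init__(self, space: ToyAddressSpace) -> None:
        self.space = space
        self.watched_page = None

    def set_watchpoint(self, page: int) -> bool:
        if self.space.is_cache_inhibited(page):
            return False  # rejected: this is the case that risks a checkstop
        self.watched_page = page
        return True

    def is_hazardous(self) -> bool:
        """True if the armed watchpoint now covers cache-inhibited memory."""
        return (self.watched_page is not None
                and self.space.is_cache_inhibited(self.watched_page))


def demo() -> tuple:
    space = ToyAddressSpace()
    space.map_page(0x10, inhibited=False)   # ordinary "good" memory
    unit = ToyWatchpointUnit(space)
    accepted = unit.set_watchpoint(0x10)    # passes the set-time check
    space.map_page(0x10, inhibited=True)    # the mapping changes afterwards
    return accepted, unit.is_hazardous()
```

In this model `demo()` reports that the watchpoint was accepted and is nonetheless hazardous afterwards, which is why a set-time check alone would not be enough.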
I talked to the PowerPC kernel maintainers about this and an interim solution we're sort of agreed on is to use a debugfs entry as a one-way switch so that workstation developers like me can turn on hardware watchpoints "at our own risk." When I'm ready to debug something that requires a proper watchpoint, then I create a debugfs file and the kernel will then allow the watchpoint in hardware until the next reboot. I'm the only user on the machine, so if I screw it up, it's only my pasty Roman sculptured tuckus (and filesystem).
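The semantics of that one-way switch can be sketched as a small latch. This is a toy model under stated assumptions, not the kernel patch: `ToyWatchpointSwitch` and its method names are hypothetical, and the real debugfs entry, its path and its format were still undecided at this point.

```python
# Toy model only: hypothetical names, not a real debugfs or kernel interface.
class ToyWatchpointSwitch:
    """One-way latch: once enabled, it stays enabled until the next reboot."""

    def __init__(self) -> None:
        self._enabled = False  # default: hardware watchpoints are off

    def enable_at_own_risk(self) -> None:
        """Models creating the debugfs entry."""
        self._enabled = True

    def disable(self) -> None:
        """There is deliberately no way back short of a reboot."""
        raise PermissionError("one-way switch: reboot to clear")

    def reboot(self) -> None:
        """A reboot restores the safe default."""
        self._enabled = False

    def watchpoint_backend(self) -> str:
        """What a debugger would end up using for a watchpoint."""
        return "hardware" if self._enabled else "single-step"
```

The point of the latch is that the safe default holds for everyone who never touches it, while a developer who opts in accepts the risk for the rest of that boot.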
But this isn't going to write itself. I need to scratch my own itch, get it working, get it accepted, get it actually in a kernel (instead of schlepping forward local changes), and then finish debugging Firefox, and then finish the JIT, in addition to my day job, my work on TenFourFox and not being in trouble with my lovely wife for not emerging from the back room for hours.
So now I'm going to be a little less than nice with Raptor. Raptor has been fairly public about their support for Chromium, ostensibly because of the (IMHO irresponsible) proliferation of Electron apps, but they've been very tepid on Firefox. I understand a small company can't do everything, but they have had time to do ports of Unreal Engine and WINE (though this in and of itself is not enough to make the QEMU-WINE fusion Hangover work, which I've been also trying to tinker with and have it about 70% building). These are fun things to do and are certainly interesting, but that makes statements like this a bit galling, and statements like this a bit disingenuous.
Mozilla has a chicken-egg problem when it comes to an architecture that until very recently had a very small desktop share: its share (and share of desktop users using Firefox) would doubtlessly increase with a Firefox JIT, but the resources to expend into writing that JIT can't be justified until there is a larger share. Furthermore, it rings hollow for Raptor to ding Mozilla in that tweet about not being sufficiently open when Google until literally days ago wouldn't land the existing POWER9 Chromium work when Mozilla has been allowing POWER9 (and other PowerPC) patches into Firefox as a tier-3 for pretty much its entire existence. You can put the open-source lipstick on the Google pig as much as you like but at the end of the day, it's still Google and it's still Google's repo. Freedom involves choice. I'm not going to slam the people who did hard work on the Chromium port, because it is hard work and unfortunately Electron is a thing despite my misgivings, but I am going to slam Raptor for endorsing it at Firefox's expense.
I'm not asking Raptor to do the Firefox JIT port, though I may be soliciting help to farm it out with my reduced number of available cycles. (Right now it's based on Firefox 62; once it works there, we'll forward-port it to trunk. More on that in a future post, but I'll probably put my current work up on GitHub. If you're interested in contributing, post in the comments.) I am asking Raptor to endorse the effort, however, and I am asking them to become more involved with developer-facing features to allow those of us who are working on ports to do so more productively.
As an example, developing and getting the interim debugfs switch for hardware watchpoints into the kernel would be an enormous help to me personally (and would save me a great deal of time), and would probably be very beneficial for other developers. Nearly everyone on the LinuxPPC kernel team I talked to agreed this is a big deficiency and one that is realistically implementable. It would be nice if this could proceed in parallel so I'm not blocked on doing everything myself because on a scale of 0 to even, I just can't. I'm sure there are many other developer pain points that will appear as more people start working on Talos systems, and I'd like Raptor to also treat these requests with priority and dedicate resources to worthy ones to allow more port work and development to flourish.
Let me soften a little bit in conclusion by saying Raptor has a very hard row to hoe being a small company jumpstarting an entire ecosystem. I'm being hard on them because I'm glad they exist, I intend to continue being a customer, and as a long-time Power ISA bigot I want them to succeed. But I also want to see the principles of free computing embodied in the hardware for the Talos family appropriately manifested in software. I don't see that being adequately expressed in the choices they've made so far and I'd like that to change. Developers need to be prioritized and software choice needs to be facilitated. Let's see more of that so we can see more POWER9 adoption and a brighter future for desktop computing.
Maybe you have already noticed, but Chromium PPC64le patch has gone upstream:
https://twitter.com/shawnanastasio/status/1100180701457063937
Yes, that's what I meant by just days ago.
For what it's worth, I've been working directly with Timothy Pearson on getting an official Raptor machine set up for Mozilla so they can do testing with the graphics stack on big endian machines.
A few people in MozIRC #gfx are interested in hacking on this on weekends, but can't because they don't have BE hardware to test on. The idea is that if Raptor can bring us a POWER8 or POWER9 machine that has X11 and VNC, this can become a thing.
Of course, none of this is really public, unless you're following the #talos-workstation channel on Freenode. So I can completely understand why you wouldn't know that, and think that Raptor is putting everything into Chromium. However, I can assure you they are helping with Mozilla too. It's just slower going because I have to act as an intermediary for this stuff. (Moz has no interest in sysadminning this box, so I need to get it set up properly and such.)
I'm not letting them off that easily. Everything you've mentioned is good, and needed, and I'm happy to hear it. But Raptor has made a particular point of highlighting the JIT with good reason. The gfx stuff is important but it's smaller in scope, and doesn't really require anything from Raptor other than a buildbox. In fact, *you* having to act as intermediary actually makes their offer seem rather unserious.
Meanwhile, you make my point for me: if all this stuff is going on behind closed doors, then the *public* goal of computing freedom is getting shorter shrift than it ought to.
> In fact, *you* having to act as intermediary actually makes their offer seem rather unserious.
Because the Mozilla graphics team already knows me, and so does Timothy, and the Moz people don't want to have to deal with it. Trust me, I tried to make it a direct connection; it's Mozilla, not Raptor, that requires the intermediary.
> Meanwhile, you make my point for me: if all this stuff is going on behind closed doors, then the *public* goal of computing freedom is getting shorter shrift than it ought to.
A public IRC channel on Freenode is not exactly "closed doors". We (the POWER community) do not have a central mailing list where we can discuss all the things we are working on. You should probably join #Adelie on Interlinked IRC as well if you really want to keep up with everything going on with PPC64, because we're a hub for a lot of POWER work as well. We do email weekly status reports to the adelie-devel@ mailing list, and normally that includes our POWER (and ARM) porting work, but it doesn't always have it all.
I think we're going to agree to disagree on this, but while Adelie certainly represents some measure of Power work going on (and valuable work at that), I resist the assertion that it represents much or substantially all of the Power community. Please don't take this as an insult; it's merely my own observation, but I think that coordination should be coming from a higher level.
Fortunately I just saw Timothy's E-mail, now that I'm home from work, and I'll reply to that.
Oh, I never meant it to come across that we represented much or all of the POWER community. I apologise if it did. I just meant we're a hub for a lot of work that goes into it and are helping organise some of it (incl. Mono, Go, and yes, Mozilla). I'm not at all insulted. :)
I don't think it's entirely fair to claim that Raptor has been endorsing Google over Mozilla.
In fact, they've tweeted at Google multiple times raising concern over their prior reluctance towards accepting the patches.
Rather, they've just been endorsing the community port of Chromium. I'm sure if you tweeted at them to raise awareness for your Firefox JIT they'd be equally supportive.
Some of us rage-quit Twitter long ago. :) And it's not that I'm discounting your work -- far from it. But as you say, the fact they've really had to yell at Google to get them to accept that work means something.
If anyone is looking for cheap POWER boxes:
https://www.piospartslap.de/Tyan-Rack-2U-Server-TN71-BP012-1x-IBM-POWER8-10-Core-CPU-2926-GHz-2x-750Watt
And the guys of the Linux PowerPC notebook project could give free access to a POWER8 machine
https://twitter.com/RaptorCompSys/status/1100518476819693569
CK, you should mail him or join the IRC ;-)
Yes, thanks. I just saw his E-mail.
OpenPOWER mailing list is On-Line 8-)
https://twitter.com/hughhalf/status/1101580630272299008
Sounds like your wish about DAWR has been fulfilled, see https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-April/187763.html
Yes, I saw that come across. Hopefully it makes it into a shipping kernel soonish.