
Thoughts on Rift Apart (PC)

Ratchet & Clank: Rift Apart came out on PlayStation 5 in 2021, and as a first-party title it was built around - and to show off - the PS5’s architecture, especially its fast IO system and hardware decompression support. And it succeeded - the “rifts” and other game elements would transport you between worlds, loading gigabytes of data in a mere second or two. Insomniac’s a studio that knows how to get the most out of the hardware, and it showed.

In July 2023 a PC port came out, made by Nixxes Software, a studio that also has a track record of knowing what they’re doing on the PC as a platform and delivering solid ports. The port also had the distinction of being the first game to use DirectStorage’s compressed format (GDeflate), with the engine supporting both GPU and CPU decompression of assets and offering lots of potential tradeoffs around where the bottleneck lands. For these reasons, it’s a very interesting PC port on the technical level, and a sign of things to come as games built around the Xbox Series / PS5 consoles make their way to PC.
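For readers who haven’t touched the API, here’s roughly what a GDeflate read through DirectStorage looks like - a minimal sketch, not anything from the actual port, with the file name, offsets, and sizes made up purely for illustration (error handling omitted):

```cpp
// Minimal sketch: one GDeflate-compressed asset read straight into a GPU
// buffer via DirectStorage. The file path, offsets, and sizes below are
// placeholders, and HRESULT checks are omitted for brevity.
#include <dstorage.h>
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

void LoadCompressedAsset(ID3D12Device* device, ID3D12Resource* destBuffer)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets/level.pak", IID_PPV_ARGS(&file)); // placeholder path

    DSTORAGE_QUEUE_DESC queueDesc = {};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    DSTORAGE_REQUEST request = {};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    // The interesting part: the data crosses the PCIe bus compressed and is
    // decompressed on the other side.
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE;
    request.Source.File.Source = file.Get();
    request.Source.File.Offset = 0;               // placeholder offset
    request.Source.File.Size   = 1024 * 1024;     // compressed size on disk
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = 4 * 1024 * 1024; // uncompressed size
    request.UncompressedSize            = 4 * 1024 * 1024;

    queue->EnqueueRequest(&request);
    queue->Submit(); // completion tracked via an enqueued fence/status, omitted here
}
```

The key property is that the bytes move over PCIe in compressed form; the GPU (or a CPU fallback path, when the GPU route isn’t available) inflates them on arrival.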

Digital Foundry’s Alex Battaglia did an excellent review of Rift Apart’s PC port, which I recommend everyone watch. I had a few comments about the video and the game it covers that I felt warranted something longer than a tweet or two.

At the 1:17 mark Alex complains about some options not changing gracefully in real time, instead introducing a wait when changed (in this specific case it was the ray traced ambient occlusion). While I agree that this is not an ideal situation, for some options it can really be unavoidable, with the only alternative being to stall rendering and wait until the change is ready, a.k.a. to stutter. I haven’t gotten my hands on the PC port yet so all I can do is make educated guesses based on Alex’s video, but changing RTAO quality levels might require building new ray tracing acceleration structures - Bounding Volume Hierarchies (BVHs) - from meshes at a different Level of Detail (LOD). This is not an instant process, and the flash visible in the video could be the game deallocating the previously computed BVHs, computing the sizes needed for the new ones, allocating memory for them, and building new BVHs for all the meshes in the scene. That just can’t be done instantaneously, and as much as I hold games to high standards, if that is indeed the reason then I think Nixxes made the right call here by not stuttering and instead momentarily compromising image quality.
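To make that concrete, here’s a sketch - under my assumptions above, since I can’t see the engine’s code - of what rebuilding a single mesh’s bottom-level acceleration structure for a new LOD involves in D3D12. `AllocateUavBuffer` is a hypothetical helper:

```cpp
// Sketch of why an RTAO quality swap can't be instant: rebuilding a BLAS for
// one mesh means querying new sizes, allocating fresh GPU memory, and
// recording a build on the GPU timeline - multiplied by every mesh in scene.
#include <d3d12.h>

// Hypothetical helper: allocates a committed UAV-capable buffer in a DEFAULT heap.
ID3D12Resource* AllocateUavBuffer(ID3D12Device* device, UINT64 sizeInBytes);

void RebuildBlasForNewLod(ID3D12Device5* device,
                          ID3D12GraphicsCommandList4* cmdList,
                          const D3D12_RAYTRACING_GEOMETRY_DESC& newLodGeometry)
{
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS inputs = {};
    inputs.Type           = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
    inputs.DescsLayout    = D3D12_ELEMENTS_LAYOUT_ARRAY;
    inputs.NumDescs       = 1;
    inputs.pGeometryDescs = &newLodGeometry;

    // Step 1: ask the driver how big the new BVH and its scratch memory must be.
    D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO prebuild = {};
    device->GetRaytracingAccelerationStructurePrebuildInfo(&inputs, &prebuild);

    // Step 2: allocate result + scratch buffers of those sizes (allocation
    // itself can force evictions when memory is under pressure).
    ID3D12Resource* result  = AllocateUavBuffer(device, prebuild.ResultDataMaxSizeInBytes);
    ID3D12Resource* scratch = AllocateUavBuffer(device, prebuild.ScratchDataSizeInBytes);

    // Step 3: record the actual build; it only completes when the GPU runs it.
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC build = {};
    build.Inputs = inputs;
    build.DestAccelerationStructureData    = result->GetGPUVirtualAddress();
    build.ScratchAccelerationStructureData = scratch->GetGPUVirtualAddress();
    cmdList->BuildRaytracingAccelerationStructure(&build, 0, nullptr);
    // The old BLAS can only be released once no in-flight frame references it.
}
```

Multiply that by every mesh in the scene, plus a top-level rebuild, and a momentary flash starts to look like the cheapest way out.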

At the 2:27 mark it’s mentioned that Richard Leadbetter was seeing out-of-memory (OOM) errors causing the game to abruptly terminate and at 3:01 Alex mentions some textures never loading their higher quality mips - both seem to indicate that some sort of graphics memory defragmentation might be necessary.
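I’m speculating here, but if defragmentation is what’s missing, the core operation would be moving live resources between heaps to close up holes - something D3D12 makes possible but not automatic. A sketch of a single move, with all the hard bookkeeping deliberately left to the caller:

```cpp
// Sketch of the core "move" in a graphics-memory defragmenter: place a copy
// of the resource at a new offset in a less fragmented heap, schedule a GPU
// copy, and retire the old allocation once no frame in flight still uses it.
#include <d3d12.h>

ID3D12Resource* MoveToHeap(ID3D12Device* device,
                           ID3D12GraphicsCommandList* cmdList,
                           ID3D12Heap* targetHeap, UINT64 targetOffset,
                           ID3D12Resource* oldResource)
{
    const D3D12_RESOURCE_DESC desc = oldResource->GetDesc();

    // New placement for the same resource description.
    ID3D12Resource* newResource = nullptr;
    device->CreatePlacedResource(targetHeap, targetOffset, &desc,
                                 D3D12_RESOURCE_STATE_COPY_DEST, nullptr,
                                 IID_PPV_ARGS(&newResource));

    // GPU-side copy; the old resource must be in a COPY_SOURCE state here.
    cmdList->CopyResource(newResource, oldResource);

    // Caller's responsibility afterwards: transition newResource to its usage
    // state, patch every descriptor that referenced oldResource, and release
    // oldResource only after the copy's fence has signaled.
    return newResource;
}
```

The copy is the easy part - tracking every descriptor and in-flight frame that still references the old placement is the hard part, which may be why so many engines skip defragmentation and live with OOMs or permanently low mips instead.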

At 16:51 Alex makes an excellent point about something that I don’t think is getting enough attention - the overhead of having many heavyweight applications running in the background / notification area (AKA system tray) and how much of an impact they can have on performance. A lot of people think a game launcher using a gigabyte of memory isn’t a big deal since there’s a lot of RAM to go around or most of it is paged out, but in practice I’ve noticed that a lot of the heavyweight apps that really shouldn’t be that heavyweight are generally poorly coded, often using suboptimal solutions to the problems they solve. Most of them also don’t lie dormant when in the background, meaning not as much of their memory is paged out as you’d think. The Epic Games Store launcher, for example, seems to be an Unreal Engine powered application itself - so when it’s running, you now have a second GPU context competing with whatever game you’re playing for execution time. That is A Bad Thing in general, and EGS can be absolutely egregious in how it uses GPU resources. Other apps like Discord, Slack, and various launchers are mostly unoptimized and use way more resources than would ideally be needed to achieve their goals. Every bloated app is an excellent example of opportunity cost, and of the effect of not internalizing development externalities. In Alex’s experience, which matches mine, Steam seemed to be well-behaved in that regard - everything else just wasn’t.

At 20:51 Alex shows how the game’s portal sequence runs much better on an HDD-equipped system when the data is cached - another thing that works better when you have fewer apps in the background, and another example of the opportunity cost of Electron apps taking 10x or 100x the memory needed to display a chat box or a game launcher. Every byte you lose to a bloated app is a byte of disk cache you’re not getting, meaning that even if you have lots of RAM, bloated apps still affect you. Alex also raises an excellent point: the game could pre-emptively issue these reads, since the sequence in question is linear in nature.
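Nothing fancy is needed to get most of that benefit, either. A sketch of the simplest possible version - a background thread streaming the sequence’s data through the OS cache before the player reaches the portal, assuming the data is reachable as an ordinary file (the path is a placeholder):

```cpp
// Sketch: because the portal sequence is linear, its data can be read ahead
// of time. Even a plain sequential read pass warms the OS file cache, which
// is exactly the effect observed on the HDD system once the data was cached.
#include <windows.h>
#include <vector>

void PrefetchSequenceData(const wchar_t* path) // run on a background thread
{
    // FILE_FLAG_SEQUENTIAL_SCAN hints the cache manager to read ahead
    // aggressively and evict behind us.
    HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
    if (file == INVALID_HANDLE_VALUE)
        return;

    std::vector<char> buffer(1 << 20); // 1 MiB chunks
    DWORD bytesRead = 0;
    // Reading end to end pulls the file into the cache; later reads during
    // the portal jump are then served from RAM instead of the disk.
    while (ReadFile(file, buffer.data(), static_cast<DWORD>(buffer.size()),
                    &bytesRead, nullptr) && bytesRead > 0)
    {
    }
    CloseHandle(file);
}
```

On an NVMe system you’d enqueue DirectStorage requests ahead of time instead, but for the HDD case the OS file cache is doing the heavy lifting anyway.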

At 23:40 Alex measures the storage bandwidth used by the initial portal jump sequence and it never exceeds 500 MB/s, though he mentions other people seeing 1200 MB/s - both a far cry from the 3500 MB/s or 7000 MB/s top-rated speeds of the NVMe SSDs in question. What I would have loved to see is the PCIe transmit (Tx) bandwidth used in these scenarios, as this has the potential to be a bottleneck on PC and is something the PS5 doesn’t have to deal with given its unified memory architecture (UMA) design. DirectStorage with GPU decompression is aimed at tackling exactly this bottleneck: it’s compression aimed at maximizing the data transferred per second over the PCIe bus to GPU memory. Oh, what I’d give for that same video with an additional section measuring PCIe bandwidth and testing the same GPUs in 16x, 8x, and 4x modes (2x and 1x if possible?).
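Anyone who wants to eyeball this at home on an NVIDIA GPU can do so through NVML, which exposes a sampled PCIe throughput counter. It’s a coarse estimate rather than a precise trace, but it would at least show whether the bus is anywhere near saturated during a portal jump. A quick sketch:

```cpp
// Sketch: sampling PCIe TX/RX throughput on an NVIDIA GPU via NVML while a
// loading sequence runs. Each nvmlDeviceGetPcieThroughput call samples the
// counter over a short window and reports KB/s, so treat these as estimates.
#include <nvml.h>
#include <windows.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    nvmlDevice_t device;
    nvmlDeviceGetHandleByIndex(0, &device); // first GPU in the system

    for (int i = 0; i < 60; ++i) // roughly one sample per second for a minute
    {
        unsigned int txKBps = 0, rxKBps = 0;
        nvmlDeviceGetPcieThroughput(device, NVML_PCIE_UTIL_TX_BYTES, &txKBps);
        nvmlDeviceGetPcieThroughput(device, NVML_PCIE_UTIL_RX_BYTES, &rxKBps);
        printf("PCIe TX: %u MB/s  RX: %u MB/s\n", txKBps / 1024, rxKBps / 1024);
        Sleep(1000);
    }

    nvmlShutdown();
    return 0;
}
```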

Overall, Rift Apart is a great PC port and Alex’s video is a great technical review of it. However, there are a lot of tradeoffs involved with DirectStorage / GDeflate and IO - I hope that one day we see a PC-first title adopting GDeflate and designed around the non-unified memory architecture inherent to discrete GPUs connected over PCIe. That would be the real, ultimate test of the format.