Lately, I've been decompiling Tomb Raider 5 with some friends and while researching potential sources of debug informations that could help the process, I stumbled upon the Pocket PC version of Tomb Raider 1. It was ported by Ideaworks3D, a London-based game development company specialized in porting.

It's supposed to run on low-performance handheld devices running Windows Mobile/CE 5.0 and thus one would imagine that they have simply taken the Windows code and tweaked it a little bit to make it run on CE. Well as I discovered, it's more complicated than that. First, there is no Windows version of TR1, it was only released for DOS and was never ported to either Win16 or Win32. Second, they actually didn't take the PC version as a base, but the PSX version.

It may seem weird, why take the PSX version if your product is going to run on Windows CE. As it appears, Ideaworks3D seems to have developed an in-house userland syscall JIT translator for PSX, and uses it when porting games to CE. In other words, they compile the PSX codebase to ARM code and link that binary against a DLL file called iepaella.dll which contains implementations of PSX syscalls that call the WinCE API. Apparently, they have also made a version of that DLL which runs on standard Win32 that they used to make an ActiveX port of their Pocket PC port of the PSX version. It allowed playing TR1 in Internet Explorer. Not sure why anyone would ever do that, though.

I find this quite interesting because the main game executable seems to have a code near-identical to the original PSX version, which means that "IEPaella" is effectively a full-featured userland PSX emulator for WinCE and Win32 that is capable of mapping the PSX system routines to DirectX API calls. I haven't been able to find any other similar product on the internet. The only software that could be considered similar is Usercorn, a userland emulator based on Unicorn which implements most of the Linux, BSD and Darwin syscalls and even some DOS interrupts. It's very basic though, nowhere near what Paella does.

Currently, the codebase of the decompilation project is divided in 3 main folders:

  • GAME contains the shared game code
  • SPEC_PSX contains the PSX platform code
  • SPEC_PC contains the PC platform code

Debugging on PSX is much harder than on PC, because the binary is run in an emulator and you can't just run the code step-by-step to see where it crashes. Using Paella would allow doing such a thing because the binary is effectively being run on the computer and works like any other C++ program. We're currently search for ways to implement Paella support in TOMB5, but it may take time because not all PSX syscalls are implemented. It will eventually work, though.

For my latest project (Turing), I chose to use PyQt 5 for the GUI because it's what seemed the best to me to allow the whole thing to be cross-platform without much of a hassle. It has done its job quite well for about everything.

After some months of development though, I ran into an issue I was unable to fix : there is a bug, somewhere inbetween pylupdate5 (that scans code files for lines to translate) and lrelease (that compiles .ts files to .qm files) which basically prevents using non-ASCII characters in source strings. You can use them in translated strings, but not in the code files.

Quite strange, since both the code files and the .ts files (which are in XML format) are encoded in UTF-8. Or so I thought.

It seems that lrelease assumes that everything is ASCII (well, to be precise, Latin-1) if you don't specify it at each <message> element in the file, even though the very first line of the file (the XML header) specifies the encoding, in this case utf-8. pylupdate5 has no problem with that and assumes UTF-8 by default.

The workaround is to add an attribute to each <message> element in the .ts file to force lrelease to understand that it's UTF-8. Basically, replacing <message> by <message encoding="UTF-8"> everywhere. The problem is that when pylupdate5 re-saves the file after adding the new lines from the code, it discards the attributes (maybe it assumes that they aren't needed, which is a correct assumption given that it's a freaking XML file). Thus, I needed to write a script that is executed between the two calls to make sure that lrelease always gets fed a file with the attributes, so that it always parses it correctly.

I don't know if it's a bug on the PyQt side or on the Qt side, but it stills seems quite weird to me that we still encounter this kind of problems in 2018, when every sane piece of software uses UTF-8 by default.

This article by Joel Spolsky is from 2003, and it seems that back then the basic knowledge of how to correctly use encodings had already been invented, so why are so many programs still unable to use text correctly?