PSXDEV : Forensic debugging for the win!
Spent some time today hunting a bug that only triggered when uploading code to the PSX, not when loading from CD. Turns out there is a methodology called forensic debugging, and I've been doing it without knowing!
The bug
So I'm debugging this little PSX game engine I've been at for a few months now, and right now there are two ways you can load the level's data. You can use the classic way and load it from the CD, but that means you need to generate an ISO in order to test your builds, and you need some kind of ODE if you want to test on real hardware.
The other way is to load the level directly in the PSX memory at the correct address, then upload your executable.
With nops, that would be :
# enable debug mode
nops /debug
# load level data
nops /bin 0x800b2410 Overlay.lvl1 #<- this is the overlays loading address
# load 'engine' executable
nops /exe main.ps-exe
Unfortunately, sometimes it won't work, and you're met with a black screen after upload. Luckily, you can still access the PSX registers, and there is some interesting information to be found there!
PSX registers
If you set unirom's debug kernel before uploading anything, you should be able to access them with nops /regs.
In my case, here is what I got when met with the BSOD :
stat =0x00004000 badv =0x00000000
r0 =0x00000000 at =0x80040000 v0 =0x00000000 v1 =0x275A0C80
a0 =0x800425B4 a1 =0x800B37EC a2 =0x800B3B94 a3 =0x800B3904
t0 =0x000000DD t1 =0x800B2574 t2 =0x80042008 t3 =0x00001000
t4 =0x00000017 t5 =0x00000017 t6 =0xFFFFFFDF t7 =0xFFFFFB1E
s0 =0x800B2510 s1 =0x800425B4 s2 =0x00000000 s3 =0x8003BABC
s4 =0x800ABDF4 s5 =0x00000000 s6 =0x80042008 s7 =0x00000008
t8 =0xFFFFFB1E t9 =0x00000000 k0 =0x00000000 k1 =0x00000F1C
gp =0xA0010FF0 sp =0x801FFC04 fp =0x80042008 ra =0x8001507C
rapc =0x800150A0
hi =0x00000000 lo =0x00000000 sr =0x4000FF14 caus =0x1000001C
DBE - Bus Error on data load/store (0x7)
So first, this: DBE - Bus Error on data load/store (0x7). I suspected from experience (did I mention I crash the PSX a lot?) that it had something to do with a null pointer.
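The (0x7) at the end of that line is the exception code, which the CPU keeps in bits 2 to 6 of the cause register (caus in the dump). As a small sketch of how you could decode it by hand: the function names and the table below are mine, not something nops provides, and the code names follow the list in psx-spx.

```c
#include <stdint.h>

/* Extract the ExcCode field (bits 2..6) from the COP0 cause register. */
static unsigned exc_code(uint32_t cause)
{
    return (cause >> 2) & 0x1Fu;
}

/* Short names for the R3000A exception codes 0..12 (hypothetical helper). */
static const char *exc_name(unsigned code)
{
    static const char *const names[] = {
        "INT", "MOD", "TLBL", "TLBS", "AdEL", "AdES", "IBE",
        "DBE", "Syscall", "BP", "RI", "CpU", "Ov"
    };
    if (code < sizeof names / sizeof names[0])
        return names[code];
    return "unknown";
}
```

Feeding it the caus value from the dump, exc_code(0x1000001C) comes out as 7, which is the DBE that nops printed.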
So I decided to check those registers. I mean there must be some useful information in there, however cryptic this looks to my newbie eyes...
Turns out it's all there : https://psx-spx.consoledev.net/cpuspecifications/#cpu-registers
And it also turns out that the two most interesting pieces of information for us right now are the ra and pc addresses (pc being the rapc value in the dump above).
ra is the return address, the address the CPU should go back to when the current subroutine returns.
And pc is the program counter, which tells you where exactly the CPU was when it crashed.
So with that information, I thought about checking the linker map file that's generated when compiling the program, and looking for something near those addresses.
I couldn't find the exact addresses in this file, but I could see that it must have something to do with the drawQuad() function at 0x80014ec0, since both addresses fall between its start and the start of drawTri() at 0x80015218:
.text.drawQuad
0x0000000080014ec0 0x358 src/graphics.o
0x0000000080014ec0 drawQuad
.text.drawTri 0x0000000080015218 0x314 src/graphics.o
0x0000000080015218 drawTri
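That lookup is just a range check: an address belongs to a function if it sits between the function's start and start + size. Here is a minimal sketch of it, with the two entries above typed in by hand (the struct and function names are made up for the example, nothing here parses the real map file):

```c
#include <stddef.h>
#include <stdint.h>

/* One function entry from the linker map: start address, size and name. */
struct map_entry {
    uint32_t start;
    uint32_t size;
    const char *name;
};

/* The two entries from the map excerpt, copied by hand. */
static const struct map_entry entries[] = {
    { 0x80014ec0u, 0x358u, "drawQuad" },
    { 0x80015218u, 0x314u, "drawTri"  },
};

/* Return the entry whose [start, start + size) range contains addr, or NULL. */
static const struct map_entry *find_entry(uint32_t addr)
{
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        if (addr >= entries[i].start &&
            addr - entries[i].start < entries[i].size)
            return &entries[i];
    }
    return NULL;
}
```

With the values from the register dump, 0x8001507C lands at drawQuad + 0x1bc and 0x800150A0 at drawQuad + 0x1e0, so both are well inside drawQuad().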
And once again, psxdev's @Nicolas Noble came to the rescue with two words : addr2line and objdump.
addr2line
addr2line "converts addresses into file names and line numbers."
Straightforward enough (remember to use your toolchain's binaries though):
mipsel-linux-gnu-addr2line -e main.elf 0x8001507C
yielded a laconic
./src/graphics.c:352
So there you have it, just look at the file, at the line, and realize your immense doofusness.
// If tim mode == 0 | 1, set CLUT coordinates
if ( (mesh->tim->mode & 0x3) < 2 ) {
    setClut(poly4,
            mesh->tim->crect->x,
            mesh->tim->crect->y
    );
}
Trying to set a CLUT on non-existent tim data... easily fixed with a simple check :
if (mesh->tim){
    // If tim mode == 0 | 1, set CLUT coordinates
    if ( (mesh->tim->mode & 0x3) < 2 ) {
        setClut(poly4,
                mesh->tim->crect->x,
                mesh->tim->crect->y
        );
    }
}
objdump
objdump "displays information from object files."
Specifically, with the -d (disassemble) flag, objdump can give us a clue as to what exactly went wrong:
mipsel-linux-gnu-objdump -d main.elf | grep 800150a0
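On its own, this prints only the matching line. With a few lines of context around it (grep's -B and -A options do that), the listing looks like this: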
80015078: 02202025 move a0,s1
8001507c: 8e03000c lw v1,12(s0)
80015080: 00000000 nop
80015084: 8c620000 lw v0,0(v1)
80015088: 00000000 nop
8001508c: 30420002 andi v0,v0,0x2
80015090: 1440000a bnez v0,800150bc <drawQuad+0x1fc>
80015094: 00000000 nop
80015098: 8c630004 lw v1,4(v1)
8001509c: 00000000 nop
800150a0: 84620000 lh v0,0(v1)
800150a4: 84630002 lh v1,2(v1)
The lh instruction at 800150a0 can be translated to: load into v0 the halfword found at the address stored in v1, plus an offset of 0.
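If you want to double check a translation like that, the raw instruction word in the second column of the listing can be taken apart by hand. A rough sketch, assuming the standard MIPS I-type layout (6-bit opcode, two 5-bit register numbers, 16-bit signed offset); the struct and function names are only for illustration:

```c
#include <stdint.h>

/* Fields of a MIPS I-type instruction (the format used by lw and lh). */
struct itype {
    unsigned opcode;  /* bits 31..26: 0x23 is lw, 0x21 is lh */
    unsigned rs;      /* bits 25..21: base register number */
    unsigned rt;      /* bits 20..16: destination register number */
    int16_t  imm;     /* bits 15..0: signed offset added to the base */
};

/* Split a 32-bit instruction word into its I-type fields. */
static struct itype decode_itype(uint32_t word)
{
    struct itype f;
    f.opcode = (word >> 26) & 0x3Fu;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    f.imm    = (int16_t)(word & 0xFFFFu);
    return f;
}
```

For 0x84620000 this gives opcode 0x21 (lh), base register 3 (v1), destination register 2 (v0) and offset 0, which matches the lh v0,0(v1) that objdump printed.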
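Looking at those instructions alone, nothing stands out as illegal, but I might be overlooking something. One hint from the register dump, though: v1 holds 0x275A0C80, which is nowhere near the 0x800xxxxx addresses everything else uses, so a load through it is a plausible source of the bus error.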
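Whether an address like the 0x275A0C80 sitting in v1 can be loaded from at all is something you can check against the memory map. Here is a rough sketch, assuming the layout described in psx-spx: 2 MB of main RAM, visible at 0x00000000 (KUSEG), 0x80000000 (KSEG0) and 0xA0000000 (KSEG1). It ignores RAM mirrors, scratchpad, I/O ports and the BIOS ROM, and the function name is mine.

```c
#include <stdint.h>

#define RAM_SIZE 0x200000u  /* 2 MB of main RAM on a retail console */

/* Return 1 if addr points into main RAM through KUSEG, KSEG0 or KSEG1. */
static int in_main_ram(uint32_t addr)
{
    uint32_t seg = addr & 0xE0000000u;  /* top 3 bits select the 512 MB window */

    if (seg != 0x00000000u && seg != 0x80000000u && seg != 0xA0000000u)
        return 0;

    /* Strip the segment bits to get the physical address. */
    return (addr & 0x1FFFFFFFu) < RAM_SIZE;
}
```

With the values from the dump, s0 (0x800B2510) and sp (0x801FFC04) both pass the check, while v1 (0x275A0C80) does not.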
Anyway, in that case, addr2line was enough to find the culprit and fix the issue!
Sources
https://discord.com/channels/642647820683444236/663664210525290507/860539386689093662
https://linux.die.net/man/1/addr2line
https://linux.die.net/man/1/objdump
https://psx-spx.consoledev.net/cpuspecifications/#cpu-registers