PSXDEV snippets : PSX drawing optimization
Arthur : Oh, by the way, the GTE can easily transform 1500+ triangles; it's actually displaying them (i.e. building the display lists) that takes most of the CPU time.
misscelan : Yeah, in this case Quilt is transforming and projecting 2500 triangles (because backface culling and clipping come after), but 2500 is still fine for the GTE. If you want to start optimizing, I would first recommend using the scratchpad as much as possible, and if you want to go the extra mile, move to asm to take advantage of GTE parallelization (as far as I know the compiler won't do that for you). There are other things too: the clipping function is a bit slow, and since you are only using FT3 you could initialize all the packets up front. You can also replace the gte_stsxyX calls with the macro that retrieves all three at once.
Arthur : Yeah, pre-allocated packets are a good way to improve performance, at the cost of using more RAM. For each model, if you generate the packets once and keep them in RAM, with a copy for each framebuffer, you will only need to change the vertices and nothing else (vertex colors, UV maps, etc.). That would save a lot of CPU cycles.
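A rough host-side sketch of the pre-allocated packet idea Arthur describes. `SketchFT3` is a stand-in that assumes the usual 32-byte flat-textured triangle layout (tag word plus 7 data words, primitive code 0x24); `TriAttr`, `ScreenXY`, `MAX_TRIS` and both function names are hypothetical and not part of any SDK. Under that assumption, 2500 triangles with one copy per framebuffer would take 2500 × 2 × 32 = 160,000 bytes, which is the RAM cost being traded for CPU time.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a flat-textured triangle packet (illustration only). */
typedef struct {
    uint32_t tag;                 /* top 8 bits: length in words */
    uint8_t  r0, g0, b0, code;
    int16_t  x0, y0;
    uint8_t  u0, v0; uint16_t clut;
    int16_t  x1, y1;
    uint8_t  u1, v1; uint16_t tpage;
    int16_t  x2, y2;
    uint8_t  u2, v2; uint16_t pad;
} SketchFT3;

#define MAX_TRIS 64               /* hypothetical per-model limit */

/* Per-triangle data that never changes between frames. */
typedef struct {
    uint8_t  u[3], v[3];
    uint8_t  r, g, b;
    uint16_t clut, tpage;
} TriAttr;

typedef struct { int16_t x, y; } ScreenXY;

/* One copy per framebuffer, so the copy the GPU may still be
   reading is never the one being written. */
SketchFT3 model_packets[2][MAX_TRIS];

/* Run once at load time: fill in everything except the vertices. */
void model_build_packets(const TriAttr *attr, int count)
{
    assert(count >= 0 && count <= MAX_TRIS);
    for (int buf = 0; buf < 2; buf++) {
        for (int i = 0; i < count; i++) {
            SketchFT3 *p = &model_packets[buf][i];
            p->tag  = 7u << 24;   /* 7 data words follow the tag */
            p->code = 0x24;       /* flat-shaded textured triangle */
            p->r0 = attr[i].r; p->g0 = attr[i].g; p->b0 = attr[i].b;
            p->u0 = attr[i].u[0]; p->v0 = attr[i].v[0];
            p->u1 = attr[i].u[1]; p->v1 = attr[i].v[1];
            p->u2 = attr[i].u[2]; p->v2 = attr[i].v[2];
            p->clut  = attr[i].clut;
            p->tpage = attr[i].tpage;
        }
    }
}

/* Run every frame: only the projected vertices are rewritten.
   xy holds three screen coordinates per triangle. */
void model_update_vertices(int buf, const ScreenXY *xy, int count)
{
    assert(buf == 0 || buf == 1);
    assert(count >= 0 && count <= MAX_TRIS);
    for (int i = 0; i < count; i++) {
        SketchFT3 *p = &model_packets[buf][i];
        p->x0 = xy[3 * i + 0].x; p->y0 = xy[3 * i + 0].y;
        p->x1 = xy[3 * i + 1].x; p->y1 = xy[3 * i + 1].y;
        p->x2 = xy[3 * i + 2].x; p->y2 = xy[3 * i + 2].y;
    }
}
```

On real hardware the per-frame step would write the GTE's projected screen coordinates straight into the packet rather than copying from an intermediate array; the array here only keeps the sketch runnable off-target.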
misscelan : I meant initializing them by calling only setPolyFT3 on all packets at the start, to save some measly cycles :joy: . Pre-allocating all packets as you mentioned will use a lot of RAM, and if you take advantage of the parallelization you can load/set most of that stuff while the GTE does its thing. In most cases I actually run out of things to do and have to wait for the GTE to finish.
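A minimal sketch of the lighter variant misscelan means: stamp only the packet length and primitive code into a shared pool once, then fill in everything else per frame. It is host-side C with stand-ins, not SDK code: `PoolFT3`, `POOL_SIZE`, `sketch_set_poly_ft3`, `pool_fill` and `sketch_link` are hypothetical names, and the sketch assumes that setPolyFT3 amounts to writing length 7 and code 0x24, and that linking a packet into the ordering table rewrites only the low 24 address bits of the tag.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a flat-textured triangle packet (illustration only). */
typedef struct {
    uint32_t tag;                 /* top 8 bits: length, low 24: next */
    uint8_t  r0, g0, b0, code;
    int16_t  x0, y0;
    uint8_t  u0, v0; uint16_t clut;
    int16_t  x1, y1;
    uint8_t  u1, v1; uint16_t tpage;
    int16_t  x2, y2;
    uint8_t  u2, v2; uint16_t pad;
} PoolFT3;

#define POOL_SIZE 128             /* hypothetical pool size */
PoolFT3 pool[POOL_SIZE];

/* Assumed equivalent of setPolyFT3: length and code only. */
void sketch_set_poly_ft3(PoolFT3 *p)
{
    p->tag  = (p->tag & 0x00FFFFFFu) | (7u << 24);
    p->code = 0x24;
}

/* Run once at startup, so the per-frame path can skip it. */
void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        sketch_set_poly_ft3(&pool[i]);
}

/* Per-frame fill: writes every field except length and code.
   xy and uv each hold three pairs (x0,y0,x1,y1,x2,y2 / u0,v0,...). */
void pool_fill(PoolFT3 *p, const int16_t xy[6], const uint8_t uv[6],
               uint16_t clut, uint16_t tpage, uint8_t shade)
{
    p->r0 = p->g0 = p->b0 = shade;    /* byte writes leave code intact */
    p->x0 = xy[0]; p->y0 = xy[1];
    p->x1 = xy[2]; p->y1 = xy[3];
    p->x2 = xy[4]; p->y2 = xy[5];
    p->u0 = uv[0]; p->v0 = uv[1];
    p->u1 = uv[2]; p->v1 = uv[3];
    p->u2 = uv[4]; p->v2 = uv[5];
    p->clut  = clut;
    p->tpage = tpage;
}

/* Stand-in for ordering-table linking: only the address bits change,
   so the length written by pool_init survives across frames. */
void sketch_link(PoolFT3 *p, uint32_t next)
{
    p->tag = (p->tag & 0xFF000000u) | (next & 0x00FFFFFFu);
}
```

The saving is small, as the message says. The trick also only holds while the pool contains a single primitive type and the color is written as separate bytes, since a 32-bit write to the color word would overwrite the code.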
Sources
Arthur & misscelan : https://discord.com/channels/642647820683444236/642849069378568192/852081184921616432