How it works
The PS1 graphics pipeline looks like this:
The GPU is famously purely 2D. It receives 2D polygons with UVs and vertex colors for rasterization in back-to-front order. To help convert 3D scenes to 2D, the Geometry Transform Engine (GTE) coprocessor can be used to translate, rotate, and project 3D vertices onto the 2D screen.
Game code is opaque to us, but we can hook into the GTE and GPU and watch the data flowing through them. A 3D screenshot needs both the 3D positions fed into the GTE and the UVs and vertex colors fed into the GPU.
So we watch the vertices going into the GTE and remember which 3D point projected to which 2D screen position. Then later on when we see a 2D poly going into the GPU, we match up each of its 2D corners with the 3D vertex that projected to that corner position. If we get a match for all the corners, then we can be pretty sure it was originally a 3D poly on those vertices, and write it out to the OBJ file.
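For illustration, here is a minimal sketch of that bookkeeping in C++. The hook names (`OnGTEProject`, `LookupCorner`) and types are hypothetical, not DuckStation's actual internals:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

struct Vec3 { float x, y, z; };  // 3D position fed into the GTE
struct ScreenPos {               // 2D position produced by the GTE
    int16_t x, y;
    bool operator==(const ScreenPos& o) const { return x == o.x && y == o.y; }
};
struct ScreenPosHash {
    size_t operator()(const ScreenPos& p) const {
        return (static_cast<uint32_t>(static_cast<uint16_t>(p.x)) << 16) |
               static_cast<uint16_t>(p.y);
    }
};

// Screen pixel -> the 3D vertex that projected there. Cleared every frame.
std::unordered_map<ScreenPos, Vec3, ScreenPosHash> g_projections;

// Hypothetical GTE hook: called after each RTPS/RTPT projection.
void OnGTEProject(const Vec3& v, ScreenPos sp) {
    g_projections[sp] = v;
}

// Hypothetical GPU hook: called for each polygon corner. The poly is only
// written to the OBJ if every one of its corners gets a match.
std::optional<Vec3> LookupCorner(ScreenPos sp) {
    auto it = g_projections.find(sp);
    if (it == g_projections.end())
        return std::nullopt;  // no GTE vertex landed on this pixel
    return it->second;
}
```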
Solved Problem: Spikes
What happens if two 3D vertices project to the same point on screen, i.e. they lie on the same line of sight? Then when we get a poly with that screen coordinate in the GPU, we won't know which of the two 3D verts it corresponded to. If we pick the wrong one, we will probably get a "spike" shooting out along the line of sight to the wrong vert.
To avoid this, whenever we see that a vertex is going to land on the same screen pixel as another one, we reach in and jitter its 2D position around so it falls on a different pixel. Theoretically this completely solves the spike problem. In game, it looks like this.
The 2D positions are completely screwed up, but remember, we only use those as keys to look up the original 3D positions. The 2D positions never go into the OBJ file. So we can screw up the 2D frame as much as we want, as long as we get lots of good 3D information out of it.
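Continuing the sketch above, the jitter might look something like this (the ring-search pattern is illustrative, not necessarily what the plugin does):

```cpp
#include <algorithm>
#include <cstdlib>

// Nudge a projected vertex to a nearby unoccupied pixel so no two 3D
// vertices share a screen position.
ScreenPos Jitter(ScreenPos sp) {
    if (g_projections.find(sp) == g_projections.end())
        return sp;  // pixel is free, keep it
    // Walk outward in growing square rings until a free pixel is found.
    for (int r = 1; r < 8; r++) {
        for (int dy = -r; dy <= r; dy++) {
            for (int dx = -r; dx <= r; dx++) {
                if (std::max(std::abs(dx), std::abs(dy)) != r)
                    continue;  // only test the ring's border
                ScreenPos cand{static_cast<int16_t>(sp.x + dx),
                               static_cast<int16_t>(sp.y + dy)};
                if (g_projections.find(cand) == g_projections.end())
                    return cand;
            }
        }
    }
    return sp;  // give up and accept the collision
}
```

`OnGTEProject` would call `Jitter` before recording the vertex, and the jittered coordinate would also be patched back into the GTE's output registers so the GPU later sees the same position we stored.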
Solved Problem: Backface Culling
Normally polys which are back-facing or zero-size in screen space are culled before they ever make it to the GPU. This means 3D screenshots would only contain the camera-facing side of the scene.
The actual cull test is done by game code, but fortunately it is based on the sign of an NCLIP test computed on the GTE, which we can hook into.
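Concretely, NCLIP is the signed double-area (winding determinant) of the on-screen triangle, per the published GTE documentation:

```cpp
// NCLIP as the GTE computes it: positive for one winding order, negative
// for the other, zero for degenerate (zero-area) polys.
int32_t Nclip(ScreenPos v0, ScreenPos v1, ScreenPos v2) {
    return (int32_t)v0.x * v1.y + (int32_t)v1.x * v2.y + (int32_t)v2.x * v0.y
         - (int32_t)v0.x * v2.y - (int32_t)v1.x * v0.y - (int32_t)v2.x * v1.y;
}
```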
We have two strategies for getting the backfacing polys (both are sketched in code below):
- Disable Culling option: In the simple case, we just force NCLIP to be positive. Most games treat a positive NCLIP as forward-facing, so this tricks the game into thinking all polys are forward-facing and thus disables culling.
- Capture Front & Back option: Some games, like Ape Escape, vary the sign they require for the cull test. Since we can't know which sign they expect, we can't force it to the correct one. In this case we use a different strategy: capture once with normal NCLIP, rewind via save state, then capture again with reversed NCLIP. The second "exposure" should pick up any polys missed by the first, and we can put the two together to get the whole shot.
In principle, this fixes any game which determines culling solely from the sign of NCLIP.
To prevent the player from moving in between the two exposures, which could misalign them, we also need to disable controller input while capturing in this mode. Think of it like sitting still for a photograph.
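Both strategies amount to rewriting NCLIP as the game reads it back. Continuing the earlier sketch, with a hypothetical hook point:

```cpp
enum class CullMode { Normal, ForcePositive, Reversed };
CullMode g_cull_mode = CullMode::Normal;

// Hypothetical hook: called with the GTE's computed NCLIP value just before
// game code reads it; the return value is what the game actually sees.
int32_t OnNclipReadback(int32_t nclip) {
    switch (g_cull_mode) {
    case CullMode::ForcePositive:
        // "Disable Culling": most games cull when NCLIP <= 0, so always
        // reporting a positive value makes every poly look forward-facing.
        return 1;
    case CullMode::Reversed:
        // Second exposure of "Capture Front & Back": flip the sign so the
        // game now draws exactly the polys it culled the first time.
        return -nclip;
    default:
        return nclip;
    }
}
```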
Unsolved Problem: 2D "litter"
Jittering prevents 3D verts from landing on the same pixel, but it doesn't do anything for points that don't come out of the GTE, e.g. 2D sprites. What happens if all the corners of a 2D sprite happen to land on pixels occupied by 3D verts?
Then we get a "false positive"; we mistake that sprite for a poly in 3D space, stretched out between the vertices it collided with. These form "litter" scattered around the 3D scene.
TODO: picture
Solutions:
- Clean up litter by hand in your modeling program.
- Take multiple shots and use the best one. You can often avoid litter by luck: a sprite only needs one corner to land on an unoccupied pixel for the false positive not to happen.
- Use the freecam to move closer to your subject of interest. The vertices of distant objects are packed more densely in screen space, which raises the chance of a sprite corner hitting an occupied pixel by accident; moving in spreads them out.
IDEA: Detect probable sprites by checking whether they form perfect rectangles in screen/UV space, and flag them in the OBJ for quick review.
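If that idea were implemented, the check might look something like this, continuing the earlier sketch (the heuristic and names are hypothetical):

```cpp
#include <set>

struct Quad { ScreenPos v[4]; };  // a captured quad's screen-space corners

// Heuristic: 2D sprites are axis-aligned rectangles in screen space, which
// is vanishingly rare for a projected 3D poly. An axis-aligned rectangle's
// corners use exactly two distinct x values and two distinct y values.
bool LooksLikeSprite(const Quad& q) {
    std::set<int16_t> xs{q.v[0].x, q.v[1].x, q.v[2].x, q.v[3].x};
    std::set<int16_t> ys{q.v[0].y, q.v[1].y, q.v[2].y, q.v[3].y};
    return xs.size() == 2 && ys.size() == 2;
}
```

Flagged quads could then be written under their own OBJ group (e.g. `g suspected_sprites`) so they are easy to select and delete all at once.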
Unsolved Problem: Mid-frame VRAM updates
Textures are currently dumped using the content of VRAM at the end of the frame. Games that upload to VRAM mid-frame or use render-to-texture techniques will have wrong textures.
TODO: examples
PROGNOSIS: No conceptual obstacles, just a pain to implement.
Partially Solved Problem: GTE outputs and GPU inputs "out of phase"
Sometimes game code will apply additional transforms to the 2D vertices after they come out of the GTE. This prevents us from recognizing the 3D vert the 2D coordinate came from, resulting in missing or incorrect polys.
EXAMPLE: Digimon World 2
SOLUTION: Use PGXP. PGXP is an enhancement for PlayStation emulation that ferries high-precision screen-space coordinates and depth info from the GTE to the GPU, where it's used for enhanced rendering. It already has a mature system for tracking the GTE outputs as they move through game code. We can take over this pipeline, inject our 3D verts in the GTE, and read them back out in the GPU.
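As a rough sketch of the idea (field names are illustrative; PGXP's real bookkeeping differs):

```cpp
// PGXP-style tracking: each low-precision screen coordinate the GTE emits
// is shadowed by a high-precision record that PGXP follows through memory
// and registers, surviving whatever transforms game code applies. Riding
// along means widening that record with the original 3D input.
struct PreciseVertex {
    float sx, sy;  // high-precision screen position (what PGXP tracks)
    float w;       // depth info, used for perspective-correct rendering
    Vec3 world;    // our addition: the 3D vertex fed into RTPS/RTPT
};
// At the GPU, PGXP resolves each incoming 16-bit vertex back to its
// PreciseVertex; reading `world` there sidesteps 2D matching entirely.
```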
TODO: investigate cases where this doesn't work