Sprint 34 - Adam-Poppenheimer/Civ-Clone GitHub Wiki
Goals
- A startup time for a Tiny map at max player count of no more than 45 seconds.
- A startup time for a Small map at max player count of no more than 45 seconds.
- A startup time for a Standard map at max player count of no more than 45 seconds.
Risks and Mitigation
- My terrain bake textures in their current state require an enormous amount of memory. There's no way I'll be able to generate a Large or even a Standard map with them as they are. If I can't figure out how to make terrain baking less expensive, I'll be left with small maps that consume an enormous amount of memory and take a long time to load.
- I might gain a lot from texture compression. If DXT5 can cut my memory requirements by enough, I might not need to do anything else. It's also possible I could use a narrower color format, maybe something like 4 or 6 bits per channel rather than 8. That would reduce memory. I don't know if I can reduce the resolution of the texture when culture is present, since culture requires fairly high resolution. But maybe I could adapt the resolution of the texture to the needs of the MapChunk holding it. A MapChunk with no features, for instance, could have a tiny texture (like 1x1) or a null texture whose sampling is trivial. A MapChunk with a few lower-demand features like marshes or riverbanks might get an intermediate-resolution texture. Something with roads and culture might get the highest resolution. Even just implementing the null-texture idea would save a lot of memory in ocean chunks, which should be fairly common. If that's not sufficient, I can always try deallocating (and not immediately reloading) bake textures on chunks that aren't visible. That might cause major performance issues when moving the camera, though, so I'm hesitant to do that.
- Blending the contributions to my bake texture has proven a difficult task. And I know that blending multiple transparent objects is nontrivial. If I can't figure out how to make culture play nice with everything else, I might need to add an entirely new texture for baking, further exacerbating my memory problems.
- Adding another texture is unacceptable. Performing another render pass, however, is much more reasonable. Since the standard Blend directives haven't produced the results I need, I might be able to do something fancy with a "post-process" pass that adds culture by sampling what's already been rendered and doing some more complex math on it. That might be an operation similar to UI overlays. Maybe there's something I can learn there? At the very least I should do some more research on semitransparent blending. The fact that I'm blending over a clear background might complicate things, but surely that's been done before.
- Multi-threading seems like a very good way of using CPU resources more efficiently and decreasing load times. But I've only done the most trivial multi-threading before. Given Unity's limited support for it, adding parallelism to the codebase could open up a whole can of worms that I'm not equipped to deal with.
- Apparently, Unity does have a job system for doing somewhat safe multithreading tasks. That was introduced in the 2018 version, so I don't think I have it in my current version, but I could always try updating. I can also make use of standard .Net threads, though that incurs some extra difficulties. Regardless of the difficulty, multithreading is a very important aspect of programming and I really ought to know how to do it. I can use this performance bottleneck as a learning experience, perhaps.
- Optimization is a game of diminishing returns, especially if I'm hacking away at the same basic plans the entire time. It's entirely possible that my current implementation cannot be optimized to get it where I want to go. And if that happens, I'm not sure what I'll do.
- Even at this stage, with my current architecture, I have a bunch of ideas for reducing memory usage and processing time. Compression, null bake textures, and multithreading alone are major potential sources of improved performance. I highly doubt I'm about to run out of things to improve. If I do start running out of avenues of improvement, I can always try reducing visual quality or trying completely new approaches. I'm sure there are others who've put lots of thought into large, procedurally-generated terrains at runtime. Most of those focus on generating natural-looking maps, but maybe I can take some lessons from their approaches.
- Processing bake textures takes a lot of time, since I have to use ReadPixels() to pull data into the CPU. While switching to a compressed format might save memory, it'll probably also increase the burden on the CPU, which will increase the time it takes to load the game. That'll increase the need for multi-threading, which'll increase the complexity of the codebase. Compression could end up being so expensive as to not be worth the memory savings, especially if the level of compression is fairly modest.
- Right now memory is the big bottleneck. Slow load times are not as big of a problem as the game crashing the whole system. I feel like I have more room at the moment to change CPU performance than I do with memory performance. I can imagine multithreading making the game run faster, but I cannot imagine how I'd make terrain baking work without lots of textures. Taking burdens from main memory and placing them on the CPU is probably an acceptable tradeoff.
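The variable-resolution idea above could be sketched roughly like this. Everything here is hypothetical (the demand tiers, the concrete sizes, and the type names are illustrations, not the project's real API); the point is just that the texture allocation follows what the chunk actually contains:

```csharp
using UnityEngine;

// Illustrative demand tiers, matching the three cases described above.
public enum BakeDemand { None, Low, High }

public static class BakeTextureSizer {
    // Sizes are placeholders; the real numbers would come from profiling.
    public static Texture2D CreateBakeTexture(BakeDemand demand) {
        switch (demand) {
            case BakeDemand.None:
                // Ocean and other feature-free chunks: no texture at all.
                // The terrain shader would branch on a "has bake texture"
                // flag instead of sampling.
                return null;
            case BakeDemand.Low:
                // Marshes, riverbanks: an intermediate resolution is enough.
                return new Texture2D(128, 128, TextureFormat.DXT5, false);
            default:
                // Roads and culture get the full resolution.
                return new Texture2D(512, 512, TextureFormat.DXT5, false);
        }
    }
}
```

The textures here are created empty; the real bake pass would fill them afterwards. The big win in this scheme is the `None` case, since a null texture costs nothing at all.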
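In Unity terms, the "post-process" pass for culture might take a shape like this. The shader name and its texture property are placeholders for something that doesn't exist yet; only the `Graphics.Blit` ping-pong through a temporary render texture is the standard mechanism:

```csharp
using UnityEngine;

public class CultureBakePass {
    // "Hidden/CultureBlend" is a hypothetical shader that samples the
    // already-rendered bake target and blends culture over it with
    // whatever custom math the standard Blend directives can't express.
    private readonly Material cultureMaterial =
        new Material(Shader.Find("Hidden/CultureBlend"));

    public void ApplyCulture(RenderTexture bakeTarget, Texture cultureTex) {
        cultureMaterial.SetTexture("_CultureTex", cultureTex);

        // Blit can't read and write the same target, so blend into a
        // temporary texture and copy the result back.
        var temp = RenderTexture.GetTemporary(bakeTarget.width, bakeTarget.height);
        Graphics.Blit(bakeTarget, temp, cultureMaterial);
        Graphics.Blit(temp, bakeTarget);
        RenderTexture.ReleaseTemporary(temp);
    }
}
```

This is essentially how full-screen post-processing effects work, which is why the UI-overlay comparison above seems promising.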
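For reference, the `ReadPixels()` pattern the readback bullet worries about looks roughly like this. It's expensive because `ReadPixels` forces a synchronous stall: the CPU waits for the GPU to finish rendering before the copy can happen:

```csharp
using UnityEngine;

public static class BakeReadback {
    // Pulls a render texture's contents back into CPU memory.
    public static Texture2D ReadBack(RenderTexture source) {
        var previous = RenderTexture.active;
        RenderTexture.active = source;

        // ReadPixels reads from the currently active render texture.
        // This call blocks until the GPU has finished writing to it.
        var result = new Texture2D(source.width, source.height,
                                   TextureFormat.RGBA32, false);
        result.ReadPixels(new Rect(0, 0, source.width, source.height), 0, 0);
        result.Apply();

        RenderTexture.active = previous;
        return result;
    }
}
```

Newer Unity versions also offer `AsyncGPUReadback`, which avoids the stall by delivering the data a few frames later; that might be worth investigating alongside compression.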
Review
This sprint got off to a fairly good start. Using texture compression turned out to be very easy and very effective. It immediately shaved off about 2 gigs worth of memory usage on relatively small map sizes, and will likely save even more for larger ones.
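The compression change was about as simple as changes get. A sketch, assuming (as in my case) the bake textures start out as readable RGBA32 textures with dimensions divisible by 4:

```csharp
using UnityEngine;

public static class BakeCompression {
    // Compresses an uncompressed, readable bake texture in place.
    // Unity picks DXT1 or DXT5 depending on whether the texture has
    // an alpha channel; dimensions must be multiples of 4.
    public static void CompressBake(Texture2D bakeTexture) {
        bakeTexture.Compress(false);  // false = faster, lower-quality compression

        // Upload to the GPU. The second argument frees the CPU-side copy,
        // which is only safe if the texture won't be read back again.
        bakeTexture.Apply(false, true);
    }
}
```

DXT5 stores 4x4 pixel blocks in 16 bytes versus 64 bytes for RGBA32, so a 4:1 reduction, which lines up with the roughly 2 GB saved.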
Immediately after that, I made another major improvement to memory usage by permitting variable bake texture sizes. I ended up creating several size categories, though I think most of them were not very important. Using null (0x0) textures for chunks that had no need for baking, however, saved another large chunk of memory.
With these two changes, memory became a much smaller bottleneck. It's not clear if I'll need to implement additional memory-saving measures at the moment, since I don't know how my implementation will work for large and busy maps. But memory stopped being the main concern within the first few days of the sprint.
After resolving these memory problems, I turned my attention to multi-threading my heightmap calculations. My original plan was to upgrade to Unity 2019 and make use of the Jobs system. So I began to upgrade my project to the newest version of Unity. What followed was a week and a half of frustration. At first I was simply making changes to my code to facilitate multi-threading, like reusing PointOrientationData on a per-chunk basis rather than using the same object for the whole map. But that relatively productive work quickly disappeared into a morass of compatibility issues, plugin conflicts, assembly reference errors, and tool-fighting. Most of my time was spent trying to get the program to compile, getting my tests to run, or getting various tools to talk to each other. This culminated in a problem with Visual Studio that forced me to download a utility designed to uninstall intransigent VS installs. I lost a full day to the VS uninstall process.
But I was making progress in between fighting my tools. At first I tried to apply my heightmap calculations to Unity's Jobs system. But it turns out the restrictions the Jobs system imposes made using it for heightmaps very difficult. Even if I could've managed it, it would've created a horrible and messy architecture, one that required a ton of data duplication. Unless the Burst compiler is pure magic, it's not even clear to me that using Jobs would've been that efficient.
Instead, I opted to use my new access to .NET 4.x and tried implementing heightmaps through Tasks. I ended up building a fairly simple multi-threading plan that still takes chunks one at a time but processes columns of samples in parallel. Even this simple change was enough to drastically improve the performance of my heightmap calculations, to the point where they're no longer the major bottleneck. And working in that space (plus the research I did alongside the sprint) has given me ideas for improving performance even further. Working more carefully to let the render thread and my own CPU-side work run at the same time, for instance, will likely improve performance substantially.
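The column-parallel scheme has roughly this shape. `CalculateSample` stands in for the real heightmap math, and the array layout is illustrative; the essential points are that `Parallel.For` fans the columns out across worker threads, and that nothing in the loop body may touch the Unity API, which is main-thread-only:

```csharp
using System.Threading.Tasks;

public static class HeightmapTasks {
    // Fills one chunk's heightmap, processing columns of samples in parallel.
    // Chunks are still handled one at a time, as described above.
    public static void FillHeightmap(float[,] heights, int columns, int rows) {
        Parallel.For(0, columns, x => {
            for (int z = 0; z < rows; z++) {
                heights[x, z] = CalculateSample(x, z);
            }
        });
    }

    // Placeholder for the real calculation. It's safe to run in parallel
    // because each sample depends only on its own coordinates, not on
    // any shared mutable state.
    private static float CalculateSample(int x, int z) {
        return 0f;
    }
}
```

Each column writes to a disjoint slice of the array, so no locking is needed, which is exactly the payoff of making the samples independent of each other.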
Preliminary results suggest that the changes I've made have finally dropped my Tiny load times below the 45-second target as well. I'm not fully convinced of this yet, but performance has improved so much that I'm ready to move on to larger map sizes.
There was a lot of wasted time in this sprint, some of it lost to tools, some to exasperation. It was far from an efficient use of time. But it also produced fairly substantial results. So overall I'm not sure what to think about it.
Retrospective
What went well?
- I've vastly increased the memory efficiency of my game, so much so that it now looks like I'll be able to handle fairly large map sizes without crashing my computer.
- I've successfully upgraded my project to the latest version of Unity, which gives me access to all sorts of fancy tools that I didn't have before. That should make development (in the long run) a lot smoother.
- My architecture for heightmap and alphamap calculations has proven very tolerant of the requirements of multi-threading. It didn't take many changes to get basic multi-threading of heightmap calculations in place (at least using .NET solutions), which suggests I was onto something when I made all of my heightmap calculations independent of each other.
- I have managed to get any amount of multi-threaded code working. While not a major accomplishment in and of itself, it's an important first step. And I learned a lot while writing those handful of lines.
- Heightmaps are now running much more quickly, and I still have ideas for making them run even faster.
- I think I've managed to improve the directory structure of the project. There are a lot fewer redundant folders and the structure I have seems cleaner and easier to navigate.
What could be improved?
- I need to be a lot more careful when doing big version switches like this in the future. I have no doubt that there are still a ton of bugs in my codebase related to the version change, and I have little way of knowing what they are.
- I spent a lot of time flailing about with my tools because I wasn't thinking carefully about what exactly was broken and what I needed to do. The VS uninstall debacle came about because I was convinced I needed to manually compile the source of Moq when in fact all I needed was the latest DLL. I burned a day and a half doing something more easily done by downloading a NuGet package and moving a DLL from one folder to another.
- With the heightmap code more under control, rendering and map generation are now becoming the main performance bottlenecks. Rendering is most likely slow because the CPU and GPU are waiting on each other, which is sub-optimal. That'll need to be fixed. More worrisome is map generation, which was not made with multi-threading in mind at all. I'll have to take another look at that section of the codebase and see if I can improve its architecture.