The lighting surface cache drawing functions are a good candidate for testing parallel computing implementations because while they share some of the same approaches implemented in the rasterizer, they're much smaller and simpler with evenly sized loop iterations. Plus, speeding up the lighting surface caches would be good too.

However, so far performance has gotten worse, not better. At the maximum number of threads on my computer (8), there's a 10% performance hit (from 30 to 27 fps in timedemo demo1 with the modified unvised e1m3 map from the public release). Reducing NUMTHREADS to 1 restores the previous performance.
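For illustration, here is a minimal sketch of the kind of loop being parallelized, assuming a simplified surface cache where each row of texels is scaled by one light level. The function `light_surface_rows` and its parameters are hypothetical stand-ins and not the engine's actual code; only `NUMTHREADS` comes from the post, and the value 8 matches the thread count mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define NUMTHREADS 8  /* name from the post; 8 is the thread count mentioned there */

/* Hypothetical stand-in for a surface cache drawing function:
   each output texel is the source texel scaled by its row's light level (0..255). */
void light_surface_rows(const unsigned char *texels,
                        const unsigned char *light,
                        unsigned char *cache,
                        int width, int height)
{
    assert(width > 0 && height > 0);

    /* Rows are independent and equally sized, so a static schedule
       splits them evenly across threads. When the compiler is not run
       with OpenMP enabled, the pragma is skipped and the loop is serial. */
#ifdef _OPENMP
#pragma omp parallel for num_threads(NUMTHREADS) schedule(static)
#endif
    for (int y = 0; y < height; y++) {
        const unsigned char *src = texels + (size_t)y * width;
        unsigned char *dst = cache + (size_t)y * width;
        for (int x = 0; x < width; x++)
            dst[x] = (unsigned char)((src[x] * light[y]) >> 8);
    }
}
```

Every call opens and closes a parallel region, and that fork/join cost is paid no matter how little work the loop does. For small caches this overhead may be one factor in the slowdown, though the measurements in this post don't confirm it.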

Now, for something really curious: I disabled the surface cache drawing functions entirely to see what the maximum speedup could be, and got 30.8 fps. That's just a 2.7% speed gain over the 30 fps baseline, which shows that the lighting surface caches are already very fast.
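As a rough check, the two frame rates can be turned into frame times to estimate how much of each frame the surface caches take, and what a perfect 8-way split could give at best. This assumes the 30 and 30.8 fps figures are exact and that nothing else changes when the caches are disabled.

```latex
\[
t_{\text{frame}} = \frac{1000}{30} \approx 33.33\ \text{ms}, \qquad
t_{\text{no cache}} = \frac{1000}{30.8} \approx 32.47\ \text{ms}
\]
\[
t_{\text{cache}} = t_{\text{frame}} - t_{\text{no cache}} \approx 0.87\ \text{ms}
\quad (\approx 2.6\%\ \text{of the frame})
\]
\[
\text{ideal 8-thread case: } \frac{1000}{32.47 + 0.87/8} \approx 30.7\ \text{fps}
\]
```

So even with perfect scaling and no threading overhead, the gain would be under 1 fps.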

While there's not much need for multicore parallel processing in the lighting surface cache code, it still puzzles me why the multicore approach made it slower. The OpenMP multi-threaded parallel processing library is easy to implement, but it looks like getting good performance out of it will be very tricky. There's a lot to read about it, and I'll have to keep running different experiments.

Since multicore rendering won't be as straightforward to implement as expected, I'll see which other aspects of the engine should be prioritized for now, to keep making actual progress. Multicore rendering will continue to be worked on on the side.

Comments

Pritchard

The most common reason for multithreading slowdowns, at least in my experience writing in higher-level languages, is memory. It's very easy to introduce issues with threads starving for data and/or cache misses.