D/A5 port: Windows binaries to test

Started by Simon, August 24, 2015, 09:42:39 PM

ccexplore

Here's data from an older, crappier Windows laptop (4-5 years old? maybe even older, not sure) that I don't use much except for testing on slower hardware. ;P  (Note: it's probably not the oldest I have access to, stay tuned. ;P)

At least the fullscreen mouse problem didn't happen on this computer, no surprise since it's using default system DPI settings unlike the other one.  The inability to regain mouse capture in windowed mode after focus is lost does repro though.

One other thing:  potentially consider avoiding mode 5 altogether.  During the benchmark in both fullscreen and windowed modes, Lix consistently crashed the graphics driver on this computer whenever the benchmark gets to mode 5.  It seems Windows has gotten better over the years and instead of Bluescreen-and-reboot, in this case display just flashes black a few times followed by a "your display driver has crash and recover" error message, and then back to normal.  If I'm lucky sometimes Lix even manages to continue running after all that, though most of the time the program dies with it.  Ah, the joys of dealing with graphics-driver-specific bugs.  (To be fair, it may also be possible that Lix itself is doing something wrong that it shouldn't be doing in the first place; that said, the driver is supposed to be resilient and not crash.)

Simon

ccx: Thanks for the extensive feedback and log/profiling files.

Mouse not captured on reentry: I can reproduce this here. It's a general bug, and it seems to be solvable with the correct Allegro calls. Issue tracking for the D port is on GitHub, I've added it.

Mouse doesn't leave small rectangle: Will look into this.

Ichotolot's machine is a few years old, but was top-class when he bought it. He hits 60 FPS on the 3500 gadget map too, and I have the log/profile results.

I'll make a table from the various results at some time.

-- Simon

geoo

Quote from: ccexplore on August 26, 2015, 06:33:11 AM
One other thing:  potentially consider avoiding mode 5 altogether.  During the benchmark in both fullscreen and windowed modes, Lix consistently crashed the graphics driver on this computer whenever the benchmark gets to mode 5.  It seems Windows has gotten better over the years and instead of Bluescreen-and-reboot, in this case display just flashes black a few times followed by a "your display driver has crash and recover" error message, and then back to normal.  If I'm lucky sometimes Lix even manages to continue running after all that, though most of the time the program dies with it.  Ah, the joys of dealing with graphics-driver-specific bugs.  (To be fair, it may also be possible that Lix itself is doing something wrong that it shouldn't be doing in the first place; that said, the driver is supposed to be resilient and not crash.)

Mode 5 is actually the exact same as mode 4, just applied to a bigger bitmap. It uses ./images/matt/oriental/bonsaitree.png (227x207) instead of ./images/matt/earth/07b.png (16x16). I recall using the even huger dragon instead of the bonsai tree at some point. I encountered a crash but I didn't output the modes yet back then so I didn't find what caused it. I wonder if it's due to the size, or due to the weird dimensions. Maybe you can try replacing the bonsai tree with a 256x256 bitmap and see if you encounter the same crash, or change the size of the small bitmap to something weird. Mode 4 and 5 really just blit bitmaps onto the bitmap of the level map (al_draw_bitmap), while the other modes use textured triangles (al_draw_prim). Or at least that's what I think they do, I don't know what exactly those functions do internally.

ccexplore

Quote from: geoo on August 26, 2015, 11:03:18 AM
I wonder if it's due to the size, or due to the weird dimensions. Maybe you can try replacing the bonsai tree with a 256x256 bitmap and see if you encounter the same crash, or change the size of the small bitmap to something weird.

It seems to be mostly due to the size.  After testing it out with various different sizes of bonsai (note: the benchmark desperately needs the ability to skip ahead to the next thing via keypress or similar, for testing out this sort of thing), it seems that the cutoff might be somewhere around 192x192.  Sizes much smaller are fine (i.e. no crashes) even with odd non-power-of-2/non-multiple-of-8 sizes, like 127x127 and what-not.  Around 192x192 is where the crash seems to start happening, with weird sizes seemingly more likely to trigger the crash than non-weird ones.  At that boundary point the crash may not always repro with 100% consistency but still quite often.  Once you go sufficiently beyond 192 (like maybe 200x200), the crash seems to be triggered all the time regardless of weirdness of the size.

For now I think I'm more inclined to chalk the fault up to the graphics driver, though it also suggests there may be a need for a user-configurable parameter to place a size limit on bitmaps (which Lix would handle by essentially splitting one blit into multiple blits that conform to the limit), as a potential mitigation against faulty drivers like what we have here.
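To make the "split one blit into multiple blits" idea concrete, here is a minimal sketch in Python (not Lix's D code, and not an Allegro API). The function name `split_blit` and the limit of 128 are made up for illustration; each returned rectangle would correspond to one smaller blit of a sub-region of the source bitmap.

```python
def split_blit(width, height, max_side):
    """Return (x, y, w, h) sub-rectangles covering a width x height bitmap.

    Hypothetical helper: every rectangle is at most max_side pixels wide
    and tall, so each one could be blitted separately under the limit.
    """
    if max_side <= 0:
        raise ValueError("max_side must be positive")
    rects = []
    for y in range(0, height, max_side):
        for x in range(0, width, max_side):
            # Pieces on the right and bottom edges may be smaller.
            w = min(max_side, width - x)
            h = min(max_side, height - y)
            rects.append((x, y, w, h))
    return rects


# The 227x207 bonsai tree with an assumed user limit of 128 pixels.
pieces = split_blit(227, 207, 128)
for piece in pieces:
    print(piece)
```

With a limit of 128, the bonsai tree would be drawn as four pieces, all well under the roughly 192x192 size where the crashes start on this machine.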

Quote from: geoo on August 26, 2015, 11:03:18 AM
Mode 4 and 5 really just blit bitmaps onto the bitmap of the level map (al_draw_bitmap), while the other modes use textured triangles (al_draw_prim). Or at least that's what I think they do, I don't know what exactly those functions do internally.

Hmm, are you sure you have the mode numbers correct in the statements above?  After reducing the bonsai to 127x127 so mode 5 works, I'm seeing significant performance differences between 4 and 5, with 5 being extremely slow (probably 1 frame per second if not worse), while 4 is more acceptable (at least it doesn't look like framestepping like mode 5 ;P).  On the other hand, mode 3 is almost as slow as mode 5 though still better (looks something like maybe 2 frames per second versus mode 5's 1 per second).

It does make sense to me that textured polygons are more efficient, after all they are the workhorses of 3D graphics which most graphics cards have to support in this day and age.  That being said, I have to suspect you may still not be doing certain things the right way, if blitting is getting you framerates that look like framestepping.  Perhaps time to browse some sort of forum on A5 for technical expertise?

geoo

Quote: Hmm, are you sure you have the mode numbers correct in the statements above?
Yes, they are correct. You can check perform_geoo_benchmarks() in https://github.com/SimonN/LixD/blob/master/src/basics/demo.d for details (don't mind Simon's weird variable names). The modes are as outlined:

  1. Write all triangle vertices in an array, then use a single al_draw_prim call. 16x16 bitmap, 20000 triangles.
  2. Write all triangle vertices in an array, then use a single al_draw_prim call. 16x16 bitmap, 80000 triangles.
  3. Write all triangle vertices in an array, then use a single al_draw_prim call. 227x207 bitmap, 20000 triangles.
  4. Hold target bitmap. Call al_draw_bitmap for each bitmap. Unhold target bitmap. 16x16 bitmap, 10000 calls.
  5. Hold target bitmap. Call al_draw_bitmap for each bitmap. Unhold target bitmap. 227x207 bitmap, 10000 calls.
  6. Write the vertices for each pair of triangles forming a rectangle into an array, then call al_draw_prim for each. 16x16 bitmap, 20000 triangles.
What all modes do is draw some stuff in the top left corner. All the other stuff floating around is just fluff Simon was testing, and tbh it shouldn't actually be drawn during the tests.
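For readers who want to picture the difference between the first and the last mode in the list above, here is a rough Python sketch of the vertex bookkeeping only. It is not the code in demo.d: the `(x, y, u, v)` tuple is a simplified stand-in for a real vertex struct, and `quad_to_triangles` and `build_batch` are hypothetical names.

```python
def quad_to_triangles(x, y, w, h):
    """Return six (x, y, u, v) vertices: two triangles covering one rectangle."""
    top_left = (x, y, 0, 0)
    top_right = (x + w, y, w, 0)
    bottom_left = (x, y + h, 0, h)
    bottom_right = (x + w, y + h, w, h)
    # The two triangles share the top_right/bottom_left diagonal.
    return [top_left, top_right, bottom_left,
            bottom_left, top_right, bottom_right]


def build_batch(positions, w, h):
    """Collect the vertices of every rectangle into one flat list."""
    vertices = []
    for (x, y) in positions:
        vertices.extend(quad_to_triangles(x, y, w, h))
    return vertices


# 10000 rectangles of 16x16 give 20000 triangles, i.e. 60000 vertices.
positions = [(i % 100 * 16, i // 100 * 16) for i in range(10000)]
batch = build_batch(positions, 16, 16)
print(len(batch), "vertices,", len(batch) // 3, "triangles")
```

In the batched modes the whole list would go to the drawing call once; in the per-rectangle mode each group of six vertices would be submitted with its own call, so the same 20000 triangles cost 10000 calls.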

We checked A5 forums for similar questions, and the results seem reasonably consistent with benchmarks other people made. If the Allegro layer is too slow, the alternative is doing everything directly via OpenGL. E.g. OpenGL natively supports quads instead of triangles, which would shrink the required size of a vertex array for modes 1-3, for instance. Though afaik Allegro doesn't always use OpenGL in the backend, I think it depends on the graphics card what it uses.

ccexplore

Interesting, so it sounds like the slowness seen in 3 and 5 is due to the much larger bitmap size.  You would have 183.55 times more bitmap data to deal with even when keeping everything else equal.  Perhaps it would be interesting to have a mode that uses a 16x16 bitmap but has 3671000 triangles or 1835500 calls (i.e. scaled by that 183.55 factor) to compare against.
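The 183.55 figure is just the ratio of pixel counts between the two benchmark bitmaps. A quick Python check of the arithmetic (variable names are only for illustration):

```python
small_pixels = 16 * 16      # 07b.png
large_pixels = 227 * 207    # bonsaitree.png

factor = large_pixels / small_pixels
rounded = round(factor, 2)

# Scale the triangle and call counts by the rounded factor.
scaled_triangles = round(20000 * rounded)
scaled_calls = round(10000 * rounded)

print(small_pixels, large_pixels, rounded)
print(scaled_triangles, scaled_calls)
```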

Do 3 and 5 actually reflect realistic situations that may be encountered in real lix levels, or is it more like a stress test for benchmarking only?

I don't pretend to be anywhere close to an expert on this stuff, but regarding 1-3 vs 6, I think I've heard/read that you basically want to batch GPU calls as much as possible.  That way the driver can work out the most efficient way to schedule the given work across the GPU's pipeline, taking as much advantage of parallelism as it can.  So I would expect (but haven't looked at the actual data to confirm) 6 to be less efficient than 1.

Regarding OpenGL, my main worry would be that on Windows, even though most graphics drivers support it, I don't think it's really required, at least not to the degree that DirectX support is.  DirectX is a first-class citizen in the Windows world compared to OpenGL.  You can bet that nothing Microsoft writes for Windows would be using OpenGL, and many games written for Windows will also likely use DirectX because it's where Microsoft and graphics card vendors will spend most of their efforts in terms of enhancements, improvements, etc., such that when performance is paramount it will make more sense for them to target DirectX.  Given this business climate, if nothing else I'd expect that for the same Windows graphics driver, the DirectX aspects will likely be much better tested and maintained compared with the OpenGL aspects.  The situation is such that, for example, Chrome apparently uses an OpenGL->DirectX translation layer in Windows to handle WebGL content, rather than directly targeting OpenGL.

So if Allegro 5 can intelligently choose between OpenGL vs DirectX for its Windows implementation, that may be advantageous compared with always sticking to OpenGL.  Linux is of course different since it's just OpenGL in that world.

Triangles vs quads may not be that good a reason to jump to OpenGL anyway, in terms of performance.  At least the given data on this test computer would indicate the texture size can be much more of a bottleneck.  It would not surprise me that at least for some lower-end drivers, their quad support may well be handled by the driver internally by turning the quads back into triangles especially by the time things get to the GPU.

ccexplore

Quote from: ccexplore on August 26, 2015, 06:33:11 AM
(Note: it's probably not the oldest I have access to, stay tuned. ;P)

Continuing the trend, here's data from an older laptop with an even crappier graphics card, in fact likely the crappiest brand in existence* (SiS).  The laptop is running Win7 but has a label that says Windows Vista, so that should give you a good idea how old it is. :XD:  And in case you're wondering, I still have an even older one running XP, though it might not turn on anymore since I haven't powered or even charged it for months, ever since Microsoft stopped providing security patches for it. :XD:

Bad news on this Vista-era laptop: the game only lasts up to the first level of the benchmark.  It crashes (the game this time, not the driver) on the second level for both fullscreen and windowed.  The logs seem to have captured the crash details successfully so I'll leave it for Simon/geoo to sort out.

And yes, the laptop does run C++ Lix just fine, as does Clones, so maybe it's a sort of worst case scenario but still a valid setup for testing this.

*actually I wonder if they're even still in business

Simon

Crash on the Vista-era laptop in level 2: The level has #SIZE_X 3200, and the crash is upon creating a VRAM bitmap, but getting a null pointer back from A5. It's possible that the gfx card's max VRAM bitmap size is 2048x2048. The card in my 9-year-old laptop can't make any VRAM bitmaps larger than 4096x4096 either.

Things like this are the reason why I've opted for the debugging build. It may run slower, but will dutifully log all the assertion failures and other things flying out of the main loop. Thanks for testing this!

The Clones developers are good hustlers when it comes to speeding up things. They use many 128x128 (IIRC) bitmaps instead of one large bitmap for the land. They have to draw on these pieces eventually with terrain-altering skills. I have to experiment how fast it is to blit this land at some time.
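As a rough idea of the bookkeeping such a tiled land would need, here is a Python sketch. It only assumes what is said above (square tiles, 128 pixels per side from memory); it is not how Clones is actually implemented, and the function names and the example level height of 400 are made up.

```python
TILE = 128  # assumed tile side, from the "128x128 (IIRC)" above


def tile_grid(level_w, level_h, tile=TILE):
    """Return (columns, rows) of tiles needed to cover the level."""
    cols = -(-level_w // tile)  # ceiling division
    rows = -(-level_h // tile)
    return cols, rows


def tiles_touched(x, y, w, h, tile=TILE):
    """Return (col, row) of every tile overlapped by a terrain change.

    Assumes x, y >= 0 and w, h > 0, all in level pixel coordinates.
    """
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A level 3200 pixels wide (height 400 is a made-up example value).
grid = tile_grid(3200, 400)
# A 16x16 hole that straddles a tile corner redraws four small tiles.
dirty = tiles_touched(120, 250, 16, 16)
print(grid, dirty)
```

The appeal is that no single bitmap ever comes near a card's maximum VRAM bitmap size, and a terrain-altering skill only has to redraw the few tiles it overlaps.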

A5 should select either OpenGL or DirectX, yes. It probably defaults to DirectX on Windows, but I don't know much about the internals.

-- Simon

ccexplore

Does Clones attempt to keep the entire level's terrain bitmap in VRAM no matter the size of the level, potentially just failing to run the level if it's too large?  Or does it do something like dynamically swap parts of the level in and out of VRAM as needed when it can't all fit?  I seem to recall some fairly large singleplayer levels (though perhaps not 3200x3200 or even 2048x2048).

Clam

My results attached.

OS: Windows 8.1
CPU: Intel Core i3-3110M 2.4GHz
RAM: 8GB
GPU: Intel HD 4000
Age of hardware: 3 years
Screen resolution: 1366 x 768

Fullscreen works fine, which is an improvement from the current C++ Lix. I haven't noticed any other issues :lix-smile:

geoo

I deleted my old post and I'm creating a new one to bump the topic as I tested it on my XP machine now.

I ran it on my primary machine in both Windows and Ubuntu. Same machine, same hardware specs, but the performance on Ubuntu is notably better.
On my XP machine the benchmark does at least run, but only the first level runs at 60 fps. The 3500 gadget level takes almost 10 seconds to load, so the benchmark only has two or three frames there.