Object-oriented design with Simon

Simon · April 11, 2016, 09:03:52 AM

Part 8: Sucky performance of OO -- but is it the lion's share of my problem?

There's a performance downside to the state pattern in part 6.

Naively, with automatic memory management, we allocate a class instance for the lemming, then a class instance for the job. We don't estimate the maximum memory size to fit all jobs, but allocate dynamically per subclass. The lemmings go in a collection of pointer-to-lemming, and point themselves to their jobs. When we copy n Lemmings, we follow 2n pointers, and allocate 2n little chunks.

The tagged union is faster. All lemmings occupy the same amount of space: job-independent fields, plus the maximum length among all jobs. You can allocate naively n lemmings, or put them in an array that allocates once. Similarly, all-fields-in-base-class allows to allocate once.

In D/A5 Lix, I've implemented the naive strategy. In addition, I allocate entire savestates under the garbage collector, then run the garbage collection whenever savestates become superfluous. Savestates contain VRAM bitmaps of the land, in addition to the lixes and jobs. To preserve VRAM that's managed by the savestates, I run the RAM garbage collection multiple times per second during framestepping. But running the GC may hit performance much harder than a slow allocation strategy.

Related to my strategy:
Discussion with ccx about a crash due to exhausted VRAM
github issue: before running the GC often
github issue: performance of running the GC often

I would like faster savestating, aiming at faster framestepping. I should compare the performance gain of (D/A5 Lix now to D/A5 Lix with manual savestate management, without running the GC, but still 2n allocations each savestating) with the performance gain of (2n allocations to a single allocation for lixes).

I want to feel good. I want huge gains in VRAM management; I want negligible prospects in hacking up the job hierarchy. I wouldn't like my OO design responsible for my problem... at least this time.

-- Simon

ccexplore · April 11, 2016, 09:54:07 PM

Even if you run the GC less often, it seems like it may just possibly defer the overhead to happen in one go later, so instead of constant slow framerate, you may just end up with mostly okay framerate punctuated every now and then by a total stutter from the eventual GC. Which may still be preferable; just want to point it out.

That being said, it might be possible to reduce how frequently you run GC in order to deal with VRAM exhaustion, without abandoning automatic memory management completely. Under the assumption that the main problem is because D's GC is not aware of video memory allocations associated with A5, at the points where you are actually making the calls to A5 to allocate and deallocate, you can also track an estimate of how much video memory has been allocated so far. Then whenever you're currently looking at running GC to mitigate VRAM exhaustion, you can check the total allocation estimate first, and only run the GC when it actually exceeds a certain threshold (possibly user-configurable as a troubleshooting mitigation).

Reducing number of class instances and pointers should theoretically help with performance of GC, since it is generally based on walking those instances and pointers, but obviously it's hard to be certain of the effectiveness without some sort of detailed profiling.

One other random thought: you mention faster framestepping as the motivation, but I kind of wonder if maybe a possible solution there is some sort of frame-dropping instead (as long as it's not excessive of course). I'm just not sure how critical it is to have perfectly smooth framestepping, given that in a lot of cases, I imagine you first get it going continuously to get to the ballpark of where you want, and then you start manually hitting the key more slowly to truly step one-by-one. The first part seems like the only part where speed matters, and if it's just about getting to the ballpark, frame-dropping could potentially be an acceptable mitigation?

Simon · April 12, 2016, 06:05:20 PM

Great ideas!

I love the frame-dropping during ballpark-framestepping. The fastest method for backstepping goes back 15 steps per graphics frame, even when the FPS drop below 60. Instead, the feature should go back (15 * 60 / FPS) steps, to cancel performance hits.

Yeah, I haven't profiled enough for the question in part 8. I work on D Lix sporadically, mostly on the editor. When I get loose ideas for physics, they go on the backburner. Manual savestate RAM/VRAM management smell best to me. Still, I'll keep the alternatives in mind.

-- Simon

Simon · May 05, 2017, 12:45:58 PM

Lesser of two globals

Mutable global state is a source of bugs; avoid if possible. This is ~~joseki~~ standard knowledge.

But even immutable globals bite you in the ass. The medium font is a global? Let's export things as images that are independent from screen size, then immediately continue to draw to screen afterwards. Maybe add decoration that silently depends on resolution to our unscaled exported image. Load and destroy the independent medium font and the scaled medium font back and forth?

One answer is to wrap this in a drawing context, load and cache multiple drawing contexts lazily. The first time I draw to screen, a context loads the scaled font. The first time I export as image, another context loads the unscaled font.

Switch between these contexts by mutable global state, because it's rare to switch? <_<

-- Simon

ccexplore · May 06, 2017, 12:08:50 AM

I think the problem statement is a little vague for those of us who don't work closely with the D-lix codebase.

The drawing context idea seems like a good solution based solely on what you described. I'm not sure why you still need to switch state? The idea for introducing drawing context would implicitly mean that your drawing operations should always be explicitly given a particular drawing context to use (potentially you can make null or some other similar convenience value to mean using a "default" context, that can be global and immutable). It sounds like you would want a separate "export image" drawing context that would only be used for the export case. If you want, you can make that second drawing context a second global.

I don't know that it's even really fair to blame this on using an immutable global per se. What that really translates to, IMO, is that you simply didn't anticipate the need for multiple drawing contexts to begin with, specifically that there is a need for scaled and unscaled fonts, due to differences between drawing to screen and exporting as images.

ccexplore · May 06, 2017, 10:05:09 AM

Incidentally, (temporary) switching of globals may not be that bad if you manage it through a RAII helper class whose purpose is basically to override the global for the duration of the lifetime of the helper class (and restoring to the previous value upon finalization). Of course this assumes certain patterns usage such as no multithreading.

Simon · May 10, 2017, 03:47:38 PM

Yes, I like RAII helpers in similar situations. I wrapped target bitmaps or blenders in them, these are Allegro thread-local.

Lix is entirely single-threaded. There is theoretical potential to parallelize: D has all mutable variables thread-local by default, Allegro 5 drawing routines have a global drawing target per thread. But multithreading is daunting and my early dabblings in 2015 to parallelize crashed.

In my application, the drawing context switches so rarely that I'd rather not clutter the GUI elements' interface. But enlarging that interface is a good solution in general.

Fruit basket problem

(Source: Object-Oriented Design Heuristics, by Arthur J. Riel)

You have a class Fruit, with subclasses Apple, Banana, Orange. In particular, Apples know how to core themselves. And there is a container class Basket, this collects Fruit and allows you to insert and iterate over the fruit. We have lots of different fruits in the basket. How do we iterate over the fruits, telling the Apples to core themselves, ignoring other types of fruit?

~~The Smalltalk solution is that you tell every fruit to core itself. Calling core on a Banana would be legal even if Banana doesn't understand this, and the banana would do nothing.~~ (Wrong, see 2 posts further down.) The problem becomes much harder in D, C++, or Java; here, it's illegal to call core on Fruit. I like strict static analysis and want the compiler to yell at me.

Proposed solutions:

Fat interface: Make an empty method core in Fruit. Override this only in Apple with the meaningful behavior. Call core on every fruit in the Basket. Downside: Fruit gets special knowledge of Apple behavior. But maybe this is the least intrusive solution.

Explicit analysis: dynamic_cast<Apple*> every Fruit*. This returns null if we have a banana or orange. Call core on what casts successfully to Apple. Downsides: The fruit basket problem is far too common. Explicit case analysis everywhere is exactly what OOP tries to avoid. Instead of littering the entire codebase with special-casings, we should centralize all this special-casing in classes Fruit and Apple. We'd get OOP's verbosity without reaping its benefits.

Apple-knowing Basket: Consider Fruit and Apple correctly designed, but Basket ill-suited for our purpose. By definition, the Basket returns Fruit references, and calling core on Fruit is not allowed. Instead, the Basket should privately maintain a list of Fruit references, and, separately, a list of Apple references that point to the same objects as the Fruit list points to. Downsides: What is the Basket's interface to add elements? Will it dynamic-cast, or should Basket's caller make sure he calls addFruit and addApple correctly? What happens if Basket's caller gets their Fruit from a Fruit source that doesn't support this pattern?

-- Simon

ccexplore · May 10, 2017, 07:57:10 PM

Hmm, apple-knowing baskets frankly sound like a hack that doesn't really scale all that better than explicit analysis or fat interface. If peeling fruit becomes a thing (let's say for purpose of discussion, some fruits like strawberries can't be peeled, even though I'm guessing technically they probably still have some kind of peel in terms of biology), does the basket now also need to track that separately too? What if I want to carry my fruits in a tray instead, does the tray now have to copy all the special-casing that the basket is doing?

How does Smalltalk handle the case where the method being called would normally return a value? If you try to invoke the method on an object that doesn't actually support the method, what happens? Does the invocation attempt still return some sort of default or special value (think null or undefined or similar) for that case? It seems to me even in Smalltalk, you may not be able to always take advantage of the feature where it silently allows methods to be invoked on unsupported types. Even if the language guarantees some sort of well-defined return value in the unsupported case, its semantics may not always be what you want for downstream calculations/processing utilizing the return value, and in such cases it could be better or even required to check for support-ness upfront instead.

Ultimately, regardless of language support, what ends up happening "under the hood" is that the check will be made one way or another, whether implicitly by the language (eg. Smalltalk would effectively be doing the check for you of whether the method exists for the target object) or explicitly by the programmer. For languages where it must be done explicitly, it just becomes a matter of if/where/how you want to "hide" the check.

In some cases you can also simplify the manual checking by, for example, requiring fruits to implement an ICoreable interface if it supports coring. So you just have to check for ICoreable and not more explicitly for specific fruit types.

I do agree that the situation you are describing [ie. abstracting out the pattern of "if (supported) dothis() else /* do nothing */"] is not uncommon, and it would probably be useful for commonly used languages to support that case better.

Simon · May 10, 2017, 08:34:24 PM

Good reply, I googled some more.

Errata: Smalltalk won't let you call nonexistant methods on objects by default, that's a runtime exception. You could redefine the nonexistant-method handler in your classes, but that would be considered a massive hack.

Smalltalk lets you call existant methods on null references, and does nothing. This is much different and doesn't help with the fruit basket problem.

Will come back for the rest!

-- Simon

Colorful Arty · May 10, 2017, 10:16:43 PM

I like this scenario. My first instinct would be to create a boolean attribute for Fruit named isCoreable and then give the subclasses of fruit that are coreable the core() function. Then, you can check each fruit, check if its attribute isCoreable is true, then call the method core().

I know for sure this is not optimal, which is where you could have the subclasses of Fruit that are coreable implement an interface that lets them be cored. I'm not sure how many languages use interfaces, but Java does, and Python can use abstract classes instead to solve the problem.

Simon · May 10, 2017, 11:09:07 PM

Quote from: ccexplore on May 10, 2017, 07:57:10 PM
apple-knowing baskets frankly sound like a hack
need to track that [peeling] separately too?
carry my fruits in a tray instead, does the tray now have to copy all the special-casing that the basket is doing?

I assume the work grows too quickly here, yes.

I don't dare to propose a Basket<mainClass, subclass1, subclass2, ...> with compile-time code generation, even when that's probably possible.

Maybe this is truly the land for dynamic typing without any static checks at all? But even those raise errors on unsupported methods...

QuoteHow does Smalltalk handle [...] still return some sort of default or special value (think null or undefined or similar) for that case?

Most of this is answered by "I erred and Smalltalk doesn't allow it by default." The Smalltalk nil object responds to all messages that the class would normally respond to, and returns the nil object. This looks OK because everything is an object in Smalltalk.

QuoteFor languages where it must be done explicitly, it just becomes a matter of if/where/how you want to "hide" the check.

Wise interpretation. The three proposed suggestions happen give the three possible answers: The check can go in the element class, in the client code, or in the container class. I didn't see this!

QuoteI do agree that the situation you are describing [ie. abstracting out the pattern of "if (supported) dothis() else /* do nothing */"] is not uncommon, and it would probably be useful for commonly used languages to support that case better.

Nnnnn, thanks.

Quote from: Colorful Artycreate a boolean attribute for Fruit named isCoreable and then give the subclasses of fruit that are coreable the core() function. Then, you can check each fruit, check if its attribute isCoreable is true, then call the method core().

Yep, this looks like a variant of the fat interface. The Fruit class gets some Apple-specific knowledge.

If you define isCoreable in Fruit, and core() only in the subclasses, then you still have to cast in the client code.

Quotesubclasses of Fruit that are coreable implement an interface that lets them be cored. I'm not sure how many languages use interfaces, but Java does, and Python can use abstract classes instead to solve the problem.

QuoteIn some cases you can also simplify the manual checking by, for example, requiring fruits to implement an ICoreable interface if it supports coring. So you just have to check for ICoreable and not more explicitly for specific fruit types.

This is excellent when we don't like to put everything right in base class. The example becomes Banana : Fruit; Orange : Fruit; Apple : Fruit, Coreable.

Defining the Coreable interface is probably wise even if we do it solely for explicit dynamic casting later. Coring seems to be a central concern in our domain, otherwise we wouldn't worry as much about this particular design problem. We can well abstract that into an interface and dynamic-cast to that.

Bonus: Found a webpage with the exact section from Riel's book! I bought the hardcover second-hand last year, it was splendid bedtime reading. At least if you believe that OO is often good.

-- Simon

NaOH · April 01, 2019, 06:01:12 PM

Forgive me for appearing out of nowhere to respond to this old thread, but I was thinking about this issue recently.

Quote from: Simon on April 11, 2016, 09:03:52 AM
The tagged union is faster. All lemmings occupy the same amount of space: job-independent fields, plus the maximum length among all jobs. You can allocate naively n lemmings, or put them in an array that allocates once. Similarly, all-fields-in-base-class allows to allocate once.

Could you perhaps leverage placement-new in order to store all the jobs in a contiguous memory region, which you could then copy all in one sweep with memcpy() in order to make savestates?

Simon · April 02, 2019, 05:38:12 PM

Quote from: NaOH on April 01, 2019, 06:01:12 PM
Could you perhaps leverage placement-new in order to store all the jobs in a contiguous memory region, which you could then copy all in one sweep with memcpy() in order to make savestates?

Yes, I've started to emplace in late 2017. But the array is not purely an array of Job. The memory layout in Lix is like this:

+------------+------------+------------+------------+----
| Lix 0 | Lix 1 | Lix 2 | Lix 3 |
| +-------+ +-------+ +-------+ +-------+ ...
| | Job 0 | | Job 1 | | Job 2 | | Job 3 |
+----+-------+----+-------+----+-------+----+-------+----

Each Lix object contains a Job region, and the Job is placement-new-ed into the Lix object. Job is still a polymorphic hierarchy. The Job region is large enough to take any subclass.

It's not 100 % memory safe because old Job code could run while the Job region has been overwritten with a different Job. But it's a tradeoff for speed, there are runtime assertions, and also D has static if to test at compile time that each Job subclass fits into the reserved region.

To the outside, each Lix including her Job behaves like a value type. You can savestate the array with memcpy.

I've begun to read Effective Modern C++, it's nice to see that C++ is catching up with memory management and compile-time features.

-- Simon

Simon · February 18, 2020, 08:28:40 PM

Split off from here: Rendering speed in Lemmings-like games

Also today: Barns, animals, feeding: Software class design

-- Simon

Simon · April 04, 2020, 06:04:29 PM

(In a future post, I'll write about mixing std::unique_ptr with covariant return types in C++. But it's necessary to introduce covariance first. Thus, here goes.)

Covariance, Part 1: Definitions

Let us have class Animal and subclass Cat. Thus, each Cat is an Animal; wherever an Animal is expected, we can submit a Cat. But not every Animal is a Cat.

Now we have a Source<Cat> that can provide a Cat on demand, e.g., a pregnant female cat, or a crusty old cat-hoarding lady.

Wherever Source<Animal> is expected, we can successfully submit the lady. The lady provides a Cat which is also an Animal, thus she fulfils the requirements to be a Source<Animal>.

We say that Source<T> is covariant in its type argument T. Both of these arrows point into the same direction:

Cat ------------- is a ------------> Animal
Source<Cat> ----- is a ----> Source<Animal>

Now we have a Sink<Animal>, e.g., an animal shelter.

When somebody cannot care anymore for their Cat and asks where the next Sink<Cat> is, we can point them to the animal shelter. The shelter will accept any Animal, in particular, it will accept a Cat.

We say that Sink<T> is contravariant in its type argument T. These arrows point into opposite directions:

Cat ------------- is a ------------> Animal
Sink<Cat> <------ is a ------- Sink<Animal>

Now consider the types List<Animal> and List<Cat> that allow insertion and lookup. Assuming a strong static type system, List<Cat> guarantees that only a Cat may get inserted, and also guarantees that anything inside is a Cat.

Where a List<Animal> is expected, can we submit a List<Cat>? No, because the code might then try to insert Dog into List<Cat> because that code believes it has a List<Animal> where insertion of Dog would be allowed.

Where a List<Cat> is expected, can we submit a List<Animal>? No, because the code might then try to look up the first Cat of the List<Animal> because the code believes it has a List<Cat>, but instead might get a Dog from the List<Animal>.

We say that List<T> is invariant in its type argument T. There is no subtype relationship between List<Animal> and List<Cat>.

Example. A machine that clones an object, i.e., gives back the same object and an equal copy, is covariant: An animal cloner gives back an animal and a copy, a cat cloner also gives back an animal and a copy. Really, a T cloner is a Source<T>.

Example. A Sink<Sink<T>> is covariant in T. The arrow gets inverted twice.

Cat ---------------- is a ---------------> Animal
Sink<Cat> <--------- is a ---------- Sink<Animal>
Sink<Sink<Cat>> ---- is a ---> Sink<Sink<Animal>>

Real-world examples of Sink<Sink<T>> are convoluted and misleading, e.g., burger stores that only cater to zoo keepers who feed only cats? This occurs more naturally in software, e.g., an algorithm might be best described as a higher-order function that takes as argument a function that takes T.

C++ and D allow covariant return types when you override a function that returns a pointer. Thus, in the following C++ code,

class Animal {
public:
virtual Animal* clone() { return new Animal(); }
};

class Cat : public Animal {
public:
virtual Cat* clone() override { return new Cat(); }
};

... it is legal to override Animal* clone() to have the stricter return type Cat* which converts implicitly to Animal*. Virtual dispatch behaves the same as usual, i.e., calling animal->clone() gives you an Animal* that might well come from Cat::clone.

This is good for two reasons.

The type system can check more rigorously what you implement in Cat::clone while class Cat gets compiled, even if the method call later will only ever happen via virtual dispatch from an Animal*.
Nonvirtual callers who definitely have a Cat will be guaranteed to get Cat*, not Animal*, which is useful in a world that deals only with Cat, e.g., during implementation of other Cat methods.

My plan is to write another post on how this fails with std::unique_ptr<Animal> instead of Animal*, and discuss workarounds.

-- Simon