Object-oriented design with Simon

Simon · January 26, 2016, 01:01:00 PM

Let's learn object-oriented design with Simon, part 1.

I feel this would benefit the NL codebase, and even the Lix codebase has a few classes that are still a bit too long. No idea who else will benefit, but maybe it's entertaining.

Two disclaimers

If this were 1995, OO would be hailed as the panacea of coding. In 2015, I will remind everyone that it's a design choice suitable for some parts of your program. It's your job to see whether it fits, or if it adds complexity that nobody needs.
NL might not get additional skills. In that case, don't trade a debugged design for undebugged theoretical soundness. The point of a design to minimize the cost of changing and maintaining it. If you aren't going to change much, don't refactor yet.

Code will be written in a Python-D-C++-Hybrid that I make up on the spot, but I hope stuff will be most clear like this. Indentation will replace begin/end.

The main principle is to replace if-else-if-else-if-else by dynamic dispatch. Whenever you have this:

if (lem.action == walker)
lem.updateLemmingWalker()
else if (lem.action == faller)
lem.updateLemmingFaller()
else if (lem.action == builder)
lem.updateLemmingBuilder()
else if ...

We can do the following instead (a naive approach, with problems still). But it will dispose of the error-prone if-else chains for now.

abstract class Lemming:
void update():
do nothing

class Walker extends Lemming:
override void update():
check ground
move ahead or turn

class Builder extends Lemming:
override void update():
produce brick or not
check if out of bricks

Then we make a container object that collects Lemmings, and put the walkers, fallers, builders, ..., inside.

foreach (lem in container):
lem.update();

If the lemming is a walker, this code will not call Lemming.update(), which would do nothing, but instead call Walker.update(). If the lemming is a builder, the code will call Builder.update(). The code calls the correct method without us knowing what the correct method is. The code calls the correct method without us having made an if-else chain to select the correct method.

There is a problem here, because the lemmings will want to change between classes. When the builder runs out of bricks, it wants to become a walker. But a class object shall never self-destruct and replace itself with another class, because it doesn't know which references to it should be updated.

Idea for that in the next post.

-- Simon

NaOH · January 26, 2016, 10:28:33 PM

Is the idea composition?

I had the poor luck to be introduced to coding with object-oriented design. This was great when I was a naïve youngster, but now that I have explored different languages and paradigms, I have lost a reference point and I don't really appreciate the benefits of OO. Increasingly I'm viewing OO design principles as more limiting than they are helpful in shaping my code.

I realized something was wrong when a friend asked me, "so what is object-oriented design anyway?" and I couldn't give a clear answer. I'm no longer sure why it is that we put functions inside classes sometimes and call them "methods." The only benefit I can really see is polymorphism, but suddenly I feel like I'd be more comfortable with function pointers than dynamic dispatching.

Now, this opinion can't possibly be credible, because

Quote from: SimonIf this were 1995, OO would be hailed as the panacea of coding

so I'd like to watch this thread closely; maybe you can rekindle my affection for OO.

ccexplore · January 27, 2016, 12:40:27 AM

The Wikipedia article on OOP does have a nice criticism section, so you're hardly alone in questioning the benefits of the paradigm. Like Simon said, it's design choice. People working on certain programming languages that have since become widely adopted in the software industry may have been smitten with the paradigm back in the 90s and introduced many of its features into their languages, but of course, it takes time and experience using the paradigm to fully understand the benefits and shortcomings.

Adding features to a language will of course increase its complexity, so in that sense it is understandable that some may prefer a language with less built-in features--you may have to write a little more code to do the same thing, but there's also less of a chance of getting tripped up by subtleties in the language's features.

In any case, in practice you rarely have a choice--except for the times when you're starting with no code written, whenever you have to work off from an existing codebase, the choice of language is usually taken out of your control at that point. Even when you have the luxury to start from scratch, the choice of language may be limited by practical concerns, like how many people you can find that is proficient with a language.

Simon · January 27, 2016, 03:35:28 AM

Awesome, this got replies.

QuoteIs the idea composition? [...] Increasingly I'm viewing OO design principles as more limiting than they are helpful in shaping my code.

The idea is explicitly about when to choose polymorphic inheritance over composition. I found that useful for 2 parts of my own program: GUI, which is the classic application domain of class hierarchies. The second is lix activities, which have a very rigid, small interface -- switch to, perform, hooks on becoming and de-becoming -- and comparatively many implementations of these.

Inheritance incurs a design debt: By the Liskov substitution principle, "B extends A" requires that wherever an A is desired, we can pass a B to that, and the receiver can treat it like an A. We shall not curb functionality. This makes the resulting class hierarchy very rigid. Everything depends on the classes at the beginning, which become very hard to change.

Composition (class B { A a; ... } instead of class B : A { ... }) does not take up this debt. We don't have to expose any A functionality in B's interface. Some newer languages have features to forward any method call to B on to the component A as long as B doesn't implement them. This merges the convenience of not redefining methods from A with the convenience of not taking up any design debt.

The function pointers would work as well, and are idiomatic in C to get dynamic binding. We can make our own virtual table instead of having the compiler generate that from the class code.

Why then use the complex builtin language feature instead of composition or simple pointers? Leverage the compiler, let it check your work statically. This is a widespread mantra in D and C++. In D and the Java world, this is leveraged pretty strongly: On forgetting to implement an abstract method, the compiler will complain. On overriding a method marked final, the compiler will complain. The function pointers would instead crash at runtime to remind of necessary overriding.

In both cases, there's the option to interpret the error/crash as a mistake in the base class design, instead of as a welcome reminder.

When new classes are added, but the root classes remain the same, the if-else-chains would require updating code all over the application. There is no way to check whether one place has been forgotten. With virtual method dispatch, the changes are all bundled together in one specific place, the new class. If we expect our program to grow mainly in this manner -- a reasonable assumption with GUI components and lix skills -- then the class hierarchy might pull its weight.

QuoteIn any case, in practice you rarely have a choice

Right right. Any kind of overarching design becomes encrusted and hard to change. OO is particularly prone to generate unnecessary rigid complexity.

Choice of problem domain, tooling ecosystem, and language, all of them have an influence on what overarching design is expected. I'm weird in that I love reading about professional software development, but only have a hobby project without any holy design to preserve.

OO with Simon, Part 2
Idea for can't-replace-self in container

We store lemmings, and the lemmings change their behavior during their lifetime. This leads to the following idea: Don't replace lemmings. Replace the behavior of lemmings.

class Lemming:
private Job job
void become(Job j):
assert (j.lemming == this)
job := j
void updateCalledByTheGame():
job.perform()

class Job:
protected Lemming lemming
abstract void perform():
do nothing yet
constructor(Lemming l):
this.lemming := l
this.lemming.become(this)

We introduce tight coupling between job and lemming here. (Both know of each other, and can access each other's public interfaces.) I have thought about this for weeks, and can't see how to get rid of it.

I'm also not as opposed to public/protected members as it's traditionally recommended to be. The reason is that in D, we can later rename the member and make it private, and write an accessor property method that's callable exactly with the syntax of the previously-public field. In other languages, you might want to encapuslate behind boilerplate immediately.

class Faller : Job:
constructor(Lemming l):
super(l)
override void perform():
lemming.moveDown
if hit ground:
lemming.become(new Walker(lemming))

class Walker : Job:
constructor(Lemming l):
super(l)
override void perform():
lemming turns or walks ahead
if (no ground):
lemming.become(new Faller(lemming))

My biggest dread is that this is overkill for the problem at hand. I had function pointers in C++ with a jump table. However, the C++ lixes had a fixed memory size. There were 2 ints reserved for each job to be used however they saw fit.

Using dynamically allocated classes allowed arbitrary private fields for each job. This has lead to much more expressive code inside the job methods. This expressiveness is not visible in my sample code above, because I'm focusing the hierarchy, not the exact method contents of any single job. But inside these methods, I've experienced the biggest gain in readability.

-- Simon

Simon · January 29, 2016, 08:25:01 AM

OO and general design with Simon, part 3

Question that came up in IRC: In part 2, we wanted to solve the problem of self-replacing lixes. Of the following two solutions, what benefits does a) provide over b)?

a) Game manages non-replacing lixes, lixes have replacing jobs
b) Game manages replacing lixes

The answer is separtion of concerns. The game, or our physics model in general, is pretty complicated. It must deal with land, gadgets, lixes, have some interface to accept player input, etc. Whatever logic we can keep separate from the game at all, we should keep separate.

Compared with the game, the lixes are a smaller part of the program. The Lix class might end up longer, but it has less impact on the overall program. They're still very complicated, but not as complicated as the entire physics model.

In our solution a), the jobs tell the lixes: We hit terrain, you have to take up another job. In solution b), the lixes tell the game: We hit terrain, you have to replace me. The game is already very busy, we should try not to bother it with such subtleties. However, in our a), writing detailed job-handling inside the lix class seems natural. Lixes are mainly about skills.

This doesn't have to do much with object-orientation. You can have this in any structured language. Classes with access modifiers are merely one way to separate concerns.

-- Simon

Simon · January 29, 2016, 09:27:15 AM

OO with Simon, part 4
The clone() pattern

How to copy lixes around?

Deep copies (create a new object) instead of shallow copies (make new reference to existing object) are handy to implement savestates. We don't want this:

class Lemming:
Job job
copyConstructor(Lemming l):
this.job = l.job

This would make a new reference to the old job, not what we want.

Now, a problem: How to make a deep copy of the old job? new is not polymorphic. The following code does not work.

class Lemming:
Job job
copyConstructor(Lemming l):
job = new Job(l.job) // error: Job is abstract, can't instantiate

abstract class Job:
abstract void perform()
copyConstructor(Job):
...

class Builder : Job:
copyConstructor(Builder b):
super(b)
...

Even if Lemming l is a builder, feeding l to Lemming.copyConstructor will fail to produce a new Builder job. There are languages with polymorphic new, but C++, Java, D are not among them. We're doing hard-ass 1990-style object-orientation here.

The standard solution is to write our own polymorphic method, which is idiomatically called clone().

class Lemming:
Job job
copyConstructor(Lemming l):
job = l.job.clone() // leave it to polymorphism to instantiate the correct Job subclass

abstract class Job:
abstract void perform()
abstract Job clone()
copyConstructor(Job): // if Job has any fields, we still need this
...

class Builder : Job:
copyConstructor(Builder b):
super(b) // initialize the fields of Job
... // initialize the fields of Builder
override Builder clone():
return new Builder(this)

This code produces the desired effect: The game can make copies of lixes, unconcerned of what jobs they have. A lix, upon copy-construction, will instantiate a copy of the correct Job sublcass.

If your langague forbids overriding Job clone() with method signature Builder clone(), then instead override as Job clone() in the subclass. The above code will keep its meaning. D has covariant return types, which means that I may override as Builder clone(), and instantly qualify as a nerd for sneaking that term into a social discussion.

A downside of the clone pattern is verbosity. You have to define both copy-constructor and clone method in each subclass. And you have to define an overridden clone method in each subclass. Even if your language assists you with powerful compile-time code generation, you have to do it in every subclass.

-- Simon

NaOH · January 29, 2016, 03:19:55 PM

In C++, if Job is not stored as a pointer, won't the lemming be deep-copied automatically? The Job copy-constructor will be called implicitly and by default it will copy over each field.

Simon · January 29, 2016, 03:42:52 PM

In C++, if the lemming's job field is Job instead of Job*, there would be no polymorphism possible. Such a field cannot hold a builder job. Even when Builder doesn't introduce new fields, and therefore sizeof(Job) == sizeof(Builder), ~~the memory layout doesn't necessarily match, for each object carries its own vtbl~~. (wrong, will make extra post). If we do this in C++:

Lemming lem;
Job job;
job = Builder(lem);

...the Builder would be implicitly converted to Job, then assigned to the static field.

This is a reason why Delphi and D make a semantic difference between struct/record with value semantics, and classes with reference semantics.

Complete example:

#include <iostream>

class A {
public:
virtual void bark() { std::cout << "Hello from A\n"; }
};

class B : public A {
public:
virtual void bark() { std::cout << "Hello from B\n"; }
};

int main()
{
A val = B();
val.bark();
A* ptr = new B();
ptr->bark();
}

$ ./a.out
Hello from A
Hello from B

-- Simon

Simon · January 29, 2016, 04:20:08 PM

OO with Simon, part 5
Virtual function tables, specific to C++

Here are some declarations in C++.

class A { };
class B { int field; };
class C { int field; void method() { } };
class D { int field; virtual void method() { } };

How long are the objects of type A, B, C, D in bytes each? On the workstation in my office here, g++ -v says: Target: i686-linux-gnu, gcc version 4.6.3. The sizes of the objects are 1, 4, 4, 8.

The interesting difference is that existence of at least one virtual method puts an extra pointer into the class. This is a vtbl pointer. It points into a static vtbl, where the function pointers sit who enable dynamic dispatch. The vtbl is static in the C++-class sense of static, i.e., there is one vtbl per class.

A statically dispatched method is hard-wired to call the same method every time. On dispatching a method call dynamically, the code doesn't jump into a function immediately; instead it dereferences the vtbl pointer and jumps into where the vtbl points. In a class hierarchy, different classes may point to different vtbls. That's C++'s implementation of polymorphism.

Now, fields with manual function pointers instead of using the static vtbl would enable the following:

class Lemming {
void (*jobMethod)(Lemming&);
void perform() { jobMethod(*this); }
};
void performWalking(Lemming&) { ... }
void performBuilding(Lemming&) { ... }

Lemming lem;
lem.jobMethod = &performBuilding;
Lemming another = lem;

This would value-copy the function pointer, as NaOH has suggested. This is a difference to vtbls: In the example right here, each object points to a method directly. With vtbls, we point to a static immutable vtbl, which points to a method.

Yes, this obviates the need for the clone pattern. I haven't thought about upsides or downsides much yet.

-- Simon

namida · January 29, 2016, 11:19:17 PM

I don't quite follow the technical terms or C-type code here; but I believe something like what you're describing is already how NeoLemmix handles different actions.

Code (LemGame.pas 1353~1379 (in TLemmingGame.Create)) Select

// initialized once - this comment NOT present in actual LemGame.pas file, added for clarity here
  LemmingMethods[baNone]       := nil;
  LemmingMethods[baWalking]    := HandleWalking;
  LemmingMethods[baJumping]    := HandleJumping;
  LemmingMethods[baDigging]    := HandleDigging;
  LemmingMethods[baClimbing]   := HandleClimbing;
  LemmingMethods[baDrowning]   := HandleDrowning;
  LemmingMethods[baHoisting]   := HandleHoisting;
  LemmingMethods[baBuilding]   := HandleBuilding;
  LemmingMethods[baBashing]    := HandleBashing;
  LemmingMethods[baMining]     := HandleMining;
  LemmingMethods[baFalling]    := HandleFalling;
  LemmingMethods[baFloating]   := HandleFloating;
  LemmingMethods[baSplatting]  := HandleSplatting;
  LemmingMethods[baExiting]    := HandleExiting;
  LemmingMethods[baVaporizing] := HandleVaporizing;
  LemmingMethods[baBlocking]   := HandleBlocking;
  LemmingMethods[baShrugging]  := HandleShrugging;
  LemmingMethods[baOhnoing]    := HandleOhNoing;
  LemmingMethods[baExploding]  := HandleExploding;
  LemmingMethods[baToWalking]  := HandleWalking; //should never happen anyway
  LemmingMethods[baPlatforming] := HandlePlatforming;
  LemmingMethods[baStacking]   := HandleStacking;
  LemmingMethods[baStoning]    := HandleStoneOhNoing;
  LemmingMethods[baStoneFinish] := HandleStoneFinish;
  LemmingMethods[baSwimming]   := HandleSwimming;
  LemmingMethods[baGliding]    := HandleGliding;
  LemmingMethods[baFixing]     := HandleFixing;

Code (LemGame.pas 6149~6150 (in TLemmingGame.HandleLemming)) Select

  Method := LemmingMethods[L.LemAction];
  Result := Method(L);

I'm not sure if there's a reason why the former actually initializes these values in the code, rather than using constants (it's like this in vanilla Lemmix too, and since it works, I never attempted to change it). Possibly Delphi doesn't allow referencing methods in constants.

Simon · January 30, 2016, 05:48:41 AM

~~You have~~ NL has manually tabled function pointers, and selects one manually every time during job performance. Tabling the pointers first only moves the manual if-else-if-else into a table elsewhere -- because there's only one place in the code where you dip into the table at runtime.

So, ~~you're doing~~ the current implementation is exactly what I consider problematic in part 1.

I will eventually make a longer post with the drawbacks. The main drawback is that we have to put any job-specific fields into the base class, but I want to argue in detail why that's problematic. Another drawback is that you have to maintain several tables of functions, without compiler help about forgotten maintenance.

-- Simon

EricLang · February 02, 2016, 09:58:18 AM

QuoteThe main drawback is that we have to put any job-specific fields into the base class

When implementing the game-mechanics years ago, this was exactly the "philosofical" problem I ran into.
Imagine having different and more mechanics: then new variables / fields must indeed be added to the basic TLemming class.
I choose back then for convenience, simplicity and speed because of the relatively low amount of fields.

I once tried the idea of different classes for each state and came to the conclusion that it was not better. Far from it. "Transforming" a lemming into a new class seems "unnatural" to me.
So what could be a fast and simple and readable solution?

When looking at the original TLemming class, there are actually not so much fields and a lot the fields are used my each state, mostly regarding the animation.
When there would be too much "dedicated state fields" I think I would probably give (allocate) the lemming a "backpack" of specialized fields or maybe use some kind of "delphi cased record" trick.

Simon · February 10, 2016, 12:37:53 AM

Quote"Transforming" a lemming into a new class seems "unnatural" to me.

Yes, I agree that the lemmings should not be replaced. Many of their fields are persistent between jobs. Objects should model the real world: if we don't think of lemmings to be replaced, we shouldn't replace them. My proposed solution is to replace the behavior object instead.

This solution is compelling in hindsight. I have described its advantages in parts 2, 3, 4. But a class hierarchy is costly, even though it's idiomatic in OO design. To justify that cost, I will compare alternatives. It's a difficult design problem, and personal preference plays a role, too.

OO with Simon, part 6
Comparison of all-in-base-class, tagged union, and state pattern

Bloated single class: You put special fields for builder, special fields for faller, ..., all into the Lemming class. They're right next to job-independent fields like position or exploder timer. All fields are mutable, and all jobs can access everything.

Widespread access to mutable fields is a concern. Whenever you have a bug, you must inspect lots of code for how it affects the memory. The compiler cannot help.

Dispatching from the monolithic class via manual function pointers (end of part 5) doesn't solve the problem of widely-visible mutable fields.

Tagged union: I assume this is meant by the Delphi cased record trick. You reserve enough memory to hold the job-specific fields for any single job, and put this as a value field (struct, record) into the Lemming class.

Even though the builder variables now sit in the same memory location as the faller variables, you get to name them per job as bricksLeft and pixelsFallen.

This even more type-unsafe than the bloated base class. You can access the builder's fields even when you're a faller. But because they're in a union, not in separate locations, you overwrite the builder's fields when you write to the faller's fields.

A feeble benefit might be speed while making a savestate. You can memcpy the Lemming to save it entirely, and copy fewer bytes in total than with the bloated base.

State pattern: This is my solution explained in part 2. We make an abstract class for the behavior, subclass it many times, then have the lemming carry a dynamically allocated object for the behavior.

A lemming can have only one state at a time. It's not possible to access wrong fields, and the compiler checks that for you.

Old state objects don't cause long-term psychologic damage to the lemming when he gets a new state object of the same type.

Sometimes, you must access fields of different state classes nonetheless, especially during transistion between states. You can use an explicit dynamic cast -- which angers the object-oriented deities --, or write lots of specialized methods. And herein lie the problems of the state pattern.

About that, I would love to ask someone stronger than me at design for ideas.

NL might not get additional skills. In that case, don't trade a debugged design for undebugged theoretical soundness. The point of a design to minimize the cost of changing and maintaining it. If you don't want to change much, keep the existing thing.

-- Simon

namida · February 10, 2016, 01:35:38 AM

QuoteNL might not get additional skills.

I can rule out adding any further skills in the near future; both the current level format and the current skill panel code and images would need significant changes to support it. When 7 new skills were added in V1.15n, it was coded in such a way that it left room for one further one, simply because parts of it (eg. the bytes in the level file that signify which skills a level has; which is a 16-bit value with an on/off bit for each skill) were often already well-suited towards adding a 16th. It was only a few versions later that one more was added.

Further down the line, it may be possible, but only if someone offers a very compelling reason to include a new skill. We already have two that very rarely see any use (although there have definitely been some levels that use them to excellent results) - the Disarmer and the Cloner. Even out of the remainder, the only ones that see a lot of use are Walker, Glider and Platformer.

Simon · February 28, 2016, 05:13:24 AM

Object-oriented design with Simon, part 7
"Object" must be empty

It is an egregious offense to call a self-defined class Object or Instance.

If you name your own defined classes Object or Instance, you will unnecessarily overload human language. There is always a better name than Object, and almost always a better name for Instance.

The only merit to have a class Object is to have a universal root class. D defines Object as such, which would be no problem if it were an empty class, usable like C++'s void*. But this was before D got powerful templates. Therefore, Object defines opEquals as a hook to override == instead of leaving comparison to templates and duck-typing. I ran into subclassing problems intrinsic to D's Object.

D is still a low-Simon-rant-% language. :>

-- Simon