Lemmings Forums

Lix => Lix Main => Topic started by: lemming_1 on January 19, 2015, 11:03:21 PM

Title: How hard is it to translate Lix into another language?
Post by: lemming_1 on January 19, 2015, 11:03:21 PM
Lix is such an awesome game, I'd love to translate it to Hungarian  :D

Edit by Simon, April 2015: This is now possible. You need version 2015-04-25 or later. Look at the file doc/transl.txt for instructions.
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on January 20, 2015, 03:07:34 AM
I assume Simon will answer this question accurately at some point soon.  That said, even if the answer turns out to be "not easy right now", I'd like to propose getting the groundwork coding ready to make translations easy or at least possible.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on January 21, 2015, 05:38:53 AM
Hi,

thanks for the offer. If you wish to try, translate the English strings in src/other/language.cpp (https://github.com/SimonN/Lix/blob/master/src/other/language.cpp); you can get the file as raw text here (https://raw.githubusercontent.com/SimonN/Lix/master/src/other/language.cpp). The English block starts around line 420 and goes to line 930.

Problems:

Best regards,
Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on January 21, 2015, 01:51:47 PM
Hmm, interesting points on lack of diacritics support (hmm, is it an Allegro issue or something else?) and the need to translate tutorial-level texts. :(

Still, it got me thinking about what it would take to recode that file (and mostly only that file) so that it's possible to at least handle translations via an external text file loaded at runtime.  Maybe I'll take a stab at that myself some day...
Title: Re: How hard is it to translate Lix into another language?
Post by: lemming_1 on January 22, 2015, 11:52:42 AM
Okay, thanks. Looks like I'll have to wait with translating Lix until it supports diacritics (or at least these characters: á é ö ü í ó ú ő ű)
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on January 22, 2015, 09:02:42 PM
Hmm, interesting points on lack of diacritics support (hmm, is it an Allegro issue or something else?)

I looked through Simon's GitHub and Allegro's online manual (https://www.allegro.cc/manual/4/).  At least from the manual, Allegro appears to support UTF-8 (in fact that's its default charset) and full Unicode range in its font handling, so in theory I believe it may be possible to extend Lix to support diacritics and other non-ASCII characters.  The work required will include:
Anyway, it will take a while to get the coding work done for this plus other necessities outlined in earlier posts, even if I ended up getting involved.  I'm afraid you may have to wait for quite some time.
Title: Re: How hard is it to translate Lix into another language?
Post by: lemming_1 on January 25, 2015, 02:30:54 AM
Don't worry. I can wait, I really appreciate all the hard work that's already gone into making Lix as good as it is :thumbsup: Just pm me if you need my help with anything!
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on February 02, 2015, 03:06:49 PM
Quick update on progress:

(the "translations" in this case are obviously programmatically generated and not a real language; it's just a quick way to verify the font is working correctly)
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 02, 2015, 04:14:27 PM
Oh là là! Never thought to see something like this. Very nice!

The strings are still kept hardcoded in language.cpp, I suppose?

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on February 02, 2015, 06:59:19 PM
The strings are still kept hardcoded in language.cpp, I suppose?

Nope, with my change, translations of the strings in language.cpp can now be loaded from data/translate.txt, making it possible to create and use translations without recompiling Lix. (That was my primary goal after all.) More precisely:
  - Presence of data/translate.txt means there's a 3rd language besides English and German, and when present, this third language (whose display name comes from the file's contents) will be shown as an option in the initial language selection menu (the one you see the very first time you run the game) and in the Options menu later where you can change the user's language.  In the user's profile it will be stored as number 3 for $LANGUAGE.
  - The English and German text currently hard-coded in language.cpp will remain in there.  When the user sets the language to the custom language, strings will first be loaded as if it were English (ie. from the hard-coded language.cpp stuff), then it will try loading from data/translate.txt.  This way, any new strings introduced by a new version of Lix that are not accounted for in an old data/translate.txt file would at least result in English being displayed for them instead of nothing.
  - There's also a way to make the program dump the current set of strings for whatever the current language is (including English and German), outputting to a text file in the same format as data/translate.txt.  This feature is meant to help translators start off with a translate.txt file that has all the entries the program knows about.  If it's a new language, they can generate the file from English and then start from there.  If it's to update the translation file due to new strings introduced in a new version of the game, they can generate the file from the custom language (which would load from the outdated data/translate.txt and fall back to English for new strings), and then diff the result against the old translate.txt to see what's new.
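
A rough sketch of the load order and the dump described in the list above. This is not the actual Lix code: the one-entry-per-line "key text" format, the type alias, and the function names are all assumptions made up for the example, since the real format of data/translate.txt isn't spelled out in this post.

```cpp
#include <map>
#include <sstream>
#include <string>

using Dictionary = std::map<std::string, std::string>;

// Hypothetical file format: one "key<space>translated text" entry per line.
// Entries found in the file overwrite whatever the dictionary already holds.
void overlay_translations(Dictionary& dict, const std::string& file_contents)
{
    std::istringstream in(file_contents);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type sep = line.find(' ');
        if (sep == std::string::npos || sep == 0) continue; // skip malformed lines
        dict[line.substr(0, sep)] = line.substr(sep + 1);
    }
}

// Start from the built-in English strings, then let the file override them.
// Keys missing from an outdated file simply keep their English text.
Dictionary load_custom_language(const Dictionary& english,
                                const std::string& file_contents)
{
    Dictionary dict = english;
    overlay_translations(dict, file_contents);
    return dict;
}

// Produce every known entry in the same hypothetical format, so a translator
// can start from a complete file and diff it against an older one.
std::string dump_dictionary(const Dictionary& dict)
{
    std::ostringstream out;
    for (const auto& entry : dict)
        out << entry.first << ' ' << entry.second << '\n';
    return out.str();
}
```

Dumping the result of load_custom_language gives a file that contains the translated entries plus the English text for anything the old file didn't cover, which is what makes the diff workflow possible.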

For reference, I've attached the data/translate.txt I used for the screenshots.  I generated it starting from the dump feature under English, and then ran a script to programmatically convert the ASCII characters into the various counterparts with diacritical marks, to create a fake language of "Èñĝĺìśĥ". ;P
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 03, 2015, 12:09:45 AM
Yes, this is a good step forward. Translators can immediately check how their wording looks in the game.

Deciding on the exact file format or filename is something for later, do what feels natural for now.

If it helps, I'm completely fine with std::map <string, string> or similar containers to implement language.cpp's work. Right now, I use lots of variable names that are statically compiled, which makes the language.cpp code faster than dynamic string lookup. But looking up a handful of strings is not time-critical at all, therefore dynamic lookup is preferred.

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on February 03, 2015, 08:01:11 AM
A little off-topic from what this is originally about, but ever wanna curse, er, say something in a foreign [European] language in multiplayer Lix?  Well...

[note:  to successfully input such characters you'll still need to install/enable a non-English keyboard layout in your OS, which I'm honestly not too familiar with, so I can't guarantee it'll actually handle all the possible key combinations/sequences in real life.  Basically it just relies on the underlying Allegro library's Unicode version of readkey (ureadkey).]
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 05, 2015, 08:31:27 AM
I've tested ccx's code on Linux. It seems to work flawlessly, I can enter various üäß. Many thanks for your work!

We haven't tested multiplayer yet because I'll be busy for a week; maybe some evening in the next few days.

We wish to enable unicode for usernames. Unicode usernames can be stored nicely in the game's global config, data/config.txt. However, the game saves further data to a file called <username>.txt. We want to save everything with ASCII filenames. I propose:
Alternatively, is there some standard for this, popular across all operating systems?

Is 4 digits of hex enough, or should we anticipate support for unicode above 0xFFFF? We could use __<codepoint> for those, with two leading underscores followed by 8 hex digits, big endian. Or we could, from the start, encode everything as _<UTF-8 code> in little endian.

We should find a solution to this before we'll do a release.

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: geoo on February 05, 2015, 08:49:58 AM
I'd propose to use _<UTF-8 code> in hex for everything except for characters in the range 0x20-0x7F. For those you could just use the normal ASCII character (which is equivalent to its UTF-8 counterpart). You can determine the number of characters that you have to read after a '_' by looking at the first byte (i.e. next two characters after it in the filename), see specification (https://tools.ietf.org/html/rfc3629), so there's no ambiguity if someone uses [0-9A-F] in their username. The only caveats are characters like '_' (which is now reserved for your encoding), and forbidden characters like '/', which you'd probably also encode using _<UTF-8 code>. (How do you handle '/' in usernames right now anyway?)
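
For illustration, the "look at the first byte" step could be sketched like this. The function name is made up; the bit patterns are the ones from RFC 3629. It only classifies the lead byte and doesn't reject overlong or out-of-range sequences.

```cpp
#include <cstddef>

// Returns how many bytes a UTF-8 sequence occupies, judged by its first byte,
// or 0 if the byte cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1; // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx (continuation) or invalid
}
```

With two hex digits per byte, a reader of the filename would consume 2 * utf8_sequence_length(first byte) hex digits after the '_'.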


Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 05, 2015, 09:04:55 AM
I'd propose to use _<UTF-8 code> in hex for everything except for characters in the range 0x20-0x7F.

0x20-0x7F includes some chars forbidden in Windows filenames.

Quote
(which is equivalent to its UTF-8 counterpart). You can determine the number of characters that you have to read after a '_' by looking at the first byte (i.e. next two characters after it in the filename), see specification (http://en.wikipedia.org/wiki/UTF-8#Description), so there's no ambiguity if someone uses [0-9A-F] in their username.

Right, that's why I'm considering UTF-8 mangling with little endian, i.e. _123456 means UTF-8 char 0x56 34 12. Converting this number to the unicode codepoint should be straightforward.

Quote
The only caveat are characters like '_' (which is now reserved for your encoding), and forbidden characters like '/', which you'd probably also encode using _<UTF-8 code>. (How do you handle '/' in usernames right now anyway?)

Currently, the game tries to save to the filename including /, and behavior depends on C++'s std::ofstream when it cannot create the file. Maybe it throws an exception crashing Lix, or, more likely, it'll be left in a non-good state and I don't check that anymore.
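
A small sketch of what checking that state could look like, assuming the stream class in question is std::ofstream. By default std::ofstream does not throw when it cannot open a file (unless exceptions() has been enabled on it); it just ends up in a failed state and later writes do nothing. The helper name is made up for the example.

```cpp
#include <fstream>
#include <string>

// Returns true only if the file could be opened and the text was written.
bool try_save(const std::string& filename, const std::string& text)
{
    std::ofstream out(filename.c_str());
    if (!out) return false; // open failed: failed state, but no exception
    out << text;
    out.flush();
    return out.good();
}
```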

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: geoo on February 05, 2015, 09:15:44 AM
0x20-0x7F includes some chars forbidden in Windows filenames.
That's why I put the caveat.

Right, that's why I'm considering UTF-8 mangling with little endian, i.e. _123456 means UTF-8 char 0x56 34 12. Converting this number to the unicode codepoint should be straightforward.
Wait, but isn't the first byte (0x56) determining in the most straight-forward way how many of the next bytes to read? I guess you could also keep reading bytes that start with binary digits 10 until you encounter something else to see which bytes belong to the current character...seems a bit more strange though.
Title: Re: How hard is it to translate Lix into another language?
Post by: namida on February 05, 2015, 10:41:22 AM
Why not just save the filename as a hexadecimal representation (or a hash) of the name altogether? That should be simpler and avoid having to deal with what characters can and can't be in filenames on certain systems.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 05, 2015, 12:51:46 PM
Quote from: geoo
Right, that's why I'm considering UTF-8 mangling with little endian, i.e. _123456 means UTF-8 char 0x56 34 12. Converting this number to the unicode codepoint should be straightforward.
Wait, but isn't the first byte (0x56) determining in the most straight-forward way how many of the next bytes to read?

We probably mean the same thing. Yes, I want to use one underscore with a variable number of hex digits. The first two hex digits, 0x12, are the first little-endian byte. That tells whether there is more data to interpret. In the example 0x56 is the last byte, i.e., the first big-endian byte.
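
To make the two byte orders concrete, here's a sketch that writes one UTF-8 encoded character as an underscore plus hex digits, in either order. The function name and the uppercase hex digits are assumptions for the example; nothing here is the format that was finally chosen.

```cpp
#include <cstddef>
#include <string>

// Hex-encode the bytes of one UTF-8 encoded character after an underscore.
// With reverse_bytes == true the last byte of the sequence is written first
// (the "little endian" order); otherwise the bytes keep their stream order.
std::string mangle_char(const std::string& utf8_char, bool reverse_bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "_";
    const std::size_t n = utf8_char.size();
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t pos = reverse_bytes ? n - 1 - i : i;
        const unsigned char byte = static_cast<unsigned char>(utf8_char[pos]);
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

For ő (U+0151, UTF-8 bytes C5 91), stream order gives _C591 and reversed order gives _91C5. In stream order the first hex pair is the UTF-8 lead byte; in reversed order the first pair is the final byte of the sequence, so a decoder has to find the end of the character some other way.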

(I don't have to interpret filenames btw, I will always start with the unicode string and mangle that into a filename, never the other way around. It's nice still to be able to.)

Quote from: namida
Why not just save the filename as a hexadecimal representation (or a hash) of the name altogether?

That breaks backward compatibility with the existing user files, and it won't be easy to tell the user from looking at the filename.

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: namida on February 05, 2015, 12:57:07 PM
It's a plain text file, right? Couldn't you just store the username on the first line or something?

As for backwards compatibility, just look for filenames that don't meet the naming convention (or don't have a username stored in them) and update them to the new format? Or, you could simply store the username as mentioned above (updating existing files that don't have one to take the username from the filename), without paying any regards to the filename (so the user can call the file whatever they want as long as it's in the right place)?
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 05, 2015, 01:11:07 PM
It's a plain text file, right? Couldn't you store the username on the first line or something?

Possible, but the logic of the program would change. Then, every user file must be inspected first to see whether its username matches the current user from the global config.

Quote
As for backwards compatibility, look for filenames that don't meet the naming convention (or don't have a username stored in them) and update them to the new format?

That bloats the code more than my proposed solution, but will consider it once you show why converting every char is superior.

Quote
Or, you could store the username as mentioned above (updating existing files that don't have one to take the username from the filename), without paying any regards to the filename (so the user can call the file whatever they want as long as it's in the right place)?

The user doesn't care about how that file is named. It's created when he first runs the program, and I want to ask for the bare minimum of input at that time. He must be able to play the game ASAP. The file should be named the exact same way in a fresh installation, so he can overwrite. A deterministic algorithm for mangling usernames into filenames is best.

Why is mangling every char better than mangling things except A-Z a-z 0-9, space, dash, and non-threatening ascii chars?

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: namida on February 05, 2015, 01:14:03 PM
Well, my idea was that the filename doesn't matter; it just stores a username in the file (which can be any character). Thus, you just need to generate a default filename somehow. It doesn't really matter what this is either; it could even be random. Thus, a hex representation or a hash would just be the simplest way to generate one, without having to worry about which characters are used.

You could always store somewhere in the settings which username is active by filename (though I guess this could get complicated in itself).
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 05, 2015, 06:34:21 PM
Currently using A: filename is the key, file doesn't contain the key, only the associated values
You propose B: filename arbitrary, game searches all files in dir and looks for the key inside, then reads values from the same file

A opens fewer files than B. I still don't see a benefit of B over A.

Do you somehow feel better if the key is stored in the file? No matter what method is chosen, we need some hashing/mangling/randomizing to generate a filename.

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on February 05, 2015, 09:10:26 PM
Wow, didn't expect so much interest/passion in allowing Unicode usernames. :o

Two quick things to note:

1) It's nice to try to keep the filename as unmangled from the username as feasible.  On the other hand, it's not that big a deal even if we don't do so, since it is rare for the user to need to manually edit the profile, and it's also rare for most users to have multiple profiles within the same Lix installation (thus the file would be located relatively easily in the rare times the user needs to locate it, even if it winds up with an ugly filename).

2) Compatibility can also mean the ability to look for unmangled username as filename as a fallback (ie. look for the name we would be using in old version) when filename with "mangled username" cannot be found.  In other words, we can recognize the filename we'd use in both old and new versions, while of course we always save using the new version's scheme.  This would effectively rename the user's profile for them.  It means we aren't necessarily constrained by compatibility to have to keep all existing ASCII-only usernames be unmangled, at a cost of slightly more code (but probably hardly much more than what we'd have to write to do name mangling or hashing anyway?).  This point especially applies if we decide to perform mangling on some ASCII-only usernames in order to avoid other potentially problematic characters (which I do think is a nice idea).

As a sidenote, I likely won't start working on this until the weekend at the earliest.  Of course, this being open sourced, someone else can always beat me to the punch and implement something yourself.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on February 06, 2015, 11:08:13 AM
ccx's and my consensus is to preserve ASCII in filenames, and escape where appropriate. Design decisions so far:
Title: Re: How hard is it to translate Lix into another language?
Post by: geoo on February 06, 2015, 11:56:40 AM
Just for the record, according to this (http://support.microsoft.com/kb/177506) here are all the other characters that you don't strictly have to escape, and that you could use as escape characters instead of underscore:

Code:
   ^   Accent circumflex (caret)
   &   Ampersand
   '   Apostrophe (single quotation mark)
   @   At sign
   {   Brace left
   }   Brace right
   [   Bracket opening
   ]   Bracket closing
   ,   Comma
   $   Dollar sign
   =   Equal sign
   !   Exclamation point
   -   Hyphen
   #   Number sign
   (   Parenthesis opening
   )   Parenthesis closing
   %   Percent
   .   Period
   +   Plus
   ~   Tilde
   _   Underscore
Title: Re: How hard is it to translate Lix into another language?
Post by: GigaLem on February 15, 2015, 03:59:42 PM
I was thinking of putting the word Lix into Google Translate. I started with Japanese, but it comes out as LIX.
Title: Re: How hard is it to translate Lix into another language?
Post by: namida on February 15, 2015, 04:04:19 PM
Assuming you didn't want to go with a custom "Japanese-ized" name, the transliteration would be リックス "rikkusu". Of course, that's assuming "Lix" is the singular form (I remember there was a debate a while back about singular and plural forms of "Lix", but I don't remember what the consensus was).

If "Lix" is the singular form, but you wanted to invoke the idea of plurality in the name, you could (though by no means have to) add たち "tachi" at the end.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on April 02, 2015, 12:50:14 AM
I've let this feature slip through the cracks. >_>; It's great contributed code by ccx that's been unreleased for two months now. It deserves use.

ccx, I've pushed a minor change to branch unicode, to not-replace in user-filenames: space, dash, apostrophe.

With that done, what's the big picture: What does our code do, what should yet be implemented before release? Should we do the dictionary for entire new languages? (e.g. rewrite language.cpp to use std::map <string, string>, and parse a user-supplied language file)

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on April 02, 2015, 01:49:27 AM
With that done, what's the big picture: What does our code do, what should yet be implemented before release? Should we do the dictionary for entire new languages? (e.g. rewrite language.cpp to use std::map <string, string>, and parse a user-supplied language file)

It's been a while.  If I recall, the one big thing left that hasn't been fully fleshed out design-wise is the handling of translation of things like tutorial level hints/instructions, and miscellaneous level-related things like _English.txt and so forth.  I mean, technically we can do nothing for those and rely on the user manually replacing the files involved (scattered in various directories and subdirectories) in their own installation with their own set of translated files, though I don't think that counts as a solution (well, at least probably not a good solution).

The way I handled the translation for the game's own text, there is effectively already a std::map and a user-supplied language file, I just had a layer of macros so that the existing source code can continue to just reference global variables and still get the translations (hint: the map is actually <string, pointer to the global variables>).  It is admittedly slightly hacky so in principle, a proper rewrite is probably "better".  On the other hand if you're porting to D anyway at some point, I think we can live with slightly hacky for a while given the eventual demise of the current C++ port.
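
Roughly, the idea of a map from key to pointer-to-global looks like the sketch below. It is not the actual patch: the two global names, the macro name, and the helper functions are invented for the example.

```cpp
#include <map>
#include <string>

// Stand-ins for the global string variables that the existing code keeps
// referencing directly. Both names are made up for this example.
std::string example_ok     = "Okay";
std::string example_cancel = "Cancel";

using Registry = std::map<std::string, std::string*>;

// The macro turns a variable into a (name, address) pair, so the key used in
// the translation file is the variable's own name.
#define REGISTER_STRING(name) { #name, &name }

Registry make_registry()
{
    return { REGISTER_STRING(example_ok), REGISTER_STRING(example_cancel) };
}

// Overwrite the global behind the key, if the key is known.
bool set_translation(const Registry& registry,
                     const std::string& key, const std::string& text)
{
    const Registry::const_iterator it = registry.find(key);
    if (it == registry.end()) return false; // unknown key: ignore the entry
    *it->second = text;
    return true;
}
```

Code that prints example_ok keeps working unchanged and picks up the translated text once set_translation has run for that key.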

One other sidenote:  I discovered to my dismay that the Windows port of A4 has some bugs that have some impact on this effort.  The one in particular is that despite making the A4 APIs Unicode-aware, inexplicably in their internal code for keyboard handling, A4 wound up using an ASCII version of an underlying OS function even though Windows has a Unicode equivalent for that.  The end result is that it looks like Unicode characters above codepoint value of 255 cannot be typed in Lix in Windows even when you are using a keyboard layout in Windows that can generate such characters. :(  [And yes, some European languages do need that, like Hungarian, just not the "common" ones like Spanish/French/German etc.]  I'm tempted to chalk this one up for now to "wait for the D port" hoping A5 doesn't have the same stupid bug in its Windows port.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on April 02, 2015, 02:14:15 AM
Thanks for the quick reply!

i18n for level titles, hints, level dir descriptions: Yes, I didn't want to commit to a solution yet. This is a nagging item on the agenda, and it deserves a good solution. I'd be willing to release the implemented unicode features without a solution to it. We have (user name -> user filename) mangling, and translatable strings in the GUI with exactly one user language.

The diligent user can submit his language file, so I can add its contents to the code.

Hacky solution with #define and std::map <string, *string>: It's very much adequate. Especially as a patch, where easy merging is a nice to have. With the D/A5 port underway, it's fine to keep it like this.

Bug in A4: Yes, I'm okay with deferring it to the D/A5 port, even if that won't be ready for a rather long time.

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on April 08, 2015, 03:14:44 AM
Further questions before releasing:

How should a translator begin his work? To generate translations_dump.txt, the translator must select the custom language. But he cannot select it in the options unless a translate.txt exists, which he hasn't made yet. (File doesn't exist -> options dialogue forces the button back to English)

Once we know the translator's workflow, I'll write the documentation for this.

Now, I filename-escape anything except A-Z a-z 0-9. In particular, dash, space, single-quote are escaped, too. I've written a fallback for loading the user config: Try loading from the escaped filename; if that doesn't exist, try loading from the completely unescaped filename. Always save to the escaped filename. Reasoning: I need a fallback loader anyway for a robust upgrade, and now the escaping rule is super simple.
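
As a sketch of that rule and the fallback order: the escape format used here (an underscore plus two uppercase hex digits per byte) and all names are assumptions for the example, not necessarily what's in the repository. The file-existence check is passed in as a callback so the logic can be tried without touching the disk.

```cpp
#include <functional>
#include <string>

// Keep A-Z a-z 0-9, escape every other byte as '_' plus two hex digits.
std::string escape_for_filename(const std::string& username_utf8)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (const char c : username_utf8) {
        const bool keep = (c >= 'A' && c <= 'Z')
                       || (c >= 'a' && c <= 'z')
                       || (c >= '0' && c <= '9');
        if (keep) {
            out += c;
            continue;
        }
        const unsigned char byte = static_cast<unsigned char>(c);
        out += '_';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}

// Decide which file to load from: escaped name first, then the old unescaped
// name. Returns an empty string if neither exists. Saving would always use
// the escaped name.
std::string choose_file_to_load(
    const std::string& username_utf8,
    const std::function<bool(const std::string&)>& file_exists)
{
    const std::string escaped = escape_for_filename(username_utf8) + ".txt";
    if (file_exists(escaped)) return escaped;
    const std::string unescaped = username_utf8 + ".txt";
    if (file_exists(unescaped)) return unescaped;
    return std::string();
}
```

A user file from an older version gets picked up through the second check and is effectively renamed on the next save.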

-- Simon
Title: Re: How hard is it to translate Lix into another language?
Post by: ccexplore on April 08, 2015, 06:03:00 AM
Further questions before releasing:

How should a translator begin his work? To generate translations_dump.txt, the translator must select the custom language. But he cannot select it in the options unless a translate.txt exists, which he hasn't made yet. (File doesn't exist -> options dialogue forces the button back to English)

Once we know the translator's workflow, I'll write the documentation for this.

Just create an empty translations_dump.txt file outside of Lix (or put anything in it, it'll get overwritten).  Lix decides to write or not write to the file based solely on the file's existence.  The currently selected language determines what gets written to it.  So translations_dump.txt will end up with the English text initially if the user initially selected English as the language, for example.
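
In code, the "existence of the file is the switch" idea could look something like this. It's only a sketch with a made-up function name; the caller is assumed to have already built the text for the currently selected language.

```cpp
#include <fstream>
#include <string>

// Overwrite the dump file with the given contents, but only if the file
// already exists. Returns true if a dump was written.
bool write_dump_if_requested(const std::string& dump_path,
                             const std::string& contents)
{
    if (!std::ifstream(dump_path.c_str())) return false; // no file, no dump
    std::ofstream out(dump_path.c_str(), std::ios::trunc);
    out << contents;
    out.flush();
    return out.good();
}
```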

It was done this way in lieu of a command line switch or similar (which would've normally been a more sensible option) as this method seems most amenable to both scripting as well as for users who cannot deal with anything other than through UI, while avoiding having to create new UI in Lix just for it.  Feel free to supplement with other methods though.

And yes, documentation was of course meant to be part of this effort as well, good reminder.
Title: Re: How hard is it to translate Lix into another language?
Post by: Simon on April 08, 2015, 10:54:28 PM
April 2015: Lix supports other languages now. Read the first post in this topic for instructions.

Touch dumpfile manually, then run Lix: It's a good approach for C++ Lix. In D Lix, I'd like to have even the builtin languages as such files, so people can copy a file and hack.

The first variable for the language name is now main_name_of_language, this is d'accord with the other "main" variable names, and I'd like to use it in D Lix later.

Dump target file: data/translate_dump.txt
Translations file: data/translate.txt

Documentation has been written to doc/transl.txt.

I've pushed to github all of these changes, and the documentation in the mentioned file. I encourage you to fetch them and examine, maybe suggest renamings/movements of files. Otherwise we're good to go. :-)

-- Simon

P.S. I didn't realize before the C++ pattern to conjure local functions with local
class { public: T operator () () { ... } } functionname,
so I've learned something again. :-)
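
For anyone curious, a minimal example of that pattern (the function and its contents are made up for illustration):

```cpp
int sum_of_squares(int a, int b)
{
    // An unnamed local class with operator() gives a callable object that is
    // only visible inside this function.
    class {
    public:
        int operator () (int x) const { return x * x; }
    } square;

    return square(a) + square(b);
}
```

Since C++11, a lambda such as auto square = [](int x) { return x * x; }; does the same job with less typing; the local class works in C++98 too.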