Topic: Notepad doesn't understand Unix line endings

Forestidia86

  • Posts: 723
  • inactive
Notepad doesn't understand Unix line endings
« on: December 29, 2017, 02:46:07 PM »
Edit Simon: This was split off from mobius's thread "how replays work ??".

Quote
Don't use Windows Notepad, it cannot display text files with Unix line endings, which is a 30-year-old bug.

Isn't that overstating it? Of the many files I inspected, I could open all of them, but sometimes the text was just clumped together with no real line breaks. So isn't it more that Notepad can't understand the line breaks than that the files can't be read at all?
$FILENAME should be the first entry, so even without line breaks it should be viable.

Quote
Use a reasonable text editor, of which there are many.

That's not really helpful. You're telling users to go searching for external software to use your product properly. Especially when there are many options, it can be hard to decide which one to take and which is safe to take.
I understand that promoting a particular program would be problematic, but it is nevertheless not user-friendly.
Would it at least be OK to link to a trustworthy article that presents the different alternatives?
« Last Edit: January 23, 2018, 04:57:13 PM by Simon »

Simon

  • Administrator
  • Posts: 3876
Re: how replays work ??
« Reply #1 on: December 29, 2017, 03:55:46 PM »
Quote
it can't understand the line breaks
Quote
$FILENAME should be the first entry, so even without line breaks it should be viable.

Yeah, Notepad misinterprets the line endings. Printable ASCII chars are displayed correctly. I don't remember how Notepad handles Unicode.

As long as Notepad saves the linebreaks back to file, it produces acceptable output. Linebreaks carry meaning in the Lix formats. You can even put Windows linebreaks and Unix linebreaks in the same text file and Lix should still be happy.
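A minimal Python sketch of the two behaviors described here (Lix itself is written in D; this is not its code). `crlf_only_lines` imitates an editor that treats only CRLF as a line break, so LF-only lines run together; `tolerant_lines` accepts LF, CRLF, or a mix of both in the same file. Both function names are made up for illustration. Python's `str.splitlines()` would also accept both endings, but it additionally splits on characters such as form feed and U+2028, so the sketch splits on LF explicitly.

```python
def crlf_only_lines(text: str) -> list[str]:
    """Split like a CRLF-only editor: a bare LF is not treated as a break."""
    return text.split("\r\n")


def tolerant_lines(text: str) -> list[str]:
    """Split on LF and drop a trailing CR, so LF, CRLF and mixed files all work."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a terminator at the very end does not start a new line
    return [line[:-1] if line.endswith("\r") else line for line in lines]


# First line ends in CRLF, the other two in plain LF.
sample = "first line\r\nsecond line\nthird line\n"

print(len(crlf_only_lines(sample)))  # 2: the LF-terminated lines stay glued together
print(tolerant_lines(sample))        # ['first line', 'second line', 'third line']
```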

Still, I don't know why anybody would want to suffer such a broken program. It's supposed to accomplish a single thing, editing text files, yet it cannot understand the simplest format of text files.

Quote
Use a reasonable text editor, of which there are many.
That's not really helpful. You point users to go on search for external software to use your product properly.

I merely remember complaints about how Lix text files contained garbage. They do not. The warning against Notepad is merely a precaution against this common reply.

Text editors are fundamental parts of operating systems, as are utilities to copy, move, delete files. Now, Windows's default text editor is sorely broken. That merely means every Windows user has to fix their problem with their operating system. The bugs in Notepad will hit them with any other culture that uses text files. It's hardly specific to Lix.

Maybe it's already enough to open text files with Wordpad instead of Notepad? I haven't researched this.

-- Simon

Forestidia86

  • Posts: 723
  • inactive
Re: how replays work ??
« Reply #2 on: December 29, 2017, 04:08:05 PM »
Quote
Maybe it's already enough to open text files with Wordpad instead of Notepad? I haven't researched this.

Yeah, Wordpad seems to show it properly, but is almost too fancy for this task.

Quote
Text editors are fundamental parts of operating systems, as are utilities to copy, move, delete files. Now, Windows's default text editor is sorely broken. That merely means every Windows user has to fix their problem with their operating system. The bugs in Notepad will hit them with any other culture that uses text files. It's hardly specific to Lix.

I can't agree with your view on user responsibility, but that's how it is. As a Windows user you are used to having the full package; we are both caught in our own (OS) cultures in this disagreement, I think. But I will have to accept that.
I think you rarely need to edit text to use a program properly (on Windows), so I think it's a bit particular to Lix. (On Windows, copying, deleting and moving are usually done with the mouse via drag & drop, shortcuts, right-click etc., not via the console, at least by standard users, I think.)
But I generally agree that there are and can be other cases where it hits (yeah, if there are plain txt files to read, they're sometimes clumped together for other programs as well), so it's a general problem, OK.
I don't want to derail this thread further, so I won't say any more.
« Last Edit: December 29, 2017, 06:38:18 PM by Forestidia86 »

nin10doadict

  • Posts: 330
  • Guy who constantly misses the obvious
Re: how replays work ??
« Reply #3 on: December 29, 2017, 04:36:44 PM »
Quote
The bugs in Notepad will hit them with any other culture that uses text files.
See, I never knew that Notepad was so buggy. When I've delved into hacking Fire Emblem, I've noticed that the text files related to that all seem to have awful formatting. For all I know, that isn't the case and Notepad is just failing to display the new line characters so everything seems jammed together.
...Upon opening such files in WordPad, the new line characters are displayed and everything is nicely spaced. Huh. I might have to start using WordPad as the default. Learning things! 8-)

Simon

  • Administrator
  • Posts: 3876
Re: how replays work ??
« Reply #4 on: December 29, 2017, 05:00:09 PM »
Common tasks shouldn't require hand-editing in the first place. This guideline has fuelled the different file browsers in Lix and partly the framestepping, to assign skills with perfect precision.

But fixing $FILENAME lines has been my most common hand-editing task. I've even written scripts that fix the innermost dir in $FILENAME lines according to where the level sits, to detect moves across ranks in packs. I'm already considering printing the pointed-to level path inside Lix: Issue #276. Maybe it should be changeable from inside Lix, maybe even with something smarter than a mere text-entry field.

Yeah, the culture on Linux is massively different. I don't even expect Windows users to write scripts, but if problems arise, I'd expect them to at least look at simple, (I hope) self-describing data.

Quote
Wordpad seems to show it properly, but is almost too fancy
Quote
WordPad, the new line characters are displayed and everything is nicely spaced

Yeah, Wordpad isn't ideal for text files, it's designed for formatting rich text. But it gets our job done without installing anything, which is nice.

-- Simon



I've split this thread off from mobius's thread "how replays work ??". Here are half-posts from there that belong in this topic:

Quote from: mobius
Notepad pissed me off because of basically what was said: I often used it to write down a post for a forum beforehand*, but because of the line-break issue, Notepad's word-wrap feature is stupid and I would have to constantly switch it on and off.

Anyway: I now use and highly recommend Notepad++ ; free and very simple but way more features than notepad; and word wrap that works :thumbsup:

*because of things like the other annoying issue where, if something goes wrong, like trying to upload an attachment that is too large, the site pseudo-crashes, fails to post, and you lose your text.

Quote from: Forestidia
Yeah, I have Notepad++, too. It's probably good for advanced tasks but too fancy for me for simple text editing. (I really don't want to play defender for Win Notepad, I just like the plainness; but it seems from the reactions that my issue is a non-issue.)
« Last Edit: January 23, 2018, 04:56:24 PM by Simon »

Forestidia86

  • Posts: 723
  • inactive
Re: how replays work ??
« Reply #5 on: December 29, 2017, 06:25:21 PM »
Quote
Yeah, Wordpad isn't ideal for text files, it's designed for formatting rich text. But it gets our job done without installing anything, which is nice.

Just an interesting thing I've noticed: If you save the file with WordPad, it actually shows properly afterwards in Notepad.

Ryemanni

  • Posts: 328
  • Indeed.
Re: how replays work ??
« Reply #6 on: December 29, 2017, 10:23:53 PM »
Notepad++ all the way! :lix-wink:

ccexplore

  • Posts: 5311
Re: how replays work ??
« Reply #7 on: December 30, 2017, 04:37:19 AM »
Quote
(Only out of curiosity: Why are you using slash instead of backslash? Is it slash in Linux? (Slash seems to work in the cmd prompt on Windows as well, but paths are generally shown with backslashes.))

*nix OSes have always used the forward slash as the path separator. This tradition likely even bled into other kinds of paths, like URLs, which also use forward slashes in a similar way. Windows has traditionally used backslashes instead, but I think that in at least some (if not all) contexts it will accept either. I don't know the history of how all this came to be, but I'm sure you can find out via Google and Wikipedia.

Simon

  • Administrator
  • Posts: 3876
Re: how replays work ??
« Reply #8 on: December 30, 2017, 04:52:10 AM »
Yeah, as ccexplore explains, on Linux and Mac, slash is the only allowed directory separator.

On Windows, backslash may be default, but Windows understands slash perfectly fine. Thus, slash it is, everywhere in Lix. Instant cross-platform compatibility. :lix-cool: This is really nice of Windows, for once.
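To illustrate the slash/backslash point, Python's `pathlib` models both path syntaxes. This is a sketch of Python's view of path syntax, not of the Windows API itself, and the level path is a made-up example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical level path, written with forward slashes.
path = "levels/single/example.txt"

# Both path flavours split the forward-slash form into the same components.
print(PurePosixPath(path).parts)    # ('levels', 'single', 'example.txt')
print(PureWindowsPath(path).parts)  # ('levels', 'single', 'example.txt')

# A backslash is a separator only for the Windows flavour;
# POSIX treats the whole string as one file name.
backslashed = r"levels\single\example.txt"
print(PureWindowsPath(backslashed).parts)  # ('levels', 'single', 'example.txt')
print(PurePosixPath(backslashed).parts)    # ('levels\\single\\example.txt',)

# Converting a Windows-style path to the portable forward-slash form.
print(PureWindowsPath(backslashed).as_posix())  # levels/single/example.txt
```

Writing forward slashes everywhere is the choice that both flavours read identically, which is the cross-platform property described above.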

In light of this argument for slash, it looks sensible to argue that I should output CRLF instead of LF in all text files, because CRLF works with any tool, including Windows Notepad. But, unlike slash vs. backslash, CRLF is a more complex token, and it would be more complex to implement because I'd have to override D standard library behavior. I'll let the library worry about the line terminator, write simple code, and accept both CRLF and LF in Lix files.
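The same trade-off shows up in Python, used here only as an analogy for the D standard library behavior mentioned above: `open()` writes whichever terminator its `newline` argument selects, and by default ("universal newlines") reads LF, CRLF and lone CR all as `"\n"`. A sketch assuming UTF-8 files in a temporary directory:

```python
import os
import tempfile

lines = ["first", "second", "third"]
raw = {}
decoded = {}

with tempfile.TemporaryDirectory() as tmp:
    for name, terminator in (("lf.txt", "\n"), ("crlf.txt", "\r\n")):
        path = os.path.join(tmp, name)
        # newline= chooses what each "\n" written in text mode becomes on disk.
        with open(path, "w", encoding="utf-8", newline=terminator) as f:
            f.write("\n".join(lines) + "\n")
        # The raw bytes show which terminator actually landed in the file.
        with open(path, "rb") as f:
            raw[name] = f.read()
        # newline=None (the default) translates "\n", "\r\n" and "\r" to "\n".
        with open(path, "r", encoding="utf-8") as f:
            decoded[name] = f.read().splitlines()

print(raw["lf.txt"])    # b'first\nsecond\nthird\n'
print(raw["crlf.txt"])  # b'first\r\nsecond\r\nthird\r\n'
print(decoded["lf.txt"] == decoded["crlf.txt"])  # True
```

The reading side stays simple because the library absorbs the difference; only the writing side has to pick one terminator.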

-- Simon

Simon

  • Administrator
  • Posts: 3876
Re: how replays work ??
« Reply #9 on: January 23, 2018, 01:55:13 PM »
I've dabbled in the documentation of the Zig programming language recently, and found this about Zig source file encodings:

Quote
Ascii control characters [are not allowed], except for U+000a (LF): U+0000 - U+0009, U+000b - U+001f, U+007f. (Note that Windows line endings (CRLF) are not allowed, and hard tabs are not allowed.)

Details on allowed characters, e.g., non-ASCII may only appear in string literals, not in identifiers
Rationale on forbidding Windows line endings, mentioning Notepad bugs
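A rough sketch of the quoted Zig rule as a standalone checker, not Zig's real tokenizer: it reports every control character other than LF, which catches both the CR of a CRLF ending and hard tabs. The function name `find_forbidden` is hypothetical.

```python
def find_forbidden(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, codepoint) for each control character except LF.

    Mirrors the quoted rule: U+0000-U+0009, U+000B-U+001F and U+007F are
    rejected. That range includes hard tab (U+0009) and the CR (U+000D)
    of a Windows CRLF line ending.
    """
    problems = []
    # Splitting on LF removes the one allowed control character up front.
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            code = ord(ch)
            if code < 0x20 or code == 0x7F:
                problems.append((line_no, col, f"U+{code:04X}"))
    return problems


print(find_forbidden("const x = 1;\n"))    # []
print(find_forbidden("const x = 1;\r\n"))  # [(1, 13, 'U+000D')]
print(find_forbidden("\tconst y = 2;\n"))  # [(1, 1, 'U+0009')]
```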

From my viewpoint of maintaining files edited by many different people, this is worthy of praise. Tab characters for indentation and different line endings cause endless merge conflicts. But the language now requires every Windows developer to configure their editor first, even for Hello World.

It doesn't flag trailing whitespace as an error, though, even though I consider trailing whitespace a nice indicator of mediocre codebase quality. :lix-evil:

-- Simon
« Last Edit: January 23, 2018, 02:58:02 PM by Simon »

Forestidia86

  • Posts: 723
  • inactive
Re: how replays work ??
« Reply #10 on: January 23, 2018, 03:54:38 PM »
I actually don't understand enough of it to really comment on it.
But from a purely argumentative viewpoint:
How coherent is it to disallow the line endings of a very widespread OS but at the same time complain that Notepad doesn't support the line endings of other OSes? Maybe the same rationale lies behind Notepad not getting fixed to understand Unix endings as behind Zig not getting the feature to understand Windows endings.
Why should Unix line endings be the basis of everything and not the Windows ones?

Simon

  • Administrator
  • Posts: 3876
Re: Notepad doesn't understand Unix line endings
« Reply #11 on: January 23, 2018, 04:09:33 PM »
The idea behind Zig syntax is that code should be clear and verbose where appropriate, and that very common problems have one idiomatic solution.

Apparently, this design principle extends even to source formatting. If code authors are forced to adhere to such a standard, other users of the source code enjoy stronger guarantees, e.g., they can combine snippets from different sources and always end up with consistent linebreaks throughout the new file. I happen to like such strict standards, but that's really where different tastes clash.

Now, from the possible standards of CRLF or LF, they chose LF because it's simpler. It happens that git and sed play slightly better with LF, too, but simplicity was their main argument.

They plan to offer an extra tool, zig fmt, that takes your source and automatically converts CRLF to LF and tabs to spaces, strips unnecessary spaces, etc. Similar tools are accepted in other ecosystems, e.g., the Go language. Until zig fmt is ready, the burden is on the programmer. But that's still in line with the philosophy that programmers should take good care to bring their code into the best-presentable shape.
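The core of such a formatter fits in a few lines. This is a sketch, not zig fmt itself; `normalize` and the 4-column tab width are assumptions for illustration. It turns CRLF into LF, expands tabs to spaces, and strips trailing whitespace from every line.

```python
def normalize(source: str, tab_width: int = 4) -> str:
    """Hypothetical zig-fmt-like cleanup: CRLF -> LF, tabs -> spaces, no trailing whitespace."""
    cleaned = []
    # Splitting on LF alone leaves the CR of each CRLF at the end of its line,
    # where rstrip() removes it together with any other trailing whitespace.
    for line in source.split("\n"):
        cleaned.append(line.expandtabs(tab_width).rstrip())
    return "\n".join(cleaned)


before = "fn main() void {\r\n\treturn;   \r\n}\r\n"
after = normalize(before)
print(repr(after))  # 'fn main() void {\n    return;\n}\n'
```

Because the output has no CR at line ends, no tabs and no trailing whitespace, running it a second time changes nothing, which is the property you want from a tool that runs on every save.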

-- Simon
« Last Edit: January 23, 2018, 04:52:19 PM by Simon »

Forestidia86

  • Posts: 723
  • inactive
Re: Notepad doesn't understand Unix line endings
« Reply #12 on: January 23, 2018, 05:42:57 PM »
This is the last thing I'll say about it, because it upsets me; you can't imagine how much. But as I indicated, maybe that results from my not understanding the matter.

What you say makes no sense to me.
If Zig sacrifices cross-compatibility, it's a good feature, but if Notepad does, it's a bad bug.
You say it's better for other users. But who are these other users you have in mind, only Linux users? Code can be produced without going through a compiler. And code from OSes that don't have the right endings, format etc. has to be converted as well, all the more because of the strict standard. This seems like a mix-up of what ought to be and what is. Maybe if all programmers adhere to this philosophy everything works out fine, but is this realistic?
To me this just sounds like a nightmare for all users whose OS doesn't use Unix endings/standards by default.

Simon

  • Administrator
  • Posts: 3876
Re: Notepad doesn't understand Unix line endings
« Reply #13 on: January 23, 2018, 06:12:12 PM »
Don't focus on the tools.

Focus on the data. Data either conforms to a rule, or it doesn't. There is no "code from an OS", there is "valid Lix level file" or "valid Zig code".

For Lix levels, the levels may have LF or CRLF endings, because I merely call the D standard library and let it worry about file endings.

For Lix's documentation, I explicitly demand in doc/srcfmt.txt that it have CRLF endings, such that Notepad users can still read and change it. For reading documentation, it doesn't matter in the slightest that Notepad might choke on some levels.

In Zig source, CR and tab characters are considered bugs, by definition of the format. This is not sacrificing cross-compatibility, this is the definition of the format.

Now, when people say "text file", you have to ask exactly which format they mean. There are many possible formats out there; sometimes they aren't even Unicode, such as the 8-bit Windows-1252 encoding. I'll wager a guess that the most common text file formats are UTF-8 with LF and UTF-8 with CRLF. If you consider all UTF-8 files with LF malformed, then Notepad has no bug.
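Answering "which format?" for line endings can be partly automated. A minimal sketch (the function name is hypothetical) that classifies a file from its raw bytes; it ignores lone-CR endings as used by classic Mac OS and says nothing about the encoding.

```python
def line_ending_style(data: bytes) -> str:
    """Classify raw file contents as 'LF', 'CRLF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to count bare LFs.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf:
        return "LF"
    return "none"


print(line_ending_style(b"a\nb\n"))      # LF
print(line_ending_style(b"a\r\nb\r\n"))  # CRLF
print(line_ending_style(b"a\r\nb\n"))    # mixed
```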

Confusion only arises as long as we don't agree what valid data is.

Quote
But who are these other users you have in mind, only Linux users?

Every git user, every diff user.

These tools care about differences in files. You want to minimize irrelevant differences in files. LF/CRLF differences, tabs/space differences, and trailing whitespace are the most common source of such irrelevant differences. It's reasonable that many ecosystems eventually settle on a standard.
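To see why diff tools care, here is a small example with Python's `difflib`, standing in for git or diff, whose algorithms differ. Two versions with identical visible text, where only the first line was re-saved with CRLF, still produce a change.

```python
import difflib

old = ["x = 1\n", "y = 2\n"]
new = ["x = 1\r\n", "y = 2\n"]  # same visible text, first line now ends in CRLF

diff = list(difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt"))
print("".join(diff))  # shows -x = 1 / +x = 1 although nothing visible changed

# Comparing with the terminators stripped makes the two versions equal.
same_text = [l.rstrip("\r\n") for l in old] == [l.rstrip("\r\n") for l in new]
print(same_text)  # True
```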

-- Simon

Forestidia86

  • Posts: 723
  • inactive
Re: Notepad doesn't understand Unix line endings
« Reply #14 on: January 23, 2018, 07:13:51 PM »
Quote
In Zig source, CR and tab characters are considered bugs, by definition of the format. This is not sacrificing cross-compatibility, this is the definition of the format.

But exactly that implementation of the definition is the act/decision that sacrifices cross-compatibility (it would be possible otherwise). There may be good reasons for it, but that doesn't change that it is a deliberate decision to declare these things invalid.

Quote
These tools care about differences in files. You want to minimize irrelevant differences in files. LF/CRLF differences, tabs/space differences, and trailing whitespace are the most common source of such irrelevant differences. It's reasonable that many ecosystems eventually settle on a standard.

By turning the differences into something relevant? So line endings matter, and not only the visible strings of characters. With that you bloat the aspects that are relevant for the code to work. So it matters which standard your editor uses. Part of the data that is relevant for the compiler seems obscure to a normal user, because you can't see it easily.
It's not only experienced programmers who can be hit by that. What about people like me who are not programmers but nevertheless have to deal with the source code because they build from source or similar?

Quote
Don't focus on the tools.

Well, you surely have to, if you need to care about line endings and formats.

Simon

  • Administrator
  • Posts: 3876
Re: Notepad doesn't understand Unix line endings
« Reply #15 on: January 23, 2018, 08:07:59 PM »
Quote
But exactly that implementation of the definition is the act/decision that sacrifices cross-compatibility (it would be possible otherwise).

That decision makes it impossible to edit Zig source with Notepad, correct. At least until they finish their formatting tool.

Having stronger standards makes collaboration easier, because the files have less chance to differ in seemingly-irrelevant noise. I assume you agree that there is at least potential benefit here; I judge this benefit very valuable but I can accept if you deem it small.

Then the issue is whether killing Notepad interop is worth the stricter standards. They decided that it is. It's still some work in other editors, even if you can configure their endings.

In Lix, I decided that it's not worth it, and accept either file.

Quote
By making the differences to something relevant? So line endings play a role and not only the visible strings of signs.

I assume this targets version control and diff tools, because it's a reply to the answer to (Users of which software are hit by LF/CRLF differences and can thus profit from a standardized codebase?).

Then yes, whitespace plays a role for version control and diff tools. The files are different on disk, and these tools should then treat files as different. Version control is agnostic of your use case: It doesn't know whether the text makes sense in any language. Different bytes between file A and file B mean different files, and the tools should highlight those changes to me.

Quote
Part of the data that is relevant for the compiler seems to be obscure for a normal user because you can't see it easily.

Correct, relevant for the compiler. At least the compiler will immediately, and noisily, tell you about it.

The hope is that easier merges with version control are worth this, because programmers are less likely to send code with different endings to each other (the compiler would have complained). And that it's easier to write extra tools because the standard is so strict.

Quote
It's not only experienced programmers that can be hit by that. What about people like me that are no programmers but nevertheless have to do with the source code since they build from source or similiar.

Yes, because "text files" come in many encodings and varieties. Learn the nuances, or be prepared to run into subtle problems. Even with an editor that understands LF-terminated files, there can still be issues with encodings, or even within Unicode. Text files aren't simple.

If the Zig compiler accepted CRLF-files, would that prevent you from ever running into trouble with Notepad?

Quote
Well, you surely have to, if you need to care about line endings and formats.

What would be your suggested alternative to learning about different formats? Ask every community to only produce UTF-8 with CRLF, because Notepad can understand that?

Or encourage tools that abstract away from the different formats? ... But Notepad fails this criterion?

-- Simon
« Last Edit: January 23, 2018, 08:14:57 PM by Simon »

ccexplore

  • Posts: 5311
Re: Notepad doesn't understand Unix line endings
« Reply #16 on: January 23, 2018, 09:02:39 PM »
Quote
Then the issue is whether killing Notepad interop is worth the stricter standards. They decided that it is. It's still some work in other editors, even if you can configure their endings.

One key difference is that no professional software engineers will be caught dead using Notepad to edit their source code. :P Lix is a bit different since we don't want to exclude non-programmers for a technical reason that they'd neither understand nor care about.

It should be pretty easy to find free text editors out there for Windows that are more powerful than Notepad (of course, that's about as low as the bar can go) and can handle Unix line endings just fine.

Forestidia86

  • Posts: 723
  • inactive
Re: Notepad doesn't understand Unix line endings
« Reply #17 on: January 23, 2018, 10:18:53 PM »
Just to be clear: I got Notepad++ specifically for Lix some time ago. And yeah, for in-depth programming you are probably generally better off using something more powerful than mere Notepad.

You talk about sending code, but what constitutes code under these conditions? You seem to need more than the plain text; you seem to need the file to be sure? (If I open one of the Lix source files on GitHub, select the text and copy the content into a text file, it doesn't seem to convey the file endings, only the plain text? (Notepad then shows it properly, as opposed to when I copy it from the file itself.))

About strictness: one example of non-strictness in your code that actually hit me:

In one situation you said to me to change _trapMouse = true to false in src/hardware/mouse.d.

But there is an instance in the code that looks like that:
_trapMouse       = true; (l. 151)

The spaces make sense because they keep everything neatly aligned, but if I use find-and-replace, the line seems to be treated as a different string of characters. Was I supposed to change that as well? I actually did, after having overlooked it at first. Are these spaces no problem for strict standards?
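For the find-and-replace problem above, a regular expression that tolerates any amount of alignment whitespace around `=` catches both forms. A sketch in Python; many editors, Notepad++ included, offer a similar regex mode, though their syntax may differ. The sample source and the name `_somethingElse` are made up and are not the real mouse.d.

```python
import re

source = """\
    _trapMouse = true;
    _somethingElse   = false;
    _trapMouse       = true;
"""

# [ \t]* matches any run of alignment spaces or tabs (or none) on the same line.
# The group keeps the original spacing, so alignment survives the replacement.
# A comparison like "_trapMouse == true" does not match: after one "=" the
# pattern expects "true", not a second "=".
pattern = re.compile(r"\b(_trapMouse[ \t]*=[ \t]*)true\b")
fixed, count = pattern.subn(r"\1false", source)

print(count)  # 2
print(fixed)
```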

Simon

  • Administrator
  • Posts: 3876
Re: Notepad doesn't understand Unix line endings
« Reply #18 on: January 23, 2018, 11:14:33 PM »
Quote
what constitutes code under these conditions? You seem to need more than the plain text, you seem to need the file to be sure?

The 100 % exact answer depends on the language. E.g., for D, it's UTF-8 text with CRLF or LF endings (which both behave similarly to a space, I think). The text can be in a file, but need not be. Non-Unicode encodings aren't valid D. ASCII is a subset of UTF-8 and thus valid Unicode.

Quote
If I open one of the Lix source files on github, mark the text and copy the content in a text file, then it doesn't seem to convey the file endings but only the plain text? (Notepad shows it then properly as opposed to copying it from the file itself.)

Interesting phenomenon. From this gif, I assume that Windows's copy buffer is encoding-agnostic. If you somehow get LF text into the copy buffer, it will come out as LF text.

Now, according to the first post of Atom editor issue 8365, most text editors convert LF to CRLF when you copy, such that only CRLF text makes it into the copy buffer.

I assume your web browser behaves the same: When it renders HTML or a text file for you, it silently converts the displayed text to CRLF once you highlight & copy.

Quote
_trapMouse       = true; (l. 151)
The spaces make sense because so everything is neatly in line but if I do find and replace it seems to be seen as a different string of signs.

Excellent catch.

These spaces have no meaning, but they make editing unnecessarily harder. They affect search & replace. When one of the neighboring lines changes, the alignment becomes wrong anyway, and we would have to make larger changes than necessary (re-align all lines in the block), which makes the change harder to understand.

Especially older parts of the source still have these decorative spaces for alignment with neighboring lines. I try not to put them into any new code anymore. I admit that the habit is hard to break.

Yeah, I assumed you had changed this line as well. My bad for not catching this in your attached mouse.d. <_<;; I took a brief look at the file back then: you hadn't changed anything but such lines, but I didn't check whether you had found every single occurrence.



In Lix, I won't stop accepting/outputting CRLF on Windows. I've merely found the Zig rules an unexpected example of how far you can take stylistic rules. And I hope that the discussion was not considered trolling, even though in hindsight, it may easily look like it.

Whenever I create the Windows Lix download, I should probably convert all levels to CRLF.
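A batch conversion like that could look roughly as follows. This is a hypothetical helper, not part of Lix's build; it assumes level files end in `.txt` and works on raw bytes so it never has to guess the encoding. Existing CRLFs are collapsed first, so they are not doubled to CR CR LF.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Turn LF and CRLF endings (even mixed in one file) into CRLF."""
    # Collapse to LF first so existing CRLFs don't become CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_levels(root: Path, pattern: str = "*.txt") -> list[Path]:
    """Rewrite every matching file under root with CRLF endings; return changed files."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        converted = to_crlf(data)
        if converted != data:  # skip files that are already pure CRLF
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Usage would be something like `convert_levels(Path("levels"))` on a copy of the level tree destined for the Windows download; the directory name is an assumption.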

For git, it's possible to configure per repository (I haven't done it so far) how levels should be checked out (e.g., have them as LF in the repo but CRLF in a Windows worktree). Sadly, the repo has half-LF, half-CRLF levels, even though I've paid attention for a while now to check in only LF levels.

-- Simon

Crane

  • Posts: 1081
Re: Notepad doesn't understand Unix line endings
« Reply #19 on: January 24, 2018, 01:36:27 AM »
To give you my input... Notepad is a bare-bones text editor and is honestly not that good when you have to do technical things. The fact that it can't handle Unix-style line endings is one of its main problems, and this shortcoming should not be a reason to modify your programs to produce Windows-style line endings because, as mentioned, this breaks cross-compatibility (although usually the worst you'll see is a symbol representing CR in Unix text editors).

I personally use Notepad++ for my technical work, and this does properly support and preserve whatever line endings are given.