GNU bug report logs - #31185
Why is there no full support for Unicode?


Package: diffutils;

Reported by: Keepun <keepun <at> gmail.com>

Date: Mon, 16 Apr 2018 22:02:01 UTC

Severity: normal


From: Keepun <keepun <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 31185 <at> debbugs.gnu.org
Subject: bug#31185: [bug-diffutils] bug#31185: Why is there no full support for Unicode?
Date: Tue, 17 Apr 2018 23:27:36 +0300
UTF-8 does not require a BOM, but for UTF-16 and UTF-32 a BOM is always
present. UTF-16 and UTF-32 files without a BOM should be identified as
binary.
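
The classification proposed above can be sketched as a small BOM sniffer. This is a hypothetical Python illustration, not diffutils code: the names `sniff_bom` and `classify` are invented here, the sketch takes the message's premise (every UTF-16/UTF-32 file starts with a BOM) as given, and it assumes a NUL byte is a good enough stand-in for "BOM-less UTF-16/UTF-32".

```python
import codecs

# BOM signatures, longest first: the UTF-32LE BOM (FF FE 00 00) begins with
# the UTF-16LE BOM (FF FE), so it has to be tested first. A UTF-16LE file
# whose first character is U+0000 is therefore ambiguous and reads as UTF-32LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def classify(data: bytes) -> str:
    """Hypothetical classifier following the proposal in this message."""
    encoding = sniff_bom(data)
    if encoding is not None:
        return encoding
    # Assumption: a NUL byte is taken as the sign of BOM-less UTF-16/UTF-32,
    # which the proposal would report as binary.
    if b"\x00" in data:
        return "binary"
    # Otherwise assume UTF-8 (or legacy 8-bit text), which needs no BOM.
    return "UTF-8"
```

For example, `classify("hi".encode("utf-16-le"))` yields `"binary"`, while the same text prefixed with `codecs.BOM_UTF16_LE` yields `"UTF-16LE"`.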

But why are there no plans to support UTF-16 and UTF-32? Diff is part of
Git and is used all over the world. It is 2018 now, and Unicode has solved
the problems with encodings.


17.04.2018 10:37, Paul Eggert:
> Keepun wrote:
>> Files in an encoding wider than 8 bits without a BOM at the beginning
>> can be immediately identified as binary.
>
> No, the BOM is not required or recommended in UTF-8, so it would be a 
> mistake to identify GNU/Linux text files as binary merely because they 
> lack a BOM. Typically these files do not have a BOM, and when they do,
> one of the first things many users do is remove the BOM because it can
> cause trouble in practice.
>
> Diffutils does not support UTF-16, where a BOM would make more sense, 
> and there are no plans to add support for UTF-16 (or for UTF-32, for 
> that matter).
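
The diffutils manual describes its text/binary test as a check for NUL bytes in the first part of a file (the amount read is system dependent, typically several thousand bytes). That rule explains why UTF-16 input tends to land on the binary path while UTF-8, with or without a BOM, does not. A minimal Python sketch of the rule, assuming a hypothetical 4096-byte sample size; `looks_binary` is an invented name, not a diffutils function:

```python
# Hypothetical sample size; the real amount diff inspects is system dependent.
SAMPLE_SIZE = 4096


def looks_binary(data: bytes, sample_size: int = SAMPLE_SIZE) -> bool:
    """Treat the data as binary if a NUL byte appears in its first sample_size bytes."""
    return b"\x00" in data[:sample_size]
```

Every ASCII character encoded in UTF-16 or UTF-32 contains a zero byte, so such files are flagged even when they begin with a BOM. The three-byte UTF-8 BOM (EF BB BF) contains no zero byte and does not change the outcome.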



