On 02/29/2016 10:40 AM, Marcello Perathoner wrote:
>> Wrong, at least according to the POSIX definition of text file.  A text
>> file is one with no encoding errors.
>
>
> """
> 3.397 Text File
>
> A file that contains characters organized into zero or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline> character. Although POSIX.1-2008 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html

>
> 3.206 Line
>
> A sequence of zero or more non- <newline> characters plus a terminating
> <newline> character.
>
> 3.87 Character
>
> A sequence of one or more bytes representing a single graphic symbol or
> control code.
>
> Note:
> This term corresponds to the ISO C standard term multi-byte character,
> where a single-byte character is a special case of a multi-byte
> character. Unlike the usage in the ISO C standard, character here has no
> necessary relationship with storage space, and byte is used when storage
> space is discussed.
>
> See the definition of the portable character set in Portable Character
> Set for a further explanation of the graphical representations of
> (abstract) characters, as opposed to character encodings.
>

Encoding errors are not characters, but bytes.  A line cannot contain
encoding errors.  Therefore, a file with encoding errors is not a text file.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
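
For illustration only, the reasoning above (lines are made of characters, so bytes that fail to decode cannot be part of a line) might be sketched as a small Python check. This is a rough sketch, not anything from POSIX or from this thread: the function name is_posix_text is hypothetical, the encoding is assumed to be UTF-8, and 2048 is assumed for {LINE_MAX} (the real value on a given system can be read with `getconf LINE_MAX`).

```python
def is_posix_text(data: bytes, encoding: str = "utf-8", line_max: int = 2048) -> bool:
    """Rough check of the quoted 'Text File' definition (hypothetical helper).

    Assumptions: the locale's encoding is `encoding` (UTF-8 by default) and
    {LINE_MAX} is `line_max` bytes (2048 is only an assumed value).
    """
    # Lines must not contain NUL characters.
    if b"\0" in data:
        return False

    # Every byte must belong to a character: an encoding error is a run of
    # bytes that forms no character, so it cannot be part of any line.
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return False

    # An empty file has zero lines and qualifies. Otherwise every line needs
    # a terminating <newline>, so the data must end with one.
    if not data:
        return True
    if not data.endswith(b"\n"):
        return False

    # Split on <newline> only; the last piece is the empty remainder after
    # the final <newline>, so it is dropped.
    lines = data.split(b"\n")[:-1]

    # No line may exceed {LINE_MAX} bytes, counting the <newline> itself.
    return all(len(line) + 1 <= line_max for line in lines)
```

Under those assumptions, is_posix_text(b"caf\xc3\xa9\n") is true, while is_posix_text(b"caf\xe9\n") is false, because the lone 0xE9 byte is not a valid UTF-8 sequence and so is not a character.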