GNU bug report logs -
#34524
wc: word count incorrect when words separated only by no-break space
Reported by: vampyrebat <at> gmail.com
Date: Mon, 18 Feb 2019 08:13:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
On 09/03/19 05:52, Bruno Haible wrote:
> Hi Pádraig,
>
>>>> In regard to options for enabling various behaviors for wc(1),
>>>> I'm thinking we might keep the strict POSIX isspace() behavior
>>>> with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbspace()
>>>> by default
>
> Since you plan to add a --words=... option in the future (as suggested
> by Paul or me), it would make sense to add this option now, instead
> of testing POSIXLY_CORRECT. If you introduce POSIXLY_CORRECT-dependent
> behaviour now (and need to keep it for backward compatibility), you'll
> have a hard-to-understand interface: What will the following do?
>
> env POSIXLY_CORRECT=1 wc --words=unicode
> wc --words=unicode
Well, until we actually support more contextual
Unicode word separation, the --words
option parameter would be a bit redundant.
Generally no one would need to set POSIXLY_CORRECT
specifically for wc; rather, it would be set globally
on a system or in a script to minimize changes.
In the above example --words=unicode would be
an explicit option to operate as an extension to POSIX,
and so POSIXLY_CORRECT would be ignored there.
cheers,
Pádraig
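The precedence discussed above (strict POSIX white space when POSIXLY_CORRECT is set, no-break spaces also treated as separators by default, and an explicit --words=unicode overriding the environment variable) can be modelled with a minimal C sketch. It is not coreutils' code, and several parts are assumptions. The names `word_mode`, `resolve_mode`, `ascii_space`, `nbspace` and `count_words` are hypothetical. `ascii_space` stands in for isspace() in the C locale. The three no-break-space code points are an assumed set; the thread's real predicate is iswnbspace(). Also, --words=unicode was only proposed in this thread, so here it appears only as a boolean parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical word-separation modes (not a real wc option).  */
enum word_mode { WORDS_POSIX, WORDS_NBSP };

/* Simplification: stands in for isspace() in the C/POSIX locale,
   which classifies exactly these six ASCII characters as white space.  */
static bool
ascii_space (wchar_t wc)
{
  return wc == L' ' || wc == L'\t' || wc == L'\n'
         || wc == L'\v' || wc == L'\f' || wc == L'\r';
}

/* Assumed set of no-break spaces: U+00A0 NO-BREAK SPACE,
   U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK SPACE.  */
static bool
nbspace (wchar_t wc)
{
  return wc == 0x00A0 || wc == 0x2007 || wc == 0x202F;
}

/* An explicit (hypothetical) --words=unicode wins and ignores the
   environment; otherwise a set POSIXLY_CORRECT selects strict POSIX
   separation, and the default also splits on no-break spaces.
   POSIXLY_CORRECT is the value of getenv ("POSIXLY_CORRECT").  */
static enum word_mode
resolve_mode (bool words_unicode_opt, const char *posixly_correct)
{
  if (words_unicode_opt)
    return WORDS_NBSP;
  return posixly_correct != NULL ? WORDS_POSIX : WORDS_NBSP;
}

/* Count maximal runs of non-separator characters in S.  */
static size_t
count_words (const wchar_t *s, enum word_mode mode)
{
  size_t words = 0;
  bool in_word = false;

  for (; *s != L'\0'; s++)
    {
      bool sep = ascii_space (*s) || (mode == WORDS_NBSP && nbspace (*s));
      if (sep)
        in_word = false;
      else if (!in_word)
        {
          in_word = true;   /* first character of a new word */
          words++;
        }
    }
  return words;
}
```

With this model, L"a\u00A0b" counts as one word under WORDS_POSIX and two under WORDS_NBSP, which matches the bug title. Calling `resolve_mode (true, getenv ("POSIXLY_CORRECT"))` corresponds to the explicit-option case, where POSIXLY_CORRECT is ignored as described above.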
This bug report was last modified 6 years and 78 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.