GNU bug report logs - #31526
Range [a-z] does not follow collate order from locale.

Package: sed

Reported by: Bize Ma <binaryzebra <at> gmail.com>

Date: Sat, 19 May 2018 07:39:02 UTC

Severity: important

Tags: notabug

Found in version 4.4-2

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

From: Bize Ma <binaryzebra <at> gmail.com>
To: 31526 <at> debbugs.gnu.org
Subject: bug#31526: Range [a-z] does not follow collate order from locale.
Date: Fri, 18 May 2018 17:58:05 -0400
Package: sed
Version: 4.4-2
Severity: important

Dear Maintainer,

With the locale set to en_US.utf8, the collating order is expected to be
this:

    $ printf '%b' $(printf '\\U%x\\n' {32..127}) | sort | tr -d '\n'
    `^~<=>| _-,;:!?/.'"()[]{}@$*\&#%+0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
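
For comparison, the same kind of listing can be produced for any installed
locale. This is only a minimal sketch, assuming awk, sort and tr are
available. The helper name collation_order is hypothetical and not part of
sed or coreutils. It covers code points 33 to 126, so it skips the space and
DEL characters that the command above includes.

```shell
# Hypothetical helper: print the printable ASCII characters in the
# collation order of the locale named in $1.
collation_order() {
    # Emit code points 33..126, one character per line.
    awk 'BEGIN { for (i = 33; i < 127; i++) printf "%c\n", i }' |
        LC_ALL="$1" sort |   # sort according to the requested locale
        tr -d '\n'           # join the sorted characters into one line
    echo
}

# The C locale sorts by byte value, so uppercase precedes lowercase.
collation_order C

# For another locale, pass its name (it must be listed by `locale -a`):
# collation_order en_US.utf8
```

Running it once with C and once with the locale under test shows side by
side whether letters are grouped by case or interleaved.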

A range [a-z] is therefore expected to match 'aAbBcCdD…', that is, all
lowercase and uppercase letters.
But it does not:

    $ printf '%b' $(printf '\\U%x' {32..127}) | sed 's/[^a-z]//g'
    abcdefghijklmnopqrstuvwxyz

However, the range [a-Z] does match all letters, lowercase and uppercase:

    $ printf '%b' $(printf '\\U%x' {32..127}) | sed 's/[^a-Z]//g'
    ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
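
Independently of how this report is resolved, a script that must not depend
on the locale's treatment of ranges has two portable options. POSIX defines
range expressions only for the POSIX (C) locale and leaves them unspecified
elsewhere, while character classes such as [[:alpha:]] take their membership
from the locale's character classification rather than from collation order.
The following is a sketch under those assumptions. The function names
keep_lower and keep_alpha are made up for illustration.

```shell
# Hypothetical helper: keep only the lowercase ASCII letters.
# LC_ALL=C makes [a-z] a byte-value range, exactly a..z.
keep_lower() {
    LC_ALL=C sed 's/[^a-z]//g'
}

# Hypothetical helper: keep all letters by using a POSIX character class
# instead of a range, so collation order plays no role.
keep_alpha() {
    sed 's/[^[:alpha:]]//g'
}

printf 'aAbB-zZ 09\n' | keep_lower   # prints: abz
printf 'aAbB-zZ 09\n' | keep_alpha   # prints: aAbBzZ
```

Forcing LC_ALL=C affects only the single sed invocation, so the rest of the
script keeps the user's locale.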

If this is how sed is meant to work, then please explain:

    - What is the rationale behind this decision?
    - Where is it documented?
    - Where is it implemented in the code?
    - Why does the manual document otherwise?

This bug report was last modified 7 years and 92 days ago.
