Package: sed

Version: 4.4-2

Severity: important

Dear Maintainer,

With the locale set to en_US.utf8, the expected collating order is:

$ printf '%b' $(printf '\\U%x\\n' {32..127}) | sort | tr -d '\n'

`^~<=>| _-,;:!?/.'"()[]{}@$*\&#%+0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
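
A smaller input makes the locale dependence easier to see. This is a minimal sketch, assuming only a POSIX sort and the C locale; the four sample letters and the variable name c_order are arbitrary choices, not part of the original report:

```shell
# In the C locale, sort compares byte values, so uppercase letters
# (65..90) come before lowercase letters (97..122).
c_order=$(printf '%s\n' a B A b | LC_ALL=C sort | tr -d '\n')
echo "$c_order"    # ABab
```

Running the same pipeline with LC_ALL=en_US.utf8 instead of LC_ALL=C should give an interleaved order like the one shown above, provided that locale is installed.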

Given that order, the range [a-z] is expected to match 'aAbBcCdD…', that is, all lowercase and uppercase letters.

But it does not:

$ printf '%b' $(printf '\\U%x' {32..127}) | sed 's/[^a-z]//g'

abcdefghijklmnopqrstuvwxyz

However, the range [a-Z] does match all letters, both lowercase and uppercase:

$ printf '%b' $(printf '\\U%x' {32..127}) | sed 's/[^a-Z]//g'

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
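
For comparison, here is a sketch of two forms whose result does not depend on the collating order. It assumes a POSIX awk and a sed that supports POSIX character classes; ascii_printable is a hypothetical helper introduced only for this example:

```shell
# Hypothetical helper: emit the printable ASCII characters (codes 33..126)
# on a single line, independent of the caller's locale.
ascii_printable() {
    LC_ALL=C awk 'BEGIN { for (i = 33; i <= 126; i++) printf "%c", i; print "" }'
}

# In the C locale a range follows byte values, so [a-z] is lowercase only.
lower=$(ascii_printable | LC_ALL=C sed 's/[^a-z]//g')

# A character class names the intent directly instead of relying on a range.
alpha=$(ascii_printable | LC_ALL=C sed 's/[^[:alpha:]]//g')

echo "$lower"    # abcdefghijklmnopqrstuvwxyz
echo "$alpha"    # ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```

Neither form answers why [a-z] and [a-Z] behave as shown under en_US.utf8; they only give a baseline to compare against.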

If this is how sed is supposed to work, then please explain:

- What is the rationale behind this decision?

- Where is it documented?

- Where is it implemented in the code?

- Why does the manual say otherwise?