GNU bug report logs - #24425
[PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Reported by: Michal Nazarewicz <mina86 <at> mina86.com>
Date: Mon, 12 Sep 2016 22:48:02 UTC
Severity: normal
Tags: patch
Done: Michal Nazarewicz <mina86 <at> mina86.com>
Bug is archived. No further changes may be made.
> From: Michal Nazarewicz <mina86 <at> mina86.com>
> Cc: 24425 <at> debbugs.gnu.org
> Date: Thu, 15 Sep 2016 16:23:54 +0200
>
> On Tue, Sep 13 2016, Eli Zaretskii wrote:
> > Currently, case changes in unibyte characters and strings are only
> > well defined for pure ASCII text; if the input or the result is not
> > pure ASCII, we produce "undefined behavior".
>
> Would the following (not tested) make sense then:
AFAIU, it would disallow handling unibyte text by setting up case
tables for 8-bit characters in their multibyte representation,
i.e. above #x3FFF00. I'd rather not lose that, although I don't think
I've ever seen that used.
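
For readers unfamiliar with the "multibyte representation" of 8-bit characters mentioned above, here is a rough model of the mapping in Python. It is only a sketch: the constants and function names are modeled on my recollection of the macros in Emacs's character.h (BYTE8_TO_CHAR, CHAR_BYTE8_P, CHAR_TO_BYTE8) and should be treated as assumptions, not as the actual implementation.

```python
# Sketch of how raw 8-bit bytes map into Emacs's character space.
# Assumption: a raw byte B (0x80..0xFF) is represented as the
# character code 0x3FFF00 + B, i.e. the range 0x3FFF80..0x3FFFFF.
BYTE8_OFFSET = 0x3FFF00
MAX_CHAR = 0x3FFFFF


def byte8_to_char(byte: int) -> int:
    """Return the multibyte character code standing for a raw byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF have a raw-byte character")
    return byte + BYTE8_OFFSET


def char_byte8_p(char: int) -> bool:
    """Return True if the character code represents a raw 8-bit byte."""
    return BYTE8_OFFSET + 0x80 <= char <= MAX_CHAR


def char_to_byte8(char: int) -> int:
    """Return the raw byte that a raw-byte character code stands for."""
    if not char_byte8_p(char):
        raise ValueError("not a raw-byte character")
    return char - BYTE8_OFFSET
```

A case table entry for such a character would be keyed on the value returned by byte8_to_char, which is what "setting up case tables for 8-bit characters in their multibyte representation" refers to.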
> > Properly means that upcasing "istanbul" in the above example will
> > produce "İSTANBUL", not "iSTANBUL", and downcasing "IRMA" will produce
> > "ırma".
>
> I thought about that but then another corner case is "istanbul\xff"
> which is a unibyte string with 8-bit bytes.
And what is the problem in that case?
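
To make the two cases under discussion concrete, here is a small Python illustration. It is not Emacs code: turkish_upper and turkish_lower are hypothetical helpers written for this example, and Python's bytes type merely stands in for a unibyte string.

```python
# Hypothetical helpers (not part of Emacs or the Python stdlib):
# Turkish pairs dotted i/İ and dotless ı/I, unlike the default mapping.
def turkish_upper(text: str) -> str:
    """Upcase text using the Turkish dotted/dotless i rules."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Downcase text using the Turkish dotted/dotless i rules."""
    return text.replace("İ", "i").replace("I", "ı").lower()


# Language-aware casing of proper text:
upper_text = turkish_upper("istanbul")   # "İSTANBUL"
lower_text = turkish_lower("IRMA")       # "ırma"

# The corner case: a byte string containing a raw 8-bit byte.
# Python's bytes.upper() only touches ASCII letters, so the 0xFF
# byte passes through unchanged and the result is still bytes.
raw = b"istanbul\xff"
upper_raw = raw.upper()                  # b"ISTANBUL\xff"
```

The byte string cannot be upcased to "İSTANBUL" without leaving the 8-bit domain, since "İ" has no single-byte form there; that tension is the corner case raised in the quoted message.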
> I have no strong feelings either way so I’m happy just leaving it as is
> as well.
That is fine with me.
Was there some real-life use case where you bumped into this? If so,
maybe we should discuss that use case; perhaps the solution, if we
need one, is something other than what we have talked about until now.
This bug report was last modified 8 years and 252 days ago.