GNU bug report logs - #24425
[PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings


Package: emacs

Reported by: Michal Nazarewicz <mina86 <at> mina86.com>

Date: Mon, 12 Sep 2016 22:48:02 UTC

Severity: normal

Tags: patch

Done: Michal Nazarewicz <mina86 <at> mina86.com>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Michal Nazarewicz <mina86 <at> mina86.com>
Cc: 24425 <at> debbugs.gnu.org
Subject: bug#24425: [PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Date: Thu, 15 Sep 2016 21:55:20 +0300

> From: Michal Nazarewicz <mina86 <at> mina86.com>
> Cc: 24425 <at> debbugs.gnu.org
> Date: Thu, 15 Sep 2016 16:23:54 +0200
> 
> On Tue, Sep 13 2016, Eli Zaretskii wrote:
> > Currently, case changes in unibyte characters and strings are only
> > well defined for pure ASCII text; if the input or the result is not
> > pure ASCII, we produce "undefined behavior".
> 
> Would the following (not tested) make sense then:

AFAIU, it would disallow handling unibyte text by setting up case
tables for 8-bit characters in their multibyte representation,
i.e. above #x3FFF00.  I'd rather not lose that, although I don't think
I've ever seen that used.
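
For reference, the relation between a raw 8-bit byte and its multibyte
character code is plain offset arithmetic.  The following is a minimal
Python sketch of that arithmetic, not Emacs code: the helper names are
made up for illustration, and the offset is taken from the #x3FFF00
figure above.

```python
# Illustrative sketch only; these helpers are hypothetical, not an Emacs API.
BYTE8_OFFSET = 0x3FFF00  # raw bytes 0x80..0xFF map to codes 0x3FFF80..0x3FFFFF


def byte8_to_char(byte: int) -> int:
    """Return the multibyte character code for a raw 8-bit byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF have an eight-bit character code")
    return byte + BYTE8_OFFSET


def char_to_byte8(char: int) -> int:
    """Return the raw byte for an eight-bit character code."""
    if not 0x3FFF80 <= char <= 0x3FFFFF:
        raise ValueError("not an eight-bit character code")
    return char - BYTE8_OFFSET
```

Under this sketch, a case-table entry for the raw byte #xE9 would be
keyed on byte8_to_char(0xE9), i.e. #x3FFFE9.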

> > Properly means that upcasing "istanbul" in the above example will
> > produce "İSTANBUL", not "iSTANBUL", and downcasing "IRMA" will produce
> > "ırma".
> 
> I thought about that but then another corner case is "istanbul\xff"
> which is a unibyte string with 8-bit bytes.

And what is the problem in that case?
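
To make the corner case concrete, here is a small Python sketch of one
possible behavior, offered as an assumption rather than as what Emacs
does: characters are held as integer codes, a hypothetical
Turkish-style table handles the dotted and dotless i, and any code
without a case mapping (including the raw byte \xff, written here as
#x3FFFFF) passes through unchanged.

```python
# Sketch under stated assumptions; the table and function are hypothetical.
DOTTED_CAPITAL_I = 0x130  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = 0x131   # LATIN SMALL LETTER DOTLESS I

TURKISH_UP = {ord("i"): DOTTED_CAPITAL_I, DOTLESS_SMALL_I: ord("I")}
MAX_UNICODE = 0x10FFFF


def upcase(codes):
    """Upcase a list of character codes, leaving non-Unicode codes alone."""
    out = []
    for c in codes:
        if c in TURKISH_UP:
            out.append(TURKISH_UP[c])
        elif c <= MAX_UNICODE:
            up = chr(c).upper()
            # Keep the original code when upcasing would expand to several characters.
            out.append(ord(up) if len(up) == 1 else c)
        else:
            # Raw-byte character (above the Unicode range): no case mapping.
            out.append(c)
    return out
```

With this sketch, upcasing the codes of "istanbul" followed by
#x3FFFFF yields the codes of "İSTANBUL" followed by the same
#x3FFFFF.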

> I have no strong feelings either way so I’m happy just leaving it as is
> as well.

That is fine with me.

Was there some real-life use case where you bumped into this?  If so,
maybe we should discuss that use case; perhaps the solution, if we
need one, is something other than what we have discussed so far.

This bug report was last modified 8 years and 307 days ago.


