GNU bug report logs - #24425
[PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Reported by: Michal Nazarewicz <mina86 <at> mina86.com>
Date: Mon, 12 Sep 2016 22:48:02 UTC
Severity: normal
Tags: patch
Done: Michal Nazarewicz <mina86 <at> mina86.com>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> debbugs.gnu.org:
Currently, when operating on unibyte strings and buffers, if casing an
ASCII character results in a non-ASCII Unicode character, the result is
forcibly converted to 8-bit by masking all but the eight least
significant bits.  This has awkward results such as:
(let ((table (make-char-table 'case-table)))
  (set-char-table-parent table (current-case-table))
  (set-case-syntax-pair ?I ?ı table)
  (set-case-syntax-pair ?İ ?i table)
  (with-case-table table
    (concat (upcase "istanabul") " " (downcase "IRMA"))))
=> "0STANABUL 1rma"
Change the code so that ASCII characters whose cased form is a non-ASCII
Unicode character are left unchanged when operating on unibyte data.  In
other words, the aforementioned example will produce:
=> "iSTANABUL Irma"
Arguably this isn’t correct either, but it’s less wrong and there’s not
much we can do when the strings are unibyte.
Note that casify_object had a ‘(c >= 0 && c < 256)’ condition, but since
CHAR_TO_BYTE8 (and thus MAKE_CHAR_UNIBYTE) happily casts Unicode
characters to 8-bit (i.e. c & 0xFF), the condition was always true and
the ‘don’t change it’ fallback never triggered for the case discussed.
* src/casefiddle.c (casify_object, casify_region): When dealing with
unibyte data, don’t attempt to store Unicode characters in the result.
---
src/casefiddle.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
Unless there are objections, I’ll commit it in a few days.
diff --git a/src/casefiddle.c b/src/casefiddle.c
index 2d32f49..247cc6f 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -71,8 +71,8 @@ casify_object (enum case_action flag, Lisp_Object obj)
{
if (! inword)
c = upcase1 (c1);
- if (! multibyte)
- MAKE_CHAR_UNIBYTE (c);
+ if (! multibyte && CHAR_BYTE8_P (c))
+ c = CHAR_TO_BYTE8 (c);
XSETFASTINT (obj, c | flags);
}
return obj;
@@ -93,18 +93,19 @@ casify_object (enum case_action flag, Lisp_Object obj)
c1 = c;
if (inword && flag != CASE_CAPITALIZE_UP)
c = downcase (c);
- else if (!uppercasep (c)
- && (!inword || flag != CASE_CAPITALIZE_UP))
- c = upcase1 (c1);
+ else if (!inword || flag != CASE_CAPITALIZE_UP)
+ c = upcase (c1);
if ((int) flag >= (int) CASE_CAPITALIZE)
inword = (SYNTAX (c) == Sword);
if (c != c1)
{
- MAKE_CHAR_UNIBYTE (c);
- /* If the char can't be converted to a valid byte, just don't
- change it. */
- if (c >= 0 && c < 256)
- SSET (obj, i, c);
+ if (CHAR_BYTE8_P (c))
+ c = CHAR_TO_BYTE8 (c);
+ else if (!ASCII_CHAR_P (c))
+ /* If the char can't be converted to a valid byte, just don't
+ change it. */
+ continue;
+ SSET (obj, i, c);
}
}
return obj;
@@ -250,8 +251,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
if (! multibyte)
{
- MAKE_CHAR_UNIBYTE (c);
- FETCH_BYTE (start_byte) = c;
+ /* If the char can't be converted to a valid byte, just don't
+ change it. */
+ if (ASCII_CHAR_P (c) ||
+ (CHAR_BYTE8_P (c) && ((c = CHAR_TO_BYTE8 (c)), true)))
+ FETCH_BYTE (start_byte) = c;
}
else if (ASCII_CHAR_P (c2) && ASCII_CHAR_P (c))
FETCH_BYTE (start_byte) = c;
--
2.8.0.rc3.226.g39d4020