GNU bug report logs -
#24425
[PATCH] Don’t cast Unicode to 8-bit when casing unibyte strings
Reported by: Michal Nazarewicz <mina86 <at> mina86.com>
Date: Mon, 12 Sep 2016 22:48:02 UTC
Severity: normal
Tags: patch
Done: Michal Nazarewicz <mina86 <at> mina86.com>
Bug is archived. No further changes may be made.
Currently, when operating on unibyte strings and buffers, if casing an
ASCII character results in a non-ASCII Unicode character, the result is
forcibly converted to 8-bit by masking off all but the eight least
significant bits. This has awkward results such as:
    (let ((table (make-char-table 'case-table)))
      (set-char-table-parent table (current-case-table))
      (set-case-syntax-pair ?I ?ı table)
      (set-case-syntax-pair ?İ ?i table)
      (with-case-table table
        (concat (upcase "istanabul") " " (downcase "IRMA"))))
    => "0STANABUL 1rma"
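The stray ‘0’ and ‘1’ follow directly from the masking. As a rough
illustration in plain C (not Emacs code; mask_to_byte is a hypothetical
helper), İ is U+0130 and ı is U+0131, so keeping only the low eight bits
leaves 0x30 and 0x31, which are the ASCII digits ‘0’ and ‘1’:

```c
#include <stdint.h>

/* Hypothetical stand-in for the old behaviour: keep only the eight
   least significant bits of a code point.  */
static unsigned char
mask_to_byte (uint32_t c)
{
  return (unsigned char) (c & 0xFF);
}
```

With this helper, mask_to_byte (0x0130) yields 0x30 (‘0’) and
mask_to_byte (0x0131) yields 0x31 (‘1’), matching the output above.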
Change the code so that ASCII characters whose cased form is a non-ASCII
Unicode character are left unchanged when operating on unibyte data. In
other words, the aforementioned example will produce:
    => "iSTANABUL Irma"
Arguably this isn’t correct either, but it’s less wrong and there’s not
much we can do when the strings are unibyte.
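The intended rule can be sketched as a small standalone C function. This
is only an illustration, not the patched Emacs code: the sketch_* names
are hypothetical, and the two constants are assumptions mirroring how
Emacs represents raw bytes as characters in the range
0x3FFF80..0x3FFFFF.

```c
#include <stdbool.h>

/* Assumed values mirroring Emacs's internal raw-byte characters;
   treat them as illustrative.  */
enum { SKETCH_MAX_5_BYTE_CHAR = 0x3FFF7F, SKETCH_BYTE8_OFFSET = 0x3FFF00 };

static bool sketch_ascii_p (int c) { return 0 <= c && c < 0x80; }
static bool sketch_byte8_p (int c) { return c > SKETCH_MAX_5_BYTE_CHAR; }

/* Return the byte to store in unibyte data: the cased character if it
   fits in a byte, otherwise the original byte unchanged.  */
static unsigned char
sketch_unibyte_case_result (unsigned char orig, int cased)
{
  if (sketch_byte8_p (cased))
    return (unsigned char) (cased - SKETCH_BYTE8_OFFSET);
  if (sketch_ascii_p (cased))
    return (unsigned char) cased;
  return orig;
}
```

Under these assumptions, casing ‘i’ to U+0130 keeps ‘i’, while casing
‘s’ to ‘S’ stores ‘S’ as usual.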
Note that casify_object had a ‘(c >= 0 && c < 256)’ condition, but since
CHAR_TO_BYTE8 (and thus MAKE_CHAR_UNIBYTE) happily casts Unicode
characters to 8-bit (i.e. c & 0xFF), the check always passed and the
“don’t change it” fallback never triggered for the case discussed.
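To see why the old guard was ineffective, here is a minimal sketch in
plain C. The old_* names are hypothetical stand-ins, and old_to_byte
models only the ‘c & 0xFF’ path mentioned above, not the full macro:

```c
#include <stdbool.h>

/* Stand-in for the old conversion of a non-raw-byte character:
   mask the value down to its low eight bits.  */
static int
old_to_byte (int c)
{
  return c & 0xFF;
}

/* The old range check, evaluated after the conversion.  Because the
   masked value always lies in 0..255, this returns true for any C.  */
static bool
old_guard_passes (int c)
{
  c = old_to_byte (c);
  return c >= 0 && c < 256;
}
```

Since old_guard_passes is true for every input, including U+0130, the
truncated byte was always stored.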
* src/casefiddle.c (casify_object, casify_region): When dealing with
unibyte data, don’t attempt to store Unicode characters in the result.
---
src/casefiddle.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
Unless there are objections, I’ll commit it in a few days.
diff --git a/src/casefiddle.c b/src/casefiddle.c
index 2d32f49..247cc6f 100644
--- a/src/casefiddle.c
+++ b/src/casefiddle.c
@@ -71,8 +71,8 @@ casify_object (enum case_action flag, Lisp_Object obj)
{
if (! inword)
c = upcase1 (c1);
- if (! multibyte)
- MAKE_CHAR_UNIBYTE (c);
+ if (! multibyte && CHAR_BYTE8_P (c))
+ c = CHAR_TO_BYTE8 (c);
XSETFASTINT (obj, c | flags);
}
return obj;
@@ -93,18 +93,19 @@ casify_object (enum case_action flag, Lisp_Object obj)
c1 = c;
if (inword && flag != CASE_CAPITALIZE_UP)
c = downcase (c);
- else if (!uppercasep (c)
- && (!inword || flag != CASE_CAPITALIZE_UP))
- c = upcase1 (c1);
+ else if (!inword || flag != CASE_CAPITALIZE_UP)
+ c = upcase (c1);
if ((int) flag >= (int) CASE_CAPITALIZE)
inword = (SYNTAX (c) == Sword);
if (c != c1)
{
- MAKE_CHAR_UNIBYTE (c);
- /* If the char can't be converted to a valid byte, just don't
- change it. */
- if (c >= 0 && c < 256)
- SSET (obj, i, c);
+ if (CHAR_BYTE8_P (c))
+ c = CHAR_TO_BYTE8 (c);
+ else if (!ASCII_CHAR_P (c))
+ /* If the char can't be converted to a valid byte, just don't
+ change it. */
+ continue;
+ SSET (obj, i, c);
}
}
return obj;
@@ -250,8 +251,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
if (! multibyte)
{
- MAKE_CHAR_UNIBYTE (c);
- FETCH_BYTE (start_byte) = c;
+ /* If the char can't be converted to a valid byte, just don't
+ change it. */
+ if (ASCII_CHAR_P (c) ||
+ (CHAR_BYTE8_P (c) && ((c = CHAR_TO_BYTE8 (c)), true)))
+ FETCH_BYTE (start_byte) = c;
}
else if (ASCII_CHAR_P (c2) && ASCII_CHAR_P (c))
FETCH_BYTE (start_byte) = c;
--
2.8.0.rc3.226.g39d4020
This bug report was last modified 8 years and 251 days ago.