GNU bug report logs - #16731
24.3.50; Latin small letter sharp s is not considered lower-case


Package: emacs;

Reported by: Jorgen Schaefer <forcer <at> forcix.cx>

Date: Wed, 12 Feb 2014 17:31:02 UTC

Severity: normal

Merged with 10576

Found in version 24.3.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.





Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Wed, 12 Feb 2014 17:31:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jorgen Schaefer <forcer <at> forcix.cx>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 12 Feb 2014 17:31:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jorgen Schaefer <forcer <at> forcix.cx>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; Latin small letter sharp s is not considered lower-case
Date: Wed, 12 Feb 2014 18:29:23 +0100
Hi!
The following seems like a bug:

(string-match "[[:lower:]]" "ß") => nil

`describe-char' for this says:

  name: LATIN SMALL LETTER SHARP S
  general-category: Ll (Letter, Lowercase)
  decomposition: (223) ('ß')

Not sure why it would not be considered a lower-case letter. Umlauts
like ä, ö and ü are matched correctly.

Regards,
        -- Jorgen

Configured using:
 `configure --without-x'

Important settings:
  value of $LC_ALL: 
  value of $LC_COLLATE: de_DE.UTF-8
  value of $LC_CTYPE: de_DE.UTF-8
  value of $LC_MESSAGES: POSIX
  value of $LC_MONETARY: POSIX
  value of $LC_NUMERIC: POSIX
  value of $LC_TIME: POSIX
  value of $LANG: POSIX
  locale-coding-system: utf-8-unix




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Wed, 12 Feb 2014 17:56:02 GMT) Full text and rfc822 format available.

Message #8 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Jorgen Schaefer <forcer <at> forcix.cx>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Wed, 12 Feb 2014 12:55:22 -0500
Jorgen Schaefer wrote:

> Not sure why it would not be considered a lower-case letter. Umlauts
> like ä, ö and ü are matched correctly.

See http://debbugs.gnu.org/10576

(I have no idea whether this is an Emacs bug or not.)




Merged 10576 16731. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 12 Feb 2014 17:56:02 GMT) Full text and rfc822 format available.


Message #13 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#16731: 24.3.50; Latin small letter sharp s is not considered
 lower-case
Date: Wed, 12 Feb 2014 20:31:20 +0100
Am 12.02.2014 18:55, schrieb Glenn Morris:
> Jorgen Schaefer wrote:
>
>> Not sure why it would not be considered a lower-case letter. Umlauts
>> like ä, ö and ü are matched correctly.
>
> See http://debbugs.gnu.org/10576
>
> (I have no idea whether this is an Emacs bug or not.)
>
>
>
>

IMO the answer given at the link is not valid. Indeed, the implementation in buffer.h checks --&& upcase1 (c) != c-- and expects a result, i.e. it ignores the fact that some characters
might not have an upcase variant.

Since there is a downcase table, the check should probably be done against that.





Message #16 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Wed, 12 Feb 2014 21:49:25 +0200
> Date: Wed, 12 Feb 2014 20:31:20 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> 
> > See http://debbugs.gnu.org/10576
> >
> > (I have no idea whether this is an Emacs bug or not.)
> >
> 
> IMO the answer given at the link is not valid.

It accurately describes what happens in the code, so it's definitely
valid.

> Since there is a downcase table, the check should probably be done against that.

Not sure what you mean by that, please elaborate.





Message #19 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50; Latin small letter sharp s is not considered
 lower-case
Date: Wed, 12 Feb 2014 21:10:57 +0100
Am 12.02.2014 20:49, schrieb Eli Zaretskii:
>> Date: Wed, 12 Feb 2014 20:31:20 +0100
>> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
>>
>>> See http://debbugs.gnu.org/10576
>>>
>>> (I have no idea whether this is an Emacs bug or not.)
>>>
>>
>> IMO the answer given at the link is not valid.
>
> It accurately describes what happens in the code, so it's definitely
> valid.
>
>> Since there is a downcase table, the check should probably be done against that.
>
> Not sure what you mean by that, please elaborate.
>
>

See buffer.h.
IIUC the mentioned lowercasep is implemented as !uppercasep (c) && upcase1 (c) != c;
the upcase1 (c) != c test must fail, as there is no upcased variant of this char.

While upcase1 can't succeed, downcase should - if "ß" is a member of downcase_table.
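
For illustration, the behaviour described above can be reproduced outside Emacs. The following is a minimal sketch and not the buffer.h code: the toy_ functions are hypothetical stand-ins for the case tables, and they pair only ASCII letters (the real tables also pair letters such as ä/Ä). Every other character, including U+00DF, maps to itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the case tables: only A-Z/a-z are paired.  */
int toy_downcase (int c) { return (c >= 'A' && c <= 'Z') ? c + 32 : c; }
int toy_upcase1 (int c) { return (c >= 'a' && c <= 'z') ? c - 32 : c; }

/* Same shape as the predicates quoted from buffer.h.  */
bool toy_uppercasep (int c) { return toy_downcase (c) != c; }

bool
toy_lowercasep (int c)
{
  return !toy_uppercasep (c) && toy_upcase1 (c) != c;
}

/* Usage example: a paired letter passes, an unpaired one does not.  */
void
demo_pair_only_predicates (void)
{
  assert (toy_lowercasep ('a'));    /* has the partner 'A' */
  assert (!toy_lowercasep (0xDF));  /* sharp s maps to itself */
  assert (!toy_lowercasep ('1'));   /* so does a digit */
}
```

In this model the sharp s and a digit are indistinguishable, because both are their own upcase entry.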









Message #22 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Wed, 12 Feb 2014 22:16:40 +0200
> Date: Wed, 12 Feb 2014 21:10:57 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: 16731 <at> debbugs.gnu.org
> 
> While upcase1 can't succeed, downcase should - if "ß" is a member of downcase_table.

But which character do you want to downcase in this case?

This whole logic works only for _pairs_ of characters (and the
char-table used here is populated by calls to set-case-syntax-pair).
Such machinery cannot possibly work when there's no pair.

The only way I can see out of this conundrum is to consult the
Lowercase Unicode property of the character as fallback, assuming that
won't slow down regex search too much.





Message #25 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50; Latin small letter sharp s is not considered
 lower-case
Date: Wed, 12 Feb 2014 21:33:31 +0100
Am 12.02.2014 21:16, schrieb Eli Zaretskii:
>> Date: Wed, 12 Feb 2014 21:10:57 +0100
>> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
>> CC: 16731 <at> debbugs.gnu.org
>>
>> While upcase1 can't succeed, downcase should - if "ß" is a member of downcase_table.
>
> But which character do you want to downcase in this case?
>
> This whole logic works only for _pairs_ of characters (and the
> char-table used here is populated by calls to set-case-syntax-pair).

So populate it differently, resp. allow empty slots.

> Such machinery cannot possibly work when there's no pair.
>
> The only way I can see out of this conundrum is to consult the
> Lowercase Unicode property of the character as fallback, assuming that
> won't slow down regex search too much.
>
>

You can do (downcase "d") for example, which results in "d".

Instead of

upcase1 (c) != c

what about

downcase (c) == c

?









Message #28 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Wed, 12 Feb 2014 21:57:23 +0100
On Wed, Feb 12, 2014 at 9:33 PM, Andreas Röhler
<andreas.roehler <at> easy-emacs.de> wrote:

> what about
>
> downcase (c) == c

Won't that be true for characters that have no upcase/downcase
difference, like digits?

   J





Message #31 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 05:46:22 +0200
> Date: Wed, 12 Feb 2014 21:33:31 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: 16731 <at> debbugs.gnu.org
> 
> Am 12.02.2014 21:16, schrieb Eli Zaretskii:
> >> Date: Wed, 12 Feb 2014 21:10:57 +0100
> >> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> >> CC: 16731 <at> debbugs.gnu.org
> >>
> >> While upcase1 can't succeed, downcase should - if "ß" is a member of downcase_table.
> >
> > But which character do you want to downcase in this case?
> >
> > This whole logic works only for _pairs_ of characters (and the
> > char-table used here is populated by calls to set-case-syntax-pair).
> 
> So populate it differently, resp. allow empty slots.

How will we then be able to distinguish between lower-case characters
that have no upcase variant and characters that are not lower-case
characters at all?

> You can do (downcase "d") for example, which results in "d".
> 
> Instead of
> 
> upcase1 (c) != c
> 
> what about
> 
> downcase (c) == c
> 
> ?

The same is true for any non-letter, like punctuation.
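
The objection can be checked with a small model. This is a sketch under assumptions, not Emacs code: toy_downcase is a hypothetical downcase table that knows only ASCII letters, and proposed_lowercasep is the suggested "downcase (c) == c" test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical downcase table: only A-Z have a distinct lower-case form.  */
int toy_downcase (int c) { return (c >= 'A' && c <= 'Z') ? c + 32 : c; }

/* The proposed test: C is lower case if it is its own downcase.  */
bool proposed_lowercasep (int c) { return toy_downcase (c) == c; }

/* Usage example: the test accepts far more than lower-case letters.  */
void
demo_proposed_test (void)
{
  assert (proposed_lowercasep ('d'));   /* wanted */
  assert (proposed_lowercasep (0xDF));  /* wanted: sharp s */
  assert (proposed_lowercasep ('7'));   /* unwanted: a digit */
  assert (proposed_lowercasep ('!'));   /* unwanted: punctuation */
  assert (!proposed_lowercasep ('D'));  /* upper case is still rejected */
}
```

The test only rules out upper-case letters; it says nothing about whether the remaining characters are letters at all.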





Message #34 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Juanma Barranquero <lekktu <at> gmail.com>, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50; Latin small letter sharp s is not considered
 lower-case
Date: Thu, 13 Feb 2014 09:27:43 +0100
Am 13.02.2014 04:46, schrieb Eli Zaretskii:
>> Date: Wed, 12 Feb 2014 21:33:31 +0100
>> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
>> CC: 16731 <at> debbugs.gnu.org
>>
>> Am 12.02.2014 21:16, schrieb Eli Zaretskii:
>>>> Date: Wed, 12 Feb 2014 21:10:57 +0100
>>>> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
>>>> CC: 16731 <at> debbugs.gnu.org
>>>>
>>>> While upcase1 can't succeed, downcase should - if "ß" is a member of downcase_table.
>>>
>>> But which character do you want to downcase in this case?
>>>
>>> This whole logic works only for _pairs_ of characters (and the
>>> char-table used here is populated by calls to set-case-syntax-pair).
>>
>> So populate it differently, resp. allow empty slots.
>
> How will we then be able to distinguish between lower-case characters
> that have no upcase variant and characters that are not lower-case
> characters at all?
>
>> You can do (downcase "d") for example, which results in "d".
>>
>> Instead of
>>
>> upcase1 (c) != c
>>
>> what about
>>
>> downcase (c) == c
>>
>> ?
>
> The same is true for any non-letter, like punctuation.
>
>

Okay, right.

So it seems upcase_table is populated wrongly with "ß"?






Message #37 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Andreas Röhler <andreas.roehler <at> easy-emacs.de>,
 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 08:37:45 -0500
> How will we then be able to distinguish between lower-case characters
> that have no upcase variant and characters that are not lower-case
> characters at all?

Right: to handle this, we need to distinguish characters that are
lower-case without an uppercase variant from characters which are
neither lowercase nor uppercase.

We could do that by saying that the upcase table should return nil or -1
for ß, to indicate that the upcase version is "missing".  But such
a change will probably require carefully revising "all" the code that
uses those tables.


        Stefan
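
A rough model of the sentinel idea, to show why it touches every caller. This is only a sketch: NO_UPCASE, toy_upcase_entry and the other names are hypothetical, and the real case tables are Lisp char-tables rather than C functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for "lower case, but no upper-case partner".  */
#define NO_UPCASE (-1)

/* Hypothetical upcase table lookup with the sentinel entry for sharp s.  */
int
toy_upcase_entry (int c)
{
  if (c >= 'a' && c <= 'z')
    return c - 32;
  if (c == 0xDF)
    return NO_UPCASE;
  return c;
}

/* With the sentinel, "entry differs from C" covers the unpaired case too.  */
bool sentinel_lowercasep (int c) { return toy_upcase_entry (c) != c; }

/* Every caller that used the entry as a character must now filter it.  */
int
safe_upcase (int c)
{
  int up = toy_upcase_entry (c);
  return up == NO_UPCASE ? c : up;
}

/* Usage example.  */
void
demo_sentinel (void)
{
  assert (sentinel_lowercasep (0xDF));
  assert (!sentinel_lowercasep ('1'));
  assert (safe_upcase (0xDF) == 0xDF);
  assert (safe_upcase ('a') == 'A');
}
```

The predicate becomes simple, but any code that reads the table directly without a wrapper like safe_upcase would see -1 where it expects a character.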





Message #40 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: lekktu <at> gmail.com, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 17:53:08 +0200
> Date: Thu, 13 Feb 2014 09:27:43 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: 16731 <at> debbugs.gnu.org, Juanma Barranquero <lekktu <at> gmail.com>
> 
> So it seems upcase_table is populated wrongly with "ß"?

I see nothing wrong with it: its entry is the character itself, like
any other character that has no up-case variant.





Message #43 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 18:33:05 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Andreas Röhler <andreas.roehler <at> easy-emacs.de>,
>   16731 <at> debbugs.gnu.org
> Date: Thu, 13 Feb 2014 08:37:45 -0500
> 
> > How will we then be able to distinguish between lower-case characters
> > that have no upcase variant and characters that are not lower-case
> > characters at all?
> 
> Right: to handle this, we need to distinguish characters that are
> lower-case without an uppercase variant from characters which are
> neither lowercase nor uppercase.
> 
> We could do that by saying that the upcase table should return nil or -1
> for ß, to indicate that the upcase version is "missing".  But such
> a change will probably require carefully revising "all" the code that
> uses those tables.

Right.  I can instead suggest a much less intrusive change below.  Its
only disadvantage is that if some user or Lisp program overrides the
standard case tables, and actually _wants_ some lower-case characters
to behave as if they weren't, looking at the Unicode tables will undo
such customizations.  If this is a concern, perhaps we could compare
the case table with the standard value, and only use the Unicode
attributes when they are equal?

If the approach below is accepted, a related question is how to treat
letters whose category is Lt, i.e. "titlecase" -- do we consider such
letters upper case or don't we?

--- src/buffer.h~0	2014-01-01 09:46:07.000000000 +0200
+++ src/buffer.h	2014-02-13 18:27:32.225839000 +0200
@@ -1349,7 +1349,19 @@ downcase (int c)
 }
 
 /* True if C is upper case.  */
-INLINE bool uppercasep (int c) { return downcase (c) != c; }
+INLINE bool uppercasep (int c)
+{
+  Lisp_Object val;
+
+  if (downcase (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
+}
 
 /* Upcase a character C known to be not upper case.  */
 INLINE int
@@ -1364,7 +1376,16 @@ upcase1 (int c)
 INLINE bool
 lowercasep (int c)
 {
-  return !uppercasep (c) && upcase1 (c) != c;
+  Lisp_Object val;
+
+  if (!uppercasep (c) && upcase1 (c) != c)
+    return true;
+
+  if (NILP (Vunicode_category_table))
+    return false;
+
+  val = CHAR_TABLE_REF (Vunicode_category_table, c);
+  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Ll;
 }
 
 /* Upcase a character C, or make no change if that cannot be done.  */
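
The control flow of the patch can be tried in isolation with a self-contained model. This is a sketch under assumptions, not a substitute for the diff: the toy_ tables and the toy_category_of lookup are hypothetical stand-ins for the case tables and for Vunicode_category_table, and the NILP guard is omitted because the toy table always exists.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_category { CAT_OTHER, CAT_Lu, CAT_Ll };

/* Hypothetical case tables: only ASCII letters are paired.  */
int toy_downcase (int c) { return (c >= 'A' && c <= 'Z') ? c + 32 : c; }
int toy_upcase1 (int c) { return (c >= 'a' && c <= 'z') ? c - 32 : c; }

/* Hypothetical general-category lookup covering ASCII and sharp s.  */
enum toy_category
toy_category_of (int c)
{
  if (c >= 'A' && c <= 'Z')
    return CAT_Lu;
  if ((c >= 'a' && c <= 'z') || c == 0xDF)
    return CAT_Ll;
  return CAT_OTHER;
}

/* Case table first, general category as the fallback.  */
bool
patched_uppercasep (int c)
{
  if (toy_downcase (c) != c)
    return true;
  return toy_category_of (c) == CAT_Lu;
}

bool
patched_lowercasep (int c)
{
  if (!patched_uppercasep (c) && toy_upcase1 (c) != c)
    return true;
  return toy_category_of (c) == CAT_Ll;
}

/* Usage example.  */
void
demo_patched (void)
{
  assert (patched_lowercasep (0xDF));  /* found through the fallback */
  assert (patched_lowercasep ('a'));   /* found through the case table */
  assert (!patched_lowercasep ('A'));
  assert (!patched_lowercasep ('1'));
}
```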





Message #46 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 12:10:49 -0500
>  /* True if C is upper case.  */
> -INLINE bool uppercasep (int c) { return downcase (c) != c; }
> +INLINE bool uppercasep (int c)
> +{
> +  Lisp_Object val;
> +
> +  if (downcase (c) != c)
> +    return true;
> +
> +  if (NILP (Vunicode_category_table))
> +    return false;
> +
> +  val = CHAR_TABLE_REF (Vunicode_category_table, c);
> +  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
> +}
 
Doesn't sound too bad.  But it does beg the question: why check
(downcase (c) != c) at all, then?


        Stefan





Message #49 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 19:39:04 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Thu, 13 Feb 2014 12:10:49 -0500
> 
> >  /* True if C is upper case.  */
> > -INLINE bool uppercasep (int c) { return downcase (c) != c; }
> > +INLINE bool uppercasep (int c)
> > +{
> > +  Lisp_Object val;
> > +
> > +  if (downcase (c) != c)
> > +    return true;
> > +
> > +  if (NILP (Vunicode_category_table))
> > +    return false;
> > +
> > +  val = CHAR_TABLE_REF (Vunicode_category_table, c);
> > +  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
> > +}
>  
> Doesn't sound too bad.  But it does beg the question: why check
> (downcase (c) != c) at all, then?

Because it's faster, and for most characters will do the job.





Message #52 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>, 
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50; Latin small letter sharp s is not considered
 lower-case
Date: Thu, 13 Feb 2014 19:02:08 +0100
Am 13.02.2014 18:39, schrieb Eli Zaretskii:
>> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
>> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
>> Date: Thu, 13 Feb 2014 12:10:49 -0500
>>
>>>   /* True if C is upper case.  */
>>> -INLINE bool uppercasep (int c) { return downcase (c) != c; }
>>> +INLINE bool uppercasep (int c)
>>> +{
>>> +  Lisp_Object val;
>>> +
>>> +  if (downcase (c) != c)
>>> +    return true;
>>> +
>>> +  if (NILP (Vunicode_category_table))
>>> +    return false;
>>> +
>>> +  val = CHAR_TABLE_REF (Vunicode_category_table, c);
>>> +  return INTEGERP (val) && XINT (val) == UNICODE_CATEGORY_Lu;
>>> +}
>>
>> Doesn't sound too bad.  But it does beg the question: why check
>> (downcase (c) != c) at all, then?
>
> Because it's faster, and for most characters will do the job.
>

Maybe I'm missing the point: the only change needed is not to store "ß" in the uppercase table.
Why not store nil there instead?






Message #55 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 18:58:04 +0100
On Thu, Feb 13, 2014 at 5:33 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> If the approach below is accepted, a related question is how to treat
> letters whose category is Lt, i.e. "titlecase" -- do we consider such
> letters upper case or don't we?

No Unicode expert, but this suggests they are uppercase, sort of:

http://www.unicode.org/faq/casemap_charprop.html

"Q: What is titlecase? How is it different from uppercase?

A: Titlecase takes its name from the case format used when forming a
title, in which the initial letter in a word is capitalized and the
rest are not. Titlecase is also used in forming a sentence by
capitalizing the first word, and for forming proper names. The
titlecase mapping in the Unicode Standard is the mapping applied to
the initial character in a word.

The titlecase mapping in Unicode differs from the uppercase mapping in
that a number of characters require special handling. These are
chiefly ligatures and digraphs such as 'fl', 'dz', and 'lj', plus a
number of polytonic Greek characters. For example, U+01C7 (LJ) maps to
U+01C8 (Lj) rather than to U+01C9 (lj)."





Message #58 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 13:10:02 -0500
>> Doesn't sound too bad.  But it does beg the question: why check
>> (downcase (c) != c) at all, then?
> Because it's faster,

Is it?  Both lookups look like CHAR_TABLE_REF to me.

> and for most characters will do the job.

But we'll check the unicode table at least for more than half the
characters (i.e. for all the lowercase and non-case characters), so the
fast path can't give us more than a factor of 2 speed up anyway, and the
slow path is made slower by unnecessarily looking up the case table.

I guess what I mean is that without actual measurements it's not obvious
at all that speed is a good justification.


        Stefan





Message #61 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 20:16:45 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Thu, 13 Feb 2014 13:10:02 -0500
> 
> >> Doesn't sound too bad.  But it does beg the question: why check
> >> (downcase (c) != c) at all, then?
> > Because it's faster,
> 
> Is it?  Both lookups look like CHAR_TABLE_REF to me.
> 
> > and for most characters will do the job.
> 
> But we'll check the unicode table at least for more than half the
> characters (i.e. for all the lowercase and non-case characters), so the
> fast path can't give us more than a factor of 2 speed up anyway, and the
> slow path is made slower by unnecessarily looking up the case table.
> 
> I guess what I mean is that without actual measurements it's not obvious
> at all that speed is a good justification.

What about custom buffer-local case tables?





Message #64 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: monnier <at> iro.umontreal.ca, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 20:17:59 +0200
> Date: Thu, 13 Feb 2014 19:02:08 +0100
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: 16731 <at> debbugs.gnu.org
> 
> Maybe I'm missing the point: the only change needed is not to store "ß" in the uppercase table.
> Why not store nil there instead?

Because that's not what case tables are documented to hold.  We will
break back compatibility if we put nil there.





Message #67 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: monnier <at> iro.umontreal.ca, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 20:18:59 +0200
> From: Juanma Barranquero <lekktu <at> gmail.com>
> Date: Thu, 13 Feb 2014 18:58:04 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 16731 <at> debbugs.gnu.org
> 
> On Thu, Feb 13, 2014 at 5:33 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
> > If the approach below is accepted, a related question is how to treat
> > letters whose category is Lt, i.e. "titlecase" -- do we consider such
> > letters upper case or don't we?
> 
> No Unicode expert, but this suggests they are uppercase, sort of:
> 
> http://www.unicode.org/faq/casemap_charprop.html
> 
> "Q: What is titlecase? How is it different from uppercase?
> 
> A: Titlecase takes its name from the case format used when forming a
> title, in which the initial letter in a word is capitalized and the
> rest are not. Titlecase is also used in forming a sentence by
> capitalizing the first word, and for forming proper names. The
> titlecase mapping in the Unicode Standard is the mapping applied to
> the initial character in a word.
> 
> The titlecase mapping in Unicode differs from the uppercase mapping in
> that a number of characters require special handling. These are
> chiefly ligatures and digraphs such as 'fl', 'dz', and 'lj', plus a
> number of polytonic Greek characters. For example, U+01C7 (LJ) maps to
> U+01C8 (Lj) rather than to U+01C9 (lj)."

The question is whether we want [:upper:] to match titlecase letters.





Message #70 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 19:22:49 +0100
On Thu, Feb 13, 2014 at 7:18 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> The question is whether we want [:upper:] to match titlecase letters.

Yes, I understand. And I'm pointing out that, unless there's a
separate [:title:] matcher, matching them with [:upper:] is not
entirely unreasonable. Whether it is the right thing to do or not will
depend on the uses, I think.

    J
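
The two possible answers to the titlecase question can be written down as one predicate with a switch. This is only a sketch: the names are hypothetical, the table covers just the LJ digraph family from the quoted FAQ, and nothing here reflects a decision made in this thread.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_category { CAT_OTHER, CAT_Lu, CAT_Ll, CAT_Lt };

/* Hypothetical category lookup for the three LJ digraph characters.  */
enum toy_category
toy_category_of (int c)
{
  switch (c)
    {
    case 0x01C7: return CAT_Lu;  /* LJ, upper case */
    case 0x01C8: return CAT_Lt;  /* Lj, title case */
    case 0x01C9: return CAT_Ll;  /* lj, lower case */
    default:     return CAT_OTHER;
    }
}

/* Would [:upper:] match C?  The flag selects the policy under discussion.  */
bool
toy_upper_class_p (int c, bool titlecase_counts_as_upper)
{
  enum toy_category cat = toy_category_of (c);
  return cat == CAT_Lu || (titlecase_counts_as_upper && cat == CAT_Lt);
}

/* Usage example: only the titlecase character depends on the policy.  */
void
demo_titlecase_policy (void)
{
  assert (toy_upper_class_p (0x01C7, false));
  assert (!toy_upper_class_p (0x01C8, false));
  assert (toy_upper_class_p (0x01C8, true));
  assert (!toy_upper_class_p (0x01C9, true));
}
```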





Message #73 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Juanma Barranquero <lekktu <at> gmail.com>, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 13:47:33 -0500
Eli Zaretskii wrote:

> The question is whether we want [:upper:] to match titlecase letters.

What does grep do?
(http://debbugs.gnu.org/16631 ?)





Message #76 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 14:15:37 -0500
> What about custom buffer-local case tables?

That's what I meant by my question, yes.  Your change will break about half of
the uses of buffer-local case tables.  Using the unicode table all the
time will break them all.
Is it a real issue?  I really don't know.


        Stefan





Message #79 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: lekktu <at> gmail.com, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 22:16:00 +0200
> From: Glenn Morris <rgm <at> gnu.org>
> Cc: Juanma Barranquero <lekktu <at> gmail.com>,  16731 <at> debbugs.gnu.org
> Date: Thu, 13 Feb 2014 13:47:33 -0500
> 
> Eli Zaretskii wrote:
> 
> > The question is whether we want [:upper:] to match titlecase letters.
> 
> What does grep do?
> (http://debbugs.gnu.org/16631 ?)

Grep (like most other programs) uses locale-dependent tables
provided by libc, so it's not really relevant for us what it does.





Message #82 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Thu, 13 Feb 2014 22:24:52 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Thu, 13 Feb 2014 14:15:37 -0500
> 
> > What about custom buffer-local case tables?
> 
> That's what I meant by my question, yes.  Your change will break about half of
> the uses of buffer-local case tables.  Using the unicode table all the
> time will break them all.
> Is it a real issue?  I really don't know.

Neither do I.

How about if we use the unicode tables only if the corresponding
buffer's case-table is the standard one (Vascii_*_table)?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Fri, 14 Feb 2014 16:21:01 GMT) Full text and rfc822 format available.

Message #85 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50; , Latin small letter sharp s is not considered
 lower-case
Date: Fri, 14 Feb 2014 08:20:48 -0800
Grep doesn't just use glibc's tables; it has its own dfa matcher (also 
shared by awk), and runs into problems in this area as well.  I'm working 
on fixes for this in my limited spare time.

If you want 'uppercasep' to match what glibc and grep mean by 
[[:upper:]], Emacs might need to check not merely for 
UNICODE_CATEGORY_Lu but also for other Unicode categories (mixed case, 
title case).  I haven't investigated the details.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Fri, 14 Feb 2014 17:23:02 GMT) Full text and rfc822 format available.

Message #88 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Fri, 14 Feb 2014 12:22:35 -0500
>> Is it a real issue?  I really don't know.
> Neither do I.

Maybe it's not a problem.  Someone(TM) should grep to try and figure it
out, and then try it out.

> How about if we use the unicode tables only if the corresponding
> buffer's case-table is the standard one (Vascii_*_table)?

That sounds kludgy.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Fri, 14 Feb 2014 18:17:02 GMT) Full text and rfc822 format available.

Message #91 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Fri, 14 Feb 2014 20:16:08 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Fri, 14 Feb 2014 12:22:35 -0500
> 
> > How about if we use the unicode tables only if the corresponding
> > buffer's case-table is the standard one (Vascii_*_table)?
> 
> That sounds kludgy.

Why kludgy?  If the tables were not customized, it is a sign that this
buffer is OK with the default properties, which is what the Unicode
properties are about.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Fri, 14 Feb 2014 21:00:03 GMT) Full text and rfc822 format available.

Message #94 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Fri, 14 Feb 2014 15:59:00 -0500
>> > How about if we use the unicode tables only if the corresponding
>> > buffer's case-table is the standard one (Vascii_*_table)?
>> That sounds kludgy.
> Why kludgy?

Because, if someone were to take the Vascii_*_table, make a little
change to them and use them in a buffer, he suddenly gets different
behavior for some chars he hasn't touched.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Sat, 15 Feb 2014 07:13:02 GMT) Full text and rfc822 format available.

Message #97 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Sat, 15 Feb 2014 09:12:39 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Fri, 14 Feb 2014 15:59:00 -0500
> 
> >> > How about if we use the unicode tables only if the corresponding
> >> > buffer's case-table is the standard one (Vascii_*_table)?
> >> That sounds kludgy.
> > Why kludgy?
> 
> Because, if someone were to take the Vascii_*_table

How could they?  These variables are not exposed to Lisp.  Only
ascii-case-table is, which is not the one I had in mind.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Mon, 17 Feb 2014 03:10:02 GMT) Full text and rfc822 format available.

Message #100 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Sun, 16 Feb 2014 22:09:32 -0500
> How could they?  These variables are not exposed to Lisp.  Only
> ascii-case-table is, which is not the one I had in mind.

Right, I was thinking of standard-case-table.  Still, same problem: take
that standard case table, change it a bit, and suddenly chars other than
the ones you changed are affected.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Mon, 17 Feb 2014 05:30:03 GMT) Full text and rfc822 format available.

Message #103 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: andreas.roehler <at> easy-emacs.de, 16731 <at> debbugs.gnu.org
Subject: Re: bug#16731: 24.3.50;
 Latin small letter sharp s is not considered lower-case
Date: Mon, 17 Feb 2014 07:29:31 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: andreas.roehler <at> easy-emacs.de,  16731 <at> debbugs.gnu.org
> Date: Sun, 16 Feb 2014 22:09:32 -0500
> 
> > How could they?  These variables are not exposed to Lisp.  Only
> > ascii-case-table is, which is not the one I had in mind.
> 
> Right, I was thinking of standard-case-table.  Still, same problem: take
> that standard case table, change it a bit, and suddenly chars other than
> the ones you changed are affected.

But customizing case-tables is already a very special use case.  Why
can't we expect such users to deal with these issues?

The only alternative (besides leaving the original problem unsolved)
is to ignore buffer-local case tables.  Is this more acceptable?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#16731; Package emacs. (Fri, 16 Jul 2021 12:33:02 GMT) Full text and rfc822 format available.

Message #106 received at 16731 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Jorgen Schaefer <forcer <at> forcix.cx>
Cc: 10576 <at> debbugs.gnu.org, 16731 <at> debbugs.gnu.org
Subject: Re: bug#10576: Subject: 23.4; char class [:lower:] misses latin
 small letter sharp s
Date: Fri, 16 Jul 2021 14:32:41 +0200
Jorgen Schaefer <forcer <at> forcix.cx> writes:

> The following seems like a bug:
>
> (string-match "[[:lower:]]" "ß") => nil

This has been fixed in Emacs 28.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 28.1, send any further explanations to 10576 <at> debbugs.gnu.org and Andreas Röhler <andreas.roehler <at> easy-emacs.de> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 16 Jul 2021 12:33:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 14 Aug 2021 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 311 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.