GNU bug report logs - #23833
24.5; assoc-string with CASE-FOLD may fail


Package: emacs;

Reported by: ynyaaa <at> gmail.com

Date: Thu, 23 Jun 2016 12:02:02 UTC

Severity: normal

Found in version 24.5

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23833 in the body.
You can then email your comments to 23833 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#23833; Package emacs. (Thu, 23 Jun 2016 12:02:02 GMT)

Acknowledgement sent to ynyaaa <at> gmail.com:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 23 Jun 2016 12:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: ynyaaa <at> gmail.com
To: bug-gnu-emacs <at> gnu.org
Subject: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 21:01:03 +0900
`assoc-string' called with a non-nil CASE-FOLD argument may fail to match
an element when KEY is that element's downcased form.

(let (l)
  (dotimes (c (max-char))
    (or (let ((s (string c)))
          (assoc-string (downcase s) (list s) t))
        (setq l (cons c l))))
  l)
=>(497 458 455 452)
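To see which characters those are, the code points can be decoded outside Emacs. The following is a small Python sketch using the standard `unicodedata` module (Python is used only for illustration; nothing here is Emacs code). It prints each character's name and general category together with its Unicode default title-case form. All four should come out as upper-case Latin digraphs (category Lu) that each have a separate title-case sibling (category Lt), which is the distinction the follow-ups below turn on.

```python
import unicodedata

# Code points returned by the Emacs Lisp loop in the report.
failing = [497, 458, 455, 452]

for cp in failing:
    ch = chr(cp)
    title = ch.title()  # Unicode default title-case mapping (a single character here)
    print(f"U+{cp:04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"
          f" -> title case U+{ord(title):04X} [{unicodedata.category(title)}]")
```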



In GNU Emacs 24.5.1 (i686-pc-mingw32)
 of 2015-04-11 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.0.6002
Configured using:
 `configure --prefix=/c/usr --host=i686-pc-mingw32'

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:

Quit
C-h C-b is undefined

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils advice help-fns time-date japan-util
tooltip electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
dos-w32 ls-lisp w32-common-fns disp-table w32-win w32-vars tool-bar dnd
fontset image regexp-opt fringe tabulated-list newcomment lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
w32notify w32 multi-tty emacs)

Memory information:
((conses 8 76911 6828)
 (symbols 32 17601 0)
 (miscs 32 34 127)
 (strings 16 11124 3996)
 (string-bytes 1 279109)
 (vectors 8 10463)
 (vector-slots 4 464483 4962)
 (floats 8 57 134)
 (intervals 28 185 22)
 (buffers 508 12))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23833; Package emacs. (Thu, 23 Jun 2016 12:27:01 GMT)

Message #8 received at 23833 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> users.sourceforge.net>
To: ynyaaa <at> gmail.com
Cc: 23833 <at> debbugs.gnu.org
Subject: Re: bug#23833: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 08:26:22 -0400
assoc-string uses compare-strings, which uses upcase to ignore case,
but upcase is not always the inverse of downcase:

(upcase (downcase "DZ")) ;=> "Dz"
;; Or to put it another way
(= (upcase (downcase ?\u01F1)) ?\u01F2) ;=> t

Same behaviour seen in emacs-25 and master

In GNU Emacs 25.0.95.5 (x86_64-unknown-linux-gnu, X toolkit)
 of 2016-06-18 built on zony
Repository revision: 94cb773c6668291d7a4e6d03a552e46f99a0350e
Windowing system distributor 'The X.Org Foundation', version 11.0.11803000

In GNU Emacs 25.1.50.4 (x86_64-unknown-linux-gnu, X toolkit)
 of 2016-06-22 built on zony
Repository revision: 9990eb7727f78e5f4a88f512812637c603391fca
Windowing system distributor 'The X.Org Foundation', version 11.0.11803000
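Noam's point can be reproduced with a tiny Python model. It is not Emacs code: the `UP` and `DOWN` tables are hypothetical and cover only the three DZ characters, filled in from the mappings quoted in the message above (downcase sends DZ to dz, upcase sends dz to Dz rather than DZ). Folding both arguments with upcase, as the message says `compare-strings' does, then reproduces the failed match from the report.

```python
# Hypothetical case tables covering only the DZ digraphs, following the
# behaviour described in the message above (not Emacs's real case table).
DZ, Dz, dz = "\u01F1", "\u01F2", "\u01F3"
DOWN = {DZ: dz, Dz: dz}  # downcase: both capital forms -> dz
UP = {dz: Dz}            # upcase: dz -> Dz (title case), not DZ


def downcase(s: str) -> str:
    return "".join(DOWN.get(ch, ch) for ch in s)


def upcase(s: str) -> str:
    return "".join(UP.get(ch, ch) for ch in s)


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare by upcasing both sides, as the message says compare-strings does."""
    return upcase(a) == upcase(b)


# upcase is not the inverse of downcase for DZ: the round trip yields Dz.
print(upcase(downcase(DZ)) == DZ)
# So a downcased key no longer matches DZ, although it still matches Dz.
print(equal_ignoring_case(downcase(DZ), DZ))
print(equal_ignoring_case(downcase(Dz), Dz))
```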




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 23 Jun 2016 15:13:02 GMT)

Notification sent to ynyaaa <at> gmail.com:
bug acknowledged by developer. (Thu, 23 Jun 2016 15:13:02 GMT)

Message #13 received at 23833-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> users.sourceforge.net>
Cc: ynyaaa <at> gmail.com, 23833-done <at> debbugs.gnu.org
Subject: Re: bug#23833: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 18:11:34 +0300
> From: Noam Postavsky <npostavs <at> users.sourceforge.net>
> Date: Thu, 23 Jun 2016 08:26:22 -0400
> Cc: 23833 <at> debbugs.gnu.org
> 
> assoc-string uses compare-strings, which uses upcase to ignore case,
> but upcase is not always the inverse of downcase:
> 
> (upcase (downcase "DZ")) ;=> "Dz"
> ;; Or to put it another way
> (= (upcase (downcase ?\u01F1)) ?\u01F2) ;=> t
> 
> Same behaviour seen in emacs-25 and master

Thanks.

This is a documentation issue: both 'assoc-string' and
'compare-strings' had inaccuracies in their doc strings.  I fixed this
on the emacs-25 branch, and I'm marking this bug done.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23833; Package emacs. (Thu, 23 Jun 2016 15:54:01 GMT)

Message #16 received at 23833 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> users.sourceforge.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
Subject: Re: bug#23833: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 11:53:00 -0400
On Thu, Jun 23, 2016 at 11:11 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Noam Postavsky <npostavs <at> users.sourceforge.net>
>> Date: Thu, 23 Jun 2016 08:26:22 -0400
>> Cc: 23833 <at> debbugs.gnu.org
>>
>> assoc-string uses compare-strings, which uses upcase to ignore case,
>> but upcase is not always the inverse of downcase:
>>
>> (upcase (downcase "DZ")) ;=> "Dz"
>> ;; Or to put it another way
>> (= (upcase (downcase ?\u01F1)) ?\u01F2) ;=> t
>>
>> Same behaviour seen in emacs-25 and master
>
> Thanks.
>
> This is a documentation issue: both 'assoc-string' and
> 'compare-strings' had inaccuracies in their doc strings.  I fixed this
> on the emacs-25 branch, and I'm marking this bug done.

Would it not make more sense if upcase converted dz into DZ? According
to https://en.wikipedia.org/wiki/Dz_(digraph)#Unicode Dz is the "title
case" form, not the upper case form. Or is this something that depends on
locale?
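For comparison, Python's built-in `str` methods apply Unicode's default case mappings and do not consult the locale. This Python sketch (an illustration outside Emacs) prints the upper-, lower- and title-case forms of the three DZ characters, showing that Unicode keeps the upper-case form DZ and the title-case form Dz apart.

```python
import unicodedata

DZ, Dz, dz = "\u01F1", "\u01F2", "\u01F3"

for ch in (DZ, Dz, dz):
    # str.upper/lower/title use Unicode's default (locale-independent) mappings.
    print(f"U+{ord(ch):04X} [{unicodedata.category(ch)}]:"
          f" upper=U+{ord(ch.upper()):04X}"
          f" lower=U+{ord(ch.lower()):04X}"
          f" title=U+{ord(ch.title()):04X}")
```

Under these default mappings, upper-casing dz gives DZ, as suggested above; Dz is what title-casing produces.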




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23833; Package emacs. (Thu, 23 Jun 2016 16:21:02 GMT)

Message #19 received at 23833 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> users.sourceforge.net>
Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
Subject: Re: bug#23833: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 19:19:09 +0300
> From: Noam Postavsky <npostavs <at> users.sourceforge.net>
> Date: Thu, 23 Jun 2016 11:53:00 -0400
> Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
> 
> Would it not make more sense if upcase converted dz into DZ?

We want to go by what UnicodeData.txt says:

  01F1;LATIN CAPITAL LETTER DZ;Lu;0;L;<compat> 0044 005A;;;;N;;;;01F3;01F2
  01F2;LATIN CAPITAL LETTER D WITH SMALL LETTER Z;Lt;0;L;<compat> 0044 007A;;;;N;;;01F1;01F3;01F2
  01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2

The problem here is that both DZ and Dz name dz as their lower-case
variant, and Emacs can only have one pair.  So we chose the other one
for the upcase conversion.  We could switch them, but one of them will
necessarily be lost, one way or another.
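The clash can be seen by reading fields 12 to 14 (simple uppercase, lowercase and titlecase mappings) of the three records quoted above. This Python sketch parses just those lines and groups the characters by their lower-case mapping; it is an illustration of the data, not how Emacs builds its case table.

```python
# The three UnicodeData.txt records quoted above.
RECORDS = """\
01F1;LATIN CAPITAL LETTER DZ;Lu;0;L;<compat> 0044 005A;;;;N;;;;01F3;01F2
01F2;LATIN CAPITAL LETTER D WITH SMALL LETTER Z;Lt;0;L;<compat> 0044 007A;;;;N;;;01F1;01F3;01F2
01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2
"""


def parse(line: str):
    """Return (code, name, category, upper, lower, title) for one record.

    Fields 12-14 hold the simple upper-, lower- and title-case mappings;
    an empty field means the character maps to itself (returned as None).
    """
    f = line.split(";")
    code = int(f[0], 16)
    upper, lower, title = (int(x, 16) if x else None for x in f[12:15])
    return code, f[1], f[2], upper, lower, title


# Group characters by the lower-case letter they map to.
lower_to_capitals = {}
for line in RECORDS.splitlines():
    code, _name, _cat, _upper, lower, _title = parse(line)
    if lower is not None:
        lower_to_capitals.setdefault(lower, []).append(code)

for lower, capitals in lower_to_capitals.items():
    print(f"U+{lower:04X} is the lower case of",
          ", ".join(f"U+{c:04X}" for c in capitals))
```

Both U+01F1 and U+01F2 claim U+01F3 as their lower case, so a table that stores a single upper-case partner for U+01F3 must keep one candidate and drop the other.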




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23833; Package emacs. (Thu, 23 Jun 2016 19:28:02 GMT)

Message #22 received at 23833 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: npostavs <at> users.sourceforge.net
Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
Subject: Re: bug#23833: 24.5; assoc-string with CASE-FOLD may fail
Date: Thu, 23 Jun 2016 22:25:53 +0300
> Date: Thu, 23 Jun 2016 19:19:09 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
> 
> > From: Noam Postavsky <npostavs <at> users.sourceforge.net>
> > Date: Thu, 23 Jun 2016 11:53:00 -0400
> > Cc: ynyaaa <at> gmail.com, 23833 <at> debbugs.gnu.org
> > 
> > Would it not make more sense if upcase converted dz into DZ?
> 
> We want to go by what UnicodeData.txt says:
> 
>   01F1;LATIN CAPITAL LETTER DZ;Lu;0;L;<compat> 0044 005A;;;;N;;;;01F3;01F2
>   01F2;LATIN CAPITAL LETTER D WITH SMALL LETTER Z;Lt;0;L;<compat> 0044 007A;;;;N;;;01F1;01F3;01F2
>   01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2
> 
> The problem here is that both DZ and Dz name dz as their lower-case
> variant, and Emacs can only have one pair.  So we chose the other one
> for the upcase conversion.  We could switch them, but one of them will
> necessarily be lost, one way or another.

On second thought, I think you are right, and swapping the pairs will
yield better results, so I've just done that on master for these and a
few other similar characters.

Of course, this doesn't resolve the original issue in any way, since
characters that have title-case will still fail the OP's test.

Thanks.
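Why the swap only moves the problem can be shown with a hypothetical Python model (not Emacs Lisp). It assumes a case table that keeps exactly one upper/lower pair for dz, sends both DZ and Dz to dz on downcasing, and leaves any character outside the pair unchanged on upcasing; these are simplifying assumptions, not a description of Emacs internals. Running the reporter's test (downcase a character, then compare it with the original by upcasing both) over the three DZ characters shows the failure moving from DZ to Dz.

```python
DZ, Dz, dz = "\u01F1", "\u01F2", "\u01F3"


def failures(upper_partner: str) -> list:
    """Characters failing the reporter's test when dz is paired with upper_partner.

    Hypothetical model: both capitals downcase to dz, only one capital is
    dz's upcase, and characters outside that pair are left unchanged by upcase.
    """
    down = {DZ: dz, Dz: dz}
    up = {dz: upper_partner}

    def upcase(ch: str) -> str:
        return up.get(ch, ch)

    def downcase(ch: str) -> str:
        return down.get(ch, ch)

    return [ch for ch in (DZ, Dz, dz) if upcase(downcase(ch)) != upcase(ch)]


before = failures(Dz)  # dz paired with title-case Dz (the reported behaviour)
after = failures(DZ)   # dz paired with upper-case DZ (after the swap)
print("before:", [f"U+{ord(c):04X}" for c in before])
print("after: ", [f"U+{ord(c):04X}" for c in after])
```

In this model the upper-case digraph fails before the swap and the title-case digraph fails after it, matching the remark that title-case characters still fail the original test.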




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Jul 2016 11:24:04 GMT)

This bug report was last modified 8 years and 336 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.