GNU bug report logs - #39686
25.2; Wrong behaviour of bibtex-autokey-name-change-strings
Reported by: "Roland Winkler" <winkler <at> gnu.org>
Date: Thu, 20 Feb 2020 04:56:02 UTC
Severity: normal
Found in version 25.2
Done: "Roland Winkler" <winkler <at> gnu.org>
Bug is archived. No further changes may be made.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Thu, 20 Feb 2020 04:56:02 GMT)
Acknowledgement sent to "Roland Winkler" <winkler <at> gnu.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 20 Feb 2020 04:56:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
For the record:
The bug report below was submitted to auctex (bug#39479).
I am reposting it here.
==================================================================
Hi,
The bibtex-generate-autokey function uses
'bibtex-autokey-name-change-strings' to substitute special or accented
characters or ligatures with ASCII characters.
I noticed that it doesn't lead to the intended behaviour for '\oe' and
'\OE', which get converted to 'oee' rather than 'oe'. On the other
hand, '\o', '\"o', and their capitalized counterparts are correctly
converted to 'oe' (and also '\ae' to 'ae').
This quirk seems to be fixed if '\o' and '\oe' are swapped in
bibtex-autokey-name-change-strings. Then all variants are correctly
converted.
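The ordering effect described above can be reproduced outside bibtex.el. The following is a minimal sketch that assumes only the built-in `replace-regexp-in-string`; `demo-transcribe` is a hypothetical helper for illustration, not a bibtex.el function, and it does not exercise the real autokey code path. Emacs tries the alternatives of `\|` from left to right, so `\o` matches before `\oe` is ever considered.

```elisp
;; Hypothetical helper, not part of bibtex.el: replace every match of
;; REGEXP in STRING with the literal text "oe".
(defun demo-transcribe (regexp string)
  (replace-regexp-in-string regexp "oe" string t t))

;; "\o" listed before "\oe": the first alternative matches "\o" and
;; the trailing "e" of "\oe" is left in place.
(demo-transcribe "\\\\o\\|\\\\oe" "b\\oe uf")   ; => "boee uf"

;; "\oe" listed before "\o": the longer sequence is consumed whole.
(demo-transcribe "\\\\oe\\|\\\\o" "b\\oe uf")   ; => "boe uf"
```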
So I propose to change the current bibtex-autokey-name-change-strings into
'(("\\\\aa" . "a")
("\\\\AA" . "A")
("\\\"a\\|\\\\\\\"a\\|\\\\ae" . "ae")
("\\\"A\\|\\\\\\\"A\\|\\\\AE" . "Ae")
("\\\\i" . "i")
("\\\\j" . "j")
("\\\\l" . "l")
("\\\\L" . "L")
("\\\"o\\|\\\\\\\"o\\|\\\\oe\\|\\\\o" . "oe")
("\\\"O\\|\\\\\\\"O\\|\\\\OE\\|\\\\O" . "Oe")
("\\\"s\\|\\\\\\\"s\\|\\\\3" . "ss")
("\\\"u\\|\\\\\\\"u" . "ue")
("\\\"U\\|\\\\\\\"U" . "Ue")
("\\\\`\\|\\\\'\\|\\\\\\^\\|\\\\~\\|\\\\=\\|\\\\\\.\\|\\\\u\\|\\\\v\\|\\\\H\\|\\\\t\\|\\\\c\\|\\\\d\\|\\\\b"
. "")
("[`'\"{}#]" . "")
("\\\\-" . "")
("\\\\?[ \n]+\\|~" . " "))
Cheers!
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Thu, 20 Feb 2020 05:05:01 GMT)
Message #8 received at 39686 <at> debbugs.gnu.org (full text, mbox):
> I noticed that it doesn't lead to the intended behaviour for '\oe'
> and '\OE', which get converted to 'oee' rather than 'oe'. On the
> other hand, '\o', '\"o', and their capitalized counterparts are
> correctly converted to 'oe' (and also '\ae' to 'ae').
>
> This quirk seems to be fixed if '\o' and '\oe' are swapped in
> bibtex-autokey-name-change-strings. Then all variants are
> correctly converted.
I suggest that the value of bibtex-autokey-transcriptions should be
calculated by regexp-opt. That makes the code more readable and it
fixes this problem for free.
Can you confirm that the following works as it should?
(defvar bibtex-autokey-transcriptions
  (nconc
   (mapcar (lambda (a) (cons (regexp-opt (car a)) (cdr a)))
           '(;; language specific characters
             (("\\aa") . "a")  ; \aa -> a
             (("\\AA") . "A")  ; \AA -> A
             (("\"a" "\\\"a" "\\ae") . "ae")  ; "a,\"a,\ae -> ae
             (("\"A" "\\\"A" "\\AE") . "Ae")  ; "A,\"A,\AE -> Ae
             (("\\i") . "i")  ; \i -> i
             (("\\j") . "j")  ; \j -> j
             (("\\l") . "l")  ; \l -> l
             (("\\L") . "L")  ; \L -> L
             (("\"o" "\\\"o" "\\o" "\\oe") . "oe")  ; "o,\"o,\o,\oe -> oe
             (("\"O" "\\\"O" "\\O" "\\OE") . "Oe")  ; "O,\"O,\O,\OE -> Oe
             (("\"s" "\\\"s" "\\3") . "ss")  ; "s,\"s,\3 -> ss
             (("\"u" "\\\"u") . "ue")  ; "u,\"u -> ue
             (("\"U" "\\\"U") . "Ue")  ; "U,\"U -> Ue
             ;; hyphen, accents
             (("\\-" "\\`" "\\'" "\\^" "\\~" "\\=" "\\." "\\u" "\\v"
               "\\H" "\\t" "\\c" "\\d" "\\b") . "")
             ;; space
             (("~") . " ")))
   ;; more spaces
   '(("[\s\t\n]*\\(?:\\\\\\)?[\s\t\n]+" . " ")
     ;; braces, quotes, concatenation.
     ("[`'\"{}#]" . "")))
  "Alist of (OLD-REGEXP . NEW-STRING) pairs.
Used by the default values of `bibtex-autokey-name-change-strings' and
`bibtex-autokey-titleword-change-strings'. Defaults to translating some
language specific characters to their ASCII transcriptions, and
removing any character accents.")
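Why `regexp-opt` sidesteps the ordering problem can be checked with a small sketch. By default (no KEEP-ORDER argument) `regexp-opt` returns a regexp that matches the longest string possible, so the order of the strings in the list no longer matters. `demo-regexp-opt-transcribe` is a hypothetical helper for illustration only; it handles just the lowercase "o" entry of the table.

```elisp
;; Hypothetical helper, not part of bibtex.el: transcribe the "o"
;; variants in STRING using a regexp built by `regexp-opt'.
(defun demo-regexp-opt-transcribe (string)
  (let ((regexp (regexp-opt '("\"o" "\\\"o" "\\o" "\\oe"))))
    (replace-regexp-in-string regexp "oe" string t t)))

;; "\o" precedes "\oe" in the list, yet "\oe" is matched as a whole.
(demo-regexp-opt-transcribe "b\\oe uf")     ; => "boe uf"
(demo-regexp-opt-transcribe "K\\\"onig")    ; => "Koenig"
```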
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 14:07:02 GMT)
Message #11 received at 39686 <at> debbugs.gnu.org (full text, mbox):
On Fri Feb 21 2020 gojjoe2 <at> googlemail.com wrote:
> Yes it does, at least for the specific problem I found with
> "\oe". Thank you very much!
Thanks for confirming it fixes your problem.
> Can you tell me where to insert this redefinition in my init file,
> while I wait for the next Emacs version? I tried adding it to my
> own bibtex-hook and to "eval-after-load 'bibtex ...", but it
> doesn't seem to have any effect in either. At the moment it is called
> directly in the init file.
Bibtex-mode uses the variable bibtex-autokey-transcriptions to
define the default values of `bibtex-autokey-name-change-strings'
and `bibtex-autokey-titleword-change-strings'. So something like
eval-after-load will work only if it deals with all three variables.
Say, you first set bibtex-autokey-transcriptions to its new value;
then you use this to set `bibtex-autokey-name-change-strings' and
`bibtex-autokey-titleword-change-strings'.
It is probably easier to simply set bibtex-autokey-transcriptions in
your init file to its new value. Normally, emacs should load
bibtex-mode later. Then it will use your value of
bibtex-autokey-transcriptions to set the values of
`bibtex-autokey-name-change-strings' and
`bibtex-autokey-titleword-change-strings'.
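The two approaches described above might look as follows in an init file. This is a sketch under stated assumptions, not a tested recipe: `my-bibtex-autokey-transcriptions` is a hypothetical variable name, and its value is abbreviated to three entries for illustration; in practice it would hold the full table from the defvar quoted earlier in this thread. Variant 2 overwrites any customization of the two change-strings options.

```elisp
;; Hypothetical variable; abbreviated value, for illustration only.
(defvar my-bibtex-autokey-transcriptions
  (list (cons (regexp-opt '("\"o" "\\\"o" "\\o" "\\oe")) "oe")
        (cons (regexp-opt '("\"O" "\\\"O" "\\O" "\\OE")) "Oe")
        (cons "[`'\"{}#]" "")))

;; Variant 1: set the variable before bibtex.el is loaded.  The
;; `defvar' in bibtex.el then leaves the existing value alone, and
;; the two change-strings options take it as their default.
(setq bibtex-autokey-transcriptions my-bibtex-autokey-transcriptions)

;; Variant 2: if bibtex.el may already be loaded, deal with all
;; three variables explicitly.
(with-eval-after-load 'bibtex
  (setq bibtex-autokey-transcriptions
        my-bibtex-autokey-transcriptions
        bibtex-autokey-name-change-strings
        my-bibtex-autokey-transcriptions
        bibtex-autokey-titleword-change-strings
        my-bibtex-autokey-transcriptions))
```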
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 14:14:02 GMT)
Message #14 received at 39686 <at> debbugs.gnu.org (full text, mbox):
Hi Eli
I do not know the exact status of Emacs 27. Is it OK to install the
patch below into the Emacs 27 branch? The patch fixes a bug that
must have existed for at least about 20 years. So likely it will
not affect many users. But even if the patch has a hidden new bug,
it should not break anything crucial for users either.
Roland
(defvar bibtex-autokey-transcriptions
  (nconc
   (mapcar (lambda (a) (cons (regexp-opt (car a)) (cdr a)))
           '(;; language specific characters
             (("\\aa") . "a")  ; \aa -> a
             (("\\AA") . "A")  ; \AA -> A
             (("\"a" "\\\"a" "\\ae") . "ae")  ; "a,\"a,\ae -> ae
             (("\"A" "\\\"A" "\\AE") . "Ae")  ; "A,\"A,\AE -> Ae
             (("\\i") . "i")  ; \i -> i
             (("\\j") . "j")  ; \j -> j
             (("\\l") . "l")  ; \l -> l
             (("\\L") . "L")  ; \L -> L
             (("\"o" "\\\"o" "\\o" "\\oe") . "oe")  ; "o,\"o,\o,\oe -> oe
             (("\"O" "\\\"O" "\\O" "\\OE") . "Oe")  ; "O,\"O,\O,\OE -> Oe
             (("\"s" "\\\"s" "\\3") . "ss")  ; "s,\"s,\3 -> ss
             (("\"u" "\\\"u") . "ue")  ; "u,\"u -> ue
             (("\"U" "\\\"U") . "Ue")  ; "U,\"U -> Ue
             ;; hyphen, accents
             (("\\-" "\\`" "\\'" "\\^" "\\~" "\\=" "\\." "\\u" "\\v"
               "\\H" "\\t" "\\c" "\\d" "\\b") . "")
             ;; space
             (("~") . " ")))
   ;; more spaces
   '(("[\s\t\n]*\\(?:\\\\\\)?[\s\t\n]+" . " ")
     ;; braces, quotes, concatenation.
     ("[`'\"{}#]" . "")))
  "Alist of (OLD-REGEXP . NEW-STRING) pairs.
Used by the default values of `bibtex-autokey-name-change-strings' and
`bibtex-autokey-titleword-change-strings'. Defaults to translating some
language specific characters to their ASCII transcriptions, and
removing any character accents.")
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 15:02:02 GMT)
Message #17 received at 39686 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 21 Feb 2020 08:13:51 -0600
> From: "Roland Winkler" <winkler <at> gnu.org>
> CC: gojjoe2 <at> googlemail.com
>
> I do not know the exact status of Emacs 27. Is it OK to install the
> patch below into the Emacs 27 branch?
I don't see a patch, just a defvar. Which part(s) of that are
modified, and how?
Also, what commands/features will be affected by this change?
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 15:33:02 GMT)
Message #20 received at 39686 <at> debbugs.gnu.org (full text, mbox):
Hi Roland, thank you for looking into this.
> I suggest that the value of bibtex-autokey-transcriptions should be
> calculated by regexp-opt. That makes the code more readable and it
> fixes this problem for free.
>
> Can you confirm that the following works as it should?
Yes it does, at least for the specific problem I found with "\oe". Thank you very much!
Can you tell me where to insert this redefinition in my init file, while I wait for the next Emacs version? I tried adding it to my own bibtex-hook and to "eval-after-load 'bibtex ...", but it doesn't seem to have any effect in either. At the moment it is called directly in the init file.
Cheers,
Luca
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 15:33:03 GMT)
Message #23 received at 39686 <at> debbugs.gnu.org (full text, mbox):
> Bibtex-mode uses the variable bibtex-autokey-transcriptions to
> define the default values of `bibtex-autokey-name-change-strings'
> and `bibtex-autokey-titleword-change-strings'. So something like
> eval-after-load will work only if it deals with all three variables.
> Say, you first set bibtex-autokey-transcriptions to its new value;
> then you use this to set `bibtex-autokey-name-change-strings' and
> `bibtex-autokey-titleword-change-strings'.
>
> It is probably easier to simply set bibtex-autokey-transcriptions in
> your init file to its new value. Normally, emacs should load
> bibtex-mode later. Then it will use your value of
> bibtex-autokey-transcriptions to set the values of
> `bibtex-autokey-name-change-strings' and
> `bibtex-autokey-titleword-change-strings'.
Thank you for the explanation. I'm doing as you say and everything seems to work smoothly. Please feel free to close this bug report.
Cheers,
Luca
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 16:06:02 GMT)
Message #26 received at 39686 <at> debbugs.gnu.org (full text, mbox):
Correct me if I'm wrong, but it looks like the assumption is that all possible \-sequences have been enumerated in that list. More likely, some kind of anchoring after the match is needed, like word-end ("\\>"). Then the matching order doesn't matter.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 19:51:02 GMT)
Message #29 received at 39686 <at> debbugs.gnu.org (full text, mbox):
On Fri Feb 21 2020 Eli Zaretskii wrote:
> I don't see a patch, just a defvar. Which part(s) of that are
> modified, and how?
>
> Also, what commands/features will be affected by this change?
I am sorry, I had tried to keep things readable for the OP.
A proper patch is attached below.
The patch affects the autokey machinery of bibtex-mode that
calculates keys for new BibTeX entries based on the content of an
entry. The entry point for the autokey machinery is the function
bibtex-generate-autokey. In bibtex.el, this function is called once
by the user command bibtex-clean-entry. No other package inside
emacs relies on this.
diff --git a/lisp/textmodes/bibtex.el b/lisp/textmodes/bibtex.el
index a7be57e..670e763 100644
--- a/lisp/textmodes/bibtex.el
+++ b/lisp/textmodes/bibtex.el
@@ -1006,32 +1006,36 @@ bibtex-autokey-expand-strings
:type 'boolean)
(defvar bibtex-autokey-transcriptions
- '(;; language specific characters
- ("\\\\aa" . "a") ; \aa -> a
- ("\\\\AA" . "A") ; \AA -> A
- ("\\\"a\\|\\\\\\\"a\\|\\\\ae" . "ae") ; "a,\"a,\ae -> ae
- ("\\\"A\\|\\\\\\\"A\\|\\\\AE" . "Ae") ; "A,\"A,\AE -> Ae
- ("\\\\i" . "i") ; \i -> i
- ("\\\\j" . "j") ; \j -> j
- ("\\\\l" . "l") ; \l -> l
- ("\\\\L" . "L") ; \L -> L
- ("\\\"o\\|\\\\\\\"o\\|\\\\o\\|\\\\oe" . "oe") ; "o,\"o,\o,\oe -> oe
- ("\\\"O\\|\\\\\\\"O\\|\\\\O\\|\\\\OE" . "Oe") ; "O,\"O,\O,\OE -> Oe
- ("\\\"s\\|\\\\\\\"s\\|\\\\3" . "ss") ; "s,\"s,\3 -> ss
- ("\\\"u\\|\\\\\\\"u" . "ue") ; "u,\"u -> ue
- ("\\\"U\\|\\\\\\\"U" . "Ue") ; "U,\"U -> Ue
- ;; accents
- ("\\\\`\\|\\\\'\\|\\\\\\^\\|\\\\~\\|\\\\=\\|\\\\\\.\\|\\\\u\\|\\\\v\\|\\\\H\\|\\\\t\\|\\\\c\\|\\\\d\\|\\\\b" . "")
- ;; braces, quotes, concatenation.
- ("[`'\"{}#]" . "")
- ("\\\\-" . "") ; \- ->
- ;; spaces
- ("\\\\?[ \t\n]+\\|~" . " "))
+ (nconc
+ (mapcar (lambda (a) (cons (regexp-opt (car a)) (cdr a)))
+ '(;; language specific characters
+ (("\\aa") . "a") ; \aa -> a
+ (("\\AA") . "A") ; \AA -> A
+ (("\"a" "\\\"a" "\\ae") . "ae") ; "a,\"a,\ae -> ae
+ (("\"A" "\\\"A" "\\AE") . "Ae") ; "A,\"A,\AE -> Ae
+ (("\\i") . "i") ; \i -> i
+ (("\\j") . "j") ; \j -> j
+ (("\\l") . "l") ; \l -> l
+ (("\\L") . "L") ; \L -> L
+ (("\"o" "\\\"o" "\\o" "\\oe") . "oe") ; "o,\"o,\o,\oe -> oe
+ (("\"O" "\\\"O" "\\O" "\\OE") . "Oe") ; "O,\"O,\O,\OE -> Oe
+ (("\"s" "\\\"s" "\\3") . "ss") ; "s,\"s,\3 -> ss
+ (("\"u" "\\\"u") . "ue") ; "u,\"u -> ue
+ (("\"U" "\\\"U") . "Ue") ; "U,\"U -> Ue
+ ;; hyphen, accents
+ (("\\-" "\\`" "\\'" "\\^" "\\~" "\\=" "\\." "\\u" "\\v"
+ "\\H" "\\t" "\\c" "\\d" "\\b") . "")
+ ;; space
+ (("~") . " ")))
+ ;; more spaces
+ '(("[\s\t\n]*\\(?:\\\\\\)?[\s\t\n]+" . " ")
+ ;; braces, quotes, concatenation.
+ ("[`'\"{}#]" . "")))
"Alist of (OLD-REGEXP . NEW-STRING) pairs.
-Used by the default values of `bibtex-autokey-name-change-strings' and
+Used as default values of `bibtex-autokey-name-change-strings' and
`bibtex-autokey-titleword-change-strings'. Defaults to translating some
language specific characters to their ASCII transcriptions, and
-removing any character accents.")
+removing any accent characters.")
(defcustom bibtex-autokey-name-change-strings
bibtex-autokey-transcriptions
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 20:04:01 GMT)
Message #32 received at 39686 <at> debbugs.gnu.org (full text, mbox):
On Fri Feb 21 2020 Mattias Engdegård wrote:
> Correct me if I'm wrong, but it looks like the assumption is that
> all possible \-sequences have been enumerated in that list. More
> likely, some kind of anchoring after the match is needed, like
> word-end ("\\>"). Then the matching order doesn't matter.
I am not sure I understand. The idea in bibtex-autokey-transcriptions
is that each match of OLD-REGEXP is replaced in full by NEW-STRING.
Subexpressions are ignored.
In LaTeX syntax, OLD-REGEXP can appear anywhere inside what LaTeX
considers a word (which may even include spaces). So to make things
more foolproof, it would be necessary to parse the LaTeX code more
carefully. I do not think this effort is needed here, as these
regexps have worked well for at least two decades. The patch fixes
a minor problem of these regexps pointed out by the OP. But
otherwise it preserves their spirit.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 20:44:02 GMT)
Message #35 received at 39686 <at> debbugs.gnu.org (full text, mbox):
21 feb. 2020 kl. 21.03 skrev Roland Winkler <winkler <at> gnu.org>:
> In LaTeX syntax, OLD-REGEXP can appear anywhere inside what LaTeX
> considers a word (which may even include spaces). So to make things
> more foolproof, it would be necessary to parse the LaTeX code more
> carefully. I do not think this effort is needed here, as these
> regexps have worked well for at least two decades. The patch fixes
> a minor problem of these regexps pointed out by the OP. But
> otherwise it preserves their spirit.
In LaTeX you can't just write 'b\oeuf'; it will complain that '\oeuf' is undefined. You have to write 'b\oe uf' or 'b{\oe}uf'. Thus there is a word break at the end. (Accents, like '\"o', are different; there is only a single letter after the '\"'.)
With your table, you replace '\o' with 'oe', but what if the text uses a different \-sequence starting with \o, like '\omega'? After substitution, you would have 'oemega' which wasn't intended.
Safer then to tack on a zero-width assertion, like
"\\\\\\(?:o\\|oe\\)\\>"
for example. Or, if you think it's hard to read,
(rx "\\" (or "o" "oe") word-end)
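The effect of the word-end anchor suggested above can be sketched as follows. `demo-anchored-transcribe` is a hypothetical helper, not part of bibtex.el or of the installed patch. Because `\>` depends on the syntax table in effect, the sketch pins the standard syntax table; the result inside a bibtex-mode buffer may differ if its syntax table classifies these characters differently.

```elisp
;; Hypothetical helper, not part of bibtex.el: replace "\o" and "\oe"
;; with "oe", but only when the control sequence ends at a word end.
(defun demo-anchored-transcribe (string)
  (with-syntax-table (standard-syntax-table)
    (replace-regexp-in-string "\\\\\\(?:o\\|oe\\)\\>" "oe" string t t)))

(demo-anchored-transcribe "b\\oe uf")    ; => "boe uf"
(demo-anchored-transcribe "b{\\oe}uf")   ; => "b{oe}uf"
;; "\omega" is left alone: no word end follows "\o" or "\oe".
(demo-anchored-transcribe "\\omega")     ; => "\\omega"
```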
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 21:04:02 GMT)
Message #38 received at 39686 <at> debbugs.gnu.org (full text, mbox):
On Fri Feb 21 2020 Mattias Engdegård wrote:
> In LaTeX you can't just write 'b\oeuf'; it will complain that
> '\oeuf' is undefined. You have to write 'b\oe uf' or
> 'b{\oe}uf'. Thus there is a word break at the end.
In bibtex-mode, all this is used in the context of what is supposed
to represent names and titles of books and articles. The current
scheme has been in place for at least two decades and nobody has
complained about this. (Note that the current thread is about
something else.) So I assume that what you are concerned about is
so rare (at least in the context of the autokey machinery) that it
is not worth the effort.
But feel free to provide a patch along these lines. However, that
patch should certainly go into master because it modifies the
current scheme.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Fri, 21 Feb 2020 21:33:02 GMT)
Message #41 received at 39686 <at> debbugs.gnu.org (full text, mbox):
21 feb. 2020 kl. 22.03 skrev Roland Winkler <winkler <at> gnu.org>:
> So I assume that what you are concerned about is
> so rare (at least in the context of the autokey machinery) that it
> is not worth the effort.
Fair enough. If you run into such a partial match problem in the future, you now know at least one more way to solve it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39686; Package emacs. (Sat, 22 Feb 2020 09:13:01 GMT)
Message #44 received at 39686 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 21 Feb 2020 13:50:01 -0600
> From: "Roland Winkler" <winkler <at> gnu.org>
> Cc: 39686 <at> debbugs.gnu.org,
> gojjoe2 <at> googlemail.com
>
> On Fri Feb 21 2020 Eli Zaretskii wrote:
> > I don't see a patch, just a defvar. Which part(s) of that are
> > modified, and how?
> >
> > Also, what commands/features will be affected by this change?
>
> I am sorry, I had tried to keep things readable for the OP.
> A proper patch is attached below.
>
> The patch affects the autokey machinery of bibtex-mode that
> calculates keys for new BibTeX entries based on the content of an
> entry. The entry point for the autokey machinery is the function
> bibtex-generate-autokey. In bibtex.el, this function is called once
> by the user command bibtex-clean-entry. No other package inside
> emacs relies on this.
Thanks. I'm fine with installing this on the emacs-27 branch.
Reply sent to "Roland Winkler" <winkler <at> gnu.org>: You have taken responsibility. (Fri, 06 Mar 2020 08:47:02 GMT)
Notification sent to "Roland Winkler" <winkler <at> gnu.org>: bug acknowledged by developer. (Fri, 06 Mar 2020 08:47:02 GMT)
Message #49 received at 39686-done <at> debbugs.gnu.org (full text, mbox):
On Sat Feb 22 2020 Eli Zaretskii wrote:
> > The patch affects the autokey machinery of bibtex-mode that
> > calculates keys for new BibTeX entries based on the content of an
> > entry. The entry point for the autokey machinery is the function
> > bibtex-generate-autokey. In bibtex.el, this function is called once
> > by the user command bibtex-clean-entry. No other package inside
> > emacs relies on this.
>
> Thanks. I'm fine with installing this on the emacs-27 branch.
Done (commit cb1877321b8a04cdb9b890d76d99a9f5a7ed5bce).
I am sorry for the delay.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Apr 2020 11:24:06 GMT)