GNU bug report logs - #78612
imenu list generation failing on some items of similar quality


Package: emacs;

Reported by: mdnorton <mdnorton <at> proton.me>

Date: Wed, 28 May 2025 01:46:03 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

To reply to this bug, email your comments to 78612 AT debbugs.gnu.org.
There is no need to reopen the bug first.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78612; Package emacs. (Wed, 28 May 2025 01:46:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to mdnorton <mdnorton <at> proton.me>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 28 May 2025 01:46:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: mdnorton <mdnorton <at> proton.me>
To: "bug-gnu-emacs <at> gnu.org" <bug-gnu-emacs <at> gnu.org>
Subject: imenu list generation failing on some items of similar quality
Date: Tue, 27 May 2025 17:54:05 +0000
[Message part 1 (text/plain, inline)]
Good morning. I've checked the bug archive for imenu and did not find anything that seemed related to what I'm seeing, so I'm going to try to report this via the email in 34.3. Hopefully I cover everything necessary in 34.3. I cannot use the report function as I don't have Emacs set up with a mail service.

- Emacs version: 30.1
- GNU Emacs 30.1 (build 2, x86_64-w64-mingw32) of 2025-02-23

- Windows OS

What I'm seeing is a failure in imenu to create a correct list for function names using the MATLAB package ([https://github.com/mathworks/Emacs-MATLAB-Mode/](https://github.com/mathworks/Emacs-MATLAB-Mode/issues/42)). I've been working with the current maintainer and I am pretty certain that this is not directly a MATLAB package issue, though I cannot tell if it is an interaction. The regular expression to identify the function name that is the point of the imenu list is pretty gnarly but seems to work with the corner cases required for the syntax.

In any event, this is the behavior I see that doesn't make a lot of sense to me. Given a MATLAB file with the following contents:

% -----------------------------------------------
function foobar1(a, b, c)
end

function foobar2(a, b, c)
end

function gen_pulse_avg_lin_data(a, b, c)
end

function gen_pulse_avg_log_data(a, b, c)
end

function gen_beamsharpened_data(a, b, c)
end

function foobar3(a, b, c)
end

% --------------------------------------------------------

As presented, imenu will fail and only list the last 3 functions. In the following notes, the "real" functions are those that are not named "foobar"; I needed the foobar functions as experimental padding. So:

- If any single character in any of the three "real" function names is deleted, imenu will work and list all functions.
- If any single character in any of the three "real" function names is altered to another character, imenu will fail and list only the last 3 functions.
- If a character is added to the first of the three real function names, imenu will fail and only list the last 3 functions.
- If a character is added to either the 2nd or 3rd real function name, imenu will fail and only list the last 2 functions.

It just so happens that the three gen functions have the same number of characters, so I thought it might be related to imenu-max-item-length; however, that value is set to 60 characters, so I don't think it's relevant. I can't think of any reason why the list creation is so sensitive to the token length here. Playing around with the token length produced results so strange that I cannot even speculate on a bug vector.
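
The length observation is easy to confirm from Emacs itself. A minimal sketch, using only the three names from the file above; the helper name `my/name-lengths` is hypothetical, and 60 is the value of imenu-max-item-length reported above:

```elisp
(require 'imenu)

(defun my/name-lengths (names)
  "Return the length in characters of each string in NAMES."
  (mapcar #'length names))

;; All three "real" function names are the same length, and that
;; length is well below the 60-character limit mentioned above.
(my/name-lengths '("gen_pulse_avg_lin_data"
                   "gen_pulse_avg_log_data"
                   "gen_beamsharpened_data"))
;; => (22 22 22)

;; Inspect the limit actually in effect in the current session.
imenu-max-item-length
```

Evaluating both forms with M-: or in the scratch buffer shows whether truncation could plausibly be involved; with 22-character names and a limit of 60 it should not be.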

In any event, hope this helps. It's a weird one to me.

Best regards,
Mark Norton

Sent with [Proton Mail](https://proton.me/mail/home) secure email.
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78612; Package emacs. (Wed, 28 May 2025 12:10:02 GMT) Full text and rfc822 format available.

Message #8 received at 78612 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mdnorton <mdnorton <at> proton.me>
Cc: 78612 <at> debbugs.gnu.org
Subject: Re: bug#78612: imenu list generation failing on some items of similar
 quality
Date: Wed, 28 May 2025 15:08:42 +0300
> Date: Tue, 27 May 2025 17:54:05 +0000
> From:  mdnorton via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> * Emacs version: 30.1
> * GNU Emacs 30.1 (build 2, x86_64-w64-mingw32) of 2025-02-23
> * Windows OS
> 
> What I'm seeing is a failure in imenu to create a correct list for function names using the MATLAB package
> (https://github.com/mathworks/Emacs-MATLAB-Mode/).  I've been working with the current maintainer and I
> am pretty certain that this is not directly a MATLAB package issue, though I cannot tell if it is an interaction.
> The regular expression to identify the function name that is the point of the imenu list is pretty gnarly but
> seems to work with the corner cases required for the syntax.
> 
> In any event, this is the behavior I see that doesn't make a lot of sense to me.  Given a MATLAB file with the
> following contents:
> 
> % -----------------------------------------------
> function foobar1(a, b, c)
> end
> 
> function foobar2(a, b, c)
> end
> 
> function gen_pulse_avg_lin_data(a, b, c)
> end
> 
> function gen_pulse_avg_log_data(a, b, c)
> end
> 
> function gen_beamsharpened_data(a, b, c)
> end
> 
> function foobar3(a, b, c)
> end
> % --------------------------------------------------------
> 
> As presented, imenu will fail and only list the last 3 functions.

I cannot reproduce this: in my testing imenu finds all of the
functions in this file.  But since you haven't presented a complete
recipe for reproducing the problem, I cannot be sure I did the same as
you do.

Here's what I did:

  emacs -Q
  M-: (add-to-list 'load-path "/path/to/Emacs-MATLAB-Mode/") RET
  M-x load-file RET /path/to/Emacs-MATLAB-Mode/matlab-autoload.el RET
  C-x C-f testimenu.m RET
  M-x imenu RET foobar3 RET
  M-x imenu-add-menubar-index RET
  Click mouse on the menubar's Imenu item

(where testimenu.m is the file with the contents you posted above).
At this point I see all the functions in the file in the menu that
Emacs drops down.

If you run the same recipe as I did, do you still see the problem?
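
The same check can also be run non-interactively, which makes it easier to compare results between two checkouts of the package. What follows is a minimal sketch, not a supported interface: the paths are the same placeholders as in the recipe above, the script name `list-imenu.el` is hypothetical, and `imenu--make-index-alist` is an internal imenu function whose behavior may change between Emacs versions.

```elisp
;; Run with:  emacs -Q --batch -l list-imenu.el
;; Paths below are placeholders, exactly as in the interactive recipe.
(add-to-list 'load-path "/path/to/Emacs-MATLAB-Mode/")
(load-file "/path/to/Emacs-MATLAB-Mode/matlab-autoload.el")
(require 'imenu)

(with-current-buffer (find-file-noselect "testimenu.m")
  ;; Build the index for this buffer and print one entry name per line.
  ;; The list normally starts with a "*Rescan*" pseudo-entry.
  (dolist (entry (imenu--make-index-alist t))
    (princ (format "%s\n" (car entry)))))
```

If every function in testimenu.m is printed, the index is complete for that checkout; a shorter list reproduces the problem without any mouse interaction.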




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78612; Package emacs. (Thu, 29 May 2025 13:18:01 GMT) Full text and rfc822 format available.

Message #11 received at 78612 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mdnorton <mdnorton <at> proton.me>
Cc: 78612 <at> debbugs.gnu.org
Subject: Re: bug#78612: imenu list generation failing on some items of similar
 quality
Date: Thu, 29 May 2025 16:16:52 +0300
[Message part 1 (text/plain, inline)]
[Please use Reply All to reply, to keep the bug tracker CC'ed.]

> Date: Thu, 29 May 2025 12:42:07 +0000
> From: mdnorton <mdnorton <at> proton.me>
> 
> Thank you for looking into this.  I performed the steps as indicated (with my own variation -- I called the file foo2.m for example).  I do NOT get your results.
> 
> Attached a snapshot of the imenu entry that shows the results, and you can see only the 3 last entries.
> 
> At this point, I do think this is probably an interaction, though I don't know exactly how much to attribute to imenu and how much to matlab-mode.  While John Ciolfi and I were swapping debug details on this issue in the Emacs-MATLAB-Mode GitHub repo (https://github.com/mathworks/Emacs-MATLAB-Mode/issues/42) he did note that the particularly gnarly regex had a backtracking problem due to the way MATLAB handles line extensions (a three-character sequence of periods, "...") and he'd gone with a somewhat simpler one that doesn't handle every case.  I'm still at the commit I'm using currently as I've not had a chance to re-pull the repository and experiment.
> 
> Look-backs in a regex state machine are kind of notorious for having the potential for going off the rails.  However, should a malfunctioning regex be able to eliminate entries in imenu?  I suppose that's the bug that I believe might be in imenu.  Perhaps imenu is entirely at the mercy of the imenu-generic-expression regex, and if that regex has problems, then imenu's behavior is undefined.  If that's the case, then that's the way it's been written and it's upon the package developer to create a better working regex.  However my speculation (because I am not great at Elisp and have not dug into the code here for imenu) is that imenu operates serially through the buffer characters applying the regex, and then adding list items as it finds them.  So, should a malfunctioning regex be capable of removing PRIOR list items imenu has discovered (assuming my speculation on how it works is correct)?  How should imenu behave when the regex has issues?
> 
> In any event, I will update my clone of the Emacs-MATLAB-Mode repository and then I'll have John's newest, simpler regex, which will work for my purposes (I don't use the crazy cases of line continuation he was trying to cover).  So maybe this is just a case where imenu is at the mercy of a regex, and gracefully recovering from that is difficult or impossible to do.  Just figured I'd report and alert in case there was a fundamental behavior issue, and if it's a case that code cannot anticipate and recover from and that's accepted, then that's alright.  I know it's impossible to foolproof everything!
> 
> Details about this particular regex at this SHA below:
> 
> The SHA that I'm at is eea387.  Just for reference, this is the contents of imenu-generic-expression for this commit.  Note, there are literal ^M characters in there, so those have been transcribed to string "^M" which isn't really the same thing, but required for the email environment.  
> 
> Value in #<buffer foo2.m>
> ((nil
>   "^[[:blank:]]*function\\>\\(?:\\(?:[]\\[a-zA-Z0-9_,[:blank:]]*\\(?:\\.\\.\\.[[:blank:]]*\\(?:%[^^M\n]*\\)?^M?\n\\)?\\)+[[:blank:]]*=\\)?\\(?:[[:blank:]]*\\(?:\\.\\.\\.[[:blank:]]*\\(?:%[^^M\n]*\\)?^M?\n\\)?\\)*[\\.[:space:]\n^M]*\\([a-zA-Z][a-zA-Z0-9_]+\\)"
>   1))
> 
> And here is the bit of matlab.el at this commit that creates this (because it's a bit easier to read than the non-evaluated string regex, though even the evaluated string regex is pretty rough).  There are literal ^M's here too; however, they just show up as whitespace in the Elisp.
> 
> ;; -----------------
> ;; | Imenu support |
> ;; -----------------
> ;; Example functions we match, f0, f1, f2, f3, f4, f5, F6, g4
> ;;    function f0
> ;;    function...
> ;;        a = f1
> ;;    function f2
> ;;    function x = ...
> ;;            f3
> ;;    function [a, ...
> ;;              b ] ...
> ;;              = ...
> ;;              f4(c)
> ;;    function a = F6
> ;;    function [ ...
> ;;        a, ... % comment for a
> ;;        b  ... % comment for b
> ;;             ] ...
> ;;             = ...
> ;;             g4(c)
> ;;
> (defvar matlab-imenu-generic-expression
>   ;; Using concat to increase indentation and improve readability
>   `(,(list nil (concat
>                 "^[[:blank:]]*"
>                 "function\\>"
> 
>                 ;; Optional return args, function ARGS = NAME. Capture the 'ARGS ='
>                 (concat "\\(?:"
> 
>                         ;; ARGS can span multiple lines
>                         (concat "\\(?:"
>                                 ;; valid ARGS chars: "[" "]" variables "," space, tab
>                                 "[]\\[a-zA-Z0-9_,[:blank:]]*"
>                                 ;; Optional continue to next line "..." or "... % comment"
>                                 "\\(?:" matlab--ellipsis-to-eol-re "\\)?"
>                                 "\\)+")
> 
>                         ;; ARGS must be preceeded by the assignment operator, "="
>                         "[[:blank:]]*="
> 
>                         "\\)?")
> 
>                 ;; Optional space/tabs or '...' continuation
>                 (concat "\\(?:"
>                         "[[:blank:]]*"
>                         "\\(?:" matlab--ellipsis-to-eol-re "\\)?"
>                         "\\)*")
> 
>                 "[\\.[:space:]\n\r]*"
>                 "\\([a-zA-Z][a-zA-Z0-9_]+\\)" ;; function NAME
>                 )
>            1))
>   "Regexp to find function names in *.m files for `imenu'.")

I used the latest commit from the 'default' branch of the repository,
which is one commit ahead of yours.  So maybe the recent changes to
the package solved this problem.  Please try the latest Git.
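
On the question of scan order raised in the quoted message: judging from imenu.el, the generic scanner (`imenu--generic-function`, an internal function) starts at the end of the buffer and searches backward with `re-search-backward`, and it stops as soon as a search finds nothing. That would be consistent with entries near the end of the file surviving while earlier ones go missing, though it does not by itself establish what went wrong with the older regexp. A minimal sketch for experimenting with this, using a deliberately simple, hypothetical regexp (not the one from matlab.el) and hypothetical `my/` names:

```elisp
(require 'imenu)

(defvar my/simple-function-re
  "^[[:blank:]]*function[[:blank:]]+\\([a-zA-Z][a-zA-Z0-9_]*\\)"
  "Hypothetical simple regexp matching a function keyword and a name.
It ignores return values and line continuations entirely.")

(defun my/imenu-names (text)
  "Return the names the generic imenu scanner finds in TEXT."
  (with-temp-buffer
    (insert text)
    ;; The argument has the same (TITLE REGEXP INDEX) shape as an
    ;; element of `imenu-generic-expression'.
    (mapcar #'car
            (imenu--generic-function
             `((nil ,my/simple-function-re 1))))))

(my/imenu-names
 (concat "function foobar1(a, b, c)\nend\n\n"
         "function gen_pulse_avg_lin_data(a, b, c)\nend\n"))
```

Swapping the regexp for the value of imenu-generic-expression from a given checkout, and the text for the contents of the test file, gives a quick way to see which names the scanner returns without going through the menu.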

[image.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78612; Package emacs. (Thu, 29 May 2025 17:58:02 GMT) Full text and rfc822 format available.

Message #14 received at 78612 <at> debbugs.gnu.org (full text, mbox):

From: mdnorton <mdnorton <at> proton.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78612 <at> debbugs.gnu.org
Subject: Re: bug#78612: imenu list generation failing on some items of similar
 quality
Date: Thu, 29 May 2025 17:57:17 +0000
Yes, the latest version seems to do what is necessary.  That solves the initial issue, and I suppose imenu is just deeply reliant on a good regex for predictable behavior.  Thanks for considering this problem.


Sent with Proton Mail secure email.





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Fri, 30 May 2025 07:10:02 GMT) Full text and rfc822 format available.

Notification sent to mdnorton <mdnorton <at> proton.me>:
bug acknowledged by developer. (Fri, 30 May 2025 07:10:02 GMT) Full text and rfc822 format available.

Message #19 received at 78612-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mdnorton <mdnorton <at> proton.me>
Cc: 78612-done <at> debbugs.gnu.org
Subject: Re: bug#78612: imenu list generation failing on some items of similar
 quality
Date: Fri, 30 May 2025 10:09:04 +0300
> Date: Thu, 29 May 2025 17:57:17 +0000
> From: mdnorton <mdnorton <at> proton.me>
> Cc: 78612 <at> debbugs.gnu.org
> 
> Yes, the latest version seems to do what is necessary.  That solves the initial issue, and I suppose imenu is just deeply reliant on a good regex for predictable behavior.  Thanks for considering this problem.

Thanks, I'm therefore closing this bug.




This bug report was last modified 19 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.