GNU bug report logs -
#74132
31.0.50; thing-at-pt, ffap and Github markdown
Previous Next
To reply to this bug, email your comments to 74132 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 31 Oct 2024 10:43:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Madhu <enometh <at> meer.net>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Thu, 31 Oct 2024 10:43:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Consider the following text as is typically found on README.md
```
[](https://github.com/raysan5/raylib/releases)
```
If the point is say at "r" at "raylib/releases", invoking
(ffap-url-at-point) fails. this eventually calls
thing-at-point-bounds-of-url-at-point, which has hardcoded behaviour
to, skip over "allowed characters" backwards to find the beginning of
the bound. here it it finds the space character (in "Release
Downloads") and the whole thing fails.
This particular failure can be addressed by supplying the lim
paramater to the skip-chars-backward, as shown in the attached
patch.
does this look like a problem which ought to be solved? and is this
appropriate? (I was going to post on emacs-devel but decided to post
to the bug list instead) -- Best Regards, Madhu
[0001-lisp-thingatpt.el-recognize-urls-better-in-markdown-.patch (text/x-patch, inline)]
From 5971b7c10d7c38d540fdf278a0cd559c96b10ed2 Mon Sep 17 00:00:00 2001
From: Madhu <enometh <at> net.meer>
Date: Thu, 31 Oct 2024 15:40:42 +0530
Subject: [PATCH] lisp/thingatpt.el: recognize urls better in markdown text
* lisp/thingatpt.el: (thing-at-point-bounds-of-url-at-point): supply a
LIM when calling (skip-chars-backward allowed-chars), which is the
position where `thing-at-point-beginning-of-url-regexp' matches
backwards
problematic url e.g.
```
[](https://github.com/raysan5/raylib/releases)
```
If the point is in the the second url, skip-chars-backwards goes to the
space (between s and D) and `ffap-url-at-point' eventually fails.
but if we supply a limit with a left anchor, we work around it.
---
lisp/thingatpt.el | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
index 3cfd3905701..0b8e28af5b9 100644
--- a/lisp/thingatpt.el
+++ b/lisp/thingatpt.el
@@ -502,9 +502,14 @@ thing-at-point-bounds-of-url-at-point
(let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
(skip-before "^[0-9a-zA-Z]")
(skip-after ":;.,!?'")
+ (hard-beg (and thing-at-point-beginning-of-url-regexp
+ (save-excursion
+ (re-search-backward
+ thing-at-point-beginning-of-url-regexp nil t)
+ (point))))
(pt (point))
(beg (save-excursion
- (skip-chars-backward allowed-chars)
+ (skip-chars-backward allowed-chars hard-beg)
(skip-chars-forward skip-before pt)
(point)))
(end (save-excursion
--
2.46.0.27.gfa3b914457
Added tag(s) patch.
Request was from
Stefan Kangas <stefankangas <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Fri, 01 Nov 2024 01:26:04 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Fri, 01 Nov 2024 05:39:02 GMT)
Full text and
rfc822 format available.
Message #10 received at 74132 <at> debbugs.gnu.org (full text, mbox):
There was a typo in the patch I posted. It should instead look like this
```
diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
--- a/lisp/thingatpt.el
+++ b/lisp/thingatpt.el
@@ -502,9 +502,15 @@ thing-at-point-bounds-of-url-at-point
(let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
(skip-before "^[0-9a-zA-Z]")
(skip-after ":;.,!?'")
+ (hard-beg (and thing-at-point-beginning-of-url-regexp
+ (save-excursion
+ (and
+ (re-search-backward
+ thing-at-point-beginning-of-url-regexp nil t)
+ (point)))))
(pt (point))
(beg (save-excursion
- (skip-chars-backward allowed-chars)
+ (skip-chars-backward allowed-chars hard-beg)
(skip-chars-forward skip-before pt)
(point)))
(end (save-excursion
```
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 02 Nov 2024 22:06:02 GMT)
Full text and
rfc822 format available.
Message #13 received at 74132 <at> debbugs.gnu.org (full text, mbox):
Madhu <enometh <at> meer.net> writes:
> There was a typo in the patch I posted. It should instead look like this
Could you please resend the amended patch as an attachment?
Also, how about having some tests for this?
> ```
> diff --git a/lisp/thingatpt.el b/lisp/thingatpt.el
> --- a/lisp/thingatpt.el
> +++ b/lisp/thingatpt.el
> @@ -502,9 +502,15 @@ thing-at-point-bounds-of-url-at-point
> (let* ((allowed-chars "--:=&?$+@-Z_[:alpha:]~#,%;*()!'[]")
> (skip-before "^[0-9a-zA-Z]")
> (skip-after ":;.,!?'")
> + (hard-beg (and thing-at-point-beginning-of-url-regexp
> + (save-excursion
> + (and
> + (re-search-backward
> + thing-at-point-beginning-of-url-regexp nil t)
> + (point)))))
> (pt (point))
> (beg (save-excursion
> - (skip-chars-backward allowed-chars)
> + (skip-chars-backward allowed-chars hard-beg)
> (skip-chars-forward skip-before pt)
> (point)))
> (end (save-excursion
> ```
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 09 Nov 2024 10:30:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 74132 <at> debbugs.gnu.org (full text, mbox):
> Date: Thu, 31 Oct 2024 16:06:49 +0530 (IST)
> From: Madhu <enometh <at> meer.net>
>
> Consider the following text as is typically found on README.md
>
> ```
> [](https://github.com/raysan5/raylib/releases)
> ```
>
> If the point is say at "r" at "raylib/releases", invoking
> (ffap-url-at-point) fails. this eventually calls
> thing-at-point-bounds-of-url-at-point, which has hardcoded behaviour
> to, skip over "allowed characters" backwards to find the beginning of
> the bound. here it it finds the space character (in "Release
> Downloads") and the whole thing fails.
>
> This particular failure can be addressed by supplying the lim
> paramater to the skip-chars-backward, as shown in the attached
> patch.
>
> does this look like a problem which ought to be solved? and is this
> appropriate? (I was going to post on emacs-devel but decided to post
> to the bug list instead) -- Best Regards, Madhu
>
> >From 5971b7c10d7c38d540fdf278a0cd559c96b10ed2 Mon Sep 17 00:00:00 2001
> From: Madhu <enometh <at> net.meer>
> Date: Thu, 31 Oct 2024 15:40:42 +0530
> Subject: [PATCH] lisp/thingatpt.el: recognize urls better in markdown text
>
> * lisp/thingatpt.el: (thing-at-point-bounds-of-url-at-point): supply a
> LIM when calling (skip-chars-backward allowed-chars), which is the
> position where `thing-at-point-beginning-of-url-regexp' matches
> backwards
>
> problematic url e.g.
> ```
> [](https://github.com/raysan5/raylib/releases)
> ```
> If the point is in the the second url, skip-chars-backwards goes to the
> space (between s and D) and `ffap-url-at-point' eventually fails.
> but if we supply a limit with a left anchor, we work around it.
What will this do to URLs such as
http://web.archive.org/web/20240221082647/https://www.imdb.com/
? More generally, to any URL that has another URL embedded in it?
I'm not sure I see how to resolve this dilemma. Stefan, any ideas?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 09 Nov 2024 15:53:01 GMT)
Full text and
rfc822 format available.
Message #19 received at 74132 <at> debbugs.gnu.org (full text, mbox):
> What will this do to URLs such as
>
> http://web.archive.org/web/20240221082647/https://www.imdb.com/
Depends where point is: if it's after the `https`, then you get the
"sub-URL" and if it's on or before the `https` then you get the whole URL.
> I'm not sure I see how to resolve this dilemma. Stefan, any ideas?
"url at point" is inherently heuristic, so I'm not too worried.
But I do very much agree with Stefan that we need tests, because it's
all too easy to run around in circles otherwise, fixing the heuristic to
handle case A but breaking case B along the way.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 09 Nov 2024 16:34:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 74132 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Madhu <enometh <at> meer.net>, 74132 <at> debbugs.gnu.org
> Date: Sat, 09 Nov 2024 10:52:06 -0500
>
> > What will this do to URLs such as
> >
> > http://web.archive.org/web/20240221082647/https://www.imdb.com/
>
> Depends where point is: if it's after the `https`, then you get the
> "sub-URL" and if it's on or before the `https` then you get the whole URL.
Exactly. This could be considered a bug, because the actual URL is
the entire thing.
> > I'm not sure I see how to resolve this dilemma. Stefan, any ideas?
>
> "url at point" is inherently heuristic, so I'm not too worried.
> But I do very much agree with Stefan that we need tests, because it's
> all too easy to run around in circles otherwise, fixing the heuristic to
> handle case A but breaking case B along the way.
I'm okay with adding tests, of course, but I'm not sure which of the
two behaviors leave you "not too worried": the current or the new one
after the proposed change. And why.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 09 Nov 2024 17:00:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 74132 <at> debbugs.gnu.org (full text, mbox):
>> > What will this do to URLs such as
>> > http://web.archive.org/web/20240221082647/https://www.imdb.com/
>> Depends where point is: if it's after the `https`, then you get the
>> "sub-URL" and if it's on or before the `https` then you get the whole URL.
> Exactly. This could be considered a bug, because the actual URL is
> the entire thing.
We could refine our heuristic to as to keep looking backward when the
apparent beginning of the URL is immediately preceded by a /, of course,
but I'm not sure it's worth the trouble.
AFAICT any behavior we come up with will have such cases.
>> > I'm not sure I see how to resolve this dilemma. Stefan, any ideas?
>> "url at point" is inherently heuristic, so I'm not too worried.
>> But I do very much agree with Stefan that we need tests, because it's
>> all too easy to run around in circles otherwise, fixing the heuristic to
>> handle case A but breaking case B along the way.
> I'm okay with adding tests, of course, but I'm not sure which of the
> two behaviors leave you "not too worried": the current or the new one
> after the proposed change. And why.
The [...](...) case mentioned by Madhu is a fairly common one IME, so
I'm in favor of fixing it. As for the behavior in your example, to the
extent that the users can control which URL they get (depending on where
they place point (or click)), I'm OK with either behavior.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Sat, 09 Nov 2024 18:00:01 GMT)
Full text and
rfc822 format available.
Message #28 received at 74132 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: enometh <at> meer.net, 74132 <at> debbugs.gnu.org
> Date: Sat, 09 Nov 2024 11:59:44 -0500
>
> >> > What will this do to URLs such as
> >> > http://web.archive.org/web/20240221082647/https://www.imdb.com/
> >> Depends where point is: if it's after the `https`, then you get the
> >> "sub-URL" and if it's on or before the `https` then you get the whole URL.
> > Exactly. This could be considered a bug, because the actual URL is
> > the entire thing.
>
> We could refine our heuristic to as to keep looking backward when the
> apparent beginning of the URL is immediately preceded by a /, of course,
> but I'm not sure it's worth the trouble.
The problem is that AFAIU '/' is not the only such character. It
could also be '=', I think (as in query URLs), and perhaps some
others.
> AFAICT any behavior we come up with will have such cases.
Yes, which is why I said I didn't know how to solve this.
> The [...](...) case mentioned by Madhu is a fairly common one IME, so
> I'm in favor of fixing it.
Not universally so, IME. It is common in Markdown files and perhaps
also in Org. So maybe this should be fine-tuned by major modes?
Severity set to 'wishlist' from 'normal'
Request was from
Stefan Kangas <stefankangas <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Thu, 02 Jan 2025 02:11:03 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 13 Feb 2025 10:18:02 GMT)
Full text and
rfc822 format available.
Message #33 received at 74132 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> So maybe this should be fine-tuned by major modes?
One complication is that you sometimes run into Markdown in other modes
too, for example Markdown inside of Python docstrings, which is quite
common. So even if we had such a mechanism, I still think there is no
way around trying to handle at least the most common cases by default.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 13 Feb 2025 10:18:02 GMT)
Full text and
rfc822 format available.
Message #36 received at 74132 <at> debbugs.gnu.org (full text, mbox):
Stefan Kangas <stefankangas <at> gmail.com> writes:
> Madhu <enometh <at> meer.net> writes:
>
>> There was a typo in the patch I posted. It should instead look like this
>
> Could you please resend the amended patch as an attachment?
>
> Also, how about having some tests for this?
Ping!
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 13 Feb 2025 10:53:02 GMT)
Full text and
rfc822 format available.
Message #39 received at 74132 <at> debbugs.gnu.org (full text, mbox):
I'd like to update my copy of emacs before reworking the patch, and
hope to get around to both by monday. But if the problem and answer
are clear enough, maybe you go ahead with a solution?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 13 Feb 2025 11:19:01 GMT)
Full text and
rfc822 format available.
Message #42 received at 74132 <at> debbugs.gnu.org (full text, mbox):
Madhu <enometh <at> meer.net> writes:
> I'd like to update my copy of emacs before reworking the patch, and
> hope to get around to both by monday. But if the problem and answer
> are clear enough, maybe you go ahead with a solution?
I don't have the bandwidth to take this on right now myself, sorry, so I
think we can wait for your patch. There's no rush with this, so please
take your time.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Tue, 18 Feb 2025 01:20:02 GMT)
Full text and
rfc822 format available.
Message #45 received at 74132 <at> debbugs.gnu.org (full text, mbox):
* Stefan Kangas <stefankangas <at> gmail.com> <CADwFkm=8ewN4B0b2bWnYtNJHNUxXLK-ePMmz8SxJQdztPaAZFg <at> mail.gmail.com>
Wrote on Thu, 13 Feb 2025 02:17:05 -0800
>> Also, how about having some tests for this?
I'm not sure about ert and am dealing with it for the first time. I
added the following:
```
--- a/test/lisp/thingatpt-tests.el
+++ b/test/lisp/thingatpt-tests.el
@@ -67,6 +67,11 @@ thing-at-point-test-data
("This (http://example.com/a(b))" 30 url "http://example.com/a(b)")
("This (http://example.com/a(b))" 5 url nil)
("http://example.com/ab)c" 4 url "http://example.com/ab)c")
+ ;; Github markdown
+ ("[](https://github.com/raysan5/raylib/releases)"
+ 85 "https://img.shields.io/github/downloads/raysan5/raylib/total")
+ ("[](https://github.com/raysan5/raylib/releases)"
+ 113 "https://github.com/raysan5/raylib/releases")
;; URL markup, lacking schema
("<url:foo <at> example.com>" 1 url "mailto:foo <at> example.com")
("<url:ftp.example.net/abc/>" 1 url "ftp://ftp.example.net/abc/")
```
But I don't think we need fill the code base with github url. I will
rework it with foobar and example.com in the strings. If you have
suggestions or any other things this patch should test for please let
me know.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Tue, 18 Feb 2025 01:27:02 GMT)
Full text and
rfc822 format available.
Message #48 received at submit <at> debbugs.gnu.org (full text, mbox):
* Madhu <20250218.064925.1530283513394051502.enometh <at> meer.net> :
Wrote on Tue, 18 Feb 2025 06:49:25 +0530 (IST):
> --- a/test/lisp/thingatpt-tests.el
> +++ b/test/lisp/thingatpt-tests.el
This patch is nonsense. I pasted it by mistake, please ignore.
One cannot use `thing-at-point-test-data' to test this change. because
of the additional setup. I think this should be a new test under
test/lisp/ffap-tests.el.
> But I don't think we need fill the code base with github url. I will
> rework it with foobar and example.com in the strings. If you have
> suggestions or any other things this patch should test for please let
> me know.
This part is the same
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#74132
; Package
emacs
.
(Thu, 20 Feb 2025 09:21:02 GMT)
Full text and
rfc822 format available.
Message #51 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I'm including the fixed patch (hand edited with a test, the test
severely needs review, thanks
[0001-lisp-thingatpt.el-recognize-urls-better-in-markdown-.patch (text/x-patch, attachment)]
This bug report was last modified 116 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.