GNU bug report logs - #6345
css-mode `css-extract-keyword-list' does not actually [PATCH]

Package: emacs;

Reported by: MON KEY <monkey <at> sandpframing.com>

Date: Thu, 3 Jun 2010 18:02:02 UTC

Severity: minor

Tags: patch

Fixed in version 25.1

Done: Simen Heggestøyl <simenheg <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6345 in the body.
You can then email your comments to 6345 AT debbugs.gnu.org in the normal way.


Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6345; Package emacs. (Thu, 03 Jun 2010 18:02:02 GMT)

Acknowledgement sent to MON KEY <monkey <at> sandpframing.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 03 Jun 2010 18:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: MON KEY <monkey <at> sandpframing.com>
To: bug-gnu-emacs <at> gnu.org
Subject: css-mode `css-extract-keyword-list' does not actually [PATCH]
Date: Thu, 3 Jun 2010 14:01:06 -0400
`css-extract-keyword-list' does not actually [PATCH]

In function `css-extract-keyword-list' the search for "Appendix
H. Index" fails, e.g. this form:

   (search-backward "Appendix H. Index")

when used to search the contents of this URL:

 "http://www.w3.org/TR/REC-CSS2/css2.txt"

which is dated: W3C Candidate Recommendation 08 September 2009

Returns this message:

 css-extract-keyword-list: Search failed: "Appendix H. Index"

It appears this function was originally supplied to scrape CSS
keywords as per the commented code in: lisp/textmodes/css-mode.el

,----
| (css-extract-keyword-list
|   '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
|     (at . "^ +\\* @\\([^ \n,]+\\)")
|     (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
|     (media . "^ +\\* '\\([^ '\n]+\\)' media group")
|    (property . "^ +\\* '\\([^ '\n]+\\)',")))
`----

However, W3C has gone behind Stefan's back and changed the Appendix
enumeration without asking his permission first :)

"Appendix H" is now "Appendix I".

Compare the version scraped (presumably):

 (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/indexlist.html')
 (URL `http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt')

with the current version:

 (URL `http://www.w3.org/TR/CSS2/indexlist.html')
 (URL `http://www.w3.org/TR/CSS2/css2.txt')

The following regexp may be more robust: it appears to work for
either the older version or the latest version, and leaves room for W3C
to continue adding appendices J-M:

 (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")
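
For illustration only, here is a sketch of the same regexp spelled with
ASCII-only escapes and tried against made-up text. The names
`my-css-index-heading-re' and `my-css-index-heading-p' are hypothetical,
and the sample strings only imitate the layout of css2.txt. One
assumption worth flagging: in a Lisp string "\." reads as a plain ".",
so the sketch writes "\\." to match a literal period.

```elisp
;; Sketch only: names and sample text are assumptions, not part of
;; css-mode.el or of the W3C file.
(defconst my-css-index-heading-re
  ;; A rule of 60-79 underscores or U+2501 characters, a newline,
  ;; some whitespace, then "Appendix <letter>. Index".
  "[_\u2501]\\{60,79\\}\n[[:space:]]+Appendix [A-M]\\. Index"
  "Hypothetical ASCII-only spelling of the index heading regexp.")

(defun my-css-index-heading-p (text)
  "Return non-nil if TEXT contains the index appendix heading."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    ;; The third argument t makes a failed search return nil
    ;; instead of signaling an error.
    (re-search-backward my-css-index-heading-re nil t)))
```

Evaluating
(my-css-index-heading-p (concat (make-string 70 ?_) "\n   Appendix I. Index\n"))
should return non-nil, and the same call with "Appendix H. Index" should
match as well.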

This said, `css-extract-keyword-list' is now borking on regexps in
these conses:

 (css-extract-keyword-list
  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
    (at . "^ +\\* @\\([^ \n,]+\\)")
    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
    (property . "^ +\\* '\\([^ '\n]+\\)',")))

and seems to be failing because `url-insert-file-contents' relies on
`decode-coding-inserted-region', which frobs the asterisk `*' (char
#x2a) into a bullet `•' (char #x2022) -- at least on my system.

If we substitute occurrences of "\\*" with "[*•]" (i.e. "[\x2a\x2022]")
the following regexps now seem to work correctly:

 (pp (css-extract-keyword-list
      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))
     (current-buffer))
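
A minimal sketch of the substitution in isolation, run on an invented
string rather than on the fetched file. The helper
`my-css-collect-matches' is hypothetical, and the sample lines are an
assumption about what the index entries look like, not text copied from
css2.txt.

```elisp
;; Sketch only: helper name and sample text are assumptions.
(defun my-css-collect-matches (regexp text)
  "Return the group-1 matches of REGEXP in TEXT, in order."
  (let ((start 0)
        (found nil))
    (while (string-match regexp text start)
      (push (match-string 1 text) found)
      ;; Continue after the current match; every match is non-empty,
      ;; so the loop always advances.
      (setq start (match-end 0)))
    (nreverse found)))

(defconst my-css-sample-index
  ;; One entry with an asterisk, one with a bullet (U+2022),
  ;; and one at-rule entry that the pseudo regexp should skip.
  "     * :active, 1\n     \u2022 :hover, 2\n     * @media, 3\n"
  "Made-up index lines for trying out the regexps.")
```

With these definitions,
(my-css-collect-matches "^ +[*\u2022] :\\([^ \n,]+\\)" my-css-sample-index)
picks up both pseudo entries, whereas the original "^ +\\* :..." form
only sees the line that still has an asterisk.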


The following was diffed against Bazaar revision 100231:

;;; ==============================

*** ediff3753M5g	2010-06-03 09:43:04.000000000 -0400
--- lisp/textmodes/css-mode.el	2010-06-03 09:42:43.000000000 -0400
***************
*** 41,49 ****

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/REC-CSS2/css2.txt")
      (goto-char (point-max))
!     (search-backward "Appendix H. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
--- 41,49 ----

  (defun css-extract-keyword-list (res)
    (with-temp-buffer
!     (url-insert-file-contents "http://www.w3.org/TR/2008/REC-CSS2-20080411/css2.txt")
      (goto-char (point-max))
!     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")
      (forward-line)
      (delete-region (point-min) (point))
      (let ((result nil)
***************
*** 115,125 ****

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;  '((pseudo . "^ +\\* :\\([^ \n,]+\\)")
! ;;    (at . "^ +\\* @\\([^ \n,]+\\)")
! ;;    (descriptor . "^ +\\* '\\([^ '\n]+\\)' (descriptor)")
! ;;    (media . "^ +\\* '\\([^ '\n]+\\)' media group")
! ;;    (property . "^ +\\* '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
--- 115,125 ----

  ;; Extraction was done with:
  ;; (css-extract-keyword-list
! ;;      '((pseudo . "^ +[\x2a\x2022] :\\([^ \n,]+\\)")
! ;;        (at . "^ +[\x2a\x2022] @\\([^ \n,]+\\)")
! ;;        (descriptor . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' (descriptor)")
! ;;        (media . "^ +[\x2a\x2022] '\\([^ '\n]+\\)' media group")
! ;;        (property . "^ +[\x2a\x2022] '\\([^ '\n]+\\)',")))

  (defconst css-pseudo-ids
    '("active" "after" "before" "first" "first-child" "first-letter" "first-line"
[css-mode.diff-2010-06-03 (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#6345; Package emacs. (Tue, 10 Apr 2012 11:13:02 GMT)

Message #8 received at 6345 <at> debbugs.gnu.org:

From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
To: MON KEY <monkey <at> sandpframing.com>
Cc: 6345 <at> debbugs.gnu.org
Subject: Re: bug#6345: css-mode `css-extract-keyword-list' does not actually
	[PATCH]
Date: Tue, 10 Apr 2012 13:11:19 +0200
MON KEY <monkey <at> sandpframing.com> writes:

> `css-extract-keyword-list' does not actually [PATCH]

[...]

> !     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")

The rest of the patch seems reasonable (I think), but is there a way to
rework this?  Having characters like that in the source code isn't
ideal, if it can be avoided.

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#6345; Package emacs. (Tue, 10 Apr 2012 12:08:02 GMT)

Message #11 received at 6345 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Cc: MON KEY <monkey <at> sandpframing.com>, 6345 <at> debbugs.gnu.org
Subject: Re: bug#6345: css-mode `css-extract-keyword-list' does not actually
	[PATCH]
Date: Tue, 10 Apr 2012 08:06:53 -0400
>> `css-extract-keyword-list' does not actually [PATCH]
> [...]
>> !     (search-backward-regexp "[_━]\\{60,79\\}\xa[[:space:]]+Appendix [A-M]\. Index")

> The rest of the patch seems reasonable (I think), but is there a way to
> rework this?  Having characters like that in the source code isn't
> ideal, if it can be avoided.

Indeed: the code is only run occasionally to update the keyword-list, so
it's not super important for it to be terribly robust.  In a sense, the
code is only kept as documentation to have a good starting point for the
next time I need such a thing.


        Stefan




Reply sent to Simen Heggestøyl <simenheg <at> gmail.com>:
You have taken responsibility. (Thu, 19 Mar 2015 22:43:02 GMT)

Notification sent to MON KEY <monkey <at> sandpframing.com>:
bug acknowledged by developer. (Thu, 19 Mar 2015 22:43:02 GMT)

Message #16 received at 6345-done <at> debbugs.gnu.org:

From: Simen Heggestøyl <simenheg <at> gmail.com>
To: 6345-done <at> debbugs.gnu.org
Subject: Re: Status: css-mode `css-extract-keyword-list' does not actually
 [PATCH]
Date: Thu, 19 Mar 2015 23:42:33 +0100
Version: 25.1

As of commit 7ec63a3afa52213b7b3cd3ecc0717c6e6504dc43, that code is no
longer part of css-mode.

Thanks for your report!

-- Simen

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 17 Apr 2015 11:24:04 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.