GNU bug report logs - #27391
25.2.50; utf-8 coding cookie is not applied on some specific markdown file


Package: emacs;

Reported by: vincent.belaiche <at> gmail.com (Vincent Belaïche)

Date: Fri, 16 Jun 2017 10:01:01 UTC

Severity: normal

Found in version 25.2.50

Done: Philipp Stephani <p.stephani2 <at> gmail.com>

Bug is archived. No further changes may be made.


From: vincent.belaiche <at> gmail.com (Vincent Belaïche)
To: Eli Zaretskii <eliz <at> gnu.org>, 27391 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Cc: Vincent Belaïche <vincent.belaiche <at> gmail.com>
Subject: bug#27391: 25.2.50; utf-8 coding cookie is not applied on some specific markdown file
Date: Sat, 17 Jun 2017 07:45:35 +0200

On 17/06/2017 at 00:23, Vincent Belaïche wrote:
>
>
> On 17/06/2017 at 00:09, Vincent Belaïche wrote:
>>
>> On 16/06/2017 at 21:37, Vincent Belaïche wrote:
>>>
>>> On 16/06/2017 at 21:15, Vincent Belaïche wrote:
>> [...]
>>
>>>>
>>> After some more investigation, I think that the bug is in function
>>> insert-file-contents of fileio.c, which decides and sets the coding
>>> system well before the other local variables are looked into.
>> I have located the bug.
>>
>> After some more investigation, it turns out that find-auto-coding in
>> mule.el is what gets called to detect the coding system.
>>
>> This function evaluates the following expression to find the local
>> variables section:
>>
>>   (re-search-forward
>>            "[\r\n]\\([^[\r\n]*\\)[ \t]*Local Variables:[ \t]*\\([^\r\n]*\\)[\r\n]"
>>            tail-end t)
>>
>> This expression evaluates to nil on the file CONTRIBUTING.md.
>>
>> I can make a simple fix if you tell me on which branch to do it.
>>
>> However, I think that the root of the problem is that the local
>> variable parsing code is poorly factored between mule.el and files.el.
>> A better, more future-proof fix would be a single local variable parser
>> taking an input constraint that tells which sort of settings are sought.
>> The output of the parser could then be used in both files.el and mule.el.
>>
>>    Vincent.
>>
>>
> Oops... my lengthy email of T23:34 was sent by mistake. A shorter
> version with only the conclusion and without all the details of my
> investigation is above.
>
> Anyway, Philipp's patch is what I had in mind as a quick fix, although I
> do not think it is a good idea to leave code unfactored when factoring
> is possible. Factoring makes it more maintainable.
>
>  V.

Here are a few points I noticed when comparing the code in
find-auto-coding and in hack-local-variables:

* In hack-local-variables the trailing local variables section is
  considered to be at most 3000 characters from eob, while in
  find-auto-coding the limit is 3072. The "correct" figure should be
  3072, not 3000, for consistency with the "1024 * 3" code in function
  Finsert_file_contents of fileio.c:

		  if (nread == 1024)
		    {
		      int ntail;
		      if (lseek (fd, - (1024 * 3), SEEK_END) < 0)
			report_file_error ("Setting file position",
					   orig_filename);
		      ntail = emacs_read_quit (fd, read_buf + nread, 1024 * 3);
		      nread = ntail < 0 ? ntail : nread + ntail;
		    }

  Maybe the exact value should be defined as a single named constant.
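
  The named-constant idea can be sketched in C as follows. This is only
  an illustration under stated assumptions: LOCAL_VARS_TAIL_BYTES,
  tail_scan_start and offset_in_tail are hypothetical names, not Emacs
  identifiers, and the sketch counts bytes where the Lisp code counts
  characters. It shows that a marker lying between 3000 and 3072 bytes
  from the end falls inside one scan window and outside the other.

```c
#include <stddef.h>

/* Hypothetical shared constant (not an Emacs identifier): the size of
   the tail window, matching the "1024 * 3" used in fileio.c. */
enum { LOCAL_VARS_TAIL_BYTES = 1024 * 3 };

/* Offset at which a tail scan of WINDOW bytes starts in a buffer of
   BUFFER_SIZE bytes.  Buffers shorter than the window are scanned
   from the beginning. */
static size_t tail_scan_start(size_t buffer_size, size_t window)
{
    return buffer_size > window ? buffer_size - window : 0;
}

/* Nonzero if OFFSET lies inside the tail window of the buffer. */
static int offset_in_tail(size_t offset, size_t buffer_size, size_t window)
{
    return offset >= tail_scan_start(buffer_size, window);
}
```

  For a 10000-byte buffer, the 3000-byte window starts at offset 7000
  and the 3072-byte window at offset 6928, so a marker at offset 6950
  is inside the second window only. Sharing one constant would remove
  that disagreement.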

* In find-auto-coding the regexp operators "^" (for bol) and "$" (for
  eol) are not used; "[\r\n]" is used instead. I suspect that this is
  because at this stage the coding system is not yet set, so there is
  no notion of bol or eol yet and the whole buffer is a single line. If
  so, I withdraw my previous statement that code factorization is
  desirable.
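
  The "[\r\n]" approach, where any CR or LF byte counts as a line
  break regardless of the end-of-line convention, can be sketched in C
  as follows. This is a minimal illustration, not the Emacs
  implementation; raw_line_start is a hypothetical helper name.

```c
#include <stddef.h>

/* Return the offset of the first byte of the "line" that contains
   position POS in BUF, where a line break is any '\r' or '\n' byte.
   No end-of-line convention (LF, CRLF, or CR) is assumed, which is
   the situation before the coding system has been decided. */
static size_t raw_line_start(const char *buf, size_t pos)
{
    while (pos > 0 && buf[pos - 1] != '\r' && buf[pos - 1] != '\n')
        pos--;
    return pos;
}
```

  The same function finds the line start whether the text uses LF,
  CRLF, or bare CR line endings, which is what a search anchored on
  "[\r\n]" achieves without relying on bol or eol.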


* In both cases what is sought is the *FIRST* occurrence, searched
  *FORWARD*, of the case-sensitive string "Local Variables:" in the
  trailing 3000--3072 characters of the buffer. I think that this is a
  problem, and that we should either search *BACKWARD* or, after finding
  the first occurrence, keep searching for subsequent occurrences and
  use the last one instead. I say this because with the emacs-template
  package a template file may have local variables in the template
  definition section that differ from those of the template itself. See
                (info "(template) DefSect")
  For instance, the end of the template file would be as follows:


--8<----8<----8<----8<----8<-- begin -->8---->8---->8---->8---->8----

... blah blah blah template content ...

// Local Variables:
// toto: "tata"
// End:

>>>TEMPLATE-DEFINITION-SECTION<<<

... blah blah blah Lisp Template rules ...

;; Local Variables:
;; foo: "bar"
;; End:
--8<----8<----8<----8<----8<--  end  -->8---->8---->8---->8---->8----

  Maybe excluding the [ character from the prefix string is not a typo
  but an intentional design choice to prevent false detection of the
  local variables section. I strongly recommend that, before making any
  fix, somebody dig into the file history to find out when and *WHY*
  this exclusion of [ was introduced. Sorry, but I do not volunteer for
  this kind of tedious, time-consuming work...
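
The difference between taking the first and the last "Local Variables:"
marker in the tail, and the effect of excluding [ from the prefix, can
be sketched in C as follows. This is an illustration only, not the
Emacs code: first_marker, last_marker and prefix_has_bracket are
hypothetical helper names, and the tail is assumed to be a
NUL-terminated string.

```c
#include <stddef.h>
#include <string.h>

static const char MARKER[] = "Local Variables:";

/* First occurrence of the marker in the NUL-terminated TAIL, or NULL.
   This mirrors a forward search that stops at the first match. */
static const char *first_marker(const char *tail)
{
    return strstr(tail, MARKER);
}

/* Last occurrence of the marker in TAIL, or NULL.  Keeps searching
   forward after each match and remembers the most recent one. */
static const char *last_marker(const char *tail)
{
    const char *last = NULL;
    for (const char *p = strstr(tail, MARKER); p != NULL;
         p = strstr(p + 1, MARKER))
        last = p;
    return last;
}

/* Nonzero if the prefix between LINE_START and MARKER_POS contains
   '[', i.e. the case that a prefix class such as [^[\r\n] rejects. */
static int prefix_has_bracket(const char *line_start, const char *marker_pos)
{
    return memchr(line_start, '[', (size_t)(marker_pos - line_start)) != NULL;
}
```

On the template example above, first_marker lands on the section
prefixed with "//" and last_marker on the one prefixed with ";;", so
the two strategies pick different sections. Any line whose prefix
contains [ is skipped by the current regexp, whatever follows it.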

   Vincent.






This bug report was last modified 8 years and 25 days ago.


