GNU bug report logs - #13598
24.3.50; url-http.el doesn't correctly parse headers when they are sent line-by-line
Reported by: coroa <at> online.de (Jonas Hoersch)
Date: Thu, 31 Jan 2013 18:11:01 UTC
Severity: normal
Merged with 14372
Found in versions 24.3, 24.3.50
Fixed in version 24.4
Done: Glenn Morris <rgm <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13598 in the body.
You can then email your comments to 13598 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Thu, 31 Jan 2013 18:11:02 GMT)
Acknowledgement sent to coroa <at> online.de (Jonas Hoersch): New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 31 Jan 2013 18:11:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
hej, everyone,
i just finished hunting down an improbable bug in url-http.el, which
appears when a url is retrieved from a server which sends the headers
line-by-line instead of in one chunk, as is the case for the
BaseHTTPServer classes that come with python 2.
A simple test-case looks like the following (sorry for the long
non-emacs setup stuff, but it's the most minimal example i could come up
with)
cd into a directory containing only a single minimal text file and
start python's SimpleHTTPServer so it serves it.
$ cd $(mktemp -d)
$ echo "hello world" > textfile
$ python -m SimpleHTTPServer 8000 # works only for python 2.x
then evaluate the following in emacs:
(switch-to-buffer (url-retrieve-synchronously
"http://127.0.0.1:8000/textfile"))
this correctly retrieves the "hello world" text, but
the buffer-local variables url-http-content-type and
url-http-content-length are nil in the returned buffer, although one
can see that python has transmitted them.
adding an extra debug line to url-http's
url-http-wait-for-headers-change-function around line 1043,
------
(when (re-search-forward "^\r*$" nil t)
;; Saw the end of the headers
(url-http-debug "Saw end of headers... (%s)" (buffer-name))
+ (url-http-debug "when the buffer contained...\n%s" (buffer-substring (point-min) (point-max)))
(setq url-http-end-of-headers (set-marker (make-marker)
(point))
end-of-headers t)
-------
will show you in *URL-DEBUG* (url-debug being t)
-------
http -> Saw end of headers... ( *http 127.0.0.1:8000*-273882)
http -> when the buffer contained...
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.3
http -> url-http-parse-response called in ( *http 127.0.0.1:8000*-273882)
http -> No content-length, being dumb.
-------
that the headers haven't completely arrived yet, when url-http decides
it has seen the end of them.
changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
solves the problem for me, but i'm unsure about what i might possibly be
breaking that way.
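The difference between the two regexps can be reproduced outside url-http.el. The following is only a minimal sketch: the helper names `my/regexp-found-p` and `my/compare-header-regexps` are hypothetical, and the sample strings merely imitate a response that has arrived partially or completely.

```elisp
;; Hypothetical helpers for illustration; they are not part of url-http.el.
(defun my/regexp-found-p (regexp text)
  "Return t if REGEXP matches in TEXT, searching forward from the start."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (re-search-forward regexp nil t) t)))

(defun my/compare-header-regexps ()
  "Try the old and the proposed regexp on partial and complete headers."
  (let ((partial "HTTP/1.0 200 OK\r\nServer: SimpleHTTP/0.6 Python/2.7.3\r\n")
        (complete "HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nhello world\n"))
    (list (my/regexp-found-p "^\r*$" partial)      ; old: empty match at end of buffer
          (my/regexp-found-p "^\r*\n" partial)     ; proposed: no blank line yet
          (my/regexp-found-p "^\r*\n" complete)))) ; proposed: blank line present

(my/compare-header-regexps) ; => (t nil t)
```

The old regexp succeeds on the partial text because `$` also matches at the end of the buffer, right after the last newline received so far.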
thanks for looking into it,
jonas hörsch
In GNU Emacs 24.3.50.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2013-01-29 on kafka
Bzr revision: michael.albinus <at> gmx.de-20130129081211-mmthn9p4bh75h5pr
Windowing system distributor `The X.Org Foundation', version 11.0.11302000
Configured using:
`configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
--libexecdir=/usr/lib --mandir=/usr/share/man --without-sound
--with-xft --with-x-toolkit=lucid'
Important settings:
value of $LANG: en_GB.UTF-8
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Thu, 07 Feb 2013 18:32:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
On Thu, Jan 31 2013, Jonas Hoersch wrote:
> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
> solves the problem for me, but i'm unsure about what i might possibly be
> breaking that way.
i'm positive now, that changing the regex to "^\r+$" is the way to go.
i would be happy to supply a patch, but i understand it is probably too
trivial a matter to justify going through the legal requirements first.
the following advice can serve as a hotfix:
(defadvice url-http-wait-for-headers-change-function (around
url-http-properly-wait-for-headers-advice
activate)
(save-excursion
(goto-char (point-min))
(if (re-search-forward "^\r+$" nil t)
ad-do-it
(url-http-debug "Incomplete headers...: %d" (point-max)))))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 13 Feb 2013 17:20:02 GMT)
Message #11 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Hi Jonas,
coroa <at> online.de (Jonas Hörsch) writes:
> On Thu, Jan 31 2013, Jonas Hoersch wrote:
>
>> changing the regex in (re-search-forward "^\r*$" nil t) to "^\r*\n"
>> solves the problem for me, but i'm unsure about what i might possibly be
>> breaking that way.
>
> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> i would be happy to supply a patch, but i understand it is probably to
> trivial a matter to justify going through the legal requirements first.
>
> the following advice can serve as a hotfix:
>
> (defadvice url-http-wait-for-headers-change-function (around
> url-http-properly-wait-for-headers-advice
> activate)
> (save-excursion
> (goto-char (point-min))
> (if (re-search-forward "^\r+$" nil t)
> ad-do-it
> (url-http-debug "Incomplete headers...: %d" (point-max)))))
I confirm both the problem and the fix.
It does not look critical though. Stefan, Glenn, should I
commit the patch into trunk (or emacs-24)?
--
Bastien
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 13 Feb 2013 19:32:02 GMT)
Message #14 received at 13598 <at> debbugs.gnu.org (full text, mbox):
> It does not look critical though. Stefan, Glenn, should I
> commit the patch into trunk (or emacs-24)?
AFAIK this is not a regression, so => trunk
And thanks for taking care of it,
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 13 Feb 2013 19:44:02 GMT)
Message #17 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Bastien wrote:
>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
I don't understand how this can be correct. What is this supposed to be
matching?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 13 Feb 2013 21:40:01 GMT)
Message #20 received at 13598 <at> debbugs.gnu.org (full text, mbox):
>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
Would "^\r?\n" be better?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Thu, 14 Feb 2013 06:10:01 GMT)
Message #23 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Hi Glenn,
Glenn Morris <rgm <at> gnu.org> writes:
>>>> i'm positive now, that changing the regex to "^\r+$" is the way to go.
>
> Would "^\r?\n" be better?
The quote of the OP is misleading -- he proposed to change the regexp
in (re-search-forward "^\r*$" nil t) to "^\r*\n", which is the fix I'm
talking about.
But yes, "^\r?\n" is slightly better than "^\r*\n" because AFAIK there
can be only one CR in the line separating the headers from the body.
Let me know if you want to fix this yourself or if I should do it.
Thanks,
--
Bastien
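To make the difference between "^\r*\n" and "^\r?\n" concrete, here is a small sketch. The helper names `my/blank-line-found-p` and `my/compare-cr-tolerance` are hypothetical, and the input with two CRs is a contrived malformed separator, not something observed in this report.

```elisp
;; Hypothetical helpers for illustration; they are not part of url-http.el.
(defun my/blank-line-found-p (regexp text)
  "Return t if REGEXP finds a header/body separator in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (re-search-forward regexp nil t) t)))

(defun my/compare-cr-tolerance ()
  "Compare \"^\\r*\\n\" and \"^\\r?\\n\" on CRLF, bare LF and doubled CR."
  (let ((inputs (list "HTTP/1.0 200 OK\r\n\r\nbody"      ; CRLF separator
                      "HTTP/1.0 200 OK\n\nbody"          ; bare LF separator
                      "HTTP/1.0 200 OK\r\n\r\r\nbody"))) ; contrived: two CRs
    (list (mapcar (lambda (text) (my/blank-line-found-p "^\r*\n" text)) inputs)
          (mapcar (lambda (text) (my/blank-line-found-p "^\r?\n" text)) inputs))))

(my/compare-cr-tolerance) ; => ((t t t) (t t nil))
```

Both regexps accept a CRLF or a bare LF separator; only "^\r*\n" also accepts a line made of several CRs.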
Reply sent to Glenn Morris <rgm <at> gnu.org>: You have taken responsibility. (Sat, 16 Feb 2013 02:08:02 GMT)
Notification sent to coroa <at> online.de (Jonas Hoersch): bug acknowledged by developer. (Sat, 16 Feb 2013 02:08:03 GMT)
Message #28 received at 13598-done <at> debbugs.gnu.org (full text, mbox):
Version: 24.4
Glenn Morris wrote:
> Would "^\r?\n" be better?
Applied.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 16 Mar 2013 11:24:04 GMT)
bug unarchived. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 08 May 2013 23:09:03 GMT)
Forcibly Merged 13598 14372. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 08 May 2013 23:09:03 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Jun 2013 11:24:04 GMT)
bug unarchived. Request was from Blazej Adamczyk <blazej.adamczyk <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 26 Feb 2014 16:45:04 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 26 Feb 2014 16:46:01 GMT)
Message #41 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Hello,
I had to reopen the bug because I faced the same problem as the OP. He didn't make himself clear, so here is an example:
When parsing the response, we may reach a state in which we have received only the following:
"HTTP/1.0 200 OK^M
"
without the double quotes (I added them to show the newline character).
With the current implementation, the regexp "^\r?$" and the previous regexp "^\r*$" both match at the end of the string. That is wrong, because more text will arrive on the new line after a while.
RFC 2616 states clearly:
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
There has to be one (exactly one) CR on the single line between headers and body. Thus I propose the simple regexp "^\r$".
--
Blazej
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Wed, 26 Feb 2014 16:55:01 GMT)
Message #44 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Hello,
I had to reopen the bug because I faced the same problem as the OP. He didn't make himself clear, so here is an example:
When parsing the response, we may reach a state in which we have received only the following:
"HTTP/1.0 200 OK^M
"
without the double quotes (I added them to show the newline character).
With the current implementation, the regexp "^\r?$" and the previous regexp "^\r*$" both match at the end of the string. That is wrong, because more text will arrive on the new line after a while.
RFC 2616 states clearly:
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
There has to be one (exactly one) CR on the single line between headers and body. Thus I propose the simple regexp "^\r$".
--
Blazej
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Thu, 27 Feb 2014 22:44:01 GMT)
Message #47 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Blazej Adamczyk wrote:
> By example:
> When parsing response we may get in state when we will receive only
> the following:
>
> "HTTP/1.0 200 OK^M
> "
>
> without double quotes (I added them to show the newline character).
>
> In case of current implementation the regexp "^\r?$" and the previous
> regexp "^\r*$" both are matching the end of string. That is wrong
> because there will be something in the new line after a while.
The current implementation uses "^\r?\n", not "^\r?$".
Where did you get "^\r?$" from?
As such I do not see that it will match your example.
> RFC 2616 states clear:
> generic-message = start-line
> *(message-header CRLF)
> CRLF
> [ message-body ]
> start-line = Request-Line | Status-Line
>
> there has to be one (exactly one) CR in a single line between headers
> and body. Thus I propose a simple regexp "^\r$".
Yes, but as I already quoted in
http://debbugs.gnu.org/13598#17
it also recommends tolerance:
The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR.
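The three regexps mentioned in this exchange can be compared side by side. This is a sketch only: the helper names `my/end-of-headers-p` and `my/compare-strictness` are hypothetical, and the three inputs are a partial response, a CRLF-terminated header block and a bare-LF header block.

```elisp
;; Hypothetical helpers for illustration; they are not part of url-http.el.
(defun my/end-of-headers-p (regexp text)
  "Return t if REGEXP reports an end of headers somewhere in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (and (re-search-forward regexp nil t) t)))

(defun my/compare-strictness ()
  "For each regexp, return results for (partial CRLF-complete LF-complete)."
  (let ((partial "HTTP/1.0 200 OK\r\n") ; only the status line has arrived
        (crlf "HTTP/1.0 200 OK\r\n\r\n")
        (lf "HTTP/1.0 200 OK\n\n"))
    (mapcar (lambda (regexp)
              (list (my/end-of-headers-p regexp partial)
                    (my/end-of-headers-p regexp crlf)
                    (my/end-of-headers-p regexp lf)))
            '("^\r?$" "^\r$" "^\r?\n"))))

(my/compare-strictness) ; => ((t t t) (nil t nil) (nil t t))
```

Only "^\r?$" fires on the partial response. "^\r$" avoids that but rejects a bare LF separator, while "^\r?\n" waits for the blank line and accepts both CRLF and LF.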
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#13598; Package emacs. (Mon, 03 Mar 2014 06:11:02 GMT)
Message #50 received at 13598 <at> debbugs.gnu.org (full text, mbox):
Ahh yes, my mistake! I was looking at the wrong sources. Obviously the current "^\r?\n" is correct.
Sorry and thanks!
Blazej
Glenn Morris <rgm <at> gnu.org> wrote on 27 Feb 2014, at 23:43:
> Blazej Adamczyk wrote:
>> By example:
>> When parsing response we may get in state when we will receive only
>> the following:
>>
>> "HTTP/1.0 200 OK^M
>> "
>>
>> without double quotes (I added them to show the newline character).
>>
>> In case of current implementation the regexp "^\r?$" and the previous
>> regexp "^\r*$" both are matching the end of string. That is wrong
>> because there will be something in the new line after a while.
> The current implementation uses "^\r?\n", not "^\r?$".
> Where did you get "^\r?$" from?
> As such I do not see that it will match your example.
>> RFC 2616 states clear:
>> generic-message = start-line
>> *(message-header CRLF)
>> CRLF
>> [ message-body ]
>> start-line = Request-Line | Status-Line
>>
>> there has to be one (exactly one) CR in a single line between headers
>> and body. Thus I propose a simple regexp "^\r$".
> Yes, but as I already quoted in
> http://debbugs.gnu.org/13598#17
> it also recommends tolerance:
> The line terminator for message-header fields is the sequence CRLF.
> However, we recommend that applications, when parsing such headers,
> recognize a single LF as a line terminator and ignore the leading CR.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 31 Mar 2014 11:24:04 GMT)