GNU bug report logs - #10147
HTTP "Expires" header should handle non-date values

Package: guile;

Reported by: Daniel Hartwig <mandyke <at> gmail.com>

Date: Sun, 27 Nov 2011 10:42:02 UTC

Severity: normal

Tags: patch

Found in version 2.0.3

Done: Andy Wingo <wingo <at> pobox.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-guile <at> gnu.org:
bug#10147; Package guile. (Sun, 27 Nov 2011 10:42:02 GMT)

Acknowledgement sent to Daniel Hartwig <mandyke <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 27 Nov 2011 10:42:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Daniel Hartwig <mandyke <at> gmail.com>
To: "R. P. Dillon" <rpdillon <at> gmail.com>
Cc: bug-guile <at> gnu.org, guile-user <at> gnu.org
Subject: HTTP "Expires" header should handle non-date values
Date: Sun, 27 Nov 2011 18:39:12 +0800
Package: guile
Version: 2.0.3
Tags: patch

On 6 November 2011 13:49, R. P. Dillon <rpdillon <at> gmail.com> wrote:
> (use-modules (web request) (web response) (web uri) (rnrs bytevectors))
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "www.google.com")))
> (write-request request port)
> (define response (read-response port))
> (read-response ...) consistently fails with Google:
> web/http.scm:754:6: In procedure parse-asctime-date:
> web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.
> The expiration is set to -1 in the headers, and this seems to cause a
> problem for the web libraries in Guile.
> This same request seems to work well for my own domain (killring.org).

This is definitely a bug on Guile's part, HTTP/1.1 permits such values
for "Expires" headers [1], treating them as though they were a date in
the past:

   HTTP/1.1 clients and caches MUST treat other invalid date formats,
   especially including the value "0", as in the past (i.e., "already
   expired").

[1] http://tools.ietf.org/html/rfc2616#section-14.21

Attached patch permits non-date values for "Expires", leaving them as
strings (preferable, as such responses can be transparently forwarded
to other clients). The staleness of a response could be determined
quite crudely, e.g.

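(use-modules (srfi srfi-19)    ; date?, date->time-utc, time<=?, current-time
             (web response))   ; response-expires

(define (response-stale? r)
  (let ((expires (response-expires r)))
    (and expires
         (or (not (date? expires)) ;; Indicates already expired.
             (time<=? (date->time-utc expires)
                      (current-time))))))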

This approach completely ignores the recommended way of determining
whether a response has expired.  See section 13.2.4 of the RFC for
calculations involving various factors such as the time that a request
was sent, "Cache-Control" directives, etc.
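
For comparison, the calculation described in sections 13.2.3 and 13.2.4
of RFC 2616 can be sketched roughly as below.  This is only an
illustration and not part of the attached patch: all times are assumed
to be integer seconds since the Unix epoch, the procedure names are
invented for the example, and heuristic freshness is left out.

```scheme
;; Sketch of the RFC 2616 age and freshness arithmetic.
;; Assumption: every argument is an integer number of seconds since the
;; Unix epoch (or a plain number of seconds for age-value and max-age).

(define (current-age date-value age-value request-time response-time now)
  ;; Section 13.2.3: estimate how old the response is right now.
  (let* ((apparent-age (max 0 (- response-time date-value)))
         (corrected-received-age (max apparent-age age-value))
         (response-delay (- response-time request-time))
         (corrected-initial-age (+ corrected-received-age response-delay))
         (resident-time (- now response-time)))
    (+ corrected-initial-age resident-time)))

(define (freshness-lifetime max-age expires-value date-value)
  ;; Section 13.2.4: max-age takes precedence over Expires.
  ;; Pass #f for a value that is absent from the response.
  (cond (max-age max-age)
        (expires-value (- expires-value date-value))
        (else 0)))                 ; assumption: no heuristic lifetime

(define (response-fresh? lifetime age)
  (> lifetime age))
```

With an "Expires" value that lies in the past, the lifetime comes out
negative, so the response is never considered fresh.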


Regards

Daniel
[0001-Permit-non-date-values-for-Expires-header.patch (text/x-patch, attachment)]

Information forwarded to bug-guile <at> gnu.org:
bug#10147; Package guile. (Thu, 22 Dec 2011 02:54:01 GMT)

Message #8 received at 10147 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> pobox.com>
To: Daniel Hartwig <mandyke <at> gmail.com>
Cc: "R. P. Dillon" <rpdillon <at> gmail.com>, guile-user <at> gnu.org,
	10147 <at> debbugs.gnu.org
Subject: Re: bug#10147: HTTP "Expires" header should handle non-date values
Date: Wed, 21 Dec 2011 21:51:04 -0500
Hi Daniel,

So sorry for the delay.

On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke <at> gmail.com> writes:

> This is definitely a bug on Guile's part, HTTP/1.1 permits such values
> for "Expires" headers [1], treating them as though they were a date in
> the past:
>
>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>    especially including the value "0", as in the past (i.e., "already
>    expired").
>
> [1] http://tools.ietf.org/html/rfc2616#section-14.21

But that's right after saying

   The format is an absolute date and time as defined by HTTP-date in
   section 3.3.1; it MUST be in RFC 1123 date format:

      Expires = "Expires" ":" HTTP-date

But, pragmatism may rule, here...

> Attached patch permits non-date values for "Expires", leaving them as
> strings (preferable, as such responses can be transparently forwarded
> to other clients). The staleness of a response could be determined
> quite crudely, e.g.
>
> (define (response-stale? r)
>   (let ((expires (response-expires r)))
>     (and expires
>          (or (not (date? expires)) ;; Indicates already expired.
>              (time<=? (date->time-utc expires)
>                       (current-time))))))

Let us assume that it is a good idea to include this hack.  Wouldn't it
be better to keep the expires header as a date?  Would any date in the
past work fine?

Would it be best to allow some special cases like "0" or "-1" instead?

I'm just trying to limit the damage here :)  WDYT?

Andy
-- 
http://wingolog.org/




Information forwarded to bug-guile <at> gnu.org:
bug#10147; Package guile. (Thu, 22 Dec 2011 04:31:02 GMT)

Message #11 received at 10147 <at> debbugs.gnu.org:

From: Daniel Hartwig <mandyke <at> gmail.com>
To: Andy Wingo <wingo <at> pobox.com>
Cc: "R. P. Dillon" <rpdillon <at> gmail.com>, guile-user <at> gnu.org,
	10147 <at> debbugs.gnu.org
Subject: Re: bug#10147: HTTP "Expires" header should handle non-date values
Date: Thu, 22 Dec 2011 12:28:01 +0800
On 22 December 2011 10:51, Andy Wingo <wingo <at> pobox.com> wrote:
>
> On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke <at> gmail.com> writes:
>
>> This is definitely a bug on Guile's part, HTTP/1.1 permits such values
>> for "Expires" headers [1], treating them as though they were a date in
>> the past:
>>
>>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>>    especially including the value "0", as in the past (i.e., "already
>>    expired").
>>
>> [1] http://tools.ietf.org/html/rfc2616#section-14.21
>
> But that's right after saying
>
>   The format is an absolute date and time as defined by HTTP-date in
>   section 3.3.1; it MUST be in RFC 1123 date format:
>
>      Expires = "Expires" ":" HTTP-date
>
> But, pragmatism may rule, here...
>

... especially given that players like Google are using "-1" ;-)

> ... Wouldn't it
> be better to keep the expires header as a date?  Would any date in the
> past work fine?
>

That is what I initially considered.  A great solution because it
requires no changes to existing code which would be expecting a date
value.  I think any date would do, ideally it would match the value of
the Date header, but the current parsing code does not allow for
access to it.  I considered using a single, well defined value for
date-in-the-past (Unix epoch).
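
As a rough sketch of that idea, assuming SRFI-19 dates are what the
parser returns: the names date-in-the-past and expires-passed? are
chosen for the example only.

```scheme
(use-modules (srfi srfi-19))

;; Hypothetical sentinel: the Unix epoch, 1970-01-01 00:00:00 UTC.
;; make-date takes nanosecond, second, minute, hour, day, month, year
;; and zone offset, in that order.
(define date-in-the-past (make-date 0 0 0 0 1 1 1970 0))

;; Code that expects a date needs no special case for the sentinel:
;; an ordinary time comparison already reports it as expired.
(define (expires-passed? expires)
  (time<=? (date->time-utc expires) (current-time)))

(expires-passed? date-in-the-past)   ; => #t
```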

The *only* concern I had with this approach is wrt implementing a
cache/proxy.  My idea was that by storing the non-date values as a
string you can store/forward these unmodified and still check for the
"already expired" condition.

Admittedly this is a very minor concern, as there is no change in
semantics at the protocol level -- both approaches result in the
client understanding that the content is already expired.

I think what I came up with was a solution in need of a problem
(which should be solved more generally across the whole module
anyway).


> Would it be best to allow some special cases like "0" or "-1" instead?
>

Not sure precisely what you mean here.  Is it something like:

(or (false-if-exception (parse-date str))
    (and (member str '("0" "-1")) str)
    date-in-the-past)

?

> I'm just trying to limit the damage here :)  WDYT?
>

I am certainly in favour of keeping the value as a date to achieve
this aim.




Information forwarded to bug-guile <at> gnu.org:
bug#10147; Package guile. (Thu, 22 Dec 2011 12:38:02 GMT)

Message #14 received at 10147 <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> pobox.com>
To: Daniel Hartwig <mandyke <at> gmail.com>
Cc: "R. P. Dillon" <rpdillon <at> gmail.com>, guile-user <at> gnu.org,
	10147 <at> debbugs.gnu.org
Subject: Re: bug#10147: HTTP "Expires" header should handle non-date values
Date: Thu, 22 Dec 2011 07:35:24 -0500
On Wed 21 Dec 2011 23:28, Daniel Hartwig <mandyke <at> gmail.com> writes:

> On 22 December 2011 10:51, Andy Wingo <wingo <at> pobox.com> wrote:
>>
>> On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke <at> gmail.com> writes:
>>
>>>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>>>    especially including the value "0", as in the past (i.e., "already
>>>    expired").
>>
>> But, pragmatism may rule, here...
>
> ... especially given that players like Google are using "-1" ;-)

Yes, indeed.

>> ... Wouldn't it
>> be better to keep the expires header as a date?  Would any date in the
>> past work fine?
>
> That is what I initially considered.  I considered using a single,
> well defined value for date-in-the-past (Unix epoch).

I think I would prefer this.  It makes user code easier, and with more
of a chance of being correct.  I think that Mozilla at least used to use
the beginning of the epoch as this date.

>> Would it be best to allow some special cases like "0" or "-1" instead?
>>
>
> Not sure precisely what you mean here.  Is it something like:
>
> (or (false-if-exception (parse-date str))
>     (and (member str '("0" "-1")) str)
>     date-in-the-past)

More like:

  (if (member str '("0" "-1"))
      date-in-the-past
      (parse-date str))

Then we can wait and see -- if only these two values are out there, then
we are good, and we keep the "validating" characteristic of our date
parser.  Otherwise we can fall back to the false-if-exception dance if
someone submits a bug report.
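
Spelled out a little further, this might look like the sketch below.
It is not the code of any patch in this thread: parse-expires-sketch
and date-in-the-past are hypothetical names, and parse-header from
(web http), applied to the date header, stands in for the module's
internal parse-date.

```scheme
(use-modules (srfi srfi-19) (web http))

;; Hypothetical sentinel meaning "already expired" (the Unix epoch).
(define date-in-the-past (make-date 0 0 0 0 1 1 1970 0))

;; Hypothetical helper: special-case the two values seen in the wild
;; and keep strict validation for everything else.
(define (parse-expires-sketch str)
  (if (member str '("0" "-1"))
      date-in-the-past
      ;; Reuse the parser registered for the Date header; it signals
      ;; an error for input that is not a valid HTTP date.
      (parse-header 'date str)))
```

Note the use of member rather than memq: memq compares with eq?, which
is not reliable for strings.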

WDYT?  Want to send another patch? :-)

Andy
-- 
http://wingolog.org/




Information forwarded to bug-guile <at> gnu.org:
bug#10147; Package guile. (Tue, 27 Dec 2011 15:52:01 GMT)

Message #17 received at 10147 <at> debbugs.gnu.org:

From: Daniel Hartwig <mandyke <at> gmail.com>
To: Andy Wingo <wingo <at> pobox.com>
Cc: guile-user <at> gnu.org, 10147 <at> debbugs.gnu.org
Subject: Re: bug#10147: HTTP "Expires" header should handle non-date values
Date: Tue, 27 Dec 2011 23:49:00 +0800
On 22 December 2011 20:35, Andy Wingo <wingo <at> pobox.com> wrote:
>> Not sure precisely what you mean here.  Is it something like:
>>
>> (or (false-if-exception (parse-date str))
>>     (and (member str '("0" "-1")) str)
>>     date-in-the-past)
>
> More like:
>
>  (if (member str '("0" "-1"))
>      date-in-the-past
>      (parse-date str))
>
> Then we can wait and see -- if only these two values are out there, then
> we are good, and we keep the "validating" characteristic of our date
> parser.  Otherwise we can fall back to the false-if-exception dance if
> someone submits a bug report.

A rough check against ~2600 sites scraped from dmoz.org shows only a
handful with other values.  These two:

"Mon, 12 Jul 1996 1:00:00 GMT"
                  ^ misses leading `0'
"Thu, 01 Jan 1970 00:00:00 +0000"
                           ^ should be `GMT'

The second (use of `+0000') was also encountered amongst other
date-valued headers in ~1% of pages sampled.  There might be a case
here for relaxing `parse-date' as I don't think these should be
handled specifically for "Expires" headers.

There were three more like:

"{ts '2011-12-27 08:12:22'}"

which only appeared for "Expires" headers.  They look something like
server directives which should have been transformed to legit
expiration dates but haven't been, due to misconfiguration.  In this
case I'd rather throw an error than parse it (wrongly) to
date-in-the-past.

Given those points, I have attached a patch implementing the suggested
handling for "Expires" and will take a look at perhaps relaxing
parse-date (and others).  Anyone have ideas on that?
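
One possible shape for such a relaxation, purely as a sketch: rewrite a
trailing "+0000" to "GMT" before handing the string to the strict
parser.  The procedure name is hypothetical, and this is not a
description of what Guile does.

```scheme
;; Hypothetical pre-processing step for date-valued headers.
;; A value ending in " +0000" denotes the same instant as one ending
;; in " GMT", so normalise it and leave every other string untouched.
(define (normalize-utc-suffix str)
  (let ((suffix " +0000"))
    (if (string-suffix? suffix str)
        (string-append
         (substring str 0 (- (string-length str) (string-length suffix)))
         " GMT")
        str)))

(normalize-utc-suffix "Thu, 01 Jan 1970 00:00:00 +0000")
;; => "Thu, 01 Jan 1970 00:00:00 GMT"
```

Values such as "{ts '2011-12-27 08:12:22'}" pass through unchanged and
would still be rejected by a strict parser.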


Daniel
[0001-permit-non-date-values-for-Expires-header.patch (text/x-patch, attachment)]

Reply sent to Andy Wingo <wingo <at> pobox.com>:
You have taken responsibility. (Mon, 09 Jan 2012 22:38:02 GMT)

Notification sent to Daniel Hartwig <mandyke <at> gmail.com>:
bug acknowledged by developer. (Mon, 09 Jan 2012 22:38:03 GMT)

Message #22 received at 10147-done <at> debbugs.gnu.org:

From: Andy Wingo <wingo <at> pobox.com>
To: Daniel Hartwig <mandyke <at> gmail.com>
Cc: guile-user <at> gnu.org, 10147-done <at> debbugs.gnu.org
Subject: Re: bug#10147: HTTP "Expires" header should handle non-date values
Date: Mon, 09 Jan 2012 23:36:44 +0100
Hi Daniel,

Thanks very much for the thorough analysis!

On Tue 27 Dec 2011 16:49, Daniel Hartwig <mandyke <at> gmail.com> writes:

> Given those points, I have attached a patch implementing the suggested
> handling for "Expires" and will take a look at perhaps relaxing
> parse-date (and others).  Anyone have ideas on that?

I applied your patch, and I think some sensible parse-date relaxations
are a good idea too.

Regards,

Andy
-- 
http://wingolog.org/




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 07 Feb 2012 12:24:03 GMT)
