GNU bug report logs - #2790
emacs 22.1.1 cannot open 5GB file on 64GB 64-bit Linux box

Package: emacs;

Reported by: Mike Coleman <tutufan <at> gmail.com>

Date: Thu, 26 Mar 2009 16:00:03 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


Message #150 received at 2790 <at> emacsbugs.donarmstrong.com:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 2790 <at> debbugs.gnu.org, tutufan <at> gmail.com
Subject: Re: bug#2790: emacs 22.1.1 cannot open 5GB file on 64GB 64-bit GNU/Linux box
Date: Sun, 29 Mar 2009 16:10:26 -0400

> The patch below does this:

>> -	      || st.st_size > INT_MAX / 4)
>> +	      /* Actually, it should test either INT_MAX or LONG_MAX
>> +		 depending on which one is used for EMACS_INT.  But in
>> +		 any case, in practice, this test is redundant with the
>> +		 one above.
>> +		 || st.st_size > INT_MAX / 4 */)
>> error ("Maximum buffer size exceeded");

> But what about the commentary immediately preceding the modified code:
>   The calculations below double the file size twice, so check that it
>   can be multiplied by 4 safely.

The patch also adds a comment explaining that this test is actually
redundant in practice (and it will stay redundant as long as our Lisp
integers have at least 2 bits of tag).
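
To make the redundancy argument concrete, here is a minimal sketch of the
arithmetic.  It assumes a 64-bit EMACS_INT and that the "one above" test
compares the file size against the largest Lisp integer; the names
emacs_int_sketch, most_positive_value and times_four_check_is_redundant
are hypothetical stand-ins, not the real Emacs definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the real Emacs definitions.  */
typedef int64_t emacs_int_sketch;
#define SKETCH_INT_MAX  INT64_MAX
#define SKETCH_INT_BITS 64

/* Largest positive value left once TAG_BITS bits of the word are
   reserved for a type tag (the sign bit is kept).  */
static emacs_int_sketch
most_positive_value (int tag_bits)
{
  assert (tag_bits >= 0 && tag_bits < SKETCH_INT_BITS - 1);
  return SKETCH_INT_MAX >> tag_bits;
}

/* Nonzero if every size that fits in a tagged integer can also be
   multiplied by 4 without exceeding SKETCH_INT_MAX.  */
static int
times_four_check_is_redundant (int tag_bits)
{
  return most_positive_value (tag_bits) <= SKETCH_INT_MAX / 4;
}
```

Under these assumptions the function returns nonzero for 2 or more tag
bits and zero for a single tag bit, which is the condition stated above.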

> I'm not sure to which calculations it alludes, but if you think they
> are no longer relevant, please remove that part of the comment,
> otherwise we will wonder in a couple of years why the code does not do
> what the comment says it should.

Since I'm not sure either, I kept the comment and added another one
explaining why I removed the check anyway.

> Personally, I would change INT_MAX/4 to LONG_MAX/4, because that does
> TRT on all supported platforms, 32-bit and 64-bit alike (long and int
> are both 32-bit wide on 32-bit machines).  That would avoid too
> radical changes during a pretest, which is a Good Thing, IMO.

In that case I'd rather do the check more directly, e.g.:

    (((EMACS_INT)st.st_size)*4)/4 == st.st_size

But as explained, I'm not convinced the check is needed/useful.
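
If the check is kept, one caveat about the multiply-then-divide form
is that signed overflow is undefined in C, so a compiler is allowed
to assume it never happens.  A division-based comparison expresses the
same intent without overflowing.  The sketch below is a swapped-in
variant, not the code in the patch; it assumes a 32-bit EMACS_INT with
a 64-bit file size to show the narrowing case, and the names
emacs_int_sketch, size_times_four_fits and size_times_four are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit EMACS_INT; the file size is
   assumed to arrive in a wider 64-bit type, as st_size may.  */
typedef int32_t emacs_int_sketch;
#define SKETCH_INT_MAX INT32_MAX

/* Return 1 if SIZE is non-negative and SIZE * 4 is representable
   in emacs_int_sketch.  No multiplication is performed, so nothing
   can overflow.  */
static int
size_times_four_fits (int64_t size)
{
  return size >= 0 && size <= SKETCH_INT_MAX / 4;
}

/* Multiply by 4 only after the range has been checked.  */
static emacs_int_sketch
size_times_four (int64_t size)
{
  assert (size_times_four_fits (size));
  return (emacs_int_sketch) size * 4;
}
```

With these assumptions a 5GB size is rejected before any narrowing
cast takes place.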

>> Note also that when you open large files, it's worthwhile to use
>> find-file-literally to be sure it's opened in unibyte mode;
>> otherwise it gets decoded which takes ages.

> Perhaps the prompt we pop for large file should suggest visiting
> literally as an option.

Yes, that's also what I was thinking.  Together with having different
"large-threshold" values for unibyte and multibyte.

>> Also if the file has many lines (my
>> 800MB file was made up by copying a C file many times, so it had
>> millions of lines), turning off line-number-mode is needed to recover
>> responsiveness when navigating near the end of the buffer.

> Perhaps we should make the default value of line-number-display-limit
> non-nil, at least in 64-bit builds.

Agreed.  We could even do something better:
- do it more efficiently (once computed for a page, it should be able
  to update the count instantly when paging up/down, whereas it seems
  not to always be able to do that).
- when computing really would take a lot of time (e.g. we're far from
  the closest known line position), display ??? and postpone the actual
  computation to some future idle time.
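
As a rough illustration of the first point only, here is a sketch of
counting from the nearest known position instead of from the start of
the buffer.  It assumes a flat byte array rather than a real buffer
with a gap, and struct line_cache and line_at are hypothetical names,
not existing Emacs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache: the line number known for one position.  */
struct line_cache
{
  size_t pos;   /* byte offset whose line number is known */
  size_t line;  /* 1-based line number containing POS */
};

/* Return the 1-based line number of TARGET in TEXT (LEN bytes).
   Only the newlines between the cached position and TARGET are
   scanned; the cache is then moved to TARGET.  */
static size_t
line_at (const char *text, size_t len, struct line_cache *cache,
         size_t target)
{
  assert (target <= len && cache->pos <= len);
  size_t line = cache->line;

  if (target >= cache->pos)
    {
      /* Moving forward: add the newlines that were passed.  */
      for (size_t i = cache->pos; i < target; i++)
        if (text[i] == '\n')
          line++;
    }
  else
    {
      /* Moving backward: subtract the newlines that were passed.  */
      for (size_t i = target; i < cache->pos; i++)
        if (text[i] == '\n')
          line--;
    }

  cache->pos = target;
  cache->line = line;
  return line;
}
```

Starting from a cache of position 0, line 1, the cost of each call is
proportional to the distance moved, so paging by one screen stays
cheap even near the end of a very large buffer.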

In any case, large files introduce lots of problems.


        Stefan




This bug report was last modified 13 years and 316 days ago.
