From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 20 17:04:20 2011
From: Evans Winner
To: bug-gnu-emacs@gnu.org
Subject: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Wed, 20 Apr 2011 15:04:06 -0600
Message-ID: <87bp00iqih.fsf@gmail.com>

My understanding is that a 32-bit GNU Emacs should be able
to open files up to 512 M.  If I am wrong about that, please
let me know.  I have compiled Emacs trunk from source
several times in the last couple of months and somewhere in
the last month or so it seems that the limit on my machine
has become 128 M.  My math could be off, but on the
assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
bytes, and starting with emacs -Q I tried:

$ dd if=/dev/zero of=testfile bs=1024 count=131072

and tried to open the file, and got: "Maximum buffer size
exceeded".  Then I tried one K less:

$ dd if=/dev/zero of=testfile bs=1024 count=131071

and the buffer opened.  I have verified using the `top'
command that there is sufficient free memory for the files.
Also, for what it's worth:

ELISP> most-positive-fixnum ==> 536870911

I discovered this as a result of not being able to open a
large (~160Mb) .pdf file that I had earlier been able to
open.  Please let me know if there is any other information
I can provide, or if there is something simple I am doing
wrong.

In GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, GTK+ Version 3.0.8)
 of 2011-04-19 on braintron
Windowing system distributor `The X.Org Foundation', version 11.0.11001000
configured using `configure '--with-x-toolkit=gtk3''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 01:53:44 2011
From: Eli Zaretskii
To: Evans Winner, Paul Eggert
Cc: 8528@debbugs.gnu.org
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 08:52:34 +0300
Message-id: <83r58w2lst.fsf@gnu.org>
In-reply-to: <87bp00iqih.fsf@gmail.com>

> From: Evans Winner
> Date: Wed, 20 Apr 2011 15:04:06 -0600
>
> My understanding is that a 32-bit GNU Emacs should be able
> to open files up to 512 M.  If I am wrong about that, please
> let me know.  I have compiled Emacs trunk from source
> several times in the last couple of months and somewhere in
> the last month or so it seems that the limit on my machine
> has become 128 M.  My math could be off, but on the
> assumption that 128 Mebibytes = 2^27 bytes = 1024 * 131072
> bytes, and starting with emacs -Q I tried:
>
> $ dd if=/dev/zero of=testfile bs=1024 count=131072
>
> and tried to open the file, and got: "Maximum buffer size
> exceeded".

This happens because of the following test in insert-file-contents:

      /* Arithmetic overflow can occur if an Emacs integer cannot represent the
         file size, or if the calculations below overflow.  The calculations below
         double the file size twice, so check that it can be multiplied by 4
         safely.

         Also check whether the size is negative, which can happen on a platform
         that allows file sizes greater than the maximum off_t value.  */
      if (! not_regular
          && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
        error ("Maximum buffer size exceeded");

This test was commented out for the last 2 years, but lately it was
uncommented by Paul Eggert in revision 103841 on the trunk.

Paul, could you please tell where do you see twice doubling of the
file size in insert-file-contents?  Back in 1999, when this test was
first introduced, there was indeed such doubling.  But even then it
was only when the REPLACE argument was non-nil (according to my
reading of the code).  In any case, that part of code was completely
rewritten since then, and I don't believe we double the file size even
once.

By disabling that test, I was able to visit a 260-MB file on a 32-bit
machine.  So it seems like this test could be removed, if I'm not
missing anything.

From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 02:19:09 2011
From: Paul Eggert
To: Eli Zaretskii
Cc: 8528@debbugs.gnu.org, Evans Winner
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Wed, 20 Apr 2011 23:18:55 -0700
Message-ID: <4DAFCC4F.1080900@cs.ucla.edu>
In-Reply-To: <83r58w2lst.fsf@gnu.org>

On 04/20/11 22:52, Eli Zaretskii wrote:

> Paul, could you please tell where do you see twice doubling of the
> file size in insert-file-contents?

I assumed that it was because the internal buffers contain an
Emacs-encoded version of the file, which could be as long as four
times the actual file size, because a single byte in the file
might expand to 4 bytes inside Emacs in some cases.

That would explain the behavior that you saw: if your file's
internal encoding was the same as the external, you wouldn't observe any
problem.  The problem would be exhibited only with files containing
many characters that bloat when read into memory.

However, I didn't investigate the matter thoroughly; perhaps
someone who's more expert on how Emacs encodes things internally
could speak up.
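The numbers in the report line up with the divide-by-4 test quoted
above.  As a minimal sketch (not the Emacs source), and assuming
MOST_POSITIVE_FIXNUM is 536870911 as the reporter's 32-bit build
printed, a size test of the same shape accepts 131071 KiB and rejects
131072 KiB.  The names size_accepted and FIXNUM_MAX_32 are made up for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the reporter's 32-bit build printed 536870911
   for most-positive-fixnum.  */
#define FIXNUM_MAX_32 INT64_C(536870911)

/* Hypothetical helper with the same shape as the quoted test:
   accept a size only if it is non-negative and no larger than
   limit / divisor.  */
bool size_accepted(int64_t size, int64_t limit, int64_t divisor)
{
    assert(divisor > 0);
    return 0 <= size && size <= limit / divisor;
}
```

With a divisor of 4 the largest accepted size is 536870911 / 4 =
134217727 bytes, one byte short of 2^27 = 134217728.  With a divisor
of 1 the same 128 MiB file would be accepted.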
From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 02:40:37 2011
From: Eli Zaretskii
To: Paul Eggert
Cc: 8528@debbugs.gnu.org, ego111@gmail.com
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 09:40:26 +0300
Message-id: <83mxjk2jl1.fsf@gnu.org>
In-reply-to: <4DAFCC4F.1080900@cs.ucla.edu>

> Date: Wed, 20 Apr 2011 23:18:55 -0700
> From: Paul Eggert
> CC: Evans Winner, 8528@debbugs.gnu.org
>
> On 04/20/11 22:52, Eli Zaretskii wrote:
>
> > Paul, could you please tell where do you see twice doubling of the
> > file size in insert-file-contents?
>
> I assumed that it was because the internal buffers contain an
> Emacs-encoded version of the file, which could be as long as four
> times the actual file size, because a single byte in the file
> might expand to 4 bytes inside Emacs in some cases.

Actually, it could potentially expand even 5-fold (because Emacs
extends UTF-8 to codepoints as large as 0x3FFFFF).  But we test the
buffer size and avoid overflowing it in many other places, both
further down in insert-file-contents and in insdel.c.  If those are
not enough, we could add more such tests, particularly after decoding
the file's contents, where we know the full buffer size in bytes.

So I think artificially limiting the maximum size of a file that can
be visited in that particular place in insert-file-contents is too
harsh.

> That would explain the behavior that you saw: if your file's
> internal encoding was the same as the external, you wouldn't observe any
> problem.  The problem would be exhibited only with files containing
> many characters that bloat when read into memory.

Right, but wouldn't you agree that such a limitation is too stringent?
E.g., I should be able to use find-file-literally to visit a 512MB
file, but currently I cannot.
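To make the 5-fold figure concrete, here is a hedged sketch of the
byte count per code point in a UTF-8-style encoding that allows 5-byte
sequences.  It follows the generic UTF-8 length thresholds rather than
the actual Emacs routines, and it ignores any values Emacs may treat
specially.  The names ext_utf8_len and worst_case_fits are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes needed for one code point in a UTF-8-style encoding extended
   to 5-byte sequences.  Simplified illustration only; this is not
   the Emacs implementation.  */
int ext_utf8_len(uint32_t c)
{
    assert(c <= 0x3FFFFF);
    if (c <= 0x7F)     return 1;
    if (c <= 0x7FF)    return 2;
    if (c <= 0xFFFF)   return 3;
    if (c <= 0x1FFFFF) return 4;
    return 5;
}

/* Pessimistic pre-check: assume every input byte could decode to a
   5-byte sequence, and ask whether the result still fits in limit.
   Dividing the limit avoids overflowing the multiplication.  */
bool worst_case_fits(int64_t file_bytes, int64_t limit)
{
    return 0 <= file_bytes && file_bytes <= limit / 5;
}
```

Under that worst-case assumption, and with a limit of 536870911, only
files up to 107374182 bytes would pass such a pre-check.  Most files
expand far less, which is why a check made where the decoded size is
actually known is less restrictive.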
From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 02:58:46 2011
From: Paul Eggert
To: Eli Zaretskii
Cc: 8528@debbugs.gnu.org, ego111@gmail.com
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Wed, 20 Apr 2011 23:58:36 -0700
Message-ID: <4DAFD59C.5090602@cs.ucla.edu>
In-Reply-To: <83mxjk2jl1.fsf@gnu.org>

On 04/20/11 23:40, Eli Zaretskii wrote:

> Right, but wouldn't you agree that such a limitation is too stringent?

Yes, absolutely, the limit should be removed if possible.

In a brief look at the code, it appeared to me that there were
places where it does not check for integer overflow
in size calculations when converting external to internal form.  So
it could well be that this preliminary check may be needed to
avoid catastrophe later.  I have not checked this out carefully,
though, and I could be wrong.  (One way to find out would be
to test it with a worst-case-bloat file, but I haven't had
time to do that.)

> E.g., I should be able to use find-file-literally to visit a 512MB
> file, but currently I cannot.

If we know that byte bloat cannot occur, which is the case with
find-file-literally, then the divide-by-4 limit should not be needed.
That case should be easy, in that it shouldn't require a lot
of analysis to fix that case safely.
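One common way to guard a size calculation of this kind is to compare
before adding, so that the check itself cannot overflow.  The sketch
below is generic C, not code taken from Emacs; grow_checked and its
parameters are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store cur + extra in *out and return true only when the sum stays
   within limit.  The test is written as cur > limit - extra, which is
   equivalent to cur + extra > limit but cannot overflow when all
   three values are non-negative.  On failure *out is left untouched.  */
bool grow_checked(int64_t cur, int64_t extra, int64_t limit, int64_t *out)
{
    assert(0 <= cur && 0 <= extra && 0 <= limit);
    if (cur > limit - extra)
        return false;
    *out = cur + extra;
    return true;
}
```

A caller would signal an error when grow_checked returns false instead
of going on to reallocate, so a request that would exceed the limit
never reaches the allocation.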
From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 09:20:13 2011
From: Eli Zaretskii
To: Paul Eggert
Cc: 8528@debbugs.gnu.org, ego111@gmail.com
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Thu, 21 Apr 2011 16:20:54 +0300
Message-id: <838vv33fm1.fsf@gnu.org>
In-reply-to: <4DAFD59C.5090602@cs.ucla.edu>

> Date: Wed, 20 Apr 2011 23:58:36 -0700
> From: Paul Eggert
> CC: ego111@gmail.com, 8528@debbugs.gnu.org
>
> On 04/20/11 23:40, Eli Zaretskii wrote:
>
> > Right, but wouldn't you agree that such a limitation is too stringent?
>
> Yes, absolutely, the limit should be removed if possible.
>
> In a brief look at the code, it appeared to me that there were
> places where it does not check for integer overflow
> in size calculations when converting external to internal form.  So
> it could well be that this preliminary check may be needed to
> avoid catastrophe later.  I have not checked this out carefully,
> though, and I could be wrong.

I have now reviewed the code involved in this, and I think the limit
can be lifted.

In general, there could be two ways for us to insert text into the
buffer as a result of calling insert-file-contents:

 (a) directly, by reading from the file into its buffer (or some
     temporary buffer used as part of processing); or

 (b) indirectly, by decoding inserted text through various functions
     in coding.c, which write the decoded text into the destination
     buffer.

I found that both of these ways include tests for potential overflows
of the buffer size.  Inserting text directly is protected because it
enlarges the buffer's gap before inserting text, and make_gap which
does that errors out if the new size will overflow (actually, it
errors out 2000 bytes too early, because it wants some extra space).
Insertion by decoding text is also protected because it makes sure the
destination buffer has enough space before it writes another chunk of
decoded text into it.  It assures that by enlarging the gap, which
again goes through make_gap.

I found only one place where we were not protected from overflowing
MOST_POSITIVE_FIXNUM (not sure if it is relevant to
insert-file-contents), and one other place where I wasn't sure we were
protected, so I added a suitable protection in both those places.  See
the proposed patch below.

If no one objects, I will commit these changes in a week or so.

> > E.g., I should be able to use find-file-literally to visit a 512MB
> > file, but currently I cannot.
>
> If we know that byte bloat cannot occur, which is the case with
> find-file-literally, then the divide-by-4 limit should not be needed.
> That case should be easy, in that it shouldn't require a lot
> of analysis to fix that case safely.

Yes, definitely.  But I think the patch below solves this problem as
well, so there's no need for special treatment for unibyte or pure
ASCII files.

Here's the proposed patch.  Evans, I'd appreciate if you could try it
and see if it solves the original problem for you.

=== modified file 'src/ChangeLog'
--- src/ChangeLog 2011-04-19 10:48:30 +0000
+++ src/ChangeLog 2011-04-21 12:35:30 +0000
@@ -1,3 +1,16 @@
+2011-04-21  Eli Zaretskii
+
+        * coding.c (coding_alloc_by_realloc): Error out if destination
+        will grow beyond MOST_POSITIVE_FIXNUM.
+        (decode_coding_emacs_mule): Abort if there isn't enough place in
+        charbuf for the composition carryover bytes.  Reserve an extra
+        space for up to 2 characters produced in a loop.
+        (decode_coding_iso_2022): Abort if there isn't enough place in
+        charbuf for the composition carryover bytes.
+
+        * fileio.c (Finsert_file_contents): Don't limit file size to 1/4
+        of MOST_POSITIVE_FIXNUM.
+
 2011-04-19  Eli Zaretskii
 
         * syntax.h (SETUP_SYNTAX_TABLE_FOR_OBJECT): Fix setting of

=== modified file 'src/coding.c'
--- src/coding.c 2011-04-14 05:04:02 +0000
+++ src/coding.c 2011-04-21 12:35:33 +0000
@@ -1071,6 +1071,8 @@ coding_set_destination (struct coding_sy
 static void
 coding_alloc_by_realloc (struct coding_system *coding, EMACS_INT bytes)
 {
+  if (coding->dst_bytes > MOST_POSITIVE_FIXNUM - bytes)
+    error ("Maximum size of buffer or string exceeded");
   coding->destination = (unsigned char *) xrealloc (coding->destination,
                                                     coding->dst_bytes + bytes);
   coding->dst_bytes += bytes;
@@ -2333,7 +2335,9 @@ decode_coding_emacs_mule (struct coding_
   /* We may produce two annotations (charset and composition) in one
      loop and one more charset annotation at the end.  */
   int *charbuf_end
-    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3);
+    = coding->charbuf + coding->charbuf_size - (MAX_ANNOTATION_LENGTH * 3)
+      /* We can produce up to 2 characters in a loop.  */
+      - 1;
   EMACS_INT consumed_chars = 0, consumed_chars_base;
   int multibytep = coding->src_multibyte;
   EMACS_INT char_offset = coding->produced_char;
@@ -2348,6 +2352,8 @@ decode_coding_emacs_mule (struct coding_
     {
       int i;
 
+      if (charbuf_end - charbuf < cmp_status->length)
+        abort ();
       for (i = 0; i < cmp_status->length; i++)
         *charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;
@@ -3479,6 +3485,8 @@ decode_coding_iso_2022 (struct coding_sy
 
   if (cmp_status->state != COMPOSING_NO)
     {
+      if (charbuf_end - charbuf < cmp_status->length)
+        abort ();
       for (i = 0; i < cmp_status->length; i++)
         *charbuf++ = cmp_status->carryover[i];
       coding->annotated = 1;

=== modified file 'src/fileio.c'
--- src/fileio.c 2011-04-14 20:20:17 +0000
+++ src/fileio.c 2011-04-21 12:07:44 +0000
@@ -3245,15 +3245,10 @@ variable `last-coding-system-used' to th
 
   record_unwind_protect (close_file_unwind, make_number (fd));
 
-  /* Arithmetic overflow can occur if an Emacs integer cannot represent the
-     file size, or if the calculations below overflow.  The calculations below
-     double the file size twice, so check that it can be multiplied by 4
-     safely.
-
-     Also check whether the size is negative, which can happen on a platform
-     that allows file sizes greater than the maximum off_t value.  */
+  /* Check whether the size is too large or negative, which can happen on a
+     platform that allows file sizes greater than the maximum off_t value.  */
   if (! not_regular
-      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM / 4))
+      && ! (0 <= st.st_size && st.st_size <= MOST_POSITIVE_FIXNUM))
     error ("Maximum buffer size exceeded");
 
   /* Prevent redisplay optimizations.  */

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 29 15:49:25 2011
From: Eli Zaretskii
To: eggert@cs.ucla.edu, ego111@gmail.com
Cc: 8528-done@debbugs.gnu.org
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Fri, 29 Apr 2011 22:49:16 +0300
Message-id: <83y62s6doj.fsf@gnu.org>
In-reply-to: <838vv33fm1.fsf@gnu.org>

> Date: Thu, 21 Apr 2011 16:20:54 +0300
> From: Eli Zaretskii
> Cc: 8528@debbugs.gnu.org, ego111@gmail.com
>
> If no one objects, I will commit these changes in a week or so.

No one objected, so I installed this.

From debbugs-submit-bounces@debbugs.gnu.org Mon May 02 10:53:35 2011
From: Stefan Monnier
To: 8528@debbugs.gnu.org
Cc: eliz@gnu.org
Subject: Re: bug#8528: 24.0.50; 32-bit Emacs with apparent 128M buffer size limit
Date: Mon, 02 May 2011 11:53:24 -0300
In-Reply-To: <83y62s6doj.fsf@gnu.org> (Eli Zaretskii's message of "Fri, 29 Apr 2011 22:49:16 +0300")

> No one objected, so I installed this.

Thanks,

        Stefan

From unknown Sat Aug 16 21:18:08 2025
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 31 May 2011 11:24:05 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator