GNU bug report logs - #17343
24.2; Exponential growth of files using raw-mode


Package: emacs;

Reported by: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>

Date: Fri, 25 Apr 2014 03:51:03 UTC

Severity: normal

Tags: notabug

Found in version 24.2

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17343 in the body.
You can then email your comments to 17343 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#17343; Package emacs. (Fri, 25 Apr 2014 03:51:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jeremy Barbay <jbarbay <at> dcc.uchile.cl>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 25 Apr 2014 03:51:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.2; Exponential growth of files using raw-mode
Date: Thu, 24 Apr 2014 15:58:41 -0300
Hi.

The short recipe below shows how a file can double in size on every
save when the user follows Emacs' suggestion to save it in "raw
mode":

* Recipe:

  1. Save the following line in a file "testAccentsMinimal.txt"

  N����

  2. Repeatedly, 

     0) measure the size of the file (wc -c testAccentsMinimal.txt); 
     1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
     2) insert and delete a character in it (manually);
     3) save it selecting the suggested raw encoding (manually);
     4) quit emacs (or force the reload of the file).

* Result:

  This should give something akin to the following, where one can see
  the size of the file growing exponentially with the number of saves.

  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  11 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  19 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  35 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  67 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  131 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  259 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  515 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  1027 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
  2051 testAccentsMinimal.txt
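
The sizes listed above fit a simple pattern: they behave as if 8 bytes
double at every save while 3 bytes stay fixed. The short Python check
below only illustrates that arithmetic; the split into 8 "growing" and
3 "fixed" bytes is inferred from the numbers, not from inspecting the
file, and the function name is made up for this sketch.

```python
# Sizes reported by `wc -c` in the listing above, in order.
observed = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]


def predicted_size(saves: int, growing: int = 8, fixed: int = 3) -> int:
    """Size after `saves` save/reload cycles, assuming `growing` bytes
    double on every cycle and `fixed` bytes never change (an assumption
    inferred from the observed sizes)."""
    return fixed + growing * 2 ** saves


predicted = [predicted_size(n) for n in range(len(observed))]

# Equivalent recurrence: each size is twice the previous one minus 3.
recurrence_holds = all(b == 2 * a - 3 for a, b in zip(observed, observed[1:]))

print(predicted == observed, recurrence_holds)
```

Both checks hold for the nine sizes in the listing, which is what makes
the growth exponential rather than merely steady.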

* (Tentative) Explanation:

  - Even though the file is saved in "raw" mode, it is read in another
    mode which prefixes the "special" characters with a Unicode code.
  - Due to symbols from incompatible encodings, Emacs is confused about
    which encoding to use for saving and asks the user about it.
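
One well-known way for such a "prefixing" to double a file is a
mismatched round trip: bytes are read as Latin-1 and written back as
UTF-8, so every non-ASCII byte becomes two non-ASCII bytes. The Python
sketch below demonstrates that general effect only. Whether this is
what Emacs did here is not established in this report, and the sample
bytes are hypothetical, chosen to have 3 ASCII and 8 non-ASCII bytes.

```python
def mis_round_trip(data: bytes) -> bytes:
    """Read bytes as Latin-1, then write the resulting text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Hypothetical content, NOT the reporter's file:
# 3 ASCII bytes followed by 8 non-ASCII bytes.
data = b"N\n\n" + b"\xe9" * 8

sizes = []
for _ in range(9):
    sizes.append(len(data))
    data = mis_round_trip(data)

print(sizes)
```

Each pass leaves the ASCII bytes alone and turns every byte in the
range 0x80-0xFF into a two-byte UTF-8 sequence whose bytes are again in
that range, so the non-ASCII part doubles on every cycle.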

* Why it matters:

  - The faulty sequence above occurred naturally from copy-pasting from
    various webpages (containing accented characters) into the same
    document, and was identified when some files grew too large.
  - Files (e.g. of notes) end up doubling in size at each edit, until
    they fill the memory and/or hard drive, slow down the system and
    make Emacs complain about the size of the file.

* (Potential) Solutions:

  - When saving a file with conflicting encodings, instead of merely
    suggesting the raw encoding, add an option to "clean" the file,
    for instance by projecting the file to an encoding by deleting
    all symbols which are incompatible with it.
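
The "projection" idea can be sketched outside Emacs. The Python
function below is a hypothetical illustration, not an existing Emacs
feature: it assumes UTF-8 as the target encoding and silently drops
every byte sequence that is not valid UTF-8, which loses data.

```python
def project_to_utf8(data: bytes) -> bytes:
    """Drop every byte sequence that is not valid UTF-8 (lossy)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# Hypothetical sample: "N", one valid 3-byte UTF-8 character (U+0946),
# one stray 0x86 byte, and a newline.
sample = b"N\xe0\xa5\x86\x86\n"
cleaned = project_to_utf8(sample)

print(len(sample), len(cleaned))
```

Applying the function a second time changes nothing, so a file cleaned
this way would keep a constant size across later saves as UTF-8.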

I think I reported this bug a year ago against Emacs 23 and was told
at the time that it would be solved in the next version (24), but I
noticed recently that this undesirable behavior is still there :(

I hope it helps.  
-- 
Jeremy                             (http://www.dcc.uchile.cl/~jbarbay)


In GNU Emacs 24.2.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw scroll bars)
 of 2013-02-27 on raven
Windowing system distributor `The X.Org Foundation', version 11.0.11300000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Text

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<menu-bar> <help-menu> <send-emacs-bug-report>

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Loading vc-git...done
Scanning for dabbrevs...done
dabbrev-expand: No dynamic expansion for `Expo' found

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr dabbrev emacsbug message format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils vc-git ind-util regexp-opt
time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd
tool-bar dnd fontset image fringe lisp-mode register page menu-bar
rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax
facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak
czech european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces
cus-face files text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting x-toolkit x multi-tty emacs)





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17343; Package emacs. (Fri, 25 Apr 2014 07:14:02 GMT) Full text and rfc822 format available.

Message #8 received at 17343 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
Cc: 17343 <at> debbugs.gnu.org
Subject: Re: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300
> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
> 
> The short recipe below shows how a file can double in size on every
> save when the user follows Emacs' suggestion to save it in "raw
> mode":
> 
> * Recipe:
> 
>   1. Save the following line in a file "testAccentsMinimal.txt"
> 
>   Nà¥\206à¤\206\206à¥\206
> 
>   2. Repeatedly, 
> 
>      0) measure the size of the file (wc -c testAccentsMinimal.txt); 
>      1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
>      2) insert and delete a character in it (manually);
>      3) save it selecting the suggested raw encoding (manually);
>      4) quit emacs (or force the reload of the file).
> 
> * Result:
> 
>   This should give something akin to the following, where one can see
>   the size of the file growing exponentially with the number of saves.
> 
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   11 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   19 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   35 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   67 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   131 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   259 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   515 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   1027 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
>   2051 testAccentsMinimal.txt
> 
> * (Tentative) Explanation:
> 
>   - Even though the file is saved in "raw" mode, it is read in another
>     mode which prefixes the "special" characters with a Unicode code.
>   - Due to symbols from incompatible encodings, Emacs is confused about
>     which encoding to use for saving and asks the user about it.
> 
> * Why it matters:
> 
>   - The faulty sequence above occurred naturally from copy-pasting from
>     various webpages (containing accented characters) into the same
>     document, and was identified when some files grew too large.
>   - Files (e.g. of notes) end up doubling in size at each edit, until
>     they fill the memory and/or hard drive, slow down the system and
>     make Emacs complain about the size of the file.
> 
> * (Potential) Solutions:
> 
>   - When saving a file with conflicting encodings, instead of merely
>     suggesting the raw encoding, add an option to "clean" the file,
>     for instance by projecting the file to an encoding by deleting
>     all symbols which are incompatible with it.
> 
> I think I reported this bug a year ago against Emacs 23 and was told
> at the time that it would be solved in the next version (24), but I
> noticed recently that this undesirable behavior is still there :(

It's not a bug.  When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding.  This is intended behavior.

To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r" to re-visit
the buffer in the encoding you think is correct, and then manually fix
the bad sequences.  Then the growth will not happen.
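
For readers who want to find the bad sequences before fixing them by
hand, here is a rough Python sketch that works outside Emacs. It
assumes the intended encoding is UTF-8, and the function name is made
up for this example; it only reports byte offsets and changes nothing.

```python
def invalid_utf8_spans(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) byte offsets of sequences that are not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            spans.append((pos + err.start, pos + err.end))
            pos += err.end
    return spans


# Hypothetical sample: a valid 3-byte character followed by a stray 0x86.
sample = b"N\xe0\xa5\x86\x86\n"
print(invalid_utf8_spans(sample))
```

An empty result means the data already decodes cleanly in the assumed
encoding, so there is nothing to fix.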




Added tag(s) notabug. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 28 Apr 2014 00:29:01 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 17343 <at> debbugs.gnu.org and Jeremy Barbay <jbarbay <at> dcc.uchile.cl> Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 28 Apr 2014 00:29:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17343; Package emacs. (Tue, 29 Apr 2014 03:58:04 GMT) Full text and rfc822 format available.

Message #15 received at 17343 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
Cc: 17343 <at> debbugs.gnu.org
Subject: Re: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 14:15:34 -0400
>   1. Save the following line in a file "testAccentsMinimal.txt"

>   Nà¥à¤††à¥†

>   2. Repeatedly, 

>      0) measure the size of the file (wc -c testAccentsMinimal.txt); 
>      1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
>      2) insert and delete a character in it (manually);
>      3) save it selecting the suggested raw encoding (manually);

I cannot reproduce this. At this step, Emacs just saves the
file silently.  And the file keeps its size constant.
I tried with Debian's Emacs-24.3 and Emacs-23.4 as well as with the
current pretest (and using "-Q" rather than "-q").

> I think I reported this bug a year ago against Emacs 23 and was told
> at the time that it would be solved in the next version (24), but I
> noticed recently that this undesirable behavior is still there :(

We're pretesting 24.4, so hopefully we can finally crush this one for
24.4.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17343; Package emacs. (Tue, 29 Apr 2014 05:49:01 GMT) Full text and rfc822 format available.

Message #18 received at 17343 <at> debbugs.gnu.org (full text, mbox):

From: Jarek Czekalski <jarekczek <at> poczta.onet.pl>
To: 17343 <at> debbugs.gnu.org
Subject: 24.2; Exponential growth of files using raw-mode
Date: Tue, 29 Apr 2014 07:48:40 +0200
I also cannot reproduce this. I will try again, but please:
1. Provide 3 files for the first 3 steps (11, 19 and 35 bytes) - I'm not
sure that I pasted the correct sequence.
2. Explain step 2: is it a good way to reproduce to insert an "a"
character as the first one on the second (empty) line in the file and
then delete it?

Jarek





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 May 2014 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 77 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.