GNU bug report logs - #17343
24.2; Exponential growth of files using raw-mode
Reported by: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
Date: Fri, 25 Apr 2014 03:51:03 UTC
Severity: normal
Tags: notabug
Found in version 24.2
Done: Glenn Morris <rgm <at> gnu.org>
Bug is archived. No further changes may be made.
> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay <at> dcc.uchile.cl>
>
> Following the short recipe below shows how a user saving files in "raw
> mode" could end up with files doubling their size each time saved, if
> following emacs' suggestion to save it in raw mode:
>
> * Recipe:
>
> 1. Save the following line in a file "testAccentsMinimal.txt"
>
> Nà¥\206à¤\206\206à¥\206
>
> 2. Repeatedly,
>
> 0) measure the size of the file (wc -c testAccentsMinimal.txt);
> 1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
> 2) insert and delete a character in it (manually);
> 3) save it selecting the suggested raw encoding (manually);
> 4) quit emacs (or force the reload of the file).
>
> * Result:
>
> This should give something akin to the following, where one can see
> the size of the file growing exponentially with the number of saves.
>
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 11 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 19 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 35 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 67 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 131 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 259 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 515 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 1027 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
> 2051 testAccentsMinimal.txt
>
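The sizes in the transcript above can be checked for a pattern. The following is a small sketch in Python (not part of the original report); the closed form `3 + 8 * 2**n` is an observation fitted to the nine reported numbers, not something stated by the reporter.

```python
# Byte counts reported by `wc -c` in the transcript above, in order.
sizes = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]

# Growth between consecutive saves: each increment is twice the previous one.
deltas = [b - a for a, b in zip(sizes, sizes[1:])]

# Fitted closed form (an observation, not from the report):
# 3 bytes stay constant and 8 bytes double at every save.
fits = all(s == 3 + 8 * 2**n for n, s in enumerate(sizes))

print(deltas)
print(fits)
```

If the fit holds, only part of the file is being doubled at each save while a few bytes stay untouched, which is consistent with ASCII characters surviving each round trip unchanged.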
> * (Tentative) Explanation:
>
> - Even though the file is saved in "raw" mode, it is read in another
> mode which prefixes the "special" characters with a Unicode code.
> - Due to symbols from incompatible encodings, emacs is confused about
> which encoding to use for saving and asks the user about it.
>
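The kind of growth described in this tentative explanation can be reproduced with a generic double-encoding model. The sketch below is in Python and rests on assumptions: the helper `resave` and the sample bytes are hypothetical, and the report does not establish that Emacs 24.2 performs exactly this Latin-1/UTF-8 round trip. It only shows that reading bytes under one encoding and writing them under another doubles every non-ASCII byte per cycle.

```python
def resave(data: bytes) -> bytes:
    """One hypothetical save cycle: misread the bytes as Latin-1, write UTF-8."""
    return data.decode("latin-1").encode("utf-8")

# Hypothetical 11-byte stand-in: 3 ASCII bytes plus 8 bytes >= 0x80.
# This is NOT the reporter's file, only a sample of the same size.
data = b"N\n\n" + bytes([0xE0, 0xA5, 0x86, 0xE0, 0xA4, 0x86, 0x86, 0x86])

sizes = []
for _ in range(5):
    sizes.append(len(data))
    data = resave(data)

print(sizes)
```

Each byte at or above 0x80 becomes a two-byte UTF-8 sequence whose bytes are again at or above 0x80, so the non-ASCII part doubles on every cycle while the ASCII part is unchanged.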
> * Why it matters:
>
> - The faulty sequence above occurred naturally from copying and pasting
> from various webpages (containing accented characters) into the same
> document, and was identified when some files grew too large.
> - Files (e.g. of notes) end up doubling in size at each edit, until
> they fill the memory and/or hard drive, slow down the system and
> make Emacs complain about the size of the file.
>
> * (Potential) Solutions:
>
> - when saving a file with conflicting encodings, instead of merely
> suggesting the raw encoding, offer an option to "clean" the file,
> for instance by projecting it onto one encoding and deleting all
> symbols that are incompatible with it.
>
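As an illustration of what such a "projection" could mean outside Emacs, here is a minimal Python sketch. It assumes the target encoding is UTF-8; the function name `project_to_utf8` and the sample bytes are hypothetical. Note that this approach discards data silently, which may be why a manual fix is safer in practice.

```python
def project_to_utf8(data: bytes) -> bytes:
    """Drop every byte that is not part of a valid UTF-8 sequence."""
    # errors="ignore" skips undecodable bytes instead of raising.
    return data.decode("utf-8", errors="ignore").encode("utf-8")

# Hypothetical sample: valid UTF-8 text with one stray 0x86 byte in the middle.
sample = b"caf\xc3\xa9 \x86 ok"
cleaned = project_to_utf8(sample)

print(len(sample), len(cleaned))
```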
> I think that I reported this bug a year ago in Emacs 23 and was answered
> at the time that this would be solved by the next version (24), but it
> occurred to me recently that this undesirable behavior was still there :(
It's not a bug. When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding. This is intended behavior.
To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r" to re-visit
the buffer in the encoding you think is correct, and then manually fix
the bad sequences. Then the growth will not happen.
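
To find the bad sequences before fixing them by hand, one option outside Emacs is to scan the file for bytes that do not decode in the intended encoding. The Python sketch below assumes that encoding is UTF-8; the function name `invalid_utf8_offsets` and the sample bytes are hypothetical.

```python
def invalid_utf8_offsets(data: bytes) -> list:
    """Return the byte offsets at which UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            offsets.append(pos + err.start)
            pos += err.end  # resume just after the offending bytes
    return offsets

# Hypothetical sample with a stray 0x86 at offset 3 and 0xFF at offset 11.
sample = b"ok \x86 caf\xc3\xa9 \xff"
print(invalid_utf8_offsets(sample))
```

The reported offsets can then be inspected and corrected in the editor after re-visiting the buffer in the intended encoding.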
This bug report was last modified 11 years and 79 days ago.