Package: emacs;
Reported by: Juanma Barranquero <lekktu <at> gmail.com>
Date: Sat, 19 May 2012 16:12:02 UTC
Severity: normal
Found in version 24.1.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Eli Zaretskii <eliz <at> gnu.org> To: Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: handa <at> gnu.org, schwab <at> linux-m68k.org, rms <at> gnu.org, 11519 <at> debbugs.gnu.org, lekktu <at> gmail.com Subject: bug#11519: "Wrong type argument: characterp" building custom-deps while boostrapping Date: Thu, 24 May 2012 19:22:46 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca> > Cc: rms <at> gnu.org, handa <at> gnu.org, schwab <at> linux-m68k.org, lekktu <at> gmail.com, 11519 <at> debbugs.gnu.org > Date: Wed, 23 May 2012 16:07:05 -0400 > > >> > Which other places use C pointers to buffer text and call functions > >> > that can allocate memory? > >> IIUC any place that uses STRING_CHAR_AND_LENGTH on buffer text is > >> vulnerable to the problem. > > That's not true. As long as you access buffer text through character > > position, you are safe. > > Right, some of those uses might be safe, indeed. Of course it's not > only STRING_CHAR_AND_LENGTH but STRING_CHAR_ADVANCE as well, together > with FETCH_* macros which use those, etc... No, FETCH_* macros are safe, because they accept buffer positions, fetch only one character at a time, and call STRING_CHAR_* _after_ they access the buffer. > Grepping for those macros shows they're used at *many* places, and I'd > be amazed if re_search is the only place where we don't go through the > BYTE_POS_ADDR rigmarole. > > Let's see ...hmmm... yup, set-buffer-multibyte is another example, > find_charsets_in_text yet another, and I'm not even trying hard. > Just grep for "STRING_CHAR_" and see for yourself. I didn't mean STRING_CHAR_*. I agree that they should be fixed not to have such unexpected side effect. They should be read-only operations. As a temporary band-aid for Emacs 24.1, I suggest the change below. What I meant is the code besides STRING_CHAR_* macros. I don't think you will find code that manipulates C pointers to buffer text and calls functions that can allocate memory. > >> But on other platforms where we use mmap, we do suffer from this > >> fragmentation, and yet it doesn't seem to be a real source of problem. > > I don't know enough about mmap to answer that. I vaguely recollect > > that mmap avoids such fragmentation as an inherent feature, but I may > > be wrong. > > No, fragmentation is a property of the address space, so without > relocation you can't avoid it. I asked Gerd Möllmann, who wrote the mmap-based buffer allocation code, about this. That code originally resided on ralloc.c and was meant to replace the sbrk-based implementation. So I would expect that the issue of relocation was considered back then, and I hope Gerd will remember why the mmap-based code was considered good enough to replace ralloc.c. > > I find it hard to believe that going through system malloc on > > MS-Windows will let us use buffers as large as 1.5 GB (on a 32-bit > > machine). To achieve this today, we reserve a 2GB contiguous chunk of > > address space at startup, and then commit and uncommit parts of it as > > needed (see w32heap.c). ralloc.c has an important part in this > > arrangement. > > You mean that Windows's system malloc library has a memory that's too > fragmented to be able to allocate a single 1.5G chunk? Why? This has nothing to do with Windows APIs, so you are well equipped to reason about this ;-) You said "malloc", so I took an issue with the MS C runtime implementation of malloc. Since all the other implementations suffer from fragmentation, there's no reason to believe that the MS implementation avoids that danger. A general-purpose function that cannot move buffers behind the scenes cannot possibly avoid that. Doing better was the original reason for writing ralloc.c. If you meant to bypass malloc and use the Windows memory-allocation APIs, such as VirtualAlloc, directly, then that's what we already do in w32heap.c, which implements an emulation of sbrk that is good enough for Emacs. The fact that gmalloc and ralloc are used on top of that are simply to avoid reinventing the wheel. We could easily turn off buffer relocation in ralloc.c for good, by fixing the value of use_relocatable_buffers at zero. But I'm worried that this would cause Emacs on Windows run out of memory (or act as if it were) faster. For example, in an Emacs session that runs for 2 weeks and has a 200MB working set, I just visited a 1.3GB file, went to its middle and typed "C-u 30000 d" to insert 30K characters. Emacs had no problems enlarging the buffer, although it has only 1.9GB of reserved memory space on that machine, so if it needed to allocate another 1.3GB+30KB buffer (due to fragmentation, which is a certainty after such a long time), it would have failed, I presume. Yet another alternative is to emulate mmap on Windows using the equivalent Windows API. But that requires a research comparing mmap features we need and use on Posix platforms with the features offered by Windows, to make sure this is at all feasible. Such a research would need to be done by someone who knows enough about mmap and is willing to do the job. Do we have such a person on board? And then there's the implementation and testing. Doesn't sound like an efficient use of our time and energy. Are there other alternatives? Here's the band-aid I propose for emacs-24, to make the STRING_CHAR_* macros safe: === modified file 'src/charset.c' --- src/charset.c 2012-01-19 07:21:25 +0000 +++ src/charset.c 2012-05-24 16:14:05 +0000 @@ -1641,6 +1641,12 @@ maybe_unify_char (int c, Lisp_Object val return c; CHECK_CHARSET_GET_CHARSET (val, charset); +#ifdef REL_ALLOC + /* The call to load_charset below can allocate memory, whcih screws + callers of this function through STRING_CHAR_* macros that hold C + pointers to buffer text, if REL_ALLOC is used. */ + r_alloc_inhibit_buffer_relocation (1); +#endif load_charset (charset, 1); if (! inhibit_load_charset_map) { @@ -1656,6 +1662,9 @@ maybe_unify_char (int c, Lisp_Object val if (unified > 0) c = unified; } +#ifdef REL_ALLOC + r_alloc_inhibit_buffer_relocation (0); +#endif return c; } === modified file 'src/ralloc.c' --- src/ralloc.c 2012-05-23 17:32:28 +0000 +++ src/ralloc.c 2012-05-24 16:16:14 +0000 @@ -1204,7 +1204,12 @@ r_alloc_reset_variable (POINTER *old, PO void r_alloc_inhibit_buffer_relocation (int inhibit) { - use_relocatable_buffers = !inhibit; + if (use_relocatable_buffers < 0) + use_relocatable_buffers = 0; + if (inhibit) + use_relocatable_buffers++; + else if (use_relocatable_buffers > 0) + use_relocatable_buffers--; }
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.