Package: emacs;
Reported by: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Date: Thu, 18 Aug 2011 09:04:02 UTC
Severity: normal
Found in version 23.3.50
Fixed in version 24.0.93
Done: Glenn Morris <rgm <at> gnu.org>
Bug is archived. No further changes may be made.
Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Thu, 18 Aug 2011 09:04:02 GMT)
Acknowledgement sent to Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>; copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 18 Aug 2011 09:04:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: bug-gnu-emacs <at> gnu.org
Subject: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
Date: Thu, 18 Aug 2011 18:01:13 +0900
When I start Emacs and evaluate the code below, it returns an unexpected result.

  (let ((func (lambda ()
                (with-temp-buffer
                  (mapc 'insert '(166 25339))
                  (encode-coding-region (point-min) (point-max) 'ctext-unix)
                  (buffer-string)))))
    (cons (funcall func)
          (funcall func)))
  -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")

The car of the result is not constant, and in the worst case Emacs crashes. It doesn't occur on Linux. If I evaluate the form a second time, both the car and the cdr of the new result are correct.

Using encode-coding-string instead of encode-coding-region has no problem:

  (let ((func (lambda ()
                (encode-coding-string
                 (mapconcat 'char-to-string '(166 25339) "")
                 'ctext-unix))))
    (cons (funcall func)
          (funcall func)))
  -> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")

Calling encode-coding-string beforehand also avoids the problem:

  (let ((func (lambda ()
                (with-temp-buffer
                  (mapc 'insert '(166 25339))
                  (encode-coding-region (point-min) (point-max) 'ctext-unix)
                  (buffer-string)))))
    (encode-coding-string (mapconcat 'char-to-string '(166 25339) "")
                          'ctext-unix)
    (cons (funcall func)
          (funcall func)))
  -> ("^[$(D\"C^[$(H*f^[(B" . "^[$(D\"C^[$(H*f^[(B")

-- 
Kazuhiro Ito
Message #8 received at 9318 <at> debbugs.gnu.org (Thu, 18 Aug 2011 09:51:01 GMT):
From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
Date: Thu, 18 Aug 2011 11:48:36 +0200
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:

> Before calling encode-coding-string also can avoid problem.

Perhaps something is clobbered by some autoloading?

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Message #11 received at 9318 <at> debbugs.gnu.org (Thu, 18 Aug 2011 21:37:01 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result on Windows
Date: Fri, 19 Aug 2011 06:33:41 +0900
> Perhaps something is clobbered by some autoloading?

I'm not sure I understand exactly what you mean, but these phenomena are reproducible with the precompiled binary (*1) started with the -Q option.

(*1) http://ftp.gnu.org/pub/gnu/emacs/windows/emacs-23.3-bin-i386.zip

-- 
Kazuhiro Ito
Message #14 received at 9318 <at> debbugs.gnu.org (Fri, 19 Aug 2011 13:49:02 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: 9318 <at> debbugs.gnu.org
Subject: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Fri, 19 Aug 2011 22:46:18 +0900
> When I start Emacs and evaluate the below code, unexpected result returns.
>
> (let ((func (lambda ()
>               (with-temp-buffer
>                 (mapc 'insert '(166 25339))
>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
>                 (buffer-string)))))
>   (cons (funcall func)
>         (funcall func)))
> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
>
> car of the result is not constant.

I noticed that this problem is not Windows-specific. I confirmed that it is reproducible in Emacs 23.3.1 (built by pkgsrc) on NetBSD/amd64, used via SSH from a remote host. But it doesn't occur on openSUSE 11.3.

-- 
Kazuhiro Ito
Message #17 received at 9318 <at> debbugs.gnu.org (Sat, 20 Aug 2011 21:29:02 GMT):
From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Sat, 20 Aug 2011 17:26:04 -0400
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:

>> When I start Emacs and evaluate the below code, unexpected result returns.
>
>> (let ((func (lambda ()
>>               (with-temp-buffer
>>                 (mapc 'insert '(166 25339))
>>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
>>                 (buffer-string)))))
>>   (cons (funcall func)
>>         (funcall func)))
>> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
>
>> car of the result is not constant.
>
> I noticed this problem is not Windows specific.  I confirmed that it
> is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> SSH from remote host.  But it doesn't occur on openSUSE 11.3.

Could you run Emacs under a debugger, trigger the crash, and provide a
backtrace?  (You will need to have compiled Emacs with debugging
symbols.)
Message #20 received at 9318 <at> debbugs.gnu.org (Sun, 21 Aug 2011 00:20:02 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Sun, 21 Aug 2011 09:17:05 +0900
> >> When I start Emacs and evaluate the below code, unexpected result returns.
> >
> >> (let ((func (lambda ()
> >>               (with-temp-buffer
> >>                 (mapc 'insert '(166 25339))
> >>                 (encode-coding-region (point-min) (point-max) 'ctext-unix)
> >>                 (buffer-string)))))
> >>   (cons (funcall func)
> >>         (funcall func)))
> >> -> ("¦拻^@^@^@^@^@^@^@^@^@^@" . "^[$(D\"C^[$(H*f^[(B")
> >
> > I noticed this problem is not Windows specific.  I confirmed that it
> > is reproducible in Emacs 23.3.1 (build by pkgsrc) on NetBSD/amd64 via
> > SSH from remote host.  But it doesn't occur on openSUSE 11.3.
>
> Could you run Emacs under a debugger, trigger the crash, and provide a
> backtrace?  (You will need to have compiled Emacs with debugging
> symbols.)

I built Emacs 23.3 with the "-O0 -g" options on NetBSD 5.1 (amd64) and started it with the command below (via SSH):

  gdb --args emacs -Q --no-splash

Next, I typed in the code below and evaluated it with C-x C-e:

  (progn
    (goto-char (point-min))
    (insert #x80)
    (insert (make-string 16 ?A))
    (encode-coding-region 1 18 'ctext-unix))

The backtrace is below. Please let me know if you need more information.

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
  5473      if (STRING_MARKED_P (ptr))
  (gdb) bt full
  #0  0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
          ptr = (struct Lisp_String *) 0x4141414141414140
          obj = 4702111234474983745
          cdr_count = 0
  #1  0x0000000000557320 in mark_char_table (ptr=0x1281800) at alloc.c:5405
          val = 4702111234474983745
          size = 130
          i = 0
  #2  0x0000000000557315 in mark_char_table (ptr=0x17f6c00) at alloc.c:5402
          val = 19404805
          size = 34
          i = 14
  #3  0x0000000000557315 in mark_char_table (ptr=0x13ea700) at alloc.c:5402
          val = 25127941
          size = 18
          i = 6
  #4  0x0000000000557315 in mark_char_table (ptr=0x10ba800) at alloc.c:5402
          val = 20883205
          size = 68
          i = 4
  #5  0x0000000000557838 in mark_object (arg=17541125) at alloc.c:5567
          obj = 17541125
          cdr_count = 0
  #6  0x0000000000557228 in mark_vectorlike (ptr=0xb16480) at alloc.c:5377
          size = 10
          i = 9
  #7  0x0000000000557855 in mark_object (arg=11625605) at alloc.c:5569
          obj = 11625605
          cdr_count = 0
  #8  0x0000000000557228 in mark_vectorlike (ptr=0xb56000) at alloc.c:5377
          size = 434
          i = 107
  #9  0x0000000000557855 in mark_object (arg=11886597) at alloc.c:5569
          obj = 11886597
          cdr_count = 0
  #10 0x00000000005577b0 in mark_object (arg=10786565) at alloc.c:5562
          h = (struct Lisp_Hash_Table *) 0xa49700
          obj = 10786565
          cdr_count = 0
  #11 0x00000000005568ff in Fgarbage_collect () at alloc.c:5092
          bind = (struct specbinding *) 0xb96526
          catch = (struct catchtag *) 0x7f7fffffc508
          handler = (struct handler *) 0x10
          stack_top_variable = 0 '\0'
          i = 418
          message_p = 0
          total = {140187732526192, 140187732526008, 140187732526000, 4294967295, 12148454, 10960258, 10312685, 68}
          count = 8
          t1 = { tv_sec = 1313842937, tv_usec = 498976 }
          t2 = { tv_sec = 0, tv_usec = 140187732530104 }
          t3 = { tv_sec = 11465618, tv_usec = 0 }
  #12 0x0000000000577bb4 in Ffuncall (nargs=2, args=0x7f7fffffc4f0) at eval.c:2965
          fun = 10313885
          original_fun = 10959186
          funcar = 10762338
          numargs = 1
          lisp_numargs = 10950075
          val = 68
          backtrace = { next = 0x7f7fffffc9a0, function = 0x7f7fffffc4f8, args = 0x7f7fffffc500, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0' }
          internal_args = (Lisp_Object *) 0x7f7fffffc500
          i = 0
  #13 0x00000000005ce3c1 in Fbyte_code (bytestr=9300689, vector=9300725, maxdepth=12) at bytecode.c:680
          count = 7
          op = 1
          vectorp = (Lisp_Object *) 0x8deb00
          bytestr_length = 18
          stack = { pc = 0x96972f ")\207", top = 0x7f7fffffc4f8, bottom = 0x7f7fffffc4f0, byte_string = 9300689, byte_string_start = 0x96971f "\b\203\b", constants = 9300725, next = 0x7f7fffffcb40 }
          top = (Lisp_Object *) 0x7f7fffffc4f0
          result = 10956883
  #14 0x00000000005788cc in funcall_lambda (fun=9300621, nargs=1, arg_vector=0x7f7fffffca28) at eval.c:3220
          val = 10762242
          syms_left = 10762242
          next = 18577650
          count = 6
          i = 1
          optional = 0
          rest = 0
  #15 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffca20) at eval.c:3077
          fun = 9300621
          original_fun = 18577602
          funcar = 18577842
          numargs = 1
          lisp_numargs = 10956963
          val = 10762242
          backtrace = { next = 0x7f7fffffced0, function = 0x7f7fffffca20, args = 0x7f7fffffca28, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0' }
          internal_args = (Lisp_Object *) 0xa730a3
          i = 0
  #16 0x00000000005ce3c1 in Fbyte_code (bytestr=9301185, vector=9301221, maxdepth=12) at bytecode.c:680
          count = 5
          op = 1
          vectorp = (Lisp_Object *) 0x8decf0
          bytestr_length = 31
          stack = { pc = 0x969692 "\v)B\211\034A\n=\204\033", top = 0x7f7fffffca28, bottom = 0x7f7fffffca20, byte_string = 9301185, byte_string_start = 0x969685 "\b\204\b", constants = 9301221, next = 0x0 }
          top = (Lisp_Object *) 0x7f7fffffca20
          result = 10762242
  #17 0x00000000005788cc in funcall_lambda (fun=9301109, nargs=1, arg_vector=0x7f7fffffcfa8) at eval.c:3220
          val = 140187732528832
          syms_left = 10762242
          next = 18577650
          count = 4
          i = 1
          optional = 0
          rest = 0
  #18 0x000000000057821a in Ffuncall (nargs=2, args=0x7f7fffffcfa0) at eval.c:3077
          fun = 9301109
          original_fun = 11438610
          funcar = 5059672
          numargs = 1
          lisp_numargs = 5059670
          val = 10762242
          backtrace = { next = 0x7f7fffffd310, function = 0x7f7fffffcfa0, args = 0x7f7fffffcfa8, nargs = 1, evalargs = 0 '\0', debug_on_exit = 0 '\0' }
          internal_args = (Lisp_Object *) 0xa77993
          i = 0
  #19 0x000000000057296b in Fcall_interactively (function=11438610, record_flag=10762242, keys=10790405) at callint.c:869
          val = 4
          args = (Lisp_Object *) 0x7f7fffffcfa0
          visargs = (Lisp_Object *) 0x7f7fffffcf80
          specs = 9301281
          filter_specs = 9301281
          teml = 5734938
          up_event = 10762242
          enable = 10762242
          speccount = 2
          next_event = 2
          prefix_arg = 10762242
          string = (unsigned char *) 0x7f7fffffcfc0 "P"
          tem = (unsigned char *) 0x61652c ""
          varies = (int *) 0x7f7fffffcf60
          i = 2
          j = 1
          count = 1
          foo = 1
          prompt1 = '\0' <repeats 99 times>
          tem1 = 0x0
          arg_from_tty = 0
          gcpro1 = { next = 0xa43802, var = 0xa43802, nvars = 0 }
          gcpro2 = { next = 0xa53bc2, var = 0xa51c05, nvars = 10828738 }
          gcpro3 = { next = 0xa55952, var = 0xa53bc2, nvars = 2 }
          gcpro4 = { next = 0xa43802, var = 0xb4a776, nvars = 2 }
          gcpro5 = { next = 0xa43802, var = 0xa43802, nvars = 10836306 }
          key_count = 2
          record_then_fail = 0
          save_this_command = 11438610
          save_last_command = 11490098
          save_this_original_command = 11438610
          save_real_this_command = 11438610
  #20 0x0000000000577f70 in Ffuncall (nargs=4, args=0x7f7fffffd3b0) at eval.c:3037
          fun = 10312397
          original_fun = 10978002
          funcar = 4294967297
          numargs = 3
          lisp_numargs = 10937344
          val = 315
          backtrace = { next = 0x0, function = 0x7f7fffffd3b0, args = 0x7f7fffffd3b8, nargs = 3, evalargs = 0 '\0', debug_on_exit = 0 '\0' }
          internal_args = (Lisp_Object *) 0x7f7fffffd3b8
          i = 0
  #21 0x000000000057795d in call3 (fn=10978002, arg1=11438610, arg2=10762242, arg3=10762242) at eval.c:2857
          ret_ungc_val = 9301109
          gcpro1 = { next = 0x8dec75, var = 0xa43802, nvars = 4 }
          args = {10978002, 11438610, 10762242, 10762242}
  #22 0x00000000004e4bca in Fcommand_execute (cmd=11438610, record_flag=10762242, keys=10762242, special=10762242) at keyboard.c:10562
          final = 9301109
          tem = 10762242
          prefixarg = 10762242
  #23 0x00000000004d564d in command_loop_1 () at keyboard.c:1906
          cmd = 11438610
          lose = 1
          keybuf = {96, 20, 8, 0, 140187732530800, 18451712, 1893, 0, 140187732530816, 1983, 18451712, 4294967317, 140187732530800, 6299742, 10656928, 216, 10937344, 7378697632079252736, 140187732530864, 9720, 274877896416, 140187732531032, 0, 140187732530872, 140187732530384, 0, 10762242, 12348018, 8166853, 10762242}
          i = 2
          prev_modiff = 158
          prev_buffer = (struct buffer *) 0xa51c00
          already_adjusted = 0
  #24 0x0000000000575049 in internal_condition_case (bfun=0x4d3a17 <command_loop_1>, handlers=10851522, hfun=0x4d34bc <cmd_error>) at eval.c:1492
          val = 10762242
          c = { tag = 10762242, val = 10762242, next = 0x7f7fffffd880, gcpro = 0x0, jmp = {2129, 140187732531264, 140187732541408, 140187698962432, 140187696909296, 3, 140187732531000, 5722036, 0, 140187732531488, 18636288}, backlist = 0x0, handlerlist = 0x0, lisp_eval_depth = 0, pdlcount = 2, poll_suppress_count = 0, interrupt_input_blocked = 0, byte_stack = 0x0 }
          h = { handler = 10851522, var = 10762242, chosen_clause = 0, tag = 0x7f7fffffd790, next = 0x0 }
  #25 0x00000000004d389f in command_loop_2 () at keyboard.c:1362
          val = 1
  #26 0x0000000000574a0e in internal_catch (tag=10846786, func=0x4d3885 <command_loop_2>, arg=10762242) at eval.c:1228
          c = { tag = 10846786, val = 10762242, next = 0x0, gcpro = 0x0, jmp = {2129, 140187732531488, 140187732541408, 140187698962432, 140187696909296, 3, 140187732531288, 5720565, 4301358603, 10820608, 11046651}, backlist = 0x0, handlerlist = 0x0, lisp_eval_depth = 0, pdlcount = 2, poll_suppress_count = 0, interrupt_input_blocked = 0, byte_stack = 0x0 }
  #27 0x00000000004d3859 in command_loop () at keyboard.c:1341
  No locals.
  #28 0x00000000004d3004 in recursive_edit_1 () at keyboard.c:956
          count = 1
          val = 5059007
  #29 0x00000000004d31a6 in Frecursive_edit () at keyboard.c:1018
          count = 0
          buffer = 10762242
  #30 0x00000000004d169a in main (argc=3, argv=0x7f7fffffdb70) at emacs.c:1833
          dummy = 140187730444288
          stack_bottom_variable = 0 '\0'
          do_initial_setlocale = 1
          skip_args = 0
          rlim = { rlim_cur = 8720384, rlim_max = 33554432 }
          no_loadup = 0
          junk = 0x0
          dname_arg = 0x0

  Lisp Backtrace:
  "eval-last-sexp-1" (0xffffca28)
  "eval-last-sexp" (0xffffcfa8)
  "call-interactively" (0xffffd3b8)

-- 
Kazuhiro Ito
Message #23 received at 9318 <at> debbugs.gnu.org (Wed, 24 Aug 2011 09:41:02 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Wed, 24 Aug 2011 18:37:24 +0900
> I built Emacs 23.3 with "-O0 -g" option on NetBSD 5.1 (amd64), and
> started with below commad (via SSH).
>
>   gdb --args emacs -Q --no-splash
>
> Next, inputtedand below code and evaluated with C-x C-e.
>
>   (progn
>     (goto-char (point-min))
>     (insert #x80)
>     (insert (make-string 16 ?A))
>     (encode-coding-region 1 18 'ctext-unix))
>
> backtrace is below.  Please let me know if you need more information.
>
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> 5473      if (STRING_MARKED_P (ptr))

I think relocation of buffers may be causing the problem.

The comment for the CODING_DECODE_CHAR macro in coding.c says:

> /* This wrapper macro is used to preserve validity of pointers into
>    buffer text across calls to decode_char, which could cause
>    relocation of buffers if it loads a charset map, because loading a
>    charset map allocates large structures.  */

encode_coding_iso_2022() uses the ENCODE_ISO_CHARACTER macro, which uses the ENCODE_CHAR macro. ENCODE_CHAR calls encode_char(), which may load a charset map. If this is the cause of the problem, encode_coding_emacs_mule() has the same problem.

-- 
Kazuhiro Ito
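To make the relocation hypothesis concrete, here is a small, self-contained C sketch. It is not Emacs code, and every name in it is hypothetical: two static arenas stand in for a relocating allocator, and encode_char_sketch relocates the buffer only on its first call, the way a charset map is loaded only once. Writing through a pointer computed before that call puts the bytes into the old copy, so the first encoding loses its output while an identical second call succeeds, which resembles the "first call only" pattern in the report. Re-deriving dst from an offset after the call avoids the problem.

```c
#include <stddef.h>
#include <string.h>

/* Two static arenas stand in for memory that a relocating allocator
   moves buffer text between.  Both stay allocated, so a stale pointer
   still points at real memory, just not at the live copy.  */
static unsigned char arena[2][64];

/* Hypothetical buffer: DATA is where its text currently lives.  */
struct buffer_sketch
{
  unsigned char *data;
  int which;  /* index of the arena that holds the text */
};

/* Hypothetical "has the charset map been loaded yet?" flag.  */
static int map_loaded_once;

/* Copy the text to the other arena and zero the old copy, roughly as a
   relocating allocator might when a large allocation is made.  */
static void
relocate (struct buffer_sketch *b)
{
  int other = 1 - b->which;

  memcpy (arena[other], arena[b->which], sizeof arena[0]);
  memset (arena[b->which], 0, sizeof arena[0]);
  b->which = other;
  b->data = arena[other];
}

/* Hypothetical stand-in for encode_char: only the first call "loads a
   charset map", and that load relocates the buffer.  */
static unsigned char
encode_char_sketch (struct buffer_sketch *b, unsigned char c)
{
  if (!map_loaded_once)
    {
      map_loaded_once = 1;
      relocate (b);
    }
  return c;
}

/* Simulate a fresh session: nothing loaded, empty buffer in arena 0.  */
void
reset_session (struct buffer_sketch *b)
{
  memset (arena, 0, sizeof arena);
  map_loaded_once = 0;
  b->which = 0;
  b->data = arena[0];
}

/* Write SRC into B's text and return the live text.  With REBASE == 0,
   DST keeps pointing wherever it pointed before each call; with
   REBASE != 0, it is re-derived from B->data after each call.  */
const char *
encode_into (struct buffer_sketch *b, const char *src, int rebase)
{
  unsigned char *dst = b->data;

  for (; *src; src++)
    {
      /* An offset from the start of the text survives relocation;
         a raw pointer does not.  */
      ptrdiff_t off = rebase ? dst - b->data : 0;
      unsigned char code = encode_char_sketch (b, (unsigned char) *src);

      if (rebase)
        dst = b->data + off;
      *dst++ = code;
    }
  *dst = '\0';
  return (const char *) b->data;
}
```

For example, after reset_session (&b), a first encode_into (&b, "AB", 0) returns "" and an immediate second call returns "AB"; after another reset, encode_into (&b, "AB", 1) returns "AB" on the first try. This only illustrates the class of bug under the stated assumptions; it does not claim that Emacs's allocator moves text in exactly this way.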
Message #26 received at 9318 <at> debbugs.gnu.org (Wed, 24 Aug 2011 12:10:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: cyd <at> stupidchicken.com, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Wed, 24 Aug 2011 15:06:48 +0300
> Date: Wed, 24 Aug 2011 18:37:24 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 9318 <at> debbugs.gnu.org
>
> > (progn
> >   (goto-char (point-min))
> >   (insert #x80)
> >   (insert (make-string 16 ?A))
> >   (encode-coding-region 1 18 'ctext-unix))
> >
> > backtrace is below.  Please let me know if you need more information.
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > 5473      if (STRING_MARKED_P (ptr))
>
> I think relocation of buffer may cause the problem.
>
> The comment for CODING_DECODE_CHAR macro in coding.c says as below.
>
> > /* This wrapper macro is used to preserve validity of pointers into
> >    buffer text across calls to decode_char, which could cause
> >    relocation of buffers if it loads a charset map, because loading a
> >    charset map allocates large structures.  */
>
> encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> ENCODE_CHAR macro.  ENCODE_CHAR macro calls encode_char() and it may
> load a charset map.

But which pointer(s) in encode_coding_iso_2022 can be altered by
relocation?  Do you actually see any of the pointers used by this
function modified by relocation of some buffer?
Message #29 received at 9318 <at> debbugs.gnu.org (Wed, 24 Aug 2011 18:03:01 GMT):
From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: Chong Yidong <cyd <at> stupidchicken.com>, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Wed, 24 Aug 2011 19:59:34 +0200
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:

> I think relocation of buffer may cause the problem.

Does that help?

diff --git a/src/coding.c b/src/coding.c
index 65c8a76..f34a023 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -915,8 +915,8 @@ record_conversion_result (struct coding_system *coding,
     }
 }
 
-/* This wrapper macro is used to preserve validity of pointers into
-   buffer text across calls to decode_char, which could cause
+/* These wrapper macros are used to preserve validity of pointers into
+   buffer text across calls to decode_char/encode_char, which could cause
    relocation of buffers if it loads a charset map, because loading a
    charset map allocates large structures.  */
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
@@ -935,6 +935,21 @@ record_conversion_result (struct coding_system *coding,
         src_end += offset; \
       } \
   } while (0)
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
 
 
 /* If there are at least BYTES length of room at dst, allocate memory
@@ -2652,7 +2667,7 @@ encode_coding_emacs_mule (struct coding_system *coding)
             {
               charset = CHARSET_FROM_ID (preferred_charset_id);
               if (CHAR_CHARSET_P (c, charset))
-                code = ENCODE_CHAR (charset, c);
+                CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code);
               else
                 charset = char_charset (c, charset_list, &code);
             }
@@ -4185,7 +4200,8 @@ decode_coding_iso_2022 (struct coding_system *coding)
 
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset), (c)); \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code); \
  \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
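The idea behind CODING_ENCODE_CHAR can be shown in isolation. The following is a hypothetical, self-contained C sketch, not Emacs code: map_loaded plays the role of charset_map_loaded, encode_char_sketch moves the destination text to a new allocation on its first call, and ENCODE_WITH_REBASE resets the flag, makes the call, and re-derives dst and dst_end only when the flag was raised.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical destination buffer whose text a callee may move.  */
struct dest_sketch
{
  unsigned char *base;
  size_t size;
};

/* Plays the role of charset_map_loaded: set by any callee that may
   have moved the destination text.  */
static int map_loaded;

/* Whether the hypothetical "charset map" has been loaded yet.  */
static int loaded_once;

/* Hypothetical stand-in for encode_char: the first call moves the
   destination text to a fresh allocation and raises MAP_LOADED.  */
static unsigned
encode_char_sketch (struct dest_sketch *d, int c)
{
  if (!loaded_once)
    {
      unsigned char *moved = malloc (d->size);
      if (!moved)
        abort ();
      memcpy (moved, d->base, d->size);
      free (d->base);
      d->base = moved;
      loaded_once = 1;
      map_loaded = 1;
    }
  return (unsigned) c;
}

/* Save DST and DST_END as offsets, reset the flag, make the call, and
   rebuild both pointers from the current base if the flag was raised.  */
#define ENCODE_WITH_REBASE(d, dst, dst_end, c, code)   \
  do {                                                 \
    ptrdiff_t dst_off_ = (dst) - (d)->base;            \
    ptrdiff_t end_off_ = (dst_end) - (d)->base;        \
    map_loaded = 0;                                    \
    (code) = encode_char_sketch ((d), (c));            \
    if (map_loaded)                                    \
      {                                                \
        (dst) = (d)->base + dst_off_;                  \
        (dst_end) = (d)->base + end_off_;              \
      }                                                \
  } while (0)

/* Encode SRC into a new SIZE-byte buffer (SIZE >= 1) and return it
   NUL-terminated; the caller frees it.  Returns NULL if SRC does not
   fit.  */
unsigned char *
encode_all (const char *src, size_t size)
{
  struct dest_sketch d;
  unsigned char *dst, *dst_end;

  d.base = calloc (size, 1);
  if (!d.base)
    return NULL;
  d.size = size;
  loaded_once = 0;
  dst = d.base;
  dst_end = d.base + size - 1;  /* keep one byte for the terminator */
  for (; *src; src++)
    {
      unsigned code;

      ENCODE_WITH_REBASE (&d, dst, dst_end, (unsigned char) *src, code);
      if (dst >= dst_end)
        {
          free (d.base);
          return NULL;
        }
      *dst++ = (unsigned char) code;
    }
  *dst = '\0';
  return d.base;
}
```

One deliberate difference from the patch: CODING_ENCODE_CHAR computes its offset as coding->destination - orig after coding_set_destination, i.e. by subtracting a pointer into the old text from one into the new text. The sketch instead saves offsets from the base before the call, so all pointer arithmetic stays within a single object, as standard C requires.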
Message #32 received at 9318 <at> debbugs.gnu.org (Thu, 25 Aug 2011 09:59:01 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Thu, 25 Aug 2011 18:49:52 +0900
> > > (progn
> > >   (goto-char (point-min))
> > >   (insert #x80)
> > >   (insert (make-string 16 ?A))
> > >   (encode-coding-region 1 18 'ctext-unix))
> > >
> > > backtrace is below.  Please let me know if you need more information.
> > >
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x0000000000557419 in mark_object (arg=4702111234474983745) at alloc.c:5473
> > > 5473      if (STRING_MARKED_P (ptr))
> >
> > I think relocation of buffer may cause the problem.
> >
> > The comment for CODING_DECODE_CHAR macro in coding.c says as below.
> >
> > > /* This wrapper macro is used to preserve validity of pointers into
> > >    buffer text across calls to decode_char, which could cause
> > >    relocation of buffers if it loads a charset map, because loading a
> > >    charset map allocates large structures.  */
> >
> > encode_coding_iso_2022() uses ENCODE_ISO_CHARACTER macro, which uses
> > ENCODE_CHAR macro.  ENCODE_CHAR macro calls encode_char() and it may
> > load a charset map.
>
> But which pointer(s) in encode_coding_iso_2022 can be altered by
> relocation?

encode_coding() sets coding->destination with coding_set_destination() before calling encode_coding_iso_2022(). I think that at least the correct value of coding->destination can change inside encode_coding_iso_2022() when charset maps are loaded.

> Do you actually see any of the pointers used by this
> function modified by relocation of some buffer?

No, because I don't know how to check that.

-- 
Kazuhiro Ito
Message #35 received at 9318 <at> debbugs.gnu.org (Thu, 25 Aug 2011 09:59:02 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Chong Yidong <cyd <at> stupidchicken.com>, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Thu, 25 Aug 2011 18:54:13 +0900
> > I think relocation of buffer may cause the problem.
>
> Does that help?
>
> diff --git a/src/coding.c b/src/coding.c
> index 65c8a76..f34a023 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -915,8 +915,8 @@ record_conversion_result (struct coding_system *coding,
>      }
>  }
>
> -/* This wrapper macro is used to preserve validity of pointers into
> -   buffer text across calls to decode_char, which could cause
> +/* These wrapper macros are used to preserve validity of pointers into
> +   buffer text across calls to decode_char/encode_char, which could cause
>     relocation of buffers if it loads a charset map, because loading a
>     charset map allocates large structures.  */
>  #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
> @@ -935,6 +935,21 @@ record_conversion_result (struct coding_system *coding,
>          src_end += offset; \
>        } \
>    } while (0)
> +#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
> +  do { \
> +    charset_map_loaded = 0; \
> +    code = ENCODE_CHAR (charset, c); \
> +    if (charset_map_loaded) \
> +      { \
> +        const unsigned char *orig = coding->destination; \
> +        EMACS_INT offset; \
> + \
> +        coding_set_destination (coding); \
> +        offset = coding->destination - orig; \
> +        dst += offset; \
> +        dst_end += offset; \
> +      } \
> +  } while (0)
>
>
>  /* If there are at least BYTES length of room at dst, allocate memory
> @@ -2652,7 +2667,7 @@ encode_coding_emacs_mule (struct coding_system *coding)
>              {
>                charset = CHARSET_FROM_ID (preferred_charset_id);
>                if (CHAR_CHARSET_P (c, charset))
> -                code = ENCODE_CHAR (charset, c);
> +                CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code);
>                else
>                  charset = char_charset (c, charset_list, &code);
>              }
> @@ -4185,7 +4200,8 @@ decode_coding_iso_2022 (struct coding_system *coding)
>  #define ENCODE_ISO_CHARACTER(charset, c) \
>    do { \
> -    int code = ENCODE_CHAR ((charset), (c)); \
> +    int code; \
> +    CODING_ENCODE_CHAR (coding, dst, dst_end, charset, c, code); \
>   \
>      if (CHARSET_DIMENSION (charset) == 1) \
>        ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \

Andreas' patch resolved the problem only partially. It fixed the problem on NetBSD built with '-O0' CFLAGS, but not on NetBSD built with '-O2', nor on Windows.

I confirmed that adding protection of coding->dst_object to Andreas' patch resolved the problem on NetBSD with '-O2', but not on Windows. I don't know whether this approach is incorrect or just insufficient.

--- src/coding.c	2011-07-01 11:03:55 +0000
+++ src/coding.c	2011-08-24 23:39:49 +0000
@@ -7397,10 +7436,15 @@
         setup_ccl_program (&cclspec.ccl, CODING_CCL_ENCODER (coding));
     }
   do {
+    struct gcpro gcpro1;
+    GCPRO1 (coding->dst_object);
+
     coding_set_source (coding);
     consume_chars (coding, translation_table, max_lookup);
     coding_set_destination (coding);
     (*(coding->encoder)) (coding);
+
+    UNGCPRO;
   } while (coding->consumed_char < coding->src_chars);
 
   if (BUFFERP (coding->dst_object) && coding->produced_char > 0)

-- 
Kazuhiro Ito
Message #38 received at 9318 <at> debbugs.gnu.org (Fri, 26 Aug 2011 11:46:01 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Chong Yidong <cyd <at> stupidchicken.com>, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Fri, 26 Aug 2011 20:41:57 +0900
> > > I think relocation of buffer may cause the problem.
> >
> > Does that help?
>
> Andreas' patch resolved the problem partially.  It resolved the problem on
> NetBSD with '-O0' CFLAGS, but failed on NetBSD with '-O2' and Windows.
>
> I confirmed that adding the protection of coding->dst_object to
> Andreas' patch resolved the problem on NetBSD with '-O2' but not on
> Windows.  I don't know whether it is incorrect way or is not enough.

I noticed that char_charset() could also cause relocation of buffers, because it may call encode_char(). I confirmed that similar changes to the callers of char_charset() fixed my problem (without the protection of coding->dst_object).

SUMMARY OF THE PROBLEM:
In encode_coding_XXX(), calling encode_char() could cause relocation of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR could also cause relocation, because they could call encode_char(). After using any of them, coding->destination, dst and dst_end should be updated as needed.

-- 
Kazuhiro Ito
Message #41 received at 9318 <at> debbugs.gnu.org (Sun, 28 Aug 2011 00:09:01 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Chong Yidong <cyd <at> stupidchicken.com>, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Sun, 28 Aug 2011 09:04:49 +0900
> SUMMARY OF THE PROBLEM:
> In encode_coding_XXX(), calling encode_char() could cause relocation
> of buffers.  char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> could also cause relocation because they could call encode_char().
> After using of them, coding->destination, dst, dst_end should be
> updated as needed.

I noticed that I had missed the CHAR_CHARSET_P macro in my check. CHAR_CHARSET_P could also cause relocation of buffers.

-- 
Kazuhiro Ito
Message #44 received at 9318 <at> debbugs.gnu.org (Tue, 30 Aug 2011 23:35:02 GMT):
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Chong Yidong <cyd <at> stupidchicken.com>, 9318 <at> debbugs.gnu.org
Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result
Date: Wed, 31 Aug 2011 08:30:47 +0900
> > SUMMARY OF THE PROBLEM:
> > In encode_coding_XXX(), calling encode_char() could cause relocation
> > of buffers. char_charset(), ENCODE_ISO_CHARACTER and ENCODE_CHAR
> > could also cause relocation because they could call encode_char().
> > After using of them, coding->destination, dst, dst_end should be
> > updated as needed.
>
> I noticed CHAR_CHARSET_P macro slipped out of my check.
> CHAR_CHARSET_P could also cause relocation of buffers.

Here is the patch for the code, which contains Andreas' patch. In my
environment, problems are fixed. I think it would be better that the
interface of encode_designation_at_bol() is changed.

=== modified file 'src/coding.c'
--- src/coding.c 2011-05-09 09:59:23 +0000
+++ src/coding.c 2011-08-28 07:33:54 +0000
@@ -1026,6 +1026,54 @@
       } \
   } while (0)
 
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+  do { \
+    charset_map_loaded = 0; \
+    charset = char_charset (c, charset_list, code_return); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+  do { \
+    charset_map_loaded = 0; \
+    result = CHAR_CHARSET_P(c, charset); \
+    if (charset_map_loaded) \
+      { \
+        const unsigned char *orig = coding->destination; \
+        EMACS_INT offset; \
+ \
+        coding_set_destination (coding); \
+        offset = coding->destination - orig; \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
 /* If there are at least BYTES length of room at dst, allocate memory
    for coding->destination and update dst and dst_end. We don't have
@@ -2778,14 +2826,19 @@
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (CHAR_CHARSET_P (c, charset))
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (result)
                 code = ENCODE_CHAR (charset, c);
               else
-                charset = char_charset (c, charset_list, &code);
+                CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                    &code, charset);
             }
           else
-            charset = char_charset (c, charset_list, &code);
+            CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                &code, charset);
           if (! charset)
             {
               c = coding->default_char;
@@ -2794,7 +2847,8 @@
                   EMIT_ONE_ASCII_BYTE (c);
                   continue;
                 }
-              charset = char_charset (c, charset_list, &code);
+              CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                  &code, charset);
             }
           dimension = CHARSET_DIMENSION (charset);
           emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4317,8 +4371,9 @@
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset),(c)); \
- \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
+ \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
     else \
@@ -4476,7 +4531,17 @@
       c = *charbuf++;
       if (c == '\n')
         break;
+
+      charset_map_loaded = 0;
       charset = char_charset (c, charset_list, NULL);
+      if (charset_map_loaded)
+        {
+          const unsigned char *orig = coding->destination;
+
+          coding_set_destination (coding);
+          dst += coding->destination - orig;
+        }
+
       id = CHARSET_ID (charset);
       reg = CODING_ISO_REQUEST (coding, id);
       if (reg >= 0 && r[reg] < 0)
@@ -4543,6 +4608,12 @@
           /* We have to produce designation sequences if any now. */
           dst = encode_designation_at_bol (coding, charbuf, charbuf_end, dst);
+          if (charset_map_loaded)
+            {
+              EMACS_INT offset = coding->destination + coding->dst_bytes - dst_end;
+              dst_end += offset;
+              dst_prev += offset;
+            }
           bol_designation = 0;
           /* We are sure that designation sequences are all ASCII bytes. */
           produced_chars += dst - dst_prev;
@@ -4616,12 +4687,17 @@
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (! CHAR_CHARSET_P (c, charset))
-                charset = char_charset (c, charset_list, NULL);
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (! result)
+                CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                    NULL, charset);
             }
           else
-            charset = char_charset (c, charset_list, NULL);
+            CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                                NULL, charset);
           if (!charset)
             {
               if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4632,7 +4708,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, NULL);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, NULL, charset);
                 }
             }
           ENCODE_ISO_CHARACTER (charset, c);
@@ -5064,7 +5141,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
 
           if (!charset)
             {
@@ -5076,7 +5155,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5153,7 +5233,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
 
           if (! charset)
             {
@@ -5165,7 +5247,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET(coding, dst, dst_end, c,
+                                      charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5747,7 +5831,9 @@
         }
       else
         {
-          charset = char_charset (c, charset_list, &code);
+          CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list,
+                              &code, charset);
+
           if (charset)
             {
               if (CHARSET_DIMENSION (charset) == 1)

--
Kazuhiro Ito
bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Thu, 01 Dec 2011 01:57:02 GMT)

Message #47 received at 9318 <at> debbugs.gnu.org:
From: Kenichi Handa <handa <at> m17n.org> To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> Cc: cyd <at> stupidchicken.com, schwab <at> linux-m68k.org, 9318 <at> debbugs.gnu.org Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Date: Thu, 01 Dec 2011 10:56:12 +0900
In article <20110830233131.C74A61E0043 <at> msa101.auone-net.jp>, Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:

> Here is the patch for the code, which contains Andreas' patch. In my
> environment, problems are fixed. I think it would be better that the
> interface of encode_designation_at_bol() is changed.

Oops, sorry, I have vaguely thought that your patch below has already
been applied, but just noticed that it was not. I'll commit a slightly
modified version including the improved interface for
encode_designation_at_bol soon.

By the way, it would be good if we had a way to suppress buffer text
relocation temporarily.

---
Kenichi Handa
handa <at> m17n.org
bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Mon, 05 Dec 2011 07:11:02 GMT)

Message #50 received at 9318 <at> debbugs.gnu.org:
From: Kenichi Handa <handa <at> m17n.org> To: 9318 <at> debbugs.gnu.org Cc: kzhr <at> d1.dion.ne.jp, schwab <at> linux-m68k.org Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Date: Mon, 05 Dec 2011 16:10:11 +0900
In article <tl7zkfdnjgj.fsf <at> m17n.org>, Kenichi Handa <handa <at> m17n.org> writes:

> In article <20110830233131.C74A61E0043 <at> msa101.auone-net.jp>, Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:
> > Here is the patch for the code, which contains Andreas' patch. In my
> > environment, problems are fixed. I think it would be better that the
> > interface of encode_designation_at_bol() is changed.
>
> Oops, sorry, I have vaguely thought that your patch below
> has already been applied, but just noticed that it was not.
> I'll commit a slightly modified version including the
> improved interface for encode_designation_at_bol soon.

I've just installed the following changes. As I don't have
cygwin environment now, could you please check if this
change surely fix the problem?

---
Kenichi Handa
handa <at> m17n.org

2011-12-05  Kenichi Handa  <handa <at> m17n.org>

        * coding.c (encode_designation_at_bol): New args charbuf_end and
        dst. Return the number of produced bytes. Callers changed.
        (coding_set_source): Return how many bytes coding->source was
        relocated.
        (coding_set_destination): Return how many bytes
        coding->destination was relocated.
        (CODING_DECODE_CHAR, CODING_ENCODE_CHAR, CODING_CHAR_CHARSET)
        (CODING_CHAR_CHARSET_P): Adjusted for the above changes.

2011-12-05  Kazuhiro Ito  <kzhr <at> d1.dion.ne.jp>  (tiny change)

        * coding.c (CODING_CHAR_CHARSET_P): New macro.
        (encode_coding_emacs_mule, encode_coding_iso_2022): Use the above
        macro (Bug#9318).

2011-12-05  Andreas Schwab  <schwab <at> linux-m68k.org>

        The following changes are to fix Bug#9318.
        * coding.c (CODING_ENCODE_CHAR, CODING_CHAR_CHARSET): New macros.
        (encode_coding_emacs_mule, ENCODE_ISO_CHARACTER)
        (encode_coding_iso_2022, encode_coding_sjis)
        (encode_coding_big5, encode_coding_charset): Use the above macros.

=== modified file 'src/coding.c'
--- src/coding.c 2011-11-07 01:57:07 +0000
+++ src/coding.c 2011-12-05 06:14:46 +0000
@@ -847,16 +847,16 @@
 static void decode_coding_raw_text (struct coding_system *);
 static int encode_coding_raw_text (struct coding_system *);
 
-static void coding_set_source (struct coding_system *);
-static void coding_set_destination (struct coding_system *);
+static EMACS_INT coding_set_source (struct coding_system *);
+static EMACS_INT coding_set_destination (struct coding_system *);
 static void coding_alloc_by_realloc (struct coding_system *, EMACS_INT);
 static void coding_alloc_by_making_gap (struct coding_system *,
                                         EMACS_INT, EMACS_INT);
 static unsigned char *alloc_destination (struct coding_system *,
                                          EMACS_INT, unsigned char *);
 static void setup_iso_safe_charsets (Lisp_Object);
-static unsigned char *encode_designation_at_bol (struct coding_system *,
-                                                 int *, unsigned char *);
+static int encode_designation_at_bol (struct coding_system *,
+                                      int *, int *, unsigned char *);
 static int detect_eol (const unsigned char *,
                        EMACS_INT, enum coding_category);
 static Lisp_Object adjust_coding_eol_type (struct coding_system *, int);
@@ -915,27 +915,68 @@
     }
 }
 
-/* This wrapper macro is used to preserve validity of pointers into
-   buffer text across calls to decode_char, which could cause
-   relocation of buffers if it loads a charset map, because loading a
-   charset map allocates large structures. */
+/* These wrapper macros are used to preserve validity of pointers into
+   buffer text across calls to decode_char, encode_char, etc, which
+   could cause relocation of buffers if it loads a charset map,
+   because loading a charset map allocates large structures. */
+
 #define CODING_DECODE_CHAR(coding, src, src_base, src_end, charset, code, c) \
   do { \
+    EMACS_INT offset; \
+ \
     charset_map_loaded = 0; \
     c = DECODE_CHAR (charset, code); \
-    if (charset_map_loaded) \
+    if (charset_map_loaded \
+        && (offset = coding_set_source (coding))) \
       { \
-        const unsigned char *orig = coding->source; \
-        EMACS_INT offset; \
- \
-        coding_set_source (coding); \
-        offset = coding->source - orig; \
         src += offset; \
         src_base += offset; \
         src_end += offset; \
       } \
   } while (0)
 
+#define CODING_ENCODE_CHAR(coding, dst, dst_end, charset, c, code) \
+  do { \
+    EMACS_INT offset; \
+ \
+    charset_map_loaded = 0; \
+    code = ENCODE_CHAR (charset, c); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET(coding, dst, dst_end, c, charset_list, code_return, charset) \
+  do { \
+    EMACS_INT offset; \
+ \
+    charset_map_loaded = 0; \
+    charset = char_charset (c, charset_list, code_return); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
+#define CODING_CHAR_CHARSET_P(coding, dst, dst_end, c, charset, result) \
+  do { \
+    EMACS_INT offset; \
+ \
+    charset_map_loaded = 0; \
+    result = CHAR_CHARSET_P (c, charset); \
+    if (charset_map_loaded \
+        && (offset = coding_set_destination (coding))) \
+      { \
+        dst += offset; \
+        dst_end += offset; \
+      } \
+  } while (0)
+
 /* If there are at least BYTES length of room at dst, allocate memory
    for coding->destination and update dst and dst_end. We don't have
@@ -1015,9 +1056,14 @@
      | ((p)[-1] & 0x3F))))
 
 
-static void
+/* Update coding->source from coding->src_object, and return how many
+   bytes coding->source was changed. */
+
+static EMACS_INT
 coding_set_source (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->source;
+
   if (BUFFERP (coding->src_object))
     {
       struct buffer *buf = XBUFFER (coding->src_object);
@@ -1036,11 +1082,18 @@
       /* Otherwise, the source is C string and is never relocated
          automatically. Thus we don't have to update anything. */
     }
+  return coding->source - orig;
 }
 
-static void
+
+/* Update coding->destination from coding->dst_object, and return how
+   many bytes coding->destination was changed. */
+
+static EMACS_INT
 coding_set_destination (struct coding_system *coding)
 {
+  const unsigned char *orig = coding->destination;
+
   if (BUFFERP (coding->dst_object))
     {
       if (BUFFERP (coding->src_object) && coding->src_pos < 0)
@@ -1065,6 +1118,7 @@
       /* Otherwise, the destination is C string and is never relocated
          automatically. Thus we don't have to update anything. */
     }
+  return coding->destination - orig;
 }
 
 
@@ -2650,14 +2704,19 @@
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (CHAR_CHARSET_P (c, charset))
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (result)
                 code = ENCODE_CHAR (charset, c);
               else
-                charset = char_charset (c, charset_list, &code);
+                CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                     &code, charset);
             }
           else
-            charset = char_charset (c, charset_list, &code);
+            CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                 &code, charset);
           if (! charset)
             {
               c = coding->default_char;
@@ -2666,7 +2725,8 @@
                   EMIT_ONE_ASCII_BYTE (c);
                   continue;
                 }
-              charset = char_charset (c, charset_list, &code);
+              CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                   &code, charset);
             }
           dimension = CHARSET_DIMENSION (charset);
           emacs_mule_id = CHARSET_EMACS_MULE_ID (charset);
@@ -4185,7 +4245,8 @@
 #define ENCODE_ISO_CHARACTER(charset, c) \
   do { \
-    int code = ENCODE_CHAR ((charset), (c)); \
+    int code; \
+    CODING_ENCODE_CHAR (coding, dst, dst_end, (charset), (c), code); \
  \
     if (CHARSET_DIMENSION (charset) == 1) \
       ENCODE_ISO_CHARACTER_DIMENSION1 ((charset), code); \
@@ -4283,15 +4344,19 @@
 
 /* Produce designation sequences of charsets in the line started from
-   SRC to a place pointed by DST, and return updated DST.
+   CHARBUF to a place pointed by DST, and return the number of
+   produced bytes. DST should not directly point a buffer text area
+   which may be relocated by char_charset call.
 
    If the current block ends before any end-of-line, we may fail to
    find all the necessary designations. */
 
-static unsigned char *
-encode_designation_at_bol (struct coding_system *coding, int *charbuf,
+static int
+encode_designation_at_bol (struct coding_system *coding,
+                           int *charbuf, int *charbuf_end,
                            unsigned char *dst)
 {
+  unsigned char *orig;
   struct charset *charset;
   /* Table of charsets to be designated to each graphic register. */
   int r[4];
@@ -4309,7 +4374,7 @@
   for (reg = 0; reg < 4; reg++)
     r[reg] = -1;
 
-  while (found < 4)
+  while (charbuf < charbuf_end && found < 4)
     {
       int id;
 
@@ -4334,7 +4399,7 @@
       ENCODE_DESIGNATION (CHARSET_FROM_ID (r[reg]), reg, coding);
     }
 
-  return dst;
+  return dst - orig;
 }
 
 /* See the above "GENERAL NOTES on `encode_coding_XXX ()' functions". */
@@ -4378,13 +4443,26 @@
 
       if (bol_designation)
         {
-          unsigned char *dst_prev = dst;
-
           /* We have to produce designation sequences if any now. */
-          dst = encode_designation_at_bol (coding, charbuf, dst);
-          bol_designation = 0;
+          unsigned char desig_buf[16];
+          int nbytes;
+          EMACS_INT offset;
+
+          charset_map_loaded = 0;
+          nbytes = encode_designation_at_bol (coding, charbuf, charbuf_end,
+                                              desig_buf);
+          if (charset_map_loaded
+              && (offset = coding_set_destination (coding)))
+            {
+              dst += offset;
+              dst_end += offset;
+            }
+          memcpy (dst, desig_buf, nbytes);
+          dst += nbytes;
           /* We are sure that designation sequences are all ASCII bytes. */
-          produced_chars += dst - dst_prev;
+          produced_chars += nbytes;
+          bol_designation = 0;
+          ASSURE_DESTINATION (safe_room);
         }
 
       c = *charbuf++;
@@ -4455,12 +4533,17 @@
           if (preferred_charset_id >= 0)
             {
+              int result;
+
               charset = CHARSET_FROM_ID (preferred_charset_id);
-              if (! CHAR_CHARSET_P (c, charset))
-                charset = char_charset (c, charset_list, NULL);
+              CODING_CHAR_CHARSET_P (coding, dst, dst_end, c, charset, result);
+              if (! result)
+                CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                     NULL, charset);
             }
           else
-            charset = char_charset (c, charset_list, NULL);
+            CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                                 NULL, charset);
           if (!charset)
             {
               if (coding->mode & CODING_MODE_SAFE_ENCODING)
@@ -4471,7 +4554,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, NULL);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, NULL, charset);
                 }
             }
           ENCODE_ISO_CHARACTER (charset, c);
@@ -4897,7 +4981,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
 
           if (!charset)
             {
@@ -4909,7 +4995,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -4984,7 +5071,9 @@
       else
         {
           unsigned code;
-          struct charset *charset = char_charset (c, charset_list, &code);
+          struct charset *charset;
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
 
           if (! charset)
             {
@@ -4996,7 +5085,8 @@
               else
                 {
                   c = coding->default_char;
-                  charset = char_charset (c, charset_list, &code);
+                  CODING_CHAR_CHARSET (coding, dst, dst_end, c,
+                                       charset_list, &code, charset);
                 }
             }
           if (code == CHARSET_INVALID_CODE (charset))
@@ -5572,7 +5662,9 @@
         }
       else
         {
-          charset = char_charset (c, charset_list, &code);
+          CODING_CHAR_CHARSET (coding, dst, dst_end, c, charset_list,
+                               &code, charset);
+
           if (charset)
             {
               if (CHARSET_DIMENSION (charset) == 1)
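All four macros in the committed change share one shape: clear charset_map_loaded, make the call that may load a charset map, and, only if a map was loaded and coding_set_destination reports a nonzero move, shift dst and dst_end by that many bytes. The C sketch below is a minimal, self-contained model of that shape, not Emacs code: the arena, struct coder, lookup_charset, CODER_CHAR_CHARSET and encode_ids are all hypothetical. It assumes relocation can be modeled as a memmove within one static array; that keeps the new-minus-old pointer subtraction well defined in C, whereas the real coding_set_destination subtracts a pointer to the buffer's previous location.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not Emacs code: the "buffer text" lives inside one
   arena, and relocation is a memmove to another spot in the same array. */
static unsigned char arena[256];

struct coder
{
  size_t block_start;          /* current start of the text block */
  size_t block_size;
  size_t dst_pos;              /* output offset inside the block (stable) */
  unsigned char *destination;  /* cached: arena + block_start + dst_pos */
  int map_ready;               /* has the (fake) charset map been loaded? */
};

/* Mirrors the role of the global charset_map_loaded flag. */
static int map_loaded;

static void
init_coder (struct coder *c, size_t start, size_t size, size_t dst_pos)
{
  c->block_start = start;
  c->block_size = size;
  c->dst_pos = dst_pos;
  c->destination = arena + start + dst_pos;
  c->map_ready = 0;
}

/* Like the patched coding_set_destination: refresh the cached pointer
   and return how many bytes it moved. */
static ptrdiff_t
set_destination (struct coder *c)
{
  unsigned char *orig = c->destination;

  c->destination = arena + c->block_start + c->dst_pos;
  return c->destination - orig;
}

/* Hypothetical char_charset stand-in: the first lookup "loads a map",
   and that allocation moves the text block 64 bytes further on. */
static int
lookup_charset (struct coder *c, int ch)
{
  if (!c->map_ready)
    {
      assert (c->block_start + 64 + c->block_size <= sizeof arena);
      c->map_ready = 1;
      map_loaded = 1;
      memmove (arena + c->block_start + 64, arena + c->block_start,
               c->block_size);
      c->block_start += 64;
    }
  return ch < 0x80 ? 0 : 1;
}

/* Same shape as the committed CODING_CHAR_CHARSET: rebase local
   pointers only when a map was loaded and the destination moved. */
#define CODER_CHAR_CHARSET(c, dst, dst_end, ch, result)        \
  do {                                                         \
    ptrdiff_t offset;                                          \
                                                               \
    map_loaded = 0;                                            \
    (result) = lookup_charset ((c), (ch));                     \
    if (map_loaded && (offset = set_destination (c)) != 0)     \
      {                                                        \
        (dst) += offset;                                       \
        (dst_end) += offset;                                   \
      }                                                        \
  } while (0)

/* Write one "charset id" byte per input character; return the count. */
static size_t
encode_ids (struct coder *c, const int *in, size_t n)
{
  unsigned char *dst = c->destination;
  unsigned char *dst_end = arena + c->block_start + c->block_size;
  size_t i;

  for (i = 0; i < n && dst < dst_end; i++)
    {
      int id;

      CODER_CHAR_CHARSET (c, dst, dst_end, in[i], id);
      *dst++ = (unsigned char) id;
    }
  return (size_t) (dst - c->destination);
}
```

In this model only the first lookup loads the map, so a second encode_ids call on the same coder never rebases anything. That is consistent with the report that only the first encode-coding-region call went wrong, although the sketch makes no attempt to model why the relocation was only observed on Windows.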
bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Mon, 05 Dec 2011 09:13:01 GMT)

Message #53 received at 9318 <at> debbugs.gnu.org:
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Kenichi Handa <handa <at> m17n.org> Cc: 9318 <at> debbugs.gnu.org Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Date: Mon, 05 Dec 2011 01:11:53 -0800
That patch (bzr 106613) causes Emacs to use an uninitialized variable;
I found this via static checking with GCC. I installed the following
further patch, which I think is right and anyway does not introduce a bug --
can you please check it? Thanks.

* coding.c (encode_designation_at_bol): Don't use uninitialized
local variable (Bug#9318).

=== modified file 'src/coding.c'
--- src/coding.c 2011-12-05 07:03:31 +0000
+++ src/coding.c 2011-12-05 09:00:44 +0000
@@ -4356,7 +4356,7 @@
                            int *charbuf, int *charbuf_end,
                            unsigned char *dst)
 {
-  unsigned char *orig;
+  unsigned char *orig = dst;
   struct charset *charset;
   /* Table of charsets to be designated to each graphic register. */
   int r[4];
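bzr 106613 and this follow-up fit together: encode_designation_at_bol now writes into a small stack buffer (desig_buf) and returns a byte count computed as dst - orig, and that count is only meaningful once orig starts at dst. The sketch below shows both halves in C under stated assumptions: emit_designations and write_bol_designations are hypothetical names, and the single fixed sequence ESC $ ( D (taken from the expected output in the original report) stands in for the real per-register designation logic.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for encode_designation_at_bol: scan CHARBUF up
   to CHARBUF_END or the first newline and, if any non-ASCII character
   appears, write one illustrative designation (ESC $ ( D) to DST.
   Returns the number of bytes produced. */
static int
emit_designations (const int *charbuf, const int *charbuf_end,
                   unsigned char *dst)
{
  unsigned char *orig = dst;   /* the follow-up fix: ORIG starts at DST */
  int designated = 0;

  for (; charbuf < charbuf_end && *charbuf != '\n'; charbuf++)
    if (*charbuf >= 0x80 && !designated)
      {
        memcpy (dst, "\033$(D", 4);
        dst += 4;
        designated = 1;
      }
  return (int) (dst - orig);
}

/* Caller side: produce into a stack buffer that can never move, then
   copy into the real destination pointed to by *DSTP. */
static int
write_bol_designations (unsigned char **dstp,
                        const int *charbuf, const int *charbuf_end)
{
  unsigned char desig_buf[16];
  int nbytes = emit_designations (charbuf, charbuf_end, desig_buf);

  /* In the Emacs patch, this is where dst and dst_end are rebased if
     charset_map_loaded was set during the call above. */
  assert (nbytes <= (int) sizeof desig_buf);
  memcpy (*dstp, desig_buf, (size_t) nbytes);
  *dstp += nbytes;
  return nbytes;
}
```

Because desig_buf is a local array, nothing a charset-map load does can move it; only the final memcpy touches buffer text, and by then the caller has had its chance to rebase dst.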
bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Mon, 05 Dec 2011 11:33:02 GMT)

Message #56 received at 9318 <at> debbugs.gnu.org:
From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> To: Kenichi Handa <handa <at> m17n.org> Cc: schwab <at> linux-m68k.org, 9318 <at> debbugs.gnu.org Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Date: Mon, 05 Dec 2011 20:31:33 +0900
> In article <tl7zkfdnjgj.fsf <at> m17n.org>, Kenichi Handa <handa <at> m17n.org> writes:
>
> > In article <20110830233131.C74A61E0043 <at> msa101.auone-net.jp>, Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:
> > > Here is the patch for the code, which contains Andreas' patch. In my
> > > environment, problems are fixed. I think it would be better that the
> > > interface of encode_designation_at_bol() is changed.
>
> > Oops, sorry, I have vaguely thought that your patch below
> > has already been applied, but just noticed that it was not.
> > I'll commit a slightly modified version including the
> > improved interface for encode_designation_at_bol soon.
>
> I've just installed the following changes. As I don't have
> cygwin environment now, could you please check if this
> change surely fix the problem?

As far as I confirmed, the problems were fixed (except the point Paul
pointed out). Thank you.

Additionally, if you have time, please confirm Bug#8619 and Bug#9389.

--
Kazuhiro Ito
bug-gnu-emacs <at> gnu.org: bug#9318; Package emacs. (Tue, 06 Dec 2011 00:32:01 GMT)

Message #59 received at 9318 <at> debbugs.gnu.org:
From: Kenichi Handa <handa <at> m17n.org> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: 9318 <at> debbugs.gnu.org Subject: Re: bug#9318: 23.3.50; The first call of encode-coding-region() returns wrong result Date: Tue, 06 Dec 2011 09:30:33 +0900
In article <4EDC8AD9.3050004 <at> cs.ucla.edu>, Paul Eggert <eggert <at> cs.ucla.edu> writes:

> That patch (bzr 106613) causes Emacs to use an uninitialized variable;
> I found this via static checking with GCC. I installed the following
> further patch, which I think is right and anyway does not introduce a bug --
> can you please check it? Thanks.

Oops, my fault. Yes, your patch is correct. Thank you.

---
Kenichi Handa
handa <at> m17n.org
Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 06 Dec 2011 08:36:02 GMT)

Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 03 Jan 2012 12:24:03 GMT)