From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 15 18:16:57 2013
From: Kevin Ryde
To: bug-gnu-emacs@gnu.org
Subject: 24.3; replace-regexp-in-string wrong on \`
Date: Fri, 16 Aug 2013 08:15:53 +1000
Message-ID: <87eh9uk3c6.fsf@blah.blah>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

replace-regexp-in-string behaves incorrectly if a regexp has \` among
its matches.

    (replace-regexp-in-string "\\`\\|X" "Z" "--XX--" t t)
    =>
    "Z--ZXZX--"

where I expected

    "Z--ZZ--"

This seems to be due to the optimization in replace-regexp-in-string
which re-matches on the matched substring.  \' can match the substring
where it did not match in the middle of the full string.  In the example
above "X" is the match in the full string, but on taking that "X" as a
substring it can match "\\`".

Probably similar mismatches on the substring occur for things like
\' ^ $ \b \< etc.

Maybe the comment in the code about munging the match data would be a
better way.
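The re-match on the substring described above can be reproduced with
plain string-match calls.  The following is only a minimal sketch, not
the actual code in subr.el: demo-rematch-piece is a hypothetical helper
name, and it assumes REGEXP does match somewhere at or after START.

```elisp
;; Hypothetical helper, not part of Emacs.  It mimics one step of the
;; "match in the full string, then re-match on the extracted piece"
;; approach described in the report.  Assumes a match exists.
(defun demo-rematch-piece (regexp string start)
  "Return (MB ME SUB-MB SUB-ME) for REGEXP in STRING from START.
MB and ME delimit the match in the full STRING.  SUB-MB and SUB-ME
delimit the match found when REGEXP is applied to that piece alone."
  (let* ((mb (string-match regexp string start))
         (me (match-end 0))
         (piece (substring string mb me)))
    ;; Second match, this time on the isolated piece only.
    (string-match regexp piece)
    (list mb me (match-beginning 0) (match-end 0))))

(demo-rematch-piece "\\`\\|X" "--XX--" 1)
;; => (2 3 0 0)
```

In the full string the match covers the "X" (2 to 3), but on the piece
"X" alone the \` alternative matches the empty string at 0, so a
replacement based on the second match is inserted in front of the "X"
instead of replacing it.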
In GNU Emacs 24.3.1 (i486-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2013-05-29 on blah.blah, modified by Debian
System Description: Debian GNU/Linux testing/unstable

Configured using:
 `configure '--build' 'i486-linux-gnu' '--build' 'i486-linux-gnu'
 '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib'
 '--localstatedir=/var/lib' '--infodir=/usr/share/info'
 '--mandir=/usr/share/man' '--with-pop=yes'
 '--enable-locallisppath=/etc/emacs24:/etc/emacs:/usr/local/share/emacs/24.3/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/24.3/site-lisp:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/i386-linux-gnu' '--with-x=yes'
 '--with-x-toolkit=lucid' '--with-toolkit-scroll-bars'
 '--without-gconf' 'build_alias=i486-linux-gnu'
 'CFLAGS=-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Wall'
 'LDFLAGS=-Wl,-z,relro -Wl,-znocombreloc'
 'CPPFLAGS=-D_FORTIFY_SOURCE=2''

Important settings:
  value of $LANG: en_AU
  locale-coding-system: iso-latin-1-unix
  default enable-multibyte-characters: t

From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 06 01:35:21 2016
MIME-Version: 1.0
Date: Sat, 5 Mar 2016 20:18:50 -0800
Subject: Re: bug#15107: reproduced on emacs-25 branch
From: Michael Wright
To: 15107@debbugs.gnu.org
Content-Type: multipart/alternative; boundary=001a11c31746339b7e052d59a529

--001a11c31746339b7e052d59a529
Content-Type: text/plain; charset=UTF-8

Kevin Ryde writes:

> replace-regexp-in-string behaves incorrectly if a regexp has \` among
> its matches.
>
>     (replace-regexp-in-string "\\`\\|X" "Z" "--XX--" t t)
>     =>
>     "Z--ZXZX--"
>
> where I expected
>
>     "Z--ZZ--"
>
> This seems to be due to the optimization in replace-regexp-in-string
> which re-matches on the matched substring.  \' can match the substring
> where it did not match in the middle of the full string.  In the example
> above "X" is the match in the full string, but on taking that "X" as a
> substring it can match "\\`".

I built the emacs-25 git branch and recreated the above bug today.

GNU Emacs 25.0.92.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21 Version 10.9.5 (Build 13F1507))
 of 2016-03-05

--001a11c31746339b7e052d59a529
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a11c31746339b7e052d59a529--

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 05 20:30:14 2016
From: npostavs@users.sourceforge.net
To: control@debbugs.gnu.org
Subject: Re: bug#15107: reproduced on emacs-25 branch
References: <87eh9uk3c6.fsf@blah.blah>
Date: Fri, 05 Aug 2016 20:30:14 -0400
In-Reply-To: (Michael Wright's message of "Sat, 5 Mar 2016 20:18:50 -0800")
Message-ID: <87r3a27n8p.fsf@users.sourceforge.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.93 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

tags 15107 confirmed
found 15107 25.1
quit

Michael Wright writes:

> Kevin Ryde writes:
>
>> replace-regexp-in-string behaves incorrectly if a regexp has \` among
>> its matches.
>>
>>     (replace-regexp-in-string "\\`\\|X" "Z" "--XX--" t t)
>>     =>
>>     "Z--ZXZX--"
>>
>> where I expected
>>
>>     "Z--ZZ--"
>>
>> This seems to be due to the optimization in replace-regexp-in-string
>> which re-matches on the matched substring.  \' can match the substring
>> where it did not match in the middle of the full string.  In the example
>> above "X" is the match in the full string, but on taking that "X" as a
>> substring it can match "\\`".
>
> I built the emacs-25 git branch and recreated the above bug today.
Still the case in 25.1-rc1

From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 30 20:08:12 2016
MIME-Version: 1.0
From: Erik Anderson
Date: Tue, 30 Aug 2016 23:57:35 +0000
Subject: [PATCH] Add replace-regexp-in-string regression test
To: 15107@debbugs.gnu.org
Content-Type: multipart/alternative; boundary=001a114473a04fc621053b52bf5f

--001a114473a04fc621053b52bf5f
Content-Type: text/plain; charset=UTF-8

I can confirm the buggy behavior on emacs 24.5.1 and 25.1.50.1 for
Kevin's example as well as:

(replace-regexp-in-string "^.\\| ." #'upcase "foo bar")
> "Foo bar"  (should be "Foo Bar")

Some close variants which behave correctly:

(replace-regexp-in-string "^F\\| ." #'upcase "foo bar")
> "Foo Bar"

(replace-regexp-in-string "^.K\\| ." #'upcase "ok corral")
> "OK Corral"

(replace-regexp-in-string "^..\\| ." #'upcase "ok corral")
> "OK Corral"

This was discussed here:
http://emacs.stackexchange.com/questions/26590/replace-regexp-in-string-stops-replacement-with

Here is a regression test for when someone has a chance to tackle this:

---
 test/lisp/subr-tests.el | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index ce21290..969d5c2 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -224,5 +224,8 @@
               (error-message-string (should-error (version-to-list "beta22_8alpha3")))
               "Invalid version syntax: `beta22_8alpha3' (must start with a number)"))))
 
+(ert-deftest replace-regexp-in-string-test ()
+  (should (equal (replace-regexp-in-string "^.\\| ." #'upcase "foo bar") "Foo Bar")))
+
 (provide 'subr-tests)
 ;;; subr-tests.el ends here
-- 
Regards,
Erik Anderson.
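Why "^.\\| ." fails while "^F\\| ." works can be seen by repeating the
re-match on the extracted piece by hand.  This is a minimal sketch under
the assumption that the replacement is computed from a second match on
the piece alone, as the original report describes.
demo-replace-piece is a hypothetical helper, not an Emacs function, and
it passes t for FIXEDCASE and LITERAL to keep the demonstration simple.

```elisp
;; Hypothetical helper, not part of Emacs.  Assumes REGEXP matches
;; somewhere in STRING at or after START.
(defun demo-replace-piece (regexp rep string start)
  "Return the replacement computed for the first match after START.
REGEXP is matched in STRING, the matched piece is extracted, REGEXP is
matched again on that piece alone, and the function REP is applied to
whatever the second match covers."
  (let* ((mb (string-match regexp string start))
         (me (match-end 0))
         (piece (substring string mb me)))
    ;; Second match, on the isolated piece only.
    (string-match regexp piece)
    (replace-match (funcall rep (match-string 0 piece)) t t piece)))

;; The piece is " b".  On its own, "^." matches just the space,
;; so only the space is passed to `upcase'.
(demo-replace-piece "^.\\| ." #'upcase "foo bar" 1)
;; => " b"

;; With "^F" the first alternative cannot match the piece, so " ."
;; matches all of it and the "b" is upcased.
(demo-replace-piece "^F\\| ." #'upcase "foo bar" 1)
;; => " B"
```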
--001a114473a04fc621053b52bf5f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--001a114473a04fc621053b52bf5f--

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 10:25:03 2016
Date: Wed, 31 Aug 2016 17:24:39 +0300
Message-Id: <83twe1kod4.fsf@gnu.org>
From: Eli Zaretskii
To: Erik Anderson
In-reply-to: (message from Erik Anderson on Tue, 30 Aug 2016 23:57:35 +0000)
Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test
References: <87eh9uk3c6.fsf@blah.blah>
Cc: 15107@debbugs.gnu.org
Reply-To: Eli Zaretskii

> From: Erik Anderson
> Date: Tue, 30 Aug 2016 23:57:35 +0000
>
> I can confirm the buggy behavior on emacs 24.5.1 and 25.1.50.1 for Kevin's example as well as:
>
> (replace-regexp-in-string "^.\\| ." #'upcase "foo bar")
> > "Foo bar" (should be "Foo Bar")

Maybe I'm missing something, but I don't see why this is a bug.  The
input string "foo bar" matches the "^." alternative in its entirety,
so there's no reason to expect Emacs to apply 'upcase' twice.

From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 10:36:38 2016
MIME-Version: 1.0
References: <87eh9uk3c6.fsf@blah.blah> <83twe1kod4.fsf@gnu.org>
In-Reply-To: <83twe1kod4.fsf@gnu.org>
From: Erik Anderson
Date: Wed, 31 Aug 2016 14:36:06 +0000
Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test
To: Eli Zaretskii
Cc: 15107@debbugs.gnu.org
Content-Type: multipart/alternative; boundary=94eb2c12580a3cf286053b5f051a

--94eb2c12580a3cf286053b5f051a
Content-Type: text/plain; charset=UTF-8

Per the replace-regexp-in-string docstring: "Replace all matches for
REGEXP with REP in STRING."

My email was a comment to an existing open bug from 2013-08-15:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=15107

On Wed, Aug 31, 2016 at 9:25 AM Eli Zaretskii wrote:

> > From: Erik Anderson
> > Date: Tue, 30 Aug 2016 23:57:35 +0000
> >
> > I can confirm the buggy behavior on emacs 24.5.1 and 25.1.50.1 for
> Kevin's example as well as:
> >
> > (replace-regexp-in-string "^.\\| ." #'upcase "foo bar")
> > > "Foo bar" (should be "Foo Bar")
>
> Maybe I'm missing something, but I don't see why this is a bug.  The
> input string "foo bar" matches the "^." alternative in its entirety,
> so there's no reason to expect Emacs to apply 'upcase' twice.
>

--94eb2c12580a3cf286053b5f051a
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--94eb2c12580a3cf286053b5f051a-- From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 11:01:34 2016 Received: (at 15107) by debbugs.gnu.org; 31 Aug 2016 15:01:34 +0000 Received: from localhost ([127.0.0.1]:45568 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf718-0000j2-Ea for submit@debbugs.gnu.org; Wed, 31 Aug 2016 11:01:34 -0400 Received: from eggs.gnu.org ([208.118.235.92]:39826) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf712-0000if-EA for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 11:01:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bf70u-0007e7-75 for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 11:01:19 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:44586) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bf70u-0007e3-3g; Wed, 31 Aug 2016 11:01:16 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2968 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bf70s-0001xd-6U; Wed, 31 Aug 2016 11:01:14 -0400 Date: Wed, 31 Aug 2016 18:01:12 +0300 Message-Id: <83k2exkmo7.fsf@gnu.org> From: Eli Zaretskii To: Erik Anderson In-reply-to: (message from Erik Anderson on Wed, 31 Aug 2016 14:36:06 +0000) Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test References: <87eh9uk3c6.fsf@blah.blah> <83twe1kod4.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.5 (------) X-Debbugs-Envelope-To: 15107 Cc: 15107@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.5 (------) > From: Erik Anderson > Date: Wed, 31 Aug 2016 14:36:06 +0000 > Cc: 15107@debbugs.gnu.org > > Per the replace-regexp-in-string docstring: "Replace all matches for REGEXP with REP in STRING." Yes, and there is a single match in this case, so a single replacement. The _entire_ input string matches the regexp, so after that match there's nothing else left to match. What am I missing? From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 11:13:21 2016 Received: (at 15107) by debbugs.gnu.org; 31 Aug 2016 15:13:21 +0000 Received: from localhost ([127.0.0.1]:45577 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf7CX-000101-6K for submit@debbugs.gnu.org; Wed, 31 Aug 2016 11:13:21 -0400 Received: from mail-oi0-f43.google.com ([209.85.218.43]:35289) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf7CS-0000zj-5k for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 11:13:16 -0400 Received: by mail-oi0-f43.google.com with SMTP id p186so39559553oia.2 for <15107@debbugs.gnu.org>; Wed, 31 Aug 2016 08:13:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=8K5cwjHMrXwT9WNLRRDJAeJVjiUilv7/XgREPsncCuw=; b=YVMExcNOECy0oM/J6YQH3Pfkl9xug+hlv+Ztj4Fd5Vzk+f250R2inveUJgq4CahlhP HYma7wBx55BlzZ0G89dpPoHeQpCxm3Pk7N5PbvpX6IUWl7s3uDaER7Q89pFDnb5bUNHP iebjJYEyhzx7s+bi/6cD4AF4qde1e2fh9Xl2p4abaE3e1K9mRggRPjbPc0pD4jkKSqZ/ tp+vS1YS58j/GqS15DSGfErLLrKX7cpTbMz7RkXjkeODn1uYzGf/1Rj4OiK76fgyCD0G BI9jHrkfbpPmRe0onUL6WTcPMgDRT4ht4UujSnaP1koKzq/IQRWavwbC+PyLfwh/asAI qyEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; 
bh=8K5cwjHMrXwT9WNLRRDJAeJVjiUilv7/XgREPsncCuw=; b=ai7ayxct2cKaTuxPp6DGStw7Rg/49NjovTF9GOyCPMc5bo9dpicbGRdJ5/BgZfMFLz gIRckFxw1SPk/Er9q4Yau71KK2WJ4YYR1OFFGXnqcildNdCe0rvLP7KcqVfTfXh5Tk9x RauEsaKNSTT33Febi01Gpk3OcsZzaAvlQO8jU08mWY0fMSvrQ7w35ZKnq+m03KldJFDk 8WXPVs+ymwoqQ0TkbvxotqF3r2ZZqGfjM2pHuu9O/ualM5EhWDRYKLJaDcbctwu6BYco SQJk9NrarGb5iqieO90eer7ucSEpkLGC+CJcZ1/P1wkSJ3XFGNezl0D4hlwieaKiRv5c vNYg== X-Gm-Message-State: AE9vXwP0CTO6Xh9IoZk+wC7ZHuJzV6MWF0vuG+jSgC0XEKtlafMZxSLVZWFYoSF13Lq/I0EEfB5O1x+J15HKqQ== X-Received: by 10.157.49.81 with SMTP id v17mr10440074otd.134.1472656385600; Wed, 31 Aug 2016 08:13:05 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.11.125 with HTTP; Wed, 31 Aug 2016 08:13:05 -0700 (PDT) In-Reply-To: <83k2exkmo7.fsf@gnu.org> References: <87eh9uk3c6.fsf@blah.blah> <83twe1kod4.fsf@gnu.org> <83k2exkmo7.fsf@gnu.org> From: Noam Postavsky Date: Wed, 31 Aug 2016 11:13:05 -0400 X-Google-Sender-Auth: hokkvCqJhs6UzGa8TSvusD9gu1o Message-ID: Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test To: Eli Zaretskii Content-Type: text/plain; charset=UTF-8 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 15107 Cc: Erik Anderson , 15107@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Wed, Aug 31, 2016 at 11:01 AM, Eli Zaretskii wrote: >> From: Erik Anderson >> Date: Wed, 31 Aug 2016 14:36:06 +0000 >> Cc: 15107@debbugs.gnu.org >> >> Per the replace-regexp-in-string docstring: "Replace all matches for REGEXP with REP in STRING." > > Yes, and there is a single match in this case, so a single > replacement. The _entire_ input string matches the regexp, so after > that match there's nothing else left to match. > > What am I missing? "^." 
matches only the first character of "foo bar", but maybe you have a different idea of "matches" than I do. I would consider "^..*" to match the whole string. From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 11:33:24 2016 Received: (at 15107) by debbugs.gnu.org; 31 Aug 2016 15:33:24 +0000 Received: from localhost ([127.0.0.1]:45593 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf7Vw-0001Wa-Hc for submit@debbugs.gnu.org; Wed, 31 Aug 2016 11:33:24 -0400 Received: from mail-ua0-f178.google.com ([209.85.217.178]:34566) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf7Vq-0001WH-Jy for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 11:33:18 -0400 Received: by mail-ua0-f178.google.com with SMTP id q42so36554710uaq.1 for <15107@debbugs.gnu.org>; Wed, 31 Aug 2016 08:33:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Bq8yznsHtGggJtvbxdSxjwhewREsOpqfLsDYiI777T0=; b=oxZ/7AIp2RIlNoz3YU5P1WPbQp9W9lmuyslx2jWOetVOwjQb7xOpD+aYKZmrfXdPqI pWQsv83ZbKTkOduYfYkWvlG1Z56mu9d17zFzrPvl6yeaYpROzrgDkS6Cr2zDyB0RBNFS agRdHifwL+dTTIoAK3tZwnSWkKiIUNEcS0adU9awKoSOGgK2EhyZRHcML6ihsPSJ2qfK /cyQSK8I+AWt6pBO7Ne9ihJ+hR7nenWsfjEIIEI40nbKdqkauM93Opc2z+JNpeGrrGQc lC03Ti9ui9yyE1+ZZarclTAazSJ/1hI6yeSzOBOADjT4ZnV38uemK9+nXxC578pGfDx6 onHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Bq8yznsHtGggJtvbxdSxjwhewREsOpqfLsDYiI777T0=; b=SCuj5XGoiePG1LGToP1GYnvAj/zsr/kT+ePwNc+GxCXTdkSoI8oBU8gqli/YlyFQov m76cnAF0PG/ngw2b7L+X5ribvVC1ywqi8VEAovH1msbN6m7ZnLWYoJ3QhVbR4B8oNvQc pZg0OblWuzXYpb0mf6OmgudjHqt3Co509Uwgo2Z3iJWJO0P4QlhvvLPnyn+0dLcsH4Ry fClF0auQl5ggsv3hDgfBGImeonossY2tuB0xX34YIu2JrgD4j/FKMR7ofwC7V4o+aSSc Mmld+ej6UBNv7JIXJbvcYoUJnP9sa2Wy14qgQerEo4E6D+mHMjOhTVyYH6Ls6V0ciLV0 SnOw== 
X-Gm-Message-State: AE9vXwN0TobEAlS+I5Ne0HHAH7+UfczO4RV6Shv5Ar6ZGkLOEEc4Q1dNI24MTYpbq/z56y5BwuZ1L5ImNdJtKg== X-Received: by 10.176.4.129 with SMTP id 1mr6083460uaw.26.1472657589146; Wed, 31 Aug 2016 08:33:09 -0700 (PDT) MIME-Version: 1.0 References: <87eh9uk3c6.fsf@blah.blah> <83twe1kod4.fsf@gnu.org> <83k2exkmo7.fsf@gnu.org> In-Reply-To: From: Erik Anderson Date: Wed, 31 Aug 2016 15:32:58 +0000 Message-ID: Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test To: Noam Postavsky , Eli Zaretskii Content-Type: multipart/alternative; boundary=94eb2c12580a7a1979053b5fd020 X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 15107 Cc: 15107@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/) --94eb2c12580a7a1979053b5fd020 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

I suspect that since ".*" is such a commonly used term in regexps, Eli
might be misreading the regexp.

From the Emacs manual on regular expression special characters:

"‘.’ (Period) is a special character that matches any single character
except a newline. Using concatenation, we can make regular expressions
like ‘a.b’, which matches any three-character string that begins with
‘a’ and ends with ‘b’."

You can verify the behavior of "."

(string-match "^." "No greedy modifiers here")
(match-data)
> (0 1)

(string-match "^.*" "This has a greedy modifier")
(match-data)
> (0 26)

This is a helpful document:
https://www.gnu.org/software/emacs/manual/html_node/elisp/Regexp-Special.html#Regexp-Special

Further discussion should be moved off this list.

-Erik.

On Wed, Aug 31, 2016 at 10:13 AM Noam Postavsky <npostavs@users.sourceforge.net> wrote:

> On Wed, Aug 31, 2016 at 11:01 AM, Eli Zaretskii wrote:
> >> From: Erik Anderson
> >> Date: Wed, 31 Aug 2016 14:36:06 +0000
> >> Cc: 15107@debbugs.gnu.org
> >>
> >> Per the replace-regexp-in-string docstring: "Replace all matches for
> >> REGEXP with REP in STRING."
> >
> > Yes, and there is a single match in this case, so a single
> > replacement. The _entire_ input string matches the regexp, so after
> > that match there's nothing else left to match.
> >
> > What am I missing?
>
> "^." matches only the first character of "foo bar", but maybe you have
> a different idea of "matches" than I do. I would consider "^..*" to
> match the whole string.
>
--94eb2c12580a7a1979053b5fd020 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--94eb2c12580a7a1979053b5fd020-- From debbugs-submit-bounces@debbugs.gnu.org Wed Aug 31 12:05:11 2016 Received: (at 15107) by debbugs.gnu.org; 31 Aug 2016 16:05:11 +0000 Received: from localhost ([127.0.0.1]:45608 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf80h-0002IP-G1 for submit@debbugs.gnu.org; Wed, 31 Aug 2016 12:05:11 -0400 Received: from eggs.gnu.org ([208.118.235.92]:54167) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bf80c-0002Ho-H6 for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 12:05:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bf80S-0004Su-GQ for 15107@debbugs.gnu.org; Wed, 31 Aug 2016 12:04:57 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_05,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:45366) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bf80S-0004Sf-D0; Wed, 31 Aug 2016 12:04:52 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:3190 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bf80Q-0006or-D1; Wed, 31 Aug 2016 12:04:50 -0400 Date: Wed, 31 Aug 2016 19:04:49 +0300 Message-Id: <83eg54lyam.fsf@gnu.org> From: Eli Zaretskii To: Noam Postavsky In-reply-to: (message from Noam Postavsky on Wed, 31 Aug 2016 11:13:05 -0400) Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test References: <87eh9uk3c6.fsf@blah.blah> <83twe1kod4.fsf@gnu.org> <83k2exkmo7.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.5 (------) X-Debbugs-Envelope-To: 15107 Cc: erikbpanderson@gmail.com, 15107@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.5 (------) > From: Noam Postavsky > Date: Wed, 31 Aug 2016 11:13:05 -0400 > Cc: Erik Anderson , 15107@debbugs.gnu.org > > "^." matches only the first character of "foo bar", but maybe you have > a different idea of "matches" than I do. I would consider "^..*" to > match the whole string. OMG, I was sure the * was there! Sorry. From debbugs-submit-bounces@debbugs.gnu.org Thu Sep 01 11:47:01 2016 Received: (at 15107) by debbugs.gnu.org; 1 Sep 2016 15:47:01 +0000 Received: from localhost ([127.0.0.1]:46598 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bfUCe-00050v-3C for submit@debbugs.gnu.org; Thu, 01 Sep 2016 11:47:01 -0400 Received: from eggs.gnu.org ([208.118.235.92]:47635) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bfUCY-00050f-31 for 15107@debbugs.gnu.org; Thu, 01 Sep 2016 11:46:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bfUCP-0001y6-OG for 15107@debbugs.gnu.org; Thu, 01 Sep 2016 11:46:44 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.7 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34233) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bfUCP-0001y2-Kh; Thu, 01 Sep 2016 11:46:41 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1119 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bfUCO-0006yb-9o; Thu, 01 Sep 2016 11:46:41 -0400 Date: Thu, 01 Sep 2016 18:46:35 +0300 Message-Id: <83d1knfwro.fsf@gnu.org> From: Eli Zaretskii To: Erik Anderson In-reply-to: (message from Erik 
Anderson on Tue, 30 Aug 2016 23:57:35 +0000) Subject: Re: bug#15107: [PATCH] Add replace-regexp-in-string regression test References: <87eh9uk3c6.fsf@blah.blah> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.5 (------) X-Debbugs-Envelope-To: 15107 Cc: 15107@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.5 (------)

> From: Erik Anderson
> Date: Tue, 30 Aug 2016 23:57:35 +0000
>
> (replace-regexp-in-string "^.\\| ." #'upcase "foo bar")
>
> "Foo bar" (should be "Foo Bar")

It looks like an algorithmic design flaw.  Here's the relevant part of
replace-regexp-in-string:

      (while (and (< start l) (string-match regexp string start))
        (setq mb (match-beginning 0)
              me (match-end 0))
        ;; If we matched the empty string, make sure we advance by one char
        (when (= me mb) (setq me (min l (1+ mb))))
        ;; Generate a replacement for the matched substring.
        ;; Operate only on the substring to minimize string consing.
        ;; Set up match data for the substring for replacement;
        ;; presumably this is likely to be faster than munging the
        ;; match data directly in Lisp.
        (string-match regexp (setq str (substring string mb me)))
        (setq matches
              (cons (replace-match (if (stringp rep)
                                       rep
                                     (funcall rep (match-string 0 str)))
                                   fixedcase literal str subexp)

As you see, it first matches the (rest of the) string against REGEXP,
then takes the substring that matched, and matches that substring
again.  But the evident assumption that the match in the substring
will yield the same result is false.  In this case, the substring of
"oo bar" that matches "^.\\| ." is " b", but matching it again against
the same regexp yields just " ", because the first alternative matches.
So 'upcase' is applied to the blank, and the rest is history. From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 25 09:58:30 2020 Received: (at control) by debbugs.gnu.org; 25 Nov 2020 14:58:30 +0000 Received: from localhost ([127.0.0.1]:36646 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1khwFZ-0006g3-OU for submit@debbugs.gnu.org; Wed, 25 Nov 2020 09:58:29 -0500 Received: from mail33c50.megamailservers.eu ([91.136.10.43]:38316) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1khwFX-0006fr-2u; Wed, 25 Nov 2020 09:58:28 -0500 X-Authenticated-User: mattiase@bredband.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=megamailservers.eu; s=maildub; t=1606316305; bh=KgQ7vsrvrRieaUxw0hGKexXZ5ZQtp+ZgIogmG8OeuEw=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=coRSS68n7BPlmU+oT4ltbhydtaCQ3KXPgtWdHIUivYHz/rG8soVoaYwaYiITyYnaN wXVHk6atVA+qfv7TTiHWAIH4k8QMEl/nG61VGMVHGkX0jnaIq96mdFxoOCp8ASwFdF a0UGWvbvNiBoNviww8VgDW03T56PxGrVk0LDle2c= Feedback-ID: mattiase@acm.or Received: from stanniol.lan (c-064ae655.032-75-73746f71.bbcust.telenor.se [85.230.74.6]) (authenticated bits=0) by mail33c50.megamailservers.eu (8.14.9/8.13.1) with ESMTP id 0APEwNbf004969; Wed, 25 Nov 2020 14:58:24 +0000 Content-Type: multipart/mixed; boundary="Apple-Mail=_00941A56-F79B-4BEA-9413-E45E955F9583" Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.17\)) Subject: Re: bug#44861: 27.1; [PATCH] signal in `replace-regexp-in-string' From: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= In-Reply-To: <6F768DED-2E1B-4D06-A776-FFA162AC32AD@acm.org> Date: Wed, 25 Nov 2020 15:58:22 +0100 Message-Id: <97535AF5-D542-4267-A5A9-1483C32A61AC@acm.org> References: <6F768DED-2E1B-4D06-A776-FFA162AC32AD@acm.org> To: Shigeru Fukaya X-Mailer: Apple Mail (2.3445.104.17) X-CTCH-RefID: str=0001.0A782F23.5FBE7111.004D, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0 X-CTCH-VOD: Unknown X-CTCH-Spam: Unknown X-CTCH-Score: 0.000 X-CTCH-Rules: X-CTCH-Flags: 0 
X-CTCH-ScoreCust: 0.000 X-CSC: 0 X-CHA: v=2.3 cv=C6KXNjH+ c=1 sm=1 tr=0 a=Ni+dBsiEfW2GqKMPYZim9A==:117 a=Ni+dBsiEfW2GqKMPYZim9A==:17 a=M51BFTxLslgA:10 a=4GVXjkHO3qP_L8fHQkQA:9 a=CjuIK1q_8ugA:10 a=neCVTB8LpOVk-CJV-C0A:9 a=B2y7HmGcmWMA:10 X-Origin-Country: SE X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: control Cc: 44861@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) --Apple-Mail=_00941A56-F79B-4BEA-9413-E45E955F9583 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii

forcemerge 15107 44861
stop

Suggested patch attached. A small test suite for
replace-regexp-in-string has already been pushed to master -- very
rudimentary, but better than nothing -- and the patch amends it with
some new relevant cases that didn't work before.

It is basically your patch but slightly optimised; it turned out that
the function call and allocation overhead of the original patch made it
a tad too expensive (a pity, because it was very neat). Now performance
is about the same as before when the pattern contains no submatches, and
slightly above (< 10% slower) with one submatch. It seems worth the
correctness.
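
For reference, the difference between re-matching on the substring and
translating the match data can be reproduced with a small sketch. It is
only an illustration of the two approaches, not the code in subr.el:
`demo-old-step' and `demo-new-step' are hypothetical helper names, and
the sketch uses nothing beyond `string-match', `match-data',
`set-match-data', `match-string' and `substring'.

```elisp
;; Illustrative sketch only; `demo-old-step' and `demo-new-step' are
;; hypothetical helpers, not functions from subr.el.

(defun demo-old-step (regexp string start)
  "Return the text REP would see when the substring is re-matched.
Finds the first match of REGEXP in STRING at or after START."
  (when (string-match regexp string start)
    (let ((str (substring string (match-beginning 0) (match-end 0))))
      ;; Re-match on the extracted substring: anchors such as ^ or \`
      ;; can now match at position 0 of STR.
      (string-match regexp str)
      (match-string 0 str))))

(defun demo-new-step (regexp string start)
  "Return the text REP would see when the match data is translated.
Finds the first match of REGEXP in STRING at or after START."
  (when (string-match regexp string start)
    (let* ((mb (match-beginning 0))
           (str (substring string mb (match-end 0))))
      ;; Shift every recorded position so that it is relative to STR.
      (set-match-data (mapcar (lambda (pos) (and pos (- pos mb)))
                              (match-data t)))
      (match-string 0 str))))

(demo-old-step "^.\\| ." "foo bar" 1)   ; => " "
(demo-new-step "^.\\| ." "foo bar" 1)   ; => " b"
```

With START at 1, the first match in "foo bar" is " b". Re-matching that
substring lets "^." win and hands only " " to the replacement function,
whereas the shifted match data still describes " b".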
--Apple-Mail=_00941A56-F79B-4BEA-9413-E45E955F9583 Content-Disposition: attachment; filename=0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="0001-Fix-replace-regexp-in-string-substring-match-data-tr.patch" Content-Transfer-Encoding: quoted-printable

From 9bc8dc80be5cee517fa53e6b8f37881d4220f162 Mon Sep 17 00:00:00 2001
From: Mattias Engdegård
Date: Wed, 25 Nov 2020 15:32:08 +0100
Subject: [PATCH] Fix replace-regexp-in-string substring match data translation

For certain patterns, re-matching the same regexp on the matched
substring does not produce correctly translated match data
(bug#15107 and bug#44861).

Reported by Kevin Ryde and Shigeru Fukaya.

* lisp/subr.el (replace-regexp-in-string): Translate the match data
by explicit manipulation instead of trusting a call to string-match on
the matched string to do the job.
* test/lisp/subr-tests.el (subr-replace-regexp-in-string):
Add test cases.
---
 lisp/subr.el            | 17 ++++++++++++-----
 test/lisp/subr-tests.el |  6 +++++-
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 1fb0f9ab7e..0ee2199933 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -4537,7 +4537,7 @@ replace-regexp-in-string
   ;; might be reasonable to do so for long enough STRING.]
   (let ((l (length string))
 	(start (or start 0))
-	matches str mb me)
+	matches str mb me md)
     (save-match-data
       (while (and (< start l) (string-match regexp string start))
 	(setq mb (match-beginning 0)
@@ -4546,10 +4546,17 @@ replace-regexp-in-string
 	(when (= me mb) (setq me (min l (1+ mb))))
 	;; Generate a replacement for the matched substring.
 	;; Operate on only the substring to minimize string consing.
-	;; Set up match data for the substring for replacement;
-	;; presumably this is likely to be faster than munging the
-	;; match data directly in Lisp.
-	(string-match regexp (setq str (substring string mb me)))
+
+        ;; Translate the match data so that it applies to the matched substring.
+        (setq md (match-data nil md t))  ; Reuse list from previous match.
+        (let ((m md))
+          (while m
+            (when (car m)
+              (setcar m (- (car m) mb)))
+            (setq m (cdr m)))
+          (set-match-data md))
+
+        (setq str (substring string mb me))
 	(setq matches
 	      (cons (replace-match (if (stringp rep)
 				       rep
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index c77be511dc..67f7fc9749 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -545,7 +545,11 @@ subr-replace-regexp-in-string
                             (match-beginning 1) (match-end 1)))
                   "babbcaacabc")
                  "ba"))
-  )
+  ;; anchors (bug#15107, bug#44861)
+  (should (equal (replace-regexp-in-string "a\\B" "b" "a aaaa")
+                 "a bbba"))
+  (should (equal (replace-regexp-in-string "\\`\\|x" "z" "--xx--")
+                 "z--zz--")))
 
 (provide 'subr-tests)
 ;;; subr-tests.el ends here
-- 
2.21.1 (Apple Git-122.3)

--Apple-Mail=_00941A56-F79B-4BEA-9413-E45E955F9583--
From unknown Tue Jun 17 22:18:05 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 28 Dec 2020 12:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator