From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 26 01:50:32 2012
From: "Drew Adams"
Subject: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Wed, 25 Jul 2012 22:43:29 -0700

emacs -Q

(defface foo '((t (:background "Yellow"))) "" :group 'faces)
(setq nobreak-char-display nil)
(font-lock-add-keywords nil '(("[\240]+" (0 'foo t))) 'APPEND)

Insert a no-break space:
C-x 8 RET no-break-space (or C-q 240 RET)

Turn font-lock-mode off, then back on.

With point before the no-break-space, C-u C-x =. That shows that the
character is indeed a no-break-space, and there is no face on it.

In Emacs 22, the char is shown clearly in face foo.

Am I missing something? The same recipe with non-breaking-hyphen highlights
that character fine. What is different about no-break-space? Shouldn't it be
treated similarly?

This works in Emacs 22 but stops working in Emacs 23. Normal? Regression?

In GNU Emacs 24.1.1 (i386-mingw-nt5.1.2600)
 of 2012-06-10 on MARVIN
Windowing system distributor `Microsoft Corp.', version 5.1.2600
Configured using:
 `configure --with-gcc (4.6) --cflags
 -ID:/devel/emacs/libs/libXpm-3.5.8/include
 -ID:/devel/emacs/libs/libXpm-3.5.8/src
 -ID:/devel/emacs/libs/libpng-dev_1.4.3-1/include
 -ID:/devel/emacs/libs/zlib-dev_1.2.5-2/include
 -ID:/devel/emacs/libs/giflib-4.1.4-1/include
 -ID:/devel/emacs/libs/jpeg-6b-4/include
 -ID:/devel/emacs/libs/tiff-3.8.2-1/include
 -ID:/devel/emacs/libs/gnutls-3.0.9/include'

From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 16 19:41:51 2012
From: "Drew Adams"
To: <12054@debbugs.gnu.org>
Subject: RE: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sun, 16 Sep 2012 16:40:25 -0700
Message-ID: <4A59280025964859846FC8C6AB7A92F3@us.oracle.com>

ping

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 06:53:58 2012
From: Chong Yidong
To: "Drew Adams"
Cc: 12054@debbugs.gnu.org
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 18:50:53 +0800
Message-ID: <87mwyzyn76.fsf@gnu.org>

"Drew Adams" writes:

> (defface foo '((t (:background "Yellow"))) "" :group 'faces)
> (setq nobreak-char-display nil)
> (font-lock-add-keywords nil '(("[\240]+" (0 'foo t))) 'APPEND)
>
> Insert a no-break space:
> C-x 8 RET no-break-space (or C-q 240 RET)
>
> Turn font-lock-mode off, then back on.
>
> With point before the no-break-space, C-u C-x =. That shows that the
> character is indeed a no-break-space, and there is no face on it.

"[\240]+" doesn't do what you want. Octal 240 is a unibyte character,
so that string constant specifies a unibyte string. When this unibyte
string is converted to multibyte, the raw byte becomes codepoint
#x3fffa0.

You should use either of these instead:

(font-lock-add-keywords nil '(("[\u00a0]+" (0 'foo t))) 'APPEND)
(font-lock-add-keywords nil '(("[ ]+" (0 'foo t))) 'APPEND)

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 07:06:50 2012
From: Chong Yidong
To: "Drew Adams"
Cc: 12054@debbugs.gnu.org
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 19:03:48 +0800
In-Reply-To: <87mwyzyn76.fsf@gnu.org>
Message-ID: <87625naqy3.fsf@gnu.org>

Chong Yidong writes:

> "[\240]+" doesn't do what you want. Octal 240 is a unibyte character,
> so that string constant specifies a unibyte string. When this unibyte
> string is converted to multibyte, the raw byte becomes codepoint
> #x3fffa0.

I've updated the docs to clarify this situation. Closing the bug.
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 07:07:00 2012
From: Chong Yidong
To: control@debbugs.gnu.org
Subject: close 12054
Date: Sat, 03 Nov 2012 19:03:58 +0800
Message-ID: <87vcdnne1t.fsf@gnu.org>

close 12054
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 12:28:45 2012
From: "Drew Adams"
To: "'Chong Yidong'"
Cc: 12054@debbugs.gnu.org
Subject: RE: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 09:25:35 -0700
In-Reply-To: <87mwyzyn76.fsf@gnu.org>
Message-ID: <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com>

> > With point before the no-break-space, C-u C-x =. That
> > shows that the character is indeed a no-break-space,
> > and there is no face on it.
>
> "[\240]+" doesn't do what you want. Octal 240 is a unibyte character,
> so that string constant specifies a unibyte string. When this unibyte
> string is converted to multibyte, the raw byte becomes codepoint
> #x3fffa0.
>
> You should use either of these instead:
> (font-lock-add-keywords nil '(("[\u00a0]+" (0 'foo t))) 'APPEND)
> (font-lock-add-keywords nil '(("[ ]+" (0 'foo t))) 'APPEND)

I still have some questions.

`C-q 240' and `C-x 8 RET no-break space' insert the same char. C-u C-x = says
this about it: (codepoint 160, #o240, #xa0)

And with your font-lock sexp that char is indeed highlighted as expected
(yellow bg).

Emacs says the char is octal 240. Just why is it that the regexp "[\240]+" does
not match this char? Why should a character-alternative expression care whether
the representation is unibyte or multibyte? Isn't that a bug?

How to use octal syntax to match that char? The Elisp manual says clearly that
"The most general read syntax for a character represents the character code in
either octal or hex." MOST GENERAL, not most limited and partial.

Are you saying that for regexps octal and hex are no longer "the most general
syntax", and that to represent (at least some) unicode chars in a regexp we
must use the \u... syntax? Is there no way for the `font-lock-add-keywords'
sexp to use either octal or hex here?

With the current state of affairs, which you say is not bugged, how can an
Emacs version < 23 (i.e., without \u... syntax) be used to highlight the char?

Shouldn't it be possible in Emacs 22 to pick up a file that has Unicode chars
and highlight them using font-lock, even if you cannot use Emacs 22 to insert
such chars?

And for Emacs 20 there is not even hex syntax - shouldn't we be able to do
everything using just octal syntax, since it is supposedly "the most general
syntax"?

I haven't seen your doc clarification yet, but given the questions above I
would imagine that things need to be clarified in several places of the manual.

But isn't treating this as a doc bug a bit of a cop-out? Shouldn't it be
possible to use octal syntax to match Unicode chars?
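
On the octal/hex question above: one possible way to keep octal or hex notation is to write the character code as a number and build the regexp string from it, rather than embedding `\240' in a string literal. A minimal sketch, assuming Emacs 23 or later (where `string' returns a multibyte string for code 160); the names `my-nbsp-regexp' and `my-highlight-nbsp' are hypothetical, and `foo' is the face from the original recipe:

```elisp
;; Build the regexp from a character code instead of a string escape.
;; #o240 (octal) and #xa0 (hex) both read as the integer 160, i.e. U+00A0.
(defvar my-nbsp-regexp (concat (string #o240) "+")
  "Regexp matching one or more no-break spaces (hypothetical helper).")

(defun my-highlight-nbsp ()
  "Highlight no-break spaces in the current buffer with face `foo'.
Hypothetical helper; mirrors the font-lock setup from the bug report."
  (font-lock-add-keywords nil `((,my-nbsp-regexp (0 'foo t))) 'append))
```

Calling `(my-highlight-nbsp)' in a buffer and toggling font-lock-mode should then behave like the `\u00a0' variants suggested earlier in the thread. This has not been checked on Emacs 22 or older.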
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 12:40:39 2012
From: Andreas Schwab
To: Chong Yidong
Cc: Drew Adams, 12054@debbugs.gnu.org
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 17:37:36 +0100
In-Reply-To: <87mwyzyn76.fsf@gnu.org>

Chong Yidong writes:

> (font-lock-add-keywords nil '(("[\u00a0]+" (0 'foo t))) 'APPEND)
> (font-lock-add-keywords nil '(("[ ]+" (0 'foo t))) 'APPEND)

None of these need bracket expressions.

(font-lock-add-keywords nil '(("\u00a0+" (0 'foo t))) 'append)
(font-lock-add-keywords nil '((" +" (0 'foo t))) 'append)

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 13:00:38 2012
From: Eli Zaretskii
To: Drew Adams
Cc: cyd@gnu.org, 12054@debbugs.gnu.org
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 18:56:49 +0200
In-Reply-To: <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com>
Message-ID: <83vcdm4oby.fsf@gnu.org>

> From: "Drew Adams"
> Date: Sat, 3 Nov 2012 09:25:35 -0700
> Cc: 12054@debbugs.gnu.org
>
> Just why is it that the regexp "[\240]+" does not match this char?

Because, for histerical reasons, 'insert' treats strings such as
"\nnn" as unibyte strings.

> Why should a character-alternative expression care whether the
> representation is unibyte or multibyte? Isn't that a bug?

It's an unfortunate dark corner, due to the ambiguity of what \240
really means in a string.

> How to use octal syntax to match that char?

Why do you need the octal syntax? Why not just use a literal  ? Is
that only for the sake of old Emacs versions, or for some other reason?

> The Elisp manual says clearly that
> "The most general read syntax for a character represents the character code in
> either octal or hex." MOST GENERAL, not most limited and partial.

I see no contradiction or incorrect information in this cited text.
The octal notation does work in your example, it's just that its
semantics is not what you expected. Or am I missing something?
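
The ambiguity mentioned above can be seen directly at the reader level. The following is a small sketch, assuming Emacs 23 or later, that can be evaluated form by form in *scratch*; it only uses the built-in `multibyte-string-p' and standard read syntax:

```elisp
;; As a character literal, octal 240 is simply the integer 160 (U+00A0),
;; the same value as the hex and \u spellings.
(= ?\240 #xa0 ?\u00a0)          ; => t

;; Inside a string literal the octal escape is read as a raw byte, so a
;; string containing only that escape is unibyte.
(multibyte-string-p "\240")     ; => nil

;; The \u escape always names a Unicode character, so the string is
;; multibyte.
(multibyte-string-p "\u00a0")   ; => t
```

So the same octal digits denote U+00A0 as a character, but a raw byte when written inside a string constant.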
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 13:08:47 2012
From: "Drew Adams"
To: "'Andreas Schwab'", "'Chong Yidong'"
Cc: 12054@debbugs.gnu.org
Subject: RE: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 10:05:36 -0700
Message-ID: <7BFE464D27324701B1F41AD4F7A344F7@us.oracle.com>

> None of these need bracket expressions.
> (font-lock-add-keywords nil '(("\u00a0+" (0 'foo t))) 'append)
> (font-lock-add-keywords nil '((" +" (0 'foo t))) 'append)

Good point.
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 13:09:32 2012
From: Chong Yidong
To: "Drew Adams"
Cc: 12054@debbugs.gnu.org
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sun, 04 Nov 2012 01:06:28 +0800
In-Reply-To: <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com>
Message-ID: <87lieimx9n.fsf@gnu.org>

"Drew Adams" writes:

> Just why is it that the regexp "[\240]+" does not match this char?
> Why should a character-alternative expression care whether the
> representation is unibyte or multibyte? Isn't that a bug?

When \240 occurs in a unibyte string, Emacs recognizes it as an
eight-bit raw byte. When converting unibyte strings to multibyte, Emacs
does not "unify" eight-bit raw bytes with Unicode characters #x80-#xff;
they get their own code points, in this case #x3fffa0. (One reason for
doing this is to allow unibyte strings to be specified using string
constants in Emacs Lisp source code.)

> How to use octal syntax to match that char? The Elisp manual says
> clearly that "The most general read syntax for a character represents
> the character code in either octal or hex." MOST GENERAL, not most
> limited and partial.

I've already edited the documentation to take out this sentence. It is
incorrect anyway, for the reason that octal escapes are limited to three
digits.
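
The conversion described above can be reproduced with `string-to-multibyte' and `string-match'. This is only an illustrative sketch, assuming Emacs 23 or later; the sample text "a\u00a0b" is made up for the example:

```elisp
;; Converting the unibyte string to multibyte keeps the byte as a raw
;; eight-bit character; it is not turned into U+00A0.
(format "#x%x" (aref (string-to-multibyte "\240") 0))   ; => "#x3fffa0"

;; A unibyte "\240" pattern therefore fails against multibyte text that
;; contains a real no-break space ...
(string-match "\240+" "a\u00a0b")                       ; => nil

;; ... while the \u spelling matches it at index 1.
(string-match "\u00a0+" "a\u00a0b")                     ; => 1
```

The buffer in the original recipe is multibyte, which is presumably why the font-lock keyword written with "[\240]+" never matched there.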
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 13:26:09 2012
From: "Drew Adams"
To: "'Eli Zaretskii'"
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org>
Subject: RE: bug#12054: 24.1; regression?
 font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 10:22:59 -0700
Message-ID: <124464DE2A8F4248A1F0C156E00E815D@us.oracle.com>
In-reply-to: <83vcdm4oby.fsf@gnu.org>
Cc: cyd@gnu.org, 12054@debbugs.gnu.org

> > Just why is it that the regexp "[\240]+" does not match this char?
>
> Because, for histerical reasons, 'insert' treats strings such as
> "\nnn" as unibyte strings.

Sorry, I don't understand your point. My question was about the regexp (not)
matching, not about (not) being able to insert the char.

I don't see a problem with inserting the char. As I said, the correct char gets
inserted AFAICT, as shown both by `C-u C-x =' and by Yidong's correction of the
font-lock regexp.

You can insert the _same_ char using either `C-q 240' or `C-x 8 RET no-break
space', at least AFAICT (via Yidong's highlighting and via `C-u C-x =').

> > Why should a character-alternative expression care whether the
> > representation is unibyte or multibyte? Isn't that a bug?
>
> It's an unfortunate dark corner, due to the ambiguity of what \240
> really means in a string.

That just makes it darker for me. Can you please elaborate?

> > How to use octal syntax to match that char?
>
> Why do you need the octal syntax? Why not just use a literal NBSP
> character (U+00A0)? Is that only for the sake of old Emacs versions,
> or for some other reason?

1. Yes, for the sake of older Emacs versions.

2. The manual says that octal syntax is the most general syntax. So one would
expect that one can use it more, not less. ;-)

3. Why not? Why turn it around and speak of "need" to use it? The real question
is why _not_ be able to use octal syntax here?

> > The Elisp manual says clearly that
> > "The most general read syntax for a character represents
> > the character code in either octal or hex."
> >
> > MOST GENERAL, not most limited and partial.
>
> I see no contradiction or incorrect information in this cited text.
> The octal notation does work in your example, it's just that its
> semantics is not what you expected. Or am I missing something?

Dunno whether you are missing something. I am missing how the octal notation
"works" in my example. It certainly does not highlight the char I want to
highlight, i.e., does not do what I intended. How to do that?

I'm missing how to use octal notation in such a font-lock-add-keywords sexp to
match that char. IOW, my incorrect use of it doesn't do the job. Please show me
how to use octal notation to get that char highlighted.
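One possible way to keep octal notation, offered as a sketch rather than as an answer given in this thread: the octal escape is read as a raw byte only inside a string literal. As a character literal, `?\240' is simply the integer 160, so a multibyte string can be built from it. This assumes Emacs 23 or later; behavior in older versions is not verified here. The names `demo-nbsp-regexp' and `demo-highlight-nbsp' are hypothetical, and face `foo' is the one defined in the original report.

```elisp
;; Build the regexp from an octal *character* literal instead of an
;; octal escape inside a string literal.
(defvar demo-nbsp-regexp (concat "[" (string ?\240) "]+"))

;; Position of the first no-break space in a sample string.
(defvar demo-nbsp-pos (string-match demo-nbsp-regexp "a\u00a0b"))   ; 1

(defun demo-highlight-nbsp ()
  "Highlight runs of no-break space in the current buffer with face `foo'."
  (font-lock-add-keywords nil `((,demo-nbsp-regexp (0 'foo t))) 'APPEND))
```

The source text stays pure ASCII, which is the property the octal notation was wanted for.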
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 13:35:45 2012
From: "Drew Adams"
To: "'Chong Yidong'"
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org>
Subject: RE: bug#12054: 24.1; regression?
 font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 10:32:36 -0700
Message-ID: <36A068647715410282C1FB4F5FC89CEC@us.oracle.com>
In-reply-to: <87lieimx9n.fsf@gnu.org>
Cc: 12054@debbugs.gnu.org

> > Just why is it that the regexp "[\240]+" does not match this char?
> > Why should a character-alternative expression care whether the
> > representation is unibyte or multibyte? Isn't that a bug?
>
> When \240 occurs in a unibyte string, Emacs recognizes it as an
> eight-bit raw byte. When converting unibyte strings to
> multibyte, Emacs does not "unify" eight-bit raw bytes with
> Unicode characters #x80-#xff; they get their own code points,
> in this case #x3fffa0. (One reason for doing this is to allow
> unibyte strings to be specified using string constants in Emacs
> Lisp source code.)
>
> > How to use octal syntax to match that char? The Elisp manual says
> > clearly that "The most general read syntax for a character
> > represents the character code in either octal or hex."
> > MOST GENERAL, not most limited and partial.
>
> I've already edited the documentation to take out this
> sentence. It is incorrect anyway, for the reason that
> octal escapes are limited to three digits.

Hm. I admit that I do not have a grasp of this yet. I will read the updated doc
when I get hold of it.
You didn't answer the question "How to use..." I guess that silence indicates
that it is impossible (?).

Anyway, trying to put together your statement that the old text was incorrect
with Eli's claim that it is still correct has me perplexed.

So just what is the "most general read syntax for a char" now? And what is a
general read syntax that will work also for older Emacs versions when reading
Unicode chars present in a file?
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 14:03:13 2012
From: Chong Yidong
To: "Drew Adams"
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <36A068647715410282C1FB4F5FC89CEC@us.oracle.com>
Date: Sun, 04 Nov 2012 02:00:05 +0800
In-Reply-To: <36A068647715410282C1FB4F5FC89CEC@us.oracle.com> (Drew Adams's message of "Sat, 3 Nov 2012 10:32:36 -0700")
Message-ID: <87mwyyk1ne.fsf@gnu.org>
Cc: 12054@debbugs.gnu.org

"Drew Adams" writes:

> So just what is the "most general read syntax for a char" now?

The literal representation of the character. This should work on older
Emacsen too, I think. And on Emacs >= 22, you can use \uNNNN and
\U00NNNNNN escape sequences if you like.
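The two escape forms mentioned above can be compared as follows. This is a small sketch under the assumption of an Emacs version that supports both escapes; the `demo-' names are hypothetical, and face `foo' is the one from the original report.

```elisp
;; Both escapes denote U+00A0 and make the string multibyte.
(defvar demo-nbsp-short "\u00a0")      ; \u takes exactly 4 hex digits
(defvar demo-nbsp-long  "\U000000a0")  ; \U takes exactly 8 hex digits

;; A font-lock keyword for the report's test case, written with \u.
(defvar demo-nbsp-keyword '("[\u00a0]+" (0 'foo t)))

;; The \u regexp finds the no-break space in a sample string.
(defvar demo-u-match (string-match (car demo-nbsp-keyword) "x\u00a0y"))   ; 1
```

The keyword in `demo-nbsp-keyword' could be passed to `font-lock-add-keywords' in place of the "[\240]+" entry from the report.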
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 14:07:57 2012
From: "Drew Adams"
To: "'Chong Yidong'"
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <36A068647715410282C1FB4F5FC89CEC@us.oracle.com> <87mwyyk1ne.fsf@gnu.org>
Subject: RE: bug#12054: 24.1; regression?
 font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 11:04:47 -0700
Message-ID: <1B5F63574D8D49EBAF8F9EFB8849D2CF@us.oracle.com>
In-reply-to: <87mwyyk1ne.fsf@gnu.org>
Cc: 12054@debbugs.gnu.org

> > So just what is the "most general read syntax for a char" now?
>
> The literal representation of the character. This should
> work on older Emacsen too, I think. And on Emacs >= 22, you
> can use \uNNNN and \U00NNNNNN escape sequences if you like.

Got it. So I guess there is no escape syntax that will work with older Emacs
versions also. (You didn't say that, but I'm guessing.)

One problem with using a literal char is when you need the Lisp code to be
digestible by applications that choke on such chars. That's one reason we _have_
an escape syntax.

For example, uploading files containing certain control chars to certain sites
can result in them being filtered out. Using escape syntax allows the actual
chars in the file to be ascii. I understand that the \u and \U escape syntax fits
the bill here, but not for older Emacs versions.
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 15:04:41 2012
From: "Drew Adams"
To: "'Chong Yidong'"
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org>
Subject: RE: bug#12054: 24.1; regression?
 font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 12:01:29 -0700
Message-ID: <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com>
In-reply-to: <87lieimx9n.fsf@gnu.org>
Cc: 12054@debbugs.gnu.org

> > Just why is it that the regexp "[\240]+" does not match this char?
> > Why should a character-alternative expression care whether the
> > representation is unibyte or multibyte? Isn't that a bug?
>
> When \240 occurs in a unibyte string, Emacs recognizes it as an
> eight-bit raw byte. When converting unibyte strings to
> multibyte, Emacs does not "unify" eight-bit raw bytes with
> Unicode characters #x80-#xff; they get their own code points,
> in this case #x3fffa0.

I think I understand this (but I might be misunderstanding). The \240 in the
4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
the char I wanted. That is, the literal string in my code is read as a string
that contains only a single raw byte of octal 240 in place of the 4 chars \240
(and instead of as a string with the multibyte char no-break space). Is that
right?

And putting that together with Eli's statement about insertion ("'insert' treats
strings such as "\nnn" as unibyte strings"), I understand that the buffer text
after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
no-break space.
But in that case I do not understand why `C-u C-x =' says that it _is_ the
Unicode no-break space char. And I do not understand why Yidong's font-lock
correction also shows that it is a no-break space char.

So I'm confused about what is actually in the buffer. From the doc and from
Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
position. But `C-u C-x =' and font-lock seem to tell me that there is a
(multibyte) no-break space char there.

If there is in fact a multibyte char there and the literal "\240" in my
font-lock sexp results in a unibyte raw byte search, that would explain the
mismatch.

But I still wonder about this motivation for the treatment of \nnn in literal
strings in Lisp code:

> (One reason for doing this is to allow unibyte strings to
> be specified using string constants in Emacs Lisp source code.)

I can see how that can be useful. But I can also see how it would be useful to
have some way of using octal syntax to match multibyte chars. Isn't there some
reasonable way to allow for both? E.g. can I specify a multibyte string somehow,
starting with octal syntax?

Is there a way, for example, to use octal syntax to provide octal codes 0302 and
0240, which together define U+00A0 for UTF8? [See below.]

Is there, for example, (or could there be added) a function that one can apply
to the unibyte string for \240 that would convert it to a string that DTRT wrt
multibyte?

So I could do something like this (assuming the function is available for older
Emacs versions too), where `foo' is the function:

(font-lock-add-keywords nil `((,(foo "\240+") (0 'foo t))) 'APPEND)

From the doc, I was thinking that perhaps `string-to-multibyte' would do the
trick, i.e., (string-to-multibyte "\240+") would return "\u00a0+" or the literal
Unicode char in a multibyte string. But it returns "\240+". I can understand
that the actual chars in that input string are all ASCII, so that makes sense, I
guess.
But I was thinking from Yidong's statement above that such a literal string in
Lisp code gets read as a unibyte, raw-byte string. Since that doesn't seem to be
the case here (?), is there a function that will convert "\240" (4 chars) to a
string with just that one "eight-bit raw byte" char? I tried `read', but that
didn't help.

I hope I'm just missing something, and that there is a function (or combination
of functions) to which I can pass the 4-char ASCII string "\240" (or the 8-char
string "\302\240") and that will return the proper multibyte string containing
the Unicode no-break space char. Ideal would be such a function that works also
in older Emacs versions.

...

OK, digging some more, it seems that this will do the trick:

(decode-coding-string "\302\240" 'utf-8)

That allows use of only octal syntax - good. But it still doesn't solve the
problem for older Emacs versions - they raise the error (coding-system-error
utf-8).

Is there a way to use only octal syntax with older Emacs versions, so the
font-locking code highlights such a Unicode char in a file/buffer?

Judging by my current confusion, I am sure that my statements above must be full
of misconceptions. I will be glad to be shown my misunderstanding and a simple
solution.
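The `decode-coding-string' approach found above could be wired into the keyword roughly as follows. This is a sketch only: it assumes an Emacs in which the `utf-8' coding system exists (the message above reports that older versions signal `coding-system-error'), and the names `demo-nbsp-decoded' and `demo-highlight-nbsp-decoded' are hypothetical. Face `foo' is the one from the original report.

```elisp
;; "\302\240" is the UTF-8 encoding of U+00A0, written with octal
;; escapes only, so the source file stays pure ASCII.
(defvar demo-nbsp-decoded (decode-coding-string "\302\240" 'utf-8))

(defun demo-highlight-nbsp-decoded ()
  "Highlight runs of no-break space in the current buffer with face `foo'."
  (font-lock-add-keywords
   nil
   `((,(concat "[" demo-nbsp-decoded "]+") (0 'foo t)))
   'APPEND))
```

Here the decoding step plays the role of the hypothetical function `foo' in the message above: it turns bytes written in octal into the multibyte character that the buffer actually contains.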
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 15:53:21 2012
From: Stefan Monnier
To: Eli Zaretskii
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org>
Date: Sat, 03 Nov 2012 15:50:19 -0400
In-Reply-To: <83vcdm4oby.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 03 Nov 2012 18:56:49 +0200")
Cc: cyd@gnu.org, Drew Adams, 12054@debbugs.gnu.org

> Because, for histerical reasons, 'insert' treats strings such as
> "\nnn" as unibyte strings.

Actually, this has nothing to do with `insert', right?
It's the reader that interprets the \240 in "[\240]+" as a byte rather
than a char.

>> Why should a character-alternative expression care whether the
>> representation is unibyte or multibyte? Isn't that a bug?

There are many different ways to interpret this, and I can give you one
where the behavior is explained without paying attention to
multibyte/unibyte differences.

\240 in your string means "the byte with octal number 0240".


        Stefan
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 16:05:38 2012
From: "Drew Adams"
To: "'Stefan Monnier'", "'Eli Zaretskii'"
References:
 <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org>
Subject: RE: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 13:02:23 -0700
Message-ID: <008212816D764555A4B1B60717406C05@us.oracle.com>
Cc: cyd@gnu.org, 12054@debbugs.gnu.org

> >> Why should a character-alternative expression care whether the
> >> representation is unibyte or multibyte? Isn't that a bug?
>
> There are many different ways to interpret this, and I can
> give you one where the behavior is explained without paying
> attention to multibyte/unibyte differences.
>
> \240 in your string means "the byte with octal number 0240".

OK, so then do you think this should DTRT?

(font-lock-add-keywords nil '(("\\(\302\240\\)+" (0 'foo t))) 'APPEND)

I'm guessing it shouldn't, because IIUC the buffer in fact contains only the
single raw byte \240 and not the multibyte sequence of two raw bytes \302 and
\240.

But I barely understand this stuff at all; I mostly misunderstand it still.
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 16:39:48 2012
From: Stefan Monnier
To: "Drew Adams"
Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org> <008212816D764555A4B1B60717406C05@us.oracle.com>
Date: Sat, 03 Nov 2012 16:36:42 -0400
In-Reply-To: <008212816D764555A4B1B60717406C05@us.oracle.com> (Drew Adams's message of "Sat, 3 Nov 2012 13:02:23 -0700")
Cc: 'Eli Zaretskii', cyd@gnu.org, 12054@debbugs.gnu.org

> OK, so then do you think this should DTRT?
> (font-lock-add-keywords nil '(("\\(\302\240\\)+" (0 'foo t))) 'APPEND)

That will match if your buffer contains a \302 byte or a \240 byte.
"contains" is different from "is represented internally".

The internal representation should normally stay hidden and only appear
if you use dangerous things like string-as-multibyte or call
set-buffer-multibyte in a non-empty buffer.

> I'm guessing it shouldn't, because IIUC the buffer in fact contains only the
> single raw byte \240 and not the multibyte sequence of two raw bytes \302 and
> \240.

AFAIK your buffer contains none of that. It contains a NBSP character,
which is not a byte.


        Stefan
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 16:45:53 2012
From: "Drew Adams"
To: "'Stefan Monnier'"
References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org> <008212816D764555A4B1B60717406C05@us.oracle.com>
Subject: RE: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 3 Nov 2012 13:42:40 -0700
Cc: 'Eli Zaretskii', cyd@gnu.org, 12054@debbugs.gnu.org

> That will match if your buffer contains a \302 byte or a \240 byte.
> "contains" is different from "is represented internally".
>
> The internal representation should normally stay hidden

So much the better. So that was a red herring, I guess.

> AFAIK your buffer contains none of that. It contains a NBSP
> character, which is not a byte.

Yes. I was thinking about internal representation, which I'm glad I don't have
to worry about.
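The difference between what a buffer "contains" and how the text is represented or encoded can be illustrated with a short sketch. It assumes Emacs 23 or later and a default (multibyte) temporary buffer; the `demo-' names are hypothetical.

```elisp
;; The buffer contains a character, not bytes: char-after returns 160.
(defvar demo-buffer-char
  (with-temp-buffer
    (insert "\u00a0")
    (char-after (point-min))))

;; Bytes appear only when the text is explicitly encoded. The result is
;; a unibyte string holding the two bytes \302 and \240.
(defvar demo-utf8-bytes (encode-coding-string "\u00a0" 'utf-8))

(multibyte-string-p demo-utf8-bytes)   ; nil
(length demo-utf8-bytes)               ; 2
```

A regexp such as "\\(\302\240\\)+" is written in terms of those two bytes, whereas the buffer text holds the single character U+00A0.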
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 17:01:26 2012 Received: (at 12054) by debbugs.gnu.org; 3 Nov 2012 21:01:26 +0000 Received: from localhost ([127.0.0.1]:47622 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUkqL-0004NO-Rh for submit@debbugs.gnu.org; Sat, 03 Nov 2012 17:01:26 -0400 Received: from mtaout20.012.net.il ([80.179.55.166]:42042) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUkqJ-0004NG-A9 for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 17:01:24 -0400 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MCX00L00JJ87900@a-mtaout20.012.net.il> for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 22:57:47 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MCX00KL1JK9O0F0@a-mtaout20.012.net.il>; Sat, 03 Nov 2012 22:57:46 +0200 (IST) Date: Sat, 03 Nov 2012 22:57:36 +0200 From: Eli Zaretskii Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display In-reply-to: <124464DE2A8F4248A1F0C156E00E815D@us.oracle.com> X-012-Sender: halo1@inter.net.il To: Drew Adams Message-id: <83sj8q4d6n.fsf@gnu.org> References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <83vcdm4oby.fsf@gnu.org> <124464DE2A8F4248A1F0C156E00E815D@us.oracle.com> X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > From: "Drew Adams" > Cc: , <12054@debbugs.gnu.org> > Date: Sat, 3 Nov 2012 10:22:59 -0700 > > > > Just why is it that the regexp "[\240]+" does not match this char? 
> > > > Because, for histerical reasons, 'insert' treats strings such as > > "\nnn" as unibyte strings. > > Sorry, I don't understand your point. My question was about the regexp (not) > matching, not about (not) being able to insert the char. [...] Content analysis details: (1.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [80.179.55.166 listed in list.dnswl.org] 0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.4983] X-Debbugs-Envelope-To: 12054 Cc: cyd@gnu.org, 12054@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 1.5 (+)

> From: "Drew Adams"
> Cc: , <12054@debbugs.gnu.org>
> Date: Sat, 3 Nov 2012 10:22:59 -0700
>
> > > Just why is it that the regexp "[\240]+" does not match this char?
> >
> > Because, for histerical reasons, 'insert' treats strings such as
> > "\nnn" as unibyte strings.
>
> Sorry, I don't understand your point. My question was about the regexp (not)
> matching, not about (not) being able to insert the char.

It doesn't matter. "\nnn" in a string is still interpreted as unibyte.

> I don't see a problem with inserting the char. As I said, the correct char gets
> inserted AFAICT, as shown both by `C-u C-x =' and by Yidong's correction of the
> font-lock regexp.

Insertion with C-q does something different.

> > It's an unfortunate dark corner, due to the ambiguity of what \240
> > really means in a string.
>
> That just makes it darker for me. Can you please elaborate?

\240 could be taken as NBSP or as a literal byte. They have different representations in Emacs and are treated differently, but are identical numerically outside of Emacs.

> 3. Why not? Why turn it around and speak of "need" to use it?
> The real question is why _not_ be able to use octal syntax here?

For the same reason you'd use ?a and not \141: it's more clear to the human reader.

Using octal escapes for non-ASCII characters in Emacs is deprecated and dangerous. You just bumped into one danger; there are more. I suggest you avoid this notation as much as you can.
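
[Editorial illustration] The ambiguity of \240 in a string constant can be observed without any buffer. This is a minimal sketch, assuming Emacs 23 or later. The "\u00A0" Unicode escape stands in for a literally typed no-break space and is an illustrative choice, not something proposed in the thread.

```elisp
;; Sketch: the same number, written two ways, yields two different strings.
(list
 ;; An octal escape below 256 with no other non-ASCII text: unibyte string.
 (multibyte-string-p "\240")
 ;; A Unicode escape: multibyte string holding the NBSP character.
 (multibyte-string-p "\u00A0")
 ;; Unibyte regexp (raw byte) tried against the NBSP character.
 (string-match "[\240]+" "\u00A0")
 ;; Regexp that names the character itself.
 (string-match "[\u00A0]+" "\u00A0"))
```

On such a version this should evaluate to (nil t nil 0), which is the behavior the original report ran into: the octal form is read as a raw byte and so does not match the character.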
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 17:04:27 2012 Received: (at 12054) by debbugs.gnu.org; 3 Nov 2012 21:04:27 +0000 Received: from localhost ([127.0.0.1]:47626 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUktH-0004Ri-3s for submit@debbugs.gnu.org; Sat, 03 Nov 2012 17:04:27 -0400 Received: from mtaout20.012.net.il ([80.179.55.166]:42825) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUktE-0004RX-T2 for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 17:04:25 -0400 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MCX00L00JO57P00@a-mtaout20.012.net.il> for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 23:00:45 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MCX00KFKJP8O0G0@a-mtaout20.012.net.il>; Sat, 03 Nov 2012 23:00:45 +0200 (IST) Date: Sat, 03 Nov 2012 23:00:35 +0200 From: Eli Zaretskii Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display In-reply-to: <87mwyyk1ne.fsf@gnu.org> X-012-Sender: halo1@inter.net.il To: Chong Yidong Message-id: <83r4oa4d1o.fsf@gnu.org> References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <36A068647715410282C1FB4F5FC89CEC@us.oracle.com> <87mwyyk1ne.fsf@gnu.org> X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > From: Chong Yidong > Date: Sun, 04 Nov 2012 02:00:05 +0800 > Cc: 12054@debbugs.gnu.org > > "Drew Adams" writes: > > > So just what is the "most general read syntax for a char" now? 
> > The literal representation of the character. This should work on older > Emacsen too, I think. [...] Content analysis details: (1.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [80.179.55.166 listed in list.dnswl.org] 0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.5000] X-Debbugs-Envelope-To: 12054 Cc: drew.adams@oracle.com, 12054@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 1.5 (+)

> From: Chong Yidong
> Date: Sun, 04 Nov 2012 02:00:05 +0800
> Cc: 12054@debbugs.gnu.org
>
> "Drew Adams" writes:
>
> > So just what is the "most general read syntax for a char" now?
>
> The literal representation of the character. This should work on older
> Emacsen too, I think.

It doesn't, AFAIR: in Emacs before v23, an NBSP would be decoded into a different internal representation depending on the encoding of the file from which it is read. That encoding could be explicit, using the coding: cookie, or implicit, based on the current locale. But in any case, the result will only match NBSP in the same charset. E.g., if \240 was decoded into a Latin-2 NBSP, it will not match a Latin-1 NBSP.
From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 03 17:16:51 2012 Received: (at 12054) by debbugs.gnu.org; 3 Nov 2012 21:16:51 +0000 Received: from localhost ([127.0.0.1]:47637 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUl5G-0004jP-Ja for submit@debbugs.gnu.org; Sat, 03 Nov 2012 17:16:50 -0400 Received: from mtaout21.012.net.il ([80.179.55.169]:50357) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TUl5E-0004jF-73 for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 17:16:49 -0400 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0MCX00G00K4DX400@a-mtaout21.012.net.il> for 12054@debbugs.gnu.org; Sat, 03 Nov 2012 23:13:50 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MCX00GWVKB2VI50@a-mtaout21.012.net.il>; Sat, 03 Nov 2012 23:13:50 +0200 (IST) Date: Sat, 03 Nov 2012 23:13:40 +0200 From: Eli Zaretskii Subject: Re: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display In-reply-to: <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com> X-012-Sender: halo1@inter.net.il To: Drew Adams Message-id: <83pq3u4cfv.fsf@gnu.org> References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com> X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > From: "Drew Adams" > Date: Sat, 3 Nov 2012 12:01:29 -0700 > Cc: 12054@debbugs.gnu.org > > I think I understand this (but I might be misunderstanding). The \240 in the > 4-char ASCII regexp string "\240" is interpreted (read?) 
as a raw byte, not as > the char I wanted. [...] Content analysis details: (1.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [80.179.55.169 listed in list.dnswl.org] 0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.4981] X-Debbugs-Envelope-To: 12054 Cc: cyd@gnu.org, 12054@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.7 (/)

> From: "Drew Adams"
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054@debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding). The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.

Yes.

> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space). Is that right?

Yes.

> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.

No. It contains the NBSP. Try it.

C-q inserts a multibyte character, unlike '(insert "\240")', for example.

> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.

Because it is.

> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.

Chong didn't use "\240".

> So I'm confused about what is actually in the buffer. From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position. But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.

Try '(insert "\240")' and then "C-x =" will show a unibyte byte.

> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful. But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars. Isn't there some
> reasonable way to allow for both?

Maybe, but we didn't find one, at least not one that would be backward-compatible.

> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?

Such functions do exist, see the "Converting Representations" node in the ELisp manual.

> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good. But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).

You don't want this, because even if you succeed in producing a NBSP in Emacs 22 and older, the result will not match NBSP in other charsets. It's simply impossible with those versions of Emacs.
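
[Editorial illustration] The contrast drawn above between '(insert "\240")' and a real no-break space, together with the `decode-coding-string' form quoted from the thread, can be tried side by side. This is a minimal sketch, assuming Emacs 23 or later and a multibyte temporary buffer; it says nothing about Emacs 22 and older, where the message above explains the approach cannot work.

```elisp
;; Sketch: a raw byte and the NBSP character are distinct buffer contents.
(with-temp-buffer
  ;; Unibyte string literal: inserted into a multibyte buffer as a raw byte.
  (insert "\240")
  ;; UTF-8 byte pair decoded into a character: the NO-BREAK SPACE.
  (insert (decode-coding-string "\302\240" 'utf-8))
  (list (char-after 1)                       ; internal code of the raw byte
        (char-after 2)                       ; code of the decoded character
        (= (char-after 1) (char-after 2))))  ; are they the same character?
```

The second element should be 160 and the last element nil. The first element is the code Emacs uses internally for the raw byte \240, which lies outside the Unicode range.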
From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 04 18:37:40 2012 Received: (at 12054) by debbugs.gnu.org; 4 Nov 2012 23:37:40 +0000 Received: from localhost ([127.0.0.1]:49414 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TV9l5-0003Mk-PQ for submit@debbugs.gnu.org; Sun, 04 Nov 2012 18:37:40 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:29478) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TV9l3-0003Mc-1m for 12054@debbugs.gnu.org; Sun, 04 Nov 2012 18:37:37 -0500 Received: from ucsinet22.oracle.com (ucsinet22.oracle.com [156.151.31.94]) by userp1040.oracle.com (Sentrion-MTA-4.2.2/Sentrion-MTA-4.2.2) with ESMTP id qA4NYXQB013193 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sun, 4 Nov 2012 23:34:34 GMT Received: from acsmt357.oracle.com (acsmt357.oracle.com [141.146.40.157]) by ucsinet22.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id qA4NYWva015682 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 4 Nov 2012 23:34:33 GMT Received: from abhmt113.oracle.com (abhmt113.oracle.com [141.146.116.65]) by acsmt357.oracle.com (8.12.11.20060308/8.12.11) with ESMTP id qA4NYWCX001281; Sun, 4 Nov 2012 17:34:32 -0600 Received: from dradamslap1 (/10.159.185.1) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Sun, 04 Nov 2012 15:34:32 -0800 From: "Drew Adams" To: "'Eli Zaretskii'" References: <87mwyzyn76.fsf@gnu.org> <45DEAA69BC6E4630BA8DA0B07A0ECE92@us.oracle.com> <87lieimx9n.fsf@gnu.org> <0B444DBDD1D14FD7B5EDE10E30ED320D@us.oracle.com> <83pq3u4cfv.fsf@gnu.org> Subject: RE: bug#12054: 24.1; regression? 
font-lock no-break-space with nil nobreak-char-display Date: Sun, 4 Nov 2012 15:34:20 -0800 Message-ID: <9AE79C1A519B43A98B0DEF0142363C1F@us.oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <83pq3u4cfv.fsf@gnu.org> Thread-Index: Ac26CCBq7rQLygClQq+bXsfwQvRtZQA27MBg X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 X-Source-IP: ucsinet22.oracle.com [156.151.31.94] X-Spam-Score: 0.4 (/) X-Debbugs-Envelope-To: 12054 Cc: cyd@gnu.org, 12054@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.4 (/)

> > That is, the literal string in my code is read as a string
> > that contains only a single raw byte of octal 240 in place
> > of the 4 chars \240 (and instead of as a string with the
> > multibyte char no-break space). Is that right?
>
> Yes.
>
> > And putting that together with Eli's statement about
> > insertion ("'insert' treats strings such as "\nnn" as
> > unibyte strings"), I understand that the buffer text
> > after I type `C-q 240' contains a unibyte raw byte, and
> > not the multibyte char no-break space.
>
> No. It contains the NBSP. Try it.

Well, I was saying since the beginning that that appeared to be the case. But you replied that insertion inserted a raw \240 byte. That red herring threw me off.

> C-q inserts a multibyte character, unlike '(insert "\240")', for example.

Thanks, I finally got that from what Stefan said. It would have been clearer if you had said that from the beginning, since I mentioned `C-q' and you replied instead about "insert". Anyway, I understand now.

> Try '(insert "\240")' and then "C-x =" will show a unibyte byte.

Yes, I got it (from Stefan's reply). But no one mentioned using `insert' or insertion, except you. I know you were trying to help, but that just confused things, for me.

> > I can see how that can be useful. But I can also see how
> > it would be useful to have some way of using octal syntax to
> > match multibyte chars. Isn't there some reasonable way to
> > allow for both?
>
> Maybe, but we didn't find one, at least not one that would be
> backward-compatible.

OK, that was my question. Thx.

> > (decode-coding-string "\302\240" 'utf-8)
> >
> > That allows use of only octal syntax - good. But it still
> > doesn't solve the problem for older Emacs versions - they
> > raise the error (coding-system-error utf-8).
>
> You don't want this, because even if you succeed in producing a NBSP
> in Emacs 22 and older, the result will not match NBSP in other
> charsets. It's simply impossible with those versions of Emacs.

Got it. That is the bottom line - the answer to my question.

Thx to all who took the time to help me understand better.

From unknown Fri Jun 20 19:54:36 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 03 Dec 2012 12:24:03 +0000 User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator