From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 08 11:06:48 2014
Received: (at submit) by debbugs.gnu.org; 8 Apr 2014 15:06:49 +0000
Received: from localhost ([127.0.0.1]:40857 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXXbs-0002b6-0W for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:06:48 -0400
Received: from eggs.gnu.org ([208.118.235.92]:41197)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXXbp-0002ay-BE for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:06:46 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1WXXbg-0002un-5S for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:06:45 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_20 autolearn=disabled
    version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:37760)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1WXXbg-0002uj-31 for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:06:36 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42837)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1WXXbZ-0001BF-UA for bug-grep@gnu.org;
    Tue, 08 Apr 2014 11:06:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1WXXbT-0002qn-Pb for bug-grep@gnu.org;
    Tue, 08 Apr 2014 11:06:29 -0400
Received: from kiwi.cs.ucla.edu ([131.179.128.19]:50682)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1WXXbT-0002lw-HT for bug-grep@gnu.org;
    Tue, 08 Apr 2014 11:06:23 -0400
Received: from kiwi.cs.ucla.edu (localhost.cs.ucla.edu [127.0.0.1])
    by kiwi.cs.ucla.edu (8.14.5+Sun/8.14.5/UCLACS-6.0) with ESMTP
    id s38F6FKC020941 for ; Tue, 8 Apr 2014 08:06:15 -0700 (PDT)
Received: (from eggert@localhost)
    by kiwi.cs.ucla.edu (8.14.5+Sun/8.14.5/Submit) id s38F6FPY020940
    for bug-grep@gnu.org; Tue, 8 Apr 2014 08:06:15 -0700 (PDT)
Message-Id:
    <201404081506.s38F6FPY020940@kiwi.cs.ucla.edu>
From: Paul Eggert
Date: Tue, 8 Apr 2014 08:04:36 -0700
Subject: [PATCH] grep: port better to hosts with nonstandard nl_langinfo
To: bug-grep@gnu.org
X-detected-operating-system: by eggs.gnu.org: Solaris 10
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -4.0 (----)

On some hosts, nl_langinfo returns strings other than "UTF-8" when
UTF-8 is used, and (worse) return "UTF-8" even if the encoding is
single-byte. Work around these problems by trying a sample
character instead.
* src/dfa.c, src/pcresearch.c, src/searchutils.c:
Don't include .
* src/dfa.c (using_utf8): Test for UTF-8 by trying a character
rather than by invoking nl_langinfo (CODESET); this is more
portable in practice, and removes a dependency on
HAVE_LANGINFO_CODESET.
* src/pcresearch.c: Include dfa.h, for using_utf8.
(Pcompile): Use using_utf8 rather than nl_langinfo.
---
 src/dfa.c         | 14 +++++---------
 src/pcresearch.c  |  8 ++------
 src/searchutils.c |  3 ---
 3 files changed, 7 insertions(+), 18 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 76f7e79..34f230e 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -60,10 +60,6 @@ typedef bool bool_bf;
 #include
 #include
 
-#if HAVE_LANGINFO_CODESET
-# include
-#endif
-
 #include "xalloc.h"
 
 /* HPUX defines these as macros in sys/param.h. */
@@ -819,14 +815,14 @@ setbit_case_fold_c (int b, charclass c)
 int
 using_utf8 (void)
 {
-#ifdef HAVE_LANGINFO_CODESET
   static int utf8 = -1;
   if (utf8 < 0)
-    utf8 = STREQ (nl_langinfo (CODESET), "UTF-8");
+    {
+      wchar_t wc;
+      mbstate_t mbs = { 0 };
+      utf8 = mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
+    }
   return utf8;
-#else
-  return 0;
-#endif
 }
 
 /* Return true if the current locale is known to be a unibyte locale
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 319155f..a5e953f 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -20,14 +20,12 @@
 
 #include
 #include "search.h"
+#include "dfa.h"
 #if HAVE_PCRE_H
 # include
 #elif HAVE_PCRE_PCRE_H
 # include
 #endif
-#if HAVE_LANGINFO_CODESET
-# include
-#endif
 
 #if HAVE_LIBPCRE
 /* Compiled internal form of a Perl regular expression. */
@@ -60,14 +58,12 @@ Pcompile (char const *pattern, size_t size)
   char const *p;
   char const *pnul;
 
-# if defined HAVE_LANGINFO_CODESET
-  if (STREQ (nl_langinfo (CODESET), "UTF-8"))
+  if (using_utf8 ())
     {
       /* Enable PCRE's UTF-8 matching. Note also the use of
          PCRE_NO_UTF8_CHECK when calling pcre_extra, below. */
       flags |= PCRE_UTF8;
     }
-# endif
 
   /* FIXME: Remove these restrictions. */
   if (memchr (pattern, '\n', size))
diff --git a/src/searchutils.c b/src/searchutils.c
index 6749945..6440f07 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -20,9 +20,6 @@
 #include
 #include "search.h"
 #include "dfa.h"
-#if HAVE_LANGINFO_CODESET
-# include
-#endif
 
 #define NCHAR (UCHAR_MAX + 1)
 
-- 
1.9.0

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 08 11:09:00 2014
Received: (at control) by debbugs.gnu.org; 8 Apr 2014 15:09:00 +0000
Received: from localhost ([127.0.0.1]:40862 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXXe0-0002fH-34 for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:09:00 -0400
Received: from smtp.cs.ucla.edu ([131.179.128.62]:56291)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXXdx-0002f5-Jj for control@debbugs.gnu.org;
    Tue, 08 Apr 2014 11:08:58 -0400
Received: from localhost (localhost.localdomain [127.0.0.1])
    by smtp.cs.ucla.edu (Postfix) with ESMTP id E5497A60001 for ;
    Tue, 8 Apr 2014 08:08:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
    by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id f42laIAxW20m for ; Tue, 8 Apr 2014 08:08:48 -0700 (PDT)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62])
    by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 68B3639E8015 for ;
    Tue, 8 Apr 2014 08:08:48 -0700 (PDT)
Message-ID: <53441100.9040001@cs.ucla.edu>
Date: Tue, 08 Apr 2014 08:08:48 -0700
From: Paul Eggert
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: control@debbugs.gnu.org
Subject: 17221 is applied
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.6 (--)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -2.6 (--)

tags 17221 + patch
close 17221
thanks

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 08 13:42:20 2014
Received: (at 17221) by debbugs.gnu.org; 8 Apr 2014 17:42:20 +0000
Received: from localhost ([127.0.0.1]:38040 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXa2O-000260-4c for submit@debbugs.gnu.org;
    Tue, 08 Apr 2014 13:42:20 -0400
Received: from mail-pb0-f41.google.com ([209.85.160.41]:33340)
    by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from )
    id 1WXa2K-00025l-T2 for 17221@debbugs.gnu.org;
    Tue, 08 Apr 2014 13:42:17 -0400
Received: by mail-pb0-f41.google.com with SMTP id jt11so1353013pbb.28
    for <17221@debbugs.gnu.org>; Tue, 08 Apr 2014 10:42:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
    h=mime-version:sender:in-reply-to:references:from:date:message-id
    :subject:to:cc:content-type;
    bh=7emJtFc+LsbVAUurF7BqrZ/YSg+qYfPd7SALHdZV2fo=;
    b=F9o0rYdAX1MOIGMdqi3t6NvrkqxhNvYNLTBk6mJQAMTd47wNL3HT1VeuknCF+RuyZF
    L9vQM0Oc8FawKidE6zxFj96MopgD5xowCq8Uq7LrxHwo9doV0PVXX/d6p9aK1S7qSk0X
    JsS+CsrsiQCL+RSOFF5rxVwBrJqZ7ixkLE5AK5G/mERvIGnHJqY9gd45nHulq/4DWPGF
    +3r2z/cMW5/D52hX73XlkquGrV/TSx/9oCuzZNDtibO7HcF85rcLsWVLgAsZ1ljxquPq
    ZoigSFPUdEfgoYc7XhTk2ywZncRsmewRn7xs6XIj5KTqIi5MAYg0p2ruwh03tslf5r4I
    gsNw==
X-Received: by 10.68.137.136 with SMTP id qi8mr6139622pbb.79.1396978931080;
    Tue, 08 Apr 2014 10:42:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.68.201.231 with HTTP; Tue, 8 Apr 2014 10:41:51 -0700 (PDT)
In-Reply-To: <201404081506.s38F6FPY020940@kiwi.cs.ucla.edu>
References: <201404081506.s38F6FPY020940@kiwi.cs.ucla.edu>
From: Jim Meyering
Date: Tue, 8 Apr 2014 10:41:51 -0700
X-Google-Sender-Auth: duGZmAKGCtUei6lmH9WouJS-hik
Message-ID:
Subject: Re: bug#17221: [PATCH] grep: port better to hosts with nonstandard nl_langinfo
To: Paul Eggert
Content-Type: text/plain; charset=ISO-8859-1
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 17221
Cc: 17221@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -0.7 (/)

On Tue, Apr 8, 2014 at 8:04 AM, Paul Eggert wrote:
> On some hosts, nl_langinfo returns strings other than "UTF-8" when
> UTF-8 is used, and (worse) return "UTF-8" even if the encoding is
> single-byte. Work around these problems by trying a sample
> character instead.
> * src/dfa.c, src/pcresearch.c, src/searchutils.c:
> Don't include .
> * src/dfa.c (using_utf8): Test for UTF-8 by trying a character
> rather than by invoking nl_langinfo (CODESET); this is more
> portable in practice, and removes a dependency on
> HAVE_LANGINFO_CODESET.
> * src/pcresearch.c: Include dfa.h, for using_utf8.
> (Pcompile): Use using_utf8 rather than nl_langinfo.

Nicely done.
And thanks for handling so many of Norihiro's patches.

From unknown Sat Aug 16 22:47:40 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 07 May 2014 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
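
Editorial note on the probe used in the using_utf8 hunk of this thread's patch: the idea is to hand mbrtowc the two bytes C4 80 and see whether the current locale decodes them as the single character U+0100, which is what UTF-8 does. The sketch below isolates that check so it can be tried outside grep. It is an illustration only: the name locale_decodes_utf8 is made up, the result is not cached the way using_utf8 caches it, and the comparison with 0x100 assumes that wchar_t holds Unicode code points, as it does on glibc.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of grep.  Return true if the current
   LC_CTYPE locale decodes the byte pair C4 80 as one character whose
   value is U+0100.  Assumes wchar_t values are Unicode code points.  */
bool
locale_decodes_utf8 (void)
{
  wchar_t wc = 0;
  mbstate_t mbs;

  /* Start from the initial conversion state.  */
  memset (&mbs, 0, sizeof mbs);

  /* A UTF-8 locale consumes both bytes and yields U+0100.  A
     single-byte locale consumes only one byte or reports an error.  */
  return mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
}
```

A caller would normally run setlocale (LC_ALL, "") before the check so that the locale from the environment is in effect. Without that call the program stays in the "C" locale, where the probe should report false on typical hosts.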