GNU bug report logs - #72524
how does grep determine locale if no LC environment variables are set


Package: grep;

Reported by: <mark.yagnatinsky <at> barclays.com>

Date: Thu, 8 Aug 2024 12:55:02 UTC

Severity: normal

To reply to this bug, email your comments to 72524 AT debbugs.gnu.org.



Report forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Thu, 08 Aug 2024 12:55:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to <mark.yagnatinsky <at> barclays.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 08 Aug 2024 12:55:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <bug-grep <at> gnu.org>
Subject: how does grep determine locale if no LC environment variables are set
Date: Thu, 8 Aug 2024 12:53:08 +0000
I ran into an odd issue... the workaround is easy enough but the issue is weird.
In case this is relevant, my grep comes from Git Bash (which I think is mostly Cygwin, or maybe MSYS2?).
Anyway, grep -P doesn't work if no LC vars are set, and complains that it only works in unibyte locales or UTF-8.
Normally, the Git Bash mintty launcher sets LC_CTYPE to en_US.UTF-8, but that doesn't happen if I bypass the launcher and run grep directly.
Here's the weird part: if I ask /usr/bin/locale what LC_CTYPE "should" be, it says C.UTF-8.
If I run grep with C.UTF-8 then it also works.  So it must be deriving a default locale a different way.
But how?

Thanks for any pointers.
Mark.

This message is for information purposes only. It is not a recommendation, advice, offer or solicitation to buy or sell a product or service, nor an official confirmation of any transaction. It is directed at persons who are professionals and is intended for the recipient(s) only. It is not directed at retail customers. This message is subject to the terms at: https://www.ib.barclays/disclosures/web-and-email-disclaimer.html. 

For important disclosures, please see: https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html regarding marketing commentary from Barclays Sales and/or Trading desks, who are active market participants; https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html regarding our standard terms for Barclays Investment Bank where we trade with you in principal-to-principal wholesale markets transactions; and in respect to Barclays Research, including disclosures relating to specific issuers, see: https://publicresearch.barclays.com.
__________________________________________________________________________________ 
If you are incorporated or operating in Australia, read these important disclosures: https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__________________________________________________________________________________
For more details about how we use personal information, see our privacy notice: https://www.ib.barclays/disclosures/personal-information-use.html. 
__________________________________________________________________________________

Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Thu, 08 Aug 2024 16:11:01 GMT) Full text and rfc822 format available.

Message #8 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: mark.yagnatinsky <at> barclays.com
Cc: 72524 <at> debbugs.gnu.org
Subject: Re: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Thu, 8 Aug 2024 09:09:49 -0700
On 2024-08-08 05:53, mark.yagnatinsky--- via Bug reports for GNU grep wrote:
> I ran into an odd issue... the workaround is easy enough but the issue is weird.
> In case this is relevant, my grep comes from Git Bash (which I think is mostly Cygwin, or maybe MSYS2?).
> Anyway, grep -P doesn't work if no LC vars are set, and complains that it only works in unibyte locales or UTF-8.
> Normally, the Git Bash mintty launcher sets LC_CTYPE to en_US.UTF-8, but that doesn't happen if I bypass the launcher and run grep directly.
> Here's the weird part: if I ask /usr/bin/locale what LC_CTYPE "should" be, it says C.UTF-8.
> If I run grep with C.UTF-8 then it also works.  So it must be deriving a default locale a different way.

My guess is that your default environment says that it supports UTF-8, 
but it doesn't support it well enough to pass grep's test; see 
grep/lib/localeinfo.c's is_using_utf8. If my guess is right, you may be 
encountering subtle bugs in programs other than grep.

When you say "I run grep with C.UTF-8" how exactly do you do that?

Is there any difference in output between these two shell commands?

localeinfo
LC_ALL=C.UTF-8 localeinfo

If you have a debugger, you might look into why is_using_utf8 returns 
false in your default locale.
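
For readers without the grep sources at hand, here is a minimal sketch of the kind of probe being discussed: decode a known two-byte UTF-8 sequence with mbrtowc and check the result. It is an illustration, not a copy of grep's code. The function names locale_decodes_utf8 and probe_locale are hypothetical, and the check assumes a C library whose wchar_t holds Unicode code points in UTF-8 locales.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Return true if the current LC_CTYPE locale decodes the two-byte
   UTF-8 sequence C4 80 as the single character U+0100.
   Hypothetical name; a sketch of a probe in the spirit of
   is_using_utf8, not grep's actual source.  */
bool
locale_decodes_utf8 (void)
{
  wchar_t wc = 0;
  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);   /* start in the initial conversion state */
  return mbrtowc (&wc, "\xc4\x80", 2, &mbs) == 2 && wc == 0x100;
}

/* Switch every category to the locale NAME, then run the probe.
   Returns -1 if the C library cannot provide NAME, otherwise 0 or 1.
   Hypothetical helper for experimenting with different names.  */
int
probe_locale (const char *name)
{
  if (!setlocale (LC_ALL, name))
    return -1;
  return locale_decodes_utf8 ();
}
```

Calling probe_locale ("") tests whatever locale the environment selects, which is the case this report is about; probe_locale ("C.UTF-8") tests the explicitly named one for comparison.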




Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Thu, 08 Aug 2024 20:55:01 GMT) Full text and rfc822 format available.

Message #11 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <eggert <at> cs.ucla.edu>
Cc: 72524 <at> debbugs.gnu.org
Subject: RE: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Thu, 8 Aug 2024 19:24:51 +0000
Re: how am I doing that ... via bash, just like the way you suggested I run "locale" the second time:
LC_CTYPE=C.UTF-8 grep -P needle haystack.txt  # just CTYPE seems to be enough, no need for ALL

Re: output difference for locale:
First way, everything is C.UTF-8 except LANG and LC_ALL, which are blank
Second way: same thing, except LC_ALL is no longer blank

Re: is_using_utf8 ... It relies on mbrtowc, which in turn relies on the current locale.
It seems that this function should NEVER return false in a UTF-8 locale.
But how does grep decide what the locale even is?
Presumably it must call setlocale at some point, or else it would be using the C locale, which is surely a unibyte locale.
Does it just do the obvious thing and call it with the empty string?
If so, is there any good way to find out what locale it actually got "resolved" to?
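
As background to the question above: POSIX specifies the order in which the environment is consulted when a program asks for the locale named by the empty string. For a given category, LC_ALL wins, then the category's own variable (such as LC_CTYPE), then LANG; if none is set to a non-empty value, the default is implementation-defined. The sketch below only mirrors that lookup order for inspection. The function name env_locale_name is hypothetical, and the C library does its own lookup internally.

```c
#include <stdlib.h>

/* Return the locale name that POSIX precedence would select for the
   category whose environment variable is CATEGORY_VAR (for example
   "LC_CTYPE"): LC_ALL first, then the category variable, then LANG.
   Unset and empty variables are skipped.  Returns NULL when none is
   set, in which case the default is implementation-defined.
   Hypothetical helper, for illustration only.  */
const char *
env_locale_name (const char *category_var)
{
  const char *vars[] = { "LC_ALL", category_var, "LANG" };
  for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value && *value)
        return value;
    }
  return NULL;
}
```

With no LC variables and no LANG set, env_locale_name ("LC_CTYPE") returns NULL, which corresponds to the situation in this report: the locale that results is whatever the platform's C library chooses as its default.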


Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 21:58:01 GMT) Full text and rfc822 format available.

Message #14 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: mark.yagnatinsky <at> barclays.com
Cc: 72524 <at> debbugs.gnu.org
Subject: Re: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 14:57:15 -0700
On 2024-08-08 12:24, mark.yagnatinsky <at> barclays.com wrote:
> Re: how am I doing that ... via bash, just like the way you suggested I run "locale" the second time:
> LC_CTYPE=C.UTF-8 grep -P needle haystack.txt  # just CTYPE seems to be enough, no need for ALL

As an aside, I wouldn't mess with LC_CTYPE independently. One can get 
into trouble if the LC_CTYPE locale disagrees with the others. However, 
I don't think that's your problem.


> Re: is_using_utf8 ... It relies on mbrtowc, which in turn relies on the current locale.
> It seems that this function should NEVER return false in a UTF-8 locale.

Correct.

> But how does grep decide what the locale even is?
> Presumably it must call setlocale at some point, or else it would be using the C locale, which is surely a unibyte locale.

Correct, it calls 'setlocale (LC_ALL, "")' first thing.


> is there any good way to find out what locale it actually got "resolved" to?

You could modify the source code to add a call like this:

   fprintf (stderr, "grep: locale is %s\n", setlocale (LC_ALL, 0));

after the earlier call to setlocale. Or you could run 'setlocale 
(LC_ALL, 0)' in a debugger.




Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 22:01:01 GMT) Full text and rfc822 format available.

Message #17 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <eggert <at> cs.ucla.edu>
Cc: 72524 <at> debbugs.gnu.org
Subject: RE: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 21:59:14 +0000
Re: debugger: will that work if grep wasn't built with debug symbols?  I think this one wasn't.


Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 22:03:02 GMT) Full text and rfc822 format available.

Message #20 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: mark.yagnatinsky <at> barclays.com
Cc: 72524 <at> debbugs.gnu.org
Subject: Re: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 15:02:11 -0700
On 2024-08-09 14:59, mark.yagnatinsky <at> barclays.com wrote:
> Re: debugger: will that work if grep wasn't built with debug symbols?

Only if you're reasonably experienced with machine code. It's not fun, 
but it's doable.

You could write your own little test program that simply does 'setlocale 
(LC_ALL, "")' followed by the fprintf code I sent you earlier, and debug 
that or see what it does.
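
A minimal sketch of such a test program's core, assuming only standard C: it adopts the environment's locale and reports what the C library resolved it to. The function name report_environment_locale is hypothetical, and printing MB_CUR_MAX is an addition not mentioned in the thread (a value of 1 indicates a unibyte locale).

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Adopt the locale from the environment, as 'setlocale (LC_ALL, "")'
   does, then write the resolved locale name and MB_CUR_MAX to OUT.
   Returns 0 on success, -1 if the environment names a locale that
   the C library cannot provide.  Hypothetical helper name.  */
int
report_environment_locale (FILE *out)
{
  if (!setlocale (LC_ALL, ""))
    {
      /* On failure the locale is left unchanged; report what it is.  */
      fprintf (out, "setlocale failed; locale is still %s\n",
               setlocale (LC_ALL, NULL));
      return -1;
    }
  fprintf (out, "locale is %s\n", setlocale (LC_ALL, NULL));
  fprintf (out, "MB_CUR_MAX is %lu\n", (unsigned long) MB_CUR_MAX);
  return 0;
}
```

Calling report_environment_locale (stderr) from main, once with no LC variables set and once with LC_ALL=C.UTF-8, should show whether the two runs resolve to the same locale on a given system.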




Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 22:10:01 GMT) Full text and rfc822 format available.

Message #23 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <eggert <at> cs.ucla.edu>
Cc: 72524 <at> debbugs.gnu.org
Subject: RE: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 22:08:39 +0000
Yeah but that would involve compiling the SAME WAY that this one was compiled, and I have no idea how the Git Bash for Windows project does this kind of thing.
I tried it on a grep in a more full-featured distribution (namely MSYS2) and it works fine.
I'm tempted to try it on a different computer and see what it does there, but not sure when I'll get around to it.
I don't suppose you have a Windows machine with git installed conveniently lying around by any chance?
(I have one lying around, but not conveniently.)


Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 22:11:01 GMT) Full text and rfc822 format available.

Message #26 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: mark.yagnatinsky <at> barclays.com
Cc: 72524 <at> debbugs.gnu.org
Subject: Re: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 15:10:04 -0700
On 2024-08-09 15:08, mark.yagnatinsky <at> barclays.com wrote:
> I don't suppose you have a Windows machine with git installed conveniently lying around by any chance?
> (I have one lying around, but not conveniently.)

Nope, don't do MS-Windows.




Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Fri, 09 Aug 2024 22:13:02 GMT) Full text and rfc822 format available.

Message #29 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <eggert <at> cs.ucla.edu>
Cc: 72524 <at> debbugs.gnu.org
Subject: RE: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Fri, 9 Aug 2024 22:11:46 +0000
I suspected as much... I'll try to get my other one working then.


Information forwarded to bug-grep <at> gnu.org:
bug#72524; Package grep. (Mon, 12 Aug 2024 11:35:01 GMT) Full text and rfc822 format available.

Message #32 received at 72524 <at> debbugs.gnu.org (full text, mbox):

From: <mark.yagnatinsky <at> barclays.com>
To: <eggert <at> cs.ucla.edu>
Cc: 72524 <at> debbugs.gnu.org
Subject: RE: bug#72524: how does grep determine locale if no LC environment
 variables are set
Date: Mon, 12 Aug 2024 11:33:25 +0000
I checked; it happens with Git for Windows on the other computer too.
It must be something about how they implemented locales.
Thanks!


This bug report was last modified 309 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.