GNU bug report logs - #16812
Eszett handling

Package: grep

Reported by: mathstuf <at> gmail.com

Date: Wed, 19 Feb 2014 19:04:01 UTC

Severity: wishlist


Message #5 received at submit <at> debbugs.gnu.org:

From: Ben Boeckel <mathstuf <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: Eszett handling
Date: Wed, 19 Feb 2014 13:59:18 -0500
[ I am not subscribed; please keep me on the CC. ]

Hi,

The new grep announcement on LWN[1] made me wonder how the German
eszett ('ß') is handled. As far as I can tell, it isn't handled at all.
This may end up with the same resolution as the recent LJ/Lj thread[2],
though.

Basically, it seems that grep doesn't support alternates when changing
case. The uppercase of 'ß' is either 'SS' or 'ẞ', depending on the
context[3]. From some poking, only the latter ('ẞ') is supported. My
original thought[4] was that the code would generate '[ßSS]', which
would be wrong for matching because a bracket expression matches only a
single character; it would need to generate '(ß|SS)' instead. It now
seems that '(ß|SS|ẞ)', or even '(ß|[sS][sS]|ẞ)', would need to be
generated with the new code.
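
To illustrate the difference between the bracket expression and the
alternation, here is a minimal sketch in Python rather than grep's own C
code. The helper expand_eszett and the constant ESZETT_ALTERNATION are
hypothetical names made up for this example; they only mimic the pattern
that the message suggests would need to be generated.

```python
import re

# Hypothetical illustration, not grep's implementation: the alternation
# suggested above for every literal eszett in a search string.
ESZETT_ALTERNATION = "(?:ß|[sS][sS]|ẞ)"


def expand_eszett(literal: str) -> str:
    """Build a regex for a literal string, expanding each eszett."""
    parts = []
    for ch in literal:
        if ch in ("ß", "ẞ"):
            parts.append(ESZETT_ALTERNATION)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


# A bracket expression matches exactly one character, so '[ßSS]' is just
# the set {ß, S} and can never match the two-character string 'SS'.
bracket = re.compile("[ßSS]")

# The alternation accepts the one-character and two-character forms.
alternation = re.compile(expand_eszett("straße"), re.IGNORECASE)

if __name__ == "__main__":
    print(bracket.fullmatch("SS"))           # no match object
    print(alternation.fullmatch("STRASSE"))  # a match object
    print(alternation.fullmatch("STRAẞE"))   # a match object
```

The sketch treats the search string as a fixed string; a real
implementation would have to do the expansion while parsing the regular
expression, which is where the cost of multi-character case mappings
shows up.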

I've attached a test case I wrote based on 'turkish-eyes'. I release it
to the public domain.

Thanks,

--Ben

[1]https://lwn.net/Articles/586899/
[2]https://lists.gnu.org/archive/html/bug-grep/2014-02/msg00004.html
[3]https://en.wikipedia.org/wiki/Capital_%C3%9F
[4]https://lwn.net/Articles/587010/
[german-eszett (text/plain, attachment)]

This bug report was last modified 11 years and 78 days ago.

