GNU bug report logs - #16812
Eszett handling

Package: grep

Reported by: mathstuf <at> gmail.com

Date: Wed, 19 Feb 2014 19:04:01 UTC

Severity: wishlist

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 16812 <at> debbugs.gnu.org
Subject: bug#16812: Eszett handling
Date: Sat, 08 Mar 2014 10:52:49 -0800
'grep' is conforming to its specification, even though it's not as
useful as it might be when searching German text.  The situation with
'ß'/'SS' is different from the situation with the digraphs 'ǉ'/'ǈ'/'Ǉ'
because in the latter case 'grep' is dealing only with individual
characters, whereas 'ß' would have to match the two-character string
'SS'.

There's a related issue with 'ß' versus the recently introduced capital
sharp-S 'ẞ'.  These do not match each other with 'grep --ignore-case' in
the current Savannah git master.  This is an unfortunate property of how
the glibc regex code behaves: the regex code uppercases both pattern and
data before comparing, but in the standard German locale 'ß' is
unchanged by uppercasing, so it never compares equal to 'ẞ'.
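
To make the uppercase-then-compare behavior concrete, here is a minimal
Python sketch.  It is not the glibc code; the helper names simple_upper
and matches_ignoring_case are hypothetical, and the one-to-one uppercase
mapping is only approximated by leaving a character unchanged whenever
its full uppercase form is longer than one character.

```python
def simple_upper(ch: str) -> str:
    """Approximate a one-to-one uppercase mapping for a single character.

    Assumption: a character whose full uppercase form is longer than one
    character (such as U+00DF) is treated as unchanged by uppercasing.
    """
    upper = ch.upper()
    return upper if len(upper) == 1 else ch


def uppercase_each(s: str) -> str:
    """Uppercase a string one character at a time."""
    return "".join(simple_upper(c) for c in s)


def matches_ignoring_case(pattern: str, text: str) -> bool:
    """Uppercase both pattern and data, then compare them literally."""
    return uppercase_each(pattern) == uppercase_each(text)


ESZETT = "\u00df"          # small sharp s
CAPITAL_ESZETT = "\u1e9e"  # capital sharp S
LJ_LOWER, LJ_TITLE, LJ_UPPER = "\u01c9", "\u01c8", "\u01c7"  # lj digraphs

print(matches_ignoring_case(ESZETT, CAPITAL_ESZETT))  # False
print(matches_ignoring_case(ESZETT, "SS"))            # False
print(matches_ignoring_case(LJ_LOWER, LJ_UPPER))      # True
print(matches_ignoring_case(LJ_TITLE, LJ_LOWER))      # True
```

Python's own str.upper() applies the full case mapping and turns 'ß'
into 'SS', which is why the sketch restricts itself to single-character
results: that restriction is what reproduces the mismatch.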

I'll leave this bug open as it is an awkward situation.  Fixing it would 
require changing the glibc regex code, which is a big deal -- it would 
have some performance implications in a lot of programs.  So I'm not 
optimistic about fixing it any time soon.




This bug report was last modified 11 years and 54 days ago.
