From: Marshall Lake
To: bug-coreutils@gnu.org
Date: Mon, 15 Jul 2019 11:42:01 -0700 (MST)
Subject: Sort Suggestion

Hi,

Even though this isn't a bug, I was asked to send the following to this email address.

Re: SORT Command from GNU coreutils 8.25

A suggestion for an additional option to the SORT command is to ignore non-alphanumeric characters.

As an example, in attempting to sort an index ...

Abbott, William 259

sorts before:

Abbot, William 099

If non-alphanumeric characters were ignored then the same two records would sort as:

Abbot, William 099
Abbott, William 259

Thanks for reading.
--
Marshall Lake -- mlake@mlake.net -- http://www.mlake.net

From: Assaf Gordon
To: Marshall Lake
Cc: 36674@debbugs.gnu.org
Date: Mon, 15 Jul 2019 13:23:52 -0600
Subject: Re: bug#36674: Sort Suggestion

tag 36674 notabug
close 36674
stop

Hello,

On Mon, Jul 15, 2019 at 11:42:01AM -0700, Marshall Lake wrote:
> Even though this isn't a bug, I was asked to send the following to this
> email address.

(General suggestions and discussions are better suited for the coreutils@gnu.org mailing list; that way the system won't open a new bug item.)

>
> Re: SORT Command from GNU coreutils 8.25
>
> A suggestion for an additional option to the SORT command is to ignore
> non-alphanumeric characters.
>
> As an example, in attempting to sort an index ...
>
> Abbott, William 259
>
> sorts before:
>
> Abbot, William 099
>
> If non-alphanumeric characters were ignored then the same two records
> would sort as:
>
> Abbot, William 099
> Abbott, William 259
>

There's actually something else at play here: in your case, sort does ignore non-alphanumeric characters, but it ALSO ignores white space.

That happens because your locale is set to some language (for example, en_US.UTF-8). Using such a locale makes sort ignore all non-alphanumeric characters, whitespace, and upper/lower case differences.

In essence, you are comparing "AbbottWilliam" (two 't's) to "AbbotWilliam" (one 't'), and then the second 't' is compared to a 'w' and is determined to come first.

If you force the POSIX/C locale, then all characters are considered, and the result will be as you requested.

Observe the following:

  $ printf "%s\n" AbbottWilliam AbbotWilliam | LC_ALL=en_CA.utf8 sort
  AbbottWilliam
  AbbotWilliam

  $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=en_CA.utf8 sort
  Abbott William
  Abbot William

  $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=C sort
  Abbot William
  Abbott William

  $ printf "%s\n" "Abbott, William" "Abbot, William" | LC_ALL=C sort
  Abbot, William
  Abbott, William

Note that sort already has an option for dictionary-style sorting:

  -d, --dictionary-order
         consider only blanks and alphanumeric characters

However, locale rules take precedence over it, so effectively it only works in the "C" locale:

  $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort
  Ab,,b,,ott William
  Abbot William

  $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort -d
  Abbot William
  Ab,,b,,ott William

You can read past discussions about the confusion resulting from locale sorting rules here:

  https://debbugs.gnu.org/11621
  https://debbugs.gnu.org/12783

As such, I'm closing this as "not a bug", but discussion can continue by replying to this thread.
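For the index in the original report, two possible workarounds can be put together in a small shell script. This is a minimal sketch, assuming GNU coreutils sort and a POSIX shell; the helper function entries is hypothetical and only feeds in the two sample records:

```shell
#!/bin/sh
# Sketch: two ways to get "Abbot" sorted before "Abbott" (assumes GNU sort).

# Hypothetical helper: emits the two index entries from the report.
entries() {
    printf '%s\n' "Abbott, William 259" "Abbot, William 099"
}

# 1) Byte-wise comparison in the C locale. -d skips the commas but keeps
#    blanks, and -f folds lower case to upper case, so "Abbot William ..."
#    compares a blank against the 't' of "Abbott" and the blank wins.
entries | LC_ALL=C sort -d -f

# 2) Keep the current locale, but compare only the surname field:
#    -t, splits fields at the comma and -k1,1 limits the key to field 1,
#    so "Abbot" is compared with "Abbott" and the shorter prefix sorts first.
entries | sort -t, -k1,1
```

Under these assumptions, each pipeline prints "Abbot, William 099" before "Abbott, William 259". The first variant changes the collation rules for the whole line; the second leaves the locale alone and only limits the primary comparison to the text before the comma.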
-assaf

From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Date: Tue, 13 Aug 2019 11:24:08 +0000
Subject: Internal Control

bug archived.