From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 16 11:37:34 2012 Received: (at submit) by debbugs.gnu.org; 16 Oct 2012 15:37:34 +0000 Received: from localhost ([127.0.0.1]:46996 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TO9D2-000497-4u for submit@debbugs.gnu.org; Tue, 16 Oct 2012 11:37:34 -0400 Received: from eggs.gnu.org ([208.118.235.92]:36494) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TO2Y4-0001hd-Ls for submit@debbugs.gnu.org; Tue, 16 Oct 2012 04:30:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TO2Wm-0000Se-6r for submit@debbugs.gnu.org; Tue, 16 Oct 2012 04:29:32 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE, RCVD_IN_DNSWL_HI, RECEIVED_FROM_WINDOWS_HOST autolearn=ham version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:37935) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TO2Wl-0000SS-Rh for submit@debbugs.gnu.org; Tue, 16 Oct 2012 04:29:28 -0400 Received: from eggs.gnu.org ([208.118.235.92]:39216) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TO2Wi-0002cG-13 for bug-coreutils@gnu.org; Tue, 16 Oct 2012 04:29:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TO2WZ-0000OV-N2 for bug-coreutils@gnu.org; Tue, 16 Oct 2012 04:29:23 -0400 Received: from bay0-omc3-s16.bay0.hotmail.com ([65.54.190.154]:9370) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TO2WZ-0000OF-H5 for bug-coreutils@gnu.org; Tue, 16 Oct 2012 04:29:15 -0400 Received: from BAY154-DS12 ([65.54.190.188]) by bay0-omc3-s16.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Tue, 16 Oct 2012 01:29:14 -0700 X-Originating-IP: [218.240.45.238] X-EIP: [e0+XwM+1n7oECdKm1aH1Iw2xRDI9X8ut] X-Originating-Email: [chinalinux@hotmail.com] Message-ID: From: 
"Michael" To: Subject: the join command bug report! Date: Tue, 16 Oct 2012 16:29:14 +0800 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0008_01CDABBB.613DB340" X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 14.0.8117.416 X-MimeOLE: Produced By Microsoft MimeOLE V14.0.8117.416 X-OriginalArrivalTime: 16 Oct 2012 08:29:14.0608 (UTC) FILETIME=[53546F00:01CDAB78] X-detected-operating-system: by eggs.gnu.org: Windows 2000 SP4, XP SP1+ X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -3.5 (---) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Tue, 16 Oct 2012 11:37:31 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.5 (---) This is a multi-part message in MIME format. ------=_NextPart_000_0008_01CDABBB.613DB340 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: quoted-printable Hi, I have two sorted files with millions of lines to join together. I am sure that at least 1/3 of the keys in the two files are the same, but not even one key was joined. This does not happen with small files. Michael Wu ------=_NextPart_000_0008_01CDABBB.613DB340 Content-Type: text/html; charset="gb2312" Content-Transfer-Encoding: quoted-printable
Hi,
 
I have two sorted files with millions of lines to join together. I am sure that at least 1/3 of the keys in the two files are the same, but not even one key was joined. This does not happen with small files.
 
Michael Wu
 
------=_NextPart_000_0008_01CDABBB.613DB340-- From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 16 12:12:13 2012 Received: (at 12659) by debbugs.gnu.org; 16 Oct 2012 16:12:13 +0000 Received: from localhost ([127.0.0.1]:47029 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TO9ka-0004wa-Jf for submit@debbugs.gnu.org; Tue, 16 Oct 2012 12:12:12 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41273) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TO9kY-0004wN-7K for 12659@debbugs.gnu.org; Tue, 16 Oct 2012 12:12:11 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id F2D60A60042; Tue, 16 Oct 2012 09:10:50 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4G1U7LKGc70B; Tue, 16 Oct 2012 09:10:50 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id A085BA6001E; Tue, 16 Oct 2012 09:10:50 -0700 (PDT) Message-ID: <507D8700.3090004@cs.ucla.edu> Date: Tue, 16 Oct 2012 09:10:40 -0700 From: Paul Eggert User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121009 Thunderbird/16.0 MIME-Version: 1.0 To: Michael Subject: Re: bug#12659: the join command bug report! References: In-Reply-To: Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: 7bit X-Spam-Score: 0.4 (/) X-Debbugs-Envelope-To: 12659 Cc: 12659@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.4 (/) Sounds like a locale problem. What does the "locale" command say? How exactly are you invoking 'sort' and 'join'? 
What do the input and output lines look like? From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 17 03:20:53 2012 Received: (at 12659) by debbugs.gnu.org; 17 Oct 2012 07:20:53 +0000 Received: from localhost ([127.0.0.1]:47603 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TONvv-0002mO-OK for submit@debbugs.gnu.org; Wed, 17 Oct 2012 03:20:53 -0400 Received: from bay0-omc3-s4.bay0.hotmail.com ([65.54.190.142]:42511) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TONvs-0002m9-MI for 12659@debbugs.gnu.org; Wed, 17 Oct 2012 03:20:50 -0400 Received: from BAY154-DS16 ([65.54.190.189]) by bay0-omc3-s4.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Wed, 17 Oct 2012 00:19:25 -0700 X-Originating-IP: [218.240.45.238] X-EIP: [+pRdFuKID1DMK2NFB9aorT36X5LOKEGz] X-Originating-Email: [chinalinux@hotmail.com] Message-ID: From: "Michael" To: "Paul Eggert" References: <507D8700.3090004@cs.ucla.edu> In-Reply-To: <507D8700.3090004@cs.ucla.edu> Subject: Re: bug#12659: the join command bug report! 
Date: Wed, 17 Oct 2012 15:19:26 +0800 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="gb2312"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 14.0.8117.416 X-MimeOLE: Produced By Microsoft MimeOLE V14.0.8117.416 X-OriginalArrivalTime: 17 Oct 2012 07:19:25.0075 (UTC) FILETIME=[BC962E30:01CDAC37] X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 12659 Cc: 12659@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.9 (/) en_US.UTF-8 # sort -n file1 > file3 # sort -n file2 > file4 # join file3 file4 | wc -l 19 # sort file3 file4 | uniq -d | wc -l 4698 # Both of my files contain only numbers. I have realized that join does not support the numeric sort order for the time being: if I sort without the '-n' option, the result of the join is correct. Michael -------------------------------------------------- From: "Paul Eggert" Sent: Wednesday, October 17, 2012 12:10 AM To: "Michael" Cc: <12659@debbugs.gnu.org> Subject: Re: bug#12659: the join command bug report! > Sounds like a locale problem. What does the "locale" > command say? How exactly are you invoking 'sort' and > 'join'? What do the input and output lines look like? 
> From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 17 11:16:23 2012 Received: (at 12659) by debbugs.gnu.org; 17 Oct 2012 15:16:23 +0000 Received: from localhost ([127.0.0.1]:48636 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOVM6-000691-Gz for submit@debbugs.gnu.org; Wed, 17 Oct 2012 11:16:22 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:52603) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOVM3-00068p-EY for 12659@debbugs.gnu.org; Wed, 17 Oct 2012 11:16:20 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id E8D93A60012; Wed, 17 Oct 2012 08:14:54 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NzMRQny9oioC; Wed, 17 Oct 2012 08:14:54 -0700 (PDT) Received: from [192.168.1.3] (pool-108-23-119-2.lsanca.fios.verizon.net [108.23.119.2]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 95043A6000F; Wed, 17 Oct 2012 08:14:54 -0700 (PDT) Message-ID: <507ECB67.3070504@cs.ucla.edu> Date: Wed, 17 Oct 2012 08:14:47 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux i686; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: Michael Subject: Re: bug#12659: the join command bug report! 
References: <507D8700.3090004@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: 0.4 (/) X-Debbugs-Envelope-To: 12659 Cc: 12659@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.4 (/) On 10/17/2012 12:19 AM, Michael wrote: > # sort -n file1 > file3 > # sort -n file2 > file4 > > # join file3 file4 That won't work. You have to join with the same sorting order that you sorted with. This is discussed in the manual. From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 17 14:15:26 2012 Received: (at control) by debbugs.gnu.org; 17 Oct 2012 18:15:27 +0000 Received: from localhost ([127.0.0.1]:48785 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOY9O-0001qg-H4 for submit@debbugs.gnu.org; Wed, 17 Oct 2012 14:15:26 -0400 Received: from joseki.proulx.com ([216.17.153.58]:39616) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOY9M-0001qZ-LU for control@debbugs.gnu.org; Wed, 17 Oct 2012 14:15:25 -0400 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id 2B2FB211DF for ; Wed, 17 Oct 2012 12:14:04 -0600 (MDT) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 05E3D2DCD1; Wed, 17 Oct 2012 12:14:03 -0600 (MDT) Date: Wed, 17 Oct 2012 12:14:03 -0600 From: Bob Proulx To: control@debbugs.gnu.org Subject: closing 12659 Message-ID: <20121017181403.GA20235@hysteria.proulx.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: 0.4 (/) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.4 (/) tag 12659 + notabug close 12659 thanks From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 17 14:16:29 2012 Received: (at 12659) by debbugs.gnu.org; 17 Oct 2012 18:16:29 +0000 Received: from localhost ([127.0.0.1]:48791 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOYAO-0001sY-Uu for submit@debbugs.gnu.org; Wed, 17 Oct 2012 14:16:29 -0400 Received: from joseki.proulx.com ([216.17.153.58]:39622) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOYAM-0001sO-KK; Wed, 17 Oct 2012 14:16:27 -0400 Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119]) by joseki.proulx.com (Postfix) with ESMTP id AE5BC211E7; Wed, 17 Oct 2012 12:15:06 -0600 (MDT) Received: by hysteria.proulx.com (Postfix, from userid 1000) id 9F6212DCD1; Wed, 17 Oct 2012 12:15:06 -0600 (MDT) Date: Wed, 17 Oct 2012 12:15:06 -0600 From: Bob Proulx To: Michael , 12659@debbugs.gnu.org Subject: Re: bug#12659: the join command bug report! Message-ID: <20121017181506.GA19968@hysteria.proulx.com> References: <507D8700.3090004@cs.ucla.edu> <507ECB67.3070504@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <507ECB67.3070504@cs.ucla.edu> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: 0.4 (/) X-Debbugs-Envelope-To: 12659 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.4 (/) Paul Eggert wrote: > On 10/17/2012 12:19 AM, Michael wrote: > > # sort -n file1 > file3 > > # sort -n file2 > file4 > > > > # join file3 file4 > > That won't work. 
You have to join with the same > sorting order that you sorted with. This is discussed > in the manual. Since this seems to have been resolved satisfactorily I have closed the bug report. If you have any further information please feel free to respond as I have done here and it will be delivered to all of the interested parties. Bob From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 17 21:44:55 2012 Received: (at 12659) by debbugs.gnu.org; 18 Oct 2012 01:44:55 +0000 Received: from localhost ([127.0.0.1]:49075 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOfAM-0005tf-OK for submit@debbugs.gnu.org; Wed, 17 Oct 2012 21:44:55 -0400 Received: from bay0-omc3-s6.bay0.hotmail.com ([65.54.190.144]:19263) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TOfAK-0005tR-3r for 12659@debbugs.gnu.org; Wed, 17 Oct 2012 21:44:53 -0400 Received: from BAY154-DS10 ([65.54.190.187]) by bay0-omc3-s6.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Wed, 17 Oct 2012 18:43:25 -0700 X-Originating-IP: [218.240.45.238] X-EIP: [KW2N+UeBrrv9d16B1W0vhttNyikP0ejg] X-Originating-Email: [chinalinux@hotmail.com] Message-ID: From: "Michael" To: "Paul Eggert" References: <507D8700.3090004@cs.ucla.edu> <507ECB67.3070504@cs.ucla.edu> In-Reply-To: <507ECB67.3070504@cs.ucla.edu> Subject: Re: bug#12659: the join command bug report! 
Date: Thu, 18 Oct 2012 09:43:27 +0800 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 14.0.8117.416 X-MimeOLE: Produced By Microsoft MimeOLE V14.0.8117.416 X-OriginalArrivalTime: 18 Oct 2012 01:43:25.0045 (UTC) FILETIME=[F6AF7A50:01CDACD1] X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 12659 Cc: 12659@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.9 (/) " Important: FILE1 and FILE2 must be sorted on the join fields." The manual contains only the sentence above. It should at least mention the sort method. I strongly suggest improving the join manual in a later distribution. Thanks Michael -------------------------------------------------- From: "Paul Eggert" Sent: Wednesday, October 17, 2012 11:14 PM To: "Michael" Cc: <12659@debbugs.gnu.org> Subject: Re: bug#12659: the join command bug report! > On 10/17/2012 12:19 AM, Michael wrote: >> # sort -n file1 > file3 >> # sort -n file2 > file4 >> >> # join file3 file4 > > That won't work. You have to join with the same > sorting order that you sorted with. This is discussed > in the manual. > From unknown Mon Aug 18 11:27:31 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 15 Nov 2012 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
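For readers of this thread, here is a minimal sketch of the resolution described above: sort both inputs in the same order that join compares keys in, rather than with 'sort -n'. It assumes a POSIX shell with sort, join, and mktemp available. The key values and the temporary directory are made up for illustration and are not taken from the report; pinning LC_ALL=C for every command is one way (an assumption, not the only option) to make sure sort and join agree on collation.

```shell
#!/bin/sh
# Hypothetical sample data: numeric keys whose numeric order (2, 9, 10, 100)
# differs from their byte-wise order (10, 100, 2, 9).
tmp=$(mktemp -d)
printf '%s\n' 9 10 100 2 > "$tmp/file1"
printf '%s\n' 100 2 9 33 > "$tmp/file2"

# Sort WITHOUT -n, and use the same locale for sort and join, so the
# files are in the order join expects when it compares keys.
LC_ALL=C sort "$tmp/file1" > "$tmp/file3"
LC_ALL=C sort "$tmp/file2" > "$tmp/file4"

# Count the joined lines; the common keys here are 100, 2 and 9.
matched=$(LC_ALL=C join "$tmp/file3" "$tmp/file4" | wc -l | tr -d ' ')
echo "$matched"

rm -rf "$tmp"
```

With these sample files the script should report three joined keys. If the inputs are sorted with 'sort -n' instead, join sees keys out of its expected order and can miss matches, which is the behaviour reported at the start of this thread.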