From unknown Fri Jun 20 07:21:24 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#78213 <78213@debbugs.gnu.org>
To: bug#78213 <78213@debbugs.gnu.org>
Subject: Status: should produce a diff with similar lines grouped together if possible
Reply-To: bug#78213 <78213@debbugs.gnu.org>
Date: Fri, 20 Jun 2025 14:21:24 +0000

retitle 78213 should produce a diff with similar lines grouped together if possible
reassign 78213 diffutils
submitter 78213 Vincent Lefevre
severity 78213 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri May 02 10:18:59 2025
Received: (at submit) by debbugs.gnu.org; 2 May 2025 14:18:59 +0000
Received: from localhost ([127.0.0.1]:60067 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uArEA-0006B4-Ht for submit@debbugs.gnu.org; Fri, 02 May 2025 10:18:59 -0400
Received: from lists.gnu.org ([2001:470:142::17]:42342) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1uArE5-0006AZ-Gp for submit@debbugs.gnu.org; Fri, 02 May 2025 10:18:56 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uArDy-0002XA-4T for bug-diffutils@gnu.org; Fri, 02 May 2025 10:18:46 -0400
Received: from joooj.vinc17.net ([155.133.131.76]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uArDv-0006mv-5p for bug-diffutils@gnu.org; Fri, 02 May 2025 10:18:45 -0400
Received: from smtp-qaa.vinc17.net (unknown [IPv6:2a01:cb19:952d:d700:2843:628d:3ffe:26bb]) by joooj.vinc17.net (Postfix) with ESMTPSA id A3FAC483; Fri, 2 May 2025 16:18:28 +0200 (CEST)
Received: by qaa.vinc17.org (Postfix, from userid 1000) id 374B0CA0097; Fri, 02 May 2025 16:18:26 +0200 (CEST)
Date: Fri, 2 May 2025 16:18:26 +0200
From: Vincent Lefevre
To: bug-diffutils@gnu.org
Subject: should produce a diff with similar lines grouped together if possible
Message-ID: <20250502141826.GA966462@qaa.vinc17.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="lpbfwsSgzewr+rag"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Mailer-Info: https://www.vinc17.net/mutt/
User-Agent: Mutt/2.2.13+86 (bb2064ae) vl-169878 (2025-02-08)
Received-SPF: pass client-ip=155.133.131.76; envelope-from=vincent@vinc17.net; helo=joooj.vinc17.net
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

--lpbfwsSgzewr+rag
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

"diff" should produce a diff with similar lines grouped together if
possible, so that it can be more easily readable and be word-diff
friendly.

Issues can occur on text files with paragraphs separated by a blank
line (like in wiki source files and LaTeX files), where a modification
consists in
* some paragraph being split, and
* some of the following paragraphs being slightly modified.

In such a case, a shift of the slightly modified lines can occur,
which makes the diff hardly readable and breaks word-diff (e.g.
when one opens the diff file with GNU Emacs).

I've attached an example:
* file1 and file2: the files to be diff'ed (file2 is similar to
file1, with the first paragraph split).
* file-bad.diff: the diff produced by diff 3.10.
* file-ok.diff: the diff I would expect.

--
Vincent Lefèvre - Web:
100% accessible validated (X)HTML - Blog:
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

--lpbfwsSgzewr+rag
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file1

A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam.

Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque.

Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore.

Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta.

Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor.

--lpbfwsSgzewr+rag
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file2

A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. *

Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam. *

Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque. *

Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore. *

Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta. *

Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor. *

--lpbfwsSgzewr+rag
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file-bad.diff

--- file1 2025-04-28 04:55:40.705835474 +0200
+++ file2 2025-04-28 04:56:23.245451675 +0200
@@ -1,9 +1,11 @@
-A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam.
+A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. *
 
-Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque.
+Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam. *
 
-Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore.
+Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque. *
 
-Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta.
+Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore. *
 
-Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor.
+Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta. *
+
+Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor. *

--lpbfwsSgzewr+rag
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file-ok.diff

--- file1 2025-04-28 04:55:40.705835474 +0200
+++ file3 2025-04-28 04:57:38.148841327 +0200
@@ -1,9 +1,11 @@
-A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam.
+A alias esse quia sunt velit commodi. Voluptas neque dignissimos excepturi quaerat id tempore quia alias. Quia hic dignissimos deleniti at magni id alias id. *
+
+Qui dolore voluptatem quae. A doloremque illo molestias aperiam expedita voluptatem. Dolorum et dolor quia ullam. *
 
-Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque.
+Rerum culpa hic sint unde. Reprehenderit est quisquam sequi modi possimus. Dolor sit similique vel. Dolorem laborum in non quibusdam et nihil. Sed fugit delectus hic voluptatem et doloremque. *
 
-Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore.
+Qui nesciunt doloremque sed veritatis dignissimos illo. In qui voluptatum quia excepturi unde officiis. Voluptas quia distinctio voluptates tempora qui labore beatae sint. Consequuntur est sed quaerat necessitatibus labore. *
 
-Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta.
+Voluptatem aliquam voluptates natus quibusdam sequi vel iusto. Minus fugit deserunt tenetur. Maiores tempore in ea architecto quaerat aut. Culpa et est quos. Eos in et quia facere porro sit. Iste dolores omnis incidunt aliquam quia labore soluta. *
 
-Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor.
+Officia doloremque rerum cum quas aut. Modi eos vel cupiditate dolore nisi non. Non voluptate provident id officia dolor. *

--lpbfwsSgzewr+rag--

From debbugs-submit-bounces@debbugs.gnu.org Sat May 03 04:13:11 2025
Received: (at 78213) by debbugs.gnu.org; 3 May 2025 08:13:11 +0000
Received: from localhost ([127.0.0.1]:37912 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uB7zj-0004Uq-4T for submit@debbugs.gnu.org; Sat, 03 May 2025 04:13:11 -0400
Received: from mail-ej1-x633.google.com ([2a00:1450:4864:20::633]:47578) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1uB7ze-0004U9-K9 for 78213@debbugs.gnu.org; Sat, 03 May 2025 04:13:08 -0400
Received: by mail-ej1-x633.google.com with SMTP id a640c23a62f3a-ac3fcf5ab0dso441896966b.3 for <78213@debbugs.gnu.org>; Sat, 03 May 2025 01:13:06 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746259980; x=1746864780; darn=debbugs.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=wF0TEPLgiIcl0ukWxMZQxf+cNTYOvp40EIO0tMnrWrs=; b=DqA25t8J5Oq/OhKGKoDXo1RWxfK+EyaBfN4x3AsjPt9EVnH+W0BbdE+byXv3AtSazJ 7wB4fzAOtu+5EAOxvCHPTfMHr/BLSyquZfKAUhAEh79imbuQQaPbXn13CxBJaKJrKwu4 7QYH2GKF11Bb7KV53o/HEI2zpLsi4RjLQt0Mzph27lcd4pJNJvxM2bGf6OVc+eTqB880 IL1Cb8RiKh+iFqdGLwc55TFgTJUNpFLvnJtZaW+hJ0zCm1lUUvYQqi9qWo2cWamKy0AI g5EsNRtPNKrrJ4OqVVKNYW8ElHb+H+cXQ5+GdA5gB1gx7D5kyGAr85cGJxD3qbmZVuFu 9npw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746259980; x=1746864780; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=wF0TEPLgiIcl0ukWxMZQxf+cNTYOvp40EIO0tMnrWrs=; b=DLeEZ0NQmMdyJQJkKuHUrqoLmDQz5++I9+CaaqbqdIzWQdfFJEhLNHD7W9SjXNyRts L0b9eK/L2fSW33I60GlfrhA2E8mSAIO+WkD8tuT7ei19XPieLL3PNbEZfhUdjpingKIs sg0edPHBtg6vx/5C0jxn5+Q6q9g39BzHkzVJiEAEt4nZHipL3yXQsNNs83KmTj9Ei2nI vhoE5GTQczIPM0CWpRGrENlNBuRcM4bwblRQ5PXIDoEY/HsKYm3xziauMFl0mbFH0q/3 BPdI+Yj0LeE6VCBMmwjduHbdJUTyzoiz5mIcLYNJdobCZgXo+85PCRRY63oS17JY+oJL sUyw==
X-Gm-Message-State: AOJu0YxN69QFnW463em1PSKkw0A7MD+7XzlVYSdAyOTxgBByooG3Vxko pcVYf0fGH3iwRbXIZSghSuhdm7cLb4s/Wl/wSgBOVL+6EeV4K8JgtLL/C2CSCh6Tz/fhvMHtQ2b 2xoq9dtzzquWR3wDnSiG0TzOiiMW7BziX
X-Gm-Gg: ASbGnctpgyuN0HKH/MpVE5tl+KNuBIzaa+KIrNmxBzonBUXJVFttgodPFCrswKN2HlS NYE4uDgBpbhn/dnPoTyiUUj0PmLS4ZC+xYC3QlKDr51niyc/4rLPsEgmgYGDgq3SMNllbDB2SsG yIPITHiVWOGmAYpSk2Ccs=
X-Google-Smtp-Source: AGHT+IGTSkZL2gCfTx0dYqe1bKfeGbLt2lo+ejhnj1XDExGfWHCjFxkJO6tBXj49eSTwB/I5jZUVE9+9Sk9kCSCUv9g=
X-Received: by 2002:a17:907:7f90:b0:ab7:cfe7:116f with SMTP id a640c23a62f3a-ad190837083mr191767466b.46.1746259980077; Sat, 03 May 2025 01:13:00 -0700 (PDT)
MIME-Version: 1.0
References: <20250502141826.GA966462@qaa.vinc17.org>
In-Reply-To: <20250502141826.GA966462@qaa.vinc17.org>
From: Robert Webb
Date: Sat, 3 May 2025 01:12:44 -0700
X-Gm-Features: ATxdqUE9MnsirsogOqYAj123yc8vTmlu4B6KcFXBxPzpeg4q9XW2A5bronDhDh4
Message-ID:
Subject: Re: [bug-diffutils] bug#78213: should produce a diff with similar lines grouped together if possible
To: Vincent Lefevre
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 78213
Cc: 78213@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

The diff to create file-bad.diff was with the '-u' option.
I used diff v3.6 with identical results.

The only lines that match between the file1 and file2 are empty ones.
This bash command shows only an empty line in common:
comm -12 -- <(sort -u file1) <(sort -u file2)

Note: The line terminator on the provided files is CRLF.
Long lines - Use 'less -S' :-)

On Fri, May 2, 2025 at 7:19 AM Vincent Lefevre wrote:
>
> "diff" should produce a diff with similar lines grouped together if
> possible, so that it can be more easily readable and be word-diff
> friendly.
>
> Issues can occur on text files with paragraphs separated by a blank
> line (like in wiki source files and LaTeX files), where a modification
> consists in
> * some paragraph being split, and
> * some of the following paragraphs being slightly modified.
>
> In such a case, a shift of the slightly modified lines can occur,
> which makes the diff hardly readable and breaks word-diff (e.g.
> when one opens the diff file with GNU Emacs).
>
> I've attached an example:
> * file1 and file2: the files to be diff'ed (file2 is similar to
> file1, with the first paragraph split).
> * file-bad.diff: the diff produced by diff 3.10.
> * file-ok.diff: the diff I would expect.
>
> --
> Vincent Lefèvre - Web:
> 100% accessible validated (X)HTML - Blog:
> Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Sat May 03 05:24:15 2025
Received: (at 78213) by debbugs.gnu.org; 3 May 2025 09:24:15 +0000
Received: from localhost ([127.0.0.1]:38254 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uB96U-00081w-Nl for submit@debbugs.gnu.org; Sat, 03 May 2025 05:24:15 -0400
Received: from joooj.vinc17.net ([155.133.131.76]:38324) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1uB96S-00081o-TL for 78213@debbugs.gnu.org; Sat, 03 May 2025 05:24:13 -0400
Received: from smtp-qaa.vinc17.net (135.197.67.86.rev.sfr.net [86.67.197.135]) by joooj.vinc17.net (Postfix) with ESMTPSA id 8CC05496; Sat, 3 May 2025 11:24:10 +0200 (CEST)
Received: by qaa.vinc17.org (Postfix, from userid 1000) id 5BFC8CA0137; Sat, 03 May 2025 11:24:10 +0200 (CEST)
Date: Sat, 3 May 2025 11:24:10 +0200
From: Vincent Lefevre
To: Robert Webb
Subject: Re: [bug-diffutils] bug#78213: should produce a diff with similar lines grouped together if possible
Message-ID: <20250503092410.GS6691@qaa.vinc17.org>
References: <20250502141826.GA966462@qaa.vinc17.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
X-Mailer-Info: https://www.vinc17.net/mutt/
User-Agent: Mutt/2.2.13+86 (bb2064ae) vl-169878 (2025-02-08)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 78213
Cc: 78213@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

On 2025-05-03 01:12:44 -0700, Robert Webb wrote:
> The diff to create file-bad.diff was with the '-u' option.

Yes, I forgot to mention that as this is what I *always* use
(and I always see diffs generated with this option).

> The only lines that match between the file1 and file2 are empty ones.
> This bash command shows only an empty line in common:
> comm -12 -- <(sort -u file1) <(sort -u file2)
>
> Note: The line terminator on the provided files is CRLF.

No, a single LF as usual (I suspect that some mail software converts
them to CRLF when saving).

> Long lines - Use 'less -S' :-)

Well, be careful that the difference between the lines are at the end.
I generated the files with "lorem -p 5" as the goal was to generate
5 paragraphs (and there is no way to get shorter paragraphs), and
slightly editing them. I now think that "lorem -s 5" (to generate
5 sentences in a single paragraph) would have been better here since
I had to add blank lines for the testcase anyway.

--
Vincent Lefèvre - Web:
100% accessible validated (X)HTML - Blog:
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Sat May 03 05:45:50 2025
Received: (at 78213) by debbugs.gnu.org; 3 May 2025 09:45:50 +0000
Received: from localhost ([127.0.0.1]:38346 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uB9RN-0000iu-Pq for submit@debbugs.gnu.org; Sat, 03 May 2025 05:45:50 -0400
Received: from joooj.vinc17.net ([155.133.131.76]:49278) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1uB9RL-0000il-Lu for 78213@debbugs.gnu.org; Sat, 03 May 2025 05:45:48 -0400
Received: from smtp-qaa.vinc17.net (135.197.67.86.rev.sfr.net [86.67.197.135]) by joooj.vinc17.net (Postfix) with ESMTPSA id B86E3496; Sat, 3 May 2025 11:45:45 +0200 (CEST)
Received: by qaa.vinc17.org (Postfix, from userid 1000) id 733B2CA0137; Sat, 03 May 2025 11:45:45 +0200 (CEST)
Date: Sat, 3 May 2025 11:45:45 +0200
From: Vincent Lefevre
To: Robert Webb
Subject: Re: [bug-diffutils] bug#78213: should produce a diff with similar lines grouped together if possible
Message-ID: <20250503094545.GA1049892@qaa.vinc17.org>
References: <20250502141826.GA966462@qaa.vinc17.org> <20250503092410.GS6691@qaa.vinc17.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="PC1dEEC4bxS3SNrX"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20250503092410.GS6691@qaa.vinc17.org>
X-Mailer-Info: https://www.vinc17.net/mutt/
User-Agent: Mutt/2.2.13+86 (bb2064ae) vl-169878 (2025-02-08)
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 78213
Cc: 78213@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

--PC1dEEC4bxS3SNrX
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On 2025-05-03 11:24:10 +0200, Vincent Lefevre wrote:
> On 2025-05-03 01:12:44 -0700, Robert Webb wrote:
> > Long lines - Use 'less -S' :-)
>
> Well, be careful that the difference between the lines are at the end.
> I generated the files with "lorem -p 5" as the goal was to generate
> 5 paragraphs (and there is no way to get shorter paragraphs), and
> slightly editing them. I now think that "lorem -s 5" (to generate
> 5 sentences in a single paragraph) would have been better here since
> I had to add blank lines for the testcase anyway.

Here's a new version of the testcase, with files having short lines.

Note: To obtain file-ok.diff, I first added a character in the first
empty line of "file2", generated the diff with "diff -u" (with the
added character, the issue with the first common empty lines
disappears, so that the diff is good), then removed the character
from the generated diff.

--
Vincent Lefèvre - Web:
100% accessible validated (X)HTML - Blog:
Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)

--PC1dEEC4bxS3SNrX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file1

Quo esse rem temporibus. Et omnis quos suscipit.

Reprehenderit maxime nesciunt voluptatum placeat veniam et.

Dolorem qui minima eaque fuga ut est in.

In officiis amet dignissimos neque earum autem soluta.

Beatae quo ut odio.

--PC1dEEC4bxS3SNrX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file2

Quo esse rem temporibus. FOO

Et omnis quos suscipit. FOO

Reprehenderit maxime nesciunt voluptatum placeat veniam et. FOO

Dolorem qui minima eaque fuga ut est in. FOO

In officiis amet dignissimos neque earum autem soluta. FOO

Beatae quo ut odio. FOO

--PC1dEEC4bxS3SNrX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file-bad.diff

--- file1
+++ file2
@@ -1,9 +1,11 @@
-Quo esse rem temporibus. Et omnis quos suscipit.
+Quo esse rem temporibus. FOO
 
-Reprehenderit maxime nesciunt voluptatum placeat veniam et.
+Et omnis quos suscipit. FOO
 
-Dolorem qui minima eaque fuga ut est in.
+Reprehenderit maxime nesciunt voluptatum placeat veniam et. FOO
 
-In officiis amet dignissimos neque earum autem soluta.
+Dolorem qui minima eaque fuga ut est in. FOO
 
-Beatae quo ut odio.
+In officiis amet dignissimos neque earum autem soluta. FOO
+
+Beatae quo ut odio. FOO

--PC1dEEC4bxS3SNrX
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=file-ok.diff

--- file1
+++ file2
@@ -1,9 +1,11 @@
-Quo esse rem temporibus. Et omnis quos suscipit.
+Quo esse rem temporibus. FOO
+
+Et omnis quos suscipit. FOO
 
-Reprehenderit maxime nesciunt voluptatum placeat veniam et.
+Reprehenderit maxime nesciunt voluptatum placeat veniam et. FOO
 
-Dolorem qui minima eaque fuga ut est in.
+Dolorem qui minima eaque fuga ut est in. FOO
 
-In officiis amet dignissimos neque earum autem soluta.
+In officiis amet dignissimos neque earum autem soluta. FOO
 
-Beatae quo ut odio.
+Beatae quo ut odio. FOO

--PC1dEEC4bxS3SNrX--

From debbugs-submit-bounces@debbugs.gnu.org Sat May 03 06:13:49 2025
Received: (at 78213) by debbugs.gnu.org; 3 May 2025 10:13:49 +0000
Received: from localhost ([127.0.0.1]:38450 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1uB9sT-00021w-AL for submit@debbugs.gnu.org; Sat, 03 May 2025 06:13:49 -0400
Received: from mail-ed1-x52b.google.com ([2a00:1450:4864:20::52b]:46334) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1uB9sR-00021f-40 for 78213@debbugs.gnu.org; Sat, 03 May 2025 06:13:47 -0400
Received: by mail-ed1-x52b.google.com with SMTP id 4fb4d7f45d1cf-5e8be1bdb7bso4043233a12.0 for <78213@debbugs.gnu.org>; Sat, 03 May 2025 03:13:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746267220; x=1746872020; darn=debbugs.gnu.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=nOIGppKHMmj4i/CywRJG75p7V1P528TJtoEy/pGZYGE=; b=JVYZe8E/pb/JyeDsw06Vkkdcr6s1VaqAVDF0zCyxygcNkdSJGb0v1cHcHkDthjA+0p /e3iRvsGkRVk/SDS6IAW095DXOcbOggA87HhPN0ujb9zDKrRhSqKzlbkKcWtpQ9/Z6Xj aTliHZww27w21vst0u8SWK1st/fbPd/PtPkwhFz0vvz9orsYSP+lG1wyt3aRxq/rpfyq QqkuZUNfNfyvDXKltNbj2jdzuKV6jDytlxMGSXdzusZdYtJ7y5+vVsi5Yv01+TGMs3/N sWrWK3jQbiJq9mqp1kYXzROpN+k48zh1STH60pbgRLUYu1mzSe/B/ZlMhwStnkO4s0KM w5iA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746267220; x=1746872020; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nOIGppKHMmj4i/CywRJG75p7V1P528TJtoEy/pGZYGE=; b=xIPAg0k5mJ0ME14EFQ+zd4iUEsD16N0Rk9gM669tW6xQV0MNr48CeiRiDvurb3skFs DqsLzoIEPn0txTr213hvK0id489f4dg7xWXozoNs6zRcVqG/Aa1ua6/CRzugZSHdJH5r 7kJ91zzBYIfjFriAdpf5RDd1nWf9qWoG7GM8RAqw/vXqelkaL831E4eA7s+gvkrTfMas 6iNtoS8PW4uKe0e5Gey3ywyr0NYa/YypNuHUPou6YoQoROqeT3NMD2HTdB6G5E2PvAAU PO2dd+IYPWA47k4ZYurI5Ci6EKfylXjh9upa+WvldW91xltfiSCVHR5/abIIF/T6gwDL Uuyw==
X-Gm-Message-State: AOJu0Yx/byZFlrg0jGv1baOZDez330Gv3T0B1U8S5gVj1QGb/8TAgojM l2CCy8FEBcA16bKBzJsE6hGKVSSOLkDyt4Wr1FxBDVp95ln9QvMyJ3cp5+6i1I2CRfg01hqnbQx EP9PBygX1+3GoZXgpQBD/5yKNVQfAzmvk
X-Gm-Gg: ASbGncuk39oFopLi7eVtcT9+A2lFiu4BPkiLBO86Vm4/5aGGhmgG7tOPkLm9escLs5B 8scF56PLEoIQ0nJ9/MuZ5XO9eG+nDVwv0eR4ZTuZ9U+hoQMZ59V7ruf7s2T4lWz/5OB0ECTTxj8 VqfUtRXjrpouOFuGo74rg=
X-Google-Smtp-Source: AGHT+IE2RMUI9PGcmunvFoEs+NP3EuDZq1ZG6JJztXYp/tH/tnb5rqCzb5wwJ2m46MPq/LodXVxuEOjGqeO61ya0zuU=
X-Received: by 2002:a05:6402:5cb:b0:5fa:7d48:ae19 with SMTP id 4fb4d7f45d1cf-5faa8001ffamr1704750a12.25.1746267220197; Sat, 03 May 2025 03:13:40 -0700 (PDT)
MIME-Version: 1.0
References: <20250502141826.GA966462@qaa.vinc17.org> <20250503092410.GS6691@qaa.vinc17.org>
In-Reply-To: <20250503092410.GS6691@qaa.vinc17.org>
From: Robert Webb
Date: Sat, 3 May 2025 03:13:24 -0700
X-Gm-Features: ATxdqUE80cIaCj465ZUWZeWJSMcVSYiLAQDVZUFuntQTH68wSzmh54SeKhhCjV0
Message-ID:
Subject: Re: [bug-diffutils] bug#78213: should produce a diff with similar lines grouped together if possible
To: Vincent Lefevre
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 78213
Cc: 78213@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Yes, I was mistaken about the line terminators. Your files were
attached as "Content-Type: text/plain; charset=us-ascii" with normal
line endings.

Using Gmail (in Firefox), I tried saving all the attachments
in a zip file, and individually too.
Either way they had the CRLF terminators, but Gmail should have
converted them to LF at least when saving the individual files.

On Sat, May 3, 2025 at 2:24 AM Vincent Lefevre wrote:
>
> On 2025-05-03 01:12:44 -0700, Robert Webb wrote:
> > The diff to create file-bad.diff was with the '-u' option.
>
> Yes, I forgot to mention that as this is what I *always* use
> (and I always see diffs generated with this option).
>
> > The only lines that match between the file1 and file2 are empty ones.
> > This bash command shows only an empty line in common:
> > comm -12 -- <(sort -u file1) <(sort -u file2)
> >
> > Note: The line terminator on the provided files is CRLF.
>
> No, a single LF as usual (I suspect that some mail software converts
> them to CRLF when saving).
>
> > Long lines - Use 'less -S' :-)
>
> Well, be careful that the difference between the lines are at the end.
> I generated the files with "lorem -p 5" as the goal was to generate
> 5 paragraphs (and there is no way to get shorter paragraphs), and
> slightly editing them. I now think that "lorem -s 5" (to generate
> 5 sentences in a single paragraph) would have been better here since
> I had to add blank lines for the testcase anyway.
>
> --
> Vincent Lefèvre - Web:
> 100% accessible validated (X)HTML - Blog:
> Work: CR INRIA - computer arithmetic / Pascaline project (LIP, ENS-Lyon)
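
The comm check and the line grouping discussed in this thread can be
tried on the short testcase with a small Python sketch. The file
contents below are reconstructed from the attachments, assuming one
paragraph per line. The names common_lines and best_match are
hypothetical helpers, and the similarity pairing relies on Python's
difflib; it only illustrates the grouping the report asks for and is
not how GNU diff compares lines.

```python
import difflib

# Short testcase from the thread, reconstructed with one paragraph per line.
FILE1 = [
    "Quo esse rem temporibus. Et omnis quos suscipit.",
    "",
    "Reprehenderit maxime nesciunt voluptatum placeat veniam et.",
    "",
    "Dolorem qui minima eaque fuga ut est in.",
    "",
    "In officiis amet dignissimos neque earum autem soluta.",
    "",
    "Beatae quo ut odio.",
]
FILE2 = [
    "Quo esse rem temporibus. FOO",
    "",
    "Et omnis quos suscipit. FOO",
    "",
    "Reprehenderit maxime nesciunt voluptatum placeat veniam et. FOO",
    "",
    "Dolorem qui minima eaque fuga ut est in. FOO",
    "",
    "In officiis amet dignissimos neque earum autem soluta. FOO",
    "",
    "Beatae quo ut odio. FOO",
]


def common_lines(a, b):
    """Distinct lines present in both inputs (like comm -12 on sort -u output)."""
    return sorted(set(a) & set(b))


def best_match(line, candidates):
    """Hypothetical helper: the candidate with the highest difflib similarity."""
    return max(
        candidates,
        key=lambda c: difflib.SequenceMatcher(None, line, c).ratio(),
    )


if __name__ == "__main__":
    print("common lines:", common_lines(FILE1, FILE2))
    paragraphs2 = [line for line in FILE2 if line]
    for line in FILE1:
        if line:
            print(repr(line), "->", repr(best_match(line, paragraphs2)))
```

On this data, common_lines(FILE1, FILE2) returns only the empty line,
which agrees with the comm result reported above, and best_match pairs
each of the four unsplit file1 paragraphs with its FOO-suffixed
counterpart in file2, the same grouping as in file-ok.diff.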