GNU bug report logs - #56844
[PATCH] Refactor repunctuate-sentences to accommodate corner case.

Package: emacs

Reported by: André A. Gomes <andremegafone <at> gmail.com>

Date: Sat, 30 Jul 2022 18:07:02 UTC

Severity: wishlist

Tags: moreinfo, patch

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 56844 in the body.
You can then email your comments to 56844 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Sat, 30 Jul 2022 18:07:02 GMT)

Acknowledgement sent to André A. Gomes <andremegafone <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 30 Jul 2022 18:07:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: André A. Gomes <andremegafone <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Refactor repunctuate-sentences to accommodate corner case.
Date: Sat, 30 Jul 2022 21:06:16 +0300
Tags: patch

Hi Emacs,

Please find the patch below.



In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Guix System

Configured using:
 'configure
 CONFIG_SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
 SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
 --prefix=/gnu/store/7a6fnkqrxb0chmvj63f7ddr6wg3pq9g5-emacs-next-29.0.50-1.0a5477b
 --enable-fast-install --with-modules --with-cairo
 --disable-build-details'

[0001-Refactor-repunctuate-sentences-to-accommodate-corner.patch (text/patch, attachment)]
-- 
André A. Gomes
"You cannot even find the ruins..."

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Sun, 31 Jul 2022 08:35:01 GMT)

Message #8 received at 56844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: André A. Gomes <andremegafone <at> gmail.com>
Cc: 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Sun, 31 Jul 2022 10:34:09 +0200
André A. Gomes <andremegafone <at> gmail.com> writes:

> It now gracefully handles the case when abbreviations such as e.g. or
> i.e. are used in sentences.

[...]

> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")

I'm not quite sure I understand this patch.  Are you changing this to
only consider punctuation that's followed by an upper-case character to
be sentence-end punctuation?
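
For readers following along, here is a minimal sketch of what the quoted
regexp captures.  The names my-patched-regexp and my-sentence-end-groups
are hypothetical and not part of the patch.  The sketch binds
case-fold-search to nil on the assumption that the match is meant to be
case-sensitive: with case folding enabled, [:upper:] also matches
lower-case letters, so the extra group would not filter anything.

```elisp
(defconst my-patched-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)"
  "Regexp copied from the quoted patch line.")

(defun my-sentence-end-groups (string)
  "Return the four groups the patched regexp captures in STRING, or nil."
  ;; Assumption: case-sensitive matching is intended.  With a non-nil
  ;; `case-fold-search', [:upper:] would also match lower-case letters.
  (let ((case-fold-search nil))
    (when (string-match my-patched-regexp string)
      (list (match-string 1 string)     ; optional closer before punctuation
            (match-string 2 string)     ; the punctuation itself
            (match-string 3 string)     ; optional closer after punctuation
            (match-string 4 string))))) ; first character of next sentence

(my-sentence-end-groups "Just. Some.")                 ; => ("" "." "" "S")
(my-sentence-end-groups "Yet another, e.g. this one.") ; => nil
```

The second call returns nil because the only period followed by a space
is the one in "e.g.", and the character after that space is lower case.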





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Sun, 31 Jul 2022 20:12:03 GMT)

Message #11 received at 56844 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: "André A. Gomes" <andremegafone <at> gmail.com>,
 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Sun, 31 Jul 2022 22:49:33 +0300
>> It now gracefully handles the case when abbreviations such as e.g. or
>> i.e. are used in sentences.
>
> [...]
>
>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch.  Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?

It would be better to add such heuristics to repunctuate-sentences-filter,
so anyone could customize it.




Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 02 Aug 2022 10:55:01 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 11:42:02 GMT)

Message #16 received at 56844 <at> debbugs.gnu.org:

From: André A. Gomes <andremegafone <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 14:41:32 +0300
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch.  Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?

Yes.  The test portion of the patch illustrates the intent:

--8<---------------cut here---------------start------------->8---
 (ert-deftest paragraphs-tests-repunctuate-sentences ()
   (with-temp-buffer
-    (insert "Just. Some. Sentences.")
+    (insert "Just. Some. Sentences. Yet another, e.g. this one.")
     (goto-char (point-min))
     (repunctuate-sentences t)
-    (should (equal (buffer-string) "Just.  Some.  Sentences."))))
+    (should (equal (buffer-string)
+                   "Just.  Some.  Sentences.  Yet another, e.g. this one."))))
--8<---------------cut here---------------end--------------->8---

Thanks.
 

-- 
André A. Gomes
"You cannot even find the ruins..."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 11:44:02 GMT)

Message #19 received at 56844 <at> debbugs.gnu.org:

From: André A. Gomes <andremegafone <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 14:43:42 +0300
Juri Linkov <juri <at> linkov.net> writes:

>>> It now gracefully handles the case when abbreviations such as e.g. or
>>> i.e. are used in sentences.
>>
>> [...]
>>
>>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>
>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> It would be better to add such heuristics to repunctuate-sentences-filter,
> so anyone could customize it.

In general I'd agree with you, but this patch is actually fixing a bug,
not introducing a personal preference.  That's how I see it at least.


-- 
André A. Gomes
"You cannot even find the ruins..."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 11:46:02 GMT)

Message #22 received at 56844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: André A. Gomes <andremegafone <at> gmail.com>
Cc: 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 13:45:05 +0200
André A. Gomes <andremegafone <at> gmail.com> writes:

>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.

I don't think that'll be a generally welcome change -- some people write
using non-standard orthography.  If this change is to be made, it has to
be optional.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 12:11:02 GMT)

Message #25 received at 56844 <at> debbugs.gnu.org:

From: Robert Pluim <rpluim <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: "André A. Gomes" <andremegafone <at> gmail.com>,
 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 14:10:35 +0200
>>>>> On Tue, 02 Aug 2022 13:45:05 +0200, Lars Ingebrigtsen <larsi <at> gnus.org> said:

    Lars> André A. Gomes <andremegafone <at> gmail.com> writes:
    >>> I'm not quite sure I understand this patch.  Are you changing this to
    >>> only consider punctuation that's followed by an upper-case character to
    >>> be sentence-end punctuation?
    >> 
    >> Yes.

    Lars> I don't think that'll be a generally welcome change -- some people write
    Lars> using non-standard orthography.  If this change is to be made, it has to
    Lars> be optional.

It doesnʼt even have to be that non-standard.  Consider:

   De deur sloeg open.  de Valk stond in het licht.

Thatʼs pedantically incorrect, but there are many people (myself
included) who think that certain grammarians should keep quiet 😀

Robert
-- 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 12:36:02 GMT)

Message #28 received at 56844 <at> debbugs.gnu.org:

From: Stefan Kangas <stefankangas <at> gmail.com>
To: André A. Gomes <andremegafone <at> gmail.com>, 
 Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to accommodate
 corner case.
Date: Tue, 2 Aug 2022 12:35:35 +0000
André A. Gomes <andremegafone <at> gmail.com> writes:

>> I'm not quite sure I understand this patch.  Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.

FWIW, I would rather want to specify a list of ignored abbreviations
that I'd like to not consider ending a sentence.  This could include
standard US ones like "e.g.", "i.e.", etc. by default, but should be
customizable so I can add any localized equivalents.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 12:49:01 GMT)

Message #31 received at 56844 <at> debbugs.gnu.org:

From: Visuwesh <visuweshm <at> gmail.com>
To: André A. Gomes <andremegafone <at> gmail.com>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 56844 <at> debbugs.gnu.org,
 Juri Linkov <juri <at> linkov.net>
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 18:18:18 +0530
[Tuesday, August 02, 2022] André A. Gomes wrote:

> Juri Linkov <juri <at> linkov.net> writes:
>
>>>> It now gracefully handles the case when abbreviations such as e.g. or
>>>> i.e. are used in sentences.
>>>
>>> [...]
>>>
>>>> +        (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>>
>>> I'm not quite sure I understand this patch.  Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> It would be better to add such heuristics to repunctuate-sentences-filter,
>> so anyone could customize it.
>
> In general I'd agree with you, but this patch is actually fixing a bug,
> not introducing a personal preference.  That's how I see it at least.

This breaks repunctuate-sentences for languages that don't have the
concept of upper and lower case characters.  Try repunctuate-sentences
with and without your patch for the following text,

தொழிற்சாலை யந்திரங்கள் தேவையான மட்டும் அந்தத் தொழிலாளர்களது சக்தியை உறிஞ்சித்
தீர்த்துவிடுவதோடு அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும் இல்லாமல்
அன்றையப் பொழுது அழிந்து கழியும்; மனிதனும் தனது சவக்குழியை நோக்கி ஓரடி
முன்னேறிவிடுவான். ஆனால் இப்போதோ ஒய்வின்
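
A rough way to compare the two behaviours on text like the sample above,
and on the Dutch sentence quoted earlier in the thread, is to count the
sentence breaks each regexp finds.  This is only a sketch: the names
below are hypothetical, the "old" regexp is assumed to be the patched
one minus its trailing upper-case group (the thread does not show it),
and case-fold-search is bound to nil so that [:upper:] matches only
upper-case characters.

```elisp
;; Assumption: the unpatched regexp is the patched one without its
;; final group; the thread only shows the patched form.
(defconst my-old-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +")

(defconst my-new-regexp
  (concat my-old-regexp "\\([\"')[:upper:]]\\)"))

(defun my-count-breaks (regexp string)
  "Count non-overlapping matches of REGEXP in STRING, case-sensitively."
  (let ((case-fold-search nil)
        (start 0)
        (count 0))
    (while (string-match regexp string start)
      (setq count (1+ count)
            start (match-end 0)))
    count))

;; Fragment of the Tamil sample and the Dutch sentence from the thread.
(defconst my-tamil-sample "விழுங்கப்பட்டுவிடும். எந்தவிதமான")
(defconst my-dutch-sample "De deur sloeg open.  de Valk stond in het licht.")

(my-count-breaks my-old-regexp my-tamil-sample) ; => 1
(my-count-breaks my-new-regexp my-tamil-sample) ; => 0
(my-count-breaks my-old-regexp my-dutch-sample) ; => 1
(my-count-breaks my-new-regexp my-dutch-sample) ; => 0
```

Under these assumptions the patched regexp finds no sentence break in
either sample, because the character after the spaces is not upper case.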

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Tue, 02 Aug 2022 20:01:02 GMT)

Message #34 received at 56844 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: "André A. Gomes" <andremegafone <at> gmail.com>,
 Lars Ingebrigtsen <larsi <at> gnus.org>, 56844 <at> debbugs.gnu.org
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Tue, 02 Aug 2022 22:59:21 +0300
> FWIW, I would rather want to specify a list of ignored abbreviations
> that I'd like to not consider ending a sentence.  This could include
> standard US ones like "e.g.", "i.e.", etc. by default, but should be
> customizable so I can add any localized equivalents.

Please see an example in the docstring of the variable
'repunctuate-sentences-filter'.
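
As an illustration of the abbreviation-list idea quoted above, here is a
self-contained sketch.  It does not use or reproduce the Emacs
implementation: my-non-terminal-abbreviations, my-after-abbreviation-p
and my-repunctuate-string are hypothetical names, and the search regexp
is assumed to be the patched one without its trailing upper-case group.
How such a predicate would be attached to repunctuate-sentences-filter
is left to that variable's docstring.

```elisp
(defvar my-non-terminal-abbreviations '("e.g." "i.e.")
  "Hypothetical list of abbreviations that do not end a sentence.")

(defun my-after-abbreviation-p ()
  "Return non-nil if the spaces before point follow a listed abbreviation."
  ;; `looking-back' changes the match data, so protect the caller's match.
  (save-match-data
    (save-excursion
      (skip-chars-backward " ")
      (looking-back (regexp-opt my-non-terminal-abbreviations)
                    (line-beginning-position)))))

(defun my-repunctuate-string (string)
  "Return STRING with two spaces after sentence ends, skipping abbreviations."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; Assumed search regexp: punctuation, optional closers, then spaces.
    (while (re-search-forward
            "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +" nil t)
      (unless (my-after-abbreviation-p)
        (replace-match "\\1\\2\\3  " t)))
    (buffer-string)))

(my-repunctuate-string "Just. Some. Sentences. Yet another, e.g. this one.")
;; => "Just.  Some.  Sentences.  Yet another, e.g. this one."
```

Keeping the abbreviations in a variable means localized equivalents can
be added without touching the regexp, and text in scripts without
letter case is still repunctuated.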




Severity set to 'wishlist' from 'normal' Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Thu, 04 Aug 2022 13:58:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56844; Package emacs. (Fri, 02 Sep 2022 10:48:02 GMT)

Message #39 received at 56844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: André A. Gomes <andremegafone <at> gmail.com>,
 56844 <at> debbugs.gnu.org, Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#56844: [PATCH] Refactor repunctuate-sentences to
 accommodate corner case.
Date: Fri, 02 Sep 2022 12:47:29 +0200
Juri Linkov <juri <at> linkov.net> writes:

>> FWIW, I would rather want to specify a list of ignored abbreviations
>> that I'd like to not consider ending a sentence.  This could include
>> standard US ones like "e.g.", "i.e.", etc. by default, but should be
>> customizable so I can add any localized equivalents.
>
> Please see an example in the docstring of the variable
> 'repunctuate-sentences-filter'.

I think the conclusion here is that we don't want to change how
repunctuate-sentences works, so I'm closing this bug report.




bug closed, send any further explanations to 56844 <at> debbugs.gnu.org and André A. Gomes <andremegafone <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 02 Sep 2022 10:48:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 30 Sep 2022 11:24:08 GMT)

This bug report was last modified 2 years and 264 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.