GNU bug report logs -
#56844
[PATCH] Refactor repunctuate-sentences to accommodate corner case.
Reported by: André A. Gomes <andremegafone <at> gmail.com>
Date: Sat, 30 Jul 2022 18:07:02 UTC
Severity: wishlist
Tags: moreinfo, patch
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Sat, 30 Jul 2022 18:07:02 GMT)
Acknowledgement sent to André A. Gomes <andremegafone <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 30 Jul 2022 18:07:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Tags: patch
Hi Emacs,
Please find the patch below.
In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.30, cairo version 1.16.0)
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Guix System
Configured using:
'configure
CONFIG_SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
SHELL=/gnu/store/4y5m9lb8k3qkb1y9m02sw9w9a6hacd16-bash-minimal-5.1.8/bin/bash
--prefix=/gnu/store/7a6fnkqrxb0chmvj63f7ddr6wg3pq9g5-emacs-next-29.0.50-1.0a5477b
--enable-fast-install --with-modules --with-cairo
--disable-build-details'
[0001-Refactor-repunctuate-sentences-to-accommodate-corner.patch (text/patch, attachment)]
--
André A. Gomes
"You cannot even find the ruins..."
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Sun, 31 Jul 2022 08:35:01 GMT)
Message #8 received at 56844 <at> debbugs.gnu.org (full text, mbox):
André A. Gomes <andremegafone <at> gmail.com> writes:
> It now gracefully handles the case when abbreviations such as e.g. or
> i.e. are used in sentences.
[...]
> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
I'm not quite sure I understand this patch. Are you changing this to
only consider punctuation that's followed by an upper-case character to
be sentence-end punctuation?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Sun, 31 Jul 2022 20:12:03 GMT)
Message #11 received at 56844 <at> debbugs.gnu.org (full text, mbox):
>> It now gracefully handles the case when abbreviations such as e.g. or
>> i.e. are used in sentences.
>
> [...]
>
>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch. Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?
It would be better to add such heuristics to repunctuate-sentences-filter,
so anyone could customize it.
Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 02 Aug 2022 10:55:01 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 11:42:02 GMT)
Message #16 received at 56844 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>
> I'm not quite sure I understand this patch. Are you changing this to
> only consider punctuation that's followed by an upper-case character to
> be sentence-end punctuation?
Yes. The patch section relative to testing is illustrative:
--8<---------------cut here---------------start------------->8---
 (ert-deftest paragraphs-tests-repunctuate-sentences ()
   (with-temp-buffer
-    (insert "Just. Some. Sentences.")
+    (insert "Just. Some. Sentences. Yet another, e.g. this one.")
     (goto-char (point-min))
     (repunctuate-sentences t)
-    (should (equal (buffer-string) "Just.  Some.  Sentences."))))
+    (should (equal (buffer-string)
+                   "Just.  Some.  Sentences.  Yet another, e.g. this one."))))
--8<---------------cut here---------------end--------------->8---
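For readers who want to see the effect without applying the patch, here is a minimal sketch that runs two regexps over plain strings. It is not the patch itself: the `my/` names are hypothetical, the "current" regexp is assumed to be the patched one minus its final group, and `case-fold-search` is bound to nil because `[:upper:]` also matches lower-case letters when case folding is on.

```elisp
;; Illustration only; the my/ names are hypothetical and not part of Emacs.

(defconst my/current-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +"
  "Assumed unpatched regexp: sentence-end punctuation followed by spaces.")

(defconst my/patched-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)"
  "Regexp quoted in this thread: also requires a quote, paren or capital.")

(defun my/repunctuate-string (string regexp to-string)
  "Return STRING with every match of REGEXP replaced by TO-STRING.
Case folding is disabled so that [:upper:] matches capitals only."
  (let ((case-fold-search nil))
    ;; FIXEDCASE is t so the replacement text is inserted verbatim.
    (replace-regexp-in-string regexp to-string string t)))

(my/repunctuate-string "Just. Some. Sentences. Yet another, e.g. this one."
                       my/current-regexp "\\1\\2\\3  ")
;; => "Just.  Some.  Sentences.  Yet another, e.g.  this one."

(my/repunctuate-string "Just. Some. Sentences. Yet another, e.g. this one."
                       my/patched-regexp "\\1\\2\\3  \\4")
;; => "Just.  Some.  Sentences.  Yet another, e.g. this one."
```

The only difference between the two results is the spacing after "e.g.", which the second regexp leaves alone because the next character is a lower-case letter.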
Thanks.
--
André A. Gomes
"You cannot even find the ruins..."
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 11:44:02 GMT)
Message #19 received at 56844 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>>> It now gracefully handles the case when abbreviations such as e.g. or
>>> i.e. are used in sentences.
>>
>> [...]
>>
>>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>
>> I'm not quite sure I understand this patch. Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> It would be better to add such heuristics to repunctuate-sentences-filter,
> so anyone could customize it.
In general I'd agree with you, but this patch is actually fixing a bug,
not introducing a personal preference. That's how I see it at least.
--
André A. Gomes
"You cannot even find the ruins..."
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 11:46:02 GMT)
Message #22 received at 56844 <at> debbugs.gnu.org (full text, mbox):
André A. Gomes <andremegafone <at> gmail.com> writes:
>> I'm not quite sure I understand this patch. Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.
I don't think that'll be a generally welcome change -- some people write
using non-standard orthography. If this change is to be made, it has to
be optional.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 12:11:02 GMT)
Message #25 received at 56844 <at> debbugs.gnu.org (full text, mbox):
>>>>> On Tue, 02 Aug 2022 13:45:05 +0200, Lars Ingebrigtsen <larsi <at> gnus.org> said:
Lars> André A. Gomes <andremegafone <at> gmail.com> writes:
>>> I'm not quite sure I understand this patch. Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> Yes.
Lars> I don't think that'll be a generally welcome change -- some people write
Lars> using non-standard orthography. If this change is to be made, it has to
Lars> be optional.
It doesnʼt even have to be that non-standard. Consider.
De deur sloeg open. de Valk stond in het licht.
Thatʼs pedantically incorrect, but there are many people (myself
included) who think that certain grammarians should keep quiet 😀
Robert
--
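The Dutch sentence can be checked on a plain string. The sketch below applies the regexp from the patch with ordinary string functions; `my/upper-only-regexp` and `my/apply-upper-only` are hypothetical names, and `case-fold-search` is bound to nil on the assumption that the patch intends `[:upper:]` to match capitals only.

```elisp
;; Illustration only; the my/ names are hypothetical and not part of Emacs.

(defconst my/upper-only-regexp
  "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)"
  "Regexp from the patch: punctuation, spaces, then a quote, paren or capital.")

(defun my/apply-upper-only (string)
  "Return STRING with two spaces after each match of `my/upper-only-regexp'."
  (let ((case-fold-search nil))
    ;; FIXEDCASE is t so the replacement text is inserted verbatim.
    (replace-regexp-in-string my/upper-only-regexp "\\1\\2\\3  \\4" string t)))

;; Lower-case "de" after the period: no match, so the text is unchanged.
(my/apply-upper-only "De deur sloeg open. de Valk stond in het licht.")
;; => "De deur sloeg open. de Valk stond in het licht."

;; Capitalized "De": the period is treated as a sentence end.
(my/apply-upper-only "De deur sloeg open. De Valk stond in het licht.")
;; => "De deur sloeg open.  De Valk stond in het licht."
```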
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 12:36:02 GMT)
Message #28 received at 56844 <at> debbugs.gnu.org (full text, mbox):
André A. Gomes <andremegafone <at> gmail.com> writes:
>> I'm not quite sure I understand this patch. Are you changing this to
>> only consider punctuation that's followed by an upper-case character to
>> be sentence-end punctuation?
>
> Yes.
FWIW, I would rather want to specify a list of ignored abbreviations
that I'd like to not consider ending a sentence. This could include
standard US ones like "e.g.", "i.e.", etc. by default, but should be
customizable so I can add any localized equivalents.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 12:49:01 GMT)
Message #31 received at 56844 <at> debbugs.gnu.org (full text, mbox):
[Tuesday August 02, 2022] André A. Gomes wrote:
> Juri Linkov <juri <at> linkov.net> writes:
>
>>>> It now gracefully handles the case when abbreviations such as e.g. or
>>>> i.e. are used in sentences.
>>>
>>> [...]
>>>
>>>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>>
>>> I'm not quite sure I understand this patch. Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> It would be better to add such heuristics to repunctuate-sentences-filter,
>> so anyone could customize it.
>
> In general I'd agree with you, but this patch is actually fixing a bug,
> not introducing a personal preference. That's how I see it at least.
This breaks repunctuate-sentences for languages that don't have the
concept of upper and lower case characters. Try repunctuate-sentences
with and without your patch for the following text,
தொழிற்சாலை யந்திரங்கள் தேவையான மட்டும் அந்தத் தொழிலாளர்களது சக்தியை உறிஞ்சித்
தீர்த்துவிடுவதோடு அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும் இல்லாமல்
அன்றையப் பொழுது அழிந்து கழியும்; மனிதனும் தனது சவக்குழியை நோக்கி ஓரடி
முன்னேறிவிடுவான். ஆனால் இப்போதோ ஒய்வின்
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Tue, 02 Aug 2022 20:01:02 GMT)
Message #34 received at 56844 <at> debbugs.gnu.org (full text, mbox):
> FWIW, I would rather want to specify a list of ignored abbreviations
> that I'd like to not consider ending a sentence. This could include
> standard US ones like "e.g.", "i.e.", etc. by default, but should be
> customizable so I can add any localized equivalents.
Please see an example in the docstring of the variable
'repunctuate-sentences-filter'.
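A customizable abbreviation list could be layered on that variable roughly as follows. This is only a sketch under stated assumptions: `my/non-sentence-abbrevs` and `my/repunctuate-keep-match-p` are hypothetical names, the filter is assumed to be called with point at the end of the matched text, and a nil return is assumed to mean "skip this match". The `boundp` guard is there because the variable may not exist in older Emacs versions.

```elisp
;; Sketch only; the my/ names are hypothetical and not part of Emacs.

(defvar my/non-sentence-abbrevs '("e.g." "i.e.")
  "Abbreviations whose final period should not be treated as a sentence end.")

(defun my/repunctuate-keep-match-p (&rest _args)
  "Return nil when point is right after an abbreviation and its spaces.
Assumes the filter runs with point at the end of the matched text."
  (not (looking-back
        ;; PAREN is t so the alternatives are grouped before \" +\".
        (concat (regexp-opt my/non-sentence-abbrevs t) " +")
        (line-beginning-position))))

;; Assumption: the variable holds a predicate function, so `add-function'
;; can combine it with ours; a match is kept only if both return non-nil.
(when (boundp 'repunctuate-sentences-filter)
  (add-function :after-while repunctuate-sentences-filter
                #'my/repunctuate-keep-match-p))
```

Localized abbreviations could then be added with something like `(add-to-list 'my/non-sentence-abbrevs "z.B.")`, without touching the regexp used by `repunctuate-sentences`.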
Severity set to 'wishlist' from 'normal'. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Thu, 04 Aug 2022 13:58:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#56844; Package emacs. (Fri, 02 Sep 2022 10:48:02 GMT)
Message #39 received at 56844 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>> FWIW, I would rather want to specify a list of ignored abbreviations
>> that I'd like to not consider ending a sentence. This could include
>> standard US ones like "e.g.", "i.e.", etc. by default, but should be
>> customizable so I can add any localized equivalents.
>
> Please see an example in the docstring of the variable
> 'repunctuate-sentences-filter'.
I think the conclusion here is that we don't want to change how
repunctuate-sentences works here, so I'm closing this bug report.
bug closed, send any further explanations to 56844 <at> debbugs.gnu.org and André A. Gomes <andremegafone <at> gmail.com>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 02 Sep 2022 10:48:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 30 Sep 2022 11:24:08 GMT)
This bug report was last modified 2 years and 264 days ago.