GNU bug report logs - #56844
[PATCH] Refactor repunctuate-sentences to accommodate corner case.
Reported by: André A. Gomes <andremegafone <at> gmail.com>
Date: Sat, 30 Jul 2022 18:07:02 UTC
Severity: wishlist
Tags: moreinfo, patch
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #31 received at 56844 <at> debbugs.gnu.org:
[Tuesday, August 02, 2022] André A. Gomes wrote:
> Juri Linkov <juri <at> linkov.net> writes:
>
>>>> It now gracefully handles the case when abbreviations such as e.g. or
>>>> i.e. are used in sentences.
>>>
>>> [...]
>>>
>>>> + (regexp "\\([]\"')]?\\)\\([.?!]\\)\\([]\"')]?\\) +\\([\"')[:upper:]]\\)")
>>>
>>> I'm not quite sure I understand this patch. Are you changing this to
>>> only consider punctuation that's followed by an upper-case character to
>>> be sentence-end punctuation?
>>
>> It would be better to add such heuristics to repunctuate-sentences-filter,
>> so anyone could customize it.
>
> In general I'd agree with you, but this patch is actually fixing a bug,
> not introducing a personal preference. That's how I see it at least.
This breaks repunctuate-sentences for languages that have no notion of
upper- and lower-case characters. Try repunctuate-sentences with and
without your patch on the following text:
தொழிற்சாலை யந்திரங்கள் தேவையான மட்டும் அந்தத் தொழிலாளர்களது சக்தியை உறிஞ்சித்
தீர்த்துவிடுவதோடு அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும் இல்லாமல்
அன்றையப் பொழுது அழிந்து கழியும்; மனிதனும் தனது சவக்குழியை நோக்கி ஓரடி
முன்னேறிவிடுவான். ஆனால் இப்போதோ ஒய்வின்
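To make the trade-off concrete, here is a rough Python sketch. It is not Emacs code. It approximates the shape of the patched regexp.

- `sentence_breaks` is a hypothetical helper, not part of Emacs.
- Python's `re` has no POSIX `[:upper:]` class, so `str.isupper()` stands in for Emacs's case-table-based test.

With the uppercase requirement, the full stop before a Tamil word is never treated as a sentence end, because Tamil letters are uncased. Without the requirement, the "e.g." in an English sentence is treated as a sentence end.

```python
import re

# Optional closing bracket/quote/paren, sentence-end punctuation, optional
# closing bracket/quote/paren, one or more spaces, then the next character.
# This mirrors the shape of the patched regexp; the [:upper:] part is
# checked in Python below, since the re module lacks POSIX classes.
SENTENCE_END = re.compile(r"([\]\"')]?)([.?!])([\]\"')]?) +(.)")


def sentence_breaks(text, require_upper):
    """Hypothetical helper: offsets of punctuation treated as sentence ends."""
    breaks = []
    for m in SENTENCE_END.finditer(text):
        nxt = m.group(4)
        # Approximation of [\"')[:upper:]]: str.isupper() is False for
        # uncased scripts such as Tamil, so those matches are dropped.
        if require_upper and not (nxt in "\"')" or nxt.isupper()):
            continue
        breaks.append(m.start(2))
    return breaks


tamil = "அந்த நாள் விழுங்கப்பட்டுவிடும். எந்தவிதமான எச்சமிச்சங்களும்"
english = "Use a tool, e.g. grep for this."

print(sentence_breaks(tamil, require_upper=True))     # → []
print(sentence_breaks(english, require_upper=True))   # → []
```

The uppercase heuristic skips the "e.g." match, but it also skips the genuine sentence end in the Tamil text. Keeping this kind of check in a user-customizable filter, rather than in the regexp itself, would avoid imposing it on every script.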
This bug report was last modified 2 years and 264 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.