GNU bug report logs -
#78561
[PATCH] Add semantic linefeed support for paragraph filling
Reported by: Roi Martin <jroi.martin <at> gmail.com>
Date: Fri, 23 May 2025 09:59:02 UTC
Severity: normal
Tags: patch
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
My 2c here is that this seems rather basic: while the tests pass with
toy lorem-ipsum text, it fails completely when abbreviations are used.
For example, running M-x fill-paragraph-semlf over:
#+begin_example
Hi there Chris! I see you've got your M.D. now so I suppose I should
call you Dr. Yoo. I hear you're also a newly appointed U.S. Rep., what
is that like? Good I hope.
#+end_example
Gets split to:
#+begin_example
Hi there Chris!
I see you've got your M.D.
now so I suppose I should call you Dr.
Yoo.
I hear you're also a newly appointed U.S.
Rep., what is that like?
Good I hope.
#+end_example
This is not correct; one would expect:
#+begin_example
Hi there Chris!
I see you've got your M.D. now so I suppose I should call you Dr. Yoo.
I hear you're also a newly appointed U.S. Rep., what is that like?
Good I hope.
#+end_example
I admit the example given here is intentionally (somewhat) contrived,
but abbreviations like "Dr." are not uncommon, and double-spaced full
stops, which would probably alleviate the problem, are absent from the
majority of English text.
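To illustrate why double spacing would help, here is a rough Python
sketch (not the Emacs implementation; the function names and regexps
are my own assumptions). It contrasts a rule that ends a sentence at
any terminator followed by whitespace with one that requires at least
two spaces, which is roughly the convention that Emacs's
sentence-end-double-space expresses:

```python
import re

def split_single_space(text):
    """Hypothetical rule: '.', '?' or '!' followed by any whitespace ends a sentence."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

def split_double_space(text):
    """Hypothetical rule: '.', '?' or '!' must be followed by two or more spaces."""
    return [s for s in re.split(r"(?<=[.?!]) {2,}", text) if s]

# Note the two spaces after "Yoo." and the single space after "Dr."
sample = "I should call you Dr. Yoo.  Good I hope."

print(split_single_space(sample))
# ['I should call you Dr.', 'Yoo.', 'Good I hope.']
print(split_double_space(sample))
# ['I should call you Dr. Yoo.', 'Good I hope.']
```

With double-spaced input the abbreviation is no longer ambiguous, but
the rule does nothing for text written with single spaces throughout.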
I understand the work of finding a sentence here is done via
forward-sentence. Perhaps inspiration from this prior art could help
find the ends of sentences better:
(1) https://github.com/neurosnap/sentences
(2) https://github.com/diasks2/pragmatic_segmenter
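One simple idea, which is not necessarily what either project does, is
to keep a list of known abbreviations and refuse to end a sentence on
one. Below is a minimal Python sketch of that idea. The abbreviation
set and the function names are hypothetical and only cover the example
above; a real list would need to be much larger and probably
per-language.

```python
# Hypothetical, deliberately tiny abbreviation list for illustration only.
ABBREVIATIONS = {"Dr.", "M.D.", "U.S.", "Rep."}

def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, never ending one on a known abbreviation."""
    sentences, current = [], []
    for word in text.split():  # splits on any whitespace, including newlines
        current.append(word)
        # A word ends a sentence if it ends with a terminator and is
        # not itself a known abbreviation.
        if word.endswith((".", "?", "!")) and word not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without a terminator
        sentences.append(" ".join(current))
    return sentences

def fill_semantic(text):
    """Return text with one sentence per line."""
    return "\n".join(split_sentences(text))

paragraph = (
    "Hi there Chris! I see you've got your M.D. now so I suppose I should\n"
    "call you Dr. Yoo. I hear you're also a newly appointed U.S. Rep., what\n"
    "is that like? Good I hope."
)
print(fill_semantic(paragraph))
```

On the paragraph above this prints the four expected lines. The obvious
weakness is a sentence that really does end in an abbreviation ("I live
in the U.S. It is big."), which this sketch would not split.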
I haven't looked into the exact techniques used by those two projects,
and I am also unsure how "serious" an "issue" this is (hence just
referring to it as 2c), but I'd wager it would be an improvement to
split correctly (move forward by sentence) over natural language most
of the time.
In any case, still a good feature. One thing I had been planning to use
this kind of thing for is to semantically fill a large paragraph so
that I can more easily rewrite or rearrange thoughts (now that they are
just one line per sentence). Once that's done, join the lines back into
a paragraph and voilà.
/Jordan
This bug report was last modified 18 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.