GNU bug report logs -
#11187
multibyte: fmt: Fix incorrect width handling of multibyte
Previous Next
To reply to this bug, email your comments to 11187 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#11187
; Package
coreutils
.
(Thu, 05 Apr 2012 18:23:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Vladimir 'φ-coder/phcoder' Serbinenko <phcoder <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Thu, 05 Apr 2012 18:23:01 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Currently fmt assumes that 1 byte= 1 column which creates wrongly
formatted strings. Attached patch fixes it
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
[fmt_width.diff (text/x-diff, attachment)]
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11187
; Package
coreutils
.
(Thu, 05 Apr 2012 19:10:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 11187 <at> debbugs.gnu.org (full text, mbox):
Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
> Currently fmt assumes that 1 byte= 1 column which creates wrongly
> formatted strings. Attached patch fixes it
Hi Vlad,
Thank you for contributing.
This is a large enough change that we'll need an FSF copyright
assignment from you. If you haven't already sent in the one for
gnulib, please just add coreutils to the list of affected projects.
(you can do up to 4 projects at a time)
Here are a few suggested adjustments:
It'd be great to have a test-suite addition that fails without
your patch, yet succeeds with it. If you simply provide small
sample input/output pairs along with a selected locale name, we
can convert that to an actual test suite script for you.
Also, since this is a NEWS-worthy change, it is customary to
add an entry in the NEWS file, too. I'd put this in a section
entitled "Improvements".
> diff --git a/src/fmt.c b/src/fmt.c
> index 89d13a6..56f7c0b 100644
> --- a/src/fmt.c
> +++ b/src/fmt.c
> @@ -20,6 +20,7 @@
> #include <stdio.h>
> #include <sys/types.h>
> #include <getopt.h>
> +#include <wchar.h>
>
> /* Redefine. Otherwise, systems (Unicos for one) with headers that define
> it to be a type get syntax errors for the variable declaration below. */
> @@ -135,6 +136,7 @@ struct Word
>
> const char *text; /* the text of the word */
> int length; /* length of this word */
> + int width;
Please don't follow the bad example there. This value is always
unsigned, so it is better to use size_t, to match the type of your
new get_display_width function.
> int space; /* the size of the following space */
> unsigned int paren:1; /* starts with open paren */
> unsigned int period:1; /* ends in [.?!])* */
> @@ -259,6 +261,42 @@ static int next_prefix_indent;
> paragraphs chosen by fmt_paragraph(). */
> static int last_line_length;
>
Please add a comment saying what the function does/returns,
and naming/describing the arguments.
> +static size_t
> +get_display_width (const char *beg, const char *end)
> +{
> + const char *ptr;
> + size_t r = 0;
> + mbstate_t ps;
> +
> + memset (&ps, 0, sizeof (ps));
We prefer to initialize mbstate_t variables all on one line, like this:
mbstate_t ps = { 0, };
> + for (ptr = beg; *ptr && ptr < end; )
> + {
> + wchar_t wc;
> + size_t s;
Oops. You've used TABs for indentation.
Note how mixing TABs and spaces makes the indentation look invalid.
Please use only spaces instead.
> + s = mbrtowc (&wc, ptr, end - ptr, &ps);
> + if (s == (size_t) -1)
> + break;
> + if (s == (size_t) -2)
> + {
> + ptr++;
> + r++;
> + continue;
> + }
> + if (wc == '\e' && ptr + 3 < end
> + && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1')
> + && ptr[3] == 'm')
> + {
> + ptr += 4;
> + continue;
> + }
> + r += wcwidth (wc);
> + ptr += s;
> + }
> + return r;
> +}
> +
> void
> usage (int status)
> {
> @@ -669,7 +707,9 @@ get_line (FILE *f, int c)
> c = getc (f);
> }
> while (c != EOF && !isspace (c));
> - in_column += word_limit->length = wptr - word_limit->text;
> + word_limit->length = wptr - word_limit->text;
> + in_column += word_limit->width = get_display_width (word_limit->text,
> + wptr);
> check_punctuation (word_limit);
>
> /* Scan inter-word space. */
> @@ -871,13 +911,13 @@ fmt_paragraph (void)
> if (w == word_limit)
> break;
>
> - len += (w - 1)->space + w->length; /* w > start >= word */
> + len += (w - 1)->space + w->width; /* w > start >= word */
> }
> while (len < max_width);
> start->best_cost = best + base_cost (start);
> }
>
> - word_limit->length = saved_length;
> + word_limit->width = saved_length;
> }
>
> /* Return the constant component of the cost of breaking before the
> @@ -902,13 +942,13 @@ base_cost (WORD *this)
> else if ((this - 1)->punct)
> cost -= PUNCT_BONUS;
> else if (this > word + 1 && (this - 2)->final)
> - cost += WIDOW_COST ((this - 1)->length);
> + cost += WIDOW_COST ((this - 1)->width);
> }
>
> if (this->paren)
> cost -= PAREN_BONUS;
> else if (this->final)
> - cost += ORPHAN_COST (this->length);
> + cost += ORPHAN_COST (this->width);
>
> return cost;
> }
> @@ -983,7 +1023,7 @@ put_word (WORD *w)
> s = w->text;
> for (n = w->length; n != 0; n--)
> putchar (*s++);
> - out_column += w->length;
> + out_column += w->width;
> }
>
> /* Output to stdout SPACE spaces, or equivalent tabs. */
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11187
; Package
coreutils
.
(Thu, 05 Apr 2012 23:02:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 11187 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 05.04.2012 21:09, Jim Meyering wrote:
> Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
>> Currently fmt assumes that 1 byte= 1 column which creates wrongly
>> formatted strings. Attached patch fixes it
> Hi Vlad,
>
> Thank you for contributing.
> This is a large enough change that we'll need an FSF copyright
> assignment from you. If you haven't already sent in the one for
> gnulib, please just add coreutils to the list of affected projects.
> (you can do up to 4 projects at a time)
Ok, will do so. I'll also wait till more or less definitive version is
ready for gnulib before updating the one for coreutils.
Can I add TP in the same time?
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#11187
; Package
coreutils
.
(Fri, 06 Apr 2012 06:36:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 11187 <at> debbugs.gnu.org (full text, mbox):
Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> On 05.04.2012 21:09, Jim Meyering wrote:
>> Vladimir "'φ-coder/phcoder'" Serbinenko wrote:
>>> Currently fmt assumes that 1 byte= 1 column which creates wrongly
>>> formatted strings. Attached patch fixes it
>> Hi Vlad,
>>
>> Thank you for contributing.
>> This is a large enough change that we'll need an FSF copyright
>> assignment from you. If you haven't already sent in the one for
>> gnulib, please just add coreutils to the list of affected projects.
>> (you can do up to 4 projects at a time)
> Ok, will do so. I'll also wait till more or less definitive version is
> ready for gnulib before updating the one for coreutils.
> Can I add TP in the same time?
Translation Project? I don't know if they use the same forms/addresses.
Please ask coordinator <at> translationproject.org to be sure.
Severity set to 'wishlist' from 'normal'
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Fri, 19 Oct 2018 01:19:01 GMT)
Full text and
rfc822 format available.
Changed bug title to 'multibyte: fmt: Fix incorrect width handling of multibyte' from '[PATCH] Fix incorrect width handling of multibyte characters in fmt'
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Fri, 19 Oct 2018 01:19:01 GMT)
Full text and
rfc822 format available.
This bug report was last modified 6 years and 325 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.