GNU bug report logs - #41518
Bug in od?

Reported by: Yuan Cao <yuancao85 <at> gmail.com>

Date: Mon, 25 May 2020 05:56:02 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41518 in the body.
You can then email your comments to 41518 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Mon, 25 May 2020 05:56:02 GMT)

Acknowledgement sent to Yuan Cao <yuancao85 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 25 May 2020 05:56:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Yuan Cao <yuancao85 <at> gmail.com>
To: coreutils <at> gnu.org, bug-coreutils <at> gnu.org
Subject: Bug in od?
Date: Sun, 24 May 2020 23:05:16 -0400

Hello,

I recently came across the following behavior.

When using the "-t x2" (--format=x2) or "-x" option, it seems the order of
hex code output for the characters is pairwise reversed (if that's the
correct way of describing it).

For example, using "od -cx" on a test file that contains "1234567890\n", you
get the following output:

0000000   1   2   3   4   5   6   7   8   9   0  \n
                 3231  3433  3635  3837  3039  000a
0000013

It seems like it should be the following instead:

0000000   1   2   3   4   5   6   7   8   9   0  \n
                 3132  3334  3536  3738  3930  0a00
0000013

The version involved is od in GNU coreutils 8.28.

Best Regards,

Yuan

Information forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Mon, 25 May 2020 10:49:02 GMT)

Message #8 received at 41518 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Yuan Cao <yuancao85 <at> gmail.com>, coreutils <at> gnu.org, 41518 <at> debbugs.gnu.org
Subject: Re: bug#41518: Bug in od?
Date: Mon, 25 May 2020 11:48:20 +0100

tag 41518 notabug
close 41518
stop

response below...

On 25/05/2020 04:05, Yuan Cao wrote:
> Hello,
> 
> I recently came across the following behavior.
> 
> When using the "-t x2" (--format=x2) or "-x" option, it seems the order of
> hex code output for the characters is pairwise reversed (if that's the
> correct way of describing it).
> 
> For example, using "od -cx" on a test file that contains "1234567890\n", you
> get the following output:
> 
> 0000000   1   2   3   4   5   6   7   8   9   0  \n
>                   3231  3433  3635  3837  3039  000a
> 0000013
> 
> It seems like it should be the following instead:
> 
> 0000000   1   2   3   4   5   6   7   8   9   0  \n
>                   3132  3334  3536  3738  3930  0a00
> 0000013
> 
> The version involved is od in GNU coreutils 8.28.

That's because you're on a little endian machine.
If you want to reorder as per a big endian machine you can:

  od --endian=big -cx your_file

If you want to hexdump independently of endianness you can:

  od -Ax -tx1z -v
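
As a rough illustration of the two suggestions above, the sketch below
runs the reporter's sample bytes through od three ways.  It assumes a
GNU od recent enough to accept --endian; the function names are made up
for this example and are not part of coreutils.

```shell
# Sample input from the report: ten digits and a newline (11 bytes).
sample() { printf '1234567890\n'; }

# 2-byte units read as little-endian: each pair of bytes appears swapped.
dump_le() { sample | od -An --endian=little -tx2; }

# 2-byte units read as big-endian: pairs appear in file order.
dump_be() { sample | od -An --endian=big -tx2; }

# 1-byte units: no byte order is involved, so every machine agrees.
dump_bytes() { sample | od -An -tx1; }

dump_le
dump_be
dump_bytes
```

Pinning the byte order with --endian makes the 2-byte output the same on
any host, while -tx1 sidesteps the question entirely.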

cheers,
Pádraig

Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 25 May 2020 10:49:02 GMT)

bug closed, send any further explanations to 41518 <at> debbugs.gnu.org and Yuan Cao <yuancao85 <at> gmail.com>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 25 May 2020 10:49:03 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Fri, 29 May 2020 05:21:01 GMT)

Message #15 received at 41518 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Yuan Cao <yuancao85 <at> gmail.com>
Cc: 41518 <at> debbugs.gnu.org
Subject: Re: bug#41518: Bug in od?
Date: Thu, 28 May 2020 23:20:41 -0600

A little more information.

Pádraig Brady wrote:
> Yuan Cao wrote:
> > I recently came across the following behavior.
> > 
> > When using the "-t x2" (--format=x2) or "-x" option, it seems the order of
> > hex code output for the characters is pairwise reversed (if that's the
> > correct way of describing it).

‘-x’
     Output as hexadecimal two-byte units.  Equivalent to ‘-t x2’.

This outputs 16-bit integers in the *native byte order* of the machine,
which may be either big-endian or little-endian.  The output is
therefore not portable: it depends on the machine it is run on.

> If you want to hexdump independently of endianness you can:
> 
>   od -Ax -tx1z -v

The -tx1 option above is portable because it outputs 1-byte units
instead of 2-byte units, and single bytes are independent of endianness.

This is the FAQ entry for this topic.

  https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Fri, 29 May 2020 20:48:02 GMT)

Message #18 received at 41518 <at> debbugs.gnu.org:

From: Yuan Cao <yuancao85 <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 41518 <at> debbugs.gnu.org
Subject: Re: bug#41518: Bug in od?
Date: Fri, 29 May 2020 16:47:17 -0400

On Fri, May 29, 2020 at 1:20 AM Bob Proulx <bob <at> proulx.com> wrote:

> A little more information.
>
> Pádraig Brady wrote:
> > Yuan Cao wrote:
> > > I recently came across the following behavior.
> > >
> > > When using the "-t x2" (--format=x2) or "-x" option, it seems the order
> > > of hex code output for the characters is pairwise reversed (if that's
> > > the correct way of describing it).
>
> ‘-x’
>      Output as hexadecimal two-byte units.  Equivalent to ‘-t x2’.
>
> This outputs 16-bit integers in the *native byte order* of the machine,
> which may be either big-endian or little-endian.  The output is
> therefore not portable: it depends on the machine it is run on.
>
> > If you want to hexdump independently of endianness you can:
> >
> >   od -Ax -tx1z -v
>
> The -tx1 option above is portable because it outputs 1-byte units
> instead of 2-byte units, and single bytes are independent of endianness.
>
> This is the FAQ entry for this topic.
>
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Bob
>

Thanks for pointing me to this documentation.

It just feels strange because the order does not reflect the order of the
characters in the file.

Historically, I think it might have been useful to get the "by word"
values of a binary file. One might have stored some data as a list of
shorts, which could then be viewed easily using "od -x data_file_name".

Since memory is so cheap now, people are probably just using chars for
text, and 4-byte or 8-byte ints where they used to use 2-byte ints
(shorts) before. In this case, the "by word" order does not seem to me to
be as useful and violates the principle of least astonishment needlessly.

It might be interesting to change the option to print values by double word
or quadword instead or add another option to let the users choose to print
by double word or quadword if they want.
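
For reference, the -t option of GNU od already takes an explicit unit
size, so 4-byte and 8-byte units can be requested directly.  A minimal
sketch, assuming a GNU od that accepts --endian (used here only to make
the output independent of the host); the function names are
hypothetical.

```shell
# Eight bytes, 0x01 through 0x08, written with octal escapes.
eight_bytes() { printf '\001\002\003\004\005\006\007\010'; }

# 4-byte (double word) units, interpreted as little-endian.
dump_x4() { eight_bytes | od -An --endian=little -tx4; }

# 8-byte (quadword) units, interpreted as little-endian.
dump_x8() { eight_bytes | od -An --endian=little -tx8; }

dump_x4
dump_x8
```

With larger units the reversal within each unit only becomes more
visible, which is one reason the replies in this thread favour -tx1.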

Best Regards,

Yuan

Information forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Fri, 29 May 2020 22:34:01 GMT)

Message #21 received at 41518 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Yuan Cao <yuancao85 <at> gmail.com>
Cc: 41518 <at> debbugs.gnu.org
Subject: Re: bug#41518: Bug in od?
Date: Fri, 29 May 2020 16:32:50 -0600

Yuan Cao wrote:
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
> 
> Thanks for pointing me to this documentation.
> 
> It just feels strange because the order does not reflect the order of the
> characters in the file.

It feels strange in the environment *today*.  But in the 1970s, when
'od' was written, it was perfectly natural on the PDP-11 to print
out the native machine word in the *native word order* of the PDP-11.
During that time most software operated on the native architecture and
the idea of being portable to other systems was not yet common.

The PDP-11 is a 16-bit word machine.  Therefore what you are seeing
with the 2-byte integer and the order it is printed is the order that
it was printed on the PDP-11 system.  It has remained unchanged to
the present day because it can't change without breaking all
historical use.

For anyone using od today the best replacement for -x is -tx1, which
prints bytes in a portable order.  Whenever you think to type in -x use
-tx1 instead.  This avoids breaking historical use and produces the
output that you are wanting.

> Historically, I think it might have been useful to get the "by word"
> values of a binary file. One might have stored some data as a list of
> shorts, which could then be viewed easily using "od -x data_file_name".
> 
> Since memory is so cheap now, people are probably just using chars for
> text, and 4-byte or 8-byte ints where they used to use 2-byte ints
> (shorts) before. In this case, the "by word" order does not seem to me to
> be as useful and violates the principle of least astonishment needlessly.

But changing the meaning of a command's options is a hard problem and
cannot be done without breaking a lot of existing use.  The better way
is not to try.  The options to head and tail changed an eon ago and yet
just in the last week I ran across a posting where that change had
bitten someone.

And since there is no need for any breaking change it is better not to
do it.  Simply use the correct options for what you want.  -tx1 in
this case.

> It might be interesting to change the option to print values by double word
> or quadword instead or add another option to let the users choose to print
> by double word or quadword if they want.

And the size of 16 bits was a good value for yesteryear.  32 bits
has been a good size for some years.  Now 64 bits is the most common
size.  The only way to win is not to play.  Better to say the size
explicitly.  And IMNHO the best size is 1 regardless of architecture.

  od -Ax -tx1z -v

Each of those options has been added over the years and each changes
the behavior of the program.  Each of those would be a breaking change
if it were made the default.  Best to ask for what you want explicitly.
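
To make the effect of each option concrete, here is a small sketch,
assuming GNU od: -Ax prints offsets in hexadecimal, -tx1 selects 1-byte
hex units, the z suffix appends the printable characters of each line,
and -v stops od from collapsing repeated lines into a single "*".  The
helper name is made up for the example.

```shell
# 48 zero bytes: three identical 16-byte lines of output.
zeros() { head -c 48 /dev/zero; }

# Without -v, the repeated lines are collapsed into a single "*".
zeros | od -Ax -tx1

# With -v, every line is printed, which is easier to post-process.
zeros | od -Ax -tx1 -v

# The z suffix shows the printable characters next to the hex bytes.
printf 'hello\n' | od -Ax -tx1z -v
```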

I strongly recommend https://www.ietf.org/rfc/ien/ien137.txt as
required reading.

Bob

Information forwarded to bug-coreutils <at> gnu.org:
bug#41518; Package coreutils. (Sat, 30 May 2020 07:37:02 GMT)

Message #24 received at 41518 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Yuan Cao <yuancao85 <at> gmail.com>
Cc: 41518 <at> debbugs.gnu.org, Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#41518: Bug in od?
Date: Sat, 30 May 2020 09:36:10 +0200

On May 29 2020, Yuan Cao wrote:

> It just feels strange because the order does not reflect the order of the
> characters in the file.

But that's not true.  It reflects exactly how 2-byte numbers are stored
in memory on your system.  If you want to make a connection with
characters, you need to think about UCS-2 characters.
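
One way to picture this point: if each character is stored as a 2-byte
little-endian code unit, as in UCS-2LE, then reading the data as 16-bit
units gives the characters back in order.  A sketch, assuming a GNU od
that accepts --endian; the bytes are written by hand with printf rather
than produced by a conversion tool, and the function name is
hypothetical.

```shell
# "12" as two little-endian 16-bit code units: 31 00 32 00.
ucs2le_12() { printf '\061\000\062\000'; }

# Byte view: each low byte is followed by its zero high byte.
ucs2le_12 | od -An -tx1

# 16-bit view with matching byte order: one value per character, in order.
ucs2le_12 | od -An --endian=little -tx2
```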

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 27 Jun 2020 11:24:04 GMT)

This bug report was last modified 5 years and 49 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
