GNU bug report logs -
#41518
Bug in od?
Reported by: Yuan Cao <yuancao85 <at> gmail.com>
Date: Mon, 25 May 2020 05:56:02 UTC
Severity: normal
Tags: notabug
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Mon, 25 May 2020 05:56:02 GMT)
Acknowledgement sent to Yuan Cao <yuancao85 <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 25 May 2020 05:56:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
I recently came across the following behavior.
When using "--traditional x2" or "-x" option, it seems the order of hex
code output for the characters is pairwise reversed (if that's the correct
way of describing it).
For example, using "od -cx" on a test file that contains "1234567890\n", you
get the following output:
0000000 1 2 3 4 5 6 7 8 9 0 \n
3231 3433 3635 3837 3039 000a
0000013
It seems like it should be the following instead:
0000000 1 2 3 4 5 6 7 8 9 0 \n
3132 3334 3536 3738 3930 0a00
0000013
The version involved is od in GNU coreutils 8.28.
Best Regards,
Yuan
Information forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Mon, 25 May 2020 10:49:02 GMT)
Message #8 received at 41518 <at> debbugs.gnu.org (full text, mbox):
tag 41518 notabug
close 41518
stop
response below...
On 25/05/2020 04:05, Yuan Cao wrote:
> Hello,
>
> I recently came across the following behavior.
>
> When using "--traditional x2" or "-x" option, it seems the order of hex
> code output for the characters is pairwise reversed (if that's the correct
> way of describing it).
>
> For example, using "od -cx" on a test file that contains "1234567890\n", you
> get the following output:
>
> 0000000 1 2 3 4 5 6 7 8 9 0 \n
> 3231 3433 3635 3837 3039 000a
> 0000013
>
> It seems like it should be the following instead:
>
> 0000000 1 2 3 4 5 6 7 8 9 0 \n
> 3132 3334 3536 3738 3930 0a00
> 0000013
>
> The version involved is od in GNU coreutils 8.28.
That's because you're on a little endian machine.
If you want to reorder as per a big endian machine you can:
od --endian=big -cx your_file
If you want to hexdump independently of endianness you can:
od -Ax -tx1z -v
cheers,
Pádraig
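The grouping described above can be modelled in a few lines. This is only a sketch of the idea, not od's implementation: `hex_words` is a hypothetical helper that reads 2-byte units in a chosen byte order, and it assumes a trailing odd byte is padded with a zero byte, which is consistent with the `000a` column in the report.

```python
import struct

def hex_words(data, byteorder="little"):
    """Rough model of `od -x`: format each 2-byte unit as 4 hex digits."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad a trailing odd byte with zero
    fmt = "<H" if byteorder == "little" else ">H"
    return [f"{struct.unpack_from(fmt, data, i)[0]:04x}"
            for i in range(0, len(data), 2)]

data = b"1234567890\n"
print(" ".join(hex_words(data, "little")))  # 3231 3433 3635 3837 3039 000a
print(" ".join(hex_words(data, "big")))     # 3132 3334 3536 3738 3930 0a00
```

The first line matches what the reporter saw on a little-endian machine; the second matches the output the reporter expected, which is what `--endian=big` asks for.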
Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 25 May 2020 10:49:02 GMT)
bug closed, send any further explanations to 41518 <at> debbugs.gnu.org and Yuan Cao <yuancao85 <at> gmail.com>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Mon, 25 May 2020 10:49:03 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Fri, 29 May 2020 05:21:01 GMT)
Message #15 received at 41518 <at> debbugs.gnu.org (full text, mbox):
A little more information.
Pádraig Brady wrote:
> Yuan Cao wrote:
> > I recently came across the following behavior.
> >
> > When using "--traditional x2" or "-x" option, it seems the order of hex
> > code output for the characters is pairwise reversed (if that's the correct
> > way of describing it).
‘-x’
Output as hexadecimal two-byte units. Equivalent to ‘-t x2’.
Outputs 16-bit integers in the *native byte order* of the machine.
Which may be either big-endian or little-endian depending on the
machine. Not portable. Depends upon the machine it is run upon.
> If you want to hexdump independently of endianness you can:
>
> od -Ax -tx1z -v
The -tx1 option above is portable because it outputs 1-byte units
instead of 2-byte units, so the output is independent of endianness.
This is the FAQ entry for this topic.
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
Bob
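To illustrate why 1-byte units carry no byte order, here is a small sketch. `hex_bytes` is a hypothetical helper that mimics the idea of `-tx1` (one hex value per byte, in file order); it is not taken from od itself.

```python
import struct

def hex_bytes(data):
    """Rough model of `od -t x1`: one 2-digit hex value per byte."""
    return [f"{b:02x}" for b in data]

data = b"1234567890\n"

# A 1-byte unit unpacks to the same value under either byte order.
little = [struct.unpack_from("<B", data, i)[0] for i in range(len(data))]
big = [struct.unpack_from(">B", data, i)[0] for i in range(len(data))]
print(little == big)              # True
print(" ".join(hex_bytes(data)))  # 31 32 33 34 35 36 37 38 39 30 0a
```

The hex values appear in the same order as the characters in the file, which is the behavior the original report was looking for.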
Information forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Fri, 29 May 2020 20:48:02 GMT)
Message #18 received at 41518 <at> debbugs.gnu.org (full text, mbox):
On Fri, May 29, 2020 at 1:20 AM Bob Proulx <bob <at> proulx.com> wrote:
> A little more information.
>
> Pádraig Brady wrote:
> > Yuan Cao wrote:
> > > I recently came across the following behavior.
> > >
> > > When using "--traditional x2" or "-x" option, it seems the order of hex
> > > code output for the characters is pairwise reversed (if that's the
> correct
> > > way of describing it).
>
> ‘-x’
> Output as hexadecimal two-byte units. Equivalent to ‘-t x2’.
>
> Outputs 16-bit integers in the *native byte order* of the machine.
> Which may be either big-endian or little-endian depending on the
> machine. Not portable. Depends upon the machine it is run upon.
>
> > If you want to hexdump independently of endianness you can:
> >
> > od -Ax -tx1z -v
>
> The -tx1 option above is portable because it outputs 1-byte units
> instead of 2-byte units, so the output is independent of endianness.
>
> This is the FAQ entry for this topic.
>
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Bob
>
Thanks for pointing me to this documentation.
It just feels strange because the order does not reflect the order of the
characters in the file.
I think it might have been useful to get the "by word" value of the file if
you are working with a binary file historically. One might have stored some
data as a list of shorts. Then, we can easily view the data using "od -x
data_file_name".
Since memory is so cheap now, people are probably just using chars
for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints
(shorts) before. In this case, the "by word" order does not seem to me to
be as useful and violates the principle of least astonishment needlessly.
It might be interesting to change the option to print values by double word
or quadword instead or add another option to let the users choose to print
by double word or quadword if they want.
Best Regards,
Yuan
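For what it is worth, od's -t option already accepts a size after the type letter (for example -t x4 or -t x8), so the unit width can be chosen per invocation. The sketch below shows how the same bytes would read as 2-, 4-, and 8-byte little-endian integers. `hex_units` is a hypothetical helper, and the zero padding of the final unit is an assumption made for the illustration.

```python
def hex_units(data, size, byteorder="little"):
    """Format data as hex integers of `size` bytes each."""
    data += b"\x00" * (-len(data) % size)  # assumption: zero-pad the tail
    width = 2 * size
    return [f"{int.from_bytes(data[i:i + size], byteorder):0{width}x}"
            for i in range(0, len(data), size)]

data = b"1234567890\n"
for size in (2, 4, 8):
    print(size, " ".join(hex_units(data, size)))
# 2 3231 3433 3635 3837 3039 000a
# 4 34333231 38373635 000a3039
# 8 3837363534333231 00000000000a3039
```

The wider the unit, the further the printed digits drift from the character order in the file on a little-endian machine.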
Information forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Fri, 29 May 2020 22:34:01 GMT)
Message #21 received at 41518 <at> debbugs.gnu.org (full text, mbox):
Yuan Cao wrote:
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Thanks for pointing me to this documentation.
>
> It just feels strange because the order does not reflect the order of the
> characters in the file.
It feels strange in the environment *today*. But in the 1970s, when
'od' was written, it was perfectly natural on the PDP-11 to print
out the native machine word in the *native word order* of the PDP-11.
During that time most software operated on the native architecture and
the idea of being portable to other systems was not yet common.
The PDP-11 is a 16-bit word machine. Therefore what you are seeing
with the 2-byte integer and the order it is printed is the order in
which it was printed on the PDP-11 system. It has remained unchanged
to the present day because it can't change without breaking all
historical use.
For anyone using od today the best replacement for -x is -tx1, which
prints bytes in a portable order. Whenever you think to type in -x use
-tx1 instead. This avoids breaking historical use and produces the
output that you are wanting.
> I think it might have been useful to get the "by word" value of the file if
> you are working with a binary file historically. One might have stored some
> data as a list of shorts. Then, we can easily view the data using "od -x
> data_file_name".
>
> Since memory is so cheap now, people are probably just using chars
> for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints
> (shorts) before. In this case, the "by word" order does not seem to me to
> be as useful and violates the principle of least astonishment needlessly.
But changing the meaning of a command's options is a hard problem and
cannot be done without breaking a lot of existing use. The better way
is not to try. The options to head and tail changed an eon ago and yet
just in the last week I ran across a posting where that option change
bit someone.
And since there is no need for any breaking change it is better not to
do it. Simply use the correct options for what you want. -tx1 in
this case.
> It might be interesting to change the option to print values by double word
> or quadword instead or add another option to let the users choose to print
> by double word or quadword if they want.
And a size of 16 bits was a good value for yesteryear. 32 bits
has been a good size for some years. Now 64 bits is the most common
size. The only way to win is not to play. Better to say the size
explicitly. And IMNHO the best size is 1 regardless of architecture.
od -Ax -tx1z -v
Each of those options has been added over the years and each changes
the behavior of the program. Each of those would be a breaking change
if it were made the default. Best to ask for what you want explicitly.
I strongly recommend https://www.ietf.org/rfc/ien/ien137.txt as
required reading.
Bob
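For reference, the options in that command are: -Ax prints offsets in hexadecimal, -tx1 prints 1-byte hex units, the z suffix appends the printable characters of each line, and -v stops od from collapsing repeated lines. The sketch below approximates that kind of dump. `hexdump` is a hypothetical helper, and its column spacing is not meant to reproduce od's exact layout.

```python
def hexdump(data, width=16):
    """Return lines of: hex offset, 1-byte hex values, printable text."""
    pad = width * 3 - 1  # room for `width` two-digit values and spaces
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:06x} {hexpart:<{pad}}  >{text}<")
    lines.append(f"{len(data):06x}")  # final offset, like od's last line
    return lines

for line in hexdump(b"1234567890\n"):
    print(line)
```

Every line is emitted, even if it repeats the previous one, which corresponds to asking for -v explicitly.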
Information forwarded to bug-coreutils <at> gnu.org: bug#41518; Package coreutils. (Sat, 30 May 2020 07:37:02 GMT)
Message #24 received at 41518 <at> debbugs.gnu.org (full text, mbox):
On May 29 2020, Yuan Cao wrote:
> It just feels strange because the order does not reflect the order of the
> characters in the file.
But that's not true. It reflects exactly how 2-byte numbers are stored
in memory on your system. If you want to make a connection with
characters, you need to think about UCS-2 characters.
Andreas.
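A short sketch of that point, assuming a little-endian layout: the two file bytes "1" and "2" are exactly how the 16-bit number 0x3231 is stored, and the same two bytes read as a single 16-bit character give code point U+3231. UTF-16LE is used here as a stand-in for UCS-2, which agrees with it for this code unit.

```python
import struct

pair = b"12"  # bytes 0x31 0x32, in file order

# Read the pair as one little-endian 16-bit integer.
value = struct.unpack("<H", pair)[0]
print(f"{value:04x}")  # 3231

# Storing that integer back produces the original byte order.
print(struct.pack("<H", 0x3231) == pair)  # True

# Read the same pair as one 16-bit character (stand-in for UCS-2).
char = pair.decode("utf-16-le")
print(f"U+{ord(char):04X}")  # U+3231
```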
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 27 Jun 2020 11:24:04 GMT)
This bug report was last modified 4 years and 359 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.