GNU bug report logs - #50701
cannot append or insert to empty file or stream


Package: sed

Reported by: lexi hale <lexi <at> hale.su>

Date: Mon, 20 Sep 2021 14:58:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-sed <at> gnu.org:
bug#50701; Package sed. (Mon, 20 Sep 2021 14:58:02 GMT)

Acknowledgement sent to lexi hale <lexi <at> hale.su>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Mon, 20 Sep 2021 14:58:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: lexi hale <lexi <at> hale.su>
To: bug-sed <at> gnu.org
Subject: cannot append or insert to empty file or stream
Date: Mon, 20 Sep 2021 10:53:48 +0200
there is an inconsistency in the handling of the a and i commands in
sed. if the input stream or file immediately yields EOL, sed ignores
the commands and produces an empty stream.

for instance:

	$ echo -n >file line-1
	$ sed -i aline-2 file
	$ cat file

correctly yields

	line-1
	line-2

however,

	$ touch empty
	$ sed -i aline-1 empty
	$ cat empty

prints nothing,  instead of the expected result "line-1".

this is an extremely surprising behavior that limits the utility of sed
when one cannot predict the contents of the file in question.  if it is
strictly necessary for standards conformance or to meet the
expectations of ancient shell scripts that depend on nonstandard
behavior, it would be helpful to at least have a flag that would turn
on consistent behavior for these commands.

however, making the consistent behavior the new default might also fix
a few extremely rare bugs in existing shell scripts :)

thanks for your time,
lexi




Information forwarded to bug-sed <at> gnu.org:
bug#50701; Package sed. (Mon, 20 Sep 2021 19:29:02 GMT)

Message #8 received at 50701 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: lexi hale <lexi <at> hale.su>, 50701 <at> debbugs.gnu.org
Subject: Re: bug#50701: cannot append or insert to empty file or stream
Date: Mon, 20 Sep 2021 13:28:42 -0600
tag 50701 notabug
close 50701
stop

Hello,

Thank you for providing a clear, reproducible example of the situation -
it makes troubleshooting much easier.

This is, however, not a bug - but the intended behavior:

On 2021-09-20 2:53 a.m., lexi hale wrote:
> there is an inconsistency in the handling of the a and i commands in
> sed. if the input stream or file immediately yields EOL, sed ignores
> the commands and produces an empty stream.
> 

First, let's clarify what the files are:

> 	$ echo -n >file line-1
> 	$ touch empty

The file 'file' is not empty, it has one line.
That line contains the text "line-1" but is not terminated by a newline
character (a consequence of 'echo -n').

The file 'empty' is empty, it contains NO lines.

The above might seem obvious,
but the distinction is important for the next step:

> 	$ sed -i aline-2 file

The sed command "aline-2" means: append ("a") the text "line-2"
after *every* line.
The "every line" part is due to the command "a" having no address
(contrast with the sed command "3aline-2", which would add "line-2" only
after the third line).

If the input file has (any) lines, the command will be executed.
If the input does not have any lines, the command will not be executed.
And that is the behavior you are seeing.
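
One way to see how many lines (and therefore how many command cycles) sed
finds in a given input is the '$=' idiom, which prints the line number of
the last line read. The following is only a minimal sketch; the helper name
count_cycles is made up for illustration and is not part of sed:

```shell
#!/bin/sh
# count_cycles (hypothetical helper): print the number of the last line
# sed reads from stdin, or nothing at all if sed reads no lines.
count_cycles() {
    sed -n '$='
}

two=$(printf 'hello\nworld\n' | count_cycles)   # two complete lines
one=$(printf 'line-1' | count_cycles)           # one line, no trailing newline
none=$(count_cycles < /dev/null)                # empty input: no lines

echo "two-line input: '$two'"
echo "unterminated:   '$one'"
echo "empty input:    '$none'"
```

With GNU sed this should report 2, 1 and an empty string: the unterminated
input still counts as one line, while the empty input gives sed no line on
which to run any command.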

> however,
>
> 	$ sed -i aline-1 empty
> 	$ cat empty
>
> prints nothing,  instead of the expected result "line-1".

Compare the situation to this (slightly more obvious) case:

    $ printf "%s\n" hello world > in1
    $ cat in1
    hello
    world

Add 'FOO' after every line:

    $ sed 'aFOO' in1
    hello
    FOO
    world
    FOO

Add 'FOO' after the second line:

    $ sed '2aFOO' in1
    hello
    world
    FOO

Add 'FOO' after the third line:

    $ sed '3aFOO' in1
    hello
    world

Since there is NO third line in the input, the command wasn't executed.

And similarly, since the 'empty' file does not have ANY lines,
the command (which is programmed to run on every line) is not executed.

> this is an extremely surprising behavior that limits the utility of sed
> when one cannot predict the contents of the file in question.  

I hope the explanation above makes this behavior less surprising.

> [...] it would be helpful to at least have a flag that would turn
> on consistent behavior for these commands.
> 
> however, making the consistent behavior the new default might also fix
> a few extremely rare bugs in existing shell scripts :)

I humbly think that this behavior is consistent and does not require any
modification or flags - but other opinions are welcome.

When integrating with shell scripts, edge-cases (e.g. empty input)
should probably be checked explicitly.

A few suggestions for checking for empty files (contrived and not tested):

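    if test -s INPUTFILE ; then
       :  # placeholder: process the file here
    else
       echo "FILE IS EMPTY"
    fi

    ---

    # note: 'wc -l' counts newline characters, so a file whose only
    # line lacks a trailing newline is reported as having 0 lines
    lines=$(wc -l < INPUTFILE)
    if test "$lines" -gt 0 ; then
       :  # placeholder: process the file here
    else
       echo "FILE IS EMPTY"
    fi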

And lastly, since you are already using GNU sed extensions
(and not worrying about portability), you can use
the 'qNUM' command extension to exit with a specific code
if there was any input:

    sed -e 'aline-1'  -e '$q42' INPUTFILE > OUTPUTFILE
    exit_code=$?
    if test "$exit_code" -eq 42 ; then
       :  # file processed OK, wasn't empty
    elif test "$exit_code" -eq 0 ; then
       :  # file was empty
    else
       :  # another sed error (e.g. bad program, I/O error, etc)
    fi
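
If the goal is simply to end up with the text in the file whether or not
the file had any lines, one option is to branch on 'test -s' and write the
text directly in the empty case. The following is a rough sketch only,
assuming GNU sed (for '-i'); the function name append_or_seed is made up
for illustration, and the text is assumed to contain no backslashes or
newlines:

```shell
#!/bin/sh
# append_or_seed TEXT FILE  (hypothetical helper, not part of sed)
# Non-empty file: append TEXT after every line, like 'sed -i aTEXT FILE'.
# Empty or missing file: there is no line to append to, so write TEXT
# as the only line instead.
append_or_seed() {
    text=$1
    file=$2
    if test -s "$file"; then
        # 'a\' followed by a newline and the text is the classic form
        # of the append command.
        sed -i -e "a\\
$text" "$file"
    else
        printf '%s\n' "$text" > "$file"
    fi
}

tmp=$(mktemp -d)
: > "$tmp/empty"                 # zero bytes, no lines
printf 'line-1' > "$tmp/file"    # one line without a trailing newline

append_or_seed line-1 "$tmp/empty"
append_or_seed line-2 "$tmp/file"

result_empty=$(cat "$tmp/empty")
result_file=$(cat "$tmp/file")
rm -rf "$tmp"

printf 'empty now holds:\n%s\n' "$result_empty"
printf 'file now holds:\n%s\n' "$result_file"
```

Whether writing the text into an empty file is the right fallback depends
on the script; the point is only that the empty case is handled explicitly
rather than left to sed.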


As such, I'm marking this as "not a bug", but discussion can continue
by replying to this thread.

regards,
 - assaf







Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 20 Sep 2021 19:29:02 GMT)

bug closed, send any further explanations to 50701 <at> debbugs.gnu.org and lexi hale <lexi <at> hale.su>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 20 Sep 2021 19:29:02 GMT)

Information forwarded to bug-sed <at> gnu.org:
bug#50701; Package sed. (Mon, 20 Sep 2021 19:41:02 GMT)

Message #15 received at 50701 <at> debbugs.gnu.org:

From: Davide Brini <dave_br <at> gmx.com>
To: 50701 <at> debbugs.gnu.org
Subject: Re: bug#50701: cannot append or insert to empty file or stream
Date: Mon, 20 Sep 2021 21:40:19 +0200
On Mon, 20 Sep 2021 10:53:48 +0200, lexi hale <lexi <at> hale.su> wrote:

> there is an inconsistency in the handling of the a and i commands in
> sed. if the input stream or file immediately yields EOL, sed ignores
> the commands and produces an empty stream.
>
> for instance:
>
> 	$ echo -n >file line-1
> 	$ sed -i aline-2 file
> 	$ cat file
>
> correctly yields
>
> 	line-1
> 	line-2
>
> however,
>
> 	$ touch empty
> 	$ sed -i aline-1 empty
> 	$ cat empty
>
> prints nothing,  instead of the expected result "line-1".

It's the same thing as the following:

$ echo x > file
$ sed -i 's|^|FOO|' file
$ cat file
FOOx

$ touch empty
$ sed -i 's|^|FOO|' empty
$ cat empty
$

Would you expect a line to magically materialize in the second case? I
suppose you wouldn't...

--
D.




Information forwarded to bug-sed <at> gnu.org:
bug#50701; Package sed. (Thu, 23 Sep 2021 19:00:02 GMT)

Message #18 received at 50701 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: lexi hale <lexi <at> hale.su>, 50701 <at> debbugs.gnu.org
Subject: Re: bug#50701: cannot append or insert to empty file or stream
Date: Thu, 23 Sep 2021 13:59:36 -0500
On Mon, Sep 20, 2021 at 01:28:42PM -0600, Assaf Gordon wrote:

> First, let's clarify what the files are:
> 
> > 	$ echo -n >file line-1
> > 	$ touch empty
> 
> The file 'file' is not empty, it has one line.

That's one way of viewing it.  But according to POSIX, it doesn't even
have that (the POSIX definition of a "line" is characters followed by
a newline, and since you omitted the newline, it is not a line).

> That line contains the text "line-1" but is not terminated by a newline
> character (a consequence of 'echo -n').
> 
> The file 'empty' is empty, it contains NO lines.
>

...

> 
> > this is an extremely surprising behavior that limits the utility of sed
> > when one cannot predict the contents of the file in question.
> 
> I hope the explanation above makes this behavior less surprising.

However, note that POSIX also says that 'sed' only has specified
behavior for "text files".  And the POSIX definition of a text file
specifically excludes files that do not end in a newline, other than
the empty file [1].  Thus, attempting to use sed on the file 'file'
which does not end in a newline, and is therefore not a text file, is
undefined behavior, and ANYTHING can happen.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403
(Other ways to have a file that is not a text file: have a NUL
character, have more characters than LINE_MAX bytes between newlines,
or have an encoding error in the current multibyte locale - but those
are not relevant to this conversation.)

But just because POSIX doesn't say what to do does not mean that we
can't pick something useful.  GNU sed tries hard to have sane behavior
for "mostly-text" files, such as the case where you forgot a trailing
newline.  It does so by pretending that there was a newline after
those last characters, after all.  But not all sed implementations
behave the same on your example.
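
A quick way to tell whether a file ends in a newline, and so at least
passes that part of the text-file definition, is to look at its last byte.
The following is a minimal sketch; ends_with_newline and check are
hypothetical helper names, and the test does not look for NUL bytes,
over-long lines, or encoding errors:

```shell
#!/bin/sh
# ends_with_newline FILE  (hypothetical helper, not a standard utility)
# Succeeds if FILE is empty or its last byte is a newline.
# Command substitution strips trailing newlines, so the substituted text is
# empty when 'tail -c 1' printed nothing or printed only a newline.
# Caveat: some shells also drop a NUL byte here; that case is not handled.
ends_with_newline() {
    test -z "$(tail -c 1 "$1")"
}

# check FILE: describe the result of ends_with_newline in words.
check() {
    if ends_with_newline "$1"; then
        echo "ends in newline (or is empty)"
    else
        echo "last line is unterminated"
    fi
}

tmp=$(mktemp -d)
printf 'line-1'   > "$tmp/file"    # what 'echo -n >file line-1' writes
printf 'line-1\n' > "$tmp/text"    # a proper one-line text file
: > "$tmp/empty"                   # zero bytes, like a fresh 'touch empty'

r_file=$(check "$tmp/file")
r_text=$(check "$tmp/text")
r_empty=$(check "$tmp/empty")
rm -rf "$tmp"

echo "file:  $r_file"
echo "text:  $r_text"
echo "empty: $r_empty"
```

A script that must behave the same across sed implementations could run
such a check first and add the missing newline before calling sed.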

> 
> As such, I'm marking this as "not a bug", but discussion can continue
> by replying to this thread.

At any rate, I agree that it's not a bug.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Oct 2021 11:24:05 GMT)
