GNU bug report logs - #60989
[PATCH] rm: fail on duplicate input if force not enabled

Previous Next

Package: coreutils;

Reported by: Łukasz Sroka <sroka.dev <at> gmail.com>

Date: Sat, 21 Jan 2023 14:29:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 60989 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sat, 21 Jan 2023 14:29:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Łukasz Sroka <sroka.dev <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 21 Jan 2023 14:29:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Łukasz Sroka <sroka.dev <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: [PATCH] rm: fail on duplicate input if force not enabled
Date: Sat, 21 Jan 2023 14:05:25 +0100
    When the input files contain duplicates, then the rm fails. Because
    duplicates occur most often when the * is used and the shell unwraps it.
    There is a very common scenario when a user accidentally enters space
    after a filename, or enters space instead of forward slash.
    Example:

      rm prefix_ *

    The user intended to remove all files with a `prefix_` but removed all
    of the files in cwd.
    The program quits immediately when a duplicate is detected, to prevent
    pressing `y` because user expected a prompt regarding removing multiple
    files.
    The force option disables this function to enable scripts to work
    without modifying them.

```
diff --git a/src/rm.c b/src/rm.c
index 354e2b0df..e4f9949f0 100644
--- a/src/rm.c
+++ b/src/rm.c
@@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
     }
 }

+static bool
+find_duplicates (int n_files, char **files)
+{
+  for (int l = 0; l < n_files-1; l++)
+    for (int r = l+1; r < n_files; r++)
+      if (strcmp(files[l], files[r]) == 0)
+          return true;
+  return false;
+}
+
 void
 usage (int status)
 {
@@ -211,6 +221,7 @@ main (int argc, char **argv)
   bool preserve_root = true;
   struct rm_options x;
   bool prompt_once = false;
+  bool force_rm = false;
   int c;

   initialize_main (&argc, &argv);
@@ -238,6 +249,7 @@ main (int argc, char **argv)
           x.interactive = RMI_NEVER;
           x.ignore_missing_files = true;
           prompt_once = false;
+          force_rm = true;
           break;

         case 'i':
@@ -352,6 +364,17 @@ main (int argc, char **argv)
   uintmax_t n_files = argc - optind;
   char **file =  argv + optind;

+  if (!force_rm && find_duplicates(n_files, file))
+    {
+      /* Because usually when the input files are duplicated it means
that the user
+         sumbitted both a directory and an * as separate arguments,
probably by accident */
+      fprintf (stderr,
+               "%s: input contains duplicates, most likely you've put "
+               "both * and a file from the same directory.\n",
+               program_name);
+      return EXIT_FAILURE;
+    }
+
   if (prompt_once && (x.recursive || 3 < n_files))
     {
       fprintf (stderr,
```




Information forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sat, 21 Jan 2023 14:54:02 GMT) Full text and rfc822 format available.

Message #8 received at 60989 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Łukasz Sroka <sroka.dev <at> gmail.com>, 60989 <at> debbugs.gnu.org
Subject: Re: bug#60989: [PATCH] rm: fail on duplicate input if force not
 enabled
Date: Sat, 21 Jan 2023 14:53:13 +0000
On 21/01/2023 13:05, Łukasz Sroka wrote:
>      When the input files contain duplicates, then the rm fails. Because
>      duplicates occur most often when the * is used and the shell unwraps it.
>      There is a very common scenario when a user accidentally enters space
>      after a filename, or enters space instead of forward slash.
>      Example:
> 
>        rm prefix_ *
> 
>      The user intended to remove all files with a `prefix_` but removed all
>      of the files in cwd.
>      The program quits immediately when a duplicate is detected, to prevent
>      pressing `y` because user expected a prompt regarding removing multiple
>      files.
>      The force option disables this function to enable scripts to work
>      without modifying them.
> 
> ```
> diff --git a/src/rm.c b/src/rm.c
> index 354e2b0df..e4f9949f0 100644
> --- a/src/rm.c
> +++ b/src/rm.c
> @@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
>       }
>   }
> 
> +static bool
> +find_duplicates (int n_files, char **files)
> +{
> +  for (int l = 0; l < n_files-1; l++)
> +    for (int r = l+1; r < n_files; r++)
> +      if (strcmp(files[l], files[r]) == 0)
> +          return true;
> +  return false;
> +}
> +
>   void
>   usage (int status)
>   {
> @@ -211,6 +221,7 @@ main (int argc, char **argv)
>     bool preserve_root = true;
>     struct rm_options x;
>     bool prompt_once = false;
> +  bool force_rm = false;
>     int c;
> 
>     initialize_main (&argc, &argv);
> @@ -238,6 +249,7 @@ main (int argc, char **argv)
>             x.interactive = RMI_NEVER;
>             x.ignore_missing_files = true;
>             prompt_once = false;
> +          force_rm = true;
>             break;
> 
>           case 'i':
> @@ -352,6 +364,17 @@ main (int argc, char **argv)
>     uintmax_t n_files = argc - optind;
>     char **file =  argv + optind;
> 
> +  if (!force_rm && find_duplicates(n_files, file))
> +    {
> +      /* Because usually when the input files are duplicated it means
> that the user
> +         sumbitted both a directory and an * as separate arguments,
> probably by accident */
> +      fprintf (stderr,
> +               "%s: input contains duplicates, most likely you've put "
> +               "both * and a file from the same directory.\n",
> +               program_name);
> +      return EXIT_FAILURE;
> +    }
> +
>     if (prompt_once && (x.recursive || 3 < n_files))
>       {
>         fprintf (stderr,
> ```

An interesting proposal.
The main protection would be for `dir/ *` rather than `file_prefix_ *`.
The former would be unusual for a user to type, while the latter more usual, but wouldn't trigger the protection AFAICS.
This ads O(N^2) on each interaction, so if it was to be included probably only enabled with --interactive.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sat, 21 Jan 2023 18:52:02 GMT) Full text and rfc822 format available.

Message #11 received at 60989 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Łukasz Sroka <sroka.dev <at> gmail.com>
Cc: 60989 <at> debbugs.gnu.org
Subject: Re: bug#60989: [PATCH] rm: fail on duplicate input if force not
 enabled
Date: Sat, 21 Jan 2023 18:51:51 +0000
On 21/01/2023 15:51, Łukasz Sroka wrote:
> On 21/01/2023 15:53, Pádraig Brady <P <at> draigbrady.com> wrote:
>>
>> On 21/01/2023 13:05, Łukasz Sroka wrote:
>>>       When the input files contain duplicates, then the rm fails. Because
>>>       duplicates occur most often when the * is used and the shell unwraps it.
>>>       There is a very common scenario when a user accidentally enters space
>>>       after a filename, or enters space instead of forward slash.
>>>       Example:
>>>
>>>         rm prefix_ *
>>>
>>>       The user intended to remove all files with a `prefix_` but removed all
>>>       of the files in cwd.
>>>       The program quits immediately when a duplicate is detected, to prevent
>>>       pressing `y` because user expected a prompt regarding removing multiple
>>>       files.
>>>       The force option disables this function to enable scripts to work
>>>       without modifying them.
>>>
>>> ```
>>> diff --git a/src/rm.c b/src/rm.c
>>> index 354e2b0df..e4f9949f0 100644
>>> --- a/src/rm.c
>>> +++ b/src/rm.c
>>> @@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
>>>        }
>>>    }
>>>
>>> +static bool
>>> +find_duplicates (int n_files, char **files)
>>> +{
>>> +  for (int l = 0; l < n_files-1; l++)
>>> +    for (int r = l+1; r < n_files; r++)
>>> +      if (strcmp(files[l], files[r]) == 0)
>>> +          return true;
>>> +  return false;
>>> +}
>>> +
>>>    void
>>>    usage (int status)
>>>    {
>>> @@ -211,6 +221,7 @@ main (int argc, char **argv)
>>>      bool preserve_root = true;
>>>      struct rm_options x;
>>>      bool prompt_once = false;
>>> +  bool force_rm = false;
>>>      int c;
>>>
>>>      initialize_main (&argc, &argv);
>>> @@ -238,6 +249,7 @@ main (int argc, char **argv)
>>>              x.interactive = RMI_NEVER;
>>>              x.ignore_missing_files = true;
>>>              prompt_once = false;
>>> +          force_rm = true;
>>>              break;
>>>
>>>            case 'i':
>>> @@ -352,6 +364,17 @@ main (int argc, char **argv)
>>>      uintmax_t n_files = argc - optind;
>>>      char **file =  argv + optind;
>>>
>>> +  if (!force_rm && find_duplicates(n_files, file))
>>> +    {
>>> +      /* Because usually when the input files are duplicated it means
>>> that the user
>>> +         sumbitted both a directory and an * as separate arguments,
>>> probably by accident */
>>> +      fprintf (stderr,
>>> +               "%s: input contains duplicates, most likely you've put "
>>> +               "both * and a file from the same directory.\n",
>>> +               program_name);
>>> +      return EXIT_FAILURE;
>>> +    }
>>> +
>>>      if (prompt_once && (x.recursive || 3 < n_files))
>>>        {
>>>          fprintf (stderr,
>>> ```
>>
>> An interesting proposal.
>> The main protection would be for `dir/ *` rather than `file_prefix_ *`.
>> The former would be unusual for a user to type, while the latter more usual, but wouldn't trigger the protection AFAICS.
>> This ads O(N^2) on each interaction, so if it was to be included probably only enabled with --interactive.
>>
>> cheers,
>> Pádraig
> 
> Yeah, true. Implemented it quickly and focused on the `dir/ *`
> scanerio at first, because that has happened to me -.-
> It was because I tried to do something quickly while editing command
> from backsearch and ended up with `rm tmp/ *` instead of `rm tmp/*`.
> To make the prefix_ protection it won't be possible without checking
> if a "prefix" exits on disk (eg. file and file1), which would be more
> time complex.
> I don't like the --interactive approach, as when you're trying to do
> something quickly, you probably won't opt for doing it interactively
> and you're probably either disabled or ignoring shell warnings.

True, but at a high level -I can be seen as
an unintrusive "be more careful" option,
and so is often enabled by default with an alias.
Red Hat flavored systems even enable the more intrusive
-i option by default for the root user for example.

> The more I think about this problem, the more I tilt towards
> implementing it in shell. If you'd've written "rm prefix_ *" and had
> 10k files there, it would be so much easier to detect it on the shell
> side than checking every filename with the other and triggering
> syscalls for every one...
> Either way if you have other solutions in mind, please update me, as
> it wouldn't require to configure shell in dockers or remote machines,
> thus making the solution more available.

A shell option like `shopt -s uniqueglob` would indeed be more general.
There are various other globbing options, like the similar "failglob"
which fails to execute the command if a glob doesn't match anything.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sat, 21 Jan 2023 19:20:01 GMT) Full text and rfc822 format available.

Message #14 received at 60989 <at> debbugs.gnu.org (full text, mbox):

From: Łukasz Sroka <sroka.dev <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 60989 <at> debbugs.gnu.org
Subject: Re: bug#60989: [PATCH] rm: fail on duplicate input if force not
 enabled
Date: Sat, 21 Jan 2023 16:51:23 +0100
On 21/01/2023 15:53, Pádraig Brady <P <at> draigbrady.com> wrote:
>
> On 21/01/2023 13:05, Łukasz Sroka wrote:
> >      When the input files contain duplicates, then the rm fails. Because
> >      duplicates occur most often when the * is used and the shell unwraps it.
> >      There is a very common scenario when a user accidentally enters space
> >      after a filename, or enters space instead of forward slash.
> >      Example:
> >
> >        rm prefix_ *
> >
> >      The user intended to remove all files with a `prefix_` but removed all
> >      of the files in cwd.
> >      The program quits immediately when a duplicate is detected, to prevent
> >      pressing `y` because user expected a prompt regarding removing multiple
> >      files.
> >      The force option disables this function to enable scripts to work
> >      without modifying them.
> >
> > ```
> > diff --git a/src/rm.c b/src/rm.c
> > index 354e2b0df..e4f9949f0 100644
> > --- a/src/rm.c
> > +++ b/src/rm.c
> > @@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
> >       }
> >   }
> >
> > +static bool
> > +find_duplicates (int n_files, char **files)
> > +{
> > +  for (int l = 0; l < n_files-1; l++)
> > +    for (int r = l+1; r < n_files; r++)
> > +      if (strcmp(files[l], files[r]) == 0)
> > +          return true;
> > +  return false;
> > +}
> > +
> >   void
> >   usage (int status)
> >   {
> > @@ -211,6 +221,7 @@ main (int argc, char **argv)
> >     bool preserve_root = true;
> >     struct rm_options x;
> >     bool prompt_once = false;
> > +  bool force_rm = false;
> >     int c;
> >
> >     initialize_main (&argc, &argv);
> > @@ -238,6 +249,7 @@ main (int argc, char **argv)
> >             x.interactive = RMI_NEVER;
> >             x.ignore_missing_files = true;
> >             prompt_once = false;
> > +          force_rm = true;
> >             break;
> >
> >           case 'i':
> > @@ -352,6 +364,17 @@ main (int argc, char **argv)
> >     uintmax_t n_files = argc - optind;
> >     char **file =  argv + optind;
> >
> > +  if (!force_rm && find_duplicates(n_files, file))
> > +    {
> > +      /* Because usually when the input files are duplicated it means
> > that the user
> > +         sumbitted both a directory and an * as separate arguments,
> > probably by accident */
> > +      fprintf (stderr,
> > +               "%s: input contains duplicates, most likely you've put "
> > +               "both * and a file from the same directory.\n",
> > +               program_name);
> > +      return EXIT_FAILURE;
> > +    }
> > +
> >     if (prompt_once && (x.recursive || 3 < n_files))
> >       {
> >         fprintf (stderr,
> > ```
>
> An interesting proposal.
> The main protection would be for `dir/ *` rather than `file_prefix_ *`.
> The former would be unusual for a user to type, while the latter more usual, but wouldn't trigger the protection AFAICS.
> This ads O(N^2) on each interaction, so if it was to be included probably only enabled with --interactive.
>
> cheers,
> Pádraig

Yeah, true. Implemented it quickly and focused on the `dir/ *`
scanerio at first, because that has happened to me -.-
It was because I tried to do something quickly while editing command
from backsearch and ended up with `rm tmp/ *` instead of `rm tmp/*`.
To make the prefix_ protection it won't be possible without checking
if a "prefix" exits on disk (eg. file and file1), which would be more
time complex.
I don't like the --interactive approach, as when you're trying to do
something quickly, you probably won't opt for doing it interactively
and you're probably either disabled or ignoring shell warnings.

The more I think about this problem, the more I tilt towards
implementing it in shell. If you'd've written "rm prefix_ *" and had
10k files there, it would be so much easier to detect it on the shell
side than checking every filename with the other and triggering
syscalls for every one...
Either way if you have other solutions in mind, please update me, as
it wouldn't require to configure shell in dockers or remote machines,
thus making the solution more available.




Information forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sun, 22 Jan 2023 18:20:01 GMT) Full text and rfc822 format available.

Message #17 received at 60989 <at> debbugs.gnu.org (full text, mbox):

From: "Philip Rowlands" <phr+coreutils <at> dimebar.com>
To: Łukasz Sroka <sroka.dev <at> gmail.com>, 60989 <at> debbugs.gnu.org
Subject: Re: bug#60989: [PATCH] rm: fail on duplicate input if force not
 enabled
Date: Sun, 22 Jan 2023 18:18:04 +0000
On Sat, 21 Jan 2023, at 13:05, Łukasz Sroka wrote:
> When the input files contain duplicates, then the rm fails. Because
>     duplicates occur most often when the * is used and the shell unwraps it.
>     There is a very common scenario when a user accidentally enters space
>     after a filename, or enters space instead of forward slash.

To fail on duplicate FILE args, this bash function would do (lightly tested, doesn't attempt getopt processing):

function safe_rm {
  local -A seen
  local file
  for file in "$@"; do
    if [[ -v ${seen[$file]} ]]; then
      echo "error: duplicate name '$file'" 1>&2
      return 1
    fi
    seen[$file]=1
  done

  # no dupes seen
  command rm "$@"
}

and could be used today, without waiting for the next coreutils release.

As an aside, I could be reading it wrong but the coreutils manual suggests the file arguments are optional
   rm [option]… [file]…


Cheers,
Phil




Information forwarded to bug-coreutils <at> gnu.org:
bug#60989; Package coreutils. (Sun, 22 Jan 2023 21:37:02 GMT) Full text and rfc822 format available.

Message #20 received at 60989 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Philip Rowlands <phr+coreutils <at> dimebar.com>,
 Łukasz Sroka <sroka.dev <at> gmail.com>, 60989 <at> debbugs.gnu.org
Subject: Re: bug#60989: [PATCH] rm: fail on duplicate input if force not
 enabled
Date: Sun, 22 Jan 2023 21:36:49 +0000
On 22/01/2023 18:18, Philip Rowlands wrote:
> On Sat, 21 Jan 2023, at 13:05, Łukasz Sroka wrote:
>> When the input files contain duplicates, then the rm fails. Because
>>      duplicates occur most often when the * is used and the shell unwraps it.
>>      There is a very common scenario when a user accidentally enters space
>>      after a filename, or enters space instead of forward slash.
> 
> To fail on duplicate FILE args, this bash function would do (lightly tested, doesn't attempt getopt processing):
> 
> function safe_rm {
>    local -A seen
>    local file
>    for file in "$@"; do
>      if [[ -v ${seen[$file]} ]]; then
>        echo "error: duplicate name '$file'" 1>&2
>        return 1
>      fi
>      seen[$file]=1
>    done
> 
>    # no dupes seen
>    command rm "$@"
> }
> 
> and could be used today, without waiting for the next coreutils release.

That's informative, thanks.


> As an aside, I could be reading it wrong but the coreutils manual suggests the file arguments are optional
>     rm [option]… [file]…

Right with the -f option rm will not fail if no arguments are specified
(in the presence of nullglob etc.), which is POSIX compliant.

cheers,
Pádraig




This bug report was last modified 2 years and 150 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.