GNU bug report logs - #60989
[PATCH] rm: fail on duplicate input if force not enabled

Previous Next

Package: coreutils;

Reported by: Łukasz Sroka <sroka.dev <at> gmail.com>

Date: Sat, 21 Jan 2023 14:29:02 UTC

Severity: normal

Tags: patch

Full log


View this message in rfc822 format

From: Pádraig Brady <P <at> draigBrady.com>
To: Łukasz Sroka <sroka.dev <at> gmail.com>
Cc: 60989 <at> debbugs.gnu.org
Subject: bug#60989: [PATCH] rm: fail on duplicate input if force not enabled
Date: Sat, 21 Jan 2023 18:51:51 +0000
On 21/01/2023 15:51, Łukasz Sroka wrote:
> On 21/01/2023 15:53, Pádraig Brady <P <at> draigbrady.com> wrote:
>>
>> On 21/01/2023 13:05, Łukasz Sroka wrote:
>>>       When the input files contain duplicates, then the rm fails. Because
>>>       duplicates occur most often when the * is used and the shell unwraps it.
>>>       There is a very common scenario when a user accidentally enters space
>>>       after a filename, or enters space instead of forward slash.
>>>       Example:
>>>
>>>         rm prefix_ *
>>>
>>>       The user intended to remove all files with a `prefix_` but removed all
>>>       of the files in cwd.
>>>       The program quits immediately when a duplicate is detected, to prevent
>>>       pressing `y` because user expected a prompt regarding removing multiple
>>>       files.
>>>       The force option disables this function to enable scripts to work
>>>       without modifying them.
>>>
>>> ```
>>> diff --git a/src/rm.c b/src/rm.c
>>> index 354e2b0df..e4f9949f0 100644
>>> --- a/src/rm.c
>>> +++ b/src/rm.c
>>> @@ -123,6 +123,16 @@ diagnose_leading_hyphen (int argc, char **argv)
>>>        }
>>>    }
>>>
>>> +static bool
>>> +find_duplicates (int n_files, char **files)
>>> +{
>>> +  for (int l = 0; l < n_files-1; l++)
>>> +    for (int r = l+1; r < n_files; r++)
>>> +      if (strcmp(files[l], files[r]) == 0)
>>> +          return true;
>>> +  return false;
>>> +}
>>> +
>>>    void
>>>    usage (int status)
>>>    {
>>> @@ -211,6 +221,7 @@ main (int argc, char **argv)
>>>      bool preserve_root = true;
>>>      struct rm_options x;
>>>      bool prompt_once = false;
>>> +  bool force_rm = false;
>>>      int c;
>>>
>>>      initialize_main (&argc, &argv);
>>> @@ -238,6 +249,7 @@ main (int argc, char **argv)
>>>              x.interactive = RMI_NEVER;
>>>              x.ignore_missing_files = true;
>>>              prompt_once = false;
>>> +          force_rm = true;
>>>              break;
>>>
>>>            case 'i':
>>> @@ -352,6 +364,17 @@ main (int argc, char **argv)
>>>      uintmax_t n_files = argc - optind;
>>>      char **file =  argv + optind;
>>>
>>> +  if (!force_rm && find_duplicates(n_files, file))
>>> +    {
>>> +      /* Because usually when the input files are duplicated it means
>>> that the user
>>> +         sumbitted both a directory and an * as separate arguments,
>>> probably by accident */
>>> +      fprintf (stderr,
>>> +               "%s: input contains duplicates, most likely you've put "
>>> +               "both * and a file from the same directory.\n",
>>> +               program_name);
>>> +      return EXIT_FAILURE;
>>> +    }
>>> +
>>>      if (prompt_once && (x.recursive || 3 < n_files))
>>>        {
>>>          fprintf (stderr,
>>> ```
>>
>> An interesting proposal.
>> The main protection would be for `dir/ *` rather than `file_prefix_ *`.
>> The former would be unusual for a user to type, while the latter more usual, but wouldn't trigger the protection AFAICS.
>> This ads O(N^2) on each interaction, so if it was to be included probably only enabled with --interactive.
>>
>> cheers,
>> Pádraig
> 
> Yeah, true. Implemented it quickly and focused on the `dir/ *`
> scanerio at first, because that has happened to me -.-
> It was because I tried to do something quickly while editing command
> from backsearch and ended up with `rm tmp/ *` instead of `rm tmp/*`.
> To make the prefix_ protection it won't be possible without checking
> if a "prefix" exits on disk (eg. file and file1), which would be more
> time complex.
> I don't like the --interactive approach, as when you're trying to do
> something quickly, you probably won't opt for doing it interactively
> and you're probably either disabled or ignoring shell warnings.

True, but at a high level -I can be seen as
an unintrusive "be more careful" option,
and so is often enabled by default with an alias.
Red Hat flavored systems even enable the more intrusive
-i option by default for the root user for example.

> The more I think about this problem, the more I tilt towards
> implementing it in shell. If you'd've written "rm prefix_ *" and had
> 10k files there, it would be so much easier to detect it on the shell
> side than checking every filename with the other and triggering
> syscalls for every one...
> Either way if you have other solutions in mind, please update me, as
> it wouldn't require to configure shell in dockers or remote machines,
> thus making the solution more available.

A shell option like `shopt -s uniqueglob` would indeed be more general.
There are various other globbing options, like the similar "failglob"
which fails to execute the command if a glob doesn't match anything.

cheers,
Pádraig




This bug report was last modified 2 years and 151 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.