GNU bug report logs - #72641
31.0.50; "Unlocking file: Invalid argument" when deleting lock file on network file system

Previous Next

Package: emacs;

Reported by: Michal Nazarewicz <mina86 <at> mina86.com>

Date: Thu, 15 Aug 2024 16:03:02 UTC

Severity: normal

Merged with 70973

Found in versions 29.1, 31.0.50

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Michal Nazarewicz <mina86 <at> mina86.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 72641 <at> debbugs.gnu.org
Subject: bug#72641: 31.0.50; "Unlocking file: Invalid argument" when deleting lock file on network file system
Date: Fri, 16 Aug 2024 02:59:32 +0200
On Thu, Aug 15 2024, Paul Eggert wrote:
> That being said, I think Emacs can ignore and remove bad lock files 
> without introducing more races. I installed the attached into the master 
> branch and it works for me on your test case (which I introduced 
> artificially on GNU/Linux). Please give it a try.

With Emacs from current master, if I open a file, edit it and then kill
the buffer without saving the changes, the lock file is deleted.
However, if I save the file (be it by save-buffer or by killing the
buffer and picking save option), the lock file remains.

I didn’t fully track what is actually happening.  It looks like saving
the buffer results in the following:
- lock_file is called
  - it calls current_lock_owner
    - which deletes the lock file
  - and now lock_file creates new lock file
- unlock_file is called
  - it calls current_lock_owner
    - which return ENOENT for some reason
  - the lock file is left alone
 
In desperation I’ve tried attached patch (it adds `if (!lfinfolen)
return I_OWN_IT`; big diff is because than `if (lfinfolen)` check can be
removed and code dedented) and it worked.

Maybe this is a better approach? Because currently lock_file will delete
the lock file and then create an empty file.  With the below file,
lock_file will just notice file is there and do nothing.



From be05054ae47e74192bb3551e83d4afb2ff41f888 Mon Sep 17 00:00:00 2001
From: Michal Nazarewicz <mina86 <at> mina86.com>
Date: Fri, 16 Aug 2024 02:49:55 +0200
Subject: [PATCH] Treat empty lock files as owned by us (bug#72641)

* src/filelock.c (current_lock_owner): Rather then deleting empty lock
files, treat them as if they were owned by us.  Previous commit which
deleted the lock file instead resulted in stale lock file remaining when
saving a file.
---
 src/filelock.c | 185 ++++++++++++++++++++++++-------------------------
 1 file changed, 92 insertions(+), 93 deletions(-)

diff --git a/src/filelock.c b/src/filelock.c
index 1ae57dc7344..f9aac0dc5c5 100644
--- a/src/filelock.c
+++ b/src/filelock.c
@@ -397,110 +397,109 @@ current_lock_owner (lock_info_type *owner, Lisp_Object lfname)
   if (lfinfolen < 0)
     return errno == ENOENT || errno == ENOTDIR ? 0 : errno;
 
+  /* The lock file is empty which may be due to buggy file system, e.g.,
+     <https://bugs.gnu.org/72641>.  Treat it as us holding the lock.  */
+  if (!lfinfolen)
+	return I_OWN_IT;
+
   /* If the lock file seems valid, return a value based on its contents.  */
-  if (lfinfolen)
+  if (MAX_LFINFO < lfinfolen)
+    return ENAMETOOLONG;
+  owner->user[lfinfolen] = 0;
+
+  /* Parse USER <at> HOST.PID:BOOT_TIME.  If can't parse, return EINVAL.  */
+  /* The USER is everything before the last @.  */
+  char *at = memrchr (owner->user, '@', lfinfolen);
+  if (!at)
+    return EINVAL;
+  owner->at = at;
+  char *dot = strrchr (at, '.');
+  if (!dot)
+    return EINVAL;
+  owner->dot = dot;
+
+  /* The PID is everything from the last '.' to the ':' or equivalent.  */
+  if (! integer_prefixed (dot + 1))
+    return EINVAL;
+  errno = 0;
+  intmax_t pid = strtoimax (dot + 1, &owner->colon, 10);
+  if (errno == ERANGE)
+    pid = -1;
+
+  /* After the ':' or equivalent, if there is one, comes the boot time.  */
+  intmax_t boot_time;
+  char *boot = owner->colon + 1, *lfinfo_end;
+  switch (owner->colon[0])
     {
-      if (MAX_LFINFO < lfinfolen)
-	return ENAMETOOLONG;
-      owner->user[lfinfolen] = 0;
-
-      /* Parse USER <at> HOST.PID:BOOT_TIME.  If can't parse, return EINVAL.  */
-      /* The USER is everything before the last @.  */
-      char *at = memrchr (owner->user, '@', lfinfolen);
-      if (!at)
-	return EINVAL;
-      owner->at = at;
-      char *dot = strrchr (at, '.');
-      if (!dot)
-	return EINVAL;
-      owner->dot = dot;
+    case 0:
+      boot_time = 0;
+      lfinfo_end = owner->colon;
+      break;
 
-      /* The PID is everything from the last '.' to the ':' or equivalent.  */
-      if (! integer_prefixed (dot + 1))
+    case '\357':
+      /* Treat "\357\200\242" (U+F022 in UTF-8) like ":" (Bug#24656).
+	 This works around a bug in the Linux CIFS kernel client, which can
+	 mistakenly transliterate ':' to U+F022 in symlink contents.
+	 See <https://bugzilla.redhat.com/show_bug.cgi?id=1384153>.  */
+      if (! (boot[0] == '\200' && boot[1] == '\242'))
 	return EINVAL;
-      errno = 0;
-      intmax_t pid = strtoimax (dot + 1, &owner->colon, 10);
-      if (errno == ERANGE)
-	pid = -1;
-
-      /* After the ':' or equivalent, if there is one, comes the boot time.  */
-      intmax_t boot_time;
-      char *boot = owner->colon + 1, *lfinfo_end;
-      switch (owner->colon[0])
-	{
-	case 0:
-	  boot_time = 0;
-	  lfinfo_end = owner->colon;
-	  break;
-
-	case '\357':
-	  /* Treat "\357\200\242" (U+F022 in UTF-8) like ":" (Bug#24656).
-	     This works around a bug in the Linux CIFS kernel client, which can
-	     mistakenly transliterate ':' to U+F022 in symlink contents.
-	     See <https://bugzilla.redhat.com/show_bug.cgi?id=1384153>.  */
-	  if (! (boot[0] == '\200' && boot[1] == '\242'))
-	    return EINVAL;
-	  boot += 2;
-	  FALLTHROUGH;
-	case ':':
-	  if (! integer_prefixed (boot))
-	    return EINVAL;
-	  boot_time = strtoimax (boot, &lfinfo_end, 10);
-	  break;
-
-	default:
-	  return EINVAL;
-	}
-      if (lfinfo_end != owner->user + lfinfolen)
+      boot += 2;
+      FALLTHROUGH;
+    case ':':
+      if (! integer_prefixed (boot))
 	return EINVAL;
+      boot_time = strtoimax (boot, &lfinfo_end, 10);
+      break;
 
-      char *linkhost = at + 1;
-      ptrdiff_t linkhostlen = dot - linkhost;
-      Lisp_Object system_name = Fsystem_name ();
-      /* If `system-name' returns nil, that means we're in a
-	 --no-build-details Emacs, and the name part of the link (e.g.,
-	 .#test.txt -> larsi@.118961:1646577954) is an empty string.  */
-      bool on_current_host;
-      if (NILP (system_name))
-	on_current_host = linkhostlen == 0;
-      else
-	{
-	  on_current_host = linkhostlen == SBYTES (system_name);
-	  if (on_current_host)
-	    {
-	      /* Protect against the extremely unlikely case of the host
-		 name containing '@'.  */
-	      char *sysname = SSDATA (system_name);
-	      for (ptrdiff_t i = 0; i < linkhostlen; i++)
-		if (linkhost[i] != (sysname[i] == '@' ? '-' : sysname[i]))
-		  {
-		    on_current_host = false;
-		    break;
-		  }
-	    }
-	}
-      if (!on_current_host)
+    default:
+      return EINVAL;
+    }
+  if (lfinfo_end != owner->user + lfinfolen)
+    return EINVAL;
+
+  char *linkhost = at + 1;
+  ptrdiff_t linkhostlen = dot - linkhost;
+  Lisp_Object system_name = Fsystem_name ();
+  /* If `system-name' returns nil, that means we're in a
+     --no-build-details Emacs, and the name part of the link (e.g.,
+     .#test.txt -> larsi@.118961:1646577954) is an empty string.  */
+  bool on_current_host;
+  if (NILP (system_name))
+    on_current_host = linkhostlen == 0;
+  else
+    {
+      on_current_host = linkhostlen == SBYTES (system_name);
+      if (on_current_host)
 	{
-	  /* Not on current host.  If we wanted to support the check for
-	     stale locks on remote machines, here's where we'd do it.  */
-	  return ANOTHER_OWNS_IT;
+	  /* Protect against the extremely unlikely case of the host
+	     name containing '@'.  */
+	  char *sysname = SSDATA (system_name);
+	  for (ptrdiff_t i = 0; i < linkhostlen; i++)
+	    if (linkhost[i] != (sysname[i] == '@' ? '-' : sysname[i]))
+	      {
+		on_current_host = false;
+		break;
+	      }
 	}
+    }
+  if (!on_current_host)
+    {
+      /* Not on current host.  If we wanted to support the check for
+	 stale locks on remote machines, here's where we'd do it.  */
+      return ANOTHER_OWNS_IT;
+    }
 
-      if (pid == getpid ())
-	return I_OWN_IT;
+  if (pid == getpid ())
+    return I_OWN_IT;
 
-      if (VALID_PROCESS_ID (pid)
-	  && ! (kill (pid, 0) < 0 && errno != EPERM)
-	  && (boot_time == 0
-	      || within_one_second (boot_time, get_boot_sec ())))
-	return ANOTHER_OWNS_IT;
-    }
+  if (VALID_PROCESS_ID (pid)
+      && ! (kill (pid, 0) < 0 && errno != EPERM)
+      && (boot_time == 0
+	  || within_one_second (boot_time, get_boot_sec ())))
+    return ANOTHER_OWNS_IT;
 
-  /* The owner process is dead or has a strange pid, or the lock file is empty.
-     Try to zap the lockfile.  If the lock file is empty, this assumes
-     the file system is buggy, e.g., <https://bugs.gnu.org/72641>.
-     Emacs never creates empty lock files even temporarily, so removing
-     an empty lock file should be harmless.  */
+  /* The owner process is dead or has a strange pid.
+     Try to zap the lockfile.  */
   return emacs_unlink (SSDATA (lfname)) < 0 ? errno : 0;
 }
 
-- 
2.43.0





This bug report was last modified 277 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.