GNU bug report logs - #67926
29.1; fail to extract ZIP subfile named with [...]

Package: emacs;

Reported by: awrhygty <at> outlook.com

Date: Wed, 20 Dec 2023 11:24:02 UTC

Severity: normal

Found in version 29.1

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 67926 in the body.
You can then email your comments to 67926 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 20 Dec 2023 11:24:02 GMT)

Acknowledgement sent to awrhygty <at> outlook.com:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 20 Dec 2023 11:24:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: bug-gnu-emacs <at> gnu.org
Subject: 29.1; fail to extract ZIP subfile named with [...]
Date: Wed, 20 Dec 2023 20:23:38 +0900

If a ZIP archive has a subfile named 'file[abc].txt',
RET (archive-extract) in an archive-mode buffer fails to extract it with
the message:
  caution: filename not matched:  file[abc].txt

Worse, if there are also filea.txt, fileb.txt and filec.txt,
extraction reports no error, and the buffer of 'file[abc].txt'
contains the contents of filea.txt, fileb.txt and filec.txt, but does
not contain the contents of 'file[abc].txt'.

This is because 'unzip.exe' treats subfilename arguments containing
'[...]' as subfilename patterns. This does not occur with '7z.exe'.
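Python's `fnmatch` module implements shell-style wildcards that behave much like the ones unzip applies to member-name arguments. Here it stands in for unzip's own matcher, which is an assumption because the two are not identical, to show why `file[abc].txt` selects the wrong members:

```python
from fnmatch import fnmatchcase

members = ["file[abc].txt", "filea.txt", "fileb.txt", "filec.txt"]
pattern = "file[abc].txt"  # the member name, passed unchanged as an argument

# "[abc]" is a character class: it matches one of a, b or c,
# never the literal text "[abc]".
matched = [m for m in members if fnmatchcase(m, pattern)]
print(matched)  # ['filea.txt', 'fileb.txt', 'filec.txt']
```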


In GNU Emacs 29.1 (build 2, x86_64-w64-mingw32) of 2023-08-02 built on
 AVALON
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Pro (v10.0.2009.19045.3803)

Configured using:
 'configure --with-modules --without-dbus --with-native-compilation=aot
 --without-compress-install --with-tree-sitter CFLAGS=-O2'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG JSON LCMS2 LIBXML2 MODULES NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

(NATIVE_COMP present but libgccjit not available)

Important settings:
  value of $LANG: JPN
  locale-coding-system: cp932

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils term/bobcat japan-util rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win
w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode lisp-mode prog-mode register
page tab-bar menu-bar rfn-eshadow isearch easymenu timer select
scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors
frame minibuffer nadvice seq simple cl-generic indonesian philippine
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure
cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget keymap hashtable-print-readable backquote
threads w32notify w32 lcms2 multi-tty make-network-process
native-compile emacs)

Memory information:
((conses 16 51611 10241)
 (symbols 48 5198 0)
 (strings 32 15199 1603)
 (string-bytes 1 409290)
 (vectors 16 10773)
 (vector-slots 8 335141 17930)
 (floats 8 35 38)
 (intervals 56 228 9)
 (buffers 984 10))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Sat, 23 Dec 2023 10:17:02 GMT)

Message #8 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Sat, 23 Dec 2023 12:16:10 +0200

> From: awrhygty <at> outlook.com
> Date: Wed, 20 Dec 2023 20:23:38 +0900
> 
> 
> If a ZIP archive has a subfile named 'file[abc].txt',
> RET (archive-extract) in an archive-mode buffer fails to extract it with
> the message:
>   caution: filename not matched:  file[abc].txt
> 
> Worse, if there are also filea.txt, fileb.txt and filec.txt,
> extraction reports no error, and the buffer of 'file[abc].txt'
> contains the contents of filea.txt, fileb.txt and filec.txt, but does
> not contain the contents of 'file[abc].txt'.
> 
> This is because 'unzip.exe' treats subfilename arguments containing
> '[...]' as subfilename patterns. This does not occur with '7z.exe'.

Is there any way of making 'unzip' extract file[abc].txt by name,  by
some kind of escaping or protecting the [...] wildcard from expansion?
If there is such a way, we could try using it (maybe); if there's no
such way, I will tag this bug "wontfix", since it isn't a problem with
Emacs, but with the Windows build of 'unzip'.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Sat, 23 Dec 2023 11:48:02 GMT)

Message #11 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org, awrhygty <at> outlook.com
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Sat, 23 Dec 2023 12:47:31 +0100

On Dec 23 2023, Eli Zaretskii wrote:

> Is there any way of making 'unzip' extract file[abc].txt by name,  by
> some kind of escaping or protecting the [...] wildcard from expansion?

$ unzip foo a\[bc\].txt
Archive:  foo.zip
caution: filename not matched:  a[bc].txt
$ unzip foo 'a\[bc\].txt'
Archive:  foo.zip
 extracting: a[bc].txt               
$ unzip foo 'a\*.txt'
Archive:  foo.zip
caution: filename not matched:  a\*.txt
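Python's `shlex.split` applies POSIX shell quoting rules, so it shows which argument unzip actually received in the first two commands above. This models only the shell step, not unzip's own pattern matching:

```python
import shlex

# Unquoted, the shell consumes the backslashes: unzip gets a bare
# "[bc]" bracket expression and treats it as a wildcard.
print(shlex.split(r"unzip foo a\[bc\].txt"))
# ['unzip', 'foo', 'a[bc].txt']

# Inside single quotes the backslashes survive, so unzip itself
# sees "\[" and "\]" and matches the brackets literally.
print(shlex.split(r"unzip foo 'a\[bc\].txt'"))
# ['unzip', 'foo', 'a\\[bc\\].txt']
```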

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Sat, 23 Dec 2023 12:00:02 GMT)

Message #14 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 67926 <at> debbugs.gnu.org, awrhygty <at> outlook.com
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Sat, 23 Dec 2023 13:58:54 +0200

> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: awrhygty <at> outlook.com,  67926 <at> debbugs.gnu.org
> Date: Sat, 23 Dec 2023 12:47:31 +0100
> 
> On Dec 23 2023, Eli Zaretskii wrote:
> 
> > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > some kind of escaping or protecting the [...] wildcard from expansion?
> 
> $ unzip foo a\[bc\].txt
> Archive:  foo.zip
> caution: filename not matched:  a[bc].txt
> $ unzip foo 'a\[bc\].txt'
> Archive:  foo.zip
>  extracting: a[bc].txt               

Thanks, but this doesn't seem to work on Windows, likely because unzip
converts backslashes into forward slashes (or something), and because
quoting 'like this' is not supported on Windows:

  D:\usr\eli>unzip wild.zip 'file\[abc\].txt'
  Archive:  wild.zip
  caution: filename not matched:  'file/[abc/].txt'

  D:\usr\eli>unzip wild.zip "file\[abc\].txt"
  Archive:  wild.zip
  caution: filename not matched:  file/[abc/].txt

  D:\usr\eli>unzip wild.zip "file\\[abc\\].txt"
  Archive:  wild.zip
  caution: filename not matched:  file//[abc//].txt

The OP's report was specifically about Windows.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Tue, 26 Dec 2023 16:02:02 GMT)

Message #17 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Tue, 26 Dec 2023 23:51:01 +0900

Eli Zaretskii <eliz <at> gnu.org> writes:
>> This is because 'unzip.exe' treats subfilename arguments containing
>> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
>
> Is there any way of making 'unzip' extract file[abc].txt by name,  by
> some kind of escaping or protecting the [...] wildcard from expansion?
> If there is such a way, we could try using it (maybe); if there's no
> such way, I will tag this bug "wontfix", since it isn't a problem with
> Emacs, but with the Windows build of 'unzip'.

There is a tricky way to specify "file[[]abc].txt".
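The trick works because `[`, `*` and `?` lose their special meaning inside a bracket expression, so `[[]` matches only a literal `[`. Below is a minimal Python sketch. `quote_unzip_wildcards` is a hypothetical helper name, and `fnmatch` again stands in for unzip's matcher:

```python
import re
from fnmatch import fnmatchcase


def quote_unzip_wildcards(name: str) -> str:
    """Wrap each '[', '*' and '?' in its own one-character bracket expression."""
    return re.sub(r"[\[*?]", r"[\g<0>]", name)


quoted = quote_unzip_wildcards("file[abc].txt")
print(quoted)                                # file[[]abc].txt
print(fnmatchcase("file[abc].txt", quoted))  # True: matches the real member
print(fnmatchcase("filea.txt", quoted))      # False: no longer a wildcard
```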

I think that avoiding unzip.exe/zip.exe altogether solves the problems
with directory names, archive names and subfile names.
If #'archive-zip-extract is replaced with the form below,
ZIP subfiles can be extracted without unzip.exe.

(defun archive-zip-extract (archive name)
  (let* ((desc archive-subfile-mode)
         (buf (current-buffer))
         (bufname (buffer-file-name)))
    (set-buffer archive-superior-buffer)
    (save-restriction
      (widen)
      (let* ((file-beg archive-proper-file-start)
             ;; p0: the member's central directory entry;
             ;; p:  its local file header (offset stored at p0+42).
             (p0 (+ file-beg (archive--file-desc-pos desc)))
             (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
             (bitflags (archive-l-e (+ p  6) 2))
             (method   (archive-l-e (+ p  8) 2))
             (compsize (archive-l-e (+ p0 20) 4))
             (fn-len   (archive-l-e (+ p 26) 2))
             (ex-len   (archive-l-e (+ p 28) 2))
             ;; Data follows the 30-byte fixed header, name and extra field.
             (data-beg (+ p 30 fn-len ex-len))
             (data-end (+ data-beg compsize))
             (coding-system-for-read  'no-conversion)
             (coding-system-for-write 'no-conversion)
             (default-directory temporary-file-directory))
        (cond ((/= 0 (logand bitflags 1))
               (message "Subfile is encrypted"))
              ((= method 0)             ; stored: copy bytes unchanged
               (with-current-buffer buf
                 (insert-buffer-substring archive-superior-buffer
                                          data-beg data-end)))
              ((eq method 8)            ; deflate: feed gzip a gzip stream
               (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
                     (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
                     (proc (start-process "gzip" buf "gzip" "-cd"))
                     (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
                 (set-process-sentinel proc #'ignore)
                 (process-send-string proc header)
                 (process-send-region proc data-beg data-end)
                 (process-send-string proc crc-32)
                 (process-send-string proc orig-size)
                 (process-send-eof proc)
                 ;; Read until gzip exits; a single call may return
                 ;; after only the first chunk of output.
                 (while (accept-process-output proc))
                 (delete-process proc)))
              ((eq method 12)           ; bzip2
               (call-process-region data-beg data-end
                                    "bzip2" nil buf nil "-cd"))
              (t (message "Unknown compression method")))))
    (set-buffer buf)))
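The constants in this code are fixed ZIP offsets. In the central directory entry, the CRC-32 is at +16, the compressed size at +20, the uncompressed size at +24 and the local-header offset at +42. In the local header, the flags are at +6 and the method at +8. The name and extra-field lengths are at +26 and +28, and the data starts after the 30-byte fixed part.

The Python sketch below reads the same fields with `struct`. `member_span` is a hypothetical name, ZIP64 archives are not handled, and the offsets are 0-based rather than buffer positions:

```python
import io
import struct
import zipfile
import zlib


def member_span(data: bytes, cd_pos: int):
    """Return (flags, method, data_beg, data_end) for one member."""
    compsize, = struct.unpack_from("<I", data, cd_pos + 20)   # compressed size
    local_off, = struct.unpack_from("<I", data, cd_pos + 42)  # local header offset
    flags, method = struct.unpack_from("<HH", data, local_off + 6)
    fn_len, ex_len = struct.unpack_from("<HH", data, local_off + 26)
    data_beg = local_off + 30 + fn_len + ex_len  # 30 = fixed local header size
    return flags, method, data_beg, data_beg + compsize


# Build a one-member deflated archive in memory to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file[abc].txt", b"hello, world\n" * 50)
data = buf.getvalue()

# The end-of-central-directory record stores the directory offset at +16.
eocd = data.rfind(b"PK\x05\x06")
cd_pos, = struct.unpack_from("<I", data, eocd + 16)

flags, method, beg, end = member_span(data, cd_pos)
print(flags & 1, method)                         # 0 8 (not encrypted, deflate)
print(zlib.decompress(data[beg:end], -15)[:13])  # b'hello, world\n'
```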




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Tue, 26 Dec 2023 17:27:01 GMT)

Message #20 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Tue, 26 Dec 2023 19:25:39 +0200

> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Tue, 26 Dec 2023 23:51:01 +0900
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> >> This is because 'unzip.exe' treats subfilename arguments containing
> >> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
> >
> > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > some kind of escaping or protecting the [...] wildcard from expansion?
> > If there is such a way, we could try using it (maybe); if there's no
> > such way, I will tag this bug "wontfix", since it isn't a problem with
> > Emacs, but with the Windows build of 'unzip'.
> 
> There is a tricky way to specify "file[[]abc].txt".

That could be a good solution if it works reliably.

> I think that avoiding unzip.exe/zip.exe altogether solves the problems
> with directory names, archive names and subfile names.
> If #'archive-zip-extract is replaced with the form below,
> ZIP subfiles can be extracted without unzip.exe.
> 
> [...]

Thanks, but I don't think it's a good idea.  There are more
compression methods than just those 3, and some of them aren't
documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
with unzip.exe than invent our own wheel.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 27 Dec 2023 14:37:01 GMT)

Message #23 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Wed, 27 Dec 2023 23:36:32 +0900

Eli Zaretskii <eliz <at> gnu.org> writes:

>> I think that avoiding unzip.exe/zip.exe altogether solves the problems
>> with directory names, archive names and subfile names.
>> If #'archive-zip-extract is replaced with the form below,
>> ZIP subfiles can be extracted without unzip.exe.
>> 
>> [...]
>
> Thanks, but I don't think it's a good idea.  There are more
> compression methods than just those 3, and some of them aren't
> documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
> with unzip.exe than invent our own wheel.

If unzip.exe (or an alternative external program) is necessary,
I want Emacs not to load the contents of archive files into the
archive-mode buffer. That is a waste of time and memory.
I have never opened gigabyte-sized ZIP archives,
but I would be glad to open such files with archive-mode
quickly and with little memory.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 27 Dec 2023 16:50:02 GMT)

Message #26 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Wed, 27 Dec 2023 18:48:57 +0200

> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Wed, 27 Dec 2023 23:36:32 +0900
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Thanks, but I don't think it's a good idea.  There are more
> > compression methods than just those 3, and some of them aren't
> > documented.  unzip.exe itself supports 17 methods.  So I'd rather stay
> > with unzip.exe than invent our own wheel.
> 
> If unzip.exe (or an alternative external program) is necessary,
> I want Emacs not to load the contents of archive files into the
> archive-mode buffer. That is a waste of time and memory.

unzip is necessary to extract files, but not to display the archive's
contents.
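Listing needs no external program because the member names are stored uncompressed in the central directory. Below is a minimal Python sketch that walks that directory. `list_zip_names` is a hypothetical name, and the sketch assumes a non-ZIP64 archive whose comment does not contain the end-of-central-directory signature:

```python
import io
import struct
import zipfile


def list_zip_names(data):
    """Return member names by walking the central directory (no decompression)."""
    eocd = data.rfind(b"PK\x05\x06")                    # end-of-central-directory record
    count, = struct.unpack_from("<H", data, eocd + 10)  # total number of entries
    pos, = struct.unpack_from("<I", data, eocd + 16)    # offset of the central directory
    names = []
    for _ in range(count):
        if data[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad central directory entry")
        flags, = struct.unpack_from("<H", data, pos + 8)
        n_len, x_len, c_len = struct.unpack_from("<HHH", data, pos + 28)
        raw = data[pos + 46:pos + 46 + n_len]
        # Bit 11 marks a UTF-8 name; otherwise the encoding is unknown,
        # and cp437 is only the format's nominal default.
        names.append(raw.decode("utf-8" if flags & 0x800 else "cp437"))
        pos += 46 + n_len + x_len + c_len
    return names


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ["file[abc].txt", "ゼソゾ.txt"]:
        zf.writestr(name, b"x")
print(list_zip_names(buf.getvalue()))  # ['file[abc].txt', 'ゼソゾ.txt']
```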




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Thu, 28 Dec 2023 00:40:02 GMT)

Message #29 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 28 Dec 2023 09:38:57 +0900

Eli Zaretskii <eliz <at> gnu.org> writes:

>> If unzip.exe (or an alternative external program) is necessary,
>> I want Emacs not to load the contents of archive files into the
>> archive-mode buffer. That is a waste of time and memory.
>
> unzip is necessary to extract files, but not to display the archive's
> contents.

If users are expected to have unzip.exe, Emacs can list subfiles using
unzip.exe instead of examining the archive contents as a binary file.
Users who have unzip.exe don't care whether subfiles are listed with
unzip.exe or not.

If users are not expected to have unzip.exe, it is more convenient for
them if subfiles can be extracted without unzip.exe.
In that case, it would be better if the variable archive-zip-extract
could hold a Lisp function, to be called by the archive-zip-extract
function.
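One way to read that proposal is a setting that holds either a command prefix or a function. Below is a Python sketch of such a dispatch. `extract_member` and the `Extractor` type are hypothetical and are not part of arc-mode:

```python
import subprocess
from typing import Callable, Sequence, Union

# Hypothetical setting type: a command prefix such as ("unzip", "-qq", "-c"),
# or a function that returns the member's bytes itself.
Extractor = Union[Sequence[str], Callable[[str, str], bytes]]


def extract_member(extractor: Extractor, archive: str, name: str) -> bytes:
    if callable(extractor):
        # In-process extraction: no external program, no name quoting.
        return extractor(archive, name)
    # External program: append archive and member name, read its stdout.
    result = subprocess.run([*extractor, archive, name],
                            capture_output=True, check=True)
    return result.stdout


# Usage with a stand-in function instead of a real extractor.
print(extract_member(lambda a, n: f"{a}:{n}".encode(), "x.zip", "y.txt"))
# b'x.zip:y.txt'
```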




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Thu, 28 Dec 2023 06:32:02 GMT)

Message #32 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 28 Dec 2023 08:31:13 +0200

> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Thu, 28 Dec 2023 09:38:57 +0900
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> If unzip.exe (or an alternative external program) is necessary,
> >> I want Emacs not to load the contents of archive files into the
> >> archive-mode buffer. That is a waste of time and memory.
> >
> > unzip is necessary to extract files, but not to display the archive's
> > contents.
> 
> If users are expected to have unzip.exe, Emacs can list subfiles using
> unzip.exe instead of examining the archive contents as a binary file.
> Users who have unzip.exe don't care whether subfiles are listed with
> unzip.exe or not.

I see your point.  However, those decisions were made many years ago,
and have withstood the test of time since then.  So I see no reason to
make drastic changes in how we support zip archives, just because we
can, or just because other arrangements are possible.

> If users are not expected to have unzip.exe, it is more convenient for
> them if subfiles can be extracted without unzip.exe.
> In that case, it would be better if the variable archive-zip-extract
> could hold a Lisp function, to be called by the archive-zip-extract
> function.

We could consider extracting using our own code if someone writes the
code to support all the 17 methods that unzip.exe supports.
Otherwise, we would introduce a regression, and someone somewhere will
rightfully complain.

Btw, your suggested changes required gzip and bunzip2 as external
programs to support the 2 most popular compression methods.  Why
should we assume these are available more widely than unzip,
especially on Windows?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Thu, 28 Dec 2023 13:10:01 GMT)

Message #35 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 28 Dec 2023 22:09:23 +0900

Eli Zaretskii <eliz <at> gnu.org> writes:

>> If users are not expected to have unzip.exe, it is more convenient for
>> them if subfiles can be extracted without unzip.exe.
>> In that case, it would be better if the variable archive-zip-extract
>> could hold a Lisp function, to be called by the archive-zip-extract
>> function.
>
> We could consider extracting using our own code if someone writes the
> code to support all the 17 methods that unzip.exe supports.
> Otherwise, we would introduce a regression, and someone somewhere will
> rightfully complain.
>
> Btw, your suggested changes required gzip and bunzip2 as external
> programs to support the 2 most popular compression methods.  Why
> should we assume these are available more widely than unzip,
> especially on Windows?

When I installed UnxUtils years ago, it had bzip2 and gzip, but neither
unzip nor zip. Now that I have downloaded it again, it has unzip and zip.

My interest is in how to avoid file-naming problems.
Japanese raises more difficulties.
Japanese characters in file names are normally encoded in cp932, and an
encoded character may have '[', '\' or ']' as its second byte:
  (encode-coding-string "ゼソゾ" 'cp932)
  => "\203[\203\\\203]"
Subfiles with such names cannot be extracted normally.
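Python's cp932 codec produces the same bytes, which illustrates the problem. The name itself contains no bracket or backslash characters, yet its encoded form does. A matcher that works on bytes, which is an assumption here about unzip, would see those bytes as wildcard or escape characters:

```python
name = "ゼソゾ"
encoded = name.encode("cp932")
print(encoded)  # b'\x83[\x83\\\x83]'

# The characters themselves contain no '[', '\' or ']' ...
print(any(c in name for c in "[\\]"))       # False
# ... but every second byte of their cp932 encoding is one of them.
print([bytes([b]) for b in encoded[1::2]])  # [b'[', b'\\', b']']
```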




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Thu, 28 Dec 2023 14:07:02 GMT)

Message #38 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 28 Dec 2023 16:06:14 +0200

> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Thu, 28 Dec 2023 22:09:23 +0900
> 
> > Btw, your suggested changes required gzip and bunzip2 as external
> > programs to support the 2 most popular compression methods.  Why
> > should we assume these are available more widely than unzip,
> > especially on Windows?
> 
> When I installed UnxUtils years ago, it had bzip2 and gzip, but neither
> unzip nor zip. Now that I have downloaded it again, it has unzip and zip.

Windows systems don't come with UnxUtils installed anyway.

> My interest is in how to avoid file-naming problems.
> Japanese raises more difficulties.
> Japanese characters in file names are normally encoded in cp932, and an
> encoded character may have '[', '\' or ']' as its second byte:
>   (encode-coding-string "ゼソゾ" 'cp932)
>   => "\203[\203\\\203]"
> Subfiles with such names cannot be extracted normally.

I don't think we can solve this in Emacs: non-ASCII file names in zip
archives are a mess, even before you consider the fact that zip
archives are frequently moved between systems.  For starters, how can
one know in advance what is the encoding of file names in an arbitrary
zip archive?  This will bite you even if we do everything in Emacs,
and even if someone does submit patches to implement all the
compression methods.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Thu, 28 Dec 2023 14:58:01 GMT)

Message #41 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 28 Dec 2023 16:56:51 +0200

> Cc: 67926 <at> debbugs.gnu.org
> Date: Tue, 26 Dec 2023 19:25:39 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > From: awrhygty <at> outlook.com
> > Cc: 67926 <at> debbugs.gnu.org
> > Date: Tue, 26 Dec 2023 23:51:01 +0900
> > 
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> > >> This is because 'unzip.exe' treats subfilename arguments containing
> > >> '[...]' as subfilename patterns. This does not occur with '7z.exe'.
> > >
> > > Is there any way of making 'unzip' extract file[abc].txt by name,  by
> > > some kind of escaping or protecting the [...] wildcard from expansion?
> > > If there is such a way, we could try using it (maybe); if there's no
> > > such way, I will tag this bug "wontfix", since it isn't a problem with
> > > Emacs, but with the Windows build of 'unzip'.
> > 
> > There is a tricky way to specify "file[[]abc].txt".
> 
> That could be a good solution if it works reliably.

I've now verified that it works reliably, and replaced
shell-quote-argument with this special quoting in archive-zip-extract
(but only when the program used to extract files is "unzip").

So the original problem of this bug report is now fixed, and I think
we can close this bug.
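Below is a Python sketch of quoting that is applied only when the program is unzip. It is not the actual arc-mode change. `zip_extract_argv` is hypothetical, and `-qq -c` and `x -so` are just example option sets for unzip and 7z:

```python
import os
import re


def zip_extract_argv(command, archive, name):
    """Build an argv for extracting NAME from ARCHIVE (hypothetical helper).

    For unzip, each '[', '*' or '?' in NAME is wrapped in a one-character
    bracket expression so that it matches literally; other programs get
    the name unchanged.
    """
    program = os.path.splitext(os.path.basename(command[0]))[0].lower()
    if program == "unzip":
        name = re.sub(r"[\[*?]", r"[\g<0>]", name)
    return [*command, archive, name]


print(zip_extract_argv(["unzip", "-qq", "-c"], "a.zip", "file[abc].txt"))
# ['unzip', '-qq', '-c', 'a.zip', 'file[[]abc].txt']
print(zip_extract_argv(["7z", "x", "-so"], "a.zip", "file[abc].txt"))
# ['7z', 'x', '-so', 'a.zip', 'file[abc].txt']
```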




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 03 Jan 2024 19:54:01 GMT)

Message #44 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: awrhygty <at> outlook.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 04 Jan 2024 04:53:26 +0900

Eli Zaretskii <eliz <at> gnu.org> writes:

>> My interest is in how to avoid file-naming problems.
>> Japanese raises more difficulties.
>> Japanese characters in file names are normally encoded in cp932, and an
>> encoded character may have '[', '\' or ']' as its second byte:
>>   (encode-coding-string "ゼソゾ" 'cp932)
>>   => "\203[\203\\\203]"
>> Subfiles with such names cannot be extracted normally.
>
> I don't think we can solve this in Emacs: non-ASCII file names in zip
> archives are a mess, even before you consider the fact that zip
> archives are frequently moved between systems.  For starters, how can
> one know in advance what is the encoding of file names in an arbitrary
> zip archive?  This will bite you even if we do everything in Emacs,
> and even if someone does submit patches to implement all the
> compression methods.

So I need an extractor that does not depend on subfile names.
Extracting the contents under broken names is more useful than being
unable to extract them at all.

And I found that my unzip.exe cannot extract BZIP2- or LZMA-compressed
subfiles created by the Python zipfile module, so I doubt that
unzip.exe works for all compression methods.
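The method numbers behind the `(= method 0)`, `(eq method 8)` and `(eq method 12)` branches in the code posted in this thread can be seen by having Python's `zipfile` write one member per method. An LZMA member (method 14) falls through to the "Unknown compression method" branch of that code:

```python
import io
import zipfile

# Write one member per compression method into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED,
                   zipfile.ZIP_BZIP2, zipfile.ZIP_LZMA):
        zf.writestr(f"m{method}.txt", b"payload", compress_type=method)

# Read back the method number recorded for each member.
with zipfile.ZipFile(buf) as zf:
    methods = [(info.filename, info.compress_type) for info in zf.infolist()]
print(methods)
# [('m0.txt', 0), ('m8.txt', 8), ('m12.txt', 12), ('m14.txt', 14)]
```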

By the way, I didn't know about the zlib-decompress-region function.
With it, subfiles compressed with the deflate method can be extracted
with Elisp alone:

;; Use the Lisp extractor below instead of arc-mode's archive-zip-extract.
(advice-add #'archive-zip-extract :override
            #'archive-zip-decompress-content)

(defun archive-zip-decompress-content (archive name)
  (let* ((desc archive-subfile-mode)
         (buf (current-buffer))
         (bufname (buffer-file-name)))
    (set-buffer archive-superior-buffer)
    (save-restriction
      (widen)
      (let* ((file-beg archive-proper-file-start)
             (p0 (+ file-beg (archive--file-desc-pos desc)))
             (p  (+ file-beg (archive-l-e (+ p0 42) 4)))
             (bitflags (archive-l-e (+ p  6) 2))
             (method   (archive-l-e (+ p  8) 2))
             (compsize (archive-l-e (+ p0 20) 4))
             (fn-len   (archive-l-e (+ p 26) 2))
             (ex-len   (archive-l-e (+ p 28) 2))
             (data-beg (+ p 30 fn-len ex-len))
             (data-end (+ data-beg compsize))
             (coding-system-for-read  'no-conversion)
             (coding-system-for-write 'no-conversion)
             (default-directory temporary-file-directory))
        (cond ((/= 0 (logand bitflags 1))
               (message "Subfile is encrypted"))
              ((= method 0)
               (with-current-buffer buf
                 (insert-buffer-substring archive-superior-buffer
                                          data-beg data-end)))
              ((eq method 8)
               (let ((crc-32    (buffer-substring (+ p0 16) (+ p0 20)))
                     (orig-size (buffer-substring (+ p0 24) (+ p0 28)))
                     (header "\x1f\x8b\x08\0\0\0\0\0\0\0"))
                 (with-current-buffer buf
                   (set-buffer-multibyte nil)
                   (insert header)
                   (insert-buffer-substring archive-superior-buffer
                                            data-beg data-end)
                   (insert crc-32 orig-size)
                   (zlib-decompress-region (point-min) (point-max))
                   (set-buffer-multibyte 'to))))
              ((eq method 12)
               (call-process-region data-beg data-end
                                    "bzip2" nil buf nil "-cd"))
              (t (message "Unknown compression method")))))
    (set-buffer buf)))
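The offset arithmetic in this code (the data starts 30 bytes after the local header signature, plus the file name and extra-field lengths stored at offsets 26 and 28) can be checked outside arc-mode. This is a minimal sketch that works on a unibyte string with 0-based offsets; `my-zip-le` and `my-zip-local-data-offset` are hypothetical helpers standing in for arc-mode's `archive-l-e`, which reads from buffer positions instead.

```elisp
;; Hypothetical helpers for illustration; arc-mode itself uses
;; `archive-l-e' on 1-based buffer positions.
(defun my-zip-le (str pos len)
  "Read a LEN-byte little-endian unsigned integer from unibyte STR.
POS is a 0-based index into STR."
  (let ((n 0))
    (dotimes (i len)
      (setq n (logior n (ash (aref str (+ pos i)) (* 8 i)))))
    n))

(defun my-zip-local-data-offset (str pos)
  "Return the 0-based offset of member data for the local header at POS.
The fixed part of a ZIP local file header is 30 bytes; the file name
length is at offset 26 and the extra field length at offset 28."
  (unless (string= (substring str pos (+ pos 4)) "PK\3\4")
    (error "No local file header at offset %d" pos))
  (+ pos 30
     (my-zip-le str (+ pos 26) 2)
     (my-zip-le str (+ pos 28) 2)))
```

With a 5-byte name and a 2-byte extra field, the data begins at offset 30 + 5 + 2 = 37, which is the same `(+ p 30 fn-len ex-len)` computation used for `data-beg` above.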




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 03 Jan 2024 20:01:02 GMT) Full text and rfc822 format available.

Message #47 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Wed, 03 Jan 2024 22:00:03 +0200
> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Thu, 04 Jan 2024 04:53:26 +0900
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > I don't think we can solve this in Emacs: non-ASCII file names in zip
> > archives are a mess, even before you consider the fact that zip
> > archives are frequently moved between systems.  For starters, how can
> > one know in advance what is the encoding of file names in an arbitrary
> > zip archive?  This will bite you even if we do everything in Emacs,
> > and even if someone does submit patches to implement all the
> > compression methods.
> 
> So I need an extractor that does not depend on subfile names.
> It is more useful to extract contents with broken names than to be unable
> to extract contents at all.

Feel free to do it, for you personally.  But most people have other
needs: they need to extract files from zip archives like the unzip program
does, and that's what Emacs gives them.  Your personal needs can be
solved with Lisp programs you write for your own use.  Here we are
talking about what arc-mode.el should do for everyone, not just for
you.  And your special needs don't necessarily mean others have the
same needs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#67926; Package emacs. (Wed, 03 Jan 2024 20:04:02 GMT) Full text and rfc822 format available.

Message #50 received at 67926 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926 <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Wed, 03 Jan 2024 22:02:47 +0200
> From: awrhygty <at> outlook.com
> Cc: 67926 <at> debbugs.gnu.org
> Date: Thu, 04 Jan 2024 04:53:26 +0900
> 
> By the way, I didn't know about the zlib-decompress-region function.
> With it, subfiles compressed with the deflate method can be extracted
> using only Elisp.

zlib-decompress-region is only available if Emacs was built with the
zlib library, which is an optional dependency.  We prefer not to rely
on optional libraries for features that could be useful even when the
optional dependency is not available.
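One way to honor this while still using zlib when it is present is to test for it at run time. This is a minimal sketch, assuming an external `gzip` program on `exec-path` as the fallback; `my-inflate-region` is a hypothetical name. `zlib-available-p` is Emacs's built-in predicate for zlib support, and `zlib-decompress-region` accepts gzip- or zlib-wrapped data but must be called in a unibyte buffer.

```elisp
;; Hypothetical helper: prefer built-in zlib, else an external gzip.
(defun my-inflate-region (beg end)
  "Decompress compressed data between BEG and END in place.
The current buffer must be unibyte.  Use built-in zlib support when
available; otherwise pipe the region through an external gzip, which
handles only the gzip wrapper.  Return non-nil on success."
  (if (and (fboundp 'zlib-available-p) (zlib-available-p))
      (zlib-decompress-region beg end)
    (let ((coding-system-for-read 'no-conversion)
          (coding-system-for-write 'no-conversion))
      ;; DELETE = t removes the region; BUFFER = t inserts gzip's
      ;; output at point, which we place at BEG.
      (goto-char beg)
      (eq 0 (call-process-region beg end "gzip" t t nil "-dc")))))
```

In the deflate branch of the extractor above, such a helper could replace the direct `zlib-decompress-region` call, since that branch already wraps the data in a gzip header and trailer.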




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 04 Jan 2024 10:47:01 GMT) Full text and rfc822 format available.

Notification sent to awrhygty <at> outlook.com:
bug acknowledged by developer. (Thu, 04 Jan 2024 10:47:02 GMT) Full text and rfc822 format available.

Message #55 received at 67926-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: awrhygty <at> outlook.com
Cc: 67926-done <at> debbugs.gnu.org
Subject: Re: bug#67926: 29.1; fail to extract ZIP subfile named with [...]
Date: Thu, 04 Jan 2024 12:46:13 +0200
> Cc: 67926 <at> debbugs.gnu.org
> Date: Thu, 28 Dec 2023 16:56:51 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> So the original problem of this bug report is now fixed, and I think
> we can close this bug.

Now done.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 01 Feb 2024 12:24:11 GMT) Full text and rfc822 format available.




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.