GNU bug report logs - #74994
Improve Emacs iCalendar support


Package: emacs;

Reported by: Richard Lawrence <rwl <at> recursewithless.net>

Date: Fri, 20 Dec 2024 13:08:02 UTC

Severity: wishlist



Message #65 received at 74994 <at> debbugs.gnu.org:

From: Richard Lawrence <rwl <at> recursewithless.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 74994 <at> debbugs.gnu.org
Subject: Re: bug#74994: Improve Emacs iCalendar support
Date: Thu, 23 Jan 2025 19:12:20 +0100
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> The standard says that long lines need to be "folded" (wrapped) by
>> inserting a CR-LF-space sequence. It defines long lines as those longer
>> than 75 *bytes*, and explicitly says that implementations need to handle
>> the case where the line-wrapping sequence occurs in the middle of a
>> multi-byte character. So the only safe way to unwrap lines is before a
>> buffer gets decoded.
>
> Eww!

Yes indeed. 
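Here is a minimal sketch of both directions of the folding rule. It assumes UTF-8 or another stateless encoding, and the helper names my-ical-unfold-bytes and my-ical-fold-line are hypothetical: they are not part of Emacs or of the proposed icalendar-mode. RFC 5545, section 3.1 defines a fold as CRLF followed by a single space or tab, with lines kept to 75 octets excluding the line break.

Unfolding operates on a unibyte string of raw bytes, so a fold that lands inside a multibyte sequence disappears before decoding. Folding counts encoded octets per character, so it never splits a character, even though the standard would allow it.

```elisp
(defun my-ical-unfold-bytes (raw)
  "Return RAW, a unibyte string of iCalendar data, with folds removed.
A fold is CRLF followed by a single space or tab.  Working on bytes
means a fold inside a multibyte sequence is removed before decoding.
Hypothetical helper, not part of Emacs."
  (replace-regexp-in-string "\r\n[ \t]" "" raw t t))

(defun my-ical-fold-line (line &optional coding)
  "Fold LINE, a decoded content line without its trailing CRLF.
Keep each physical line to at most 75 octets when encoded with
CODING (default `utf-8'), not counting the CRLF.  Breaks fall only
between characters, so no multibyte sequence is split.  Assumes a
stateless encoding such as UTF-8.  Hypothetical helper."
  (let ((coding (or coding 'utf-8))
        (used 0)                        ; octets on the current line
        (parts '()))
    (dolist (ch (string-to-list line))
      ;; Size of this character once encoded.
      (let ((len (length (encode-coding-string (string ch) coding))))
        (when (> (+ used len) 75)
          (push "\r\n " parts)
          ;; The leading space of a continuation line counts too.
          (setq used 1))
        (push (string ch) parts)
        (setq used (+ used len))))
    (apply #'concat (nreverse parts))))
```

For example, unfolding the bytes "caf\303\r\n \251" and then calling decode-coding-string with utf-8 yields "café". Decoding first would instead leave two stray raw bytes on either side of the fold.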

>> So far the best user interface I could come up with was to check for
>> long lines when icalendar-mode starts and ask the user whether they want
>> to unwrap them. If they do, it re-loads the raw data into a new buffer,
>> unwraps the lines, decodes the buffer, and then re-starts icalendar-mode
>> in the new buffer. But I find this pretty awkward in practice, because
>> you end up with two buffers containing the same data (modulo whitespace)
>> and visiting the same file, and I'm not sure how to improve this.
>
> Maybe strongly encourage the user to save the result back into the
> original file?

Yes, that's already what I do, setting buffer-file-name to point to the
original file in the new buffer as well; and there's a prompt to re-wrap
lines on save. I suppose what I could do is unconditionally kill the old
buffer and then steal its name for the new one (or just erase it and
reload the data into the same buffer), so that from the user's
perspective, it's "the same" buffer. Does that seem better?
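The "erase it and reload the data into the same buffer" variant might look roughly like the sketch below. It rests on several assumptions. The name my-ical-reload-unfolded is hypothetical. The file is read with insert-file-contents-literally, so nothing has been decoded yet. Restarting whichever major mode is active stands in for re-starting icalendar-mode.

```elisp
(defun my-ical-reload-unfolded (&optional file coding)
  "Replace this buffer's text with FILE's contents, unfolded and decoded.
FILE defaults to `buffer-file-name'; CODING defaults to `utf-8'.
Folds are removed from the raw bytes first, so a fold that splits a
multibyte sequence is repaired before decoding.  Hypothetical helper."
  (interactive)
  (let ((file (or file buffer-file-name))
        (coding (or coding 'utf-8))
        (mode major-mode)
        (inhibit-read-only t))
    (unless file
      (user-error "No file to reload"))
    (widen)
    (erase-buffer)
    ;; Insert the file's bytes verbatim; in a multibyte buffer, bytes
    ;; >= 128 become raw-byte (eight-bit) characters.
    (insert-file-contents-literally file)
    ;; Remove every fold: CRLF followed by one space or tab.
    (goto-char (point-min))
    (while (re-search-forward "\r\n[ \t]" nil t)
      (replace-match "" t t))
    ;; Decode in place; the EOL convention is detected automatically.
    (decode-coding-region (point-min) (point-max) coding)
    (let ((used last-coding-system-used))
      ;; Restart the major mode so it sees the unfolded text.
      (funcall mode)
      ;; Save back with the coding system that was actually used.
      (setq buffer-file-coding-system used))
    (goto-char (point-min))))
```

The buffer keeps its name and buffer-file-name, so the user never sees a second buffer. It is left modified, so the usual save path, including any re-fold prompt, still applies.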

> How common is it for multibyte sequences to split in this way?

No idea. Probably not common. 

> Is it always UTF8?

Alas, no. According to the standard, UTF-8 is the "default" encoding,
and implementations must support it, but as far as I can tell, the
standard allows using another encoding via the MIME charset parameter
(I infer this from section 8.1, which mentions the possibility).  
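Choosing the decoding coding system could be sketched as follows. The sketch assumes the charset parameter value is already available as a string, for example from a MIME Content-Type header, and that it matches an Emacs coding-system name. The function name my-ical-charset-to-coding-system is hypothetical. If this direct lookup proves too naive, the charset mapping in Gnus's mm-util.el may be worth consulting.

```elisp
(defun my-ical-charset-to-coding-system (charset)
  "Return an Emacs coding system for MIME CHARSET, a string or nil.
Fall back to `utf-8', the iCalendar default, when CHARSET is nil or
does not name a coding system Emacs knows.  Hypothetical helper."
  (let ((cs (and charset (intern (downcase charset)))))
    ;; Check CS first: `coding-system-p' returns t for nil.
    (if (and cs (coding-system-p cs))
        cs
      'utf-8)))
```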

> If it's always UTF8, then multibyte sequences split
> in two *will* result in "eight-bit" byte chars, so you should be able to
> recognize them reliably even in the already-decoded buffer with a regexp
> along the lines of "[\200-\377]+\n [\200-\377]+" and you should then be
> able to handle them "directly/locally" without reloading the undecoded file.
>
> Something like:
>
>     (while (re-search-forward "[\200-\377]+\n [\200-\377]+" nil t)
>       (delete-region (1- (line-beginning-position))
>                      (1+ (line-beginning-position)))
>       (decode-coding-region (match-beginning 0) (- (match-end 0) 2) 'utf-8))

Hmm, that's an interesting idea, thanks. I will look into plugging this
into the unwrapping code, at least when the coding system is known to be
UTF-8.
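Stefan's suggestion can be packaged as a function that runs only when the buffer was decoded as UTF-8. This is a sketch under assumptions. The name my-ical-rejoin-split-utf8 is hypothetical, and it trusts buffer-file-coding-system to reflect how the buffer was decoded. It relies, as Stefan describes, on the unibyte class [\200-\377] matching raw bytes in a multibyte buffer. It also accepts a CR before the LF, or a tab instead of the space, in case the buffer was decoded without EOL conversion.

```elisp
(defun my-ical-rejoin-split-utf8 ()
  "Rejoin UTF-8 sequences that a fold split, in an already-decoded buffer.
Split sequences appear as runs of raw bytes (eight-bit characters) on
either side of the fold.  Do nothing unless `buffer-file-coding-system'
is a UTF-8 variant.  Hypothetical helper."
  (when (eq (coding-system-type buffer-file-coding-system) 'utf-8)
    (save-excursion
      (goto-char (point-min))
      ;; Group 1 is the fold itself, between two runs of raw bytes.
      (while (re-search-forward "[\200-\377]+\\(\r?\n[ \t]\\)[\200-\377]+"
                                nil t)
        (let* ((beg (match-beginning 0))
               (fold-len (- (match-end 1) (match-beginning 1)))
               ;; End of the byte run once the fold has been deleted.
               (end (- (match-end 0) fold-len)))
          (delete-region (match-beginning 1) (match-end 1))
          ;; Re-decode the rejoined raw bytes as UTF-8.
          (decode-coding-region beg end 'utf-8))))))
```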




This bug report was last modified 99 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.