GNU bug report logs -
#74994
Improve Emacs iCalendar support
>> In `bindat.el` I used a similar approach except that each construct (I
>> guess in your case, that means each "type") is stored as a method (in
>> a generic function) instead of a property of a symbol. I'm not sure
>> it's the perfect solution, but it's nice that `C-h o` on the generic
>> function can then provide a documentation of each of the constructs.
>
> This would mean relying more heavily on cl-lib, correct? Generic
> functions and methods are part of cl-lib's CLOS implementation?
No. They (ab)use the "cl-" prefix for historical reasons (EIEIO had
already used the non-prefixed `defgeneric/defmethod` names), but
`cl-generic.el` is not part of cl-lib (and it is preloaded into Emacs).
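A minimal sketch of the method-per-construct idea, assuming Emacs 28 or later (where `eql` specializers evaluate their argument). The names `my-ical-parse-value`, `integer`, and `boolean` are hypothetical and are not taken from `icalendar-parser.el` or `bindat.el`:

```elisp
(require 'cl-generic)

;; One generic function; each "type" is a method dispatched on a symbol.
(cl-defgeneric my-ical-parse-value (type string)
  "Parse STRING as a value of TYPE, a symbol naming a (hypothetical) type.")

(cl-defmethod my-ical-parse-value ((_type (eql 'integer)) string)
  "Parse STRING as a decimal integer."
  (string-to-number string))

(cl-defmethod my-ical-parse-value ((_type (eql 'boolean)) string)
  "Parse STRING as a boolean: non-nil only for TRUE (case-insensitive)."
  (equal (upcase string) "TRUE"))
```

With this layout, `C-h o my-ical-parse-value` lists the docstring of every method, and a third-party library can add a type by defining one more method.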
> C-h o already works with my code (see the describe-symbol backend at the
> end of icalendar-parser.el), but maybe the generic functions approach is
> cleaner. I'll think about it.
>
>> Another option we use elsewhere is to use function names constructed from
>> a constant prefix plus the name of the construct, so instead of
>>
>> (funcall (get 'foo 'bar) ...)
>>
>> you might be able to macroexpand to something like
>>
>> (,(intern (format "bar-%s" 'foo)) ...)
>>
>> so you get (for free) compile-time warnings when using a construct that
>> doesn't exist, and you avoid a `get` at runtime (IIRC, we use that
>> approach in `peg.el`).
> I hadn't thought of that. Would this prevent users of the library from
> defining new types after the library is compiled, though?
No, tho when the above `'foo` part is not a constant but is computed
dynamically, the code is less efficient than a funcall+get.
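A rough sketch of both cases, using a hypothetical `my-ical-parse-` prefix (none of these names come from the actual library). The macro covers the constant case and expands to a direct call; the function covers the dynamic case and has to build and intern the name on every call:

```elisp
;; A parser for one (hypothetical) type, named PREFIX + type name.
(defun my-ical-parse-integer (string)
  "Parse STRING as a decimal integer."
  (string-to-number string))

(defmacro my-ical-parse (type &rest args)
  "Expand to a direct call of the parser for TYPE, an unquoted symbol.
The byte-compiler can then warn if no such function is defined."
  `(,(intern (format "my-ical-parse-%s" type)) ,@args))

(defun my-ical-parse-dynamic (type &rest args)
  "Call the parser for TYPE, a symbol only known at run time.
This pays for `format' and `intern' on each call."
  (apply (intern (format "my-ical-parse-%s" type)) args))
```

Here `(my-ical-parse integer "42")` macroexpands to `(my-ical-parse-integer "42")`. A user can still add a type after the library is compiled by defining another `my-ical-parse-NAME` function.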
> It's certainly useful when debugging. Calling a pure function with M-:
> or e in the debugger to make sure it's doing what I expect is generally
> a lot easier than getting a whole buffer into the right parsing state.
🙂
> If I can declare them pure, it might also have some performance
> benefits.
I'd be surprised if it makes a measurable difference, tho.
The debugging argument is much more compelling.
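For illustration, this is what such a declaration could look like. The function `my-ical-duration-seconds` is a made-up example, not something from the patch under discussion:

```elisp
(defun my-ical-duration-seconds (weeks days hours minutes seconds)
  "Return the total number of seconds in the given duration components."
  ;; `pure' lets the compiler fold calls with constant arguments;
  ;; `side-effect-free' lets it drop calls whose value is unused.
  (declare (pure t) (side-effect-free t))
  (+ seconds
     (* 60 (+ minutes
              (* 60 (+ hours
                       (* 24 (+ days (* 7 weeks)))))))))
```

Because it depends only on its arguments, it can be checked in isolation with `M-:`, for example `(my-ical-duration-seconds 0 1 2 3 4)`, which evaluates to 93784.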
> The standard says that long lines need to be "folded" (wrapped) by
> inserting a CR-LF-space sequence. It defines long lines as those longer
> than 75 *bytes*, and explicitly says that implementations need to handle
> the case where the line-wrapping sequence occurs in the middle of a
> multi-byte character. So the only safe way to unwrap lines is before a
> buffer gets decoded.
Eww!
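To make the problem concrete, here is a small sketch of unfolding at the byte level before decoding. The function name is hypothetical, and it assumes the data is UTF-8 and that a fold is CR LF followed by one space or tab:

```elisp
(defun my-ical-unfold-and-decode (bytes)
  "Unfold the unibyte string BYTES, then decode the result as UTF-8.
Assumes a fold is CR LF followed by a single space or tab."
  (decode-coding-string
   ;; Remove the fold sequences while the data is still raw bytes, so
   ;; that a multi-byte character split by a fold is rejoined first.
   (replace-regexp-in-string "\r\n[ \t]" "" bytes t t)
   'utf-8))

;; "é" is the two bytes C3 A9 in UTF-8; here a fold falls between them.
(my-ical-unfold-and-decode
 (unibyte-string ?A #xc3 ?\r ?\n ?\s #xa9 ?B))
;; => "AéB"
```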
> So far the best user interface I could come up with was to check for
> long lines when icalendar-mode starts and ask the user whether they want
> to unwrap them. If they do, it re-loads the raw data into a new buffer,
> unwraps the lines, decodes the buffer, and then re-starts icalendar-mode
> in the new buffer. But I find this pretty awkward in practice, because
> you end up with two buffers containing the same data (modulo whitespace)
> and visiting the same file, and I'm not sure how to improve this.
Maybe strongly encourage the user to save the result back into the
original file?
How common is it for multibyte sequences to split in this way?
Is it always UTF8? If it's always UTF8, then multibyte sequences split
in two *will* result in "eight-bit" byte chars, so you should be able to
recognize them reliably even in the already-decoded buffer with a regexp
along the lines of "[\200-\377]+\n [\200-\377]+" and you should then be
able to handle them "directly/locally" without reloading the undecoded file.
Something like:
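    (while (re-search-forward "[\200-\377]+\n [\200-\377]+" nil t)
      ;; Point is now on the continuation line: delete the newline
      ;; that precedes it and the space that starts it.
      (delete-region (1- (line-beginning-position))
                     (1+ (line-beginning-position)))
      ;; Re-decode the rejoined raw bytes (the match, minus 2 deleted chars).
      (decode-coding-region (match-beginning 0) (- (match-end 0) 2) 'utf-8))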
- Stefan
This bug report was last modified 99 days ago.