Package: emacs;
Reported by: Richard Lawrence <rwl <at> recursewithless.net>
Date: Fri, 20 Dec 2024 13:08:02 UTC
Severity: wishlist
From: Richard Lawrence <rwl <at> recursewithless.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 74994 <at> debbugs.gnu.org
Subject: bug#74994: Improve Emacs iCalendar support
Date: Wed, 22 Jan 2025 08:43:38 +0100

Hi Stefan, thanks for your feedback!

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>>> - I've used macros (see icalendar-macs.el) to create a small "DSL" for
>>> defining iCalendar types. These macros store parsing-related information for
>>> each type as properties of the symbols which name them. There's a lot of
>>> dynamic dispatch in the parser based on these type symbols' properties.
>>> This adds some complexity but (I hope) makes the parser more "atomic"/
>>> extensible. Does this seem like a reasonable approach in general?
>
> It sounds like a reasonable design, yes.
>
> In `bindat.el` I used a similar approach except that each construct (I
> guess in your case, that means each "type") is stored as a method (in
> a generic function) instead of a property of a symbol. I'm not sure
> it's the perfect solution, but it's nice that `C-h o` on the generic
> function can then provide a documentation of each of the constructs.

This would mean relying more heavily on cl-lib, correct? Generic
functions and methods are part of cl-lib's CLOS implementation?

C-h o already works with my code (see the describe-symbol backend at the
end of icalendar-parser.el), but maybe the generic functions approach is
cleaner. I'll think about it.

> Other options we use elsewhere is to use function names constructed from
> a constant prefix plus the name of the construct, so instead of
>
>     (funcall (get 'foo 'bar) ...)
>
> you might be able to macroexpand to something like
>
>     (,(intern (format "bar %s" 'foo)) ...)
>
> so you get (for free) compile-time warnings when using a construct that
> doesn't exist, and you avoid a `get` at runtime (IIRC, we use that
> approach in `peg.el`).

I hadn't thought of that. Would this prevent users of the library from
defining new types after the library is compiled, though? The iCalendar
standard allows extensions in "X-" properties and components; I don't
want to do anything that would make it difficult e.g. for Org to use
these to encode its own data structures.

>>> - I ran into one issue that feels like a design flaw: the parser separates
>>> "reading" (converting a string to an Elisp value) into a function
>>> distinct from the parsing function which matches that string (see e.g.
>>> ical:parse-property-value in icalendar-parser.el, which calls
>>> ical:read-property-value). In simple cases this nicely factors out a pure
>>> function from one which depends on a lot of global buffer state;
>>> but in more complicated cases the "pure" reader function depends on
>>> the match data and so isn't pure at all (see e.g. ical:read-dur-value).
>>> Is there a better way to do this? (Not make the distinction? Pass
>>> the match data explicitly? ...?)
>
> Is the separation useful to users (including internal users) of the
> parser? This kind of problem doesn't directly ring a bell, so I don't
> have a good suggestion to make.

It's certainly useful when debugging. Calling a pure function with M-:
or e in the debugger to make sure it's doing what I expect is generally
a lot easier than getting a whole buffer into the right parsing state.
If I can declare them pure, it might also have some performance
benefits.

>>> - whether there's a better solution to the problem of needing to unfold
>>> lines *before* a buffer containing iCalendar data is decoded
>>> (is there anything like a hook that runs before decoding?)
>
> [ Sorry, I don't understand this question. ]

The standard says that long lines need to be "folded" (wrapped) by
inserting a CR-LF-space sequence. It defines long lines as those longer
than 75 *bytes*, and explicitly says that implementations need to handle
the case where the line-wrapping sequence occurs in the middle of a
multi-byte character. So the only safe way to unwrap lines is before a
buffer gets decoded.

So far the best user interface I could come up with was to check for
long lines when icalendar-mode starts and ask the user whether they want
to unwrap them. If they do, it re-loads the raw data into a new buffer,
unwraps the lines, decodes the buffer, and then re-starts icalendar-mode
in the new buffer. But I find this pretty awkward in practice, because
you end up with two buffers containing the same data (modulo whitespace)
and visiting the same file, and I'm not sure how to improve this.

Thanks again for your thoughts!

Best,
Richard
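
The point about unfolding before decoding can be illustrated with a minimal sketch. It is not taken from the icalendar code under discussion: the function names `my-ical-unfold-bytes` and `my-ical-unfold-and-decode` are hypothetical, UTF-8 is an assumed default coding system, and the fold is assumed to be CR LF followed by a single space or tab. The sketch works on a unibyte string of raw bytes, so a fold that lands inside a multi-byte character is removed before any decoding happens.

```elisp
;; Hypothetical helpers for illustration only; these names are not part
;; of icalendar-parser.el or any existing library.

(defun my-ical-unfold-bytes (raw)
  "Return the unibyte string RAW with iCalendar line folds removed.
A fold is assumed to be CR LF followed by one space or tab."
  ;; FIXEDCASE and LITERAL are t: the replacement is inserted verbatim.
  (replace-regexp-in-string "\r\n[ \t]" "" raw t t))

(defun my-ical-unfold-and-decode (raw &optional coding)
  "Unfold the raw bytes in RAW, then decode them using CODING.
CODING defaults to `utf-8', which is an assumption of this sketch."
  (decode-coding-string (my-ical-unfold-bytes raw) (or coding 'utf-8)))

;; Usage: fold a line between the two bytes that encode U+00E9, then
;; recover the text by unfolding first and decoding second.
(let* ((bytes (encode-coding-string "SUMMARY:caf\u00e9" 'utf-8))
       (folded (concat (substring bytes 0 -1)
                       "\r\n "
                       (substring bytes -1))))
  (my-ical-unfold-and-decode folded))
```

Decoding `folded` directly would leave two stray bytes on either side of the fold, because neither half of the split character is valid UTF-8 on its own; that is why the order of the two steps matters.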