Package: guile;
Reported by: Ekaitz Zarraga <ekaitz <at> elenq.tech>
Date: Wed, 11 Sep 2024 22:05:01 UTC
Severity: normal
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Ekaitz Zarraga <ekaitz <at> elenq.tech> To: Ludovic Courtès <ludo <at> gnu.org> Cc: 73188 <at> debbugs.gnu.org Subject: bug#73188: PEG parser does not support full PEG grammar Date: Sun, 20 Oct 2024 22:18:52 +0200
[Message part 1 (text/plain, inline)]
Hi Ludovic, Thanks for the review On 2024-10-20 12:10, Ludovic Courtès wrote: >> +(define (Primary->defn lst for-syntax) >> + (let ((value (second lst))) >> + (case (car value) >> + ('DOT #'peg-any) >> + ('Identifier (Identifier->defn value for-syntax)) >> + ('Expression (Expression->defn value for-syntax)) >> + ('Literal (Literal->defn value for-syntax)) >> + ('Class (Class->defn value for-syntax))))) > > I get these compiler warnings: > > --8<---------------cut here---------------start------------->8--- > ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) of case expression (case (car suffix) ((quote AND) (quasisyntax (followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) (else (Suffix->defn suffix for-syntax))) > ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause ((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out))))) > ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause ((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out))))) > ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause ((quote Identifier) (Identifier->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax))) > ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause ((quote Expression) (Expression->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax))) > ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause ((quote Literal) (Literal->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax))) > ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause ((quote NotInClass) (NotInClass->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax))) > ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause ((quote Class) (Class->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax))) > --8<---------------cut here---------------end--------------->8--- > > And indeed, the correct syntax is: > > (case value (DOT …) (Identifier …) …) > > or: > > (match value ('DOT …) ('Identifier …) …) > > The former returns *unspecified* when passed a value not matched by any > clause, whereas the latter throws an error. > > So I would recommend ‘match’. I'm always in doubt with this but the thing worked. I always check in the internet how to do the case, and I always get the warning... my bad! > At any rate, that makes me wonder whether this code path is tested at > all. As written, ‘Primary->defn’ would always return *unspecified*. > Is there a test we could add? I made some tests and it works, maybe unexpectedly: scheme@(guile-user)> (define a 'b) scheme@(guile-user)> (case a ('c 3)('b 1)('d 2)) ;;; <stdin>:12:15: warning: duplicate datum quote in clause ((quote b) 1) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2)) ;;; <stdin>:12:21: warning: duplicate datum quote in clause ((quote d) 2) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2)) $8 = 1 While your proposal doesn't: scheme@(guile-user)> (case a (c 3)(b 1)(d 2)) While compiling expression: Syntax error: unknown file:14:8: case: invalid clause in subform (c 3) of (case a (c 3) (b 1) (d 2)) I tested further and this is what it should be: scheme@(guile-user)> (case a ((b) 1)((d) 2)) $1 = 1 I used match instead as you proposed and made it all work with no warnings and better error reporting. >> +(define (Range->defn lst for-syntax) >> (cond >> - ((list? el) >> - (cond >> - ((eq? (car el) 'peg-literal) >> - (peg-literal->defn el for-syntax)) >> - ((eq? (car el) 'peg-charclass) >> - (peg-charclass->defn el for-syntax)) >> - ((eq? (car el) 'peg-nonterminal) >> - (datum->syntax for-syntax (string->symbol (cadr el)))))) >> - ((string? el) >> + ((= 2 (length lst)) >> + (second (second lst))) >> + ((= 3 (length lst)) >> + #`(range >> + #,(Char->defn (second lst) for-syntax) >> + #,(Char->defn (third lst) for-syntax))))) > > Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are > frowned upon. I would write it as: > > (match lst > ((x y) y) > ((x y z) #`(range …))) Yeah, I was very bad at `match` when I wrote this thing :( But! Now I spent some hours with it and rewrote the thing for `match`! > (Ideally with identifiers more meaningful than x, y, and z. :-)) > >> ;; Transforms the nonterminals defined in the PEG parser written as a PEG to the nonterminals defined in the PEG parser written with S-expressions. >> (define (grammar-transform x) >> @@ -69,7 +77,7 @@ >> (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg))) >> (tree-map >> grammar-transform >> - (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg))))))) >> + (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg))))))) > > What happened to the ‘grammar’ binding? I can’t see where it was coming > from. This is a very good catch! I "fixed" a missing identifier by this, but it happened to be skipping a really important check: Is our PEG written as PEG understood like our PEG definition in sexps? The `Grammar` is coming from the processing of `peg-as-peg` as a peg-string. This is what I was missing. This was hiding some errors that I fixed, and now I'm pretty confident of the thing. Wow! This was a very good one! Thanks for pointing it out. >> From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001 >> From: Ekaitz Zarraga <ekaitz <at> elenq.tech> >> Date: Fri, 11 Oct 2024 14:24:30 +0200 >> Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...] >> >> Modern PEG supports inversed class like `[^a-z]` that would get any >> character not in the `a-z` range. This commit adds support for that and >> also for a new `not-in-range` PEG pattern for scheme. >> >> * module/ice-9/peg/codegen.scm (cg-not-in-range): New function. >> * module/ice-9/peg/string-peg.scm: Add support for `[^...]` >> * test-suite/tests/peg.test: Test it. >> * doc/ref/api-peg.texi: Document accordingly. > > This one LGTM. > > In addition to the issues mentioned above, could you add an entry in the > ‘NEWS’ file, probably under a new “New interfaces and functionality” > heading? Yes! Added it in both commits: In the first commit I added that PEG was improved, and in the second that `not-in-range` was added. > Thanks, > Ludo’. Attached new version of both patches. Thanks for the help, the kind of things you point out are the ones that I wanted help with! Thanks! Now I'm a better matcher. This helps me improve, Ekaitz
[v4-0002-PEG-Add-support-for-not-in-range-and.patch (text/x-patch, attachment)]
[v4-0001-PEG-Add-full-support-for-PEG-some-extensions.patch (text/x-patch, attachment)]
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.