GNU bug report logs - #73188
PEG parser does not support full PEG grammar

Previous Next

Package: guile;

Reported by: Ekaitz Zarraga <ekaitz <at> elenq.tech>

Date: Wed, 11 Sep 2024 22:05:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


Message #26 received at 73188 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ekaitz Zarraga <ekaitz <at> elenq.tech>
Cc: 73188 <at> debbugs.gnu.org
Subject: Re: bug#73188: PEG parser does not support full PEG grammar
Date: Sun, 20 Oct 2024 12:10:47 +0200
Egun on Ekaitz,

Ekaitz Zarraga <ekaitz <at> elenq.tech> skribis:

> - What it is true is if people were using `peg-as-peg` for their
>   things, which isn't even exported by the module, their code would
>   have problems. But we can't predict that.

Since ‘peg-as-peg’ has always been private, that’s fine.

> From 81944c2037e0d6d8c972def2ceb1aa4c51a61431 Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz <at> elenq.tech>
> Date: Wed, 11 Sep 2024 21:19:26 +0200
> Subject: [PATCH v3 1/2] PEG: Add full support for PEG + some extensions
>
> This commit adds support for PEG as described in:
>
>     <https://bford.info/pub/lang/peg.pdf>
>
> It adds support for the missing features (comments, underscores in
> identifiers and escaping) while keeping the extensions (dashes in
> identifiers, < and <--).
>
> The naming system tries to be as close as possible to the one proposed
> in the paper.
>
> * module/ice-9/peg/string-peg.scm: Rewrite PEG parser.
> * test-suite/tests/peg.test: Fix import

[...]

> +(define (Primary->defn lst for-syntax)
> +  (let ((value (second lst)))
> +    (case (car value)
> +      ('DOT        #'peg-any)
> +      ('Identifier (Identifier->defn value for-syntax))
> +      ('Expression (Expression->defn value for-syntax))
> +      ('Literal    (Literal->defn value for-syntax))
> +      ('Class      (Class->defn value for-syntax)))))

I get these compiler warnings:

--8<---------------cut here---------------start------------->8---
ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) of case expression (case (car suffix) ((quote AND) (quasisyntax (followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) (else (Suffix->defn suffix for-syntax)))
ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause ((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out)))))
ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause ((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out)))))
ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause ((quote Identifier) (Identifier->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause ((quote Expression) (Expression->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause ((quote Literal) (Literal->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause ((quote NotInClass) (NotInClass->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause ((quote Class) (Class->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
--8<---------------cut here---------------end--------------->8---

And indeed, the correct syntax is:

  (case value (DOT …) (Identifier …) …)

or:

  (match value ('DOT …) ('Identifier …) …)

The former returns *unspecified* when passed a value not matched by any
clause, whereas the latter throws an error.

So I would recommend ‘match’.

At any rate, that makes me wonder whether this code path is tested at
all.  As written, ‘Primary->defn’ would always return *unspecified*.
Is there a test we could add?

> +(define (Range->defn lst for-syntax)
>    (cond
> -   ((list? el)
> -    (cond
> -     ((eq? (car el) 'peg-literal)
> -      (peg-literal->defn el for-syntax))
> -     ((eq? (car el) 'peg-charclass)
> -      (peg-charclass->defn el for-syntax))
> -     ((eq? (car el) 'peg-nonterminal)
> -      (datum->syntax for-syntax (string->symbol (cadr el))))))
> -   ((string? el)
> +    ((= 2 (length lst))
> +     (second (second lst)))
> +    ((= 3 (length lst))
> +     #`(range
> +         #,(Char->defn (second lst) for-syntax)
> +         #,(Char->defn (third lst) for-syntax)))))

Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
frowned upon.  I would write it as:

  (match lst
    ((x y) y)
    ((x y z) #`(range …)))

(Ideally with identifiers more meaningful than x, y, and z. :-))

>  ;; Transforms the nonterminals defined in the PEG parser written as a PEG to the nonterminals defined in the PEG parser written with S-expressions.
>  (define (grammar-transform x)
> @@ -69,7 +77,7 @@
>      (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
>      (tree-map
>       grammar-transform
> -     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
> +     (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))))))

What happened to the ‘grammar’ binding?  I can’t see where it was coming
from.

> From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
> From: Ekaitz Zarraga <ekaitz <at> elenq.tech>
> Date: Fri, 11 Oct 2024 14:24:30 +0200
> Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]
>
> Modern PEG supports inversed class like `[^a-z]` that would get any
> character not in the `a-z` range. This commit adds support for that and
> also for a new `not-in-range` PEG pattern for scheme.
>
> * module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
> * module/ice-9/peg/string-peg.scm: Add support for `[^...]`
> * test-suite/tests/peg.test: Test it.
> * doc/ref/api-peg.texi: Document accordingly.

This one LGTM.

In addition to the issues mentioned above, could you add an entry in the
‘NEWS’ file, probably under a new “New interfaces and functionality”
heading?

Thanks,
Ludo’.




This bug report was last modified 143 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.