GNU bug report logs - #73188
PEG parser does not support full PEG grammar

Previous Next

Package: guile;

Reported by: Ekaitz Zarraga <ekaitz <at> elenq.tech>

Date: Wed, 11 Sep 2024 22:05:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Ekaitz Zarraga <ekaitz <at> elenq.tech>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 73188 <at> debbugs.gnu.org
Subject: bug#73188: PEG parser does not support full PEG grammar
Date: Sun, 20 Oct 2024 22:18:52 +0200
[Message part 1 (text/plain, inline)]
Hi Ludovic,

Thanks for the review

On 2024-10-20 12:10, Ludovic Courtès wrote:

>> +(define (Primary->defn lst for-syntax)
>> +  (let ((value (second lst)))
>> +    (case (car value)
>> +      ('DOT        #'peg-any)
>> +      ('Identifier (Identifier->defn value for-syntax))
>> +      ('Expression (Expression->defn value for-syntax))
>> +      ('Literal    (Literal->defn value for-syntax))
>> +      ('Class      (Class->defn value for-syntax)))))
> 
> I get these compiler warnings:
> 
> --8<---------------cut here---------------start------------->8---
> ice-9/peg/string-peg.scm:258:7: warning: duplicate datum quote in clause ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) of case expression (case (car suffix) ((quote AND) (quasisyntax (followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) ((quote NOT) (quasisyntax (not-followed-by (unsyntax (Suffix->defn (third lst) for-syntax))))) (else (Suffix->defn suffix for-syntax)))
> ice-9/peg/string-peg.scm:277:9: warning: duplicate datum quote in clause ((quote STAR) (quasisyntax (* (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out)))))
> ice-9/peg/string-peg.scm:278:9: warning: duplicate datum quote in clause ((quote PLUS) (quasisyntax (+ (unsyntax out)))) of case expression (case (caar extra) ((quote QUESTION) (quasisyntax (? (unsyntax out)))) ((quote STAR) (quasisyntax (* (unsyntax out)))) ((quote PLUS) (quasisyntax (+ (unsyntax out)))))
> ice-9/peg/string-peg.scm:284:7: warning: duplicate datum quote in clause ((quote Identifier) (Identifier->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
> ice-9/peg/string-peg.scm:285:7: warning: duplicate datum quote in clause ((quote Expression) (Expression->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
> ice-9/peg/string-peg.scm:286:7: warning: duplicate datum quote in clause ((quote Literal) (Literal->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
> ice-9/peg/string-peg.scm:287:7: warning: duplicate datum quote in clause ((quote NotInClass) (NotInClass->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
> ice-9/peg/string-peg.scm:288:7: warning: duplicate datum quote in clause ((quote Class) (Class->defn value for-syntax)) of case expression (case (car value) ((quote DOT) (syntax peg-any)) ((quote Identifier) (Identifier->defn value for-syntax)) ((quote Expression) (Expression->defn value for-syntax)) ((quote Literal) (Literal->defn value for-syntax)) ((quote NotInClass) (NotInClass->defn value for-syntax)) ((quote Class) (Class->defn value for-syntax)))
> --8<---------------cut here---------------end--------------->8---
> 
> And indeed, the correct syntax is:
> 
>    (case value (DOT …) (Identifier …) …)
> 
> or:
> 
>    (match value ('DOT …) ('Identifier …) …)
> 
> The former returns *unspecified* when passed a value not matched by any
> clause, whereas the latter throws an error.
> 
> So I would recommend ‘match’.

I'm always in doubt with this but the thing worked.
I always check in the internet how to do the case, and I always get the 
warning... my bad!

> At any rate, that makes me wonder whether this code path is tested at
> all.  As written, ‘Primary->defn’ would always return *unspecified*.
> Is there a test we could add?

I made some tests and it works, maybe unexpectedly:

	scheme@(guile-user)> (define a 'b)
	scheme@(guile-user)> (case a ('c 3)('b 1)('d 2))
	;;; <stdin>:12:15: warning: duplicate datum quote in clause ((quote b) 
1) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2))
	;;; <stdin>:12:21: warning: duplicate datum quote in clause ((quote d) 
2) of case expression (case a ((quote c) 3) ((quote b) 1) ((quote d) 2))
	$8 = 1

While your proposal doesn't:
	
	scheme@(guile-user)> (case a (c 3)(b 1)(d 2))
	While compiling expression:
	Syntax error:
	unknown file:14:8: case: invalid clause in subform (c 3) of (case a (c 
3) (b 1) (d 2))

I tested further and this is what it should be:

	scheme@(guile-user)> (case a ((b) 1)((d) 2))
	$1 = 1

I used match instead as you proposed and made it all work with no 
warnings and better error reporting.

>> +(define (Range->defn lst for-syntax)
>>     (cond
>> -   ((list? el)
>> -    (cond
>> -     ((eq? (car el) 'peg-literal)
>> -      (peg-literal->defn el for-syntax))
>> -     ((eq? (car el) 'peg-charclass)
>> -      (peg-charclass->defn el for-syntax))
>> -     ((eq? (car el) 'peg-nonterminal)
>> -      (datum->syntax for-syntax (string->symbol (cadr el))))))
>> -   ((string? el)
>> +    ((= 2 (length lst))
>> +     (second (second lst)))
>> +    ((= 3 (length lst))
>> +     #`(range
>> +         #,(Char->defn (second lst) for-syntax)
>> +         #,(Char->defn (third lst) for-syntax)))))
> 
> Keep in mind that ‘length’ is O(N), and that car/first, second/cadr are
> frowned upon.  I would write it as:
> 
>    (match lst
>      ((x y) y)
>      ((x y z) #`(range …)))

Yeah, I was very bad at `match` when I wrote this thing :(
But! Now I spent some hours with it and rewrote the thing for `match`!

> (Ideally with identifiers more meaningful than x, y, and z. :-))
> 
>>   ;; Transforms the nonterminals defined in the PEG parser written as a PEG to the nonterminals defined in the PEG parser written with S-expressions.
>>   (define (grammar-transform x)
>> @@ -69,7 +77,7 @@
>>       (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))
>>       (tree-map
>>        grammar-transform
>> -     (peg:tree (match-pattern grammar (@@ (ice-9 peg) peg-as-peg)))))))
>> +     (peg:tree (match-pattern (@@ (ice-9 peg) peg-grammar) (@@ (ice-9 peg) peg-as-peg)))))))
> 
> What happened to the ‘grammar’ binding?  I can’t see where it was coming
> from.

This is a very good catch!
I "fixed" a missing identifier by this, but it happened to be skipping a 
really important check: Is our PEG written as PEG understood like our 
PEG definition in sexps?

The `Grammar` is coming from the processing of `peg-as-peg` as a 
peg-string. This is what I was missing.

This was hiding some errors that I fixed, and now I'm pretty confident 
of the thing. Wow! This was a very good one! Thanks for pointing it out.

>>  From 64a17be08581465d11185b4a0ca636354d2f944c Mon Sep 17 00:00:00 2001
>> From: Ekaitz Zarraga <ekaitz <at> elenq.tech>
>> Date: Fri, 11 Oct 2024 14:24:30 +0200
>> Subject: [PATCH v3 2/2] PEG: Add support for `not-in-range` and [^...]
>>
>> Modern PEG supports inversed class like `[^a-z]` that would get any
>> character not in the `a-z` range. This commit adds support for that and
>> also for a new `not-in-range` PEG pattern for scheme.
>>
>> * module/ice-9/peg/codegen.scm (cg-not-in-range): New function.
>> * module/ice-9/peg/string-peg.scm: Add support for `[^...]`
>> * test-suite/tests/peg.test: Test it.
>> * doc/ref/api-peg.texi: Document accordingly.
> 
> This one LGTM.
> 
> In addition to the issues mentioned above, could you add an entry in the
> ‘NEWS’ file, probably under a new “New interfaces and functionality”
> heading?

Yes!
Added it in both commits: In the first commit I added that PEG was 
improved, and in the second that `not-in-range` was added.

> Thanks,
> Ludo’.

Attached new version of both patches.

Thanks for the help, the kind of things you point out are the ones that 
I wanted help with! Thanks! Now I'm a better matcher.

This helps me improve,
Ekaitz
[v4-0002-PEG-Add-support-for-not-in-range-and.patch (text/x-patch, attachment)]
[v4-0001-PEG-Add-full-support-for-PEG-some-extensions.patch (text/x-patch, attachment)]

This bug report was last modified 143 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.