On Fri, Nov 13, 2015 at 10:19 PM, Ludovic Courtès wrote:
> Federico Beffa skribis:
>
>> ---------------------------------------------------------------
>> (define (canonical-newline-port port)
>>   "Return an input port that wraps PORT such that all newlines consist
>> of a single carriage return."
>>   (define (get-position)
>>     (if (port-has-port-position? port) (port-position port) #f))
>>   (define (set-position! position)
>>     (if (port-has-set-port-position!? port)
>>         (set-port-position! position port)
>>         #f))
>>   (define (close) (close-port port))
>>   (define (read! bv start n)
>>     (let loop ((count 0)
>>                (byte (get-u8 port)))
>>       (cond ((or (eof-object? byte) (= count n)) count)
>
> BYTE is lost here in the case it is not EOF.

Oops. Thanks for catching it!

> It may be best to move the (= count n) case right before the recursive
> call below.
>
>>             ((eqv? byte (char->integer #\return)) (loop count (get-u8 port)))
>
> In practice this discards LF even if it’s not following CR; that’s
> probably a good enough approximation, but an XXX comment would be
> welcome.

This is intentional because, in my ignorance, I only know of uses of
'\r' immediately before or after '\n'. Do you know of any other use in
text files?

I've added an "XXX" comment, but I'm not sure what its purpose is.

>
>>             (else
>>              (bytevector-u8-set! bv (+ start count) byte)
>>              (loop (+ count 1) (get-u8 port))))))
>>   (make-custom-binary-input-port "canonical-newline-port"
>>                                  read!
>>                                  get-position
>>                                  set-position!
>>                                  close))
>> ---------------------------------------------------------------
>>
>> IMO this is general enough that it could go into "guix/utils.scm". Are
>> you OK with this?
>
> Looks good! Could you make a patch that does that, along with adding a
> test or two in tests/utils.scm?

The attached patches fix the parsing of all but two of the failures
reported by Paul.

Two cabal files are still not imported correctly because they are buggy:

* streaming-commons: indentation changes from 4 to 2.
  But this is explicitly forbidden. From [1]: "Field names may be
  indented, but all field values in the same section must use the same
  indentation."

* fgl: uses braces to delimit the value of a field. As far as I
  understand this is not allowed by [1]: "To continue a field value,
  indent the next line relative to the field name." and "Flags,
  conditionals, library and executable sections use layout to indicate
  structure. ... As an alternative to using layout you can also use
  explicit braces {}." Thus I understand that braces may be used to
  delimit sections, not field values.

Obviously the official 'cabal' program is more permissive than the
description in the documentation.

Regards,
Fede

[1] https://www.haskell.org/cabal/users-guide/developing-packages.html
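
P.S. For reference, here is a minimal sketch of how the procedure might
look once the (= count n) test is moved next to the recursive call, as
suggested in the review. It is an illustration only and not necessarily
identical to the attached patch. The module imports assume Guile's R6RS
libraries; the docstring wording and the argument order of
'set-port-position!' (port first, as R6RS specifies) are my own
adjustments to the quoted draft.

```scheme
(use-modules (rnrs bytevectors)
             (rnrs io ports))

(define (canonical-newline-port port)
  "Return an input port that wraps PORT such that all newlines consist
of a single linefeed."
  (define (get-position)
    (if (port-has-port-position? port) (port-position port) #f))
  (define (set-position! position)
    (if (port-has-set-port-position!? port)
        (set-port-position! port position)   ;port first, then position
        #f))
  (define (close) (close-port port))
  (define (read! bv start n)
    (let loop ((count 0)
               (byte (get-u8 port)))
      (cond ((eof-object? byte) count)
            ;; XXX: consume all CRs, even those not adjacent to an LF.
            ((eqv? byte (char->integer #\return))
             (loop count (get-u8 port)))
            (else
             (bytevector-u8-set! bv (+ start count) byte)
             ;; Stop *before* fetching another byte once BV is full, so
             ;; that no byte is read from PORT and then dropped.
             (if (= (+ count 1) n)
                 n
                 (loop (+ count 1) (get-u8 port)))))))
  (make-custom-binary-input-port "canonical-newline-port"
                                 read!
                                 get-position
                                 set-position!
                                 close))

;; Example use: strip the CRs from a CRLF-terminated byte sequence.
(define input
  (open-bytevector-input-port (string->utf8 "a\r\nb\r\n")))
(utf8->string (get-bytevector-all (canonical-newline-port input)))
;; => "a\nb\n"
```

The only behavioural change relative to the quoted draft is where the
buffer-full check happens: it now runs after a byte has been stored, so
the loop never fetches a byte it cannot hand back to the caller.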