#22241 - 25.0.50; etags Ruby parser problems

GNU bug report logs - #22241
25.0.50; etags Ruby parser problems

Package: emacs;

Reported by: Dmitry Gutov <dgutov <at> yandex.ru>

Date: Sat, 26 Dec 2015 04:00:02 UTC

Severity: normal

Found in version 25.0.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.


From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 22241 <at> debbugs.gnu.org
Subject: bug#22241: 25.0.50; etags Ruby parser problems
Date: Sat, 23 Jan 2016 21:23:57 +0300

On 01/23/2016 07:38 PM, Eli Zaretskii wrote:
> I don't speak Ruby. So please give a more detailed spec for the
> features you want added. I wrote some questions below, but I'm quite
> sure there are more questions I should ask, but don't know about. So
> please provide as complete specification for each feature as you
> possibly can, TIA.

There's no actual up-to-date language spec, and when in doubt, I fire up the REPL and try things out (and forget many of the results afterwards). So there's no "detailed spec" in my head. Let me just try my best to answer your questions for now.

>> - Constants are not indexed.
>
> What is the full syntax of a "constant"? Is it just
>
>   IDENTIFIER "=" INTEGER-NUMBER

Pretty much. IDENTIFIER should be ALL_CAPS or CamelCase, with underscores allowed. INTEGER-NUMBER should be just EXPRESSION, because it can be any expression, possibly a multiline one.

CamelCase constants are usually assigned some "anonymous class" value, as in the following example:

  SpecialError = Class.new(StandardError)

(This is a metaprogramming-y way to define the class SpecialError.)

But you probably shouldn't worry about the ALL_CAPS vs CamelCase distinction here; just treat them the same.

> ? Is whitespace significant? What about newlines?

Whitespace around "=" is optional: having no spaces there is fine, and spaces can also be replaced by tabs. A newline before "=" is not allowed.

>> - Class methods (def self.foo) are given the wrong name ("self."
>>   shouldn't be included).
>
> Is it enough to remove a single "self.", case-sensitive, at the
> beginning of an identifier? Can there be more than one, like
> "self.self.SOMETHING"?

Only one "self." is allowed. When you remove it, you should record that SOMETHING is a method defined on the current class (or module). In Java terms, it would be like a "static" method.
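As an illustration of the constant rule described above (an uppercase-initial IDENTIFIER, optional spaces or tabs around "=", no newline before it), here is a minimal line-based sketch. It assumes a tagger only needs the name on the first line and can ignore the possibly multiline expression; `CONSTANT_RE` and `constant_tag` are hypothetical names, not part of etags. The negative lookahead is an added assumption so that `==`, `=~` and `=>` are not mistaken for assignments.

```ruby
# Hypothetical sketch: detect "IDENTIFIER = EXPRESSION" constant assignments.
# IDENTIFIER starts with an uppercase letter (ALL_CAPS and CamelCase are
# treated the same); spaces or tabs around "=" are optional. The expression
# itself is ignored, so multiline values do not matter.
CONSTANT_RE = /\A[ \t]*([A-Z][A-Za-z0-9_]*)[ \t]*=(?![=~>])/

# Returns the constant name, or nil if LINE is not a constant assignment.
def constant_tag(line)
  m = CONSTANT_RE.match(line)
  m && m[1]
end

constant_tag("SpecialError = Class.new(StandardError)") # => "SpecialError"
constant_tag("MAX_SIZE=1024")                           # => "MAX_SIZE"
constant_tag("FOO == bar")                              # => nil
constant_tag("local_var = 1")                           # => nil
```

Because matching is done one line at a time, a newline between the name and "=" naturally prevents a match, which agrees with the rule stated above.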
The upshot is that it can be called on the class itself, but not on its instances:

  irb(main):001:0> class C
  irb(main):002:1> def self.foo
  irb(main):003:2> 3
  irb(main):004:2> end
  irb(main):005:1> end
  => nil
  irb(main):006:0> C.foo
  => 3
  irb(main):007:0> C.new.foo
  NoMethodError: undefined method `foo' for #<C:0x000000020141e8>

So the qualified name of that method should be "C.foo", as opposed to "C#foo" for an instance method.

> Your other example, i.e.
>
>   def ModuleExample.singleton_module_method
>
> indicates that anything up to and including the period should be
> removed, is that correct?

More or less. This is an "explicit syntax", which is equivalent to using "self.". These two declarations are equivalent:

  module ModuleExample
    def ModuleExample.foo
    end
  end

  module ModuleExample
    def self.foo
    end
  end

> Is there only one, or can there be many?

There can be only one dot there. There could be a method resolution operator (::) in there, I suppose, but I'm not sure if you want to add support for that right now, or ever.

> Should they all be removed for an unqualified name?

Yes.

>> - "class << self" blocks are given a separate entry.
>
> What should be done instead? Can't a class be named "<<"?

A class cannot be named "<<". You should not add that line to the index, but record that the method definitions inside the following scope are defined on the current class or module. These are equivalent:

  class C
    def self.foo
    end
  end

  class C
    class << self
      def foo
      end
    end
  end

>> - Qualified tag names are never generated.
>
> (Etags never promised qualified names except for C and derived
> languages, and also in Java.)

OK, that would be a nice bonus, but we can live without it. ctags doesn't define qualified names either.

Without qualified names, I suppose you should treat

  def self.foo
  end

and

  def foo
  end

and

  def Class.foo
  end

the same: record each of them as just "foo".

> How to know when a module's or a class's scope ends? Is it enough to
> count "end" lines?

Hmm, maybe?
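The method-name rules in the answers above (strip a single "self." or "ModuleExample." receiver, remember that the method is defined on the class or module itself, skip the "class << self" line, and use "." vs "#" in qualified names) could be sketched as follows. This is a hedged illustration, not etags code: `DEF_RE`, `SINGLETON_SCOPE_RE`, `method_tag` and `qualified_method_name` are hypothetical names, operator methods such as `def +(other)` are deliberately not matched, and the `::` form mentioned above is not handled.

```ruby
# Hypothetical sketch of the `def` rules discussed above; not etags code.
# An optional single "self." or "Const." receiver, then a method name that
# may end in "?", "!" or "=". Operator methods and the "::" form are ignored.
DEF_RE = /\A[ \t]*def[ \t]+(?:(self|[A-Z][A-Za-z0-9_]*)\.)?([A-Za-z_][A-Za-z0-9_]*[?!=]?)/

# `class << self` opens a scope whose methods belong to the enclosing
# class or module; the line itself should not be tagged.
SINGLETON_SCOPE_RE = /\A[ \t]*class[ \t]*<<[ \t]*self\b/

# Returns [unqualified_name, defined_on_class?] or nil for non-def lines.
def method_tag(line)
  m = DEF_RE.match(line)
  return nil unless m
  [m[2], !m[1].nil?]
end

# "." for methods defined on the class/module itself, "#" for instance methods.
def qualified_method_name(scope, name, on_class)
  "#{scope}#{on_class ? '.' : '#'}#{name}"
end

method_tag("  def self.foo")                  # => ["foo", true]
method_tag("def ModuleExample.foo")           # => ["foo", true]
method_tag("def bar=(val)")                   # => ["bar=", false]
qualified_method_name("C", "foo", true)       # => "C.foo"
SINGLETON_SCOPE_RE.match?("  class << self")  # => true
```

In the unqualified case only the first element of `method_tag`'s result would be recorded, so `def self.foo`, `def foo` and `def Class.foo` all yield "foo".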
I'm guessing etags doesn't really handle heredoc syntax or multiline strings defined with percent literals (examples here: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#.22Here_document.22_notation). Even so, the result of simply counting "end" lines shouldn't be too bad. Except:

> Can I assume that "end" will always appear by
> itself on a line?

Unfortunately, no. It can also be on the same line, after a semicolon (or on any other line, I suppose, but nobody writes Ruby like that). Examples:

  class SpecialError < StandardError; end

or

  class MyStruct < Struct.new(:a, :b, :c); end

(One could also stick a method definition inside that, but I haven't seen that in practice yet.)

So, either:

- 'end' is on a separate line (after ^[ \t]*), or
- class/module Name[< ]...; end$

'end' can also be followed by "# some comment" in both cases.

> Can I disregard indentation of "end" (and of
> everything else) when I determine where a scope begins and ends?

Probably, yes. Indentation is not significant in Ruby, but heredocs can mess up the detection of 'end' keywords, so we could use indentation as a way to detect where each scope ends. But if etags doesn't normally do that, let's not go there now.

>> A
>> A::B
>> A::B::ABC
>> A::B#foo!
>> A::B.bar?
>> A::B.qux=
>
> Why did 'foo!' get a '#' instead of a '.', as for '_bar'?

It's common to use '#' in the qualified names of instance methods, in Java, Ruby and JS docstrings. '.' is used for class methods (static methods, in Java), or methods defined on other singleton objects. Examples:

http://usejsdoc.org/tags-inline-link.html (search for '#' there)
http://stackoverflow.com/questions/5915992/javadoc-writing-links-to-methods
http://docs.ruby-lang.org/en/2.1.0/RDoc/Markup.html#class-RDoc::Markup-label-Links (the documentation also says to use "::" for class methods, but let's not do that)

> Why doesn't
> "class << self" count as a class scope, and add something to qualified
> names?
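The two "end" shapes listed above could be turned into regexes roughly like this. It is only a sketch under the assumptions already stated in this message: heredocs and percent literals are ignored, and `END_LINE_RE`, `ONE_LINE_SCOPE_RE` and `scope_event` are hypothetical names. A scope tracker built on it would pop its stack on `:end` and treat `:one_line_scope` as a scope that opens and closes on the same line.

```ruby
# Hypothetical sketch of the two "end" shapes described above; heredocs and
# percent literals are not handled.

# 1. 'end' on its own line after optional indentation, optionally followed
#    by a "# comment". \Z also tolerates a trailing newline.
END_LINE_RE = /\A[ \t]*end[ \t]*(?:#.*)?\Z/

# 2. class/module Name[< ...]; end, again with an optional trailing comment.
ONE_LINE_SCOPE_RE =
  /\A[ \t]*(?:class|module)[ \t]+([A-Z][A-Za-z0-9_:]*)[^;]*;[ \t]*end[ \t]*(?:#.*)?\Z/

# Classifies LINE as :one_line_scope, :end, or nil (anything else).
def scope_event(line)
  return :one_line_scope if ONE_LINE_SCOPE_RE.match?(line)
  return :end if END_LINE_RE.match?(line)
  nil
end

scope_event("  end # class Foo\n")                                 # => :end
scope_event("class SpecialError < StandardError; end")             # => :one_line_scope
scope_event("class MyStruct < Struct.new(:a, :b, :c); end # note") # => :one_line_scope
scope_event("  endpoint = 1")                                       # => nil
```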
It just served to turn 'qux=' into a class (static) method.

>> should become (the unqualified version):
>>
>> A
>> foo
>> bar=
>> tee
>> tee=
>> qux
>>
>> All attr_* methods can take a variable number of arguments. The parser
>> should take each argument, check that it's a symbol and not a variable
>> (starts with :), and if so, record the corresponding method name.
>
> Why did 'bar' and 'tee' git a '=' appended?

Because 'attr_writer :bar' effectively expands to

  def bar=(val)
    @bar = val
  end

and 'attr_accessor :tee' expands into

  def tee
    @tee
  end

  def tee=(val)
    @tee = val
  end

> Are there any other such "append rules"?

There are other macros (any code can define a macro), but let's not worry about them now.
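The attr_* rule quoted above (each symbol argument yields a reader name, a writer name with "=" appended, or both, while non-symbol arguments are skipped) might look like this as a sketch. `ATTR_RE` and `attr_tags` are hypothetical names; splitting the argument list on commas is an assumption that holds for plain symbol lists but not for arbitrary expressions.

```ruby
# Hypothetical sketch of the attr_* expansion rules; not etags code.
ATTR_RE = /\A[ \t]*attr_(reader|writer|accessor)\b(.*)/

# Returns the method names generated by an attr_* line ([] otherwise).
def attr_tags(line)
  m = ATTR_RE.match(line)
  return [] unless m
  kind = m[1]
  # Drop a trailing comment and optional parentheses, then split the
  # argument list naively on commas.
  args = m[2].sub(/#.*/, "").delete("()").split(",").map(&:strip)
  # Keep only symbol arguments (":name"); variables are skipped.
  names = args.map { |arg| arg[/\A:([A-Za-z_][A-Za-z0-9_]*)\z/, 1] }.compact
  names.flat_map do |name|
    case kind
    when "reader"   then [name]            # like `def name`
    when "writer"   then ["#{name}="]      # like `def name=(val)`
    when "accessor" then [name, "#{name}="]
    end
  end
end

attr_tags("attr_writer :bar")                 # => ["bar="]
attr_tags("  attr_accessor :tee")             # => ["tee", "tee="]
attr_tags("attr_reader :foo, some_var, :baz") # => ["foo", "baz"]
```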


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
