Vim syntax highlighter and filetype-plugin
9 points by fallintothis 5631 days ago | 3 comments
Figured others might like this. I got tired of staring at thousands of lines of indistinguishable Arc code, so I tried the syntax highlighter on Anarki (http://github.com/nex3/arc/tree/dca901c0cb59ebb533a511df3a844e578e4fa5d6/extras/vim). But that one's based on Vim's Scheme highlighting, which I can't stand. I like the stock Common Lisp highlighting, but the Scheme one hardly does anything -- and doesn't do it very well, at that. I started by tweaking $VIMRUNTIME/syntax/lisp.vim, but it snowballed into a full-blown Arc highlighter & filetype-plugin. The two files are generated by an Arc program which, after some doing, is much nicer to use than writing Vimscript directly (plus, it gave me something to test).

I'm fond of the results. I was going to post some :TOhtml output of the Anarki highlighter vs mine, but I'm short on hosting at the moment, so you'll just have to see for yourself:

http://www.vim.org/scripts/script.php?script_id=2720



3 points by fallintothis 5631 days ago | link

Not to clutter my own thread, but these are some issues that I wasn't smart enough to solve before publication. I felt only one of them qualified as a bug, so I posted it to a bitbucket issue tracker: http://bitbucket.org/fallintothis/arc-vim/issues/

Known Issues:

1. The way numbers are highlighted is mostly a hack. Arc, built atop mzscheme, currently inherits the Scheme reader. As such, numbers comply with R5RS syntax, plus some mz extensions. Highlighting them is somewhat complicated: there are reals, rationals, complex numbers, binary, octal, hexadecimal, exponents, etc. The way I do it is to programmatically piece together giant, naively-built regular expressions from the R5RS EBNF grammar (modified slightly for mz). Each RE then gets noisier still, because I wrap it in delimiter REs so that numbers won't highlight when they're part of a name (like foldl1, 1up, or p2p).

This might make rendering slower versus simpler, naive alternatives (see Scheme & CL highlighting); I've not noticed anything terrible, myself. Having complex numbers & such highlight correctly is awesome, so I'm inclined to stick with the grammar route, if there aren't any major issues. There could be bugs in the way I generate the RE, which would be noticeable if valid numbers didn't highlight. Again, I've not noticed any (save for the ssyntax issue discussed later), but please tell me if you do.

2. I make overzealous use of display in the Vimscript, guessing at when it should be okay to use (the documentation isn't that clear). Not sure if there are bugs here, though they might also be mitigated with different synchronization methods. You'd notice this issue if, while scrolling through a file, Vim suddenly "runs out of colors" -- highlighting everything like a string or a comment, even though the region ended somewhere above your cursor. It's an easily overlooked problem (once you jar Vim into highlighting correctly), but one I don't like much: it's hard to reproduce or to tell what's wrong. I've not noticed any problems like this for a while, though.

3. There are some difficult edge-cases for highlighting ssyntax vs valid numbers. See http://bitbucket.org/fallintothis/arc-vim/issue/1/highlighti... for the full explanation.

4. The g:arc_bodops option requires Vim to be compiled with Python support, because I couldn't figure out a non-convoluted way to iterate through regular expression matches in a buffer. Seems gross to rely on Python, but I don't know how you can beat

  for line in vim.current.buffer:
      m = re.match(regex, line)

with Vimscript. Suggestions are welcome.
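
For reference, a slightly fuller version of that loop might look like the sketch below. It's plain Python with a list of strings standing in for vim.current.buffer (which is also iterated line by line), so it runs outside Vim. The function name, the sample buffer, and the pattern are all made up for illustration; this is not the plugin's actual code.

```python
import re

def collect_matches(lines, regex):
    """Return (line_number, first_group) for each line the regex matches."""
    pattern = re.compile(regex)
    found = []
    for lineno, line in enumerate(lines, start=1):
        m = pattern.match(line)  # anchored at the start of the line
        if m:
            found.append((lineno, m.group(1)))
    return found

# Hypothetical stand-in for vim.current.buffer.
buffer_lines = [
    "(mac awhen (expr . body)",
    "  `(let it ,expr (if it (do ,@body))))",
    "(def foo (x) x)",
    "(mac w/foo (x . body)",
]

# Hypothetical pattern: the name of each macro defined at the start of a line.
print(collect_matches(buffer_lines, r"\(mac\s+(\S+)"))
```

Inside Vim, the same function could presumably be handed vim.current.buffer directly in place of the list.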

All that said, number (and ssyntax) highlighting is still pretty accurate. It correctly highlights the following, even though you might not be able to tell at first glance what the right behavior is supposed to be:

  #e#x+e#s+e@-e#l-e                                                         ; a
  16140901064495857664-50176i.#e#x+e#s+e@-e#l-e                             ; b
  #b#e101010101111101010101201010101010000010101                            ; c
  127.0.0.1                                                                 ; d
  +inf.0@3+4i                                                               ; e
  +inf.0                                                                    ; f
  1.+i                                                                      ; g
  _.1.3                                                                     ; h
  16140901064495857664-50176.0i!#e#x+e#s+e@-e#l-e:+inf.0@1/1###.##0e4/2i+hi ; i

The Scheme highlighter, by contrast: (a) incorrectly rejects, (b) incorrectly accepts, (c) rejects for the wrong reason (more than one # mark), (d) incorrectly accepts, (e) rejects for the wrong reason (doesn't recognize mz inf constants), (f) rejects (doesn't recognize mz inf constants), (g) correctly accepts, (h & i) has no ssyntax support, so the comparison is unfair.

These are just some of the tests I've run (many ripped off of mzscheme tests).
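
To give a feel for the "piece it together from the grammar, then wrap it in delimiters" approach, here is a heavily simplified sketch in Python. It uses Python's regex syntax rather than Vim's, covers only signed decimals and rationals (no radix prefixes, complex numbers, or mz extensions), and the delimiter set is an assumption. The real generator is an Arc program and is considerably more thorough.

```python
import re

# Small pieces, loosely modeled on the R5RS decimal grammar.
SIGN     = r"[+-]?"
DIGITS   = r"\d+"
EXPONENT = r"(?:[eE][+-]?\d+)?"
DECIMAL  = rf"(?:{DIGITS}\.\d*|\.{DIGITS}|{DIGITS}){EXPONENT}"
RATIONAL = rf"{DIGITS}/{DIGITS}"
REAL     = rf"{SIGN}(?:{RATIONAL}|{DECIMAL})"

# Assumed delimiters: whitespace, parens, brackets, quotes, semicolon.
DELIM = r"""[\s()\[\]"';]"""

# A number only counts if it is delimited on both sides, so the digits
# inside names like foldl1, 1up, or p2p are left alone.
NUMBER = re.compile(rf"(?:^|(?<={DELIM})){REAL}(?=$|{DELIM})")

def numbers_in(text):
    """Return every delimited number found in text."""
    return [m.group(0) for m in NUMBER.finditer(text)]

print(numbers_in("(foldl1 1up p2p 42 -3.5 1/2 6.02e23)"))
print(numbers_in("127.0.0.1"))
```

The first call should pick out only 42, -3.5, 1/2, and 6.02e23; the second should find nothing, since no piece of 127.0.0.1 is a delimited number.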

-----

3 points by fallintothis 5631 days ago | link

Miscellaneous thoughts:

1. Vim's auto-indentation for Lisp is controlled by the &lispwords variable. e.g.,

  ; :set lispwords+=def
  (def f (x)
    body)

  ; :set lispwords-=list
  (list x
        y
        z)

def forms have their arguments indented by the standard amount of space, while the arguments to list are lined up with each other.

&lispwords can be approximated by the names of macros that take a rest parameter named "body". It's not as strict as in Common Lisp, because it's just a naming convention instead of a keyword, but it's followed pretty closely in Arc's source. I just maintain a few exceptions to the rule.

But using &lispwords gives a pretty binary choice that fails on "wavy" indentation like

  (if a
        b
      c
        d
        e)

and the indentation of paired forms like the variables of

  (with (x something
         y something-else)
    body)

In all, a better equalprg would be really nice. Not something I plan on doing, but maybe some crazies out there feel like it.

2. Symbols like avg get highlighted in the deftem at the top of news.arc, even though it doesn't refer to the function avg. This can actually be really helpful, so that you know when you accidentally shadow a standard function/macro/variable. Besides, detecting this distinction in Vim seems too difficult.

3. Open question: should forms like a!b!c only highlight the exclamation points, or (current behavior) highlight past each ! as though they are quoted symbols? I think single exclamation marks like a!b should highlight past ! like a quoted symbol, since that's what the ssyntax means and it's quite readable, but nested ones will "run together" under such a scheme.

4. If, like me, you use Vim views & sessions, your old settings for existing Arc files may prevent

  au BufRead,BufNewFile *.arc setf arc

from kicking in. You can setf manually, so that the view is saved with this info, or delete the existing views so that it stops reverting the filetype from (in my case) "arc" back to just "". Took me for-fucking-ever to figure that one out. Hope this helps someone as much as it would've helped me.
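
Going back to point 1, the "rest parameter named body" heuristic for &lispwords can be sketched roughly as follows. This is Python rather than the Arc program that actually generates the files, it only handles flat parameter lists, and the function name, sample source, and exceptions argument are all illustrative assumptions.

```python
import re

# Matches "(mac name (params)" where params contains no nested parens.
MAC_DEF = re.compile(r"\(mac\s+(\S+)\s+\(([^()]*)\)")

def guess_lispwords(source, exceptions=()):
    """Names of macros whose parameter list ends in '. body'."""
    words = []
    for name, params in MAC_DEF.findall(source):
        tokens = params.split()
        if tokens[-2:] == [".", "body"] and name not in exceptions:
            words.append(name)
    return words

# Illustrative sample in the style of Arc macro definitions.
sample = """
(mac when (test . body)
  `(if ,test (do ,@body)))
(mac in (x . choices)
  nil)
(mac w/uniq (names . body)
  nil)
"""

# Ready to paste after ":set lispwords+=".
print(",".join(guess_lispwords(sample)))
```

The exceptions argument is where the handful of special cases mentioned above would go.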

-----

2 points by tc-rucho 5631 days ago | link

If only vim weren't so ugly in a dvorak layout...

-----