I’ve been trying to debug some graph algorithms I’m developing, and got to the point where I needed to look at the graphs to work out what is going on. In the past I’ve used graphviz/dot to do this. So, now that I mainly work in scala, I had a hunt about for a scala graphviz library and came up empty.
Currently, there are two packages, dsl and exec. The dsl package contains the DOT data model, parser and pretty-printer together with extra glue needed to make writing the scala source describing graphs less painful. The exec package manages execution of the Graphviz tools. For 99% of uses, it is enough to import the two package objects.
Here’s an example that generates a graph, and then pipes it through DOT to get layout information added:
The dot2dotG call takes a Graph, sends it to dot and then parses the result back into a Graph. This has the effect of adding all the style and layout annotations from dot.
Some notes on the design
The DOT file format is relatively simple. I’ve taken the publicly available grammar and created scala data structures for it, retaining the structure where it makes sense and substituting idiomatic scala where it makes sense. The whole data model is in the Graph.scala file. It’s entirely made up of case classes and case objects. To make it less verbose to code against, I’ve used a mixture of alternative constructors (in companion objects) and implicit conversions (in the dsl package object). With use, I expect that more of these will be added.
I could have coded the parser and pretty-printer directly against this data model. However, sometimes we work with very large graphs, and it would be a shame to have the data twice, once in the native representation and once as a dsl.Graph structure. So, I’ve abstracted out the concrete representation from the logical one.
Firstly, Dot.scala defines types for the different parts of the data structure. The names are all prefixed with T_, to avoid any name clashes later. So, the graph type is T_Graph. This trait is directly extended by both DotConstructors and DotDestructors.
The constructors trait specifies a whole load of handle_ methods for building an instance from the data fields. The signatures for these mirror those in the data model, but with Dot types substituted for the concrete types. For example, there’s a handler for graph named handle_graph that returns a T_Graph, and uses T_GraphType where the Graph constructor uses GraphType.
DotParser extends DotConstructors, and provides a parser for DOT text files. It uses RegexParsers, and whenever productions match, the corresponding handle_ method is used to process the input. This means that any data-structure for representing the graph that has a DotConstructors binding can be built using the parser.
The destructors trait decomposes objects into their fields. It is like the unapply method in scala. So, to find out what parts are in a graph, you’d call decompose_graph(g: T_Graph) and get back a tuple of its fields. This means that any data structure could be viewed as a DOT graph simply by providing the destructor instance.
Extending this is the DotRenderer trait. This provides render_ methods that decompose structures into their component parts and then renders them to an Appendable as DOT text. Again, because this goes through the DotDestructors interface, it will work on any data structures that have the appropriate bindings defined.
Finally, bindings to the dot Graph data model are defined in the DotAst.scala file. This provides DotAst, which binds T-Graph to Graph and so on. DotAstConstructors mixes in DotConstructors and DotAstBuilder routs all the handle_ methods to the relevant constructor. DotAstParser extends DotAstBuilder, mixing in DotParser, and as if by magic we have a parser that builds the Graph data model from DOT text. Next, DotAstDestructors provides all of the decompose_ handlers and DotAstRenderer extends this, mixing in DotRenderer to provide the pretty-print handler.
Facades in package objects
The dsl package is over-engineered for the common case of working directly with the dot data model. It’s done for a good reason – I expect to be working with very large graphs, and want to decouple the representation from the rendering and parsing. However, for the simple case, it’s far to much work to use. So, in the package object for dsl there are some convenience methods for parsing and rendering Graph.
Similarly, the exec package lets you use much of the power of the graphviz applications, but this is over-kill for the most common uses. To hide this, the exec object provides access to piping strings or graphs through the dot command-line tool.
I expect that as I use this, and possibly as other people give feedback, I’ll beef up these facades to cover a wider range of common cases.
What have we all learned?
I thought I’d get this done in 2 days. It has taken me a week. So much for accurately estimating development time. The split between the abstract and concrete data model is really useful, but made everything slower to code up, and the IntellijIdea scala plugin isn’t quite clever enough with types to be able to get its highlighting right. It took me several goes to get the specs tests to run correctly from maven test, but it’s all working now and if you care, you can check in my pom and the specs themsevles to see how I did it.
It was certainly worth writing this binding. I’ve already used it to identify a bug in my graph algorithms that I would probably never have spotted looking at text. The next thing to do is to see about integrating it more fully with my graph API, also hosted on github.
If you like this or get it to run on your box, please do let me know. I’m a sucker for feedback.