How to write a Vet analyzer pass

January 21st, 2019

The Go toolchain has the vet command which can be used to to perform static checks on a codebase. But a significant problem of vet was that it was not extensible. vet was structured as a monolithic executable with a fixed suite of checkers. To overcome this, the ecosystem started developing its own tools like staticcheck and go-critic. The problem with this is that every tool has its own way to load and parse the source code. Hence, a checker written for one tool would require extensive effort to be able to run on a different driver.

During the 1.12 release cycle, a new API for static code analysis was developed: the golang.org/x/tools/go/analysis package. This creates a standard API for writing Go static analyzers, which allows them to be easily shared with the rest of the ecosystem in a plug-and-play model.

In this post, we will see how to go about writing an analyzer using this new API.

Background

SQL queries are always evaluated at runtime. As a result, if you make a syntax error in a SQL query, there is no way to catch that until you run the code or write a test for it. There was this peculiar pattern in particular, that was always tripping me up.

Let’s say I have a SQL query like:

db.Exec("insert into table (c1, c2, c3, c4) values ($1, $2, $3, $4)", p1, p2, p3, p4)

It’s the middle of the night and I need to add a new column. I quickly change the query to:

db.Exec("insert into table (c1, c2, c3, c4, c5) values ($1, $2, $3, $4)", p1, p2, p3, p4, p5).

It seems like things are fine, but I have just missed a $5. This bugged me so much that I wanted to write a vet analyzer for this to detect patterns like these and flag them.

There are other semantic checks we can apply like matching the number of positional args with the number of params passed and so on. But in this post, we will just focus on the most basic check of verifying whether a sql query is syntactically correct or not.

Layout of an analyzer

All analyzers usually expose a global variable Analyzer of type analysis.Analyzer. It is this variable which is imported by driver packages.

Let us see what it looks like:

var Analyzer = &analysis.Analyzer{
	Name:             "sqlargs",                                 // name of the analyzer
	Doc:              "check sql query strings for correctness", // documentation
	Run:              run,                                       // perform your analysis here
	Requires:         []*analysis.Analyzer{inspect.Analyzer},    // a set of analyzers which must run before the current one.
	RunDespiteErrors: true,
}

Most of the fields are self-explanatory. The actual analysis is performed by run: a function which takes an analysis.Pass as an argument. The pass variable provides information to the run function to perform its tasks and optionally pass on information to other analyzers.

It looks like:

func run(pass *analysis.Pass) (interface{}, error) {
}

Now, to run this analyzer, we will use the singlechecker package which can be used to run a single analyzer.

package main

import (
	"github.com/agnivade/sqlargs"
	"golang.org/x/tools/go/analysis/singlechecker"
)

func main() { singlechecker.Main(sqlargs.Analyzer) }

Upon successfully compiling this, you can execute the binary as a standalone tool on your codebase: sqlargs ./....

This is the standard layout of all analyzers. Let us have a look into the internals of the run function, which is where the main code analysis is performed.

Look for SQL queries

Our primary aim is to look for expressions like db.Exec("<query>") in the code base and analyze them. This requires knowledge of Go ASTs (Abstract Syntax Tree) to slice and dice the source code and extract the stuff that we need.

To help us with scavenging the codebase and filtering the AST expressions that we need, we have some tools at our disposal, viz. the go/ast/inspector package. Using this, we just specify the node type in the source code that we are interested in and it does the rest. Since this is a very common task for all analyzers, we have an inspect pass which returns an inspector for a given pass.

Let us see how that looks like:

import (
	"golang.org/x/tools/go/analysis"
	"golang.org/x/tools/go/analysis/passes/inspect"
	"golang.org/x/tools/go/ast/inspector"
)

func run(pass *analysis.Pass) (interface{}, error) {
	inspect := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)
	// We filter only function calls.
	nodeFilter := []ast.Node{
		(*ast.CallExpr)(nil),
	}

	inspect.Preorder(nodeFilter, func(n ast.Node) {
		call := n.(*ast.CallExpr)
		_ = call // work with the call expression that we have
	})
}

All expressions of the form of db.Exec("<query>") are called CallExprs. So we specify that in our nodeFilter. After that, the Preorder function will give us only CallExprs found in the codebase.

A CallExpr has two parts- Fun and Args. A Fun can either be an Ident (for example Fun()) or a SelectorExpr (for example foo.Fun()). Since we are looking for patterns like db.Exec, we need to filter only SelectorExprs.

inspect.Preorder(nodeFilter, func(n ast.Node) {
	call := n.(*ast.CallExpr)
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return
	}

})

Alright, so far so good. This means we have filtered all expressions of the form of type.Method() from the source code. Now we need to verify 2 things:

The function name is Exec; because that is what we are interested in.
The type of the selector is sql.DB. (To keep things simple, we will ignore the case when sql.DB is embedded in another struct).

Let us peek into the SelectorExpr to get these. A SelectorExpr again has two parts- X and Sel. If we take an example of db.Exec()- then db is X, and Exec is Sel. Matching the function name is easy. But to get the type info, we need to take help of analysis.Pass passed in the run function.

Pass contains a TypesInfo field which contain type information about the package. We need to use that to get the type of X and verify that the object comes from the database/sql package and is of type *sql.DB.

inspect.Preorder(nodeFilter, func(n ast.Node) {
	call := n.(*ast.CallExpr)
	sel, ok := call.Fun.(*ast.SelectorExpr)
	if !ok {
		return
	}

	// Get the type of X
	typ, ok := pass.TypesInfo.Types[sel.X]
	if !ok {
		return
	}

	t := typ.Type
	// If it is a pointer, get the element.
	if ptr, ok := t.(*types.Pointer); ok {
		t = ptr.Elem()
	}
	nTyp, ok := t.(*types.Named)
	if !ok {
		return
	}
})

Now, from nTyp we can get the type info of X and directly match the function name from Sel.

// Get the function name
sel.Sel.Name // == "Exec"

// Get the object name
nTyp.Obj().Name() // == "DB"

// Check the import of the object
nTyp.Obj().Pkg().Path() // == "database/sql"

Extract the query string

Alright ! We have successfully filtered out only expressions of type (*sql.DB).Exec. The only thing remaining is to extract the query string from the CallExpr and check it for syntax errors.

So far, we have been dealing with the Fun field of a CallExpr. To get the query string, we need to access Args. A db.Exec call will have the query string as its first param and the arguments follow after. We will get the first element of the Args slice and then use TypesInfo.Types again to get the value of the argument.

// Code continues from before.

arg0 := call.Args[0]
typ, ok := pass.TypesInfo.Types[arg0]
if !ok || typ.Value == nil {
	return
}

_ = constant.StringVal(typ.Value) // Gives us the query string ! (constant is from "go/constant")

Note that this doesn’t work if the query string is a variable. A lot of codebases have a query template string and generate the final query string dynamically. So, for example, the following will not be caught by our analyzer:

q := `SELECT %s FROM certificates WHERE date=$1;`
query := fmt.Sprintf(q, table)
db.Exec(query, date)

All that is left is for us to check the query string for syntax errors. We will use the github.com/lfittl/pg_query_go package for that. And if we get an error, pass has a Reportf helper method to print out diagnostics found during a vet pass. So:

query := constant.StringVal(typ.Value)
_, err := pg_query.Parse(query)
if err != nil {
	pass.Reportf(call.Lparen, "Invalid query: %v", err)
	return
}

The final result looks like this:

func run(pass *analysis.Pass) (interface{}, error) {
	inspect := pass.ResultOf[inspect.Analyzer].(*inspector.Inspector)
	// We filter only function calls.
	nodeFilter := []ast.Node{
		(*ast.CallExpr)(nil),
	}

	inspect.Preorder(nodeFilter, func(n ast.Node) {
		call := n.(*ast.CallExpr)
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return
		}

		// Get the type of X
		typ, ok := pass.TypesInfo.Types[sel.X]
		if !ok {
			return
		}

		t := typ.Type
		// If it is a pointer, get the element.
		if ptr, ok := t.(*types.Pointer); ok {
			t = ptr.Elem()
		}
		nTyp, ok := t.(*types.Named)
		if !ok {
			return
		}

		if sel.Sel.Name != "Exec" &&
			nTyp.Obj().Name() != "DB" &&
			nTyp.Obj().Pkg().Path() != "database/sql" {
			return
		}

		arg0 := call.Args[0]
		typ, ok = pass.TypesInfo.Types[arg0]
		if !ok || typ.Value == nil {
			return
		}

		query := constant.StringVal(typ.Value)
		_, err := pg_query.Parse(query)
		if err != nil {
			pass.Reportf(call.Lparen, "Invalid query: %v", err)
			return
		}
	})
}

Tests

The golang.org/x/tools/go/analysis/analysistest package provides several helpers to make testing of vet passes a breeze. We just need to have our sample code that we want to test in a package. That package should reside inside the testdata folder which acts as the GOPATH for the test.

Let’s say we have a file basic.go which contains db.Exec function calls that we want to test. So the folder structure needed is:

testdata
    └── src
        └── basic
            └── basic.go

To verify expected diagnostics, we just need to add comments of the form // want ".." beside the line which is expected to throw the error. So for example, this is what the file basic.go might look like-

func runDB() {
	var db *sql.DB
	defer db.Close()

	db.Exec(`INSERT INTO t (c1, c2) VALUES ($1, $2)`, p1, "const") // no error
	db.Exec(`INSERT INTO t(c1 c2) VALUES ($1, $2)`, p1, p2) // want `Invalid query: syntax error at or near "c2"`
}

And finally to run the test, we import the analysistest package and pass our analyzer, pointing to the package that we want to test.

import (
	"testing"

	"github.com/agnivade/sqlargs"
	"golang.org/x/tools/go/analysis/analysistest"
)

func TestBasic(t *testing.T) {
	testdata := analysistest.TestData()
	analysistest.Run(t, testdata, sqlargs.Analyzer, "basic") // loads testdata/src/basic
}

That’s it !

To quickly recap-

We saw the basic layout of all analyzers.
We used the inspect pass to filter the AST nodes that we want.
Once we got our node, we used the pass.TypesInfo.Type map to give us type information about an object.
We used that to verify that the received object comes from the database/sql package and is of type *sql.DB.
Then we extracted the first argument from the CallExpr and checked whether the string is a valid SQL query or not.

This was a short demo of how to go about writing a vet analyzer. Note that sql strings can also appear in other libraries like sqlx or gorm. Matching objects only with type of *sql.DB is not enough. One needs to maintain a list of type and method names to be matched. But I have kept things simple for the sake of the article. The full source code is available here. Please feel free to download and run sqlargs on your codebase. If you find a mistake in the article, please do point it out in the comments !

vet (1)

Agniva De Sarker

My journal on Software, Bugs, Tips & Tricks.