March 20th, 2018

Quick guide to JSON operators and functions in Postgres

Postgres introduced JSON support in 9.2. And with 9.4, it released JSONB which even improved querying and indexing json fields a notch. In this post, I would like to give a quick tour of some of the most common json operators and functions which I have encountered. And some gotchas which tripped me up. I have tested them on 9.6. If you have an earlier version, please refer the documentation for any changes.

Querying json fields

Let’s start off with getting data from json keys. There are 2 operators for doing this: -> and ->>. The difference is very subtle and something which had tripped me up when I started writing postgres queries with json.

-> returns the value of a field as another json object. Whereas ->> returns the value of a field as text.

Let’s understand that with an example. Suppose you have a json object like {"a": "hobbit", "b": "elf"}.

To get the value of “a”, you can do:

test=> select '{"a": "hobbit", "b": "elf"}'::jsonb->'a';
 ?column?
----------
 "hobbit"
(1 row)

But, if you use the ->> operator, then:

test=> select '{"a": "hobbit", "b": "elf"}'::jsonb->>'a';
 ?column?
----------
 hobbit
(1 row)

Notice, the "" in the previous case. -> thinks that the return value is an object, hence quotes the result. Its usefulness becomes apparent when you have a nested json object.

test=> select '{"a": {"internal": 45}, "b": "elf"}'::jsonb->>'a'->>'internal';
ERROR:  operator does not exist: text ->> unknown
LINE 1: ...'{"a": {"internal": 45}, "b": "elf"}'::jsonb->>'a'->>'intern...
                                                             ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.


test=> select '{"a": {"internal": 45}, "b": "elf"}'::jsonb->'a'->>'internal';
 ?column?
----------
 45
(1 row)

Here, the difference is clear. If you use the ->> operator and try to access the fields from it’s result, it doesn’t work. You need to use the -> operator for that. Bottom line is - If you want to get the value from a json field, use ->>, but if you need to access nested fields, use ->.

Key exist operator

You can also check whether a json field exists or not. Use the ? operator for that.

test=> select '{"a": "hobbit"}'::jsonb?'hello';
 ?column?
----------
 f
(1 row)

test=> select '{"a": "hobbit"}'::jsonb?'a';
 ?column?
----------
 t
(1 row)

Delete a key

To delete a json field, use the - operator.

test=> select '{"a": "hobbit", "b": "elf"}'::jsonb-'a';
   ?column?
--------------
 {"b": "elf"}
(1 row)

Update a key

To update a json field, you need to use the jsonb_set function.

Let’s say you have a table like this:

CREATE TABLE IF NOT EXISTS users (
	id serial PRIMARY KEY,
	full_name text NOT NULL,
	metadata jsonb
);

To update a field in the metadata column, you can do:

UPDATE USERS SET metadata=jsonb_set(metadata, '{category}', '"hobbit"') where id=1;

If the field does not exist, it will be created by default. You can also choose to disable that behavior by passing an additional flag.

UPDATE USERS SET metadata=jsonb_set(metadata, '{category}', '"hobbit"', false) where id=1;

There is a catch here. Note that we have set the metadata field to be nullable. What if you try to set a field when the value is NULL ? It fails silently !

test=> select metadata from users where id=1;
 metadata
----------

(1 row)

test=> update users set metadata=jsonb_set(metadata, '{category}', '""') where id=1;
UPDATE 1

test=> select metadata from users where id=1;
 metadata
----------

(1 row)

Either set the field to NOT NULL. Or if that is not possible, use the coalesce function.

test=> update users set metadata=jsonb_set(coalesce(metadata, '{}'), '{category}', '""') where id=1;
UPDATE 1
test=> select metadata from users where id=1;
     metadata
------------------
 {"category": ""}
(1 row)

This covers the most common use-cases of json queries that I have encountered. If you spot a mistake or if there is something else you feel need to be added, please feel free to point it out !

March 11th, 2018

Hidden goodies inside lib/pq

It has happened to all of us. You get into a habit and accept a few inconveniences and move on. It bothers you, but you procrastinate, putting it in the backburner by slapping that mental TODO note. Yet surprisingly, sometimes the solution is right in front of you.

Take my case. I have always done _ "github.com/lib/pq" in my code to use the postgres driver. The _ is to register the driver with the standard library interface. Since we usually do not actually use the pq library, one needs to use a _ to import the library without exposing the package in the code. Life went on and I didn’t even bother to look for better ways to do things. Until the time came and I screamed “There has to be a better way !”.

Indeed there was. It was the actual pq package, which I was already using but never actually imported ! Yes, I am shaking my head too . Stupidly, I had always looked at database/sql and never bothered to look at the underlying lib/pq package. Oh well, dumb mistakes are bound to happen. I learn from them and move on.

Let’s take a look at some of the goodies that I found inside the package, and how it made my postgres queries look much leaner and elegant.

Arrays

Let’s say that you have a table like this -

CREATE TABLE IF NOT EXISTS users (
	id serial PRIMARY KEY,
	comments text[]
);

Believe it or not, for the longest time, I did this to scan a postgres array -

id := 1
var rawComments string
err := db.QueryRow(`SELECT comments from users WHERE id=$1`, id).Scan(&rawComments)
if err != nil {
	return err
}
comments := strings.Split(rawComments[1:len(rawComments)-1], ",")
log.Println(id, comments)

It was ugly. But life has deadlines and I moved on. Here is the better way -

var comments []string
err := db.QueryRow(`SELECT comments from users WHERE id=$1`, id).Scan(pq.Array(&comments))
if err != nil {
	return err
}
log.Println(id, comments)

Similarly, to insert a row with an array -

id := 3
comments := []string{"marvel", "dc"}
_, err := db.Exec(`INSERT INTO users VALUES ($1, $2)`, id, pq.Array(comments))
if err != nil {
	return err
}

Null Time

Consider a table like this -

CREATE TABLE IF NOT EXISTS last_updated (
	id serial PRIMARY KEY,
	ts timestamp
);

Now if you have an entry where ts is NULL, it is extremely painful to scan it in one shot. You can use coalesce or a CTE or something of that sort. This is how I would have done it earlier -

id := 1
var ts time.Time
err := db.QueryRow(`SELECT coalesce(ts, to_timestamp(0)) from last_updated WHERE id=$1`, id).Scan(&ts)
if err != nil {
	return err
}
log.Println(id, ts, ts.IsZero()) // ts.IsZero will still be false btw !

This is far better -

id := 1
var ts pq.NullTime
err := db.QueryRow(`SELECT ts from last_updated WHERE id=$1`, id).Scan(&ts)
if err != nil {
	return err
}
if ts.Valid {
	// do something
}
log.Println(id, ts.Time, ts.Time.IsZero()) // This is true !

Errors

Structured errors are great. But the only error type check that I used to have in my tests were for ErrNoRows since that is the only useful error type exported by the database/sql package. It frustrated me to no end. Because there are so many types of DB errors like syntax errors, constraint errors, not_null errors etc. Am I forced to do the dreadful string matching ?

I made the discovery when I learnt about the # format specifier. Doing a t.Logf("%+v", err) versus t.Logf("%#v", err) makes a world of a difference.

If you have a key constraint error, the first would print

pq: duplicate key value violates unique constraint "last_updated_pkey"

whereas in case of latter

&pq.Error{Severity:"ERROR", Code:"23505", Message:"duplicate key value violates unique constraint \"last_updated_pkey\"", Detail:"Key (id)=(1) already exists.", Hint:"", Position:"", InternalPosition:"", InternalQuery:"", Where:"", Schema:"public", Table:"last_updated", Column:"", DataTypeName:"", Constraint:"last_updated_pkey", File:"nbtinsert.c", Line:"433", Routine:"_bt_check_unique"}

Aha. So there is an underlying pq.Error type. And it has error codes ! Wohoo ! Better tests !

So in this case, the way to go would be -

pqe, ok := err.(*pq.Error)
if ok != true {
	t.Fatal("unexpected type")
}
if string(pqe.Code) != "23505" {
	t.Error("unexpected error code.")
}

And that’s it ! For a more detailed look, head over to the package documentation.

Feel free to post a comment if you spot a mistake. Or if you know of some other hidden gems, let me know !

December 14th, 2017

How to shrink an AWS EBS volume

Recently, I had a requirement to shrink the disk space of a machine I had setup. We had overestimated and decided to use lesser space until the need arises. I had setup a 1TB disk initially and we wanted it to be 100GB.

I thought it would be as simple as detaching the volume, setting the new values and be done with it. Turns out you can increase the disk space, but not decrease it. Bummer, now I need to do the shrinking manually.

Disclaimer:

This is nearly taken verbatim from Matt Berther’s post https://matt.berther.io/2015/02/03/how-to-resize-aws-ec2-ebs-volumes/ combined with @sinnardem’s suggestion. But I have showed the actual command outputs and updated some steps from my experience following the process.

Note: This worked for me on an Ubuntu 16.04 OS. YMMV. Proceed with caution. Take a snapshot of your volume before you do anything.

Basic idea:

We have a 1TB filesystem. Our target is to make it 100GB.

AWS stores all your data in EBS (Elastic Block Storage) which allows detaching volumes from one machine and attaching to another. We will use this to our advantage. We will create a 100GB volume, attach this newly created volume and the original volume to a temporary machine. From inside the machine, we will copy over the data from the original to the new volume. Detach both volumes and attach this new volume to our original machine. Easy peasy.

Here we go !

Note the hostname of the current machine. It should be something like ip-a-b-c-d.
Shutdown the current machine. (Don’t forget to take the snapshot !).
Detach the volume, name it as original-volume to avoid confusion.
Create a new ec2 instance with the same OS as the current machine with 100GB of storage. Note, that it has to be in the same availability zone.
Shutdown that machine
Detach the volume from the machine, name it as new-volume to avoid confusion.
Now create another new ec2 machine, t2.micro is fine. Again, this has to be in the same availability zone.
Boot up the machine. Log in.
Attach original-volume to this machine at /dev/sdf which will become /dev/xvdf1.

Attach new-volume to this machine at /dev/sdg which will become /dev/xvdg1.

It will take some time to attach because the machines are running. Do NOT attach while the machine is shut down because it will take the original-volume to be the root partition and boot into it. We do not want that. (This happened to me).

We want the root partition to be the separate 8G disk of the t2.micro machine, and have 2 separate partitions to work with.

After the attachment is complete (you will see so in the aws ec2 console), do a lsblk. Check that you can see the partitions.
```
 $lsblk
 NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
 xvda    202:0    0    8G  0 disk
 └─xvda1 202:1    0    8G  0 part /
 xvdf    202:80   0 1000G  0 disk  --> original-volume
 └─xvdf1 202:81   0 1000G  0 part
 xvdg    202:96   0  100G  0 disk  --> new-volume
 └─xvdg1 202:97   0  100G  0 part
```
We are now all set to do the data transfer.

First, check filesystem integrity of the original volume.

ubuntu@ip-172-31-12-57:~$ sudo e2fsck -f /dev/xvdf1
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
cloudimg-rootfs: 175463/128000000 files (0.1% non-contiguous), 9080032/262143739 blocks

Resize the filesytem to the partition’s size.

ubuntu@ip-172-31-12-57:~$ sudo resize2fs -M -p /dev/xvdf1
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/xvdf1 to 1445002 (4k) blocks.
Begin pass 2 (max = 492123)
Relocating blocks             XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 8000)
Scanning inode table          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 4 (max = 31610)
Updating inode references     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/xvdf1 is now 1445002 (4k) blocks long.

Take the number from the previous step and calculate how many 16MB blocks would be required.
```
ubuntu@ip-172-31-12-57:~$ echo $((1445002*4/(16*1024)))
352
```
Let’s round it off to 355.

Start the copy.

ubuntu@ip-172-31-12-57:~$ sudo dd bs=16M if=/dev/xvdf1 of=/dev/xvdg1 count=355
355+0 records in
355+0 records out
5955911680 bytes (6.0 GB, 5.5 GiB) copied, 892.549 s, 6.7 MB/s

Double check that all changes are synced to disk.
```
ubuntu@ip-172-31-12-57:~$ sync
```

Resize the new volume.

ubuntu@ip-172-31-12-57:~$ sudo resize2fs -p /dev/xvdg1
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/xvdg1 to 26214139 (4k) blocks.
The filesystem on /dev/xvdg1 is now 26214139 (4k) blocks long.

Check for filesystem integrity.

ubuntu@ip-172-31-12-57:~$ sudo e2fsck -f /dev/xvdg1
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
cloudimg-rootfs: 175463/12800000 files (0.1% non-contiguous), 1865145/26214139 blocks

Shutdown the machine.
Detach both volumes.
Attach the new-volume to your original machine, and mount it as your boot device (/dev/sda1).
Login to the machine. You will see that the hostname is set to the machine from where you created the volume. We need to set to the original hostname.
```
sudo hostnamectl set-hostname ip-a-b-c-d
```
Reboot.

That should be it. If you find anything that has not worked for you or you have a better method, please feel free to let me know in the comments !

November 18th, 2017

Go faster with gogoproto

If you are using protocol buffers with Go and have reached a point where the serialization / deserialization has become a bottleneck in your application, fret not, you can still go faster with gogoprotobuf.

I wasn’t aware (until now !) of any such libraries which had perfect interoperability with the protocol buffer format, and still gave much better speed than using the usual protobuf Marshaler. I was so impressed after using the library that I had to blog about it.

The context of this began when some code of mine that used normal protobuf serialization started to show bottlenecks. The primary reason being the code was running on a raspberry pi with a single CPU. And the overall throughput that was desired was much lower than expected.

Now I knew about capn’proto and flatbuffers. I was considering what would be the best approach to take when I came across this benchmark and heard about gogoprotobuf.

The concept of gogoproto was simple and appealing. They use custom extensions in the proto declaration which lead to better code generation tailor made for Go. Especially if you let go of some protocol buffer contracts like nullable, you can generate even faster code. In my case, all the fields of my message were required fields. So it seemed like something I could take advantage of.

My .proto declaration changed from

syntax = "proto3";
package mypackage;

// This is the message sent to the cloud server
message ClientMessage {
	string field1 = 1;
	string field2 = 2;
	int64 timestamp = 3;
}

to this

syntax = "proto2";
package mypackage;


import "github.com/gogo/protobuf/gogoproto/gogo.proto";

option (gogoproto.gostring_all) = true;
option (gogoproto.goproto_stringer_all) = false;
option (gogoproto.stringer_all) =  true;
option (gogoproto.marshaler_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.unmarshaler_all) = true;

// For tests
option (gogoproto.testgen_all) = true;
option (gogoproto.equal_all) = true;
option (gogoproto.populate_all) = true;

// This is the message sent to the cloud server
message ClientMessage {
	required string field1 = 1 [(gogoproto.nullable) = false];
	required string field2 = 2 [(gogoproto.nullable) = false];
	required int64 timestamp = 3 [(gogoproto.nullable) = false];
}

Yes, using gogoproto, you cannot use proto3 if you intend to share your protobuf definitions with languages which do not support proto2, like php. That’s because proto3 does not support extensions. There is an active issue open which discusses this in further detail.

To generate the .pb.go file is not immediately straightforward. You have to set the proper proto_path, which took me some time to figure out.

protoc -I=. -I=$GOPATH/src -I=$GOPATH/src/github.com/gogo/protobuf/protobuf --gogofaster_out=. message.proto

Opened a PR here to clarify it.

Alright, time for some actual benchmarks and see if I get my money’s worth.

func BenchmarkProto(b *testing.B) {
	msg := "randomstring"
	now := time.Now().UTC().UnixNano()
	msg2 := "anotherstring"
	// wrap the msg in protobuf
	protoMsg := &ClientMessage{
		field1:    msg,
		field2:    msg2,
		timestamp: now,
	}

	for n := 0; n < b.N; n++ {
		_, err := proto.Marshal(protoMsg)
		if err != nil {
			b.Error(err)
		}
	}
}

Improvements seen across the board

name     old time/op    new time/op    delta
Proto-4     463ns ± 2%     101ns ± 1%  -78.09%  (p=0.008 n=5+5)

name     old alloc/op   new alloc/op   delta
Proto-4      264B ± 0%       32B ± 0%  -87.88%  (p=0.008 n=5+5)

name     old allocs/op  new allocs/op  delta
Proto-4      4.00 ± 0%      1.00 ± 0%  -75.00%  (p=0.008 n=5+5)

Lesser memory, more speed. Greater happiness.

October 3rd, 2017

A small memory optimization for log-heavy applications

Sometimes, I randomly browse through Go source code just to look for any patterns or best practices. I was doing that recently with the log package when I came across an interesting observation that I wanted to share.

Any call to log.Print or log.Println or any of its sister functions is actually a wrapper around the equivalent S call from the fmt package. The final output of that is then passed to an Output function, which is actually responsible for writing out the string to the underlying writer.

Here is some code to better explain what I’m talking about -

// Print calls l.Output to print to the logger.
// Arguments are handled in the manner of fmt.Print.
func (l *Logger) Print(v ...interface{}) { l.Output(2, fmt.Sprint(v...)) }
// Println calls l.Output to print to the logger.
// Arguments are handled in the manner of fmt.Println.
func (l *Logger) Println(v ...interface{}) { l.Output(2, fmt.Sprintln(v...)) }

This means that if I just have one string to print, I can directly call the Output function and bypass this entire Sprinting process.

Lets whip up some benchmarks and analyse exactly how much of an overhead is taken by the fmt call -

func BenchmarkLogger(b *testing.B) {
	logger := log.New(ioutil.Discard, "[INFO] ", log.LstdFlags)
	errmsg := "hi this is an error msg"
	for n := 0; n < b.N; n++ {
		logger.Println(errmsg)
	}
}

If we look into the cpu profile from this benchmark -

profile-println

Its hard to figure out what’s going on. But the key takeaway here is that huge portion of the function calls circled in red is what’s happening from the Sprintln call. If you zoom in to the attached svg here, you can see lot of time being spent on getting and putting back the buffer to the pool and some more time being spent on formatting the string.

Now, if we compare this to a benchmark by directly calling the Output function -

func BenchmarkLogger(b *testing.B) {
	logger := log.New(ioutil.Discard, "[INFO] ", log.LstdFlags)
	errmsg := "hi this is an error msg"
	for n := 0; n < b.N; n++ {
		logger.Output(1, errmsg) // 1 is the call depth used to print the source file and line number
	}
}

profile-output

Bam. The entire portion due to the SPrintln call is gone.

Time to actually compare the 2 benchmarks and see how they perform.

func BenchmarkLogger(b *testing.B) {
	logger := log.New(ioutil.Discard, "[INFO] ", log.LstdFlags)
	testData := []struct {
		test string
		data string
	}{
		{"short-str", "short string"},
		{"medium-str", "this can be a medium sized string"},
		{"long-str", "just to see how much difference a very long string makes"},
	}

	for _, item := range testData {
		b.Run(item.test, func(b *testing.B) {
			b.SetBytes(int64(len(item.data)))
			for n := 0; n < b.N; n++ {
				// logger.Println(str) // Switched between these lines to compare
				logger.Output(1, item.data)
			}
		})
	}
}

name                 old time/op    new time/op     delta
Logger/short-str-4   457ns ± 2%      289ns ± 0%   -36.76%  (p=0.016 n=5+4)
Logger/medium-str-4  465ns ± 0%      291ns ± 0%   -37.30%  (p=0.000 n=4+5)
Logger/long-str-4    471ns ± 1%      291ns ± 2%   -38.35%  (p=0.008 n=5+5)

name                 old speed      new speed       delta
Logger/short-str-4   26.3MB/s ± 2%   41.5MB/s ± 0%   +58.07%  (p=0.016 n=5+4)
Logger/medium-str-4  70.9MB/s ± 0%  113.1MB/s ± 1%   +59.40%  (p=0.016 n=4+5)
Logger/long-str-4    119MB/s ± 0%    192MB/s ± 2%   +62.14%  (p=0.008 n=5+5)

name                 old alloc/op   new alloc/op    delta
Logger/short-str-4   32.0B ± 0%       0.0B       -100.00%  (p=0.008 n=5+5)
Logger/medium-str-4  64.0B ± 0%       0.0B       -100.00%  (p=0.008 n=5+5)
Logger/long-str-4    80.0B ± 0%       0.0B       -100.00%  (p=0.008 n=5+5)

name                 old allocs/op  new allocs/op   delta
Logger/short-str-4   2.00 ± 0%       0.00       -100.00%  (p=0.008 n=5+5)
Logger/medium-str-4  2.00 ± 0%       0.00       -100.00%  (p=0.008 n=5+5)
Logger/long-str-4    2.00 ± 0%       0.00       -100.00%  (p=0.008 n=5+5)

More or less what was expected. It removes the allocations entirely by bypassing the fmt calls. So, the larger of a string you have, the more you save. And also, the time difference increases as the string size increases.

But as you might have already figured out, this is just optimizing a corner case. Some of the limitations of this approach are:

It is only applicable when you just have a single string and directly printing that. The moment you move to creating a formatted string, you need to call fmt.Sprintf and you deal with the pp buffer pool again.
It is only applicable when you are using the log package to write to an underlying writer. If you are calling the methods of the writer struct directly, then all of this is already taken care of.
It hurts readability too. logger.Println(msg) is certainly much more readable and clear than logger.Output(1, msg).

I only had a couple of cases like this in my code’s hot path. And in top-level benchmarks, they don’t have much of an impact. But in situations, where you have a write-heavy application and a whole lot of plain strings are being written, you might look into using this and see if it gives you any benefit.