*

Checking JSON files for correctness

Posted on 27/01/2016 by Philip Withnall

tl;dr: Write a Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON becomes used more and more in place of XML, we need a replacement for tools like xmllint to check that JSON documents follow whatever format they are supposed to be following.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. Firstly, a brief introduction to JSON Schema, then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured. (There’s a good introduction on the Space Telescope Science Institute.) For example, what properties should be in the top-level object in the document, and what their types should be. It is entirely analogous to XML Schema (or Relax NG). It becomes a little confusing in the fact that JSON Schema files are themselves JSON, which means that there is a JSON Schema file for validating that JSON Schema files are well-formed; this is the JSON meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}


Valid instances of this JSON schema are, for example:

1
2
3
4
{
    "firstName": "John",
    "lastName": "Smith"
}


or:

1
2
3
4
5
{
    "firstName": "Jessica",
    "lastName": "Smith",
    "age": 31
}


or even:

1
2
3
4
5
{
    "firstName": "Sandy",
    "lastName": "Sanderson",
    "country": "England"
}


The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that — and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability of network-facing code is probably prone to escalation into arbitrary code execution.)

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

1
2
3
4
5
{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}


They include valid and invalid instances, which are designed to try and hit boundary conditions in typical json-glib-using code.

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C or H file of JSON test vectors, and link or #include that into a simple test program which runs your code against each of them in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

1
2
3
4
5
6
7
AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])
 
AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])


Add this to the Makefile.am for your tests:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
json_schemas = \
    my-format.schema.json \
    my-other-format.schema.json \
    $(NULL)
 
EXTRA_DIST += $(json_schemas)
 
check-json-schema: $(json_schemas)
    $(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema
 
json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)
 
%.schema.h: %.schema.json
    $(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
        --c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
        --format c $^ > $@
 
my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)


And add this to your test suite C file itself:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include "my-format.schema.h"
 
 
// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;
 
  i = GPOINTER_TO_UINT (user_data);
 
  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);
 
  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert (G_IS_OBJECT (parser));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert (parsed == NULL);
    }
 
  g_clear_error (&error);
  g_clear_object (&parsed);
}
 
 
int
main (int argc, char *argv[])
{
  guint i;
 
  
 
  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;
 
      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }
 
  
}


Walbottle is heading towards being mature. There are some features of the JSON Schema standard it doesn’t yet support: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is complex, and the algorithms slow down due to computational complexity with lots of nested sub-schemas (so try to design your schemas to avoid this if possible). json-schema-generate recently acquired a --show-timings option which gives debug information about each of the sub-schemas in your schema, how many JSON instances it generates, and how long that took, which gives some insight into how to optimise the schema.

Original post

Comments (0)


Add a Comment





Allowed tags: <b><i><br>Add a new comment:


Latest Blog Posts

ipcpipeline: Splitting a GStreamer pipeline into multiple processes

17/11/2017

Earlier this year I worked on a certain GStreamer plugin that is called “ipcpipeline”. This plugin provides elements that make it possible…

Quick hack: Experiments with crosvm

09/11/2017

Running crosvm outside Chromium OS is quite easy, with the only complication being that minijail isn't widely packaged in distros. In these…

Tracing memory leaks in the NFC Digital Protocol stack

06/11/2017

Kmemleak allows you to track possible memory leaks inside the Linux kernel. Basically, it tracks dynamically allocated memory blocks in…

Who knew we still had low-hanging fruit?

17/10/2017

Earlier this month I had the pleasure of attending the Web Engines Hackfest, hosted by Igalia at their offices in A Coruña, and also sponsored…

Performance analysis in Linux (continued)

06/10/2017

In this post, I will show one more example of how easy it is to disrupt performance of a modern CPU, and also run a quick discussion on…

XDC 2017 - Links to recorded presentations (videos)

23/09/2017

Many thanks to Google for recording all the XDC2017 talks. To make them easier to watch, here are direct links to each talk recorded at…

Open Since 2005

We use cookies on this website to ensure that you get the best experience. By continuing to use this website you are consenting to the use of these cookies. To find out more please follow this link.

Collabora Ltd © 2005-2017. All rights reserved. Website sitemap.