*

Checking JSON files for correctness

Philip Withnall avatar

Posted on 27/01/2016 by Philip Withnall

Share this post:

tl;dr: Write a Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON becomes used more and more in place of XML, we need a replacement for tools like xmllint to check that JSON documents follow whatever format they are supposed to be following.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. Firstly, a brief introduction to JSON Schema, then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured. (There’s a good introduction on the Space Telescope Science Institute.) For example, what properties should be in the top-level object in the document, and what their types should be. It is entirely analogous to XML Schema (or Relax NG). It becomes a little confusing in the fact that JSON Schema files are themselves JSON, which means that there is a JSON Schema file for validating that JSON Schema files are well-formed; this is the JSON meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}


Valid instances of this JSON schema are, for example:

1
2
3
4
{
    "firstName": "John",
    "lastName": "Smith"
}


or:

1
2
3
4
5
{
    "firstName": "Jessica",
    "lastName": "Smith",
    "age": 31
}


or even:

1
2
3
4
5
{
    "firstName": "Sandy",
    "lastName": "Sanderson",
    "country": "England"
}


The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that — and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability of network-facing code is probably prone to escalation into arbitrary code execution.)

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

1
2
3
4
5
{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}


They include valid and invalid instances, which are designed to try and hit boundary conditions in typical json-glib-using code.

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C or H file of JSON test vectors, and link or #include that into a simple test program which runs your code against each of them in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

1
2
3
4
5
6
7
AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])
 
AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])


Add this to the Makefile.am for your tests:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
json_schemas = \
    my-format.schema.json \
    my-other-format.schema.json \
    $(NULL)
 
EXTRA_DIST += $(json_schemas)
 
check-json-schema: $(json_schemas)
    $(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema
 
json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)
 
%.schema.h: %.schema.json
    $(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
        --c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
        --format c $^ > $@
 
my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)


And add this to your test suite C file itself:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
#include "my-format.schema.h"
 
 
// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;
 
  i = GPOINTER_TO_UINT (user_data);
 
  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);
 
  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert (G_IS_OBJECT (parser));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert (parsed == NULL);
    }
 
  g_clear_error (&error);
  g_clear_object (&parsed);
}
 
 
int
main (int argc, char *argv[])
{
  guint i;
 
  
 
  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;
 
      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }
 
  
}


Walbottle is heading towards being mature. There are some features of the JSON Schema standard it doesn’t yet support: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is complex, and the algorithms slow down due to computational complexity with lots of nested sub-schemas (so try to design your schemas to avoid this if possible). json-schema-generate recently acquired a --show-timings option which gives debug information about each of the sub-schemas in your schema, how many JSON instances it generates, and how long that took, which gives some insight into how to optimise the schema.

Original post

Related Posts

Related Posts

Comments (0)


Add a Comment






Allowed tags: <b><i><br>Add a new comment:


Latest Blog Posts

Quick hack: Speed up your GitLab CI

06/11/2018

Did you know you could register your own PC, or a spare laptop collecting dust in a drawer, to get instant CI going on GitLab? Not only…

Introducing Zink, an OpenGL implementation on top of Vulkan

31/10/2018

For the last month or so, I've been playing with a new project during my work at Collabora, and as I've already briefly talked about at…

On the low adoption of automated testing in FOSS

18/10/2018

For projects of any value and significance, having a comprehensive automated test suite is nowadays considered a standard software engineering…

Recently in Geoclue

12/10/2018

After I started working for Collabora in April, I've finally been able to put some time on maintenance and development of Geoclue again.…

The beauty of Open Source

10/10/2018

Like all software, Open Source software isn't without it's bugs and issues. However, thanks to the nature of Open Source, resolving or mitigating…

MicroDebConf Brasilia

02/10/2018

Last month, the first "MicroDebConf" took place at the Gama campus of the University of Brasilia. Here's a look at how this one day event…

Open Since 2005 logo

We use cookies on this website to ensure that you get the best experience. By continuing to use this website you are consenting to the use of these cookies. To find out more please follow this link.

Collabora Ltd © 2005-2018. All rights reserved. Website sitemap.