Checking JSON files for correctness

Posted on 27/01/2016 by Philip Withnall

tl;dr: Write a JSON Schema for your JSON format, and use Walbottle to validate your JSON files against it.

As JSON is increasingly used in place of XML, we need a replacement for tools like xmllint to check that JSON documents follow the format they are supposed to follow.

Walbottle is a tool to do this, which I’ve been working on as part of client work at Collabora. First, a brief introduction to JSON Schema; then I will give an example of how to integrate Walbottle into an application. In a future post I hope to explain some of the theory behind its test vector generation.

JSON Schema is a standard for describing how a particular type of JSON document should be structured. (There’s a good introduction on the Space Telescope Science Institute website.) For example, a schema can specify which properties the top-level object in the document should have, and what their types should be. It is entirely analogous to XML Schema (or Relax NG). It gets a little confusing because JSON Schema files are themselves JSON, which means there is a JSON Schema file for validating that JSON Schema files are well-formed: the JSON Schema meta-schema.

Here is an example JSON Schema file (taken from the JSON Schema website):

{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}


Valid instances of this JSON schema are, for example:

{
    "firstName": "John",
    "lastName": "Smith"
}


or:

{
    "firstName": "Jessica",
    "lastName": "Smith",
    "age": 31
}


or even:

{
    "firstName": "Sandy",
    "lastName": "Sanderson",
    "country": "England"
}


The final example is important: by default, JSON object instances are allowed to contain properties which are not defined in the schema (because the default value for the JSON Schema additionalProperties keyword is an empty schema, rather than false).
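To make that default concrete, here is a hand-written C check equivalent to the example schema. It is a sketch, not Walbottle code: the Property and ValueKind types are hypothetical stand-ins for a parsed JSON object, and the allow_additional flag models the difference between the default additionalProperties (an empty schema, so anything goes) and "additionalProperties": false.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed JSON value: only the
 * value kinds the example schema cares about. */
typedef enum { VAL_NULL, VAL_STRING, VAL_INTEGER, VAL_ARRAY } ValueKind;

typedef struct {
  const char *name;  /* property name */
  ValueKind kind;    /* JSON type of the property's value */
  long integer;      /* only meaningful when kind == VAL_INTEGER */
} Property;

/* Look up a property by name; returns NULL if it is absent. */
static const Property *
find_property (const Property *props, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (props[i].name, name) == 0)
      return &props[i];
  return NULL;
}

/* Hand-written equivalent of the example schema. allow_additional = true
 * behaves like the default additionalProperties; false behaves like
 * "additionalProperties": false. */
static bool
validate_example (const Property *props, size_t n, bool allow_additional)
{
  const Property *first = find_property (props, n, "firstName");
  const Property *last = find_property (props, n, "lastName");
  const Property *age = find_property (props, n, "age");

  /* "required": ["firstName", "lastName"], both "type": "string". */
  if (first == NULL || first->kind != VAL_STRING)
    return false;
  if (last == NULL || last->kind != VAL_STRING)
    return false;

  /* "age" is optional, but if present must be an integer >= 0. */
  if (age != NULL && (age->kind != VAL_INTEGER || age->integer < 0))
    return false;

  /* Only reject unknown properties when additionalProperties is false. */
  if (!allow_additional)
    for (size_t i = 0; i < n; i++)
      if (strcmp (props[i].name, "firstName") != 0 &&
          strcmp (props[i].name, "lastName") != 0 &&
          strcmp (props[i].name, "age") != 0)
        return false;

  return true;
}
```

With allow_additional set to true, the Sandy Sanderson instance passes; with it set to false, the extra country property makes it fail, while the other two example instances pass either way.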

What does Walbottle do? It takes a JSON Schema as input, and can either:

  • check the schema is a valid JSON Schema (the json-schema-validate tool);
  • check that a JSON instance follows the schema (the json-validate tool); or
  • generate JSON instances from the schema (the json-schema-generate tool).

Why is the last option useful? Imagine you have written a library which interacts with a web API which returns JSON. You use json-glib to turn the HTTP responses into a JSON syntax tree (tree of JsonNodes), but you have your own code to navigate through that tree and extract the interesting bits of the response, such as success codes or new objects from the server. How do you know your code is correct?

Ideally, the web API author has provided a JSON Schema file which describes exactly what you should expect from one of their HTTP responses. You can use json-schema-generate to generate a set of example JSON instances which follow or subtly do not follow the schema. You can then run your code against these instances, and check whether it:

  • does not crash;
  • correctly accepts the valid JSON instances; and
  • correctly rejects the invalid JSON instances.

This should be a lot better than writing such unit tests by hand, because nobody wants to spend time doing that; and even if you do, you are almost guaranteed to miss a corner case, which leaves your code prone to crashing when given unexpected input. (Alarmists would say that it is vulnerable to attack, and that any such vulnerability in network-facing code is probably prone to escalation into arbitrary code execution.)
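The “does not crash” requirement usually comes down to checking each node’s type before using it. The following sketch uses a hypothetical miniature tree type (Node, Member) in place of json-glib’s JsonNode, so it is illustrative only; get_string_member() is likewise an invented helper, not a json-glib function. It reports failure for instances such as {"firstName":null} or {"firstName":[]} instead of treating a non-string value as a string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature JSON tree, standing in for json-glib's JsonNode. */
typedef enum { NODE_NULL, NODE_STRING, NODE_ARRAY, NODE_OBJECT } NodeType;

typedef struct Node Node;
typedef struct { const char *key; const Node *value; } Member;

struct Node {
  NodeType type;
  const char *string;     /* valid when type == NODE_STRING */
  const Member *members;  /* valid when type == NODE_OBJECT */
  size_t n_members;
};

/* Defensive extraction: check every type before dereferencing and report
 * failure instead of crashing. Returns true and sets *out on success;
 * leaves *out untouched on failure. */
static bool
get_string_member (const Node *root, const char *key, const char **out)
{
  if (root == NULL || root->type != NODE_OBJECT)
    return false;

  for (size_t i = 0; i < root->n_members; i++)
    {
      const Member *m = &root->members[i];

      if (strcmp (m->key, key) != 0)
        continue;

      /* A null or array value must be rejected, not used as a string. */
      if (m->value == NULL || m->value->type != NODE_STRING)
        return false;

      *out = m->value->string;
      return true;
    }

  return false;  /* member is missing */
}
```

Each generated instance then exercises one of these branches: a wrong type, a missing member, or a non-object root.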

For the example schema above, json-schema-generate returns (amongst others) the following JSON instances:

{"0":null,"firstName":null}
{"lastName":[null,null],"0":null,"age":0}
{"firstName":[]}
{"lastName":"","0":null,"age":1,"firstName":""}
{"lastName":[],"0":null,"age":-1}


They include valid and invalid instances, which are designed to hit boundary conditions in typical json-glib-using code.
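Note how the age values above (-1, 0 and 1) sit exactly around the schema’s "minimum": 0. As a simplified illustration of that boundary-value idea, and not Walbottle’s actual generation algorithm, the sketch below produces the values just below, at, and just above an integer minimum, labelling each as valid or invalid. The IntVector type and generate_minimum_vectors() function are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One generated test value, with whether it satisfies the schema. */
typedef struct {
  long value;
  bool is_valid;
} IntVector;

/* Boundary-value sketch for {"type": "integer", "minimum": m}.
 * Writes three vectors to out and returns how many were written.
 * Assumes LONG_MIN < minimum < LONG_MAX so that m - 1 and m + 1
 * do not overflow. */
static size_t
generate_minimum_vectors (long minimum, IntVector out[3])
{
  const long candidates[3] = { minimum - 1, minimum, minimum + 1 };

  for (size_t i = 0; i < 3; i++)
    {
      out[i].value = candidates[i];
      /* "minimum" is inclusive unless an exclusive minimum is specified. */
      out[i].is_valid = candidates[i] >= minimum;
    }

  return 3;
}
```

For the example schema (minimum 0) this yields -1 (invalid), 0 (valid) and 1 (valid), matching the age values seen in the generated instances.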

How do you integrate Walbottle into your project? Probably the easiest way is to use it to generate a C source or header file of JSON test vectors, and compile or #include that into a simple test program which runs your code against each vector in turn.

Here is an example, straight from the documentation. Add the following to configure.ac:

AC_PATH_PROG([JSON_SCHEMA_VALIDATE],[json-schema-validate])
AC_PATH_PROG([JSON_SCHEMA_GENERATE],[json-schema-generate])
 
AS_IF([test "$JSON_SCHEMA_VALIDATE" = ""],
      [AC_MSG_ERROR([json-schema-validate not found])])
AS_IF([test "$JSON_SCHEMA_GENERATE" = ""],
      [AC_MSG_ERROR([json-schema-generate not found])])


Add this to the Makefile.am for your tests:

json_schemas = \
    my-format.schema.json \
    my-other-format.schema.json \
    $(NULL)
 
EXTRA_DIST += $(json_schemas)
 
check-json-schema: $(json_schemas)
	$(AM_V_GEN)$(JSON_SCHEMA_VALIDATE) $^
check-local: check-json-schema
.PHONY: check-json-schema

json_schemas_h = $(json_schemas:.schema.json=.schema.h)
BUILT_SOURCES += $(json_schemas_h)
CLEANFILES += $(json_schemas_h)

%.schema.h: %.schema.json
	$(AM_V_GEN)$(JSON_SCHEMA_GENERATE) \
		--c-variable-name=$(subst -,_,$(notdir $*))_json_instances \
		--format c $^ > $@
 
my_test_suite_SOURCES = my-test-suite.c
nodist_my_test_suite_SOURCES = $(json_schemas_h)
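The --c-variable-name expression is what turns my-format.schema.json into the my_format_json_instances array used by the test program. As a sketch of what those GNU Make text functions compute (not part of Walbottle), here is the same transformation in C; schema_variable_name() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors the Makefile expression
 *   $(subst -,_,$(notdir $*))_json_instances
 * for a schema path such as "tests/my-format.schema.json".
 * Writes the result into buf (of size len) and returns buf,
 * or NULL if the result does not fit. */
static char *
schema_variable_name (const char *schema_path, char *buf, size_t len)
{
  static const char suffix[] = ".schema.json";
  const size_t suffix_len = sizeof suffix - 1;

  /* $(notdir ...): drop everything up to and including the last '/'. */
  const char *base = strrchr (schema_path, '/');
  base = (base != NULL) ? base + 1 : schema_path;

  /* $*: the pattern-rule stem, i.e. the name without ".schema.json". */
  size_t stem_len = strlen (base);
  if (stem_len >= suffix_len &&
      strcmp (base + stem_len - suffix_len, suffix) == 0)
    stem_len -= suffix_len;

  int written = snprintf (buf, len, "%.*s_json_instances",
                          (int) stem_len, base);
  if (written < 0 || (size_t) written >= len)
    return NULL;

  /* $(subst -,_,...): '-' is not allowed in a C identifier. */
  for (size_t i = 0; i < stem_len; i++)
    if (buf[i] == '-')
      buf[i] = '_';

  return buf;
}
```

So my-other-format.schema.json would produce my_other_format_json_instances in my-other-format.schema.h.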


And add this to your test suite C file itself:

#include <glib.h>
#include <glib-object.h>

#include "my-format.schema.h"

// try_parsing_string() stands for your own parser entry point, and
// SOME_ERROR_DOMAIN/SOME_ERROR_CODE for the error it reports on invalid
// input; substitute your project's real names.

// Test the parser with each generated test vector from the JSON schema.
static void
test_parser_generated (gconstpointer user_data)
{
  guint i;
  GObject *parsed = NULL;
  GError *error = NULL;

  i = GPOINTER_TO_UINT (user_data);

  parsed = try_parsing_string (my_format_json_instances[i].json,
                               my_format_json_instances[i].size, &error);

  if (my_format_json_instances[i].is_valid)
    {
      // Assert @parsed is valid.
      g_assert_no_error (error);
      g_assert_true (G_IS_OBJECT (parsed));
    }
  else
    {
      // Assert parsing failed.
      g_assert_error (error, SOME_ERROR_DOMAIN, SOME_ERROR_CODE);
      g_assert_null (parsed);
    }

  g_clear_error (&error);
  g_clear_object (&parsed);
}


int
main (int argc, char *argv[])
{
  guint i;

  g_test_init (&argc, &argv, NULL);

  // Register one test case per generated JSON instance.
  for (i = 0; i < G_N_ELEMENTS (my_format_json_instances); i++)
    {
      gchar *test_name = NULL;

      test_name = g_strdup_printf ("/parser/generated/%u", i);
      g_test_add_data_func (test_name, GUINT_TO_POINTER (i),
                            test_parser_generated);
      g_free (test_name);
    }

  return g_test_run ();
}


Walbottle is approaching maturity. It does not yet support a few features of the JSON Schema standard: $ref/definitions and format. Its main downside at the moment is speed: test vector generation is complex, and the algorithms slow down considerably for schemas with many nested sub-schemas (so try to design your schemas to avoid deep nesting where possible). json-schema-generate recently acquired a --show-timings option, which prints debug information about each sub-schema in your schema: how many JSON instances it generates and how long that took. This gives some insight into how to optimise the schema.

Collabora Ltd © 2005-2018. All rights reserved.