Notes and cheats about jq, in a kind-of cookbook form.

jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

The notes below reflect my personal level of understanding; many things can surely be done in better ways that I don’t know about yet, because the internet has not told me so far. You’re encouraged to share more with flavio [that-email-char] polettix.it.

SYNOPSIS

Pretty-print

$ jq <input.json   # or, better to learn
$ jq <input.json .

Overview

jq is powerful, but anybody reading this page already knows that. I’ll only add that I often use it together with Romeo, which has a json2csv sub-command to generate CSV, as well as a csv2json sub-command to feed data into jq starting from a CSV file.

Command-line Arguments

Most of the time I use it in a pipe on the command line, just to get a pretty-print:

$ printf '{"top":"foo","in":[1,2,3]}' | jq
{
  "top": "foo",
  "in": [
    1,
    2,
    3
  ]
}

There will be color in your terminal but I’m too lazy to fiddle with it here in this page.

To do anything interesting, like getting only the value of top, we have to add a filter:

$ printf '{"top":"foo","in":[1,2,3]}' | jq .top
"foo"

If we have some files to work on, they can be provided on the command line after the filter. It will happily work on each of them:

$ printf '{"top":"foo","in":[1,2,3]}' > input.json

$ jq .top input.json input.json
"foo"
"foo"

What’s with those double quotes? jq tries hard to return valid JSON, so they are JSON strings. If we only care about the value, we can ask for raw data with command-line option -r:

$ printf '{"top":"foo","in":[1,2,3]}' | jq -r .top
foo
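
Raw output is what makes jq pleasant to use from shell scripts. The following is a minimal sketch of that idea, not the only way to do it; the variable names top and item are just illustrative choices of mine:

```shell
# Capture the raw value of "top" in a shell variable (no JSON quotes).
top=$(printf '{"top":"foo","in":[1,2,3]}' | jq -r .top)
echo "value is: $top"

# Raw output prints one array item per line, handy for shell loops.
printf '{"top":"foo","in":[1,2,3]}' | jq -r '.in[]' | while read -r item; do
  echo "item: $item"
done
```

Without -r the variable would hold the five characters "foo", quotes included.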

Passing many input files does not yield a single valid JSON document, because each file is processed separately and the results are printed one after the other:

$ printf '{"top":"foo","bar":"baz"}' > input.json

$ jq . input.json input.json
{
  "top": "foo",
  "bar": "baz"
}
{
  "top": "foo",
  "bar": "baz"
}

It’s possible to put all inputs in a wrapping array with command-line option -s:

$ printf '{"top":"foo","bar":"baz"}' > input.json

$ jq -s . input.json input.json
[
  {
    "top": "foo",
    "bar": "baz"
  },
  {
    "top": "foo",
    "bar": "baz"
  }
]
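
Slurping becomes more interesting when a filter has to see all the inputs at once. As a small sketch, assuming two documents arriving on standard input instead of files (the effect of -s is the same), we can collect one field from each of them:

```shell
# Two separate JSON documents on standard input are slurped into one
# array, so a single filter can look at all of them together.
tops=$(printf '{"top":"foo"}\n{"top":"bar"}\n' | jq -s -c 'map(.top)')
echo "$tops"   # ["foo","bar"]
```

The -c option only asks for compact, single-line output.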

Slicing

Adding objects

When we add two objects with +, we get back an object with all key/value pairs from both addends:

$ printf '{"foo":1}' | jq '. + {bar:2}'
{
  "foo": 1,
  "bar": 2
}

The last object wins if keys overlap:

$ printf '{"foo":1}' | jq '. + {bar:2, foo:3}'
{
  "foo": 3,
  "bar": 2
}
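
Since the last addend wins, swapping the two sides of the + changes which value survives. A quick sketch to check this, looking only at the resulting foo (the variable names are mine):

```shell
# Input object on the right: its value for "foo" wins.
input_wins=$(printf '{"foo":1}' | jq '({bar:2, foo:3} + .).foo')

# Literal object on the right: its value for "foo" wins.
literal_wins=$(printf '{"foo":1}' | jq '(. + {bar:2, foo:3}).foo')

echo "input last: foo=$input_wins, literal last: foo=$literal_wins"
# input last: foo=1, literal last: foo=3
```

This is handy for providing defaults: put the defaults on the left and the real data on the right.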

This can be applied also to implicit iterations over arrays:

$ printf '[{"foo":1},{"foo":2}]' | jq '.[] + {bar:7}'
{
  "foo": 1,
  "bar": 7
}
{
  "foo": 2,
  "bar": 7
}

The iteration “unpacks” the array, so we need to wrap this with square brackets to get the array back:

$ printf '[{"foo":1},{"foo":2}]' | jq '[.[] + {bar:7}]'
[
  {
    "foo": 1,
    "bar": 7
  },
  {
    "foo": 2,
    "bar": 7
  }
]
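
As far as I can tell, the builtin map does the same unpack-and-rewrap in one go, so the two spellings below should be interchangeable. This is a sketch comparing them on the same input:

```shell
input='[{"foo":1},{"foo":2}]'

# Explicit iteration, wrapped back into an array with square brackets.
with_brackets=$(printf '%s' "$input" | jq -c '[.[] + {bar:7}]')

# Same thing with map, which applies the filter to each array item.
with_map=$(printf '%s' "$input" | jq -c 'map(. + {bar:7})')

echo "$with_map"   # [{"foo":1,"bar":7},{"foo":2,"bar":7}]
```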

This is not limited to extraction of data, as we can use the += operator to enrich a sub-array of objects:

$ jq <input.json .
{
  "foo": [
    {
      "baz": 1
    },
    {
      "baz": 13
    }
  ]
}

$ jq <input.json '.foo[] += {bar:2}'
{
  "foo": [
    {
      "baz": 1,
      "bar": 2
    },
    {
      "baz": 13,
      "bar": 2
    }
  ]
}

This is not limited to adding fixed stuff, e.g. we can get the key/value to add from another part of the upper object:

$ jq <input.json .
{
  "bar": 2,
  "foo": [
    {
      "baz": 1
    },
    {
      "baz": 13
    }
  ]
}

$ jq <input.json '.foo[] += {bar}'
{
  "bar": 2,
  "foo": [
    {
      "baz": 1,
      "bar": 2
    },
    {
      "baz": 13,
      "bar": 2
    }
  ]
}
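
The reason this works is that the right-hand side of += is evaluated against the whole input object, not against each item of foo. My understanding (for jq 1.5 and later) is that the short form behaves like the longer spelling below, where the value is first saved in a variable; $bar is a name I picked for the sketch:

```shell
input='{"bar":2,"foo":[{"baz":1},{"baz":13}]}'

# Short form: {bar} is shorthand for {bar: .bar}, read from the top level.
short=$(printf '%s' "$input" | jq -c '.foo[] += {bar}')

# Longer form: save .bar first, then update each item of foo.
long=$(printf '%s' "$input" | jq -c '.bar as $bar | .foo[] |= . + {bar: $bar}')

echo "$short"   # {"bar":2,"foo":[{"baz":1,"bar":2},{"baz":13,"bar":2}]}
```

With |= instead of +=, a plain .bar on the right would be looked up inside each item of foo, which is why the variable is needed there.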

This is by no means limited to adding a single key/value pair:

$ jq <input.json .
{
  "bar": {
    "galook": 9,
    "aargh": 0
  },
  "foo": [
    {
      "baz": 1
    },
    {
      "baz": 13
    }
  ]
}

$ jq <input.json '.foo[] += .bar'
{
  "bar": {
    "galook": 9,
    "aargh": 0
  },
  "foo": [
    {
      "baz": 1,
      "galook": 9,
      "aargh": 0
    },
    {
      "baz": 13,
      "galook": 9,
      "aargh": 0
    }
  ]
}

This can be applied looping over a higher-level array of objects:

$ jq <input.json .
[
  {
    "bar": {
      "galook": 9,
      "aargh": 0
    },
    "foo": [
      {
        "baz": 1
      },
      {
        "baz": 13
      }
    ]
  },
  {
    "bar": {
      "galook": 99,
      "aargh": 90
    },
    "foo": [
      {
        "baz": 91
      },
      {
        "baz": 913
      }
    ]
  }
]

$ jq <input.json '.[] |= (.foo[] += .bar)'
[
  {
    "bar": {
      "galook": 9,
      "aargh": 0
    },
    "foo": [
      {
        "baz": 1,
        "galook": 9,
        "aargh": 0
      },
      {
        "baz": 13,
        "galook": 9,
        "aargh": 0
      }
    ]
  },
  {
    "bar": {
      "galook": 99,
      "aargh": 90
    },
    "foo": [
      {
        "baz": 91,
        "galook": 99,
        "aargh": 90
      },
      {
        "baz": 913,
        "galook": 99,
        "aargh": 90
      }
    ]
  }
]

How can we use this? As an example, we might have a JSON representing groups in a LDAP directory, like the following, and we want to generate a CSV with columns cn and member and the “right” values:

$ jq <from-ldap.json .
[
  {
    "cn": "odd",
    "member": [
      "CN=three,OU=numbers,DC=whatever",
      "CN=seven,OU=numbers,DC=whatever",
      "CN=eleven,OU=numbers,DC=whatever",
      "CN=nineteen,OU=numbers,DC=whatever"
    ]
  },
  {
    "cn": "even",
    "member": [
      "CN=two,OU=numbers,DC=whatever",
      "CN=four,OU=numbers,DC=whatever",
      "CN=twelve,OU=numbers,DC=whatever"
    ]
  }
]

First of all, we turn each string in the member array to an object, so that we are prepared to put more data:

$ jq <from-ldap.json '.
       | (.[] |= (.member[] |= {member:.}))'
[
  {
    "cn": "odd",
    "member": [
      {
        "member": "CN=three,OU=numbers,DC=whatever"
      },
      {
        "member": "CN=seven,OU=numbers,DC=whatever"
      },
      {
        "member": "CN=eleven,OU=numbers,DC=whatever"
      },
      {
        "member": "CN=nineteen,OU=numbers,DC=whatever"
      }
    ]
  },
  {
    "cn": "even",
    "member": [
      {
        "member": "CN=two,OU=numbers,DC=whatever"
      },
      {
        "member": "CN=four,OU=numbers,DC=whatever"
      },
      {
        "member": "CN=twelve,OU=numbers,DC=whatever"
      }
    ]
  }
]

Now we can apply what we learned in this section and add the cn:

$ jq <from-ldap.json '.
       | (.[] |= (.member[] |= {member:.}))
       | (.[] |= (.member[] += {cn}))'
[
  {
    "cn": "odd",
    "member": [
      {
        "member": "CN=three,OU=numbers,DC=whatever",
        "cn": "odd"
      },
      {
        "member": "CN=seven,OU=numbers,DC=whatever",
        "cn": "odd"
      },
      {
        "member": "CN=eleven,OU=numbers,DC=whatever",
        "cn": "odd"
      },
      {
        "member": "CN=nineteen,OU=numbers,DC=whatever",
        "cn": "odd"
      }
    ]
  },
  {
    "cn": "even",
    "member": [
      {
        "member": "CN=two,OU=numbers,DC=whatever",
        "cn": "even"
      },
      {
        "member": "CN=four,OU=numbers,DC=whatever",
        "cn": "even"
      },
      {
        "member": "CN=twelve,OU=numbers,DC=whatever",
        "cn": "even"
      }
    ]
  }
]

Time to get only the member sub-arrays:

$ jq <from-ldap.json '.
       | (.[] |= (.member[] |= {member:.}))
       | (.[] |= (.member[] += {cn}))
       | (.[] |= .member)'
[
  [
    {
      "member": "CN=three,OU=numbers,DC=whatever",
      "cn": "odd"
    },
    {
      "member": "CN=seven,OU=numbers,DC=whatever",
      "cn": "odd"
    },
    {
      "member": "CN=eleven,OU=numbers,DC=whatever",
      "cn": "odd"
    },
    {
      "member": "CN=nineteen,OU=numbers,DC=whatever",
      "cn": "odd"
    }
  ],
  [
    {
      "member": "CN=two,OU=numbers,DC=whatever",
      "cn": "even"
    },
    {
      "member": "CN=four,OU=numbers,DC=whatever",
      "cn": "even"
    },
    {
      "member": "CN=twelve,OU=numbers,DC=whatever",
      "cn": "even"
    }
  ]
]

Now we can apply add to this array of arrays and get a single, flattened array:

$ jq <from-ldap.json '.
       | (.[] |= (.member[] |= {member:.}))
       | (.[] |= (.member[] += {cn}))
       | (.[] |= .member)
       | add'
[
  {
    "member": "CN=three,OU=numbers,DC=whatever",
    "cn": "odd"
  },
  {
    "member": "CN=seven,OU=numbers,DC=whatever",
    "cn": "odd"
  },
  {
    "member": "CN=eleven,OU=numbers,DC=whatever",
    "cn": "odd"
  },
  {
    "member": "CN=nineteen,OU=numbers,DC=whatever",
    "cn": "odd"
  },
  {
    "member": "CN=two,OU=numbers,DC=whatever",
    "cn": "even"
  },
  {
    "member": "CN=four,OU=numbers,DC=whatever",
    "cn": "even"
  },
  {
    "member": "CN=twelve,OU=numbers,DC=whatever",
    "cn": "even"
  }
]
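
The add filter applies + across all items of an array, so what it does depends on the type of the items. A tiny sketch with made-up inputs (-n means "no input", the data is written in the filter itself):

```shell
# Arrays are concatenated, which removes one level of nesting.
arrays=$(jq -n -c '[[1,2],[3]] | add')

# Objects are merged, with the same rules as the + operator.
objects=$(jq -n -c '[{"a":1},{"b":2}] | add')

# Numbers are summed.
numbers=$(jq -n '[1,2,3] | add')

echo "$arrays $objects $numbers"   # [1,2,3] {"a":1,"b":2} 6
```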

This can be fed into romeo to finally get our CSV:

$ jq <from-ldap.json '.
       | (.[] |= (.member[] |= {member:.}))
       | (.[] |= (.member[] += {cn}))
       | (.[] |= .member)
       | add' \
    | romeo json2csv
cn;member
odd;CN=three,OU=numbers,DC=whatever
odd;CN=seven,OU=numbers,DC=whatever
odd;CN=eleven,OU=numbers,DC=whatever
odd;CN=nineteen,OU=numbers,DC=whatever
even;CN=two,OU=numbers,DC=whatever
even;CN=four,OU=numbers,DC=whatever
even;CN=twelve,OU=numbers,DC=whatever

This was a demonstration for a very generic situation; in this case, we don’t need to keep the whole structure along the way, and can benefit from some implicit flattening happening automatically as we unfold arrays:

$ jq <from-ldap.json '.[] | (.member[] | {member:.}) + {cn}'
{
  "member": "CN=three,OU=numbers,DC=whatever",
  "cn": "odd"
}
{
  "member": "CN=seven,OU=numbers,DC=whatever",
  "cn": "odd"
}
{
  "member": "CN=eleven,OU=numbers,DC=whatever",
  "cn": "odd"
}
{
  "member": "CN=nineteen,OU=numbers,DC=whatever",
  "cn": "odd"
}
{
  "member": "CN=two,OU=numbers,DC=whatever",
  "cn": "even"
}
{
  "member": "CN=four,OU=numbers,DC=whatever",
  "cn": "even"
}
{
  "member": "CN=twelve,OU=numbers,DC=whatever",
  "cn": "even"
}

At this point, we just need to wrap the whole thing inside an array, so that the end result is valid JSON that can be fed into romeo:

$ jq <from-ldap.json '[.[] | (.member[] | {member:.}) + {cn}]' \
    | romeo json2csv
cn;member
odd;CN=three,OU=numbers,DC=whatever
odd;CN=seven,OU=numbers,DC=whatever
odd;CN=eleven,OU=numbers,DC=whatever
odd;CN=nineteen,OU=numbers,DC=whatever
even;CN=two,OU=numbers,DC=whatever
even;CN=four,OU=numbers,DC=whatever
even;CN=twelve,OU=numbers,DC=whatever
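
If romeo is not at hand, jq can produce CSV by itself with the @csv format and raw output. This is only a sketch of an alternative, on a shortened copy of the data above: it is not what romeo does, and the output differs (comma as separator, every string wrapped in double quotes).

```shell
# Shortened sample of the LDAP data, kept inline for the example.
ldap='[{"cn":"odd","member":["CN=three,OU=numbers,DC=whatever","CN=seven,OU=numbers,DC=whatever"]},{"cn":"even","member":["CN=two,OU=numbers,DC=whatever"]}]'

# One header row, then one [cn, member] row per member; @csv turns each
# array into a CSV line and -r prints it without JSON string quoting.
csv=$(printf '%s' "$ldap" | jq -r '
    ["cn","member"],
    (.[] | .cn as $cn | .member[] | [$cn, .])
    | @csv')

echo "$csv"
# "cn","member"
# "odd","CN=three,OU=numbers,DC=whatever"
# "odd","CN=seven,OU=numbers,DC=whatever"
# "even","CN=two,OU=numbers,DC=whatever"
```

The quoting matters here, because the member values contain commas themselves.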

Normalizing

Sometimes a key in an object might be missing, or have a single string as its value, or have an array as its value.

# no "bar" key/value pair
$ printf '{"top":"foo"}'                        > input-none.json

# value for "bar" is a string (/scalar)
$ printf '{"top":"foo","bar":"baz"}'            > input-string.json

# value for "bar" is an array
$ printf '{"top":"foo","bar":["baz","galook"]}' > input-array.json

When we want to operate on bar as an array, we’re likely to hit a wall in the other two cases:

# All OK
$ jq '.bar[]' input-array.json 
"baz"
"galook"

# Errors
$ jq '.bar[]' input-none.json 
jq: error (at input-none.json:0): Cannot iterate over null (null)

$ jq '.bar[]' input-string.json 
jq: error (at input-string.json:0): Cannot iterate over string ("baz")

jq has an if/then/elif/else/end construct, together with a type function, which help normalize the data.

$ filter='.bar |
    if type == "array" then .
    elif type != "null" then [.]
    else []
    end'

$ jq "$filter" input-array.json
[
  "baz",
  "galook"
]

$ jq "$filter" input-string.json
[
  "baz"
]

$ jq "$filter" input-none.json
[]
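
When the same normalization is needed in several places, it can be wrapped in a function with def. The sketch below uses a function name of my own choosing, as_array, and counts the items of bar for the three kinds of input:

```shell
# Hypothetical helper: turn null into [], an array into itself,
# anything else into a one-item array.
filter='def as_array:
    if type == "array" then .
    elif type == "null" then []
    else [.]
    end;
  .bar | as_array | length'

counts=$(
  for doc in '{"top":"foo"}' \
             '{"top":"foo","bar":"baz"}' \
             '{"top":"foo","bar":["baz","galook"]}'
  do
    printf '%s' "$doc" | jq "$filter"
  done
)

echo "$counts"
# 0
# 1
# 2
```

After the normalization, iterating with as_array | .[] is safe in all three cases.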

Caveat: in older versions of jq the else branch is always needed. Starting with jq 1.7 it can be omitted, in which case the input is passed through unchanged (as if it were else .).

Useful links

The following pages can help a lot: