23 June 2021

Here at Gozynta we use Kubernetes to manage our product platform. Moving to container-based development and hosting has given us better scalability, more consistency, and a better testing process. One result is that I often find myself writing Python code to manipulate deeply nested YAML files.

Deeply nested YAML or JSON files are messy to deal with in Python, and the available JSONPath libraries are even worse to work with. JMESPath is a good alternative. This is the article I wish I had found when I was searching for “python jsonpath”.

See this GitHub repo for a full example.

For my examples I’m going to use this Deployment YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        env:
        - name: NGINX_HOST
          value: example.com
        - name: NGINX_PORT
          value: 80
```

Let’s say we want to change the port from 80 to 8080. Parsing this in Python is pretty easy until we hit the lists. Once we load the YAML, we can get to the containers section with:

```python
>>> deployment["spec"]["template"]["spec"]["containers"]
[{'image': 'nginx', 'name': 'nginx', 'env': [{'name': 'NGINX_HOST', 'value': 'example.com'}, {'name': 'NGINX_PORT', 'value': 80}]}]
```
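The loading step itself isn't shown above. Here is a minimal sketch of it, assuming the PyYAML package is installed and with the Deployment inlined as a string so the snippet is self-contained (for a real file you would pass an open file object to `yaml.safe_load()` instead):

```python
import yaml  # PyYAML, assumed to be installed

# The example Deployment from above, inlined for a self-contained sketch
DEPLOYMENT_YAML = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        env:
        - name: NGINX_HOST
          value: example.com
        - name: NGINX_PORT
          value: 80
"""

# safe_load builds plain dicts, lists, strings and ints (no arbitrary objects)
deployment = yaml.safe_load(DEPLOYMENT_YAML)
containers = deployment["spec"]["template"]["spec"]["containers"]
print(containers)
```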

Here’s where we start to run into problems. We’re now left with two nested lists of dicts. For the first one we could just index into the list, since it has only one member, but that would break if we ever added a sidecar container to this Deployment. The second one we definitely need to search. Let’s search them both to be safe.
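To make the sidecar concern concrete, here is a small sketch with a hypothetical `log-shipper` sidecar (name and image invented for illustration) listed ahead of nginx. Indexing by position silently returns the wrong container, while searching by name still finds nginx:

```python
# Trimmed-down pod spec with a hypothetical sidecar listed first
containers = [
    {"name": "log-shipper", "image": "example/log-shipper"},  # hypothetical sidecar
    {"name": "nginx", "image": "nginx",
     "env": [{"name": "NGINX_HOST", "value": "example.com"},
             {"name": "NGINX_PORT", "value": 80}]},
]

# Positional indexing assumes nginx is always first, which no longer holds
by_index = containers[0]
print(by_index["name"])  # log-shipper

# Searching by name keeps working regardless of list order
by_name = next(c for c in containers if c["name"] == "nginx")
print(by_name["name"])  # nginx
```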

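```python
# `deployment` is the dict produced by loading the YAML above
containers = deployment["spec"]["template"]["spec"]["containers"]

# Find the container named "nginx" instead of assuming it is containers[0]
nginx_container = next(c for c in containers if c["name"] == "nginx")
# Then find the NGINX_PORT entry in that container's env list
nginx_port_env = next(e for e in nginx_container["env"] if e["name"] == "NGINX_PORT")
nginx_port_env["value"] = 8080
```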

Well, that’s pretty ugly and difficult to read.
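There is also a subtler problem: `next()` without a default raises a bare `StopIteration` when nothing matches, which is an unhelpful error if the NGINX_PORT entry is ever renamed or removed. Passing `None` as the default lets you check explicitly. A sketch, where `find_by_name` is a hypothetical helper rather than part of any library:

```python
def find_by_name(items, name):
    """Return the first dict in `items` whose "name" equals `name`, or None."""
    # next()'s second argument is returned instead of raising StopIteration
    return next((item for item in items if item.get("name") == name), None)


env = [
    {"name": "NGINX_HOST", "value": "example.com"},
    {"name": "NGINX_PORT", "value": 80},
]

port_env = find_by_name(env, "NGINX_PORT")
if port_env is not None:
    port_env["value"] = 8080

missing = find_by_name(env, "NGINX_WORKERS")  # hypothetical name, not present
print(missing)  # None
```

It reads a little better, but you still need one call per level of nesting.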

kubectl has a nice `-o jsonpath` option that you can use to slice out part of a resource definition you retrieve from the cluster. The JSONPath syntax for the above is much easier to read:

```text
{.spec.template.spec.containers[?(@.name=="nginx")].env[?(@.name=="NGINX_PORT")]}
```

Perhaps we can use jsonpath with Python?

I did some research, and the best and most frequently recommended JSONPath libraries seem to be jsonpath-rw and its fork jsonpath-ng. (Note: those links are to Snyk Advisor, which I also found on this journey. It's a nice tool for an at-a-glance look at the health of a package.) Both projects are mostly inactive, and neither supports the `[?` filter syntax that we need. A good summary of the state of these projects is this GitHub comment, which, despite being written in 2019, is still accurate in 2021. In the end, I couldn’t find a JSONPath library for Python that I was satisfied with.

Instead, I found JMESPath. JMESPath is similar to JSONPath and has well-supported libraries for quite a few programming languages. This is great: even though we mostly work in Python, we do sometimes use other languages, and if we standardize on this query language we can use the same queries everywhere!

Let’s try the same exercise using JMESPath.

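```python
import jmespath

# Each |[0] takes the first match of the filter before it, yielding a dict
nginx_port_env = jmespath.search(
    "spec.template.spec.containers[?name=='nginx']|[0].env[?name=='NGINX_PORT']|[0]",
    deployment,
)
nginx_port_env["value"] = 8080
```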

Well, that's a lot easier to understand and work with. JMESPath is a close-enough alternative to JSONPath for many use cases, and in Python it definitely makes working with YAML (or JSON) a lot easier.