YAML Schema for Organizing Data


I have an idea about using YAML Schema to organize data.

This idea came to me the other day when I was asked whether I had received a certain vaccine in the past. All of a sudden, I wished I had organized my personal medical data in a structured way so that I could easily look it up. For easier discussion, let’s use personal medical data as the example here.

First of all, I choose plain text rather than a database or a binary format because I want the data to be readable and easy to maintain. I also want it to be organized, so I choose the YAML format, which is both human- and machine-readable, yet doesn’t need as many skeleton characters (braces, quotes, commas) as JSON does. On top of that, I add a YAML Schema so that the data YAML file can be validated. The schema is also useful for auto-completion in editors that support it. YAML Schema is based on JSON Schema, which is still a work in progress. For the moment, a YAML Schema is itself written in JSON.

Here is a snippet of the YAML Schema that I use to organize my immunization history:

"immunization": {
  "description": "Immunization history.",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "date": {
        "description": "Date of the immunization.",
        "type": "string"
      },
      "name": {
        "description": "Immunization name.",
        "type": "string"
      }
    },
    "required": ["date", "name"]
  },
  "uniqueItems": true
}

Basically, I’m defining an array named immunization. Each element of the array is an object with two properties, date and name, both of which are required strings. JSON Schema has no dedicated date or date-time type, so I have to use strings for dates. I’m also declaring that the elements of the array must be unique.

With the above schema, I can add and validate the following section in my data YAML file:

immunization:
  - date: 2019/12/13
    name: some vaccine 1
  - date: 2019/12/14
    name: some vaccine 2
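
The dates above are plain strings, so the schema as written accepts any text in that field. One possible tightening, sketched here as an assumption rather than something from my actual schema, is the standard JSON Schema pattern keyword, which constrains a string with a regular expression. The fragment below would replace the date property and accept only the YYYY/MM/DD shape used in the data. It checks the shape only, not whether the date really exists.

```json
"date": {
  "description": "Date of the immunization, written as YYYY/MM/DD.",
  "type": "string",
  "pattern": "^[0-9]{4}/[0-9]{2}/[0-9]{2}$"
}
```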

Similarly, I can add definitions for surgery history and other medical information. For the complete syntax of YAML Schema and JSON Schema, see here.
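
To show how the pieces fit together, here is a minimal sketch of what a complete schema file might look like. The immunization entry is the snippet from earlier. The surgery entry, its procedure and notes properties, the title, and the choice of draft-07 are all hypothetical, added only to illustrate a second section sitting next to the first.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Personal medical data (hypothetical title)",
  "type": "object",
  "properties": {
    "immunization": {
      "description": "Immunization history.",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "date": {
            "description": "Date of the immunization.",
            "type": "string"
          },
          "name": {
            "description": "Immunization name.",
            "type": "string"
          }
        },
        "required": ["date", "name"]
      },
      "uniqueItems": true
    },
    "surgery": {
      "description": "Surgery history (hypothetical section).",
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "date": {
            "description": "Date of the surgery.",
            "type": "string"
          },
          "procedure": {
            "description": "Name of the procedure.",
            "type": "string"
          },
          "notes": {
            "description": "Optional free-form notes.",
            "type": "string"
          }
        },
        "required": ["date", "procedure"]
      },
      "uniqueItems": true
    }
  }
}
```

Because notes is not listed under required, a surgery entry may omit it, while date and procedure must always be present.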

I’m using VS Code with the YAML plugin. It takes a bit of time to configure, but after that I get auto-completion when I’m editing the data YAML file.
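
As a rough idea of the configuration involved, and assuming the plugin is the Red Hat YAML extension, the schema can be associated with the data file through the yaml.schemas setting in settings.json. Each key is a schema path or URL and each value is a file pattern. Both file names below are hypothetical placeholders.

```json
{
  "yaml.schemas": {
    "./medical.schema.json": "medical.yaml"
  }
}
```

With a mapping like this in place, the editor validates the matching file against the schema and offers the schema's property names as completions.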

Finally, it’s worth pointing out that although the example here is personal medical data, the idea of using YAML Schema applies to other structured data as well.