Decouple Code and Config using yaml

Decoupling code and config is really useful to me. One of the most important reasons is that variables kept in a config file can be organized better than variables placed inside code, and that organization lets me write clearer code. I also no longer need to modify the code files when some variables have to change.

Previously I wrote a blog post about 10 different types of configs that are actively used. For the last couple of months I relied on toml to organize my configurations. It is a good format that is easy to use, but that simplicity turned out to be the reason I need to abandon it. I need a more customized, more intelligent way to manage config data: structuring the variables is not enough for me, I also need advanced syntax to handle complex situations and a higher level of abstraction.

So I had to look for alternatives. The first option I rejected is json. It gives me the impression of being as tedious as Java, with lots of curly brackets; I would perhaps prefer xml to json. After searching online I found that yaml can provide this kind of flexibility. This was surprising, because I have written lots of yaml configs for Docker containers and services and never found anything extraordinary about the format until I read more about it.

There is a lot of information about yaml, from the official website to blogs and articles. I am trying to differentiate this post from the others by covering advanced usage of yaml from my own perspective, with an emphasis on how useful each feature is.

In this blog post, I first show how to load a yaml config into Python so that you know how to get it working. Organization is crucial for decoupling code and config, so next I cover how to keep the config neatly organized. Then I discuss handling text, which is one of the most important aspects of config variables.

Loading Config

To parse yaml I use pyyaml [1]. You can install it with pip so that Python can find the module. The documentation [1] provides a good summary of how to use it in Python, and I highly recommend reading it once you have mastered the basics. My goal for this post is to highlight the information that is most relevant when you are setting up your project.

The following shows how a Python script loads a yaml config that has the same filename and sits in the same directory. If your yaml file has a different name, modify the config_file line to point to it. Note that I use yaml.load with yaml.FullLoader so that the advanced syntax is applied. There is another method called safe_load, but it only resolves the basic tags; it is the safer choice when the yaml file comes from a source you do not trust. I use pathlib to locate a config file that has the same name as the Python script, but you don't have to follow the same procedure.

import pathlib as pl
import yaml

# Look for a yaml file that sits next to this script and shares its name.
config_file = pl.Path(__file__).with_suffix('.yaml')
config = yaml.load(config_file.read_text(), Loader=yaml.FullLoader)
print(config['section']['subsection'])
Python
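To make the difference between the two loading calls concrete, here is a minimal sketch, assuming a PyYAML release that ships FullLoader (5.1 or later). The inline document and the key name pair are made up for illustration and are not part of any real config.

```python
import yaml

# Hypothetical one-line config that uses a Python-specific tag.
doc = "pair: !!python/tuple [1, 2]"

# FullLoader resolves the Python-specific tag into a real tuple.
full = yaml.load(doc, Loader=yaml.FullLoader)
print(full)

# safe_load only knows the standard tags, so it refuses this document.
try:
    yaml.safe_load(doc)
    safe_error = None
except yaml.YAMLError as exc:
    safe_error = type(exc).__name__
print(safe_error)
```

If your config only needs plain dicts, lists, strings and numbers, safe_load is enough and the rest of the loading code stays the same.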

I use the pyyaml module because it is widely used in the Python community. The drawback is that it only supports yaml 1.1; the newest version, yaml 1.2, is not supported by this module. I'm thinking about revising this post to follow the latest version of yaml. For now, all the code I tested is based on the pyyaml module; where that is not the case I will mention it explicitly.

Organization

The main way to organize the config is a hierarchical format, meaning a key value pair can be nested under another key. The following shows how such a structure is built. Similar to Python, yaml relies on indentation. The indentation must use spaces (tabs are not allowed), and 2 spaces per level is the common convention; modern editors are smart enough to insert the spaces when you press tab in a yaml file.

top:
  level1: #dict datastructure
    key: value
YAML

One key rule of yaml is that in a key value pair a space must be placed between : and the value, like this: key:␣value. As far as I can tell, the yaml parser works out the hierarchical relationship from indentation alone; empty lines do not interrupt or reset the parsing.
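A quick way to check both rules from Python is sketched below, assuming pyyaml is installed. The variable names are only for illustration.

```python
import yaml

# With the space after the colon this is a mapping.
with_space = yaml.safe_load("key: value")

# Without the space the whole line is read as one plain string.
without_space = yaml.safe_load("key:value")

# Blank lines between entries do not change the hierarchy.
spaced_out = yaml.safe_load("top:\n\n  level1:\n\n    key: value\n")

print(with_space)
print(without_space)
print(spaced_out)
```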

Once you know this fundamental structure, which corresponds to a dict in Python, you can add more data structures such as lists to organize config values.

top:
  level1:
    simple_list: [value1, value2, value3]
    complex_list: # complex list with multiple dicts
      - complex_key1: complex_value1
        complex_key2: complex_value2
      - complex_key3: complex_value3
        complex_key4: complex_value4
  mat: # here are some matrix representations
    matrix1: [
      [1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]
    ]
    matrix2:
      - - 1
        - 2
        - 3
      - - 4
        - 5
        - 6
      - - 7
        - 8
        - 9
YAML
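The sketch below shows what these structures turn into on the Python side, assuming pyyaml is installed. It uses a shortened copy of the config above, with the matrices cut to two rows to keep it brief.

```python
import yaml

doc = """
top:
  level1:
    simple_list: [value1, value2, value3]
    complex_list:
      - complex_key1: complex_value1
        complex_key2: complex_value2
  mat:
    matrix1: [[1, 2, 3], [4, 5, 6]]
    matrix2:
      - - 1
        - 2
        - 3
      - - 4
        - 5
        - 6
"""

config = yaml.safe_load(doc)
level1 = config["top"]["level1"]
mat = config["top"]["mat"]

print(level1["simple_list"])                       # a plain Python list
print(level1["complex_list"][0]["complex_key2"])   # list of dicts
print(mat["matrix1"] == mat["matrix2"])            # both styles give nested lists
```

The bracket style and the dash style are two notations for the same data, so you can pick whichever reads better for a given entry.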

Comments

#full line comment 0
entry: 
  key0: value0 #in-line comment: key 0 info 
  key1: value1 #in-line comment: key 1 info
  #key2: value2
YAML

These are some examples of commenting styles; basically a hash sign (#) is used. But remember not to use # for comments inside a multi-line text block (| or >), where it is kept as part of the text.
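Here is a small sketch of how the parser treats the hash sign in different positions, assuming pyyaml is installed. The keys inline, glued and block are hypothetical names chosen for this illustration.

```python
import yaml

doc = """
inline: value # this part is a comment
glued: value#still-text
block: |
  first line # kept as text
  second line
"""

parsed = yaml.safe_load(doc)
print(parsed["inline"])
print(parsed["glued"])
print(repr(parsed["block"]))
```

A hash sign only starts a comment when it is preceded by whitespace, and never inside a | block.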

Common Data Types

Manipulating Texts

The following snippet shows a yaml file for illustration.

entry:
  key0: &var0 value
  key1: *var0
  text_entry: this is the text entry with variable "{{entry.key0}}"
YAML

A variable (an anchor such as &var0) can be reused multiple times through its alias (*var0), but note that the alias can only stand in for a complete value; there is no room for further customization. This means the value of key1 is "value".
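Loading the snippet in Python confirms this behaviour. This is a minimal sketch, assuming pyyaml is installed and the yaml above is passed in as a string.

```python
import yaml

doc = """
entry:
  key0: &var0 value
  key1: *var0
  text_entry: this is the text entry with variable "{{entry.key0}}"
"""

entry = yaml.safe_load(doc)["entry"]
print(entry["key1"])        # the alias is replaced by the anchored value
print(entry["text_entry"])  # the braces are left untouched by the yaml parser
```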

An additional parsing tool is needed to handle more complex text manipulation. For entry.text_entry, the default parser cannot substitute the variable inside the text using *var0. jinja2 can be used to handle this type of entry if you want to offload part of the work from the Python script to the yaml config. This type of text manipulation does not make the yaml configuration too difficult to read.

import jinja2
text_entry_val=jinja2.Template(config['entry']['text_entry']).render(config)
#text_entry_val: this is the text entry with variable "value"
Python

After being processed by jinja2, the variable names are replaced with the actual values from the yaml config. Thus the value of entry.text_entry is: this is the text entry with variable "value". This shows how text can be manipulated using entry values from within the config file.

After the advanced text manipulation with variables, the remaining part of this section discusses text block entries. They matter because text blocks are perhaps one of the most frequently used variable types in configuration files.

In the following examples I provide key value pair snippets. The key name describes the specific trick being used, and an inline or full line comment shows what the resulting text looks like after being parsed in Python or another programming language.

double_quoted: "Hello, \"World\"" #'Hello, "World"'
single_quoted: 'It''s a beautiful day!' #"It's a beautiful day!"
YAML
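The sketch below checks the two quoting styles from Python and adds the two multi-line block styles of yaml, literal (|) and folded (>). It assumes pyyaml is installed; the keys literal_block and folded_block are hypothetical names added for this illustration.

```python
import yaml

# Raw string so the backslashes reach the yaml parser unchanged.
doc = r'''
double_quoted: "Hello, \"World\""
single_quoted: 'It''s a beautiful day!'
literal_block: |
  line one
  line two
folded_block: >
  line one
  line two
'''

text = yaml.safe_load(doc)
for key, value in text.items():
    print(key, repr(value))
```

The literal style keeps the line breaks, while the folded style joins the lines with spaces; both keep one trailing newline by default.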

Numbers

Numeric values must follow a specific format [2]. To illustrate how to write them properly, the following snippet shows the accepted and rejected formats in lists.

correct_format: [1, 0b1, 0x1, 1.0e+1, -1.0e-1] #✅interpreted as number
wrong_format: [1e+1, 1e1, 1.0e1] #❌interpreted as string
typed_format: [!!float 1e+1, !!float 1e1, !!float 1.0e1] #✅interpreted as number with type declaration
YAML

I added some examples to differentiate numeric values from strings. The numeric examples include a plain integer, binary and hex values, and scientific notation. For the wrong format I picked values that look similar to scientific notation but lack either the decimal point or the sign of the exponent. With syntax highlighting it is easy to tell strings from numbers by their color.
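Without syntax highlighting, the parsed types can be checked directly in Python. This is a minimal sketch, assuming pyyaml (yaml 1.1 rules); the key typed_format is a name chosen here for the list that uses explicit !!float tags.

```python
import yaml

doc = """
correct_format: [1, 0b1, 0x1, 1.0e+1, -1.0e-1]
wrong_format: [1e+1, 1e1, 1.0e1]
typed_format: [!!float 1e+1, !!float 1e1, !!float 1.0e1]
"""

parsed = yaml.safe_load(doc)

# Collect the Python type name of every element in each list.
types = {key: [type(v).__name__ for v in values] for key, values in parsed.items()}
print(types)
```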

Tags

Default Tags

[3] also defines the syntax for types including Null, Boolean, Int and Floating Point, based on the JSON schema. Basically, you declare the type of the value explicitly with a tag rather than relying on the previous formats.

I summarized [1] and [4] so that the complete list of tags is shown here side by side with the Python counterparts.

YAML (Standard, Python Specific)                            | Python
------------------------------------------------------------|----------------------
!!null, !!python/none                                       | None
!!bool, !!python/bool                                       | bool
!!int, !!python/int                                         | int
!!int, !!python/long                                        | long
!!float, !!python/float                                     | float
!!python/complex                                            | complex
!!binary, !!python/bytes                                    | bytes
!!str, !!python/str, !!python/unicode                       | str or unicode
!!omap, !!pairs                                             | list of pairs [5, 6]
!!set                                                       | set
!!seq, !!python/list                                        | list
!!map, !!python/dict                                        | dict
!!python/tuple                                              | tuple
!!timestamp, !!python/module.datetime.datetime              | datetime.datetime
!!python/name:module.name                                   | module.name
!!python/module:package.module                              | package.module
!!python/object:module.cls, !!python/object/new:module.cls  | module.cls instance
!!python/object/apply:module.f                              | f(...)
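As a small illustration of explicit tags, the sketch below forces a few scalars into a specific type using only standard tags, so safe_load is enough. It assumes pyyaml is installed; the key names are hypothetical.

```python
import yaml

doc = """
as_str: !!str 123
as_float: !!float 3
as_int: !!int "42"
untagged: 123
"""

tagged = yaml.safe_load(doc)
for key, value in tagged.items():
    print(key, repr(value))
```

The tag wins over the default interpretation: 123 stays an int when untagged but becomes a string with !!str, and the quoted "42" becomes an int with !!int.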

User Defined Tags

I will show one example of how user defined tags can help you manage your config better: including yaml files within a yaml file.

[7] provides an example of a customized loader class that recognizes certain defined tags, so that the parser knows how to handle the data type when reading the yaml file.

Consider the following config files a.yaml and b.yaml. This is the config file a.yaml, which includes b.yaml.

#a.yaml
section:
  include_config: !include b.yaml
YAML

This is the config file b.yaml that is referred to in a.yaml.

#b.yaml
b_section:
  b_key: b_value
YAML

The following code defines how to use a customized loader to handle the !include tag so that yaml could read multiple config files into one single data structure.

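import os
import yaml

class Loader(yaml.Loader):
    """yaml loader that understands the custom !include tag."""

    def __init__(self, stream):
        try:
            # Resolve included files relative to the including file.
            self._root = os.path.split(stream.name)[0]
        except AttributeError:
            # Plain strings have no .name, so fall back to the current directory.
            self._root = os.path.curdir
        super().__init__(stream)

    def include(self, node):
        filename = os.path.join(self._root, self.construct_scalar(node))
        with open(filename, 'r') as f:
            return yaml.load(f, Loader=Loader)

def read_config():
    config_file = "a.yaml"
    Loader.add_constructor('!include', Loader.include)
    # Pass the open file (not its text) so the loader can read stream.name.
    with open(config_file, 'r') as f:
        return yaml.load(f, Loader=Loader)

a_config = read_config()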
Python
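For a version that can be run anywhere, here is a self-contained sketch of the same idea. It is not the code from [7]: it subclasses yaml.SafeLoader instead of yaml.Loader, and the names IncludeLoader, _include and load_with_includes are hypothetical. It writes a.yaml and b.yaml into a temporary directory so that nothing has to exist on disk beforehand.

```python
import os
import tempfile
import yaml

class IncludeLoader(yaml.SafeLoader):
    """SafeLoader subclass that resolves !include relative to the including file."""

    def __init__(self, stream):
        # Files opened by path expose .name; strings do not, so use the current directory.
        self._root = os.path.dirname(getattr(stream, "name", "")) or os.path.curdir
        super().__init__(stream)

def _include(loader, node):
    # The scalar after !include is the relative path of the file to load.
    path = os.path.join(loader._root, loader.construct_scalar(node))
    with open(path, "r") as f:
        return yaml.load(f, Loader=IncludeLoader)

IncludeLoader.add_constructor("!include", _include)

def load_with_includes(path):
    with open(path, "r") as f:
        return yaml.load(f, Loader=IncludeLoader)

# Recreate the two config files from the text in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.yaml"), "w") as f:
        f.write("section:\n  include_config: !include b.yaml\n")
    with open(os.path.join(tmp, "b.yaml"), "w") as f:
        f.write("b_section:\n  b_key: b_value\n")
    merged = load_with_includes(os.path.join(tmp, "a.yaml"))

print(merged)
```

The content of b.yaml ends up nested under section.include_config, so the script sees one single data structure.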

Conclusion

It is ironic that, as a researcher on non-von Neumann architecture, I would write a post about separating config from code. This decoupling definitely brings a lot of advantages, including easier maintenance, customization and modification.

  1. PyYAML Documentation
  2. YAML loads 5e-6 as string and not a number
  3. YAML Tags
  4. Language-Independent Types for YAML™ Version 1.1
  5. yaml omap
  6. yaml pairs
  7. https://gist.github.com/joshbode/569627ced3076931b02f
