yet.org

CoreOS matchbox

You may still have some bare metal servers lying around, freed by the heavy use of public clouds. But don’t throw them away yet: tooling exists to offer almost as much agility as virtual machine provisioning. Tools like Foreman, Cobbler, Razor, MAAS or OpenStack Ironic fill that gap. Today we’ll look at matchbox from CoreOS; it has a pretty name now, but it used to be called coreos-baremetal and bootcfg. matchbox offers an HTTP and gRPC service which helps you easily build CoreOS clusters out of your servers.

Introduction

The purpose of this tool is to help render and serve configuration files for network boot and operating system provisioning (Ignition or cloud-config) of machines. It allows you to create groups of servers based on labels and associate these groups with profiles. It can be used as a bare-metal foundation layer on which to then deploy a Kubernetes cluster.

If you don’t have any available physical servers, you can try this on KVM or any other hypervisor. In this article I’ll use my bulb lab.

Installation

matchbox can be deployed easily on Container Linux as a rkt container; read my previous articles to learn more about them. By the way, if you want to test drive Tectonic (free for 10 nodes), a self-driving Kubernetes solution, and don’t want to do that on AWS, matchbox is a requirement. CoreOS also just announced that Tectonic can be deployed on OpenStack and Azure clouds.

Amongst the many options to deploy matchbox (binary, RPM, Docker, Kubernetes service, rkt), I’ll be using the rkt way, which I find well aligned with the overall CoreOS stack. matchbox is developed in Go, and Go binaries are easy to deploy: they can be statically linked, which makes them much easier to ship than most interpreted languages, where the coupling with the host operating system is tighter.

On your provisioner machine, check that you have rkt version 1.8 or higher

# rkt version
rkt Version: 1.25.0
appc Version: 0.8.10
Go Version: go1.7.4
Go OS/Arch: linux/amd64
Features: -TPM +SDJOURNAL

Clone the examples and scripts from the matchbox GitHub repository

# git clone https://github.com/coreos/matchbox.git

Installing matchbox is as simple as copying the corresponding systemd unit file to the correct location on your Container Linux provisioner machine

# sudo cp contrib/systemd/matchbox-for-tectonic.service /etc/systemd/system/matchbox.service

When started, it will

  • mkdir -p /etc/matchbox and /var/lib/matchbox/assets
  • rkt run the quay.io/coreos/matchbox container
  • listen on port 8080 (read-only)
  • listen on port 8081 (gRPC API)

Your matchbox container will share the host networking stack (--net=host) and will have the config directory /etc/matchbox and the data directory /var/lib/matchbox/ mounted. The latter will contain the following sub-directories by default: profiles, groups, ignition, cloud, generic. We will detail them in the next sections.

We’ve chosen the systemd unit file named matchbox-for-tectonic.service, which compared to matchbox-on-coreos.service adds the following lines

Environment="MATCHBOX_RPC_ADDRESS=0.0.0.0:8081"
Environment="MATCHBOX_LOG_LEVEL=debug"

These enable the gRPC API, which is required to let Tectonic, or any other client with a TLS client certificate, change machine configs.

If you want to customize it even further, review the other configuration options in the matchbox documentation and override the unit with

# sudo systemctl edit matchbox

This allows you to add more config parameters, which will be saved under /etc/systemd/system/matchbox.service.d/override.conf
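
For example, a minimal override that lowers the log verbosity, reusing the MATCHBOX_LOG_LEVEL variable shown above purely as an illustration, would contain

[Service]
Environment="MATCHBOX_LOG_LEVEL=info"

You can then confirm the drop-in is merged into the unit with

# systemctl cat matchbox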

Once you’ve finished customizing, you can enable and start matchbox.service

# systemctl daemon-reload
# systemctl enable matchbox
# systemctl start matchbox

Check it is now running

# rkt list
UUID        APP         IMAGE NAME                      STATE       CREATED         STARTED     NETWORKS
3b78c916    matchbox    quay.io/coreos/matchbox:v0.5.0  running     2 minutes ago   2 minutes ago

Next, you have to download the Container Linux image assets. From the matchbox git repository, run

# ./scripts/get-coreos stable 1298.6.0 /var/lib/matchbox/assets

It will download all the required bits for the chosen release.
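
If you’re curious, you can peek at what the script fetched; the listing below simply assumes the default assets path used above.

# ls -l /var/lib/matchbox/assets/coreos/1298.6.0/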

TLS certificates

To provide TLS access to the matchbox gRPC API, we have to create self-signed certificates for our lab: a server certificate and client credentials. A script is provided in examples/etc/matchbox to help you create those files; export the SAN entries for your provisioner system and run the script.

# cd examples/etc/matchbox
# export SAN=DNS.1:172.17.8.101,IP.1:172.17.8.101
# ./cert-gen

Move the generated certificates to their expected location

# sudo cp ca.crt server.crt server.key /etc/matchbox/

Note: For production systems you should use a fully qualified domain name for your provisioner instead of an IP address as I’ve done in the example above.
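
To double-check that your SAN entries made it into the server certificate, you can inspect it with openssl; this is just a generic sanity check, nothing matchbox specific.

# openssl x509 -in /etc/matchbox/server.crt -noout -text | grep -A1 "Subject Alternative Name"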

Sanity check

Verify everything looks good by trying to access the matchbox HTTP API

# curl http://172.17.8.101:8080
matchbox

Also check the gRPC API; from the root of the matchbox repository, run

# openssl s_client -connect 172.17.8.101:8081 -CAfile /etc/matchbox/ca.crt -cert examples/etc/matchbox/client.crt -key examples/etc/matchbox/client.key
CONNECTED(00000003)
depth=1 CN = fake-ca
verify return:1
depth=0 CN = fake-server
verify return:1
---

Check image availability

# curl http://172.17.8.101:8080/assets/coreos/1298.6.0/
<pre>
<a href="CoreOS_Image_Signing_Key.asc">CoreOS_Image_Signing_Key.asc</a>
<a href="coreos_production_image.bin.bz2">coreos_production_image.bin.bz2</a>
<a href="coreos_production_image.bin.bz2.sig">coreos_production_image.bin.bz2.sig</a>
<a href="coreos_production_pxe.vmlinuz">coreos_production_pxe.vmlinuz</a>
<a href="coreos_production_pxe.vmlinuz.sig">coreos_production_pxe.vmlinuz.sig</a>
<a href="coreos_production_pxe_image.cpio.gz">coreos_production_pxe_image.cpio.gz</a>
<a href="coreos_production_pxe_image.cpio.gz.sig">coreos_production_pxe_image.cpio.gz.sig</a>
</pre>

Data directory

Before you can network boot any of your nodes, you need a data directory in place to define your profiles, which are sets of config templates. The repository that you’ve just cloned contains some examples. We’ll begin with one of them, which boots a Container Linux operating system in memory and starts an etcd3 service within a rkt container.

First create the following directory structure to host your configuration files

# mkdir /var/lib/matchbox/{profiles,groups,ignition,cloud,generic}

Let’s detail each of them in the following sections

Profiles

The profiles stored in /var/lib/matchbox/profiles reference Ignition or cloud-config files and define network boot settings, as shown below

# vi /var/lib/matchbox/profiles/etcd3.json
{
  "id": "etcd",
  "name": "CoreOS with etcd3",
  "cloud_id": "",
  "ignition_id": "etcd3.yaml",
  "generic_id": "",
  "boot": {
    "kernel": "/assets/coreos/1298.6.0/coreos_production_pxe.vmlinuz",
    "initrd": ["/assets/coreos/1298.6.0/coreos_production_pxe_image.cpio.gz"],
    "args": [
      "coreos.config.url=http://matchbox.local:8080/ignition?uuid=${uuid}&mac=${mac:hexhyp}",
      "coreos.first_boot=yes",
      "console=tty0",
      "console=ttyS0",
      "coreos.autologin"
    ]
  }
}

The above profile configures coreos.config.url and coreos.first_boot to use Ignition instead of cloud-config, which is the recommended way. Ignition runs earlier in the boot process, so it can do more than cloud-config.

The coreos.autologin kernel argument skips the password prompt; it’s fine for development and troubleshooting but should be removed on production systems.

Groups

Groups then match profiles against sets of machines based on selector patterns: mac, uuid, hostname, serial. Here I’m declaring a specific node based on its MAC address.

# vi /var/lib/matchbox/groups/node1.json
{
  "id": "node1",
  "name": "etcd node 1",
  "profile": "etcd3",
  "selector": {
    "mac": "52:54:00:89:d8:10"
  },
  "metadata": {
    "domain_name": "node1.local",
    "etcd_name": "node1",
    "etcd_initial_cluster": "node1=http://node1.local:2380",
    "ssh_authorized_keys": ["ssh-rsa pub-key-goes-here"]
  }
}

This example provisions the corresponding node according to the etcd3 profile shared earlier, which contains the filename of the Ignition template that will be served to it (etcd3.yaml).

A group without any selector will match all machines. Other selectors can be os, uuid, hostname or serial, but mac should be enough for most use cases, with the addition of os, which we will use later on.
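
To illustrate the catch-all case, here is a hypothetical default group with no selector; the values are placeholders and it simply reuses the metadata keys the etcd3 template expects (use this kind of group with care, see the warning later in this article).

# vi /var/lib/matchbox/groups/default.json
{
  "id": "default",
  "name": "Catch-all group (illustration only)",
  "profile": "etcd3",
  "metadata": {
    "domain_name": "default.local",
    "etcd_name": "default",
    "etcd_initial_cluster": "default=http://default.local:2380",
    "ssh_authorized_keys": ["ssh-rsa pub-key-goes-here"]
  }
}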

Ignition config template

The last piece of the puzzle is the Ignition template. Good news: it uses the YAML format and is converted to JSON on the fly. In our example, etcd3.yaml contains

# vi /var/lib/matchbox/ignition/etcd3.yaml
---
systemd:
  units:
    - name: etcd-member.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_IMAGE_TAG=v3.1.0"
            Environment="ETCD_NAME={{.etcd_name}}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=http://{{.domain_name}}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://{{.domain_name}}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER={{.etcd_initial_cluster}}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
    - name: locksmithd.service
      dropins:
        - name: 40-etcd-lock.conf
          contents: |
            [Service]
            Environment="REBOOT_STRATEGY=etcd-lock"

{{ if index . "ssh_authorized_keys" }}
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        {{ range $element := .ssh_authorized_keys }}
        - {{$element}}
        {{end}}
{{end}}

As you can see the template can use conditionals :)
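
At this point you can ask matchbox to render the config for node1 and check that the Go template expands as expected; the query below reuses the /ignition endpoint and MAC address from the profile and group above (if it doesn’t match, try the hyphenated form used in the kernel args, 52-54-00-89-d8-10).

# curl "http://172.17.8.101:8080/ignition?mac=52:54:00:89:d8:10"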

dnsmasq (DNS, DHCP, TFTP)

matchbox needs to be seconded by a DNS, DHCP and TFTP server which will point the client machines to it. At power-on, if your machine’s BIOS is set to network boot and only supports PXE (not iPXE), it will broadcast a DHCPDISCOVER and get back the TFTP server IP (next-server) along with a boot filename (undionly.kpxe). After downloading it over TFTP, the machine will chainload into iPXE, and the provisioning will start by loading configs, scripts to install the OS, etc.

For newer systems that support iPXE, an enhanced version of PXE which can download scripts and images over HTTP instead of TFTP, it’s only necessary to reply to the DHCP request with an HTTP boot script URL like http://matchbox.local/boot.ipxe, whose content, generated by matchbox, looks like this

set base-url http://stable.release.core-os.net/amd64-usr/current
kernel ${base-url}/coreos_production_pxe.vmlinuz cloud-config-url=http://matchbox.local/cloud-config.yml
initrd ${base-url}/coreos_production_pxe_image.cpio.gz
boot

So the TFTP part of dnsmasq will only be used by client machines that only support PXE and need to be bootstrapped into iPXE by loading undionly.kpxe through TFTP, as explained earlier.
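
Since matchbox is already running, you can fetch the iPXE boot script yourself to see exactly what a client would receive.

# curl http://172.17.8.101:8080/boot.ipxe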

You can leverage your own server to do so, or instead use the CoreOS dnsmasq container to quickly set up all of these requirements, which is what we’ll be doing by creating the following systemd unit file

# vi /etc/systemd/system/dnsmasq.service

[Unit]
Description=dnsmasq

[Service]
TimeoutStartSec=0
ExecStartPre=/usr/bin/mkdir -p /etc/dnsmasq
ExecStartPre=/usr/bin/mkdir -p /var/lib/tftproot
ExecStart=/usr/bin/rkt run --hostname=matchbox.local --net=host \
--volume etc-dnsmasq,kind=host,source=/etc/dnsmasq \
--volume tftproot,kind=host,source=/var/lib/tftproot \
coreos.com/dnsmasq:v0.3.0 \
--mount volume=etc-dnsmasq,target=/etc/dnsmasq \
--mount volume=tftproot,target=/var/lib/tftproot \
-- -d -C /etc/dnsmasq/dnsmasq.conf -R -S 8.8.8.8
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

The dnsmasq configuration file should look like this

# mkdir /etc/dnsmasq
# vi /etc/dnsmasq/dnsmasq.conf

dhcp-match=set:ipxe,175

# if request comes from older PXE ROM, chainload to iPXE (via TFTP)
dhcp-boot=tag:!ipxe,undionly.kpxe

# point ipxe tagged requests to the matchbox iPXE boot script (via HTTP)
dhcp-boot=tag:ipxe,http://matchbox.local:8080/boot.ipxe

# verbose
log-queries
log-dhcp

enable-tftp
tftp-root=/var/lib/tftproot

dhcp-range=172.17.8.20,172.17.8.80,30m

# static DNS assignments
address=/matchbox.local/172.17.8.101
address=/node1.local/172.17.8.21
address=/node2.local/172.17.8.22

# default GW [node need external access to fetch containers]
dhcp-option=3,172.17.8.2

# assign fixed hostname and ip address to the nodes
dhcp-host=00:50:56:20:AE:A6,node1,172.17.8.21,infinite
dhcp-host=00:50:56:3C:0E:1A,node2,172.17.8.22,infinite

Note: The first line is an important trick to avoid an infinite loop: it tags requests coming from iPXE so that machines which have already chainloaded into iPXE are not sent back to undionly.kpxe on the next boot, but instead load the boot script from the URL.

If you want to disable dnsmasq’s DNS and use an alternate DNS server instead, add the following to your configuration file

# disable DNS and specify alternate
port=0
dhcp-option=6,8.8.8.8

The last two lines of the configuration file (dhcp-host) pin the hostname and IP address each node will get from DHCP, based on its MAC address.

Now download undionly.kpxe from boot.ipxe.org and copy it as undionly.kpxe.0 into /var/lib/tftproot

# mkdir /var/lib/tftproot
# cd /var/lib/tftproot
# curl -s -o /var/lib/tftproot/undionly.kpxe http://boot.ipxe.org/undionly.kpxe
# cp undionly.kpxe undionly.kpxe.0

Trust the prefix and fetch the dnsmasq container

# sudo rkt trust --prefix coreos.com/dnsmasq
# sudo rkt fetch coreos.com/dnsmasq:v0.3.0

Enable and start dnsmasq

# sudo systemctl enable dnsmasq
# sudo systemctl start dnsmasq

Verify it is now listening on UDP/53 (DNS), UDP/67 (DHCP) and UDP/69 (TFTP)

# netstat -lnup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:53              0.0.0.0:*                           1549/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1549/dnsmasq
udp        0      0 0.0.0.0:69              0.0.0.0:*                           1549/dnsmasq
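
You can also make sure the TFTP side really serves the chainloader; curl speaks TFTP, so a quick transfer test, assuming the lab IP used throughout this article, looks like this.

# curl -s -o /dev/null tftp://172.17.8.101/undionly.kpxe && echo "TFTP OK"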

To enforce local name resolution on your provisioner node, whose resolver may have been configured by DHCP, replace the /etc/resolv.conf symlink with a configuration file that points to the local resolver.

nameserver 127.0.0.1

Check you can now resolve your matchbox DNS name

# dig matchbox.local

Provisioning

You can now power on your slave node, on the same broadcast domain as your provisioner node. It should get bootstrapped with Container Linux 1298.6.0 and be ready to serve as an etcd3 node. You now have a fresh CoreOS running in RAM.
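
While the node boots, it can be handy to follow the DHCP/TFTP and matchbox requests from the provisioner; both services run as the systemd units we created, so journalctl works (use two terminals, or pick one).

# journalctl -u dnsmasq -f
# journalctl -u matchbox -f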

You can check everything looks good

# etcdctl cluster-health
member d2e6049bc575518d is healthy: got healthy result from http://node1.local:2379
cluster is healthy
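
As an extra smoke test, you can write and read back a key with etcdctl; this assumes etcdctl on the node talks to the local endpoint, which is the default.

# etcdctl set /test hello
# etcdctl get /test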

Troubleshooting

If your etcd3 node isn’t healthy, check the following on node1.local, which should at least have booted, to identify the root cause

  • external network connectivity: ping 8.8.8.8
  • DNS resolution: dig node1.local

If it hasn’t booted but is stuck on PXE, check the following on your provisioner node

  • dnsmasq and matchbox container should be running: rkt list
  • matchbox should be responding on its HTTP API: curl http://172.17.8.101:8080
  • assets should be readily available: curl http://172.17.8.101:8080/assets/coreos/1298.6.0/

Install Reboot Provisioning

Now that we know how to PXE boot Container Linux into RAM, we can follow up by installing it on disk using an install script from CoreOS. Multiple examples are provided in the repository:

  • install-reboot - Install CoreOS and Reboot
  • install-shutdown - Install CoreOS and shut down the system to avoid re-installing it if the BIOS boot priority is set to network.

Install Reboot Provisioning > Profile

Here is the profile we need

# vi /var/lib/matchbox/profiles/install-reboot.json
{
  "id": "install-reboot",
  "name": "Install CoreOS and Reboot",
  "boot": {
    "kernel": "/assets/coreos/1298.6.0/coreos_production_pxe.vmlinuz",
    "initrd": ["/assets/coreos/1298.6.0/coreos_production_pxe_image.cpio.gz"],
    "args": [
      "coreos.config.url=http://matchbox.local:8080/ignition?uuid=${uuid}&mac=${mac:hexhyp}",
      "coreos.first_boot=yes",
      "console=tty0",
      "console=ttyS0",
      "coreos.autologin"
    ]
  },
  "ignition_id": "install-reboot.yaml"
}

Install Reboot Provisioning > Ignition

As you’ve seen above in the profile, our Install Reboot provisioning still uses Ignition. Here is the corresponding configuration file

# vi /var/lib/matchbox/ignition/install-reboot.yaml
---
systemd:
  units:
    - name: installer.service
      enable: true
      contents: |
        [Unit]
        Requires=network-online.target
        After=network-online.target
        [Service]
        Type=simple
        ExecStart=/opt/installer
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /opt/installer
      filesystem: root
      mode: 0500
      contents:
        inline: |
          #!/bin/bash -ex
          curl "{{.ignition_endpoint}}?{{.request.raw_query}}&os=installed" -o  ignition.json
          coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b  {{.baseurl}}{{end}}
          udevadm settle
          systemctl reboot

{{ if index . "ssh_authorized_keys" }}
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        {{ range $element := .ssh_authorized_keys }}
        - {{$element}}
        {{end}}
{{end}}

The first boot out of RAM will then launch /opt/installer, which is created on the fly. As you can see above, this bash script simply downloads the Ignition file that corresponds to the os=installed matcher and saves it locally as ignition.json.

Then the coreos-install script installs Container Linux on /dev/sda using the channel and version configured in the group definition (see below), getting the image from the matchbox assets endpoint (baseurl).

Finally udevadm settle waits for udevd to process the device creation events for all hardware devices before rebooting the system.

Beware: if your system boots from PXE again, it will be re-installed. So make sure the boot priority is set to disk.

Install Reboot Provisioning > Group

The next file required to achieve this persistent provisioning is the group, which defines important metadata used by Ignition and can use a selector to target specific machines with our install-reboot profile, as shown below

# vi /var/lib/matchbox/groups/node2-install.json
{
  "id": "node2-install",
  "name": "Simple Container Linux - On Disk Install",
  "profile": "install-reboot",
  "selector": {
    "mac": "00:50:56:3C:0E:1A"
  },
  "metadata": {
    "coreos_channel": "stable",
    "coreos_version": "1298.6.0",
    "ignition_endpoint": "http://matchbox.local:8080/ignition",
    "baseurl": "http://matchbox.local:8080/assets/coreos"
  }
}

Note: If you do not put any selector (mac) in here, beware: any machine that PXE boots will be wiped and re-installed from scratch.
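
Before powering the node on, you can check which boot config matchbox would hand out to this MAC address by querying the iPXE endpoint yourself; this assumes the /ipxe endpoint follows the same label matching as /ignition.

# curl "http://matchbox.local:8080/ipxe?mac=00:50:56:3C:0E:1A"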

Install Reboot Provisioning > 2nd stage Group

Lastly, provisioning is done in a two-stage process: the first stage installs the system on disk, the second configures the installed system. Another group will be used once the system is installed. You are free to do whatever you want on the installed system; here we provision a single-node etcd3 service.

# vi /var/lib/matchbox/groups/node2-provision.json
{
  "id": "node2",
  "name": "etcd node 2",
  "profile": "etcd3",
  "selector": {
    "mac": "00:50:56:3C:0E:1A",
    "os": "installed"
},
  "metadata": {
    "domain_name": "node2.local",
    "etcd_name": "node2",
    "etcd_initial_cluster": "node1=http://node2.local:2380",
    "ssh_authorized_keys": ["ssh-rsa ssh-pub-key-here"]
  }
}

The os: installed matcher ensures this group only applies once the system is installed on disk.
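
You can see the two stages directly from matchbox by rendering the Ignition config with and without the os=installed label; the first query should return the installer unit, the second the etcd3 configuration.

# curl "http://matchbox.local:8080/ignition?mac=00:50:56:3C:0E:1A"
# curl "http://matchbox.local:8080/ignition?mac=00:50:56:3C:0E:1A&os=installed"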

Troubleshooting

When the machine first boots and auto-logins, you can watch the disk installation happening

# journalctl -u installer.service -f

When done, the machine will reboot and this time won’t auto-login. You should be able to access it through SSH from your provisioner node if you’ve set up your SSH keys accordingly

# ssh core@172.17.8.22 -i ~/.ssh/yoursshkey

Check etcd3 container is running

# rkt list
UUID        APP IMAGE NAME          STATE   CREATED     STARTED     NETWORKS
def3b764    etcd    quay.io/coreos/etcd:v3.1.0  exited  5 seconds ago   5 seconds ago

Check etcd3 node works ok

# etcdctl cluster-health
member 45ea15b368cee823 is healthy: got healthy result from http://node2.local:2379
cluster is healthy

If that’s not the case, start your investigation from the logs

# journalctl -fu etcd-member.service

We’ll stop here. If you want, you can try to set up a 3-node etcd3 cluster using the same methodology. Good luck!

Matchbox on Kubernetes

Matchbox can also easily be deployed on top of a Kubernetes cluster. You just have to create the Deployment and Service Kubernetes API objects from the YAML definitions provided in the source repository

# kubectl apply -f contrib/k8s/matchbox-deployment.yaml
# kubectl apply -f contrib/k8s/matchbox-service.yaml
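
You can then check that the matchbox pod and service came up; the grep filter below is just a guess, adjust it to whatever names the provided manifests define.

# kubectl get deployments,pods,services | grep matchbox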

Conclusion

It’s no coincidence that matchbox integrates really well with Tectonic through its gRPC API; Tectonic then provides a nice UI to add/remove servers and manage matchbox templates.

matchbox is a nice little solution to a really common problem which has been solved in many different ways. matchbox is powerful yet simple enough to treat your bare metal machines almost like virtual machines.

If you still maintain hardware and want to install Container Linux on it, you cannot afford not to look at matchbox, which seems to be an ideal solution.

In a follow-up article we will show you how Tectonic can use the matchbox gRPC API and Bootkube to easily provision a self-hosted Kubernetes cluster. Stay tuned!