
OpenStack Salt

After having reviewed Salt, Salt Formulas and reclass, it’s now time to put everything together and deploy OpenStack with the openstack-salt project, which uses an elegant Model-Driven Architecture stored in a git repository that can be used for life cycle management, auditing and documenting your infrastructure.

Imagine a service oriented, repeatable, documented tooling to deploy OpenStack which also can be used to

  • Deploy monitoring
  • Update OpenStack config
  • Audit the deployment through git workflow
  • Document it from a single source of truth
  • Upgrade OpenStack

With the following core principles

  • Atomicity - services can be spread around as you wish
  • Reusability/replaceability - service definitions are reusable and can be replaced without affecting neighbouring services.
  • Roles - services implement various roles, most often client/server variations.
  • Life Cycle Management - never at rest, your cloud environment will easily evolve.

Wouldn’t it be nice? That’s exactly what we’ll show you in this article. With Salt, formulas and reclass we will store the data model of our infrastructure in a git repository. Making changes to the infrastructure then boils down to a git workflow (a minimal command-line sketch follows the list):

  • Fork the Model Driven Architecture (MDA) git repository which contains all the reclass data (classes and node definitions)
  • Clone it
  • Update it
  • Commit your changes to the repository
  • Create a pull request
  • Apply changes to your infrastructure.
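
As a minimal sketch, assuming a hypothetical fork of the workshop model (the repository URL and edited file are placeholders, adapt them to your own setup), the loop could look like this from your workstation and Salt Master:

$ git clone https://github.com/<your-fork>/workshop-salt-model.git
$ cd workshop-salt-model
$ vi classes/system/openstack/control/workshop.yml    # tweak a parameter
$ git commit -am "Tune nova cpu_allocation_ratio"
$ git push origin master                               # then open a pull request upstream
# git -C /srv/salt/reclass pull                        # on the Salt Master, once merged
# salt 'ctl*' state.sls nova                           # apply the change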

So from now on you’re making your changes in one place, and obviously never touching any node directly again. Isn’t that DevOps at its core?

We improve the way we provision and operate the infrastructure with a single toolset, but the general principles of the OpenStack Control Plane architecture stay almost the same: 3 controllers, stateless API endpoints load balanced by HAProxy, and MySQL deployed with a single node for writes and 2 more for reads.

But we also simplify the HA architecture by removing Pacemaker and Corosync; all the services are now active/active, and only keepalived is left to migrate the VIP when necessary. This is possible because OpenContrail is used instead of the Neutron L3 Agent, which requires an active/passive mode.

Services can now be spread across different servers; for example MySQL Galera can run in its own VMs, so we can model the infrastructure as required.

Salt Master installation

As a starter we need to prepare our Salt Master with the required information to drive the deployment. We could have automated this process, but doing it step by step helps to understand the different moving parts involved. If you prefer, you’ll find a Heat template which does all of this for you on top of an OpenStack cloud; a howto guide is available if you want to deploy OpenStack on top of OpenStack.

Provision an Ubuntu 14.04 server anywhere you want and follow our step by step approach to build your Salt Master node.

Update your system and make sure you have curl and wget installed

# apt-get update
# apt-get upgrade
# apt-get install curl wget

Note: Each command that starts only with # should be executed on the Salt Master; if a hostname is specified, make sure you switch to that node to run the specified command.

Add the tcpcloud nightly build repository

# echo "deb [arch=amd64] http://apt.tcpcloud.eu/nightly/ trusty main security extra tcp tcp-salt" > /etc/apt/sources.list
# wget -O - http://apt.tcpcloud.eu/public.gpg | apt-key add -
# apt-get clean
# apt-get update

Install Salt master and reclass

# apt-get install -y salt-master reclass

Install all the required Salt Formulas

# apt-get install -y salt-formula-linux salt-formula-reclass salt-formula-salt salt-formula-openssh salt-formula-ntp salt-formula-git salt-formula-graphite salt-formula-collectd salt-formula-sensu salt-formula-heka
# apt-get install -y salt-formula-horizon salt-formula-nginx salt-formula-memcached salt-formula-python salt-formula-supervisor salt-formula-sphinx

Configure your master’s file_roots to point to /usr/share/salt-formulas/env, where the above formulas have been installed, and tell Salt to use reclass as an external node classifier.

# cat << 'EOF' >> /etc/salt/master.d/master.conf
file_roots:
  base:
  - /usr/share/salt-formulas/env
pillar_opts: False
open_mode: True
reclass: &reclass
  storage_type: yaml_fs
  inventory_base_uri: /srv/salt/reclass
ext_pillar:
  - reclass: *reclass
master_tops:
  reclass: *reclass
EOF

Clone your Model Driven Architecture repository. A production environment would use its own forked repository, but let’s keep it simple for now.

# git clone https://github.com/tcpcloud/workshop-salt-model.git /srv/salt/reclass -b master

Configure reclass

# mkdir /etc/reclass
# cat << 'EOF' >> /etc/reclass/reclass-config.yml
storage_type: yaml_fs
pretty_print: True
output: yaml
inventory_base_uri: /srv/salt/reclass
EOF

Import all Salt Formulas service metadata into your reclass subdirectory

# mkdir -p /srv/salt/reclass/classes/service
# for i in /usr/share/salt-formulas/reclass/service/*; do ln -s $i /srv/salt/reclass/classes/service/; done

Install and configure the minion side of your Salt Master

# apt-get install -y salt-minion
# mkdir -p /etc/salt/minion.d
# cat << "EOF" >> /etc/salt/minion.d/minion.conf
id: cfg01.workshop.cloudlab.cz
master: localhost
EOF

id should reflect the config node name declared in /srv/salt/reclass/nodes/.
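
If you are unsure of the exact name, you can list the node definitions shipped with the model; the minion id must match one of them (minus the .yml extension):

# ls /srv/salt/reclass/nodes/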

Restart your Salt Master and Minion services

# service salt-master restart
# rm -f /etc/salt/pki/minion/minion_master.pub
# service salt-minion restart

Finish by generating your node definitions into /srv/salt/reclass/nodes/ by running the following state

# salt 'cfg01*' state.sls reclass.storage

Refresh the Pillar data on all your nodes

# salt '*' saltutil.refresh_pillar

Before applying your states to your control node, check that everything looks good

# reclass-salt --top

Also try getting information out of your Minion

# salt "cfg01.workshop.cloudlab.cz" grains.get ipv4
cfg01.workshop.cloudlab.cz:
- 127.0.0.1
- 172.16.10.100

# salt-call state.show_top
local:
    ----------
    base:
        - git
        - linux
        - ntp
        - salt
        - openssh
        - reclass
        - horizon
        - nginx
        - collectd
        - sensu
        - heka

# salt-call pillar.items
<snip>

# salt-call grains.items
<snip>

You should be good to go !!!

Salt Master provisioning

Great, it is now time to apply our Salt States to our Master.

To better understand the provisioning of our Master node, let’s now run our Salt States (SLS) one by one

Start with the linux state, which configures the Linux OS: things like

  • apt repository and corresponding keys
  • timezone
  • /etc/hosts entries adding mon01, mtr01, log01, web01, ctl[1|2|3], cmp[1|2] with the domain name workshop.cloudlab.cz
  • http/https/ftp proxies
  • Install packages: python-apt, vim-nox and apt-transport-https
  • users
  • networking

For our first state run, I’ll show you two tricks. First, look at the details of the state

salt-call state.show_sls linux

Second, do a dry run

salt-call state.sls linux test=True

Let’s do the real deal now

salt-call state.sls linux

Everything should be green, re-run it if that’s not the case.

Update timezone

The model repository I used configured my node in the Europe/Prague timezone; it’s a good example to show you how to update your infrastructure.

It’s pretty simple: just add the correct Europe/Paris timezone in /srv/salt/reclass/nodes/cfg01.workshop.cloudlab.cz.yml. It will override the value inherited from the model metadata, which is the base principle of reclass, the hierarchical node classifier.

# vi /srv/salt/reclass/nodes/cfg01.workshop.cloudlab.cz.yml
<snip>
linux:
  system:
    name: cfg01
    domain: workshop.cloudlab.cz
    timezone: Europe/Paris
<snip>

Re-run your state; you should see 1 changed, details below

# salt-call state.sls linux
<snip>
----------
      ID: Europe/Paris
Function: timezone.system
  Result: True
 Comment: Set timezone Europe/Paris
 Started: 17:46:29.187478
Duration: 98.14 ms
 Changes:   
          ----------
          timezone:
              Europe/Paris
<snip>

Check it has been configured

# cat /etc/timezone
Europe/Paris

It is really easy to audit your environment by looking at the repository diff.

# cd /srv/salt/reclass
# git diff
diff --git a/nodes/cfg01.workshop.cloudlab.cz.yml b/nodes/cfg01.workshop.cloudlab.cz.yml
index ac4372f..0eb95c2 100644
--- a/nodes/cfg01.workshop.cloudlab.cz.yml
+++ b/nodes/cfg01.workshop.cloudlab.cz.yml
@@ -30,6 +30,7 @@ parameters:
     system:
       name: cfg01
       domain: workshop.cloudlab.cz
+      timezone: Europe/Paris

In a production environment we would have sent a pull request to update the upstream repository and version control our changes.

The openssh state ensures OpenSSH is properly configured and injects the SSH keys for users

# salt-call state.sls openssh

The next step configures our Salt Minion

# salt-call state.sls salt.minion

Now that our minion is configured you should find the following grains

# salt-call grains.item roles
local:
----------
roles:
    - git.client
    - reclass.storage
    - sensu.client
    - ntp.client
    - linux.storage
    - linux.system
    - linux.network
    - salt.minion
    - salt.master
    - collectd.client
    - heka.server
    - horizon.server
    - openssh.client
    - openssh.server
    - nginx.server

The salt.master state ensures our master is up to date and running, creates the /srv/salt/env/ directory structure and symlinks /srv/salt/env/prd to /usr/share/salt-formulas/env. It also ensures our formula packages are up to date.

# salt-call state.sls salt.master

Finish all of this by restarting your Salt Minion

# service salt-minion restart

Maybe next time you’ll run the Heat template, but at least you know what’s happening and you can provision a control node anywhere you want, including on vSphere where Heat templates aren’t welcome.

Next time, once your Salt Master is installed and configured, you can provision it like this

# salt 'cfg01*' state.sls linux,openssh,salt
# salt 'cfg01*' state.sls reclass.storage
# salt '*' saltutil.refresh_pillar

Salt Formulas

Let me give you a quick overview of the important directories within your Salt Master

/srv/salt/env/prd contains all the reusable Salt formulas, always the same for all deployments, delivered as packages for production or from git repositories for development, to avoid having to rebuild the packages each time.

You can list all of them with

# dpkg -l | grep formula

or get a list of what’s inside a specific one

# dpkg -L salt-formula-ntp

Note: Each formula contains support metadata to define which support services should be configured: which metrics should be collected using collectd, which logs should be collected by Heka, which checks should be run by Sensu, how to document this component, etc. All of this is defined in the salt-formula-<name>/meta directory, and the support services configuration is then automatically generated from the model.
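
If you are curious, you can list the metadata files a formula package ships; for example for the ntp formula (the /meta/ path filter is an assumption about the package layout, adjust if needed):

# dpkg -L salt-formula-ntp | grep /meta/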

reclass model

/srv/salt/reclass is a clone of the infrastructure metadata model repository. You can check the origin with

# git remote -v

This content is specific to each deployment and contains IP addresses, encrypted passwords, role assignments, etc., which should be edited to fit your infrastructure requirements.

Our Model-Driven Architecture is structured as shown below. I’m just showing the relevant parts that you may have to touch for your deployment after forking the repo; see the README.rst for further details.

/srv/salt/reclass
|-- classes/
|   |-- service/
|   `-- system/
|       |-- reclass/storage/system/
|       |   `-- workshop.yml
|       |-- openssh/server/
|       |   `-- single.yml
|       |-- openstack/
|       |   |-- common
|       |   |-- compute
|       |   `-- control
|       |-- billometer/server/
|       |   `-- single.yml
|       `-- graphite/
|-- nodes/
|   |-- cfg01.domainname.
|   `-- _generated
|-- verify.sh
`-- README.rst
Path details:

  • classes/service/... - symlinks to the Salt formulas metadata folders, created at Salt Master provisioning time. Do not touch this part, it comes from the Salt Formulas packages.
  • classes/system/reclass/storage/system/workshop.yml - node definitions used to populate the nodes/_generated directory: IP addresses, domain name. This yaml file is referenced from our Salt Master node definition, nodes/cfg01.workshop.cloudlab.cz.yml, when we attach the class system.reclass.storage.system.workshop to it.
  • classes/system/openssh/server/single.yml - linux users and ssh keys.
  • classes/system/openstack/common - all OpenStack parameters available to all nodes: passwords, IP addresses, domain name.
  • classes/system/openstack/compute - OpenStack compute related parameters, e.g. novnc proxy.
  • classes/system/openstack/control - OpenStack controller specific parameters, e.g. router id for keepalived.
  • classes/system/billometer/server/single.yml - passwords.
  • classes/system/graphite/ - passwords.
  • nodes/cfg01.domainname. - definition of the Salt Master itself: model repository, timezone, hostname, domain name, IP addresses, repositories, Salt accept policy.
  • nodes/_generated - all node definitions, dynamically generated on the Salt Master from classes/system/reclass/storage/system/workshop.yml, which is declared in the classes of nodes/cfg01.workshop.cloudlab.cz.yml (so the filename can be changed there).
  • verify.sh - generates and validates the reclass-salt-model in an isolated environment (docker, kitchen-docker).
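
If you change the model, you can validate it before applying anything with the bundled verify.sh; the invocation below is a guess based on the file listing above, so check the script and README.rst for the exact requirements (it relies on docker or kitchen-docker):

# cd /srv/salt/reclass
# ./verify.sh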

Note: The entries under classes/system/ and nodes/ above are the ones you should edit to customize the deployment to your environment.

OpenStack deployment

Let’s start by deploying the OpenStack Control Plane. From your Salt Master, check that all the Ubuntu 14.04 Minion keys have been accepted

# salt-key
Accepted Keys:
cfg01.workshop.cloudlab.cz
ctl01.workshop.cloudlab.cz
ctl02.workshop.cloudlab.cz
ctl03.workshop.cloudlab.cz
<snip>
Denied Keys:
Unaccepted Keys:
Rejected Keys:

We could orchestrate the overall deployment with salt orchestrate

# salt-run state.orchestrate orchestrate

but for learning purposes, we’ll do it step by step instead.

Controllers > ntp

Start the deployment by installing and configuring ntp on the control plane

# salt "ctl*" state.sls ntp

You can check the metadata associated with this formula

# salt "ctl*" pillar.item ntp

To check ntp status

# salt 'ctl*' service.status ntp
ctl02.workshop.cloudlab.cz:
    True
ctl01.workshop.cloudlab.cz:
    True
ctl03.workshop.cloudlab.cz:
    True

Controllers > linux | salt.minion | openssh

Start by applying the following states to your controllers

salt 'ctl*' state.sls linux,salt.minion,openssh

We’ve already detailed the linux state which does basic operating system configuration.

The salt.minion state installs the salt-minion and python-m2crypto packages, configures your minion from a template in /etc/salt/minion.d/minion.conf, and creates a /etc/salt/grains.d/ directory.

Finally the openssh state does the following

  • install openssh server and client
  • can update the SSH banner
  • ensure the server is running
  • create /root/.ssh
  • inject a private key into /root/.ssh/id_rsa
  • configure ssh server and client
  • ensure the server is running
  • add git.tcpcloud.eu known host from its fingerprint which can be modified in classes/system/openssh/client/lab.yml

Controllers > keepalived

Provision keepalived, a daemon for cluster VIP based on VRRP, on the first controller for now

# salt 'ctl01*' state.sls keepalived

Check the IP Addresses on this controller

# salt 'ctl01*' cmd.run 'ip a'
<snip>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:ca:d4:c1 brd ff:ff:ff:ff:ff:ff
    inet 172.16.52.201/24 brd 172.16.52.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 172.16.10.254/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feca:d4c1/64 scope link 
       valid_lft forever preferred_lft forever

It confirms that our 172.16.10.254 VIP is on eth0. If you need to change where the VIP gets created, just edit /srv/salt/reclass/classes/system/reclass/storage/system/workshop.yml.

If you edit this file, you’ll have to update your generated node definitions by running again

# salt 'cfg01*' state.sls reclass.storage

This command also tries to pull from your git repo.

You can check the VIP address definition which comes from Pillar data defined in our model /srv/salt/reclass/classes/system/openstack/common/workshop.yml

# salt 'ctl01*' pillar.get keepalived:cluster:instance:VIP:address
ctl01.workshop.cloudlab.cz:
    172.16.10.254
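
Working backwards from that pillar path, the keepalived class in the model presumably contains something like the snippet below. Only the address is confirmed by the output above; the other keys (interface, virtual router id, priority) are illustrative assumptions, so check the actual class files rather than copying this verbatim.

keepalived:
  cluster:
    enabled: true
    instance:
      VIP:
        address: 172.16.10.254
        interface: eth0
        virtual_router_id: 50
        priority: 101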

You can now run it on controllers 2 and 3

# salt 'ctl0[23].*' state.sls keepalived

Controllers > Gluster

Gluster is used to store the Keystone fernet keys, to avoid having to copy them around on all controllers; it’s only a few small files. Gluster is also used by Glance to store images on small deployments, to avoid deploying Swift or any other backend like Ceph; images are instead distributed by GlusterFS.

Gluster provisioning is done in several steps. The first one installs the packages, ensures the service is running, and creates the required volume directories

# salt 'ctl*' state.sls glusterfs.server.service

Now prepare the GlusterFS volumes. Beware: run this only on one controller.

Note: This time we will run the state locally from the controller itself. It’s not a best practice, but it’s a good way to learn: you’ll get detailed live reporting of the actions taken on the controller by running it locally. For a production deployment you should not SSH to any node to change anything, but run everything from your Salt Master.

ssh in to ctl01 to run

ctl01# salt-call state.sls glusterfs.server.setup

Check Gluster status

ctl01# gluster peer status
Number of Peers: 2

Hostname: 172.16.10.103
Uuid: 622f8b69-cf0e-411f-be29-bbf40fe3cc8f
State: Peer in Cluster (Connected)

Hostname: 172.16.10.102
Uuid: 9878a625-7011-4e28-9070-c58f7960acd1
State: Peer in Cluster (Connected)

ctl01# gluster volume status
Gluster process                                 Port  Online  Pid
------------------------------------------------------------------------------
Brick 172.16.10.101:/srv/glusterfs/glance       49152   Y   3063
Brick 172.16.10.102:/srv/glusterfs/glance       49152   Y   8022
Brick 172.16.10.103:/srv/glusterfs/glance       49152   Y   7047
<snip>

In the output of the last command above, everything should be online (Y).

If any volume provisioning fails or isn’t online, delete the volume and re-run the above setup state

ctl01# gluster volume stop glance
ctl01# gluster volume delete glance
ctl01# gluster volume stop keystone-keys
ctl01# gluster volume delete keystone-keys
ctl01# salt-call state.sls glusterfs.server.setup

These two volumes will be mounted later on, when we run the glusterfs.client state, in the following directories

/srv/glusterfs/glance is mounted on /var/lib/glance/
/srv/glusterfs/keystone-keys is mounted on /var/lib/keystone/fernet-keys

Controllers > RabbitMQ

Let’s now install and configure RabbitMQ on our cluster, which is a critical component of the overall architecture; it takes a few minutes.

# salt 'ctl*' state.sls rabbitmq

Metadata is defined in

# vi /srv/salt/reclass/classes/service/rabbitmq/server/cluster.yaml

Note: ${_param:cluster_local_address} is derived from the single_address parameter, which is the IP of the current node defined in /srv/salt/reclass/classes/system/reclass/storage/system/workshop.yml; this pattern is used by all formulas.
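
Concretely, this is plain reclass parameter interpolation; somewhere in the system classes you will find something along these lines (illustrative sketch, the exact files differ):

parameters:
  _param:
    # single_address is set per node in workshop.yml, e.g. 172.16.10.101 for ctl01
    single_address: 172.16.10.101
    # formulas then reference the generic name instead of the node specific one
    cluster_local_address: ${_param:single_address}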

When your Rabbitmq cluster is deployed, you can check its state

ctl01# rabbitmqctl cluster_status
[{nodes,[{disc,[rabbit@ctl01,rabbit@ctl02,rabbit@ctl03]}]},
 {running_nodes,[rabbit@ctl02,rabbit@ctl03,rabbit@ctl01]},
 {cluster_name,<<"openstack">>},
 {partitions,[]},
 {alarms,[{rabbit@ctl02,[]},{rabbit@ctl03,[]},{rabbit@ctl01,[]}]}]

Compared to Fuel the deployment of Rabbitmq here is pretty similar, apart from Pacemaker which is now gone, so starting/stopping the message bus is simpler

# service rabbitmq-server [status|stop|start]

Controllers > MySQL Galera

Let’s now deploy our database cluster, starting locally on ctl01, with -l info to get more details on what’s happening.

ssh to ctl01 to run

ctl01# salt-call state.sls galera -l info

Galera deployment requires starting it on one node first; the others will then join the cluster afterward.

Once the previous command terminates, deploy the remaining controllers

# salt 'ctl0[23]*' state.sls galera

The galera state also creates the databases and users for the OpenStack services.

Check Galera status

ctl01# salt-call mysql.status
<snip>
wsrep_incoming_addresses:
    172.16.10.103:3306,172.16.10.102:3306,172.16.10.101:3306
<snip>

or with

ctl01# mysql -pworkshop -e'SHOW STATUS;'
<snip>
| wsrep_local_state_comment                | Synced                                                   |
| wsrep_cert_index_size                    | 2                                                        |
| wsrep_causal_reads                       | 0                                                        |
| wsrep_incoming_addresses                 | 172.16.10.103:3306,172.16.10.102:3306,172.16.10.101:3306 |
<snip>

If the status is wrong, take a look at the troubleshooting doc.

Controllers > HAProxy

Now comes HAProxy

# salt 'ctl*' state.sls haproxy

This state starts HAProxy on each controller and load balances the Keystone, Nova, Glance, Cinder, Neutron, Heat, RabbitMQ and OpenContrail services. Special load balancing rules are configured for the Galera cluster, which is in active/backup mode to avoid consistency issues. You can review the configuration at /etc/haproxy/haproxy.cfg

Check HAProxy is really running on all our controllers

# salt 'ctl*' cmd.run 'ps aux | grep haproxy'

For a particular service, you can check that haproxy is listening on its port

# salt 'ctl*' cmd.run 'netstat -tulnp | grep 5000'

As you can see, HAProxy is listening on the keepalived VIP on all three nodes, unlike the Pacemaker case where it is only started on the active node.

HAProxy can bind to the VIP even on the controllers that don’t currently hold it, because the following parameter has been set in /etc/sysctl.conf

net.ipv4.ip_nonlocal_bind = 1
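
You can quickly confirm the flag is active on all three controllers from the Salt Master:

# salt 'ctl*' cmd.run 'sysctl net.ipv4.ip_nonlocal_bind'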

Controllers > memcached | keystone

Install memcached and Keystone on the first node to avoid a failure when trying to create endpoints from two nodes in parallel. Also run it locally, or it will fail due to a bug in the current release of Salt which is currently being worked on.

ctl01# salt-call state.sls memcached,keystone

When it’s done, run on the two remaining controllers

ctl02# salt-call state.sls memcached,keystone
ctl03# salt-call state.sls memcached,keystone

Now it should all be green; re-run the state if that’s not the case.

You can have a look at parameters for Keystone

ctl01# salt-call pillar.data keystone

Check that everything looks good for Keystone

ctl01# source ~/keystonerc
ctl01# keystone user-list
ctl01# keystone tenant-list
ctl01# keystone endpoint-list

Great ! It’s time for a break ;)

Controllers > Glance

Install Glance from your Salt Master

# salt 'ctl*' state.sls glance

The structure of the data is the same across all the services

# salt 'ctl01*' pillar.data glance

Controllers > Glusterfs.client

Run the glusterfs.client state to mount the Glance and Keystone GlusterFS directories

# salt 'ctl*' state.sls glusterfs.client

Check that our two GlusterFS volumes are mounted as shared volumes on all three nodes.

# salt 'ctl*' cmd.run 'df -h'
<snip>
172.16.10.254:/glance          18G   14G  3.0G  83% /var/lib/glance/images
172.16.10.254:/keystone-keys   18G   14G  3.0G  83% /var/lib/keystone/fernet-keys
<snip>

The Glance configuration /etc/glance/glance-api.conf uses the standard filesystem_store_datadir of /var/lib/glance/images/ for the image repository.
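
If you want to double check that setting without opening the file on each node, a quick grep from the Salt Master does it:

# salt 'ctl*' cmd.run 'grep filesystem_store_datadir /etc/glance/glance-api.conf'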

Check glance works properly by creating a Cirros image

ctl01# cd /root
ctl01# source ~/keystonerc
ctl01# wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-i386-disk.img
ctl01# glance image-create --name "cirros-0.3.4" --is-public true --disk-format qcow2 --container-format bare --progress --file /root/cirros-0.3.4-i386-disk.img
ctl01# glance image-list
[=============================>] 100%
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 79b4436412283bb63c2cba4ac796bcd9     |
| container_format | bare                                 |
| created_at       | 2016-10-18T10:48:26.000000           |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | qcow2                                |
| id               | a78f7062-f606-4bd4-927a-76284a437f77 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | cirros-0.3.4                         |
| owner            | aa55b506cd9d430c9505860ad292966b     |
| protected        | False                                |
| size             | 12506112                             |
| status           | active                               |
| updated_at       | 2016-10-18T10:48:32.000000           |
| virtual_size     | None                                 |
+------------------+--------------------------------------+

Check it is stored where you expect

ctl01# ls /var/lib/glance/images
a78f7062-f606-4bd4-927a-76284a437f77

Now re-run the keystone state on ctl01 to recreate the fernet keys

ctl01# salt-call state.sls keystone

Check keys are there

ctl01# ls -alR /var/lib/keystone

You can use Salt to check it from your master

# salt 'ctl*' cmd.run 'ls -al /var/lib/keystone/fernet-keys'

Controllers > Cinder | Nova

When you have Keystone running, you can pretty much run all the rest ;) If you get failures, just re-run the states. But here we will keep our step by step approach.

Install Cinder and Nova, in this deployment we will use the LVM backend for Cinder

# salt 'ctl*' state.sls cinder,nova

Check Cinder status

ctl01# source ~/keystonerc
ctl01# cinder list

Check Nova status

ctl01# nova-manage service list
Binary           Host                                 Zone             Status     State Updated_At
nova-cert        ctl03.workshop.cloudlab.cz           internal         enabled    :-)   2016-10-14 14:56:56
nova-conductor   ctl03.workshop.cloudlab.cz           internal         enabled    :-)   2016-10-14 14:56:57
nova-consoleauth ctl03.workshop.cloudlab.cz           internal         enabled    :-)   2016-10-14 14:56:58
nova-scheduler   ctl03.workshop.cloudlab.cz           internal         enabled    :-)   2016-10-14 14:56:59

ctl01# source ~/keystonerc
ctl01# nova list

If you don’t see all of the services up and running, like in my case above, just update the allocation ratio, for example, as explained later in this article, and re-run the nova state. Because we changed the configuration, the services will be restarted, which should hopefully fix the issue.

If you want to restart nova you can use the following built-in function

salt 'ctl*' service.restart nova-api

Controllers > Nova > cpu_allocation_ratio

Now imagine you want to update the CPU allocation ratio; with OpenStack Salt it’s pretty easy. You would do the following.

Clone the latest version of your model repository

# cd /srv/salt/reclass
# git remote -v
# git clone 

Look around for the cpu variable

# git grep cpu
nodes/docker/openstack/nova-controller.yml:      cpu_allocation_ratio: 16.0

Add a line like

cpu_allocation_ratio: 1

In the nova:controller section of /srv/salt/reclass/classes/system/openstack/control/workshop.yml
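
In context, the override would sit roughly like this in that class file (the parameters/nova/controller nesting is an assumption based on the git grep output above, so align it with the surrounding keys):

parameters:
  nova:
    controller:
      cpu_allocation_ratio: 1.0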

Create a pull request and test the change locally by just re-running your states. To check what will be changed, you can first run your state in dry-run mode

ctl01# salt-call state.sls nova test=True

Easy isn’t it ?

Controllers > Neutron

Then install the Neutron API layer, which abstracts away the OpenContrail SDN API.

# salt 'ctl*' state.sls neutron

You can try creating a network, a subnet and a floating IP

ctl01# source ~/keystonerc
ctl01# neutron net-create --router:external=true  --shared external 
ctl01# neutron subnet-create external 156.0.0.0/24
ctl01# neutron floatingip-create

OpenContrail

When installing OpenContrail it is recommended to start with the database backends, to avoid any potential conflict when installing everything in one shot. It takes about 5 minutes to install Cassandra, Zookeeper and Kafka.

# salt 'ctl*' state.sls opencontrail.database

Check Cassandra Status

ctl01# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.10.101  68.72 KB   256     66.1%             66b63122-1277-4662-9c9f-b41e8f74fd88  rack1
UN  172.16.10.102  84.7 KB    256     64.8%             9fab997e-536b-4150-8667-3eaac8eb91b9  rack1
UN  172.16.10.103  68.69 KB   256     69.1%             97217604-865e-4387-9531-7f2d3071c21c  rack1

ctl01# nodetool compactionstats
pending tasks: 0

ctl01# nodetool describecluster
Cluster Information:
Name: Contrail
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:

    6403a0ff-f93b-3b1f-8c35-0a8dc85a5b66: [172.16.10.101, 172.16.10.102, 172.16.10.103]

Wait until everything is bootstrapped. In the last command above, make sure the schema version is the same on all nodes and the cluster name is Contrail. If anything is wrong, try the following workaround on the affected controller

ctl01# rm -rf /var/lib/cassandra/*
ctl01# service supervisor-database restart

Now that Cassandra is ready, we need to make sure Zookeeper is ready too. Run the Zookeeper CLI

ctl01# /usr/share/zookeeper/bin/zkCli.sh

list all db in zookeeper

[zk: localhost:2181(CONNECTED) 0] ls /  
[consumers, config, controller, admin, brokers, zookeeper, controller_epoch]

If not all nodes have the same list of db, restart Zookeeper on the corresponding nodes.

ctl01# service zookeeper restart

Check again and if you still have issues you can remove the following directory on the affected node and restart zookeeper from your Salt Master

ctl01# rm -rf /var/lib/zookeeper/version-2
cfg01# salt 'ctl*' cmd.run 'service zookeeper restart'

All our backends are now ready; let’s continue with the rest of OpenContrail

# salt 'ctl*' state.sls opencontrail

When the previous state run ends (it takes time), check the Contrail status

ctl02# contrail-status

Some services must only be active on a single node, you can check which node they run on

# salt 'ctl*' cmd.run 'contrail-status | egrep "svc|device|schema"'

The Contrail Web UI is accessible at https://172.16.10.254:8143, using admin/workshop as the default login/password. A redirection is also in place from HTTP/8080.

Compute node

To provision your first compute node, run its state. We run it locally to get more feedback on what’s really happening live

# ssh 172.16.10.105
cmp01# salt-call state.apply -l info

To get more details on the different states that have been applied

cmp01# salt-call state.show_top

In our workshop lab we have a single NIC, so the Salt state cannot be used to configure the network or it would cut the connection. For a production deployment it is much better to have at least two network cards, one for the data plane and one for management; the data plane network can then be configured by the Salt Master without losing connectivity!

In our case you can configure it manually like this

cmp01# vi /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
pre-up ifconfig eth0 up
post-down ifconfig eth0 down

auto vhost0
iface vhost0 inet static
    pre-up /usr/lib/contrail/if-vhost0
    address 172.16.10.105
    network_name application
    netmask 255.255.255.0
    gateway 172.16.10.1
    dns-nameservers <YOUR DNS SERVER>

Reboot your compute node and check that your IP is now bound to vhost0, which is required for Contrail. Also check the Contrail status

cmp01# contrail-status
<snip>

If the vRouter deployment fails, I mean if it isn’t active in the output above, try the following

cmp01# dpkg -l | grep cloud-init
cmp01# apt-get purge cloud-init
cmp01# reboot [to recreate the networking configuration]

Contrail Post installation

In the above contrail-status command output you should see something like

<snip>
contrail-control    initializing (No BGP configuration for self)
<snip>

It just means that you still need to configure the BGP peers. You can do this from the Contrail UI, but here we’ll be using a provisioning script; run the following commands to register your controllers:

ctl01# /usr/share/contrail-utils/provision_control.py --api_server_ip 172.16.10.254 --api_server_port 8082 --host_name ctl01 --host_ip 172.16.10.101 --router_asn 64512 --admin_password workshop --admin_user admin --admin_tenant_name admin --oper add
ctl01# /usr/share/contrail-utils/provision_control.py --api_server_ip 172.16.10.254 --api_server_port 8082 --host_name ctl02 --host_ip 172.16.10.102 --router_asn 64512 --admin_password workshop --admin_user admin --admin_tenant_name admin --oper add
ctl01# /usr/share/contrail-utils/provision_control.py --api_server_ip 172.16.10.254 --api_server_port 8082 --host_name ctl03 --host_ip 172.16.10.103 --router_asn 64512 --admin_password workshop --admin_user admin --admin_tenant_name admin --oper add

To register your vRouter run

/usr/share/contrail-utils/provision_vrouter.py --host_name cmp01 --host_ip 172.16.10.105 --api_server_ip 172.16.10.254 --oper add --admin_user admin --admin_password workshop --admin_tenant_name admin

Contrail vRouter on your compute node should be active, re-check it with contrail-status.

Adding a Compute node

To check metadata for a compute node you can run reclass

# reclass -n cmp01.workshop.cloudlab.cz

The above metadata is generated from classes/system/reclass/storage/system/workshop.yml

If you want to add compute nodes that aren’t already part of our model, edit it at /srv/salt/reclass/classes/system/reclass/storage/system/workshop.yml and add another compute node by just copying/editing an existing one

cmp03.workshop.cloudlab.cz:
      name: cmp03
      domain: workshop.cloudlab.cz
      classes:
      - system.linux.system.single
      - system.openstack.compute.workshop
      params:
        salt_master_host: ${_param:reclass_config_master}
        single_address: <COMPUTE_IP>
        opencontrail_compute_address: <COMPUTE_IP>
        opencontrail_compute_gateway: 172.16.10.1
        opencontrail_compute_iface: eth0

Run the reclass storage state to generate your new node yaml definition.

cfg01# salt 'cfg01*' state.sls reclass.storage

Controllers > collectd | sensu | heka | heat

Here is the list of controller states

cfg01# salt 'ctl01*' state.show_top
ctl01.workshop.cloudlab.cz:
----------
base:
    - linux
    - ntp
    - salt
    - openssh
    - keepalived
    - haproxy
    - memcached
    - rabbitmq
    - git
    - keystone
    - python
    - collectd
    - sensu
    - heka
    - glusterfs
    - glance
    - nova
    - neutron
    - cinder
    - heat
    - opencontrail
    - galera

So far we’ve applied most of them, apart from collectd, sensu, heka and heat.

Let’s finish our controller deployments by running the highstate on all of them

ctl01# salt-call  state.apply
ctl02# salt-call  state.apply
ctl03# salt-call  state.apply

The Heka, Sensu and collectd configuration files are generated from the metadata of each Salt formula and stored respectively below /etc/heka/conf.d/, /etc/sensu/conf.d and /etc/collectd/conf.d.
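
To see what was generated, you can simply list those directories from the Salt Master:

# salt 'ctl*' cmd.run 'ls /etc/collectd/conf.d /etc/heka/conf.d /etc/sensu/conf.d'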

Salt Master > Horizon

In our workshop lab, our Salt Master is the only node which gets a public IP so this is where Horizon will be deployed to be accessible from the outside world.

Run all states on the Salt Master to deploy everything including Horizon.

cfg01# salt-call state.apply

Horizon should then be available at http://172.16.10.100; the default login/password is admin/workshop.

Metering

Apply the states to your metering node

# salt 'mtr*' state.sls linux,salt,openssh,ntp
# salt 'mtr*' state.highstate

Monitoring

To finish, install the monitoring

# salt 'mon*' state.sls linux,salt,openssh,ntp
# salt 'mon*' state.sls git
# salt 'mon*' state.highstate

Debugging Salt

Salt can be run in verbose or debug mode with -l info or -l debug, for example

cfg01# salt-call state.sls linux -l info

Conclusion

Using the same tooling for deployment and operation is a big advantage when using Mirantis OpenStack Mk.20. Salt’s idempotent formulas to converge nodes based on state, combined with its remote execution capability, offer the best of both worlds. It then becomes possible to add new MySQL databases for new projects or change passwords, as well as deploying the environment from scratch. OpenStack Salt not only deploys OpenStack but also enables life cycle management, which is lacking from many deployment tools around.

Our infrastructure is now versioned and documented, giving us the ability to audit it at any point in time. It’s no longer necessary, and even forbidden, to manually hack on any live system, nor required to introduce any third party tooling to keep the environment in operation.

Today the workflow is based mostly on git and Salt, and individual services can map to bare metal or VMs, but we’ll soon see how Gerrit, Jenkins and Artifactory could be integrated to provide a full CI/CD environment for your private cloud. In the near future, Gerrit will be used to push reviews of infrastructure model updates; it will then trigger Jenkins, which will mirror and merge the git repository on the Salt Master. You can then manually pick any deployment operation you want to execute from Jenkins.

Finally, the same Model Driven Infrastructure is currently evolving to offer the capability to deploy OpenStack on top of Kubernetes.

Stay tuned…