Chris Bugg

3 - Ansible

(2021-04-12)

With today's announcement that Cloudflare Pages is finally available to everyone, it might already be time to take another look at this site's backend, but we'll save that for another week!

This week it's all about Ansible, and how I've come to rely on it for so many things that used to be handled by manual and/or fragile tools and processes.

Quick overview: Ansible is an automation tool for IT and other stuff. It takes all those fragile bash scripts and complex configurations that are only stored on end-systems and brings them back to a central home that's easy to version-control, QA, and share. In my case I use it for my personal computers, so there's very little version-control, QA, or sharing, but I do enjoy a lot of the other benefits (like cleaning up my bad habits)!

How it works: Ansible is open source but is generally used as a 'binary' (pip package) that you run on the machine you use to connect to and control whatever other machines you're managing. There are no agents, so it relies on direct access and a lot of homogeneity in the machines to work best. Generally this works well with flat internal networks or cloud-based servers that all leave SSH open to the world and run the same sort of OS and services ("Cattle, not pets"). The 'binary' essentially handles running all those one-off (then multi-reused) bash scripts (re-'written' in YAML, mostly) and abstracts away a lot of the pain of running five commands against 20 servers simultaneously.
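
To make that a little more concrete, here is a minimal sketch of what one of those YAML files can look like. It is only an illustration, not one of the playbooks discussed in this post, and the file name and task names are made up. It uses Ansible's built-in ping module (which checks that Ansible can log in over SSH and find a usable Python on the other end, not ICMP) and then runs a single command on every machine in the inventory.

```yaml
---
# check_hosts.yml (hypothetical file name, for illustration only)
- name: Check that every machine is reachable
  hosts: all
  tasks:
    - name: Confirm SSH login and a usable Python on each host
      ansible.builtin.ping:

    - name: Run one command everywhere
      ansible.builtin.command: uptime
      changed_when: false  # uptime only reads state, so never report a change
```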

Now that we're done with my (probably) poor interpretation of Ansible, let's see some examples where it's useful for me. I used to run about 20 servers of various purposes at home (mostly on an old tower-turned-ESXi-host). This made it a nightmare to do simple things that need to happen frequently, like just running apt update on all the servers every week. This obviously led to a worse overall situation, since many servers just sat there running outdated or vulnerable software with inconsistent security setups. After too many years, I finally decided to explore some of the IaC solutions, and settled on Ansible. It was simple, open source, and had a large user base (read: easier to find solutions to edge cases).

Over the past several years I haven't really embraced Ansible full-on, but what it's done for me is make it not matter how many machines I'm managing along the way, because it just...scales! Nowadays the ESXi server is gone and I'm managing a handful of Raspberry Pis and some cloud-based VPSs. While the number and type of servers have changed, the management is the same. Every week or so I open up a terminal, cd over into my ansible folder, and run ansible-playbook -i hosts.yml maintenance/update_all.yml. After a few minutes, the command completes and I have a warm-fuzzy feeling that everything's good, sans all the time it would take me to do that manually.

Let's break that down!

ansible-playbook: This is just specifically telling Ansible that I'm running a 'playbook' (series of commands). Personally I just throw everything in playbooks because I don't have a need for the other complexities that come with a lot of the other bells-and-whistles that Ansible has.
-i hosts.yml: This tells Ansible to use this specific list of machines instead of any default ones (those generally reside outside my folder structure). In my case my hosts.yml file looks something like this:
<hosts.yml>

    [ubuntu_server_16]
    #plex
    #minecraft

    [ubuntu_server_20]
    #nginx
    #postgresql
    123.456.789.123 ansible_user=ubuntu
    456.789.123.456 ansible_user=ubuntu
    #motioneye
    #matrix

    # perhaps no longer needed due to new auto-interpreter detection?
    [ubuntu_server_20:vars]
    ansible_python_interpreter=/usr/bin/python3

    [raspberry_pi_os]
    raspberrypi ansible_user=pi

It's mostly full of old machines that should be deleted, but the point is that it's easy to add new machines, remove old ones from the inventory (even if I don't), group machines, and set special cases (like using a specific python path for Ubuntu 20 machines).
maintenance/update_all.yml: And here's our re-written bash-script! Let's take a look at it:
<update_all.yml>

    ---
    - hosts: all
      become: yes
      tasks:
        - name: update and upgrade
          apt:
            update-cache: yes
            upgrade: yes
        - name: apt autoremove
          command: apt-get -y autoremove
          args:
            warn: false

As you can see, we aren't just doing apt update anymore; we threw in some more fun as well! At the top there's some small administrative stuff: hosts: all (apply this to all hosts in the supplied hosts file; this could also be a specific host or a group, by name) and become: yes (use sudo to become root). We start the real meat-and-potatoes with a 'task' (a single thing that's being done, like an apt command) which uses the apt module. This is just a wrapper around apt, but it means that we don't have to worry as much about the specifics of how each version of apt behaves on different machines or in the future. We just say "use apt and do these things with it" and Ansible takes care of the rest. In this first example we're doing two things with apt: updating its cache, and upgrading any packages that we now have newer versions of after the update. The equivalent is to run apt update && apt upgrade. Then we do something different with apt. Instead of using the Ansible wrapper, we run it as a command in the shell directly. This is a great example of how, when Ansible's built-in modules run dry (there was no support for the 'autoremove' flag when I wrote this, but there is now!), you can just throw in that bash command!
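
Since the apt module now supports autoremove, the same playbook could probably be written without the raw command. The following is a sketch, not the playbook shown above: it assumes a reasonably recent Ansible where the module can be addressed as ansible.builtin.apt, and the file name is made up for illustration.

```yaml
---
# update_all_modules_only.yml (hypothetical name): same idea, no raw command
- hosts: all
  become: yes
  tasks:
    - name: update and upgrade
      ansible.builtin.apt:
        update_cache: yes
        upgrade: yes   # same as "safe"; "dist" would do a dist-upgrade instead

    - name: apt autoremove
      ansible.builtin.apt:
        autoremove: yes
```

A side benefit of staying inside the module is that the task should only report "changed" when packages were actually removed, whereas a plain command task reports "changed" on every run unless told otherwise.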

Sometimes it's not the regular things that Ansible helps a lot with, but the once-in-a-while things! One of the biggest problems with new machines is doing all the ten thousand little things to lock them down and make sure they're ready to go in terms of security. In the previous incarnation of this blog I had a post that detailed every step I would/should take to secure a brand-new (post-install) machine. This was a very handy reference, but in the end I still had to do all that configuring by hand, and so it didn't always get done. Now I use Ansible, and every new machine is locked down with a few keystrokes:

1. Add the new machine to hosts.yml
2. Run ansible-playbook -i hosts.yml fresh_box/setup_fresh_box.yml
3. All machines (including the brand-new one) are secured!

One of the concepts Ansible tries hard to push is modularization. Each thing in its place. I find this flows well into the new-machine problem. For each new machine there isn't a list of static commands that need to happen, but rather a list of things that should change, and this is how it's represented in my Ansible files:
<setup_fresh_box.yml>

    ---
    - import_playbook: setup_passwordless_sudo.yml
    - import_playbook: disable_password_auth.yml
    - import_playbook: setup_unattended_upgrades.yml
    - import_playbook: setup_fail2ban.yml

Looking at setup_fresh_box.yml, it doesn't actually contain any tasks or anything! It's really just a reference to a bunch of other playbooks that each do their own thing but, together, set up a fresh box (server). Each of these separate playbooks is totally independent and can be run at will against any (or all) of the hosts.

Here's disable_password_auth.yml as an example of one of these playbooks:
<disable_password_auth.yml>

    ---
    - name: Log in as sudo user to disable root
      hosts: all
      become: yes
      tasks:
        - name: Disable root login over SSH
          lineinfile: dest=/etc/ssh/sshd_config regexp="^PermitRootLogin" line="PermitRootLogin no" state=present
          notify:
            - restart sshd
        - name: Disable password login
          lineinfile: dest=/etc/ssh/sshd_config regexp="^PasswordAuthentication" line="PasswordAuthentication no" state=present
          notify:
            - restart sshd
      handlers:
        - name: restart sshd
          service:
            name: sshd
            state: restarted

Looking at it now, it's probably a bit mislabeled, as it does more than just disable password auth, but at least the sshd changes are grouped in one place. A new feature in this playbook is the concept of handlers. As we see at the bottom, a handler is just a task that only runs when another task notifies it, and only if that task actually changed something. In this case it restarts the sshd service. Both tasks notify it, but Ansible holds notified handlers until the end of the play and runs each one once, so sshd gets a single restart after both changes are made rather than one per change. There's a lot more we could do with this!
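
One example of doing more: lineinfile has a validate option, which runs a check against the edited copy of the file before it replaces the real one. Here is a sketch of the first task rewritten that way, using the YAML dictionary form instead of the key=value one-liner. The sshd binary path and the use of sshd -t as the check are assumptions about the target machines, not something taken from the playbook above.

```yaml
---
- name: Harden sshd, checking the config before it is saved
  hosts: all
  become: yes
  tasks:
    - name: Disable root login over SSH
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: "^PermitRootLogin"
        line: "PermitRootLogin no"
        state: present
        # Assumption: sshd lives at this path; %s is the temporary copy being checked
        validate: /usr/sbin/sshd -t -f %s
      notify:
        - restart sshd

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted
```

If the check fails, the task fails and the original sshd_config is left untouched, which is a nice safety net when the file being edited is the one that lets you log in.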

So, Ansible is great! I'm very happy that I got into it when I did, and firmly believe it's made my system management easier and faster. If you're managing more than a couple of systems, it's well worth the time to research these IaC options. Ansible isn't the only option, and there are pros and cons to each. Find the one that will fit your work best!

Happy Hacking!
- Chris
