Reducing 2AM headaches part 2: Automate
Last time we talked about standard operating environments and how they can reduce stress and give back time. Now we'll talk about managing not only systems but the SOEs themselves. Repeatable, reproducible, and remote are the watchwords for managing our environments.
Automation is no newcomer to the world of system management, but it remains a crucial component. While admins may find reasons not to standardize, they will usually find some way of automating repetitive efforts. A quote that has stuck with me is "a good sysadmin is a lazy sysadmin." The premise is that if you have to do something manually more than once, you aren't really doing your job. This includes backups, redundancy, and documentation, but it is automation that is key to all aspects of this particularly successful brand of laziness.
We all carry our library of scripts from one environment to the next, layering in new methods and skills along the way like coral. While Jane's scripts cover some scenarios and John's may overlap, each goes about things in its own manner, which can cause meaningful differences in functionality or user expectations. Even with a common scripting framework, personal differences in style can lead to divergence. TMTOWTDI ("There's more than one way to do it," pronounced Tim Toady) is a great motto for a programming language but not for an efficient and tight-knit system administration team.
The goals for a successful automation system are consistency and ease of use. Making sure that actions happen in the same manner goes a long way toward controlling divergence. Consistency also helps reduce the side effects that can be created by taking differing paths to the same goal. Ease of use is key to making changes quick and understandable, and also to ensuring that all admins will use the tool rather than implementing their own one-off solutions. If your staff won't use the tool, you've wasted your time and money. With a framework that makes sense and provides a consistent view of the environment, your automation tools will provide real value to your admins and to the business at large.
With our goals in place, what are the selection criteria we need to evaluate a particular tool? These vary from environment to environment, but there are two basic camps that most automation tools can be placed in: generalist versus specialist. The generalist tools are designed to provide 'single pane of glass' coverage and visibility over a large number of components. This single authoritative source can have advantages in consistency by providing admins from different groups with a single workflow. However, these workflows are not always designed from the standpoint of the managed component. Specialist tools are designed to provide highly competent solutions for particular managed components. These tools can prove difficult to integrate, making admins use multiple tools to get work done.
I fall on the side of specialist tools, with a condition: they must be easily integrated with other tools and workflows. There must be an API that is easy to use (our second goal, ease of use) from other tools. Chaining together tools with rich features provides deeper control over environments, allowing for more effective consistency. Recall our last discussion: world+dog SOEs are not effective SOEs, yet generalist tools that lack a deep understanding of a particular OS or application will tend towards those sorts of configurations. A tool like Red Hat's Satellite provides a deep understanding of the installation and provisioning of Red Hat Enterprise Linux, as well as configuration management and update tools. The flexible API allows for integration with other tools.
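To make the idea of chaining specialist tools a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the step functions (provision, configure, register_monitoring) are hypothetical stand-ins for calls to each tool's real API, and the host record is just a dictionary. It is not how Satellite or any other product actually works; it only shows the shape of the integration, where each tool does its own job and hands a consistent record to the next.

```python
from typing import Callable, Dict, List

# A step takes the shared record for a host and returns an updated copy.
Step = Callable[[Dict[str, str]], Dict[str, str]]


def provision(host: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical stand-in for a call to a provisioning tool's API.
    return {**host, "os": "installed"}


def configure(host: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical stand-in for a configuration management tool's API.
    # It refuses to run if the previous tool has not done its part.
    if host.get("os") != "installed":
        raise RuntimeError("cannot configure a host with no OS")
    return {**host, "config": "applied"}


def register_monitoring(host: Dict[str, str]) -> Dict[str, str]:
    # Hypothetical stand-in for a monitoring tool's API.
    return {**host, "monitoring": "registered"}


def run_pipeline(host: Dict[str, str], steps: List[Step]) -> Dict[str, str]:
    # Every host goes through the same steps in the same order,
    # which is where the consistency comes from.
    for step in steps:
        host = step(host)
    return host


if __name__ == "__main__":
    steps = [provision, configure, register_monitoring]
    print(run_pipeline({"name": "web01"}, steps))
```

The point of the design is that no admin needs to remember the order or the details; the pipeline is the single, shared path from a bare host name to a managed system.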
By picking a specialist toolset, you can apply the principles of system management to your SOE development and deployment. Managing the life cycle and definitions of your SOE in a single location will increase the likelihood that the SOEs will get used. Again, ease of use builds traction for the tools as well as the end products. With the right tool, you can manage complete configurations so the flow from bare metal to useful server can happen without intervention. These sorts of tools tend to be less brittle and more adaptable than template-based SOE tools, which need more care and feeding to make updates and changes to the SOE.
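As a rough illustration of managing definitions rather than individual actions, the sketch below compares a desired SOE definition with the current state of a system and derives the actions needed to close the gap. The SystemState class, the plan function, and the package and service names are all hypothetical, invented for this example; a real tool would track far more than two sets of names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class SystemState:
    """Hypothetical, simplified description of a system (or of an SOE)."""
    packages: FrozenSet[str] = frozenset()
    services: FrozenSet[str] = frozenset()


def plan(definition: SystemState, current: SystemState) -> List[Tuple[str, str]]:
    """Return the actions needed to move `current` to `definition`."""
    actions: List[Tuple[str, str]] = []
    # Sorting keeps the plan identical no matter who runs it or when.
    for pkg in sorted(definition.packages - current.packages):
        actions.append(("install", pkg))
    for pkg in sorted(current.packages - definition.packages):
        actions.append(("remove", pkg))
    for svc in sorted(definition.services - current.services):
        actions.append(("enable", svc))
    for svc in sorted(current.services - definition.services):
        actions.append(("disable", svc))
    return actions


if __name__ == "__main__":
    soe = SystemState(
        packages=frozenset({"httpd", "chrony"}),
        services=frozenset({"httpd"}),
    )
    server = SystemState(packages=frozenset({"httpd", "telnet"}))
    for action in plan(soe, server):
        print(action)
```

The admin edits only the definition. When a system already matches it, the plan is empty, so running the tool again changes nothing, and that is what lets the process run without someone watching it.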
Standardization and automation combined will go far to remove needless work from all parts of the admin cycle. Virtual machines have only increased the number of systems we interact with on a regular basis. We have more important things to do than babysit systems that are applying patches or installing operating systems and applications, or hand-edit configuration files. It may not seem like it, but by loosening your grip on the actions and taking hold of the definitions you will gain control over your world. This is the starting premise of the DevOps movement, which I think has potential but needs some more guidance from the Ops half of the world. Get automating and then get involved with the community conversations.