Chef Tutorial For Beginners: Getting Started with an Introduction
Before getting started with Chef itself, we need to understand what “configuration management” is, because Chef falls under the umbrella of “configuration management tools”.
Let me keep it really simple. If you are reading this article, you are very likely managing (or are at least aware of) system configuration by hand: you log in to a server and make the required changes, whether that is creating a user, configuring SSH keys for a user, installing, upgrading or removing packages, modifying configuration files, or any similar operation.
If not purely by hand, you are probably combining manual work with scripts that are specific to your environment.
Scripts and manual methods will get the job done if you only have to manage one or two machines. But what if you have hundreds or thousands of servers to manage? Just imagine the complexity and time involved in setting up and managing that whole fleet with the correct configuration. Imagine needing to delete a particular user from all those servers, or upgrade a package on all of them.
So you need a reliable and consistent method for managing configuration in your environment.
You also need a way to recreate your infrastructure, with all the required servers and their exact configurations, as quickly as possible. Put differently, you should have the source code, or blueprint, to rebuild your entire fleet of servers with exactly the same configuration and no manual effort.
Finally, you need a way to ensure that systems continuously stay in the correct configuration. This is where configuration management tools come into the picture.
With the emergence of public cloud providers such as AWS, Rackspace and Google Cloud, anybody can take advantage of their metered, pay-as-you-go pricing. You can now set up infrastructure in the cloud with ease and with no upfront cost.
The cloud offers cost advantages, flexibility and agility, without the cost and time involved in setting up a datacentre.
Almost all of these public cloud providers offer APIs as well. An API (Application Programming Interface) provides an easy-to-use way to interact with the cloud, so storage, firewalls, servers and similar resources can be created and modified through simple command line utilities that make API calls. This means you can manage these cloud infrastructure components with the same configuration management tools.
Chef, being a configuration management tool, achieves all of the above, both in a physical data centre and in the public cloud, by taking a software development approach: managing things using code. (Do not be intimidated by the word “code” here. You do not need to be an expert programmer to learn and manage infrastructure using Chef.) Hence Chef is commonly associated with the phrase “Infrastructure as Code”.
The main benefit of a configuration management tool like Chef is that the entire infrastructure blueprint can be documented, created and applied to any number of environments. This approach saves a lot of time and effort, and it also helps prevent human error, because every required configuration item is recorded in the blueprint.
What is Chef?
Chef is a configuration management tool written in Ruby and Erlang. It can manage both your on-premises and cloud servers.
You can manage up to 10,000 nodes using Chef. Once infrastructure components are automated via Chef, replicating them is easy.
Chef has three core components which are mentioned below.
- Chef Server: This central server holds all configuration data that the nodes will use for configuration.
- Workstation: This machine holds all the configuration data that can later be pushed to the central Chef server. Several Chef command line utilities are available on this system, which can be used to interact with nodes, update configurations and so on. This is where most of the day-to-day work happens.
- Node: This is a client system that is registered with the central Chef server, from which it pulls the configuration data that needs to be applied.
Let's understand each of the above mentioned components in a bit more detail.
Central Chef Server
This is a centrally located server that holds all the details of the Chef infrastructure. These details include the full metadata of every client automated via Chef, along with all the configurations that apply to the different clients in the architecture.
Chef runs in a client-server mode. Each node has the Chef client software installed, which pulls down the configuration applicable to that node from the central Chef server.
The central Chef server has an optional web interface that gives administrators several management capabilities. Nodes can be deleted, and the configurations applicable to a node can be modified, through this web interface.
There are three different ways to run Chef.
- Chef Solo: Strictly speaking, Chef Solo is not a Chef server. It removes the need for a central Chef server when testing configurations on nodes.
- Open Source Chef: This is the completely free and open source Chef server, which you can install anywhere.
- Hosted Chef: This is a paid offering in which Opscode manages the central Chef server for you, and you access and configure it through the web interface. It frees you from the responsibility of managing a central Chef server yourself.
Workstation
Think of a workstation as a system used to control the central Chef server. As depicted in the above diagram, multiple workstations can together manage a single central Chef server.
Workstations do the following jobs.
- Writing cookbooks and recipes that will later be pushed to the central Chef server.
A cookbook is a unit that configures one particular thing on a node. Think of a cookbook as designed to manage one specific component, service or application. Take installing and configuring a MySQL database server on a node as an example. You would have a MySQL cookbook that takes care of installing the required version, applying the required parameters in the MySQL configuration files, adding MySQL users, creating the required databases, and so on. In short, every aspect of a particular component that needs to be configured on the node can be placed inside a cookbook.
- Managing nodes on the central Chef server.
The workstation has the command line utilities needed to control and manage every aspect of the central Chef server. Adding a new node, deleting a node, modifying node configurations and similar tasks can all be done from the workstation itself.
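To make the idea of a cookbook as "one unit per component" more concrete, here is a minimal sketch in plain Python. It is not Chef's recipe language, and every name in it (`Resource`, `Cookbook`, `plan`, the package name, the file path, the database and user names) is a hypothetical choice for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One thing to configure on a node (hypothetical model, not Chef's API)."""
    kind: str    # e.g. "package", "file", "user"
    name: str    # what the resource manages
    action: str  # desired action, e.g. "install" or "create"


@dataclass
class Cookbook:
    """Groups every resource needed to manage one component."""
    name: str
    version: str
    resources: list = field(default_factory=list)

    def plan(self):
        """Return the steps in the order they are listed in the cookbook."""
        return [f"{r.action} {r.kind} '{r.name}'" for r in self.resources]


# An illustrative MySQL cookbook; all names and paths are assumptions.
mysql = Cookbook(
    name="mysql",
    version="0.1.0",
    resources=[
        Resource("package", "mysql-server", "install"),
        Resource("file", "/etc/mysql/my.cnf", "create"),
        Resource("database", "appdb", "create"),
        Resource("user", "appuser", "create"),
    ],
)

if __name__ == "__main__":
    for step in mysql.plan():
        print(step)
```

The point of the model is only that everything about MySQL lives in one place, so the same unit can be applied to any number of nodes.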
A workstation has two main components.
1. Knife utility: This command line tool is used to communicate with the central Chef server from the workstation. Adding nodes, removing nodes and changing node configurations on the central Chef server are all carried out with knife.
Cookbooks can be uploaded to the central Chef server using knife, and roles and environments can be managed with it too. Essentially every aspect of the central Chef server can be controlled from the workstation using knife.
2. A local Chef repository: This is where every configuration component of the central Chef server is stored. The Chef repository can be synchronized with the central Chef server (again using knife).
Node
A node can be a cloud-based virtual server or a physical server in your own data centre that is managed through the central Chef server. The main component that must be present on the node is an agent that establishes communication with the central Chef server.
This agent is called the Chef client.
The Chef client does the following:
- It is responsible for interacting with the central Chef server.
- It manages the initial registration of the node with the central Chef server.
- It pulls down cookbooks and applies them to configure the node.
- It periodically polls the central Chef server to fetch new configuration items, if any.
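The four duties listed above can be sketched as a toy pull model in plain Python. This is only an in-memory illustration under simplifying assumptions, not how the real Chef client or server is implemented. `ToyServer`, `ToyClient`, `run_lists` and `run_once` are hypothetical names, and a "cookbook" here is reduced to a dictionary of desired settings.

```python
class ToyServer:
    """In-memory stand-in for a central server (hypothetical, not Chef's API)."""

    def __init__(self):
        self.cookbooks = {}  # cookbook name -> dict of desired settings
        self.run_lists = {}  # node name -> list of cookbook names

    def register(self, node_name):
        # Initial registration: the server starts tracking the node.
        self.run_lists.setdefault(node_name, [])

    def cookbooks_for(self, node_name):
        # A node receives only the cookbooks assigned to it.
        return {n: self.cookbooks[n] for n in self.run_lists[node_name]}


class ToyClient:
    """Agent on a node: registers, pulls cookbooks, applies them."""

    def __init__(self, node_name, server):
        self.node_name = node_name
        self.server = server
        self.state = {}  # the node's current configuration
        server.register(node_name)

    def run_once(self):
        """One polling cycle; returns the settings that had to change."""
        changed = {}
        for settings in self.server.cookbooks_for(self.node_name).values():
            for key, value in settings.items():
                if self.state.get(key) != value:
                    self.state[key] = value
                    changed[key] = value
        return changed


if __name__ == "__main__":
    server = ToyServer()
    server.cookbooks["ntp"] = {"ntp_server": "pool.ntp.org"}
    client = ToyClient("web01", server)
    server.run_lists["web01"].append("ntp")
    print(client.run_once())  # first poll applies the cookbook
    print(client.run_once())  # second poll finds nothing new to apply
```

Note that the node always initiates the exchange: the server only stores data and answers requests, which matches the pull-based description above.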
To configure the node according to the cookbooks it pulls down from the central Chef server, the Chef client needs a lot of details about the node: which operating system version it runs, which kernel version is installed, its IP address, its CPU architecture, how many cores are available, its hostname, network interfaces and MAC addresses, the operating system family (for example, whether it is Debian or Red Hat based), the memory available, and many more.
These details are provided to the Chef client by a built-in tool that ships with chef-client, called Ohai. Ohai detects the details of the operating system and provides them as JSON data. Think of it as a node profiler. Ohai provides a command called "ohai" that prints all the details of the system when run. It is possible to run ohai without Chef, but it is generally used along with Chef to gather as much information about a node as possible. All the details provided by Ohai can be accessed via Chef and then used to make configuration decisions on different nodes.
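To give a feel for what a node profiler does, here is a rough sketch that collects a handful of system facts with Python's standard library and prints them as JSON. It is not Ohai and does not reproduce Ohai's output format. The function name `profile_node` and the key names are assumptions made for this example.

```python
import json
import os
import platform
import socket


def profile_node():
    """Collect a few system facts, in the spirit of a node profiler."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),              # e.g. "Linux"
        "kernel_release": platform.release(),
        "architecture": platform.machine(),   # e.g. "x86_64"
        "cpu_cores": os.cpu_count(),          # may be None if undetectable
    }


if __name__ == "__main__":
    # Emit the profile as JSON so other tools can consume it.
    print(json.dumps(profile_node(), indent=2))
```

The real tool gathers far more than this, but the shape is the same: detect facts locally, then expose them as structured data.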
Imagine you have a configuration file that needs the IP address of the system. You cannot hard-code this IP address inside a cookbook on the central Chef server, because other nodes will use the same cookbook. So values like the IP address should be attribute variables in the cookbook, which get replaced by the correct value on each node when chef-client runs (chef-client obtains the node's IP address from Ohai).
You can also make many intelligent decisions based on the details provided by Ohai.
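The two ideas above, filling a shared template with per-node values and making decisions from detected details, can be sketched in a few lines of plain Python. This is not Chef's template mechanism. The attribute names `ipaddress` and `platform_family`, the `bind-address` setting and both helper functions are assumptions chosen for illustration, and the IP addresses come from a range reserved for documentation.

```python
from string import Template

# One shared template, stored once; $ipaddress is filled in per node.
CONFIG_TEMPLATE = Template("bind-address = $ipaddress\n")

# Package name for the Apache web server on each OS family.
WEB_PACKAGES = {"debian": "apache2", "rhel": "httpd"}


def render_config(node):
    """Substitute this node's own attributes into the shared template."""
    return CONFIG_TEMPLATE.substitute(ipaddress=node["ipaddress"])


def web_package(node):
    """Pick a package name based on the detected OS family."""
    return WEB_PACKAGES[node["platform_family"]]


if __name__ == "__main__":
    nodes = [
        {"ipaddress": "192.0.2.10", "platform_family": "debian"},
        {"ipaddress": "192.0.2.20", "platform_family": "rhel"},
    ]
    for node in nodes:
        print(render_config(node), end="")
        print("install", web_package(node))
```

The same template and the same logic run on every node, and only the detected attributes differ, which is why nothing node-specific needs to be hard-coded.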
Why Should I Use Chef?
- You can automate an entire infrastructure using Chef. All tasks that were being done manually can now be done via Chef.
- You can configure thousands of nodes within minutes using Chef.
- Chef automation works with the majority of public cloud offerings.
- Chef does not only automate things. It also keeps systems under constant check and confirms that each system is in fact configured the way it is required to be (the Chef client does this job). If somebody modifies a file by mistake, Chef will correct it.
- An entire infrastructure can be recorded in the form of a Chef repository, which can then be used as a blueprint to recreate the infrastructure from scratch.
Some Working Principles of Chef
Idempotence: This is one of the core principles followed by many configuration management tools, and Chef is no different. The Chef client can run multiple times on a system, and the end result will always be the same. Configuration management tools that adhere to the principle of idempotence do not modify the system if it is already in the expected state. They only change things that differ from the expected state.
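A small sketch may help show what idempotence means in practice. The function below is plain Python, not Chef code, and `ensure_file` is a hypothetical helper: it changes a file only when its content differs from the desired content, so running it repeatedly is safe.

```python
import os
import tempfile


def ensure_file(path, desired):
    """Make the file at `path` contain `desired`; return True if it changed."""
    try:
        with open(path) as fh:
            if fh.read() == desired:
                return False  # already in the expected state: do nothing
    except FileNotFoundError:
        pass  # a missing file is simply "not yet in the expected state"
    with open(path, "w") as fh:
        fh.write(desired)
    return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "motd")
    print(ensure_file(path, "managed\n"))  # first run creates the file
    print(ensure_file(path, "managed\n"))  # second run changes nothing
    with open(path, "w") as fh:
        fh.write("edited by hand\n")       # simulate a manual change
    print(ensure_file(path, "managed\n"))  # the manual change is corrected
```

The function describes the desired end state rather than a sequence of steps, which is what lets a periodic agent both leave correct systems alone and repair ones that have drifted.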
The Chef client acts as a thick client. This means the Chef client does most of the heavy lifting: contacting the central Chef server periodically, downloading the configurations applicable to the node, and compiling those configurations locally on the node. Because compilation is done on the node by the Chef client, the load on the central Chef server is substantially reduced.