Create a Monitoring Cloud Product with Puppet and Nagios for openQRM Cloud
This How-To describes how to create a Cloud Product for automatic application monitoring of Cloud systems (VMs and Bare-Metal) with Puppet and Nagios 3 in an openQRM Cloud environment.
Requirements
- One (or more) physical Servers
- At least 1 GB of Memory
- At least 100 GB of Diskspace
- VT (Virtualization Technology) enabled in the system's BIOS so that the openQRM Server can run Virtual Machines later
- Minimal Debian installation on a physical server
NOTE:
For this How-To, we assume you have successfully installed openQRM and have at least read through our Cloud Computing with openQRM on Debian How-To. If not, we highly recommend completing both before continuing.
Goal
The Goal for this How-To is to create a "Monitoring" Cloud Product with the following properties:
- It needs to be fully automated without regular manual tasks for the system administrator(s)
- It needs to be "selectable" as a regular Cloud Product in the openQRM Cloud Portal and in openQRM Enterprise Cloud Zones.
- It needs to provide basic system monitoring (ping, ssh), but it should also support a custom Nagios monitoring configuration for all standard applications which are available as Cloud Products, e.g. Apache (http, https, certificate), MySQL (TCP 3306, check_mysql) and other custom Application Stacks added to the openQRM Puppet plugin
- It needs to send out email notifications to the purchaser of the Cloud System (VM or Bare-Metal)
- The dynamically generated monitoring configuration must not affect an existing static monitoring already configured
Idea
Puppet has very good features for automatically configuring Nagios. openQRM Cloud automatically generates the Puppet configuration for a specific Cloud User System according to the purchased application products. As an additional step it should store the Puppet configuration in the Database and with an asynchronous process (e.g. cron) automatically generate a custom monitoring Nagios configuration from the "Stored Configuration" in the Puppet Database.
Prerequisites
Since the "Monitoring" Cloud Product should be "purchasable" by the openQRM Cloud customer like any other Cloud Application Product, it must be defined in the openQRM Cloud Product configuration. The best options for integrating the automated Nagios configuration are Puppet and Ansible. In this How-To we are going to use Puppet.
The next problem is that the Puppet Client, which automatically configures the target Cloud System, runs on the Cloud System itself, while the Nagios Monitoring runs on the openQRM Server. The solution for this is called "Stored Configuration". Therefore, we need to customize the PuppetMaster configuration on the central openQRM Server in the following way:
Note: In the following configuration commands and Puppet classes please replace:
OPENQRM_SERVER_HOSTNAME with your openQRM Server's hostname
SECRET_DB_PASSWORD with your password for the openQRM Database.
OPENQRM_DOMAIN_NAME with your actual configured Domain name in the DNS-Plugin configuration
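The placeholder replacement described in the note above could be scripted roughly as follows. This is only a minimal sketch: fill_placeholders is a hypothetical helper that is not part of openQRM, and it assumes the values contain none of the characters '|', '&' or '\', which have a special meaning for sed.

```shell
#!/bin/bash
# Sketch: print FILE with the three placeholders of this How-To filled in.
# fill_placeholders is a hypothetical helper, not shipped with openQRM.
fill_placeholders() {
    local file="$1" server="$2" password="$3" domain="$4"
    # '|' is the sed delimiter here, so the values must not contain
    # '|', '&' or '\' (assumption of this sketch).
    sed -e "s|OPENQRM_SERVER_HOSTNAME|${server}|g" \
        -e "s|SECRET_DB_PASSWORD|${password}|g" \
        -e "s|OPENQRM_DOMAIN_NAME|${domain}|g" \
        "$file"
}
```

The result is written to standard output, so the original file stays untouched until you redirect the output to a new file and review it.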
1. Install Ruby GEM
wget --proxy http://production.cf.rubygems.org/rubygems/rubygems-2.1.7.tgz
tar -xvzf rubygems-2.1.7.tgz
cd rubygems-2.1.7
ruby setup.rb
2. Install Rails and MySQL
gem install rails -v 2.2.2
gem install mysql -- --with-mysql-config=/usr/bin/mysql_config
3. Create a Database and user for the stored Puppet configuration
create database puppet;
grant all privileges on puppet.* to puppet@localhost identified by 'SECRET_DB_PASSWORD';
flush privileges;
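If you prefer to run step 3 non-interactively, the statements can be generated by a small shell function and piped into the mysql client. This is a sketch under assumptions: puppet_db_sql is a hypothetical helper name, the password must not contain a single quote, and "IF NOT EXISTS" was added here only so that the function can be run more than once.

```shell
#!/bin/bash
# Sketch: print the SQL statements of step 3 for a given password.
# puppet_db_sql is a hypothetical helper name.
puppet_db_sql() {
    local password="$1"
    # The password must not contain a single quote (assumption of this sketch).
    cat <<EOF
CREATE DATABASE IF NOT EXISTS puppet;
GRANT ALL PRIVILEGES ON puppet.* TO puppet@localhost IDENTIFIED BY '${password}';
FLUSH PRIVILEGES;
EOF
}
```

A possible call would be: puppet_db_sql 'SECRET_DB_PASSWORD' | mysql -u root -p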
4. Configure the Puppet Master for stored configuration
Create a local working copy of the openQRM puppet.conf
cp /usr/share/openqrm/plugins/puppet/web/puppet/puppet.conf puppet.conf
Then please adapt it as shown below:
[master]
templatedir=/var/lib/puppet/templates
storeconfigs = true
dbadapter = mysql
dbuser = puppet
dbpassword = SECRET_DB_PASSWORD
dbserver = localhost
dbsocket = /var/lib/mysql/mysql.sock
dbname = puppet
[agent]
classfile = $vardir/classes.txt
server = OPENQRM_SERVER_HOSTNAME
Now copy the puppet.conf back into openQRM:
cp puppet.conf /usr/share/openqrm/plugins/puppet/web/puppet/puppet.conf
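Before restarting the Puppet Master it can help to confirm that the stored configuration settings really ended up in the [master] section. The following is a minimal sketch of such a check; conf_value is a hypothetical helper written for this How-To and only understands the simple "key = value" layout shown above.

```shell
#!/bin/bash
# Sketch: print the value of KEY inside [SECTION] of an ini-style file.
# Usage: conf_value FILE SECTION KEY   (hypothetical helper)
conf_value() {
    awk -F'=' -v section="[$2]" -v key="$3" '
        # strip surrounding whitespace to detect section headers
        { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }
        substr(line, 1, 1) == "[" { current = line; next }
        current == section && NF >= 2 {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) {
                # everything after the first "=" is the value
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
            }
        }
    ' "$1"
}
```

For example, conf_value /usr/share/openqrm/plugins/puppet/web/puppet/puppet.conf master storeconfigs should print "true" once the file has been copied back.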
5. Restart Puppet and check its new functionality
/etc/init.d/puppetmaster restart
After some time, several new tables are created in the Puppet Database to store the Puppet configuration.
Thus, the prerequisites have been completed.
Puppet recipe "c_monitoring" for the Cloud Server
Due to the existing Puppet integration within openQRM, we only need to create a suitable Puppet class.
The first part of the following c_monitoring Puppet class generates the basic monitoring (ssh, ping) for the target Cloud Server.
The second part, starting with "if defined(Class[...])", generates the specific Application Monitoring.
class c_monitoring {
# define Server, contactgroup and base service checks in Nagios
@@nagios_host { "hostdefinition_${hostname}":
use => "default-template",
host_name => "${hostname}",
alias => "${fqdn}",
address => "${ipaddress_eth1}",
check_command => "check-host-alive",
max_check_attempts => "3",
#checks_enabled => "1",
notification_interval => "60",
target => "/etc/nagios/conf.d/${hostname}_host_definition.cfg"
}
@@nagios_contactgroup { "contactgroup_${hostname}":
contactgroup_name => "contactgroup_${hostname}",
alias => "${hostname} Admins",
members => "nagiosadmin",
target => "/etc/nagios/conf.d/${hostname}_contactgroup_definition.cfg"
}
@@nagios_service { "check_ping_${hostname}":
use => "default-service-template",
host_name => "${hostname}",
service_description => "check_ping",
check_command => "check_ping!100.0,20%!500.0,60%",
max_check_attempts => "3",
normal_check_interval => "5",
retry_check_interval => "1",
check_period => "24x7",
notification_interval => "60",
notification_period => "24x7",
notification_options => "c,w,u,r",
contact_groups => "contactgroup_${hostname}",
target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
}
@@nagios_service { "check_ssh_${hostname}":
use => "default-service-template",
host_name => "${hostname}",
service_description => "check_ssh",
check_command => "check_ssh",
max_check_attempts => "3",
normal_check_interval => "5",
retry_check_interval => "1",
check_period => "24x7",
notification_interval => "60",
notification_period => "24x7",
notification_options => "c,w,u,r",
contact_groups => "contactgroup_${hostname}",
target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
}
# Monitoring for Apache
if defined(Class[webserver]) or defined(Class[lamp]) {
@@nagios_service { "check_http_${hostname}":
use => "default-service-template",
host_name => "${hostname}",
service_description => "check_http",
check_command => "check_http",
max_check_attempts => "3",
normal_check_interval => "5",
retry_check_interval => "1",
check_period => "24x7",
notification_interval => "60",
notification_period => "24x7",
notification_options => "c,w,u,r",
contact_groups => "contactgroup_${hostname}",
target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
}
@@nagios_service { "check_https_${hostname}":
use => "default-service-template",
host_name => "${hostname}",
service_description => "check_https",
check_command => "check_https",
max_check_attempts => "3",
normal_check_interval => "5",
retry_check_interval => "1",
check_period => "24x7",
notification_interval => "60",
notification_period => "24x7",
notification_options => "c,w,u,r",
contact_groups => "contactgroup_${hostname}",
target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
}
}
# Monitoring for Mysql
if defined(Class[database-server]) or defined(Class[lamp]) {
@@nagios_service { "check_mysql_${hostname}":
use => "default-service-template",
host_name => "${hostname}",
service_description => "check_mysql",
check_command => "check_mysql",
max_check_attempts => "3",
normal_check_interval => "5",
retry_check_interval => "1",
check_period => "24x7",
notification_interval => "60",
notification_period => "24x7",
notification_options => "c,w,u,r",
contact_groups => "contactgroup_${hostname}",
target => "/etc/nagios/conf.d/${hostname}_service_definition.cfg"
}
}
}
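The class above writes its exported resources to three target files per Cloud Server, named after the hostname. As a quick sanity check on the openQRM Server you could list which of these files are still missing for a given host. This is a minimal sketch; expected_cfg_files and missing_cfg_files are hypothetical helper names, and the file names are taken from the "target" parameters of the class.

```shell
#!/bin/bash
# Sketch: check for the files generated from the c_monitoring class.
NAGIOS_CONF_DIR="${NAGIOS_CONF_DIR:-/etc/nagios/conf.d}"

# Print the three target files used by c_monitoring for one hostname.
expected_cfg_files() {
    local host="$1" kind
    for kind in host contactgroup service; do
        echo "${NAGIOS_CONF_DIR}/${host}_${kind}_definition.cfg"
    done
}

# Print only those target files that do not exist yet.
missing_cfg_files() {
    expected_cfg_files "$1" | while IFS= read -r f; do
        [ -f "$f" ] || echo "$f"
    done
}
```

An empty output of missing_cfg_files for a hostname means all three files have been collected for that Cloud Server.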
Puppet recipe for the Puppet Master Server (openQRM)
The following Puppet recipe generates the Nagios configuration from the Puppet configuration stored in the Database, enriches it with the Cloud user name and email, and triggers a Nagios reload. Since this runs periodically, every 20 minutes, it does not affect other monitoring processes.
class nagios {
Nagios_host <<||>> {
notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
}
Nagios_service <<||>> {
notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
}
Nagios_contactgroup <<||>> {
notify => [Exec[gen_nagios_contacts],Exec[make-nag-cfg-readable],Service[nagios]],
}
# get the Cloud User contact and email from the openQRM Database
exec { 'gen_nagios_contacts':
command => "/app/openqrm/tools/gen_nagios_contacts",
cwd => "/app/openqrm/tools",
path => "/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/global/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
}
# make the Nagios configuration readable
exec {'make-nag-cfg-readable':
command => "find /etc/nagios -type f -name '*cfg' | xargs chmod +r",
cwd => "/etc/nagios",
path => "/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/global/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin",
require => Exec[gen_nagios_contacts],
}
# Nagios Service
service { nagios:
ensure => running,
enable => true,
require => Exec[make-nag-cfg-readable],
}
}
node 'OPENQRM_SERVER_HOSTNAME' {
include nagios
}
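The recipe above does not say how the 20 minute interval is scheduled. One possible way, sketched here as an assumption and not taken from the openQRM setup, is a cron entry that runs the Puppet agent once on the openQRM Server. puppet_cron_entry is a hypothetical helper name, and the line uses the /etc/cron.d format with a user field.

```shell
#!/bin/bash
# Sketch: print a cron line (in /etc/cron.d format) that runs the Puppet
# agent once every N minutes. The schedule mechanism is an assumption.
puppet_cron_entry() {
    local minutes="${1:-20}"
    echo "*/${minutes} * * * * root puppet agent --onetime --no-daemonize"
}
```

The output could be redirected into a file below /etc/cron.d; the file name is up to you.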
Shell script "gen_nagios_contacts" for generating the Cloud user contacts
#!/bin/bash
NAGIOS_CONF_DIR="/etc/nagios/conf.d"
MYSQL="/usr/bin/mysql -uroot -pSECRET_DB_PASSWORD openqrm -s -N"
for DATEI in ${NAGIOS_CONF_DIR}/*_contactgroup_definition.cfg
do
HOSTNAME="`basename ${DATEI}|cut -d'_' -f1`"
USERNAME="`echo "select u.cu_name from cloud_requests r INNER JOIN cloud_users u ON r.cr_cu_id = u.cu_id where cr_status='3' AND cr_appliance_hostname = '${HOSTNAME}';"|${MYSQL}`"
EMAIL="`echo "select u.cu_email from cloud_requests r INNER JOIN cloud_users u ON r.cr_cu_id = u.cu_id where cr_status='3' AND cr_appliance_hostname = '${HOSTNAME}';"|${MYSQL}`"
if [ "X${USERNAME}" == "X" ]
then
rm /etc/nagios/conf.d/${HOSTNAME}_*
continue
fi
if [ "X${EMAIL}" == "X" ]
then
rm /etc/nagios/conf.d/${HOSTNAME}_*
continue
fi
echo "
define contact{
contact_name ${USERNAME}
use generic-contact
alias Contact for Appliance ${HOSTNAME}
email ${EMAIL}
}
" > ${NAGIOS_CONF_DIR}/user_${USERNAME}.cfg
echo "
define contactgroup {
members nagiosadmin, ${USERNAME}
contactgroup_name contactgroup_${HOSTNAME}
alias ${HOSTNAME} Admins
}
" >${DATEI}
done
# allow Nagios to re-read its configuration
chmod -R +r /etc/nagios/conf.d
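One detail of the script above deserves a closer look: the hostname is derived from the file name by cutting at the first underscore. The sketch below isolates that step and shows an alternative which strips the known suffix instead. Both function names are hypothetical. The alternative only matters if hostnames may contain underscores, which is an assumption about your environment.

```shell
#!/bin/bash
# Sketch: two ways to derive the hostname from a contactgroup file name.

# Same logic as in gen_nagios_contacts: everything before the first '_'.
host_from_cfg() {
    basename "$1" | cut -d'_' -f1
}

# Alternative: strip the fixed suffix, so underscores in the name survive.
host_from_cfg_suffix() {
    local name
    name=$(basename "$1")
    echo "${name%_contactgroup_definition.cfg}"
}
```

For a file named web01_contactgroup_definition.cfg both variants return web01. For db_01_contactgroup_definition.cfg the first returns db and the second db_01.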
Automatic removal of the Monitoring-configuration
To remove the Monitoring-configuration of a Server when it is stopped, the "/usr/share/openqrm/plugins/ip-mgmt/web/openqrm-ip-mgmt-external-dns-hook.php" hook from the IP-Management plugin can be used by adding:
remove)
shift
remove_dns_ptr_record $@
remove_dns_a_record $@
rm /etc/nagios/conf.d/${1}_*
/etc/init.d/nagios reload
/app/openqrm/tools/kill_node_in_storedconfigs_db.rb ${1}.OPENQRM_DOMAIN_NAME
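The rm call in the hook above expands to a pattern starting with "_" if the hostname argument is ever empty. A slightly more defensive variant could look like the following sketch. remove_monitoring_cfg is a hypothetical helper, not part of the IP-Management plugin, and the second argument exists only to make the directory configurable.

```shell
#!/bin/bash
# Sketch: remove the generated Nagios files of one host, but refuse to
# run with an empty hostname. Hypothetical helper, not part of openQRM.
remove_monitoring_cfg() {
    local host="$1" conf_dir="${2:-/etc/nagios/conf.d}"
    if [ -z "$host" ]; then
        echo "remove_monitoring_cfg: empty hostname, nothing removed" >&2
        return 1
    fi
    # The underscore after the hostname keeps web1 from matching web10.
    rm -f "${conf_dir}/${host}"_*
}
```

Inside the hook it would be called as remove_monitoring_cfg "${1}" in place of the plain rm line.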
Here is the content of the /app/openqrm/tools/kill_node_in_storedconfigs_db.rb tool:
#!/usr/bin/env ruby
require 'puppet/rails'
Puppet[:config] = "/etc/puppet/puppet.conf"
Puppet.parse_config
pm_conf = Puppet.settings.instance_variable_get(:@values)[:master]
adapter = pm_conf[:dbadapter]
args = {:adapter => adapter, :log_level => pm_conf[:rails_loglevel]}
case adapter
when "sqlite3"
args[:dbfile] = pm_conf[:dblocation]
when "mysql", "postgresql"
args[:host] = pm_conf[:dbserver] unless pm_conf[:dbserver].empty?
args[:username] = pm_conf[:dbuser] unless pm_conf[:dbuser].empty?
args[:password] = pm_conf[:dbpassword] unless pm_conf[:dbpassword].empty?
args[:database] = pm_conf[:dbname]
socket = pm_conf[:dbsocket]
args[:socket] = socket unless socket.empty?
else
raise ArgumentError, "Invalid db adapter %s" % adapter
end
ActiveRecord::Base.establish_connection(args)
if @host = Puppet::Rails::Host.find_by_name(ARGV[0].strip)
print "Killing #{ARGV[0]}..."
$stdout.flush
@host.destroy
puts "done."
else
puts "Can't find host #{ARGV[0]}."
end
Post Configuration
The following post configuration must be done at the end:
- Adding the configuration for the corresponding Nagios check Command Definitions
- Adding the new "Monitoring" Cloud Product to the openQRM Cloud via the openQRM Cloud Product Manager
- In our test on Redhat 6 the reload action in the init script /etc/init.d/nagios was faulty. Please make sure the "reload" action in the Nagios init script works correctly.
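Since a broken reload can leave Nagios without the new configuration, it may be worth verifying the configuration first and reloading only on success. The following is a minimal sketch. safe_nagios_reload is a hypothetical helper, and the default path /etc/nagios/nagios.cfg is an assumption that has to match your installation.

```shell
#!/bin/bash
# Sketch: reload Nagios only if its configuration check passes.
# Both commands can be overridden; the defaults are assumptions.
safe_nagios_reload() {
    local verify_cmd="${1:-nagios -v /etc/nagios/nagios.cfg}"
    local reload_cmd="${2:-/etc/init.d/nagios reload}"
    # The commands are expanded unquoted on purpose so that they are
    # split into program and arguments.
    if $verify_cmd >/dev/null 2>&1; then
        $reload_cmd
    else
        echo "Nagios configuration check failed, reload skipped" >&2
        return 1
    fi
}
```

If the init script's reload action does not work on your system, a different command such as a restart can be passed as the second argument.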
Congratulations!!
You have successfully completed this How-To!