Using MPT-Status for RAID Monitoring in a Poweredge C6100 with Perc 6
Using MPT-Status for RAID Monitoring in a Poweredge C6100 with Perc 6
This post outlines the steps needed to get a CLI report of the conditions of your RAIDs in a Poweredge C6100 with a PERC 6/i RAID Controller.
Verify your controller type:
cat /proc/scsi/mptsas/0
ioc0: LSISAS1068E B3, FwRev=011b0000h, Ports=1, MaxQ=277
Download the following packages:
daemonize-1.5.6-1.el5.i386.rpm mpt-status-1.2.0-3.el5.centos.i386.rpm lsscsi-0.17-3.el5.i386.rpm
http://dl.nux.ro/utils/mpt-status/mpt-status-1.2.0-3.el5.centos.i386.rpm
http://dl.nux.ro/utils/mpt-status/daemonize-1.5.6-1.el5.i386.rpm
http://mirror.centos.org/centos/5/os/i386/CentOS/lsscsi-0.17-3.el5.i386.rpm
Install mtp-status:
rpm -ivh mpt-status-1.2.0-3.el5.centos.i386.rpm daemonize-1.5.6-1.el5.i386.rpm lsscsi-0.17-3.el5.i386.rpm
modprobe mptctl
echo mptctl >> /etc/modules
Verify your modules:
lsmod |grep mpt
mptctl 90739 0
mptsas 57560 4
mptscsih 39876 1 mptsas
mptbase 91081 3 mptctl,mptsas,mptscsih
scsi_transport_sas 27681 1 mptsas
scsi_mod 145658 7 mptctl,sg,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod
run:
mpt-status or mpt-status -n -s
Also, you can use: lsscsi -l
This little script:
echo `mpt-status -n -s|awk ‘/OPTIMAL/ {print $1, “OK”}; /ONLINE/ {print $1, “OK”}; /DEGRADED/ {print $1, “FAILURE”}; /scsi/ {print $2}; /MISSING/ {print $1, “FAILURE”} ‘`
reports:
vol_id:0 OK phys_id:1 OK phys_id:0 OK 100% 100%
On a rebuild, it reports:
vol_id:0 FAILURE phys_id:2 OK phys_id:3 OK 75% 75%
Copy that script into a file called “check_raid”, and make it executable, E.G. 755
Edit nagios-statd on parcel1. Replace “sudo /customcommands/check_raid.pl -b -w1 -c1” with filename check-raid (without the switches) at line 20, and remove “sudo”
So, from this:
commandlist[‘Linux’] = (“df -P”,”who -q | grep “#””,”ps ax”,”uptime”,”free | awk ‘$1~/^Swap:/{print ($3/$2)*100}'”,”sudo /customcommands/check_raid.pl -b -w1 -c1″)
To this:
commandlist[‘Linux’] = (“df -P”,”who -q | grep “#””,”ps ax”,”uptime”,”free | awk ‘$1~/^Swap:/{print ($3/$2)*100}'”,”/customcommands/check_raid”)
Port 1040 will need to be opened in XenServer. Edit /etc/sysconfig/iptables and insert this line:
-A RH-Firewall-1-INPUT -p tcp -m tcp –dport 1040 -j ACCEPT
Restart the firewall:
service iptables restart
Output:
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]
Applying iptables firewall rules: [ OK ]
Loading additional iptables modules: ip_conntrack_netbios_n[FAILED]
NOTE: The “FAILED” error above doesn’t seem to be a problemVerify that port 1040 is open:
Check the status of port 1040:
service iptables status
Output:
Table: filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
1 ACCEPT 47 — 0.0.0.0/0 0.0.0.0/0
2 RH-Firewall-1-INPUT all — 0.0.0.0/0 0.0.0.0/0
Chain FORWARD (policy ACCEPT)
num target prot opt source destination
1 RH-Firewall-1-INPUT all — 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
num target prot opt source destination
Chain RH-Firewall-1-INPUT (2 references)
num target prot opt source destination
1 ACCEPT all — 0.0.0.0/0 0.0.0.0/0
2 ACCEPT icmp — 0.0.0.0/0 0.0.0.0/0 icmp type 255
3 ACCEPT esp — 0.0.0.0/0 0.0.0.0/0
4 ACCEPT ah — 0.0.0.0/0 0.0.0.0/0
5 ACCEPT udp — 0.0.0.0/0 224.0.0.251 udp dpt:5353
6 ACCEPT udp — 0.0.0.0/0 0.0.0.0/0 udp dpt:631
7 ACCEPT tcp — 0.0.0.0/0 0.0.0.0/0 tcp dpt:631
8 ACCEPT tcp — 0.0.0.0/0 0.0.0.0/0 tcp dpt:1040
9 ACCEPT all — 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED
10 ACCEPT udp — 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:694
11 ACCEPT tcp — 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22
12 ACCEPT tcp — 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80
13 ACCEPT tcp — 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443
14 REJECT all — 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited
running “nagios-statd” opens port 1040 on Parcel1 and listens for commands to be initiated by nagios_stat on the nagios server.
On the nagios server, in a file called “remote.orig.cfg, there are commands defined using “nagios-stat”: NOTE: These are from a working server and haven’t been modified to work with mpt. Some changes may need to be made. This is just an example of the interaction between Nagios server and client
Example:
define command{
command_name check_remote_raid
command_line $USER1$/nagios-stat -w $ARG1$ -c $ARG2$ -p $ARG3$ raid $HOSTADDRESS$
}
This command defined above is used in the “services.cfg” file.
Example:
define service{
use matraex-template
host_name mtx-lilac
service_description Lilac /data Raid
check_command check_remote_raid!1!1!1040
The three files needed on the C6100 node are:
/customcommands/check_raid (contents below) -rwxr-xr-x
/customcommands/nagios-statd (contents below) -rwxr-xr-x
/etc/init.d/nagios-statd (contens below) -rwxr–r–
Creating the soft links:
ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/K01nagios-statd
ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/S99nagios-statd
The -s = soft, and -f if used, forces overwrite.
/rc3.d/ designates runlevel 3
So when you do this:
ls -lt /customcommands/nagios-statd /etc/init.d/nagios-statd /customcommands/check_raid /etc/rc.d/rc3.d/*nagios-statd
This is what you should see:
lrwxrwxrwx 1 root root 22 Mar 6 08:08 /etc/rc.d/rc3.d/K01nagios-statd -> ../init.d/nagios-statd
-rwxr-xr-x 1 root root 365 Mar 6 07:59 /customcommands/check_raid
lrwxrwxrwx 1 root root 22 Mar 6 07:52 /etc/rc.d/rc3.d/S99nagios-statd -> ../init.d/nagios-statd
-rwxr-xr-x 1 root root 649 Mar 6 07:51 /etc/init.d/nagios-statd
-rwxr-xr-x 1 root root 9468 Mar 5 12:05 /customcommands/nagios-statd
Script Files:
NOTE: Here’s a little fix that helped me out. I had originally pasted these scripts into a DOS/Windows editor (wordpad) and it added DOS-type returns to the file, resulting in an error:
-bash: ./nagios-statd: /bin/sh^M: bad interpreter: No such file or directory
If you encounter this, do this:
Open the file in vi
hit “:” to go into command mode
enter “set fileformat=unix”
then :wq to quit.
/customcommands/check_raid:
#!/bin/bash
EXECFILE=/usr/sbin/mpt-status
if [ ! -e $EXECFILE ] ; then
echo
echo “Error $EXECFILE is not installed, please install before running”
echo
echo “Usage $0”;
echo
exit 10
fi
echo `$EXECFILE -n -s|awk ‘/OPTIMAL/ {print $1, “OK”}; /ONLINE/ {print $1, “OK”}; /DEGRADED/ {print $1, “FAILURE”}; /scsi/ {print $2};
/MISSING/ {print $1, “FAILURE”} ‘`
/customcommands/nagios_statd
#!/usr/bin/python
import getopt, os, sys, signal, socket, SocketServer
class Functions:
“Contains a set of methods for gathering data from the server.”
def __init__(self):
self.nagios_statd_version = 3.09
# As of right now, the commands are for df, who, proc, uptime, and swap.
commandlist = {}
commandlist[‘AIX’] = (“df -Ik”,”who | wc -l”,”ps ax”,”uptime”,”lsps -sl | grep -v Paging | awk ‘{print $2}’ | cut -f1 -d%”)
commandlist[‘BSD/OS’] = (“df”,”who | wc -l”,”ps -ax”,”uptime”,None)
commandlist[‘CYGWIN_NT-5.0’] = (“df -P”,None,”ps -s -W | awk ‘{printf(“%6s%6s%3s%6s%sn”,$1,$2,” S”,” 0:00″,substr($0,22))}'”,None,None)
commandlist[‘CYGWIN_NT-5.1’] = commandlist[‘CYGWIN_NT-5.0’]
commandlist[‘FreeBSD’] = (“df -k”,”who | wc -l”,”ps ax”,”uptime”,”swapinfo | awk ‘$1!~/^Device/{print $5}'”)
commandlist[‘HP-UX’] = (“bdf -l”,”who -q | grep “#””,”ps -el”,”uptime”,None)
commandlist[‘IRIX’] = (“df -kP”,”who -q | grep “#””,”ps -e -o “pid tty state time comm””,”/usr/bsd/uptime”,None)
commandlist[‘IRIX64’] = commandlist[‘IRIX’]
commandlist[‘Linux’] = (“df -P”,”who -q | grep “#””,”ps ax”,”uptime”,”free | awk ‘$1~/^Swap:/{print ($3/$2)*100}'”,”/customcommands/check_raid”)
commandlist[‘NetBSD’] = (“df -k”,”who | wc -l”,”ps ax”,”uptime”,”swapctl -l | awk ‘$1!~/^Device/{print $5}'”)
commandlist[‘NEXTSTEP’] = (“df”,”who | /usr/ucb/wc -l”,”ps -ax”,”uptime”,None)
commandlist[‘OpenBSD’] = (“df -k”,”who | wc -l”,”ps -ax”,”uptime”,”swapctl -l | awk ‘$1!~/^Device/{print $5}'”)
commandlist[‘OSF1’] = (“df -P”,”who -q | grep “#””,”ps ax”,”uptime”,None)
commandlist[‘SCO-SV’] = (“df -Bk”,”who -q | grep “#””,”ps -el -o “pid tty s time args””,”uptime”,None)
commandlist[‘SunOS’] = (“df -k”,”who -q | grep “#””,”ps -e -o “pid tty s time comm””,”uptime”,”swap -s | tr -d -s -c [:digit:][:space:] | nawk ‘{print ($3/($3+$4))*100}'”)
commandlist[‘UNIXWARE2’] = (“/usr/ucb/df”,”who -q | grep “#””,”ps -el | awk ‘{printf(“%6d%9s%2s%5s %sn”,$5,substr($0, 61, 8),$2,substr($0,69,5),substr($0,75))}”,”echo `uptime`, load average: 0.00, `sar | awk ‘{oldidle=idle;idle=$5} END {print 100-oldidle}’`,0.00″,None)
# Now to make commandlist with the correct one for your OS.
try:
self.commandlist = commandlist[os.uname()[0]]
except KeyError:
print “Your platform isn’t supported by nagios-statd – exiting.”
sys.exit(3)
# Below are the functions that the client can call.
def disk(self):
return self.__run(0)
def proc(self):
return self.__run(2)
def swap(self):
return self.__run(4)
def uptime(self):
return self.__run(3)
def user(self):
return self.__run(1)
def raid(self):
return self.__run(5)
def version(self):
i = “nagios-statd ” + str(self.nagios_statd_version)
return i
def __run(self,cmdnum):
# Unmask SIGCHLD so popen can detect the return status (temporarily)
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
outputfh = os.popen(self.commandlist[cmdnum])
output = outputfh.read()
returnvalue = outputfh.close()
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
if (returnvalue):
return “ERROR %s ” % output
else:
return output
class NagiosStatd(SocketServer.StreamRequestHandler):
“Handles connection initialization and data transfer (as daemon)”
def handle(self):
# Check to see if user is allowed
if self.__notallowedhost():
self.wfile.write(self.error)
return 1
if not hasattr(self,”generichandler”):
self.generichandler = GenericHandler(self.rfile,self.wfile)
self.generichandler.run()
def __notallowedhost(self):
“Compares list of allowed users to client’s IP address.”
if hasattr(self.server,”allowedhosts”) == 0:
return 0
for i in self.server.allowedhosts:
if i == self.client_address[0]: # Address is in list
return 0
try: # Do an IP lookup of host in blocked list
i_ip = socket.gethostbyname(i)
except:
self.error = “ERROR DNS lookup of blocked host “%s” failed. Denying by default.” % i
return 1
if i_ip != i: # If address in list isn’t an IP
if socket.getfqdn(i) == socket.getfqdn(self.client_address[0]):
return 0
self.error = “ERROR Client is not among hosts allowed to connect.”
return 1
class GenericHandler:
def __init__(self,rfile=sys.stdin,wfile=sys.stdout):
# Create functions object
self.functions = Functions()
self.rfile = rfile
self.wfile = wfile
def run(self):
# Get the request from the client
line = self.rfile.readline()
line = line.strip()
# Check for appropriate requests from client
if len(line) == 0:
self.wfile.write(“ERROR No function requested from client.”)
return 1
# Call the appropriate function
try:
output = getattr(self.functions,line)()
except AttributeError:
error = “ERROR Function “” + line + “” does not exist.”
self.wfile.write(error)
return 1
except TypeError:
error = “ERROR Function “” + line + “” not supported on this platform.”
self.wfile.write(error)
return 1
# Send output
if output.isspace():
error = “ERROR Function “” + line + “” returned no information.”
self.wfile.write(error)
return 1
elif output == “ERROR”:
error = “ERROR Function “” + line + “” exited abnormally.”
self.wfile.write(error)
else:
for line in output:
self.wfile.write(line)
class ReUsingServer (SocketServer.ForkingTCPServer):
allow_reuse_address = True
class Initialization:
“Methods for interacting with user – initial code entry point.”
def __init__(self):
self.port = 1040
self.ip = ”
# Run this through Functions initially, to make sure the platform is supported.
i = Functions()
del(i)
def getoptions(self):
“Parses command line”
try:
opts, args = getopt.getopt(sys.argv[1:], “a:b:ip:P:Vh”, [“allowedhosts=”,”bindto=”,”inetd”,”port=”,”pid=”,”version”,”help”])
except getopt.GetoptError, (msg, opt):
print sys.argv[0] + “: ” + msg
print “Try ‘” + sys.argv[0] + ” –help’ for more information.”
sys.exit(3)
for option,value in opts:
if option in (“-a”,”–allowedhosts”):
value = value.replace(” “,””)
self.allowedhosts = value.split(“,”)
elif option in (“-b”,”–bindto”):
self.ip = value
elif option in (“-i”,”–inetd”):
self.runfrominetd = 1
elif option in (“-p”,”–port”):
self.port = int(value)
elif option in (“-P”,”–pid”):
self.pidfile = value
elif option in (“-V”,”–version”):
self.version()
sys.exit(3)
elif option in (“-h”,”–help”):
self.usage()
def main(self):
# Retrieve command line options
self.getoptions()
# Just splat to stdout if we’re running under inetd
if hasattr(self,”runfrominetd”):
server = GenericHandler()
server.run()
sys.exit(0)
# Check to see if the port is available
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((self.ip, self.port))
s.close()
del(s)
except socket.error, (errno, msg):
print “Unable to bind to port %s: %s – exiting.” % (self.port, msg)
sys.exit(2)
# Detach from terminal
if os.fork() == 0:
# Make this the controlling process
os.setsid()
# Be polite and chdir to /
os.chdir(‘/’)
# Try to close all open filehandles
for i in range(0,256):
try: os.close(i)
except: pass
# Redirect the offending filehandles
sys.stdin = open(‘/dev/null’,’r’)
sys.stdout = open(‘/dev/null’,’w’)
sys.stderr = open(‘/dev/null’,’w’)
# Set the path
os.environ[“PATH”] = “/bin:/usr/bin:/usr/local/bin:/usr/sbin”
# Reap children automatically
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
# Save pid if user requested it
if hasattr(self,”pidfile”):
self.savepid(self.pidfile)
# Create a forking TCP/IP server and start processing
server = ReUsingServer((self.ip,self.port),NagiosStatd)
if hasattr(self,”allowedhosts”):
server.allowedhosts = self.allowedhosts
server.serve_forever()
# Get rid of the parent
else:
sys.exit(0)
def savepid(self,file):
try:
fh = open(file,”w”)
fh.write(str(os.getpid()))
fh.close()
except:
print “Unable to save PID file – exiting.”
sys.exit(2)
def usage(self):
print “Usage: ” + sys.argv[0] + ” [OPTION]”
print “nagios-statd daemon – remote UNIX system monitoring tool for Nagios.n”
print “-a, –allowedhosts=HOSTS Comma delimited list of IPs/hosts allowed to connect.”
print “-b, –bindto=IP IP address for the daemon to bind to.”
print “-i, –inetd Run from inetd.”
print “-p, –port=PORT Port to listen on.”
print “-P, –pid=FILE Save pid to FILE.”
print “-V, –version Output version information and exit.”
print ” -h, –help Print this help and exit.”
sys.exit(3)
def version(self):
i = Functions()
print “nagios-statd %.2f” % i.nagios_statd_version
print “os.uname()[0] = %s ” % os.uname()[0]
print “Written by Nick Reinkingn”
print “Copyright (C) 2002 Nick Reinking”
print “This is free software. There is NO warranty; not even for MERCHANTABILITY or”
print “FITNESS FOR A PARTICULAR PURPOSE.”
print “nNagios is a trademark of Ethan Galstad.”
if __name__ == “__main__”:
# Check to see if running Python 2.x+ / needed because getfqdn() is Python 2.0+ only
if (int(sys.version[0]) < 2):
print “nagios-statd requires Python version 2.0 or greater.”
sys.exit(3)
i = Initialization()
i.main()
/etc/init.d/nagios-statd:
#!/bin/sh
#
# This file should have uid root, gid sys and chmod 744
#
if [ ! -d /usr/bin ]
then # /usr not mounted
exit
fi
killproc() { # kill the named process(es)
pid=`/bin/ps -e |
/bin/grep -w $1 |
/bin/sed -e ‘s/^ *//’ -e ‘s/ .*//’`
[ “$pid” != “” ] && kill $pid
}
# Start/stop processes required for netsaint_statd server
case “$1” in
‘start’)
/customcommands/nagios-statd -a <IP of Allowed Nagios Server>,<IP of Test Workstation> -p 1040
;;
‘stop’)
killproc nagios-statd
;;
*)
echo “Usage: /etc/init.d/nagios-statd { start | stop }”
;;
esac
Testing:
As you can see in the script file above, I’ve added the IP Address of a test workstation. This will allow me to simply telnet to a node in the C6100 and execute one of the commands defined in this section of the /customcommands/nagios-statd script:
# Below are the functions that the client can call.
def disk(self):
return self.__run(0)
def proc(self):
return self.__run(2)
def swap(self):
return self.__run(4)
def uptime(self):
return self.__run(3)
def user(self):
return self.__run(1)
def raid(self):
return self.__run(5)
At your workstation, telnet to <Node IP Address> 1040
When connected, the screen will be blank.
Type “raid”. The screen won’t echo this.
When you hat enter, you should see:
vol_id:0 OK phys_id:2 OK phys_id:3 OK 100% 100%
Now you’re ready to move on to the Nagios configuration.
Matt Long
03/06/2015