There is a Nagios probe to check the service status of an \LB server node. It is distributed from the EMI repository and the name of the package is \texttt{emi-lb-nagios-plugins}.
\subsection{Tests Performed}
-Before starting the actual test the probe checks for existence and validity of a proxy certifiacate, and for availability of commands (essential system commands, various \LB Client commands and grid proxy manipulation commands).
+Before starting the actual test the probe checks for existence and validity of a proxy certificate, and for availability of commands (essential system commands, various \LB Client commands and grid proxy manipulation commands).
The following tests are performed by the probe. They check the working status of the various processes running on the \LB server node:
& \emph{Unexpected state of test job} & The state of the test job did not remain unchanged (\texttt{Submitted}) but neither did it reach status \texttt{Cleared} in the allotted time. All daemons seem to work but the processing is slow.\\
& \emph{Could not drop notification} & The owner should be able to drop their own notification. Failure to do so is unexpected but does not mean that the service is not functioning. \\
\hline
-\multirow{6}{*}{DOWN} & \emph{Unable to Get Server Version} & The server did not respond to a query for server version over the WS interface. It is probably not running or is unaccessible. \\
+\multirow{6}{*}{DOWN} & \emph{Unable to Get Server Version} & The server did not respond to a query for server version over the WS interface. It is probably not running, is inaccessible, or the SSL handshake failed due to faulty or outdated certificates or CRLs. \\
& \emph{Job Registration Failed Locally} & The probe was unable to perform the local side of job registration. This should be rare. \\
-& \emph{\LB Server Not Running} & The probe was unable to register a test job or a test notification. It is probably not running or is unaccessible. \\
+& \emph{\LB Server Not Running} & The probe was unable to register a test job or a test notification with the \LB server. The server is probably not running or is inaccessible. \\
& \emph{Event Delivery Chain (Logger/Interlogger) Not Running} & The server process is running but events are not being delivered by \LB's local logger/interlogger. Check the Logger and the Interlogger. \\
& \emph{Notification Interlogger Not Running} & Events are being delivered correctly and the server responds properly to status queries, but it is not delivering notification messages. The notification interlogger is probably not running.\\
\hline
\texttt{-h} & \texttt{-{}-help} & Print out simple console help \\
\texttt{-v[vv]} & \texttt{-{}-verbose} & Set verbosity level (\texttt{-{}-verbose} denotes a single \texttt{v}). \\
\texttt{-H} & \texttt{-{}-hostname} & \LB node address. Environmental variable \texttt{GLITE\_WMS\_QUERY\_SERVER} used if unspecified. \\
- \texttt{-p} & \texttt{-{}-port} & \LB server port. Other port port numbers (logger, WS interface) are derrived from it. Environmental variable \texttt{GLITE\_WMS\_QUERY\_SERVER} or default port \texttt{9000} used if unspecified. \\
- \texttt{-t} & \texttt{-{}-timeout} & Timeout in seconds. The minimum reasonable timeout is approx. 10\,s. There is no default, except the internal waiting cycle for notifications, which will time out after approx. 20\,s.\footnote{The probe adjusts the internal waiting cycle to spend a maximum of $\frac{3}{4}$ of the specified timeout interval while waiting for notifications to deliver. It will finish correctly before timing out if undelivered notifications are the only problem.} \\
+ \texttt{-p} & \texttt{-{}-port} & \LB server port. Other port numbers (logger, WS interface) are derived from it. Environmental variable \texttt{GLITE\_WMS\_QUERY\_SERVER} or default port \texttt{9000} used if unspecified. \\
+ \texttt{-t} & \texttt{-{}-timeout} & Timeout in seconds. The minimum reasonable timeout is approx.~10\,s. There is no default, except the internal waiting cycle for notifications, which will time out after approx.~20\,s.\footnote{The probe adjusts the internal waiting cycle to spend a maximum of $\frac{3}{4}$ of the specified timeout interval while waiting for notifications to deliver. It will finish correctly before timing out if undelivered notifications are the only problem.} \\
\texttt{-T} & \texttt{-{}-tmpdir} & Directory to store temporary files. By default the probe uses \texttt{/var/lib/grid-monitoring/emi.lb} and falls back to \texttt{/tmp} if the former does not exist or is not writable. \\
\end{tabularx}
+\subsubsection{Environmental Variables}
+In essence the probe recognizes the same environmental variables as the \LB client. No environmental variables need to be set if hostname is specified as a command line argument to the probe.
+
+\begin{tabularx}{\textwidth}{p{4.5cm} X}
+\texttt{GLITE\_WMS\_QUERY\_SERVER} & \textbf{The} \LB server. This is the server that will be contacted and tested if no hostname is supplied to the probe. \\
+\texttt{GLITE\_LB\_SERVER\_PORT GLITE\_LB\_LOGGER\_PORT} & Specifies the \LB server port or the \LB local logger port, respectively. It is used only if a hostname is given as a command line argument to the probe with no port number. \\
+% This is not a very nice way to set two paragraphs aside. I hate the fixed width setting but I could not find any other solution.
+\end{tabularx}
\subsubsection{Sample Nagios Service Definition}
Simple definition to be included in \texttt{/etc/nagios/commands.cfg}: