\newpage
\section{Environment variables}
\label{a:environment}
-\TODO{ruda: projit}
Complete list of all environment variables affecting LB behaviour follows with
their description and default values (if applicable).
% see also glite/lb/log_proto.h (org.glite.lb.common/interface/log_proto.h)
address of \verb'glite-lb-logd' daemon (for logging events),
in form \verb'hostname:port',
- default value is \verb'localhost:9002' \\
+ default value is \verb'localhost:9002'\\
GLITE\_WMS\_LOG\_TIMEOUT &
% see also glite/lb/timeouts.h (org.glite.lb.common/interface/timeouts.h)
timeout (in seconds) for asynchronous logging,
GLITE\_WMS\_NOTIF\_SERVER &
address of \verb'glite-lb-bkserver' daemon (for receiving notifications)
in form \verb'hostname:port', for receiving notifications,
- there is no default value \\
+ there is no default value,
+ mandatory for \verb'glite-lb-notify' \\
GLITE\_WMS\_NOTIF\_TIMEOUT &
% see also glite/lb/timeouts.h (org.glite.lb.common/interface/timeouts.h)
timeout (in seconds) for notification registration,
X509\_USER\_CERT and X509\_USER\_KEY &
location of user credentials,
default values are \verb'~/.globus/usercert.pem' and \verb'~/.globus/userkey.pem' \\
+GLOBUS\_HOSTNAME &
+ hostname to appear as event origin, useful only for debugging,
+ default value is hostname \\
+QUERY\_SERVER\_OVERRIDE &
+ values defined in QUERY\_SERVER will also override values in jobid in queries,
+ useful for debugging only,
+ default value \verb'no' \\
+QUERY\_JOBS\_LIMIT &
+ maximal size of results for query on jobs,
+ default value is \verb'0' (unlimited) \\
+QUERY\_EVENTS\_LIMIT &
+ maximal size of results for query on events,
+ default value is \verb'0' (unlimited) \\
+QUERY\_RESULTS &
+ specifies behavior of query functions when size limit is reached,
+ value can be \verb'None' (no results are returned),
+ \verb'All' (all results are returned, even if over specified limit),
+ \verb'Limited' (size of results is limited to size specified by QUERY\_JOBS\_LIMIT
+ or QUERY\_EVENTS\_LIMIT) \\
+CONNPOOL\_SIZE &
+ maximal number of open connections in logging library,
+ for developers only,
+ default value is \verb'50' \\
\end{tabularx}
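+
+As an illustration, the variables above can be set in a Bourne-compatible shell before
+running the \LB commands. This is only a sketch: the log destination and the credential
+paths repeat the documented defaults, and the notification server address is an example
+host that must be replaced by the address of your own \verb'glite-lb-bkserver'.
+
+\begin{verbatim}
+ # address of glite-lb-logd (documented default)
+ export GLITE_WMS_LOG_DESTINATION=localhost:9002
+ # address of glite-lb-bkserver for notifications (no default; example host)
+ export GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100
+ # user credentials (documented default locations)
+ export X509_USER_CERT="$HOME/.globus/usercert.pem"
+ export X509_USER_KEY="$HOME/.globus/userkey.pem"
+\end{verbatim}
+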
For backward compatibility, all \verb'GLITE_WMS_*' variables can be prefixed by
\subsection{Concepts}
-\TODO{ruda: typy jobu -- simple, dag, collection}
\subsubsection{Jobs and events}
To keep track of user jobs on the Grid, we first need some reliable
the major job state back
it still may carry valuable information to update the job state attributes.
+% typy jobu
+Jobs monitored by the \LB service may be of different types. For gLite jobs, \LB supports
+simple jobs and jobs representing a \emph{set of jobs}: \emph{DAGs} (with dependencies between
+subjobs described by a directed acyclic graph) and \emph{collections} (without dependencies
+between subjobs).
+In these cases, subjobs are monitored in the standard way, with one extension: when the status
+of a subjob changes, the information is also propagated to the job representing the corresponding
+collection or DAG.
+The job representing a collection or DAG can be used to monitor the status of the set, including
+information such as how many subjobs have already finished.
+Support for non-gLite jobs, namely for PBS or Condor systems, is described in section
+\ref{sec:nonglite}.
+
+
\subsubsection{Event ordering}%
\label{evorder}
\subsubsection{Non-gLite event sources}
-\TODO{ruda: zminit, ze umime zpracovavat condorove a PBS joby; jsou i v appendixu v
-seznamu eventu, tak by se asi nekde zminit mely}
+\label{sec:nonglite}
\LB has been enhanced to support also non-gLite events, namely events from PBS
or Condor batch systems \cite{hpdc07}. These events are handled differently from gLite events,
from gLite events. Both PBS and Condor events have their own state machines that process the
events and determine the new state of the job.
-Recently, there were also attempts to use \LB system to track syslog messages.
-For a detailed description see \cite{TODO:LB4syslog}.
+Recently, there were also attempts to use the \LB system to transport other types of events,
+such as Certificate Revocation Lists or syslog messages. For a detailed description see
+\cite{LB4CRL}.
\verb'glite-lb-client' package.
\subsection{Environment variables}
-\TODO{ruda: tady kapitolku, ze prostredi je dulezite, odkaz na appendix, kde bude vsechno. common/src/param.c}
-
-Behaviour of the commands can be changed by setting some of the following
-enviroment variables:
-
-\begin{tabularx}{\textwidth}{lX}
-GLITE\_WMS\_LOG\_DESTINATION & address of \verb'glite-lb-logd' daemon (for logging events)\\
-GLITE\_WMS\_LOG\_TIMEOUT & timeout (in seconds) for asynchronous logging\\
-GLITE\_WMS\_LOG\_SYNC\_TIMEOUT & timeout (in seconds) for synchronous logging\\
-GLITE\_WMS\_NOTIF\_SERVER& address of \verb'glite-lb-bkserver' daemon (for receiving notifications)\\
-GLITE\_WMS\_NOTIF\_TIMEOUT& timeout (in seconds) for notification registration\\
-GLITE\_WMS\_QUERY\_SERVER& address of \verb'glite-lb-bkserver' daemon (for queries)\\
-X509\_USER\_CERT and X509\_USER\_KEY & location of user credentials\\
-\end{tabularx}
+Behaviour of the commands can be changed by setting environment variables, which mostly specify
+the location of servers or set timeouts for various operations.
For a complete list of environment variables, their form and default value
description, see Appendix~\ref{a:environment}. Setting the environment variable
is mandatory for some commands, so reading the documentation below and
}
-
-
-
+@InProceedings{ LB4CRL,
+ author="Daniel Kouril and Ludek Matyska and Michal Prochazka",
+ title="A Robust and Efficient Mechanism to Distribute Certificate Revocation
+ Information Using the Grid Monitoring Architecture",
+ pages="614--619",
+ booktitle="21st International Conference on Advanced Information Networking and
+Applications Workshops (AINAW'07)",
+ year=2007
+}
Within the notification validity, clients can disappear and even migrate.
However, only a single active client for a notification is allowed.
-\LB server and port to contact is specified with GLITE\_WMS\_NOTIF\_SERVER
+\LB server and port to contact must be specified with GLITE\_WMS\_NOTIF\_SERVER
environment variable.
-\TODO{ruda, what's supported in \LBold (jen example), what in \LBnew}
+\verb'glite-lb-notify' is supported only in \LBnew. In \LBold, a version of \verb'glite-lb-notify'
+with very limited functionality can be found in the \verb'examples' directory.
\verb'glite-lb-notify' supports these actions:
system by registering a notification on any state change of this job
and waiting for notification.
-Register notification for given jobid:
+Register a notification for the given jobid
+(\verb'https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q'),
+with a validity of 2 hours (7200 seconds):
+
\begin{verbatim}
GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify new \
- -j https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q
+ -j https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q -t 7200
\end{verbatim}
returns:
-i 120 https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
\end{verbatim}
+returns:
+\begin{verbatim}
+ notification is valid until: '2008-07-29 15:04:41' (1217343881)
+ https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Waiting
+ /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+ https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Ready
+ /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+ https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Scheduled
+ /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+ https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Running
+ /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+\end{verbatim}
+
Destroy notification:
\begin{verbatim}
- GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify destroy \
+ GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify drop \
+ https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
\end{verbatim}
-\TODO{ruda, end of notification, set validity for two hours}
+\subsubsection{Example: Waiting for notifications on all user's jobs}
-When you let notification client running several minutes without any incomming notification, it will finish and remove your registration from the server automatically.
+Instead of waiting for one job, a user may want to receive notifications about
+state changes of all of their jobs.
-\subsubsection{Example: Waiting for notifications on all user's jobs}
+Register a notification for the user (DN obtained by \verb'grid-proxy-info -subject'):
+\begin{verbatim}
+ GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify new \
+ -o "/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda"
+\end{verbatim}
+
+ returns:
+
+\begin{verbatim}
+ notification ID: https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
+\end{verbatim}
+
+Then continue with \verb'glite-lb-notify receive' as in the previous example.
+In this case, however, we want to display different information about the job:
+instead of the job owner, the destination (where the job is running) and the location
+(which component is currently serving the job):
+
+\begin{verbatim}
+ GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify receive \
+ -i 240 -f destination,location \
+ https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
+\end{verbatim}
+
+returns:
+
+\begin{verbatim}
+ notification is valid until: '2008-07-29 15:43:41' (1217346221)
+
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ (null) NetworkServer/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ (null) destination queue/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ (null) WorkloadManager/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ (null) name of the called component/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ destination CE/queue WorkloadManager/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+ destination CE/queue WorkloadManager/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+ destination CE/queue destination queue/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+ destination CE/queue JobController/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+ destination CE/queue LRMS/destination hostname/destination instance
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+ destination CE/queue LogMonitor/erebor.ics.muni.cz/
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Scheduled
+ destination CE/queue LRMS/destination hostname/destination instance
+ https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Running
+ destination CE/queue LRMS/worknode/worker node
+
+\end{verbatim}
-\TODO{}
\subsubsection{Example: Waiting for more notifications on one socket}
-\TODO{ruda}
+The following example demonstrates how an existing socket can be reused for receiving
+multiple notifications. The Perl script \verb'./examples/notify.pl' (available in the
+examples directory) creates a socket, which is then reused for all
+\verb'glite-lb-notify' commands.
+
+\begin{verbatim}
+GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 NOTIFY_CMD=bin/glite-lb-notify \
+ ./examples/notify.pl -o "/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda"
+\end{verbatim}
+
+ returns:
+
+\begin{verbatim}
+notification ID: https://skurut68-2.cesnet.cz:9100/NOTIF:EO73rjsmexEZJXuSoSZVDg
+valid: '2008-07-29 15:14:06' (1217344446)
+got connection
+https://skurut68-2.cesnet.cz:9100/ANceuj5fXdtaCCkfnhBIXQ Submitted
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+bin/glite-lb-notify: Connection timed out (read message)
+got connection
+https://skurut68-2.cesnet.cz:9100/p2jBsy5WkFItY284lW2o8A Submitted
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+bin/glite-lb-notify: Connection timed out (read message)
+got connection
+https://skurut68-2.cesnet.cz:9100/p2jBsy5WkFItY284lW2o8A Waiting
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+\end{verbatim}