From 3ed6e61103389200a754ded2e2663d43cf0c4661 Mon Sep 17 00:00:00 2001
From: Miroslav Ruda
Date: Tue, 29 Jul 2008 13:58:42 +0000
Subject: [PATCH] notification examples, environment variables, collections

---
 org.glite.lb.doc/src/LBUG-Appendix.tex     |  29 ++++++-
 org.glite.lb.doc/src/LBUG-Introduction.tex |  23 ++++--
 org.glite.lb.doc/src/LBUG-Tools.tex        |  16 +---
 org.glite.lb.doc/src/lbjp.bib              |  12 ++-
 org.glite.lb.doc/src/notify.tex            | 117 ++++++++++++++++++++++++++---
 5 files changed, 162 insertions(+), 35 deletions(-)

diff --git a/org.glite.lb.doc/src/LBUG-Appendix.tex b/org.glite.lb.doc/src/LBUG-Appendix.tex
index 979fc08..40b22df 100644
--- a/org.glite.lb.doc/src/LBUG-Appendix.tex
+++ b/org.glite.lb.doc/src/LBUG-Appendix.tex
@@ -22,7 +22,6 @@ Complete list of all job' states together with their description follows.
 \newpage
 \section{Environment variables}
 \label{a:environment}
-\TODO{ruda: projit}
 Complete list of all environment variables affecting LB behaviour follows
 with their description and default values (if applicable).

@@ -35,7 +34,7 @@ GLITE\_WMS\_LOG\_DESTINATION &
 	% see also glite/lb/log_proto.h (org.glite.lb.common/interface/log_proto.h)
 	address of \verb'glite-lb-logd' daemon (for logging events),
 	in form \verb'hostname:port',
-	default value is \verb'localhost:9002' \\
+	default value is \verb'localhost:9002'\\
 GLITE\_WMS\_LOG\_TIMEOUT &
 	% see also glite/lb/timeouts.h (org.glite.lb.common/interface/timeouts.h)
 	timeout (in seconds) for asynchronous logging,
@@ -49,7 +48,8 @@ GLITE\_WMS\_LOG\_SYNC\_TIMEOUT &
 GLITE\_WMS\_NOTIF\_SERVER &
 	address of \verb'glite-lb-bkserver' daemon (for receiving notifications)
 	in form \verb'hostname:port', for receiving notifications,
-	there is no default value \\
+	there is no default value,
+	mandatory for \verb'glite-lb-notify' \\
 GLITE\_WMS\_NOTIF\_TIMEOUT &
 	% see also glite/lb/timeouts.h (org.glite.lb.common/interface/timeouts.h)
 	timeout (in seconds) for notification registration,
@@ -76,6 +76,29 @@ GLITE\_WMS\_LBPROXY\_USER &
 X509\_USER\_CERT and X509\_USER\_KEY &
 	location of user credentials,
 	default values are \verb'~/.globus/usercert.pem' and \verb'~/.globus/userkey.pem' \\
+GLOBUS\_HOSTNAME &
+	hostname to appear as the event origin, useful only for debugging,
+	default value is the hostname of the machine \\
+QUERY\_SERVER\_OVERRIDE &
+	the server given in QUERY\_SERVER overrides also the server contained in the jobid in queries,
+	useful for debugging only,
+	default value is \verb'no' \\
+QUERY\_JOBS\_LIMIT &
+	maximum size of the result of a query on jobs,
+	default value is \verb'0' (unlimited) \\
+QUERY\_EVENTS\_LIMIT &
+	maximum size of the result of a query on events,
+	default value is \verb'0' (unlimited) \\
+QUERY\_RESULTS &
+	specifies the behaviour of query functions when the size limit is reached,
+	the value can be \verb'None' (no results are returned),
+	\verb'All' (all results are returned, even over the specified limit), or
+	\verb'Limited' (the results are truncated to the size specified by QUERY\_JOBS\_LIMIT
+	or QUERY\_EVENTS\_LIMIT) \\
+CONNPOOL\_SIZE &
+	maximum number of open connections in the logging library,
+	for developers only,
+	default value is \verb'50' \\
 \end{tabularx}
 
 For backward compatibility, all \verb'GLITE_WMS_*' variables can be prefixed by
diff --git a/org.glite.lb.doc/src/LBUG-Introduction.tex b/org.glite.lb.doc/src/LBUG-Introduction.tex
index 823b5c1..f65ae02 100644
--- a/org.glite.lb.doc/src/LBUG-Introduction.tex
+++ b/org.glite.lb.doc/src/LBUG-Introduction.tex
@@ -75,7 +75,6 @@ see Sect.~\ref{local}.
 
 
 \subsection{Concepts}
-\TODO{ruda: typy jobu -- simple, dag, collection}
 
 \subsubsection{Jobs and events}
 To keep track of user jobs on the Grid, we first need some reliable
@@ -267,6 +266,20 @@ cannot switch the major job state back
 it still may carry valuable information to update the job state
 attributes.
 
+% job types
+Jobs monitored by the \LB service may be of different types. For gLite jobs, \LB supports
+simple jobs and jobs representing a \emph{set of jobs}: \emph{DAGs} (with dependencies between
+subjobs described by a directed acyclic graph) and \emph{collections} (without dependencies
+between subjobs).
+In these cases, subjobs are monitored in the standard way, with one extension: when a subjob's
+status changes, the information is also propagated to the job representing the corresponding
+collection or DAG.
+The job representing a collection or DAG can be used to monitor the status of the whole set,
+including information such as how many subjobs have already finished.
+Support for non-gLite jobs, namely for PBS or Condor systems, is described in
+Sect.~\ref{sec:nonglite}.
+
+
 \subsubsection{Event ordering}%
 \label{evorder}
 
@@ -916,8 +929,7 @@ when RB decides where jobs get submitted.
 
 
 \subsubsection{Non-gLite event sources}
-\TODO{ruda: zminit, ze umime zpracovavat condorove a PBS joby; jsou i v appendixu v
-seznamu eventu, tak by se asi nekde zminit mely}
+\label{sec:nonglite}
 \LB has been enhanced to support also non-gLite events, namely events from PBS or
 Condor batch systems \cite{hpdc07}.
 These events are handeled differently from gLite events,
@@ -927,7 +939,8 @@ defined in \LB (see also Appendix \ref{a:jobstat}), events are processed separat
 from gLite events. Both PBS and Condor events has its own state machine
 that processes the events and determines the now state of the job.
 
-Recently, there were also attempts to use \LB system to track syslog messages.
-For a detailed description see \cite{TODO:LB4syslog}.
+Recently, there have also been attempts to use the \LB system to transport other types of events,
+such as Certificate Revocation Lists or syslog messages. For a detailed description, see
+\cite{LB4CRL}.
 
 
diff --git a/org.glite.lb.doc/src/LBUG-Tools.tex b/org.glite.lb.doc/src/LBUG-Tools.tex
index 5d23e38..d9b043a 100644
--- a/org.glite.lb.doc/src/LBUG-Tools.tex
+++ b/org.glite.lb.doc/src/LBUG-Tools.tex
@@ -6,21 +6,9 @@ might want to use.
 If not stated otherwise, the tools are distributed in the
 \verb'glite-lb-client' package.
 \subsection{Environment variables}
-\TODO{ruda: tady kapitolku, ze prostredi je dulezite, odkaz na appendix, kde bude vsechno. common/src/param.c}
-
-Behaviour of the commands can be changed by setting some of the following
-enviroment variables:
-
-\begin{tabularx}{\textwidth}{lX}
-GLITE\_WMS\_LOG\_DESTINATION & address of \verb'glite-lb-logd' daemon (for logging events)\\
-GLITE\_WMS\_LOG\_TIMEOUT & timeout (in seconds) for asynchronous logging\\
-GLITE\_WMS\_LOG\_SYNC\_TIMEOUT & timeout (in seconds) for synchronous logging\\
-GLITE\_WMS\_NOTIF\_SERVER& address of \verb'glite-lb-bkserver' daemon (for receiving notifications)\\
-GLITE\_WMS\_NOTIF\_TIMEOUT& timeout (in seconds) for notification registration\\
-GLITE\_WMS\_QUERY\_SERVER& address of \verb'glite-lb-bkserver' daemon (for queries)\\
-X509\_USER\_CERT and X509\_USER\_KEY & location of user credentials\\
-\end{tabularx}
+Behaviour of the commands can be changed by setting environment variables, which mostly specify
+the location of servers or timeouts for various operations.
 For a complete list of environment variables, their form and default value description, see
 Appendix~\ref{a:environment}.
 Setting the environment variable is for some commands mandatory,
 so reading the documentaion below and
diff --git a/org.glite.lb.doc/src/lbjp.bib b/org.glite.lb.doc/src/lbjp.bib
index fa39544..7582e2a 100644
--- a/org.glite.lb.doc/src/lbjp.bib
+++ b/org.glite.lb.doc/src/lbjp.bib
@@ -763,6 +763,12 @@
 }
 
 
-
-
-
+@InProceedings{ LB4CRL,
+	author="Daniel Kouril and Ludek Matyska and Michal Prochazka",
+	title="A Robust and Efficient Mechanism to Distribute Certificate Revocation
+		Information Using the Grid Monitoring Architecture",
+	pages="614--619",
+	booktitle="21st International Conference on Advanced Information Networking and
+Applications Workshops (AINAW'07)",
+	year=2007
+}
diff --git a/org.glite.lb.doc/src/notify.tex b/org.glite.lb.doc/src/notify.tex
index af1d67d..0bf4142 100644
--- a/org.glite.lb.doc/src/notify.tex
+++ b/org.glite.lb.doc/src/notify.tex
@@ -17,10 +17,11 @@ socket.
 Within the notification validity, clients can disappear and even migrate.
 However, only a single active client for a notification is allowed.
 
-\LB server and port to contact is specified with GLITE\_WMS\_NOTIF\_SERVER
+The \LB server and port to contact must be specified with the GLITE\_WMS\_NOTIF\_SERVER
 environment variable.
 
-\TODO{ruda, what's supported in \LBold (jen example), what in \LBnew}
+\verb'glite-lb-notify' is supported only in \LBnew. In \LBold, a \verb'glite-lb-notify'
+with very limited functionality can be found in the \verb'examples' directory.
 
 \verb'glite-lb-notify' support these actions:
 
@@ -86,10 +87,13 @@ Following steps describe basic testing procedure of the notification system by
 registering a notification on any state change of this job and waiting for
 notification.

-Register notification for given jobid:
+Register a notification for the given jobid
+(\verb'https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q'),
+with a validity of 2 hours (7200 seconds):
+
 \begin{verbatim}
   GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify new \
-  -j https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q
+  -j https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q -t 7200
 \end{verbatim}
 
 returns:
@@ -105,19 +109,112 @@ Wait for notification (with timeout 120 seconds):
     -i 120 https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
 \end{verbatim}
 
+returns:
+\begin{verbatim}
+  notification is valid until: '2008-07-29 15:04:41' (1217343881)
+  https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Waiting
+    /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+  https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Ready
+    /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+  https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Scheduled
+    /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+  https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q Running
+    /DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+\end{verbatim}
+
 Destroy notification:
 \begin{verbatim}
-  GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify destroy \
+  GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify drop \
+    https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
 \end{verbatim}
 
-\TODO{ruda, end of notification, set validity for two hours}
+\subsubsection{Example: Waiting for notifications on all user's jobs}
 
-When you let notification client running several minutes without any incomming notification, it will finish and remove your registration from the server automatically.
+Instead of waiting for one job, a user may want to receive notifications about
+state changes of all their jobs.
 
-\subsubsection{Example: Waiting for notifications on all user's jobs}
 
+Register a notification for the user (the DN can be obtained with \verb'grid-proxy-info -subject'):
+\begin{verbatim}
+  GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify new \
+    -o "/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda"
+\end{verbatim}
+
+ returns:
+
+\begin{verbatim}
+  notification ID: https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
+\end{verbatim}
+
+Then continue with \verb'glite-lb-notify receive' as in the previous example.
+This time, however, we want to display other job attributes instead of the job owner,
+namely the destination (where the job is running) and the location (which component is
+serving the job):
+
+\begin{verbatim}
+  GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 bin/glite-lb-notify receive \
+    -i 240 -f destination,location \
+    https://skurut68-2.cesnet.cz:9100/NOTIF:tOsgB19Wz-M884anZufyUw
+\end{verbatim}
+
+returns:
+
+\begin{verbatim}
+  notification is valid until: '2008-07-29 15:43:41' (1217346221)
+
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    (null) NetworkServer/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    (null) destination queue/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    (null) WorkloadManager/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    (null) name of the called component/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    destination CE/queue WorkloadManager/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Waiting
+    destination CE/queue WorkloadManager/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+    destination CE/queue destination queue/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+    destination CE/queue JobController/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+    destination CE/queue LRMS/destination hostname/destination instance
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Ready
+    destination CE/queue LogMonitor/erebor.ics.muni.cz/
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Scheduled
+    destination CE/queue LRMS/destination hostname/destination instance
+  https://skurut68-2.cesnet.cz:9100/qbRFxDFCg2qO4-9WYBiiig Running
+    destination CE/queue LRMS/worknode/worker node
+
+\end{verbatim}
-\TODO{}
 
 
 \subsubsection{Example: Waiting for more notifications on one socket}
-\TODO{ruda}
+The following example demonstrates how an existing socket can be reused for receiving
+multiple notifications. The Perl script \verb'./examples/notify.pl' (available in the
+examples directory) creates a socket, which is then reused by all
+\verb'glite-lb-notify' commands.
+
+\begin{verbatim}
+GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 NOTIFY_CMD=bin/glite-lb-notify \
+  ./examples/notify.pl -o "/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda"
+\end{verbatim}
+
+ returns:
+
+\begin{verbatim}
+notification ID: https://skurut68-2.cesnet.cz:9100/NOTIF:EO73rjsmexEZJXuSoSZVDg
+valid: '2008-07-29 15:14:06' (1217344446)
+got connection
+https://skurut68-2.cesnet.cz:9100/ANceuj5fXdtaCCkfnhBIXQ Submitted
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+bin/glite-lb-notify: Connection timed out (read message)
+got connection
+https://skurut68-2.cesnet.cz:9100/p2jBsy5WkFItY284lW2o8A Submitted
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+bin/glite-lb-notify: Connection timed out (read message)
+got connection
+https://skurut68-2.cesnet.cz:9100/p2jBsy5WkFItY284lW2o8A Waiting
+/DC=cz/DC=cesnet-ca/O=Masaryk University/CN=Miroslav Ruda
+\end{verbatim}
-- 
1.8.2.3
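
The single-job procedure documented in this patch (register with `new`, wait with `receive`, clean up with `drop`) can be wrapped in a small shell function. This is only a sketch and is not part of the LB distribution: the function names `lb_notif_id` and `lb_watch_job` are hypothetical, and the sketch assumes, based solely on the example output above, that `glite-lb-notify new` prints a line of the form `notification ID: <id>` on standard output. It borrows the `NOTIFY_CMD` variable name from the `notify.pl` example to locate the binary. Because the exit status of `receive` on a timeout is not documented here, the sketch deliberately ignores it so that the registration is always dropped.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with LB: wraps the new/receive/drop
# steps of glite-lb-notify shown in the examples above.

# Read "glite-lb-notify new" output on stdin and print the notification ID.
# Assumption: the output contains a line "notification ID: <id>".
lb_notif_id() {
    sed -n 's/^[[:space:]]*notification ID:[[:space:]]*//p'
}

# Usage: lb_watch_job JOBID [VALIDITY_SECONDS] [RECEIVE_TIMEOUT_SECONDS]
# Requires GLITE_WMS_NOTIF_SERVER (host:port); NOTIFY_CMD defaults to the
# binary path used in the examples.
lb_watch_job() {
    if [ $# -lt 1 ]; then
        echo "usage: lb_watch_job JOBID [VALIDITY] [TIMEOUT]" >&2
        return 2
    fi
    jobid=$1
    validity=${2:-7200}     # 2 hours, as in the single-job example
    recv_timeout=${3:-120}  # receive timeout used in the single-job example
    cmd=${NOTIFY_CMD:-bin/glite-lb-notify}

    if [ -z "${GLITE_WMS_NOTIF_SERVER:-}" ]; then
        echo "lb_watch_job: GLITE_WMS_NOTIF_SERVER is not set" >&2
        return 1
    fi

    # Register a notification on state changes of the given job.
    notifid=$("$cmd" new -j "$jobid" -t "$validity" | lb_notif_id)
    if [ -z "$notifid" ]; then
        echo "lb_watch_job: registration failed" >&2
        return 1
    fi

    # Print incoming notifications; the exit status is ignored so that the
    # registration is dropped even if receive stops on a timeout.
    "$cmd" receive -i "$recv_timeout" "$notifid" || :

    # Remove the registration from the server.
    "$cmd" drop "$notifid"
}
```

For instance, after sourcing the file, `GLITE_WMS_NOTIF_SERVER=skurut68-2.cesnet.cz:9100 lb_watch_job https://skurut68-2.cesnet.cz:9100/D1qbFGwvXLnd927JOcja1Q` would mirror the register, receive and drop steps of the single-job example with the same 2-hour validity and 120-second receive timeout. Keeping the ID parsing in its own function makes it easy to adapt if the real output format differs from the assumption above.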