From 60e576c1a85fb3bc6f50b5ae61df97b13f8b5fae Mon Sep 17 00:00:00 2001
From: =?utf8?q?Michal=20Voc=C5=AF?=
Date: Wed, 21 Jun 2006 11:23:07 +0000
Subject: [PATCH] * some notes about performance testing

---
 org.glite.lb/doc/perftest.tex | 524 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 524 insertions(+)
 create mode 100644 org.glite.lb/doc/perftest.tex

diff --git a/org.glite.lb/doc/perftest.tex b/org.glite.lb/doc/perftest.tex
new file mode 100644
index 0000000..241e1af
--- /dev/null
+++ b/org.glite.lb/doc/perftest.tex
@@ -0,0 +1,524 @@
+\documentclass{egee}
+\usepackage{comment}
+
+\def\LB{L\&B}
+
+\title{\LB\ Performance Test Plan}
+\author{CESNET EGEE JRA1 team}
+\DocIdentifier{EGEE-JRA1-??}
+\Date{\today}
+\Activity{JRA1: Middleware Engineering and Integration}
+\DocStatus{DRAFT}
+\Dissemination{PUBLIC}
+\DocumentLink{}
+
+%\def\req{\noindent\textbf{Prerequisites:}}
+%\def\how{\noindent\textbf{How to run:}}
+%\def\result{\noindent\textbf{Expected result:}}
+
+\def\path#1{{\normalfont\textsf{#1}}}
+\def\code#1{\texttt{#1}}
+\def\todo#1{\textbf{TODO:} #1}
+
+\begin{document}
+
+\input{frontmatter}
+\newpage
+\tableofcontents
+\newpage
+
+\section{Rationale}
+\todo{}
+
+\begin{verbatim}
+
+L&B Performance Testing
+=======================
+
+- all source modifications for tests are in CVS, conditionally compiled
+  only when the appropriate symbol is defined
+
+- binaries for all tests are built using a special property
+  for the ant target (or an environment variable for the Makefile), which
+  compiles sources using the right #define combinations
+
+- component tests are run by shell scripts located under component
+  directories; these tests may require binaries from other components,
+  though
+
+- all tests use sequences of events for typical jobs (small job, big
+  job, small DAG, big DAG) prepared beforehand. These events are
+  stored in files in ULM format in CVS.
+
+- events are generated by the stresslog program, which reads the ULM text
+  of events for a particular test job and logs the event sequence directly
+  by calling *_DoLogEvent. The number of test jobs is
+  configurable. Stresslog inserts into every event a timestamp of when the
+  event was generated and sent.*
+
+- events are consumed by breaking normal event processing either in the
+  component being tested or in the next component in the chain, which is
+  instrumented to read and discard events immediately. The consumption
+  itself is done by calling a special function which takes the current time,
+  extracts the timestamp from the event and prints the difference (i.e. the
+  event processing time).* These "break points" are chosen to measure
+  throughput of the various component parts and to identify possible
+  bottlenecks within the components.
+
+  * the only exception is the test of the logging library itself
+
+- if the test includes the bookkeeping server and/or proxy, test jobs
+  are preregistered within the LB by the test script, and
+  their IDs are stored in a separate file to enable re-use by other
+  load-generating tools (status queries, for example)
+
+- test results:
+  - some numbers must be reported by the components themselves, not by
+    the event generator (due to the asynchronous nature of LB). The
+    test script collects those numbers and presents them as the test
+    result at the end of testing.
+
+  - after completion, test scripts print the table described for the
+    respective tests filled in with measured values (i.e. the table
+    is not filled in manually by a human tester)
+
+  - event throughput = 1/(time_delivered - time_arrived)
+    * only if the next event is sent after the previous one was delivered
+
+? measure job throughput for event patterns of typical jobs or deduce
+job throughput from throughput of selected types of events?
+
+
+I) Component tests
+   ***************
+
+- tests of the isolated components on one node
+- may require binaries from other components to produce/consume events
+
+--------------------
+Logging library test
+--------------------
+
+* component:
+  org.glite.lb.client
+
+* binaries required:
+  logevent_libtest
+
+* test shell script:
+  perftest_loglib
+
+* input required:
+  - events
+
+* test description:
+  - measures time required to format given events into ULM. Events
+    are read from file, parsed into components, timestamped and
+    produced.
+
+  - events produced:
+    - by calling logging function edg_wll_LogEvent*()
+
+  - events consumed:
+    - discarded by the logging function instead of being sent via
+      the appropriate protocol (LogEventMaster)
+
+* results:
+
+  job type (size)     throughput (100k jobs)
+  -----------------------------------------
+  small job
+  big job
+  small DAG
+  big DAG
+
+
+
+----------------
+Locallogger test
+----------------
+
+* component:
+  org.glite.lb.logger
+
+* binaries required:
+  stresslog
+  glite_lb_logd_perf
+  glite_lb_logd_perf_nofile
+    - does not store events in file
+  glite_lb_interlogd_perf_empty
+    - consumes immediately after reading event
+
+* test shell script:
+  perftest_logd
+
+* input required:
+  - client and host certificates
+  - events
+
+* test description:
+  - measures time required for an event to be sent from the client to
+    the locallogger and processed by the locallogger. Locallogger is
+    either instructed (by option) or instrumented to skip some
+    parts of event processing:
+    a) no parse, no file, no ipc
+       glite_lb_logd_perf_nofile --noParse --noIPC
+    b) no file, no ipc
+       glite_lb_logd_perf_nofile --noIPC
+    c) no ipc
+       glite_lb_logd_perf --noIPC
+    d) normal operation
+       glite_lb_logd_perf
+
+    no parse - LL does not parse events
+    no file  - LL does not store events into files
+    no ipc   - LL does not send events through socket to IL
+
+  - events produced:
+    - stresslog sends events to logd using client->logd
+      protocol (*_DoLogEvent())
+
+  - events consumed:
+    i) after storing into files
+    ii) by "empty" IL
+
+* results:
+
+
+
+i) events stored in files
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    a)
+    b)
+    c)
+    d)
+
+ii) events sent to IL
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    a)
+    b)
+    c)
+    d)
+
+
+
+----------------
+Interlogger test
+----------------
+
+* component:
+  org.glite.lb.logger
+
+* binaries required:
+  stresslog
+  glite_lb_interlogd_perf
+  glite_lb_interlogd_perf_noparse
+    - does not parse events, server address is hardcoded
+  glite_lb_interlogd_perf_nosync
+    - does not call event_store_sync()
+  glite_lb_interlogd_perf_norecover
+    - recovery thread disabled
+  glite_lb_interlogd_perf_nosend
+    - events are consumed instead of sending
+  glite_lb_interlogd_perf_lazy
+    - lazy closing connection to bkserver
+  glite_lb_bkserverd_perf_empty
+    - consumes event immediately after receiving
+
+* test shell script:
+  perftest_interlogd
+
+* input required:
+  - host certificate
+  - events
+
+* test description:
+  - measures the time an event takes to travel through the interlogger.
+    Interlogger is instrumented to skip some parts of event
+    processing for a particular test; specifically, the tests include
+    these variants:
+    a) disabled event parsing. The server address
+       (e.g. jobid) is hardcoded.
+    b) disabled event synchronization from files
+    c) disabled recovery thread
+    d) lazy bkserver connection close
+    e) normal operation
+
+  - events produced:
+    1) stresslog sends events to interlogger using the unix
+       domain socket and logd->interlogger protocol, events are
+       stored in files (stresslog behaves like logd)
+       TODO: there is no function for this in the producer library
+    2) interlogger reads events from event files created by
+       stresslog (by recovery thread)
+    3) stresslog stores events to files and every n-th
+       (optional argument) is also sent through the unix socket
+
+  - events consumed:
+    i) discarded instead of being sent
+    ii) by "empty" bkserver
+
+* results:
+
+
+i) events discarded
+1) events received on socket
+(options 2 and 3 are not tested)
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    a)
+    b)
+    c)
+    e)
+
+
+ii) events sent to empty bkserver
+1) events received on socket
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    a)
+    b)
+    c)
+    d)
+    e)
+
+
+2) events recovered from files
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    d)
+    e)
+
+
+3) events synced from files, every 10th event sent on socket
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    a)
+    b)
+    c)
+    d)
+    e)
+
+
+------------
+LBProxy test
+------------
+
+* component:
+  org.glite.lb.proxy
+
+* binaries required:
+  stresslog
+  glite_lb_proxy_perf_noparse
+    - consumes events before parsing
+  glite_lb_proxy_perf_nostore
+    - consumes events before storing into database
+  glite_lb_proxy_perf_nostate
+    - consumes events before computing job status
+  glite_lb_proxy_perf_nosend
+    - consumes events before sending to interlogger
+  glite_lb_interlogd_perf_empty
+    - consumes immediately after reading event
+
+* test shell script:
+  perftest_proxy
+
+* input required:
+  - events
+
+* test description:
+  - measures time required for processing an event by LB proxy. Test
+    is performed with (a)) and without (b)) checking for duplicate
+    events.
+
+  - events produced:
+    - stresslog sends events using the IL protocol on local
+      socket (using DoLogEventProxy())
+
+  - events consumed:
+    i) before parsing
+    ii) before storing into database
+    iii) after storing into database
+    iv) after job status computation
+    v) normal operation
+
+
+
+
+* results:
+
+a) with duplicate check:
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    i)
+    ii)
+    iii)
+    iv)
+    v)
+
+
+b) without duplicate check:
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    i)
+    ii)
+    iii)
+    iv)
+    v)
+
+
+--------------
+LB server test
+--------------
+
+* component:
+  org.glite.lb.server
+
+* binaries required:
+  stresslog
+  glite_lb_server_perf_noparse
+    - consumes events before parsing
+  glite_lb_server_perf_nostore
+    - consumes events before storing into database
+  glite_lb_server_perf_nostate
+    - consumes events before computing job status
+
+* test shell script:
+  perftest_server
+
+* input required:
+  - host certificates
+  - events
+
+* test description:
+  - measures time required for processing an event by LB server. Test
+    is performed with (a)) and without (b)) checking for duplicate
+    events.
+
+  - events produced:
+    - stresslog sends events using the IL protocol (using DoLogEventDirect())
+
+  - events consumed:
+    i) before parsing
+    ii) before storing into database
+    iii) after storing into database
+    iv) normal operation
+
+* results:
+
+a) with duplicate check:
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    i)
+    ii)
+    iii)
+    iv)
+
+
+b) without duplicate check:
+
+    throughput:    small     big       small     big
+                   job       job       DAG       DAG
+    -------------------------------------------------
+    i)
+    ii)
+    iii)
+    iv)
+
+
+
+---------------------
+Job registration test
+---------------------
+
+* component:
+  org.glite.lb.server
+  org.glite.lb.proxy
+
+* binaries required:
+  stressreg
+    - generates registration events
+  glite_lb_bkserverd
+  glite_lb_proxy
+  glite_lb_bkserverd_perf_empty
+  glite_lb_proxy_perf_empty
+
+* test shell script:
+  perftest_jobreg
+
+* input required:
+  - host & user certificates
+
+* test description:
+  - measures time required to register a given number of jobs (time
+    to process a registration event). The registration event is
+    synchronous in principle, so it is possible to get results just
+    from the client (stressreg). Test variants include:
+    a) current implementation
+    b) implementation of connection pool at the client
+    c) parallel communication with server and proxy
+
+
+  - events produced:
+    - stressreg sends registration events by calling
+      edg_wll_RegisterJob*()
+
+  - events consumed:
+    i) normally processed by server & proxy
+    ii) server replies with immediate success
+    iii) proxy replies with immediate success
+
+* results:
+
+a) current implementation
+
+    throughput:    one       DAG            DAG            DAG
+                   job       (1000 nodes)   (5000 nodes)   (10000 nodes)
+    -----------------------------------------------------------------
+    i)
+    ii)
+    iii)
+
+
+b) connection pool
+
+    throughput:    one       DAG            DAG            DAG
+                   job       (1000 nodes)   (5000 nodes)   (10000 nodes)
+    -----------------------------------------------------------------
+    i)
+    ii)
+    iii)
+
+
+c) parallel communication
+
+    throughput:    one       DAG            DAG            DAG
+                   job       (1000 nodes)   (5000 nodes)   (10000 nodes)
+    -----------------------------------------------------------------
+    i)
+
+
+
+\end{verbatim}
+
+\end{document}
\ No newline at end of file
-- 
1.8.2.3