Version 8 (modified by grenville, 5 years ago)
DDT is a powerful debugging tool that lets you step interactively through code, set breakpoints and tracepoints, examine variables, debug memory, and more…
Documentation on how to use the tool is available (try http://content.allinea.com/downloads/userguide-forge.pdf), but how to get it running within the UM infrastructure is less obvious, though not difficult. The instructions given here refer to a UM 8.2 setup; other versions of the model may need minor changes, but they shouldn't amount to much (contact the CMS if you need further assistance).
You will need some familiarity with where the UM puts its various files and scripts. In particular, you will need a umui_runs directory for the job you wish to debug on ARCHER and you will need to modify the qsatmos script, usually found in $DATAW/bin.
Do this:
1. Build the UM with the -g flag set
2. Submit a run for the failing model but kill the job before it runs (qdel the job) - this will ensure that you have a umui_runs directory available, which is needed later
3. Edit the qsatmos script, change
    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
           -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
      aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
           -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
    fi
to
    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo ddt -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
      ddt -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
    fi
4. Get an interactive ARCHER session. In this example, I requested an interactive session in the short queue on 4 nodes for 20 minutes. You will be subject to the normal wait times when doing this:
    grenvill@eslogin005 qsub -q short -X -IVl select=4,walltime=0:20:0 -A n02-cms
    qsub: waiting for job 2768460.sdb to start
    qsub: job 2768460.sdb ready
    --------------------------------------------------------------------------------
    *** grenvill Job: 2768460.sdb started: 01/04/15 10:33:44 host: mom3 ***
    *** grenvill Job: 2768460.sdb started: 01/04/15 10:33:44 host: mom3 ***
    *** grenvill Job: 2768460.sdb started: 01/04/15 10:33:44 host: mom3 ***
    *** grenvill Job: 2768460.sdb started: 01/04/15 10:33:44 host: mom3 ***
    --------------------------------------------------------------------------------
    grenvill@mom3:~>
At this stage you are on a job-launcher node (mom3 in this case) and can run aprun directly, i.e. launch a parallel job directly rather than through the scheduler.
5. cd to the umui_runs directory for the failing job (xlehy in this example), i.e. the directory created in step 2, load the allinea module, and run the submit script interactively:
    grenvill@mom3 cd ~/umui_runs/xlehy-091105123
    grenvill@mom3 module load allinea
    grenvill@mom3 ./umuisubmit_run
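Looking back at step 3, the only change to qsatmos is the launcher: the aprun placement options (-N, -d, -S, -ss) are passed to ddt as a single quoted -mpiargs string, while -n and the executable stay where they were. The sketch below prints both command lines side by side so the mapping is easy to check before editing the real script. The helper name build_launch_cmd, the variable values, and the executable path are illustrative assumptions, not part of the UM scripts.

```shell
#!/bin/bash
# Sketch only: build_launch_cmd is a hypothetical helper, not part of qsatmos.
# It prints (does not run) the launch line for the chosen launcher.
build_launch_cmd() {
    local launcher=$1
    # Placement options shared by both forms of the command.
    local placement="-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss"
    case $launcher in
        aprun) echo "aprun -n $UM_IOS_NPES $placement $LOADMODULE" ;;
        ddt)   echo "ddt -start -noqueue -n $UM_IOS_NPES -mpiargs \"$placement\" $LOADMODULE" ;;
        *)     echo "unknown launcher: $launcher" >&2; return 1 ;;
    esac
}

# Illustrative values only (assumptions, not taken from a real job).
UM_IOS_NPES=48
NTASKS_PER_NODE=12
NTHREADS_PER_TASK=2
NTASKS_PER_NUMANODE=6
LOADMODULE=/path/to/um.exe

build_launch_cmd aprun   # original form of the line in qsatmos
build_launch_cmd ddt     # form used for debugging
```

In the real script these variables are set by the UM job environment, so only the launch lines themselves need editing.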
DDT should start: you'll see the DDT logo and, a few seconds later, the debugging window will appear (see the attached screenshots).
It is probably best to ensure that the resources requested in the interactive session match those needed by the job you wish to debug. In this example I requested 4 interactive nodes, and the job was configured to run on 4x12 MPI tasks, each with 2 OpenMP threads, for a total of 4 nodes.
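A quick way to check that the interactive request matches the job is to work out the node count from the task and thread counts. The sketch below assumes one OpenMP thread per core and 24 cores per ARCHER compute node; the function name nodes_needed is hypothetical, and the cores-per-node default should be changed for other machines.

```shell
#!/bin/bash
# Sketch only: nodes_needed is a hypothetical helper.
# Assumes one thread per core; cores per node defaults to 24 (an assumption
# about ARCHER compute nodes, override with the third argument).
nodes_needed() {
    local tasks=$1 threads=$2 cores_per_node=${3:-24}
    local cores=$(( tasks * threads ))
    # Round up to a whole number of nodes.
    echo $(( (cores + cores_per_node - 1) / cores_per_node ))
}

# The example above: 4x12 MPI tasks, 2 OpenMP threads each.
nodes_needed 48 2
```

For the example job this prints 4, which matches the select=4 used in the qsub request in step 4.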
Attachments (5)
- ddt.png (109.9 KB), added by grenville 5 years ago: picture of DDT startup window
- ddt1.png (50.3 KB), added by grenville 5 years ago: debug start window
- ddt2.png (57.4 KB), added by grenville 5 years ago: memory usage
- ddt3.png (24.2 KB), added by grenville 5 years ago: memory stats-1
- ddt4.png (21.9 KB), added by grenville 5 years ago: memory stats-2