Version 8 (modified by grenville, 5 years ago)

DDT is a powerful debugging tool that lets you step interactively through code, set breakpoints and tracepoints, examine variables, debug memory, and more.

Documentation on using the tool is available (try http://content.allinea.com/downloads/userguide-forge.pdf), but how to get it running within the UM infrastructure is less obvious (though not difficult). The instructions given here refer to a UM 8.2 setup; other versions of the model may need minor changes, but these should not amount to much (contact the CMS if you need further assistance).

You will need some familiarity with where the UM puts its various files and scripts. In particular, you will need a umui_runs directory for the job you wish to debug on ARCHER and you will need to modify the qsatmos script, usually found in $DATAW/bin.

Do this:

1. Build the UM with the -g compiler flag set, so that the executable contains debugging symbols.
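Before going further, it can be worth confirming that the rebuilt executable really does carry debugging information. The sketch below is a hypothetical helper (`has_debug_info` is not part of the UM); it assumes GNU binutils' `readelf` is available and simply looks for a DWARF `.debug_info` section, which compilers normally emit when building with -g.

```shell
# Hypothetical helper (not part of the UM): succeed if the given executable
# contains a DWARF .debug_info section, i.e. it was most likely built with -g.
# Assumes readelf from GNU binutils is on the PATH.
has_debug_info() {
  readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Usage (the path is a placeholder for your own UM executable):
#   has_debug_info /path/to/your/um/executable && echo "debug info present"
```

If the check fails, the -g flag probably did not reach the compile step and the model should be rebuilt before continuing.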

2. Submit a run for the failing model, but kill the job (with qdel) before it starts running. This ensures that a umui_runs directory is available, which is needed later.

3. Edit the qsatmos script and change

    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
          -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
      aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
          -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
    fi

to

    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo ddt -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK  -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
      ddt  -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
    fi
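If you switch between normal runs and debugging runs often, editing qsatmos back and forth becomes tedious. One option is to choose the launcher with an environment variable. The sketch below is only an illustration: the variable `UM_USE_DDT` and the function `build_launch_cmd` are hypothetical names, not part of the UM, and the array syntax assumes qsatmos runs under ksh93 or bash. The `ddt` and `aprun` arguments are exactly those from the edit above.

```shell
# Hypothetical helper: build the launch command in the array LAUNCH_CMD,
# using DDT when the (assumed) variable UM_USE_DDT is "true", else plain aprun.
# Relies on the variables qsatmos already sets: UM_IOS_NPES, NTASKS_PER_NODE,
# NTHREADS_PER_TASK, NTASKS_PER_NUMANODE and LOADMODULE.
build_launch_cmd() {
  if [[ "${UM_USE_DDT:-false}" = true ]]; then
    LAUNCH_CMD=(ddt -start -noqueue -n "$UM_IOS_NPES"
      -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss"
      "$LOADMODULE")
  else
    LAUNCH_CMD=(aprun -n "$UM_IOS_NPES" -N "$NTASKS_PER_NODE" -d "$NTHREADS_PER_TASK"
      -S "$NTASKS_PER_NUMANODE" -ss "$LOADMODULE")
  fi
}

# In the else branch of qsatmos, the echo/launch pair would then become:
#   build_launch_cmd
#   echo "${LAUNCH_CMD[@]}" >>$OUTPUT
#   "${LAUNCH_CMD[@]}" >>$OUTPUT
```

Keeping the command in an array preserves the quoted -mpiargs string as a single argument. You would then export UM_USE_DDT=true in the interactive session before running the submit script, assuming the UM submission scripts pass your environment through to qsatmos.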

4. Get an interactive ARCHER session. In this example I requested an interactive session in the short queue, with 4 nodes for 20 minutes; you will be subject to the normal queue wait times when doing this:

    grenvill@eslogin005 qsub -q short -X -IVl select=4,walltime=0:20:0 -A n02-cms
    qsub: waiting for job 2768460.sdb to start
    qsub: job 2768460.sdb ready

    --------------------------------------------------------------------------------
    *** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
    *** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
    *** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
    *** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***

    --------------------------------------------------------------------------------
    grenvill@mom3:~>

At this stage you are on a job-launcher node (mom3 in this case) and can run aprun directly, i.e. launch a parallel job yourself rather than through the scheduler.

5. cd to the umui_runs directory for the failing job (xlehy in this example), i.e. the directory created in step 2, then load the allinea module and run the submit script interactively:

    grenvill@mom3 cd ~/umui_runs/xlehy-091105123
    grenvill@mom3 module load allinea
    grenvill@mom3 ./umuisubmit_run

DDT should now start: you will see the DDT logo, and a few seconds later the debugging window will appear, as shown below:

[Figure: DDT start-up window]
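Because each submission creates a new timestamped directory under umui_runs, it can be awkward to remember which one to cd into when you repeat step 5. The following sketch is a hypothetical bash helper (`latest_umui_run` and the override variable `UMUI_RUNS_DIR` are illustrative names, not part of the UM); it assumes the `<jobid>-<number>` directory naming seen above and picks the most recently modified matching directory.

```shell
# Hypothetical bash helper: print the most recently modified umui_runs
# directory for a UMUI job ID such as "xlehy". Looks in $HOME/umui_runs
# unless the (assumed) variable UMUI_RUNS_DIR points elsewhere.
# Returns non-zero and prints nothing if no directory matches.
latest_umui_run() {
  local dir latest=""
  for dir in "${UMUI_RUNS_DIR:-$HOME/umui_runs}/$1"-*; do
    [[ -d "$dir" ]] || continue   # also skips the unexpanded pattern when nothing matches
    if [[ -z "$latest" || "$dir" -nt "$latest" ]]; then
      latest=$dir
    fi
  done
  [[ -n "$latest" ]] && printf '%s\n' "$latest"
}

# Usage on the job-launcher node:
#   dir=$(latest_umui_run xlehy) && cd "$dir" && module load allinea && ./umuisubmit_run
```

Choosing by modification time rather than by name avoids any assumption about how the numeric suffix is formed.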

It is best to make sure that the resources needed by the job you wish to debug match those requested for the interactive session. In this example I requested 4 interactive nodes, and the job was configured to run 4x12 = 48 MPI tasks, each with 2 OpenMP threads: 96 cores in total, which is exactly 4 ARCHER nodes of 24 cores each.
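The node count for the interactive request can be worked out from the job's MPI and OpenMP settings. The sketch below is a hypothetical bash function (`nodes_needed` is an illustrative name); it assumes 24 cores per ARCHER compute node and one core per OpenMP thread, and rounds up because a partly used node must still be requested in full.

```shell
# Hypothetical helper: number of nodes needed for a job with the given
# number of MPI tasks and OpenMP threads per task. Assumes one core per
# thread and, by default, 24 cores per node (ARCHER); pass a third
# argument to override the cores per node.
nodes_needed() {
  local mpi_tasks=$1 omp_threads=$2 cores_per_node=${3:-24}
  local cores=$(( mpi_tasks * omp_threads ))
  # Integer ceiling division: round up to whole nodes.
  echo $(( (cores + cores_per_node - 1) / cores_per_node ))
}

# For the example job (4x12 MPI tasks, 2 threads each) this prints 4:
#   nodes_needed $(( 4 * 12 )) 2
# and can feed the interactive request directly:
#   qsub -q short -X -IVl select=$(nodes_needed 48 2),walltime=0:20:0 -A n02-cms
```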
