Changes between Version 1 and Version 2 of Archer/DDT


Timestamp: 01/04/15 10:46:30 (5 years ago)
Author: grenville
DDT is a powerful debugging tool that lets you step through code interactively, set breakpoints and tracepoints, examine variables, debug memory, and more.

Documentation on how to use the tool is available, but how to get it running within the UM infrastructure is less obvious (though not difficult). The instructions given here refer to a UM 8.2 setup; other versions of the model may need minor changes (contact the CMS if you need further assistance).

You will need some familiarity with where the UM puts its various files and scripts. In particular, you will need a umui_runs directory on ARCHER for the job you wish to debug, and you will need to modify the qsatmos script, usually found in $DATAW/bin.

Follow these steps:

'''1.''' Build the UM with the -g flag set, so that the executable contains debugging information.

'''2.''' Submit a run of the failing model, but kill the job before it runs (qdel the job). This ensures that a umui_runs directory exists; it is needed later.

'''3.''' Edit the qsatmos script, changing

{{{
    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
          -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
      aprun -n $UM_IOS_NPES -N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK \
          -S $NTASKS_PER_NUMANODE -ss $LOADMODULE >>$OUTPUT
    fi
}}}

to

{{{
    if [[ "$OASIS" = true ]]; then
      aprun `cat OASIScoupled.conf` >> $OUTPUT
    else
      echo ddt -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
      ddt -start -noqueue -n $UM_IOS_NPES -mpiargs "-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss" $LOADMODULE >>$OUTPUT
    fi
}}}
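
To see what the edited branch expands to, the following is a minimal sketch that prints the resulting ddt command line. The variable values and the executable path are made-up placeholders, and `build_ddt_command` is a hypothetical helper that is not part of qsatmos; only the ddt options themselves are taken from the script above.

```shell
#!/bin/bash
# Sketch only: mirrors the edit made to qsatmos above.
# All values below are made-up placeholders, not recommended settings.
UM_IOS_NPES=96
NTASKS_PER_NODE=24
NTHREADS_PER_TASK=1
NTASKS_PER_NUMANODE=12
LOADMODULE=/path/to/um.exe   # hypothetical path to the UM executable

# Hypothetical helper: print the ddt command line that replaces the aprun call.
# The process count stays with ddt (-n); the remaining aprun placement flags
# are passed through as one quoted string via -mpiargs.
build_ddt_command() {
  local mpiargs="-N $NTASKS_PER_NODE -d $NTHREADS_PER_TASK -S $NTASKS_PER_NUMANODE -ss"
  printf 'ddt -start -noqueue -n %s -mpiargs "%s" %s\n' \
    "$UM_IOS_NPES" "$mpiargs" "$LOADMODULE"
}

build_ddt_command
```

Note that the quotes around the -mpiargs string matter: without them ddt would receive -N, -d, -S and -ss as its own options rather than as arguments to hand on to aprun.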

'''4.''' Start an interactive ARCHER session. In this example I requested an interactive session in the short queue, on 4 nodes for 20 minutes. The normal queue wait times apply:

{{{
grenvill@eslogin005 qsub -q short -X -IVl select=4,walltime=0:20:0 -A n02-cms
qsub: waiting for job 2768460.sdb to start
qsub: job 2768460.sdb ready

--------------------------------------------------------------------------------
*** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
*** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
*** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***
*** grenvill   Job: 2768460.sdb   started: 01/04/15 10:33:44   host: mom3 ***

--------------------------------------------------------------------------------
grenvill@mom3:~>
}}}

At this stage you are on a job-launcher node (mom3 in this case) and can run aprun directly.
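
The qsub line in step 4 packs several options together (-IVl). As a reading aid, here is a small sketch that rebuilds the same request with each option written separately. `interactive_qsub_command` is a hypothetical helper that only prints the command; the option meanings in the comments are the standard PBS ones, and the remark about the display is an assumption about why -X is used here.

```shell
#!/bin/bash
# Sketch only: rebuilds the qsub line from step 4 with the options spelled out.
# Hypothetical helper; arguments are queue, node count, walltime, budget code.
interactive_qsub_command() {
  local queue=$1 nodes=$2 walltime=$3 budget=$4
  # -q  queue to submit to
  # -X  forward X11 (assumed to be needed so the DDT window can open)
  # -I  interactive job
  # -V  export the current environment to the job
  # -l  resource request (nodes and walltime)
  # -A  budget to charge
  echo "qsub -q $queue -X -I -V -l select=$nodes,walltime=$walltime -A $budget"
}

# Same request as in step 4: 4 nodes for 20 minutes in the short queue.
interactive_qsub_command short 4 0:20:0 n02-cms
```

Substitute your own budget code for n02-cms and adjust the node count to match the decomposition of the job being debugged.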

'''5.''' cd to the umui_runs directory for your failing job (the directory created in step '''2''') and run the submit script interactively:

{{{
grenvill@mom3 ./umuisubmit_run
}}}
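
Before running the submit script it can be worth checking that the interactive session is able to start DDT at all. The following is a hedged sketch of such a check: `ddt_preflight` is a hypothetical helper, and it assumes only that DDT needs an X display (hence the -X option in step 4) and that a ddt command must be on your PATH.

```shell
#!/bin/bash
# Sketch only: hypothetical pre-flight check before running the submit script.
# Returns 0 when an X display is set and a ddt command is on PATH, 1 otherwise.
ddt_preflight() {
  local status=0
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set: X forwarding (qsub -X) may not be active" >&2
    status=1
  fi
  if ! command -v ddt >/dev/null 2>&1; then
    echo "ddt not found on PATH" >&2
    status=1
  fi
  return $status
}

if ddt_preflight; then
  echo "ready to run ./umuisubmit_run"
else
  echo "fix the problems reported above first"
fi
```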