Opened 13 years ago

Closed 12 years ago

#106 closed help (fixed)

Empty dgs

Reported by: jmarsham Owned by: lois
Component: UM Model Keywords:
Cc: Platform:
UM Version:


I submitted an 8 by 8 processor, 20 min job (ncas tic code) on Thursday pm. It ran Saturday night/Sunday am (this seems a long time to wait; am I doing something wrong?).

It has generated dg files ( /hpcx/devt/n02/n02-ncas/jmarsham/xcuxc/xcuxc.pp1 ) which can be read by xconv but appear to be empty (although they are still quite big). Do you know what is wrong with these?

/hpcx/devt/n02/n02-ncas/jmarsham/xcuxc/xcuxc.pp1 starts with:
stty: tcgetattr: A specified file does not support the ioctl system call.
but appears to be OK for 20 min before it timed out (as expected).


Change History (3)

comment:1 Changed 13 years ago by lois

  • Owner changed from um_support to lois
  • Status changed from new to assigned

When you submit jobs with the n02-ncas tic code you have access to only 192 processors, so if you ask for 64 of them, even for only 20 minutes, you are competing with a lot of other NCAS users who are also trying to use these processors.

Your high resolution job has run ~176 timesteps, but you are running with 2880 timesteps per day, so it is possible that your job simply ran out of time before it output your diagnostics. Do you know how much time your job should take to run the 19 model hours you requested? 20 minutes might not be sufficient.
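As a rough check on the figures above, the arithmetic can be sketched as follows. This is only an estimate: it assumes the job used the whole 20 minute slot, that every timestep costs the same, and that start-up and output costs are negligible. The function names are illustrative and are not part of the UM.

```python
# Rough wallclock estimate from the numbers quoted in this ticket.
# Assumptions: the full 20 minutes were used, and cost per timestep is constant.

TIMESTEPS_PER_DAY = 2880      # from the job setup
STEPS_COMPLETED = 176         # approximate, from the job output
MODEL_HOURS_REQUESTED = 19
WALLCLOCK_MINUTES_USED = 20   # assumed: job ran until the queue limit


def timestep_seconds(steps_per_day):
    """Length of one model timestep in seconds."""
    return 86400 / steps_per_day


def steps_needed(model_hours, steps_per_day):
    """Number of timesteps required to cover the requested model hours."""
    return model_hours * steps_per_day // 24


def estimated_wallclock_minutes(model_hours, steps_per_day, steps_done, minutes_used):
    """Scale the observed steps-per-minute rate up to the full run length."""
    return steps_needed(model_hours, steps_per_day) * minutes_used / steps_done


if __name__ == "__main__":
    dt = timestep_seconds(TIMESTEPS_PER_DAY)
    total = steps_needed(MODEL_HOURS_REQUESTED, TIMESTEPS_PER_DAY)
    done_hours = STEPS_COMPLETED * dt / 3600
    minutes = estimated_wallclock_minutes(
        MODEL_HOURS_REQUESTED, TIMESTEPS_PER_DAY, STEPS_COMPLETED, WALLCLOCK_MINUTES_USED
    )
    print(f"timestep length: {dt:.0f} s")
    print(f"model time completed: {done_hours:.2f} h of {MODEL_HOURS_REQUESTED} h")
    print(f"timesteps needed: {total}")
    print(f"estimated wallclock: {minutes:.0f} min ({minutes / 60:.1f} h)")
```

Under these assumptions the 20 minute run covers only about an hour and a half of model time, and the full 19 model hours would need something over four hours of wallclock on 64 processors.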

The UM reserves the space for the diagnostics before it writes the actual diagnostics, which is why the diagnostics files look large but are empty when a job fails for some reason!

If your job really needs 64 processors to run and it needs more time to complete then you may be better off running on the main HPCx system in a 64 processor, 3 or 6 or 12 hour queue. I will need to add you to the n02-bjob group and you will have to change the UM tic code to n02-bjob. The only downside to moving to the main HPCx system is that, although it gives you access to more processors, you may still queue as these 'small' jobs have a lower priority. Playing the game of getting through the queues on HPCx is not easy.

If your job doesn't need 64 processors then you could reduce the number of processors to 16 or 32 (4x4 or 4x8) and you could reduce the run length from 19 hours to something shorter just to see what you need to do to get the job running in 20 minutes or 1 hour, which are the queues that run during the working day (9am to 4pm).


comment:2 Changed 13 years ago by jmarsham

OK, I just couldn't see why the files were so big but empty. I now understand.

I used 20 min not so it would finish, but to test it. Giovanni used 16 by 8 for 3 hours, so on an ncas tic code 4 by 8 for 12 hours would be appropriate (I've asked Giovanni exactly how long it took to run).
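The scaling behind that estimate can be sketched as below. It assumes the run scales perfectly with processor count, which is optimistic in general, so treat the result as a first guess rather than a measured figure. The function names are illustrative only.

```python
# Scale a known run (16 by 8 processors, 3 hours) to a smaller decomposition,
# assuming perfect (linear) scaling with the number of processors.


def processor_hours(ew_procs, ns_procs, hours):
    """Total processor-hours used by a run on an ew_procs x ns_procs decomposition."""
    return ew_procs * ns_procs * hours


def scaled_hours(ref_ew, ref_ns, ref_hours, new_ew, new_ns):
    """Wallclock hours on the new decomposition for the same processor-hours."""
    return processor_hours(ref_ew, ref_ns, ref_hours) / (new_ew * new_ns)


if __name__ == "__main__":
    total = processor_hours(16, 8, 3)
    hours = scaled_hours(16, 8, 3, 4, 8)
    print(f"reference run: {total} processor-hours")
    print(f"4 by 8 estimate: {hours:.0f} hours")
```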


comment:3 Changed 12 years ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed