Opened 11 years ago

Closed 8 years ago

#354 closed help (fixed)

LAM job error

Reported by: a.elvidge Owned by: grenville
Component: UM Model Keywords:
Cc: Platform: HECToR
UM Version: 7.1

Description

Hello,

I am attempting to run a LAM job for the Antarctic Peninsula. I used the same domain as used in one of Andrew Orr's jobs. I get the following error in my .leave file:

aprun: -N cannot exceed -n
aprun: Exiting due to errors. Application aborted

Any ideas as to what I've done wrong?

Thanks, Andy

Change History (22)

comment:1 Changed 11 years ago by ros

  • Owner changed from um_support to ros
  • Status changed from new to accepted

Hi Andy,

That error usually means you're trying to run on quad core but have specified fewer than 4 cores for the reconfiguration, which isn't allowed.

I don't know which job this is to be able to check.

Please go to UMUI window Reconfiguration → General Recon Options and check that you have specified a configuration of 4 PEs or more and not something like 1x2.
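As a rough illustration of that check, here is a minimal shell sketch. The function name check_recon_pes and its arguments are hypothetical, and 4 cores per node (quad core) is assumed as the default; the real submission scripts may derive the aprun -n and -N values differently.

```shell
# Hypothetical helper: compare a reconfiguration decomposition (EW x NS PEs)
# with the number of PEs placed on each node (assumed to be 4 for quad core).
# aprun rejects a job when the per-node count (-N) exceeds the total (-n).
check_recon_pes() {
    ew=$1
    ns=$2
    per_node=${3:-4}
    total=$((ew * ns))
    if [ "$total" -lt "$per_node" ]; then
        echo "too few PEs: ${ew}x${ns}=${total} < ${per_node}"
        return 1
    fi
    echo "ok: ${ew}x${ns}=${total}"
}
```

Under these assumptions, check_recon_pes 1 2 reports a problem, while check_recon_pes 2 2 passes.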

Regards,
Ros.

comment:2 Changed 11 years ago by a.elvidge

Hi Ros,

Having changed what you suggested and submitted the job again, it doesn't appear to give me a .leave file. I'm pretty sure I've given it long enough. Any ideas? It is job xenoi, by the way.

I have used the tutorial job to give me my LBCs and have copied a new version of the tutorial job and tailored it (obviously unsuccessfully) to be a LAM for the Antarctic Peninsula.

Cheers, Andy

comment:3 Changed 11 years ago by ros

Hi Andy,

I'm afraid you need to wait a bit longer; the job is still in the queue waiting to be run. Sometimes it can take quite a while before a job will run.

The command qstat can be used to list all the jobs in the queues.

qstat -u aelvidge will list all of your jobs.

Regards,
Ros.

comment:4 Changed 11 years ago by a.elvidge

Ok.

I ran two jobs yesterday which never produced a .leave file, only the .comp.leave files. Would this imply that I must have made a mistake somewhere, or that the jobs never even ran?

Thanks, Andy

comment:5 Changed 11 years ago by ros

Hi Andy,

Without being able to see the .comp.leave files, I'd surmise that there was an error somewhere. If the jobs were successfully submitted to the queues to run you should get a .leave file telling you what happened when the job was run. If the scheduler couldn't run the job for any reason you should receive an email.

If you want me to take a look to see if I can figure out what happened, you will need to give me read access (chmod g+r) to your um/umui_out directory and all those directories above it.
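A minimal sketch of one way to do this, assuming a POSIX shell. The helper name grant_group_access is hypothetical, and it grants search (x) as well as read permission, since a directory can only be traversed when x is set.

```shell
# Hypothetical helper: give the group read and search (execute) permission on
# a directory and on each of its parents, stopping at a given top directory.
grant_group_access() {
    dir=$1
    top=$2
    while :; do
        chmod g+rx "$dir" || return 1
        [ "$dir" = "$top" ] && break
        parent=$(dirname "$dir")
        # Stop at the filesystem root if the top directory is never reached.
        [ "$parent" = "$dir" ] && break
        dir=$parent
    done
}
```

It could be called as grant_group_access "$HOME/um/umui_out" "$HOME", which changes only the directories on that path and leaves everything else alone.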

Regards,
Ros.

comment:6 Changed 11 years ago by ros

  • UM Version changed from <select version> to 7.1

comment:7 Changed 11 years ago by a.elvidge

Hi Ros,

No, don't worry, whatever that problem was I must have rectified it because the job is at least running now. However, I now get the following error:

"Number of land points does not agree with input namelist!"

I used the same number of land points as that used in Andrew Orr's job. What do 'land points' refer to exactly? How do I find out how many I need?

Thanks, Andy

comment:8 Changed 11 years ago by ros

Hi Andy,

There's not much to say about land points: they are simply gridpoints that contain land, as indicated by the land/sea mask. There's a little more information about them w.r.t. LAMs in the UM User Guide, available on the CMS website under UM Documentation → Met Office Docs.

To find out how many land points the model is expecting, add the following environment variable in the UMUI window

Submodel independent → Script inserts and modifications

Add XLFRTEOPTS and set it to namelist=old:buffering=disable_all

This flushes the buffers so you should always find in the output .leave file what number of land points the model says it needs.
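For reference, a minimal sketch of what this setting amounts to in a shell script, assuming it ends up as an ordinary exported environment variable (the UMUI normally handles this itself). The helper show_land_points and the use of grep on the .leave file are illustrative assumptions; the exact wording of the model's message is not given in this ticket, so the search is case-insensitive.

```shell
# Assumed shell equivalent of the UMUI setting; the variable name and value
# are the ones quoted above.
export XLFRTEOPTS="namelist=old:buffering=disable_all"

# Hypothetical helper: list the lines that mention land points in a .leave
# file, with their line numbers.
show_land_points() {
    grep -i -n "land points" "$1"
}
```

It would be used as show_land_points followed by the path to the job's .leave file.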

HTH
Ros.

comment:9 Changed 11 years ago by a.elvidge

Ok, thanks, that seems to be working now. But now I have another error: a grid problem with my ancillary files. I found a similar problem in an old ticket, but it didn't help me much.

ERROR!!! in reconfiguration in routine Rcf_Ancil_Atmos
Error Code:- 4
Error Message:- INANCILA:integer header error - row length
Error generated from processor 0

Thanks, Andy

comment:10 Changed 11 years ago by ros

Ok, so I missed the point where you said you were converting the tutorial job. Doh!

This error is because you're still using the vegetation ancillary files from the HadGEM3-A job which are on the wrong grid (192x145).

You'll probably be better off starting with a job that is more similar to the type you want to run, so you don't have to change everything. I'm going to ask Grenville if he can advise.

comment:11 Changed 11 years ago by grenville

  • Owner changed from ros to grenville
  • Status changed from accepted to assigned

Andy

You mention that you are using the same domain as in one of Andrew's jobs - could you not use the entire job? I have vn7.1 LAMs that run at 12 and 4km resolution (not so close to the pole), which might be better starting points than a global job.

Grenville

comment:12 Changed 11 years ago by a.elvidge

Thanks Ros.

Hi Grenville,

I used Andrew's vn6.1 job xdpxj for the domain. Since I will be using vn7, I didn't want to replicate this one. I presume the jobs you are referring to are the ones in xdpk? I will try using the 12km one tomorrow. Out of interest, what are the 'operational settings' alluded to in one of the 4km jobs?

Cheers, Andy

comment:13 Changed 11 years ago by grenville

Andy

Try taking a copy of xdpux - it makes reference to some code in my space, but that shouldn't be a problem. This job won't be optimized for your cold area, but it should run as a LAM. It's simple enough, if somewhat tedious, to transfer the physics settings from Andrew's job to this one. You should be able to use Andrew's ancillary files with this job.

Operational settings just refers to the physics settings that are used in the Met Office's operational model (for the UK).

Grenville

comment:14 Changed 11 years ago by a.elvidge

Grenville,

Using your xdpux, I am getting this error again:

INANCILA:integer header error - row length

Is this because the ancillary files are set up to match your domain?

Thanks, Andy

comment:15 Changed 11 years ago by grenville

Andy

Yes, xdpux is set up for a particular domain; the ancillary files, boundary conditions and start file are specific to that domain. You'll need to use those appropriate to your LAM. Files that ran for the vn6.1 job should be OK with this 7.1 job.

Grenville

comment:16 Changed 11 years ago by a.elvidge

Hi Grenville,

For this same job I am trying to change the start time. So, since it is a LAM, I have gone back to the global job and changed the start dump for that (using an .astart file previously used by Andrew Orr). I have run the reconfiguration on this start dump but am getting the following error:

/work/n02/n02/aelvidge/xenoa/bin/qsexecute: Error in dump reconfiguration - see OUTPUT
*

Ending script : qsexecute
Completion code : 137
Completion time : Thu Jan 7 21:49:10 GMT 2010

*

/work/n02/n02/aelvidge/xenoa/bin/qsmaster: Failed in qsexecute in model xenoa
*

Starting script : qsfinal
Starting time : Thu Jan 7 21:49:10 GMT 2010

*

/work/n02/n02/aelvidge/xenoa/bin/qsfinal: Model xenoa - Error: No history files
*

Ending script : qsfinal
Completion code : 135
Completion time : Thu Jan 7 21:49:10 GMT 2010

*

Any ideas? The job ID is xenoa

Thanks, Andy

comment:17 Changed 11 years ago by grenville

Andy

Please give me permission to see the relevant work files and home files, i.e. files in /work/n02/n02/aelvidge/xenoa/ (remember to make the directory executable), and /home/n02/n02/aelvidge/um/umui_out.

Grenville

comment:18 Changed 11 years ago by a.elvidge

Grenville,

Hopefully you now have permission.

Cheers, Andy

comment:19 Changed 11 years ago by grenville

Andy

I don't have permission yet.

Run chmod -R a+x on directories /home/n02/n02/aelvidge and /work/n02/n02/aelvidge. That should set the execute permissions on the directories so I have access.
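Note that chmod -R a+x also marks every regular file as executable and has to visit the whole tree. A minimal alternative sketch that touches directories only, assuming find is available; the helper name open_dirs is hypothetical.

```shell
# Hypothetical helper: add execute (search) permission for all users on a
# directory and every directory below it, leaving regular files unchanged.
open_dirs() {
    find "$1" -type d -exec chmod a+x {} +
}
```

For example, open_dirs /home/n02/n02/aelvidge would open up the directories without changing the mode of any file inside them.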

Grenville

comment:20 Changed 11 years ago by a.elvidge

Grenville,

I've changed my home directory OK, but it seems to hang when I try chmod on the work directory. However, ls -l suggests that it might have worked all the same (at least it gives the same drwx--s--x as for my home directory).

Cheers, Andy

comment:21 Changed 11 years ago by a.elvidge

Grenville,

You should definitely now have access to both my home and work folders.

Andy

comment:22 Changed 8 years ago by ros

  • Platform set to HECToR
  • Resolution set to fixed
  • Status changed from assigned to closed