Opened 9 years ago

Closed 8 years ago

#513 closed help (fixed)

HadAM3 job running, but no data being output

Reported by: eead Owned by: lois
Component: UM Model Keywords:
Cc: a.m.dolan08@… Platform:
UM Version: 4.5

Description

Hi,
I am having some trouble with a HadAM3 run on Hector (tcyya). I am trying to continue a previous run (tctsz) but with modified ancil files. As far as I can tell the ancil files look o.k.

I can get a .astart file to configure and then the job will run, but no data is being output to datam. I have looked through the attached .leave file, but I cannot see anything that would be causing this.

I am sorry if this is a trivial problem, but I have been tweaking this run for a while now trying to get it to work.

Thanks,
Aisling

Change History (11)

comment:1 Changed 9 years ago by eead

Hi sorry,

I have just tried to attach the .leave file, but it is too big. The file can be found at /home/n02/n02/eead/um/umui_out/tcyya000.tcyya.d10278.t112536.comp.leave

Thanks,
Aisling

comment:2 Changed 9 years ago by lois

  • Owner changed from um_support to lois
  • Status changed from new to assigned

Aisling, could you change the permissions on the .leave file and on all the directories leading to it?

Thanks

Lois

comment:3 Changed 9 years ago by eead

Hi Lois,

Sorry about that, you should be able to look at the .leave file now.
Thanks,
Aisling

comment:4 Changed 9 years ago by lois

Could you run the following commands, Aisling? I don't think the directory permissions are right.

chmod -R g+rX /home/n02/n02/eead
chmod -R g+rX /work/n02/n02/eead

Then I can look and see what the problem might be.

Thanks
Lois
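
To confirm that the chmod commands above had the intended effect, every directory between the file and the top of the tree can be checked for group search permission, and the file itself for group read permission. The sketch below is one way to do that. The function name check_group_access and its optional second argument are hypothetical, not part of the UM or HECToR tooling, and it tests only the minimum needed to open a file at a known path (g+rX grants more than this, since it also adds group read on directories).

```shell
# Hypothetical helper (not part of the UM or HECToR tooling): report every
# path component between FILE and TOP that blocks group access.
# Usage: check_group_access FILE [TOP]   (TOP defaults to /)
check_group_access() {
    path=$1
    top=${2:-/}
    status=0
    while :; do
        # The first field of `ls -ld` is the mode string, e.g. drwxr-x---
        mode=$(ls -ld "$path" 2>/dev/null | cut -c1-10)
        if [ -z "$mode" ]; then
            echo "cannot stat: $path"
            return 1
        fi
        case $mode in
            d*)
                # Directories need group search permission (x or s in column 7)
                case $(printf '%s' "$mode" | cut -c7) in
                    x|s) ;;
                    *) echo "no group search: $mode $path"; status=1 ;;
                esac ;;
            *)
                # Files need group read permission (r in column 5)
                case $(printf '%s' "$mode" | cut -c5) in
                    r) ;;
                    *) echo "no group read:   $mode $path"; status=1 ;;
                esac ;;
        esac
        # Stop at TOP, at the filesystem root, or at "." for relative paths
        if [ "$path" = "$top" ] || [ "$path" = / ] || [ "$path" = . ]; then
            break
        fi
        path=$(dirname "$path")
    done
    return $status
}
```

Calling it with the .leave file path from comment:1 as the first argument would print any component that still blocks group access and return a non-zero status in that case. It does not check whether the person trying to read the file is actually a member of the owning group.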

comment:5 Changed 9 years ago by eead

Hi Lois,

I have done this now, so hopefully it should be readable. Sorry!

Thanks, Aisling

comment:6 Changed 9 years ago by eead

Hi Lois,

I have just tried running tcyya again, but I changed the stash to output after every time step. I thought the model might have got into an infinite loop, which was why it was running, but not outputting any data. In doing this I got a new .leave file which has the following error in it:

Segmentation fault! Fault address: 0x7ffeb4d1fe60

Fault address is 5291195872 bytes below the nearest valid
mapping boundary, which is at 0x7ffffff64000.

This is likely to have been caused by a stack overflow.
Use your shell's ulimit or limit command to see if your
stack size limit is too small.

This is similar to the problem highlighted in another ticket: http://ncas-cms.nerc.ac.uk/trac/UMHelpdesk/ticket/489 - but that doesn't seem to have been resolved.
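
The error message suggests checking the stack size limit with ulimit. A minimal sketch of that check is below. The function name stack_limit_ok and the threshold passed to it are illustrative assumptions, and nothing in this ticket confirms that the stack limit was the real cause of the failure.

```shell
# Hypothetical helper: print the current soft stack limit and report
# whether it is at least WANT_KB kilobytes.
# Usage: stack_limit_ok WANT_KB
stack_limit_ok() {
    want_kb=$1
    # `ulimit -s` prints the soft stack limit in kilobytes, or "unlimited"
    have=$(ulimit -s)
    if [ "$have" = unlimited ]; then
        echo "stack limit: unlimited"
        return 0
    fi
    echo "stack limit: ${have} kB (wanted at least ${want_kb} kB)"
    [ "$have" -ge "$want_kb" ]
}
```

If the limit turns out to be small, it can usually be raised in the job script with `ulimit -s unlimited` before the model is launched, provided the hard limit allows it. Whether a limit set in the batch script also applies on the compute nodes depends on the system, so that is worth confirming against the HECToR documentation.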

Thanks,
Aisling

Also, I don't receive any email notification when you reply on this helpdesk (not sure why - or if I should even), but I just thought I would let you know.

comment:7 Changed 9 years ago by lois

  • Cc a.m.dolan08@… added

Hello,

I copied your job and tried to run on a few more cores and it still fails with the message
section 3 item 217 unsupported processing code.

So the STASH in your job is trying to produce time series, whereas tctsz produces time means. Before I investigate how to fix this, can I just check that you really want time series rather than time means?

Thanks

Lois

I suspect that your email address in the helpdesk is not set correctly. Log on to the helpdesk, click on Preferences, check the email address and correct it if it is wrong. This hopefully explains your lack of emails from the helpdesk.

comment:8 Changed 9 years ago by eead

Hi Lois,

I actually want time means, not a time series. I must have switched that on when trying to output after every time step to see if the job was caught in a loop. Basically, the job should be identical to tctsz apart from the updated ancil files.

Thanks for your help.

Aisling

comment:9 Changed 9 years ago by lois

Hello Aisling

I can get your job to run with time means rather than time series, but it is running extraordinarily slowly. I am investigating what might be causing this and hope to have an answer soon.

Lois

comment:10 Changed 9 years ago by lois

Your job is blowing up and then hanging, Aisling. I don't know why yet, so I need to investigate further.

I am off to the HECToR User Meeting tomorrow, so I will be back on the case on Wednesday.

Lois

comment:11 Changed 8 years ago by ros

  • Resolution set to fixed
  • Status changed from assigned to closed
  • UM Version changed from <select version> to 4.5

Lois has left since this query was opened, so unfortunately the solution cannot be documented.
