Opened 7 months ago

Closed 6 months ago

#2696 closed error (answered)

Error "Address not mapped to object" in nesting suite case studies LAM forecast

Reported by: a.elvidge
Owned by: um_support
Component: UM Model
Keywords:
Cc: a.elvidge@…
Platform:
UM Version:

Description

Hi,

I am attempting to run a nesting suite case-studies job, with a 2.2 km LAM nested inside an N1280 global run. Everything runs successfully until the 2.2 km LAM reaches its second 6-hour forecast (CRUN), at which point it fails with the rather nondescript error message below. This happens for all three case studies that have run so far: the first CRUN completes successfully, then the second one fails, just a minute or two into the task.

[252] exceptions: An exception was raised:11 (Segmentation fault)
[252] exceptions: the exception reports the extra information: Address not mapped to object.
[252] exceptions: whilst in a serial region
[252] exceptions: Task had pid=10236 on host nid00726
[252] exceptions: Program is "/home/d00/aelvidge/cylc-run/u-bd445/share/fcm_make/build-atmos/bin/um-atmos.exe"
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
[252] exceptions: Data address (si_addr): 0x00000000; rip: 0x21884a05

My suite number is u-bd445.

Any help with this would be much appreciated.

Thanks, Andy

Change History (12)

comment:1 Changed 7 months ago by a.elvidge

  • Cc a.elvidge@… added

comment:2 Changed 7 months ago by grenville

Andy

It's failing in rad_ctl.F90. I'll take a local copy of your suite; please don't change anything.

Grenville

comment:3 Changed 7 months ago by grenville

Andy

Could you put the start files somewhere accessible (not MASS)?

Grenville

comment:4 Changed 7 months ago by a.elvidge

Hi Grenville,

Can you see the latest version of my suite, or do I need to do an fcm commit? (It was previously failing, possibly in rad_ctl.F90, because the radiation timestep was wrong, but I have corrected that.)

I have put one of the start dumps on Monsoon here: /home/d00/aelvidge/start_dump/20180228T0000Z_glm_t+0

Cheers, Andy

comment:5 Changed 7 months ago by grenville

Hi Andy

I can see your copy on Monsoon; I have not referred to MOSRS. Thanks for the start file.

Grenville

comment:6 Changed 7 months ago by grenville

Andy

We've not forgotten about this. I can reproduce the problem but don't yet understand what's causing it.

Grenville

comment:7 Changed 7 months ago by a.elvidge

Hi Grenville,

Thanks for the update.
This job is very important, as the simulation covers a recent Arctic field campaign and the output will be used by many researchers working on the project. I had hoped to have the job running over Christmas so that the output would be ready for everyone in the new year, but from what you say it's looking like this might be touch and go. I don't suppose there's been any further news?

Thanks again for your help with this.

Andy

comment:8 Changed 7 months ago by simon

Hi Andy,

I may have a clue as to what's happening. The model is failing when it tries to process the TOA SW diagnostic, because it attempts to access a non-existent pointer. This diagnostic uses the T20MIN time profile in STASH, which writes out instantaneous fields every 20 minutes starting 360 minutes (i.e. 6 hours) into the run. The radiation timestep is 15 minutes. For radiation diagnostics, the STASH request times and the radiation timesteps have to coincide. The first output time, 360 minutes, is a multiple of 15, but the second, 380 minutes, is not. So the model fails at 6 hours 20 minutes: it tries to sample the radiation, but this isn't a radiation timestep and the variable isn't defined.

Anyway, try changing ifre (the output frequency) to 15 in T20MIN and rerun.
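The timing clash can be checked with a few lines of arithmetic before resubmitting. The Python sketch below is illustrative only and is not UM code: the function name `first_mismatch` is hypothetical, and it assumes radiation timesteps fall at whole multiples of the radiation timestep counted from the start of the run (t = 0), with all times in minutes.

```python
from typing import Optional


def first_mismatch(start_min: int, freq_min: int, rad_ts_min: int,
                   end_min: int) -> Optional[int]:
    """Return the first regular STASH output time (minutes) that is not a
    radiation timestep, or None if every output time from start_min to
    end_min (inclusive) coincides with one.

    Hypothetical helper: assumes radiation is called at t = 0, rad_ts_min,
    2 * rad_ts_min, ... minutes into the run.
    """
    t = start_min
    while t <= end_min:
        if t % rad_ts_min != 0:
            return t  # output requested on a non-radiation timestep
        t += freq_min
    return None


# T20MIN as described: every 20 minutes from 360 minutes, 15-minute
# radiation timestep, checked over a 12-hour window.
print(first_mismatch(360, 20, 15, 720))  # 380, i.e. 6 h 20 min
print(first_mismatch(360, 15, 15, 720))  # None (ifre = 15)
print(first_mismatch(360, 30, 15, 720))  # None (30-minute output)
```

Under the same assumption, a regular profile with at least two output times is safe exactly when both its start time and its frequency are multiples of the radiation timestep, which is why both 15- and 30-minute output work here.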

comment:9 Changed 7 months ago by a.elvidge

Hi Simon,

Thanks for this.
I've changed the time profile to output every 30 minutes and have just restarted the job.
Fingers crossed this sorts it; I'll let you know.

Cheers, Andy

comment:10 Changed 6 months ago by a.elvidge

Hi Simon,

This did indeed fix the problem, thanks very much.
The job ran and seemed to complete successfully.

However, I noticed that, in addition to the STASH output I requested, a number of additional STASH files have been written to the MASS archive. I don't want this data clogging up the project space, so I would like to delete it, but I can't work out how: it says I don't have permission to delete. Could you advise me on how to delete this data?

Thanks, Andy

comment:11 Changed 6 months ago by ros

Hi Andy,

You will need to contact the Met Office person that owns the project data to ask them to delete it. Non-Met Office people cannot delete data from MASS.

Cheers,
Ros.

comment:12 Changed 6 months ago by grenville

  • Resolution set to answered
  • Status changed from new to closed