Opened 13 years ago

Closed 13 years ago

#197 closed help (fixed)

Run on HECToR crashes at 263 timesteps

Reported by: swr06rjk Owned by: willie
Component: UM Model Keywords: HECToR
Cc: w.mcginty@… Platform:
UM Version: 6.1


I'm trying to run job xdofb on HECToR, and it crashes with the following message in the .leave file:

" Segmentation fault! Fault address: (nil)

This is likely to have been caused by either a null pointer dereference or a general protection fault.
_pmii_daemon(SIGCHLD): PE 7 exit signal Aborted"

The output from node 7 stops at timestep 263. If I ask the model to run for 262 timesteps, it completes without any problems.

I had the same job working on HPCx (as xdfpi), so I expected it to work on HECToR too.

Change History (5)

comment:1 Changed 13 years ago by willie

  • Owner changed from um_support to willie
  • Status changed from new to assigned

OK, I'll take a look at this.

comment:2 Changed 13 years ago by lois

  • Cc w.mcginty@… added

Hello Richard, it looks as though you don't have the minimum set of mods needed to run on HECToR. These are:

script mods : $PUM_MODS61/
reconfiguration/model mods : $PUM_MODS61/


If you include these mods and run the job again, that should hopefully solve the problem.


comment:3 Changed 13 years ago by willie


I note that you have specified 14 boundary layer levels in the vertical, but the ratio table has only 13 entries. See Atmos > Scientific > Section by Section > Boundary layers.

Running Check Setup in the UMUI also reveals a few problems. These should be fixed before doing a run.

Let me know if this solves the problem.


comment:4 Changed 13 years ago by willie


I've now run your job for over 50,000 timesteps. The script update should only appear in the script sections, not in the modsets for the reconfiguration or the model. I have not found any duplicate Fortran files, and I have run the reconfiguration and the model sequentially in one run. The model does crash, however, and there is one type of error, "error halo_j too small 3", which occurs three times. This looks like a science issue.


comment:5 Changed 13 years ago by willie

  • Resolution set to fixed
  • Status changed from assigned to closed