#2492 closed help (duplicate)

Error in buffin errorCode= 3.

Reported by: ggxmy
Owned by: um_support
Component: UM Model
Keywords: buffin
Cc: g.w.mann@…, j.marsham@…
Platform: ARCHER
UM Version: 8.2

Description

Hi. I'm trying to run my UM vn8.2 limited-area job tewnb, which I created from xlhub. It crashes about a minute after submission, and /home/n02/n02/masara/output/tewnb000.tewnb.d18163.t093415.leave contains the following information. Near the top are messages like these:

???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: mppio:buffinApplication 31111189 is crashing. ATP analysis proceeding...

? Error Code:    24
? Error Message:  Error in buffin errorCode= 3.  len= 256 / 256
? Error generated from processor:     0
? This run generated   2 warnings
????????????????????????????????????????????????????????????????????????????????

Rank 0 [Tue Jun 12 09:37:44 2018] [c4-0c1s4n2] application called MPI_Abort(comm=0xC4000001, 9) - process 0

ATP Stack walkback for Rank 0 starting:
  [empty]@0x7ffff961f3a7
  in_bound_@in_bound.f90:1910
  inbounda_@inbounda.f90:4689
  read_flh_@read_flh.f90:61
  buffin64_i$io_@io.f90:1514
  mppio_ereport$io_@io.f90:450
  ereport64$ereport_mod_@ereport_mod.f90:102
  gc_abort_@gc_abort.F90:137
  mpl_abort_@mpl_abort.F90:46
  pmpi_abort_@0x10b6e3c
  PMPI_Abort@0x10d572c
  MPID_Abort@0x10fd5e1
  abort@abort.c:92
  raise@pt-raise.c:42
ATP Stack walkback for Rank 0 done
Process died with signal 6: 'Aborted'
Forcing core dumps of ranks 0, 1, 84, 24
View application merged backtrace tree with: stat-view atpMergedBT.dot
You may need to: module load stat

_pmiu_daemon(SIGCHLD): [NID 00864] [c4-0c1s8n0] [Tue Jun 12 09:37:50 2018] PE RANK 25 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 00850] [c4-0c1s4n2] [Tue Jun 12 09:37:50 2018] PE RANK 13 exit signal Killed
_pmiu_daemon(SIGCHLD): [NID 00859] [c4-0c1s6n3] [Tue Jun 12 09:37:50 2018] PE RANK 103 exit signal Killed
[NID 00850] 2018-06-12 09:37:50 Apid 31111189: initiated application termination
_pmiu_daemon(SIGCHLD): [NID 00857] [c4-0c1s6n1] [Tue Jun 12 09:37:50 2018] PE RANK 63 exit signal Killed
tewnb: Run failed
*****************************************************************
   Ending script   :   qsatmos
   Completion code :   137
   Completion time :   Tue Jun 12 09:37:56 BST 2018
*****************************************************************

/work/n02/n02/masara/um/tewnb/bin/qsmaster: Failed in qsatmos in job tewnb
***************************************************************
   Starting script :   qsfinal
   Starting time   :   Tue Jun 12 09:37:56 BST 2018
***************************************************************

 STOP  
/work/n02/n02/masara/um/tewnb/bin/qshistprint: Job terminated normally
/work/n02/n02/masara/um/tewnb/bin/qsresubmit: No resubmit requested
*****************************************************************
   Ending script   :   qsfinal
   Completion code :   0
   Completion time :   Tue Jun 12 09:37:56 BST 2018
*****************************************************************

/work/n02/n02/masara/um/tewnb/bin/qsmaster: Failed in qsfinal in job tewnb
 <<<< Information about How Many Lines of Output follow >>>>
 38  lines in main OUTPUT file.
 1537 lines of O/P from pe0.
 <<<<         Lines of Output Information ends          >>>>

Near the bottom are these messages:

OPEN:  File /work/n02/n02/masara/xklhf_makebc/xklhf_1.lbc to be Opened on Unit 125 does not Exist
OPEN:  **WARNING: FILE NOT FOUND
OPEN:  Ignored Request to Open File /work/n02/n02/masara/xklhf_makebc/xklhf_1.lbc for Reading
 ****************** IO Error Report ***********************************
Unit Generating error=  125
---File States --------------------------
Unit  30 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.ozone_L70_O70
  --> Opened from environment variable:OZONE
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit  35 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.sst
  --> Opened from environment variable:SSTIN
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 135 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrparm.veg.frac_hswd
  --> Opened from environment variable:FRACINIT
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 136 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrparm.veg.func_hswd
  --> Opened from environment variable:VEGINIT
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 154 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.biog70
  --> Opened from environment variable:ARCLBIOG
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 155 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.biom70
  --> Opened from environment variable:ARCLBIOM
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 156 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.blck70
  --> Opened from environment variable:ARCLBLCK
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 157 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.sslt70
  --> Opened from environment variable:ARCLSSLT
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 158 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.sulp70
  --> Opened from environment variable:ARCLSULP
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
Unit 160 open on filename /work/n02/n02/masara/ancils/vn8.2/cascade_12km/qrclim.ocff70
  --> Opened from environment variable:ARCLOCFF
   --> Read Only:  T  Local:  T  AllLocal:  F  Remote:  F  Broadcast:  T
---End File States ----------------------

???!!!???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!???!!!?
? Error in routine: mppio:buffin
? Error Code:    24
? Error Message:  Error in buffin errorCode= 3.  len= 256 / 256
? Error generated from processor:     0
? This run generated   2 warnings
????????????????????????????????????????????????????????????????????????????????

So it looks like it is complaining about these ancillary files. Can you see a problem with the ancillaries? I copied them from /nerc/n02/n02-SWAMMA/wmcginty/ancil.vn8.2/cascade_12km/, and the copies appear to me to be the same as the originals (in terms of file names, sizes and permissions). Could I please have some advice on this?
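
One way to cross-check the output quoted above is to compare the unit named in the IO Error Report with the units in the "OPEN: ... does not Exist" messages. In this excerpt both refer to unit 125, whose file is the .lbc file, so the missing .lbc file may be what buffin failed to read, rather than the ancillaries listed under File States. The following is only a minimal sketch, assuming the log lines have exactly the format shown above; the function name and the regular expressions are illustrative and not part of the UM.

```python
import re

# Patterns assumed from the log excerpt quoted in this ticket (not a UM specification).
OPEN_MISSING = re.compile(
    r"OPEN:\s+File (\S+) to be Opened on Unit\s+(\d+)\s+does not Exist"
)
ERROR_UNIT = re.compile(r"Unit Generating error=\s*(\d+)")


def missing_file_for_failing_unit(log_text):
    """Return (unit, path) if the failing unit matches a file reported missing, else None."""
    # Map each unit number to the path that OPEN reported as non-existent.
    missing = {int(unit): path for path, unit in OPEN_MISSING.findall(log_text)}
    match = ERROR_UNIT.search(log_text)
    if match is None:
        return None
    unit = int(match.group(1))
    path = missing.get(unit)
    return (unit, path) if path is not None else None


# Lines copied from the .leave file quoted above.
sample = """\
OPEN:  File /work/n02/n02/masara/xklhf_makebc/xklhf_1.lbc to be Opened on Unit 125 does not Exist
OPEN:  **WARNING: FILE NOT FOUND
 ****************** IO Error Report ***********************************
Unit Generating error=  125
"""

print(missing_file_for_failing_unit(sample))
```

To scan a whole .leave file, read it into a string and pass it to the function; a result of None means no "does not Exist" message matched the failing unit.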

Thanks,
Masaru

Change History (1)

comment:1 Changed 17 months ago by willie

  • Resolution set to duplicate
  • Status changed from new to closed

Closed as duplicate of #2493
