Opened 2 years ago

Closed 2 years ago

#2078 closed help (fixed)

qsserver failure; /nerc Read-only file system

Reported by: s1016670 Owned by: willie
Component: UM Model Keywords: Post Processing,SSH
Cc: Platform: ARCHER
UM Version: 7.3

Description

Hi,

I've been trying to get this job xneda running on ARCHER. It is a copy of xmslo, a job which ran successfully for 2 years. I've changed it to archive to NERC disk, and I've changed the ozone ancillary to a daily one. After 20 days of running xneda, it crashes and I get an error message at the top of the leave file something like:

mkdir: cannot create directory `/nerc': Read-only file system
cp: cannot create regular file `/nerc/n02/n02/s1016670/archive/xneda': No such file or directory
qsexecute: Copying /work/n02/n02/s1016670/um/xneda/xneda.thist to backup thist file /work/n02/n02/s1016670/um/xneda/xneda.thist_keep

Further down the leave file, I encounter:

Data successfully written
34448492 words written to unit 22
(Model data)
MEANCTL: * Called in ATMOSPHERIC mode *
Period_1 data read from unit number 24
Period_1 data written to unit number 23
EXITCHK: Request to stop model run received

qsserver failure at Thu Feb 2 21:06:56 GMT 2017

I've seen similar issues in previous tickets, but it's not entirely clear what the solution to those problems was. Apologies if I've missed it somewhere.

Cameron
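
Note: the mkdir message in the description names `/nerc' itself rather than the full archive path, which suggests that the top-level directory did not exist on the node where the command ran. A minimal sketch for finding how much of a path actually exists, assuming a POSIX shell; the function name deepest_existing is hypothetical and not part of the UM scripts:

```shell
# Hypothetical helper: print the deepest ancestor of a path that exists
# on the machine where this runs.
deepest_existing() {
    p="$1"
    # Walk upwards until an existing path (or the root) is reached.
    while [ ! -e "$p" ] && [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
    done
    echo "$p"
}

# Prints "/" if /nerc is not visible from this node.
deepest_existing /nerc/n02/n02/s1016670/archive/xneda
```

Running this from inside the job script, rather than from a login session, would show what the archiving step itself can see.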

Change History (8)

comment:1 Changed 2 years ago by willie

Hi Cameron,

Can you check that you have write permission to the archive directory in /nerc/n02/n02/s1016670? Try

ls -la /nerc/n02/n02/s1016670

and let me know the results.

Regards
Willie
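
Note: a listing shows the permission bits, but not whether the current process can actually write to the directory. A minimal sketch of a direct check, assuming a POSIX shell; the function name check_writable is hypothetical and not part of the UM scripts:

```shell
# Hypothetical helper: report whether a directory exists and is writable
# by the current user on the machine where this runs.
check_writable() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 2
    fi
    if [ -w "$dir" ]; then
        echo "writable: $dir"
        return 0
    fi
    echo "not writable: $dir"
    return 1
}

check_writable /nerc/n02/n02/s1016670/archive || echo "archive directory is not usable from here"
```

The result can differ between a login session and the place where the job runs, so it may be worth trying in both.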

comment:2 Changed 2 years ago by s1016670

Hi Willie,

I know I have write permission because I made the directory for the jobid a couple of days ago after my first failed attempt. This is what comes up with ls -la:

drwxr-sr-x 5 s1016670 n02 4096 Feb 2 15:15 xneda

Cameron

comment:3 Changed 2 years ago by s1016670

oh that was the archive directory…

drwx--S--- 7 s1016670 n02 32768 Sep 23 2015 .
drwxr-x--- 708 root n02 32768 Feb 1 12:49 ..
drwxr-sr-x 35 s1016670 n02 32768 Feb 1 18:27 archive
-rw------- 1 s1016670 n02 602 Oct 29 2015 .bash_history
-rwx--S--- 1 s1016670 n02 18 Oct 8 2014 .bash_logout
-rwx--S--- 1 s1016670 n02 176 Oct 8 2014 .bash_profile
-rwx--S--- 1 s1016670 n02 124 Oct 8 2014 .bashrc
-rwxr-xr-x 1 s1016670 n02 8503441 Sep 23 2015 convsh
drwxr-sr-x 2 s1016670 n02 32768 Sep 16 2015 hold
drwxr-sr-x 2 s1016670 n02 32768 Sep 16 2015 hold2
drwxr-sr-x 9 s1016670 n02 32768 Sep 16 2015 nc
drwxr-sr-x 2 s1016670 n02 32768 Apr 20 2015 .ssh
-rw------- 1 s1016670 n02 1341 Sep 16 2015 .viminfo
-rw------- 1 s1016670 n02 110 Sep 16 2015 .Xauthority

This is what I get

Cameron

comment:4 Changed 2 years ago by willie

Thanks, Cameron. Did it actually archive xnedaa.pak0cb0? It seems the delete has failed.

Willie

comment:5 Changed 2 years ago by s1016670

Hi Willie,

Nothing has been archived; the nerc directory is empty.

Cameron

comment:6 Changed 2 years ago by willie

  • Keywords Post Processing,SSH added
  • Owner changed from um_support to willie
  • Status changed from new to accepted

Hi Cameron,

I have updated the post processing branch VN73_HadGEM3-A_r2.0_hector_monsoon_archiving (to rev 21776). If you build your code again with this revision or later the archiving will work.

You could switch off the handedit for IBM as this is not necessary on ARCHER.

Regards
Willie

comment:7 Changed 2 years ago by s1016670

Hi Willie,

Thanks for this, will give it a try on later simulations, as I have almost completed this integration without archiving to /nerc.

Would I expect bit-reproducible output if I rerun with the new branch revision and with the IBM hand-edit switched off?

Cameron

comment:8 Changed 2 years ago by willie

  • Resolution set to fixed
  • Status changed from accepted to closed