Opened 5 months ago

Closed 5 months ago

#3383 closed help (answered)

supermeans Python2.7 code crashing with iris and biggus on JASMIN short-serial

Reported by: pmcguire Owned by: um_support
Component: Data Keywords: supermeans, Python2.7, AutoAssess, iris, biggus, JASMIN, SLURM, short-serial
Cc: Platform: JASMIN
UM Version:

Description (last modified by pmcguire)

Hi CMS Helpdesk:
The lines below from ~pmcguire/autoassess/autoassess7a1a/supermean7e1.py, which is called by ~pmcguire/autoassess/autoassess7a1a/submit-supermean7e3mam.scr, use iris in the short-serial SLURM queue on JASMIN. The error message below shows that iris is using biggus, which is threaded. Threading is not allowed on short-serial, and this gives intermittent errors after 10-20 hours of wallclock time. What should/could I do? I have my AutoAssess processing set up to do 2x35x5=350 runs of these supermeans calculations (one for each run, one for each year, one for each season).
Patrick

    for i in range(len(vars)):
        foo[i] = vars[i].collapsed('time',iris.analysis.MEAN)
    iris.save(foo,out_dir+'/'+runid_short+'a.m'+supermeanlabel+str(year)+time+'.pp')

produces this error in ~pmcguire/autoassess/autoassess7a1a/supermean7e3mam.16310412_29.err

File "supermean7e1.py", line 136, in <module>

iris.save(foo,out_dir+'/'+runid_short+'a.m'+supermeanlabel+str(year)+time+'.pp')

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/iris/io/__init__.py", line 422, in save

saver(cube, target, **kwargs)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/iris/fileformats/pp.py", line 2309, in save

save_fields(fields, target, append=append)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/iris/fileformats/pp.py", line 2558, in save_fields

pp_field.save(pp_file)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/iris/fileformats/pp.py", line 1387, in save

data = self.data

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/iris/fileformats/pp.py", line 1290, in data

data = self._data.masked_array()

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/biggus/__init__.py", line 2619, in masked_array

result, = biggus.engine.masked_arrays(self)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/biggus/__init__.py", line 437, in masked_arrays

return self._evaluate(arrays, True)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/biggus/__init__.py", line 431, in _evaluate

ndarrays = group.evaluate(masked)

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/biggus/__init__.py", line 409, in evaluate

node.thread()

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/site-packages/biggus/__init__.py", line 135, in thread

thread.start()

File "/apps/contrib/jaspy/miniconda_envs/jaspy2.7/m2-4.6.14/envs/jaspy2.7-m2-4.6.14-r20190715/lib/python2.7/threading.py", line 736, in start

_start_new_thread(self.__bootstrap, ())

thread.error: can't start new thread

Change History (4)

comment:1 Changed 5 months ago by pmcguire

  • Description modified (diff)

comment:2 Changed 5 months ago by pmcguire

I figured out that the jaspy/2.7 module (loaded with 'module load jaspy/2.7') provides iris 1.13, which predates the switch in iris from biggus to dask.

There seem to be ways (if needed) to disable the (parallel) threading, with dask and the lazy arrays.
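
A minimal sketch of one such way, assuming a dask release that provides dask.config (0.18 or later). The helper name force_single_threaded is hypothetical, and whether every iris 2.2 code path honours this setting has not been checked here:

```python
# Sketch: ask dask to evaluate lazy arrays in the calling thread only,
# so that no worker threads are started inside a short-serial job.
try:
    import dask
except ImportError:  # e.g. the older iris 1.13 / biggus environment
    dask = None


def force_single_threaded():
    """Select dask's synchronous scheduler; return True if it was set."""
    if dask is None or not hasattr(dask, "config"):
        return False
    # 'synchronous' is dask's built-in single-threaded scheduler.
    dask.config.set(scheduler="synchronous")
    return True


single_threaded = force_single_threaded()
```

The call would need to happen near the top of the script, before any cube data is realised or saved.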

So, I ported supermean7e1.py from python2.7 to python3.7 as supermean8e1.py, in order to use iris 2.2 (which has dask instead of biggus), since iris 2.2 is what you get by default with 'module load jaspy'.

I am testing the new code now.
Patrick

comment:3 Changed 5 months ago by pmcguire

I did indeed get the supermeans working better in the short-serial SLURM queue on JASMIN, at least in my preliminary testing. I did 2x35x5 = num_runs x num_years x num_seasons = 350 short-serial runs (using job arrays). Each run took on the order of 1 hour, and none of them hung or crashed. Before, the runs were taking on the order of 10-20 hours for 1 year and 1 season.
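
As an illustration of how 350 array tasks can be mapped onto (run, year, season) combinations, here is a small sketch. It assumes a single job array with 0-based task IDs 0-349; the function name task_to_indices and the index ordering are hypothetical, and the actual submit scripts may split the work differently (for example, one script per season). SLURM_ARRAY_TASK_ID is the environment variable that SLURM sets for each array task.

```python
import os

# Counts taken from the ticket: 2 runs x 35 years x 5 seasons = 350 tasks.
NUM_RUNS, NUM_YEARS, NUM_SEASONS = 2, 35, 5


def task_to_indices(task_id, num_years=NUM_YEARS, num_seasons=NUM_SEASONS):
    """Map a 0-based array task ID to (run, year, season) indices."""
    run, rest = divmod(task_id, num_years * num_seasons)
    year, season = divmod(rest, num_seasons)
    return run, year, season


# Falls back to task 0 when run outside a SLURM job array.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
run_index, year_index, season_index = task_to_indices(task_id)
```

Each index can then be used to look up the run ID, the year and the season label for that task.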

The clue that led me to the fix was that most of the supermeans runs worked OK, but a few didn't. One of the failed runs had an error message showing that it was trying (and failing) to start a new thread with biggus. As you may know, iris used biggus before version 2.0 for lazy arrays (arrays that aren't all in memory at the same time). In the short-serial queue, maybe it couldn't start a new thread.
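
A quick way to tell which lazy-array backend a given environment implies is to look at the iris major version. The sketch below encodes only the rule stated above (biggus before 2.0, dask from 2.0 on); the function name lazy_backend is hypothetical.

```python
def lazy_backend(iris_version):
    """Return the lazy-array library implied by an iris version string."""
    major = int(iris_version.split(".")[0])
    return "dask" if major >= 2 else "biggus"


# In a real environment the string could come from iris.__version__.
backend_py27 = lazy_backend("1.13.0")  # the jaspy/2.7 environment
backend_py37 = lazy_backend("2.2.0")   # the default jaspy environment
```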

I decided to upgrade to a later version of iris (2.2) that uses dask instead of biggus for lazy arrays. This was partly because I saw that there is a way to turn off parallel threading in dask, and partly because dask is more modern, better supported, and probably faster and more robust. Rather than installing a special conda environment with iris 2.2, I opted to use 'module load jaspy' instead of 'module load jaspy/2.7', since the default version of jaspy provides iris 2.2 rather than the iris 1.13 that jaspy/2.7 provides. This meant that I needed to use python3.7 instead of python2.7, which wasn't hard to arrange.
Patrick

comment:4 Changed 5 months ago by pmcguire

  • Resolution set to answered
  • Status changed from new to closed