Opened 4 years ago
Closed 4 years ago
#1957 closed help (fixed)
Extracting data from netCDF files
| Reported by: | simon.tett | Owned by: | um_support |
|---|---|---|---|
| Component: | Other | Keywords: | netcdf, extraction |
| Cc: | | Platform: | ARCHER |
| UM Version: | 8.5 | | |
Description
Hi,
I am extracting data from lots of netCDF files on the ARCHER archive. These files are produced using David H's cfa library to convert field files to netCDF. Each netCDF file contains lots of diagnostics, and I am using cdo to extract a single variable from multiple files into one file. This process appears to be quite slow, and the timing information suggests that this is not due to CPU use:
extracting toa_outgoing_longwave_flux from all files to outlw.nc
cdo select: Processed 3317760 values from 28080 variables over 120 timesteps ( 11.76s )
real 5m40.302s
user 0m9.917s
sys 0m1.860s
(In this case I am extracting data from 120 files.) Are there any optimisations I can do to speed this up?
Many thanks
Simon
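To put a number on the "not due to CPU use" observation, the `time` output above can be turned into a CPU fraction: (user + sys) / real. The snippet below is only a sketch of that arithmetic; the helper names `parse_time` and `cpu_fraction` are made up for illustration and are not part of cdo or any library.

```python
import re


def parse_time(text):
    """Convert a bash `time` value such as '5m40.302s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


def cpu_fraction(real, user, system):
    """Fraction of wall-clock time spent on the CPU: (user + sys) / real."""
    return (parse_time(user) + parse_time(system)) / parse_time(real)


# Timings reported above for the 120-file cdo extraction.
fraction = cpu_fraction("5m40.302s", "0m9.917s", "0m1.860s")
print(f"CPU time is {fraction:.1%} of wall-clock time")
```

With these timings, roughly 3.5% of the elapsed time is CPU time, so the remainder is spent waiting, most plausibly on reading from the archive file system.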
Attachments (1)
Change History (14)
comment:1 Changed 4 years ago by grenville
comment:2 Changed 4 years ago by simon.tett
See /nerc/n02/n02/stett2/archive/xmvpa/apy which contains a bunch of netcdf data.
Simon
comment:3 Changed 4 years ago by grenville
Please change permissions so we can read these.
Grenville
comment:4 Changed 4 years ago by simon.tett
Done!
I think. I have changed both file and directory permissions, though I don't understand the s bit!
Simon
comment:5 Changed 4 years ago by grenville
Not quite there. Please run:
chmod -R g+rX /nerc/n02/n02/stett2/archive/xmvpa
comment:6 Changed 4 years ago by simon.tett
Done!
Simon
comment:7 Changed 4 years ago by grenville
Hi Simon
We are experimenting on files in /nerc/n02/n02/stett2/archive/xmvpa/apy. There were 16 yesterday, but a new one has appeared which we don't have permission to read. Could you make it readable for us? (We are using wildcards in our tests.)
Thanks
Grenville
comment:8 Changed 4 years ago by simon.tett
Hi Grenville,
That's the trouble with giving you a live simulation… I have copied 14 of the files to /nerc/n02/n02/stett2/archive/test
I think those should work for you.
Simon
comment:9 Changed 4 years ago by grenville
Thanks
Changed 4 years ago by charles
comment:10 Changed 4 years ago by charles
Hi Simon,
I have written a Python script that uses cf-python to read in the 14 test files, select toa_outgoing_longwave_flux and write out a single file. The timings for this are:
real 0m23.567s
user 0m13.249s
sys 0m10.029s
I have attached the script to this query. Please modify the paths and filenames and try the script on your files.
Thanks,
Charles
comment:11 Changed 4 years ago by charles
The timings using cdo for the same operation on the 14 test files are:
real 0m31.618s
user 0m1.212s
sys 0m0.160s
Charles
comment:12 Changed 4 years ago by simon.tett
So there is not a lot in it in elapsed time, and cdo seems to use considerably less user/sys time.
One advantage of the cf solution is that it would be easier to loop over the various things one wants to extract from the netCDF files. (This can be thought of as a monster data transposition.)
Simon
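A rough sketch of what such a loop might look like with cf-python is given below. This is not the script attached to the ticket. It assumes a cf-python version in which `cf.read` accepts a `select` keyword and a wildcard file pattern, and the output naming scheme (`<standard_name>.nc`) and the function names `output_path` and `extract_variables` are invented for illustration.

```python
from pathlib import Path


def output_path(standard_name, out_dir="."):
    """Hypothetical naming scheme: one output file per extracted variable."""
    return Path(out_dir) / f"{standard_name}.nc"


def extract_variables(standard_names, input_files, out_dir="."):
    """Write each requested variable, gathered from all inputs, to its own file.

    `input_files` is a file name, wildcard pattern or list of file names.
    """
    import cf  # cf-python; imported here so output_path() works without it

    written = []
    for name in standard_names:
        # Read only the fields matching this name. cf.read aggregates
        # compatible fields by default, which should join the time axis
        # across files (assumed behaviour; check against your version).
        fields = cf.read(input_files, select=name)
        target = output_path(name, out_dir)
        cf.write(fields, str(target))
        written.append(target)
    return written
```

It could be called as, for example, `extract_variables(["toa_outgoing_longwave_flux"], "/nerc/n02/n02/stett2/archive/test/*.nc", out_dir="extracted")`. Note that each variable triggers a fresh pass over the input files, so if the run is dominated by file access, as the cdo timings suggest, reading everything once and selecting from the result may be worth trying instead.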
comment:13 Changed 4 years ago by ros
- Resolution set to fixed
- Status changed from new to closed
Simon
Could we try doing this with cf-python as a comparison? Please point us to the files.
Grenville