"Metasplit" or "stashsplit" format
"Metasplit" or "stashsplit" (the two names are equivalent)
is a scheme implemented for PV-WAVE/IDL which stores PP data in
small PP files with systematic names indicating their contents.
The files are organised into metasplit (stashsplit) directories,
typically with separate directories for different UM experiments,
perhaps distinguishing by meaning period as well.
The files in metasplit directories are typically numerous and have long
names, so they are inconvenient for humans to handle.
Tools are provided for accessing and manipulating the
contents of the directories,
and you are recommended not to move, rename etc. the files manually.
Doing so is possible, but it requires an understanding of how the metasplit
scheme works, which is not generally necessary.
The main advantages of storing the fields in metasplit directories are:
- It makes it easy to organise your data.
Suppose you are analysing a UM experiment.
The data you want to look at may come from several different UM PP datasets.
Possibly you might fetch a certain set of quantities from these datasets
and then later decide you need some more quantities from the same datasets.
One way or another, you can end up with a collection of PP files to keep
track of on your local disk space.
With the metasplit scheme,
you can put all the data from these files into a single
directory, which you can then access in PV-WAVE/IDL as a unit.
You can regard it as a pool of fields,
and you do not have to be concerned about how it is arranged.
- Access can be quicker than to large PP files.
This is because metasplit directories have a kind of index to the fields
they contain (the pph file),
so the PV-WAVE/IDL software does not have to read all the PP files to find
out what's in them.
How to access data in metasplit directories
In PV-WAVE/IDL, use ss_assoc to "associate" the directory,
just as you would use pp_assoc for an ordinary PP file.
This is analogous to "opening" the file on a logical unit in PV-WAVE/IDL
or Fortran. E.g.
% SS_ASSOC: Associated 3120 fields from /data/hcmim1/hadsa/aatza
Subsequently you can get fields from the directory using ppa.
How to put data into metasplit directories
The following can be used to put fields
into metasplit directories:
It is often convenient to separate data from a given UM job into
separate directories according to meaning period.
For instance, monthly data from job aatza may be stored in
directory aatza.000001, seasonal data in aatza.000003,
annual in aatza.000100 and decadal in aatza.001000
(the suffix is of the form YYYYMM, giving the length of the
meaning period in years and months).
To have data delivered in this way by query_masscam or query_camelot,
specify option -partition.
With parmah or pariah,
use -H ss_partition instead of -H makepph.
- In the Met Office, if you fetch UM data from archive using query_masscam
it will be delivered to the HP workstation system in metasplit format,
stored in a separate directory for each UM job (with its five-letter name).
- In the Met Office, if you fetch UM data using the Unix commands
parmah or pariah,
specify options "-h wavemetasplit -H makepph"
to have the data delivered in metasplit format,
again in separate directories for different UM jobs.
- To convert existing PP files to metasplit format,
use the Unix command wavemetasplit
or the PV-WAVE/IDL command metasplit.
When you have stored all the data you want in the metasplit directory,
run the Unix command makepph
on it, or use the PV-WAVE/IDL command makepph.
makepph (by either means)
must be repeated whenever files are added to the directory.
It is sometimes important to be aware that each
three-dimensional UM field
(i.e. a field on several vertical levels)
is stored in a single file by default;
it is not split up into its separate levels.
This can lead to problems with fetching data from archive at the Met Office.
If you originally fetch just one level, say,
and later decide you need all the others too,
a new file will be created with these new levels,
but it will overwrite the original file.
Hence you will have lost the original data.
This is particularly a problem with query_camelot,
which by default fetches only the data which is not already stored on HP;
you may end up repeatedly fetching alternate selections of levels.
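A toy sketch of the overwrite problem (in Python; the file name and
field layout are invented for illustration): because a 3-D field maps
to a single file name, a later fetch of different levels replaces the
file written by the earlier fetch.

```python
# Toy model: each "archive delivery" writes a 3-D field to one file
# named from the field's header, regardless of which levels were
# actually requested.  (File name and layout are illustrative.)
store = {}

def deliver(levels):
    store["f01_03236_199412"] = list(levels)  # overwrites any old file

deliver([1])        # first fetch: level 1 only
deliver([2, 3, 4])  # later fetch: the other levels
print(store["f01_03236_199412"])  # [2, 3, 4] -- level 1 is lost
```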
To get round this problem, you could try any of these:
- When fetching data from archive,
run query_masscam or query_camelot
with -nodataline to fetch the
requested data regardless of whether it exists on HP already.
- Delete the original levels (with
ss_rm) and fetch them again
along with the others.
- Rename the original directory, fetch the new data into a separate
directory, then merge the two (with ss_merge).
The problem is also avoided by using a metafunction
which splits up fields by level,
for instance pp_code2fn3.
How to manage fields in metasplit directories
The following tools are available. Except for ss_partition,
they are PV-WAVE/IDL utilities:
- To split up a metasplit directory into directories suffixed by meaning
period (see above),
use the Unix command ss_partition.
- To move all the fields or a selection of fields
from one directory to another, use ss_mv.
If used on large directories,
ss_mv can take many minutes,
because it may have to read and write a lot of data.
This works even if the directories have different metafunctions.
It can hence be used for reprocessing the directory to change its metafunction.
A potential problem arises if the target directory contains files in which
fields from the source directory need to be stored.
Use the /replace keyword in this case
if you are confident that overwriting these files will not lose any data,
because all the fields in the target files concerned
are also in the source directory.
If this may not be the case, use ss_merge
which is slower, but safe.
- To copy all the fields or a selection of fields
from one directory to another, use ss_cp.
The same comments apply as for ss_mv.
- To delete all the fields or a selection of fields
from a directory, use ss_rm.
- To merge metasplit directories together, if there may be files with the
same names but different contents in the various directories
(see ss_mv), use ss_merge.
This may be slow because it has to rewrite all the data from the
directories being merged.
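The difference between ss_mv with /replace and ss_merge can be
sketched as a toy model (Python; a directory is modelled as a dict of
file name to field list, which is only an illustration of the clash
handling, not the real PP file layout):

```python
def move_replace(src, dst):
    # Fast: a clashing target file is simply overwritten, so any of
    # its fields not also present in the source are lost.
    dst.update(src)
    src.clear()

def merge(src, dst):
    # Safe but slower: clashing files are rewritten field by field.
    for name, fields in src.items():
        kept = dst.setdefault(name, [])
        kept.extend(f for f in fields if f not in kept)
    src.clear()

a = {"f1": ["level1"]}
b = {"f1": ["level2"]}
merge(a, b)
print(b)  # {'f1': ['level2', 'level1']} -- nothing lost
```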
How metasplit works
Metasplit directories contain a file named pph,
which is a concatenation of the headers only of all the PP fields in the
directory.
The pph file is made automatically by most utilities which change
the contents of the directory
(but you have to remember to run makepph
if you use metasplit).
ss_assoc reads the pph file to find out what's in
the directory, and it holds this information in memory.
When you ask for a field through ppa,
it first refers to the header information in memory to decide
whether the field exists in the directory.
If the field exists, ppa then has to fetch it from
the appropriate PP file.
Metasplit works by defining a mapping between a PP field header
and the name of the file in which that field is stored.
The routine which performs this translation is called the metafunction
of the directory.
For directories created with metasplit defaults,
the metafunction is pp_ss_basename,
which generates filenames that depend on time, meaning period, submodel,
stash code and processing code.
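In outline, a metafunction is just a pure function from header values
to a file basename. A toy Python analogue (the header keys and name
format here are invented for illustration; the actual format produced
by pp_ss_basename is not reproduced):

```python
def toy_metafunction(header):
    """Map a field header to a file basename (illustrative format)."""
    return ("s{submodel:02d}_st{stash:05d}"
            "_{year:04d}{month:02d}_mp{period:02d}").format(**header)

hdr = {"submodel": 1, "stash": 3236, "year": 1994, "month": 12, "period": 1}
print(toy_metafunction(hdr))  # s01_st03236_199412_mp01
```

Two fields with different dates, stash codes or submodels thus land in
different files, which is what lets the directory act as a pool.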
ppa uses the metafunction to work out where to find the requested field.
To save further time, the pph file also records the location within
the file at which the field will be found,
so ppa can go straight there and does not have to scan the file
looking for the field.
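The lookup can be pictured as follows (a toy Python model; the key and
file contents are invented, and the real pph file stores binary PP
headers rather than a dict):

```python
import io

# Toy model: the in-memory index maps a field key to the file that
# holds it and the byte offset of the field within that file.
files = {"s01_st03236_199412": io.BytesIO(b"HDR." + b"field-data")}
index = {(1, 3236, 199412): ("s01_st03236_199412", 4)}

def fetch(submodel, stash, date):
    entry = index.get((submodel, stash, date))
    if entry is None:
        return None               # field is not in the directory
    basename, offset = entry
    f = files[basename]
    f.seek(offset)                # jump straight to the field
    return f.read()

print(fetch(1, 3236, 199412))     # b'field-data'
```

No data file is opened until the index confirms the field exists, and
no file is ever scanned from the start.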
When fields are written to metasplit directories,
the metafunction is used similarly to derive the names of the files they
should be stored in.
The metafunction for the directory must be a PV-WAVE/IDL function in your
WAVE_PATH, and its name is stored in the file called metafunction
in the metasplit directory.
If there is no metafunction file,
the directory uses the original "stashsplit" naming convention,
defined by short_pp_ss_fn.
This scheme is no longer used by default because it does not refer to
the submodel number, introduced at version 4.1 of the UM.
The new default scheme should be adequate for UM data.
It may not be adequate for data you have created yourself, however,
for instance if the fields do not have stashcodes.
The important thing is that the metafunction must give different names
for fields which may be stored by separate operations.
A number of alternatives are already available,
of which the most flexible general-purpose choice is pp_code2fn3,
which distinguishes files on the basis of some additional information
including the PP field code.
If this is not suitable, you can define any convention you like by writing
a PV-WAVE/IDL function
which returns a vector of file basenames given a vector of
PP fields as argument,
and putting the function in a directory in your WAVE_PATH.
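As a Python analogue of such a function (the header keys and naming
convention are invented; a real metafunction is written in PV-WAVE/IDL
and receives PP field structures):

```python
def my_metafunction(fields):
    """Given a list of field headers, return one basename per field.

    Distinguishes files by PP field code and date, so fields without
    stash codes still get distinct names.
    """
    return [f"c{f['fieldcode']:04d}.{f['year']:04d}{f['month']:02d}"
            for f in fields]

print(my_metafunction([{"fieldcode": 16, "year": 1995, "month": 6}]))
# ['c0016.199506']
```

The vector-in, vector-out shape mirrors the requirement above: one
basename per input field, in the same order.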
A non-default metafunction must be specified at the time the
metasplit directory is created.
Usually the directory is created automatically when data is first
put in it (this is what metasplit does, for instance).
To get round this, you can create
the directory explicitly in advance, using ss_mkdir.
metasplit and other routines also have keywords allowing you to
specify the metafunction for a newly created directory.
To change the metafunction of a directory with data in it,
you cannot just alter the metafunction file
(unless you are absolutely certain that the arrangement of existing
fields in files will be the same for the two metafunctions).
Use ss_mv to reprocess the directory.