#1732 closed help (answered)

xconv still segfaulting

Reported by: iwi Owned by: jeff
Priority: normal Component: UM Tools
Keywords: Cc:
Platform: Other UM Version: <select version>

Description

Upgrading xconv to 1.93 seemed to fix the segfaults when opening some PP files from Met Office NWP, but a user now reports cases where 1.93 still segfaults. The affected files are large, but a much smaller, subsetted test file reproduces similar problems.

Please see ~iwi/for-jeff/test_xconv.pp on oak, which contains the first 37 PP records from one of the problematic files. It opens correctly in cf-python (on another system; I couldn't find cf-python on oak):

>>> cf.read("test-xconv.pp")
[<CF Field: eastward_wind(model_level_number(25), latitude(1152), longitude(1536)) m s-1>,
 <CF Field: eastward_wind(model_level_number(2), time(3), latitude(1152), longitude(1536)) m s-1>,
 <CF Field: northward_wind(time(2), model_level_number(2), latitude(1153), longitude(1536)) m s-1>,
 <CF Field: UM_m01s01i230_vn805(latitude(1152), longitude(1536)) >,
 <CF Field: UM_m01s01i231_vn805(latitude(1152), longitude(1536)) >]

In xconv it segfaults, and not even consistently. On oak, it sometimes segfaults with a stack trace and sometimes without one. On another system (JASMIN at CEDA) it varies whether it segfaults on opening or on exit, and whether there is a stack trace; sometimes it opens, but the information in the field list is corrupted (see screenshots). Something nasty seems to be happening with uninitialised values.

Please can you take a look.

Thanks,
Alan

Attachments (2)

xconv.png (32.5 KB) - added by iwi 20 months ago.
xconv2.png (32.8 KB) - added by iwi 20 months ago.


Change History (6)

Changed 20 months ago by iwi

Changed 20 months ago by iwi

comment:1 Changed 20 months ago by iwi

Example with stack trace:

[iwi@jasmin-sci1 gws]$ ~/xconv -i test-xconv.pp 
*** glibc detected *** /home/users/iwi/xconv: free(): invalid next size (fast): 0x0000000002e6f380 ***
======= Backtrace: =========
/lib64/libc.so.6[0x34d8875e66]
/lib64/libc.so.6[0x34d88789ba]
/tmp/tcl_kAL4uo(free_x+0xe)[0x7f85f3338c30]
/tmp/tcl_kAL4uo(freexhead+0x4e)[0x7f85f32ef6de]
/tmp/tcl_kAL4uo(freehead+0x2a)[0x7f85f32f1aeb]
/tmp/tcl_kAL4uo(freeallhead+0x12)[0x7f85f32f2189]
/tmp/tcl_kAL4uo(DeleteHead+0xc)[0x7f85f3358b31]
/home/users/iwi/xconv[0x4cba80]
/home/users/iwi/xconv[0x4cbf01]
/home/users/iwi/xconv[0x45de57]
/home/users/iwi/xconv[0x44e967]
/home/users/iwi/xconv[0x502ab8]
/home/users/iwi/xconv[0x406d00]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x34d881ed5d]
/home/users/iwi/xconv(sinh+0x61)[0x405479]
======= Memory map: ========
00400000-005c9000 r-xp 00000000 00:17 2421340960                         /home/users/iwi/xconv
007c9000-007d7000 r--p 001c9000 00:17 2421340960                         /home/users/iwi/xconv
007d7000-007db000 rw-p 001d7000 00:17 2421340960                         /home/users/iwi/xconv
007db000-007ec000 rw-p 00000000 00:00 0 
0252d000-02f13000 rw-p 00000000 00:00 0                                  [heap]
34d8400000-34d8420000 r-xp 00000000 08:02 720903                         /lib64/ld-2.12.so
34d861f000-34d8620000 r--p 0001f000 08:02 720903                         /lib64/ld-2.12.so
34d8620000-34d8621000 rw-p 00020000 08:02 720903                         /lib64/ld-2.12.so
34d8621000-34d8622000 rw-p 00000000 00:00 0 
34d8800000-34d898a000 r-xp 00000000 08:02 720907                         /lib64/libc-2.12.so
34d898a000-34d8b8a000 ---p 0018a000 08:02 720907                         /lib64/libc-2.12.so
34d8b8a000-34d8b8e000 r--p 0018a000 08:02 720907                         /lib64/libc-2.12.so
34d8b8e000-34d8b8f000 rw-p 0018e000 08:02 720907                         /lib64/libc-2.12.so
34d8b8f000-34d8b94000 rw-p 00000000 00:00 0 
34d8c00000-34d8c83000 r-xp 00000000 08:02 720954                         /lib64/libm-2.12.so
34d8c83000-34d8e82000 ---p 00083000 08:02 720954                         /lib64/libm-2.12.so
34d8e82000-34d8e83000 r--p 00082000 08:02 720954                         /lib64/libm-2.12.so
34d8e83000-34d8e84000 rw-p 00083000 08:02 720954                         /lib64/libm-2.12.so
34d9000000-34d9002000 r-xp 00000000 08:02 720943                         /lib64/libdl-2.12.so
34d9002000-34d9202000 ---p 00002000 08:02 720943                         /lib64/libdl-2.12.so
34d9202000-34d9203000 r--p 00002000 08:02 720943                         /lib64/libdl-2.12.so
34d9203000-34d9204000 rw-p 00003000 08:02 720943                         /lib64/libdl-2.12.so
34d9400000-34d9417000 r-xp 00000000 08:02 720911                         /lib64/libpthread-2.12.so
34d9417000-34d9617000 ---p 00017000 08:02 720911                         /lib64/libpthread-2.12.so
34d9617000-34d9618000 r--p 00017000 08:02 720911                         /lib64/libpthread-2.12.so
34d9618000-34d9619000 rw-p 00018000 08:02 720911                         /lib64/libpthread-2.12.so
34d9619000-34d961d000 rw-p 00000000 00:00 0 
34d9800000-34d9815000 r-xp 00000000 08:02 720899                         /lib64/libz.so.1.2.3
34d9815000-34d9a14000 ---p 00015000 08:02 720899                         /lib64/libz.so.1.2.3
34d9a14000-34d9a15000 r--p 00014000 08:02 720899                         /lib64/libz.so.1.2.3
34d9a15000-34d9a16000 rw-p 00015000 08:02 720899                         /lib64/libz.so.1.2.3
34da000000-34da016000 r-xp 00000000 08:02 721268                         /lib64/libgcc_s-4.4.7-20120601.so.1
34da016000-34da215000 ---p 00016000 08:02 721268                         /lib64/libgcc_s-4.4.7-20120601.so.1
34da215000-34da216000 rw-p 00015000 08:02 721268                         /lib64/libgcc_s-4.4.7-20120601.so.1
34db800000-34db81e000 r-xp 00000000 08:02 3057082                        /usr/lib64/libxcb.so.1.1.0
34db81e000-34dba1d000 ---p 0001e000 08:02 3057082                        /usr/lib64/libxcb.so.1.1.0
34dba1d000-34dba1e000 rw-p 0001d000 08:02 3057082                        /usr/lib64/libxcb.so.1.1.0
34dbc00000-34dbc02000 r-xp 00000000 08:02 3057077                        /usr/lib64/libXau.so.6.0.0
34dbc02000-34dbe02000 ---p 00002000 08:02 3057077                        /usr/lib64/libXau.so.6.0.0
34dbe02000-34dbe03000 rw-p 00002000 08:02 3057077                        /usr/lib64/libXau.so.6.0.0
34dc000000-34dc026000 r-xp 00000000 08:02 721313                         /lib64/libexpat.so.1.5.2
34dc026000-34dc225000 ---p 00026000 08:02 721313                         /lib64/libexpat.so.1.5.2
34dc225000-34dc228000 rw-p 00025000 08:02 721313                         /lib64/libexpat.so.1.5.2
34dc400000-34dc537000 r-xp 00000000 08:02 3056757                        /usr/lib64/libX11.so.6.3.0
34dc537000-34dc737000 ---p 00137000 08:02 3056757                        /usr/lib64/libX11.so.6.3.0
34dc737000-34dc73d000 rw-p 00137000 08:02 3056757                        /usr/lib64/libX11.so.6.3.0
34df000000-34df098000 r-xp 00000000 08:02 3059395                        /usr/lib64/libfreetype.so.6.3.22
34df098000-34df297000 ---p 00098000 08:02 3059395                        /usr/lib64/libfreetype.so.6.3.22
34df297000-34df29d000 rw-p 00097000 08:02 3059395                        /usr/lib64/libfreetype.so.6.3.22
34dfc00000-34dfc34000 r-xp 00000000 08:02 3056417                        /usr/lib64/libfontconfig.so.1.4.4
34dfc34000-34dfe34000 ---p 00034000 08:02 3056417                        /usr/lib64/libfontconfig.so.1.4.4
34dfe34000-34dfe36000 rw-p 00034000 08:02 3056417                        /usr/lib64/libfontconfig.so.1.4.4
3b47000000-3b47014000 r-xp 00000000 08:02 3061905                        /usr/lib64/libXft.so.2.3.1
3b47014000-3b47214000 ---p 00014000 08:02 3061905                        /usr/lib64/libXft.so.2.3.1
3b47214000-3b47215000 rw-p 00014000 08:02 3061905                        /usr/lib64/libXft.so.2.3.1
3b47400000-3b47409000 r-xp 00000000 08:02 3061475                        /usr/lib64/libXrender.so.1.3.0
3b47409000-3b47608000 ---p 00009000 08:02 3061475                        /usr/lib64/libXrender.so.1.3.0
3b47608000-3b47609000 rw-p 00008000 08:02 3061475                        /usr/lib64/libXrender.so.1.3.0
3b48800000-3b48805000 r-xp 00000000 08:02 3061478                        /usr/lib64/libXfixes.so.3.1.0
3b48805000-3b48a04000 ---p 00005000 08:02 3061478                        /usr/lib64/libXfixes.so.3.1.0
Aborted (core dumped)

Example without stack trace:

[iwi@jasmin-sci1 gws]$ ~/xconv -i test-xconv.pp 
Segmentation fault (core dumped)

And looking at the core dump:

(gdb) where
#0  0x00000034d8875f85 in malloc_consolidate () from /lib64/libc.so.6
#1  0x00000034d88791c5 in _int_malloc () from /lib64/libc.so.6
#2  0x00000034d887a751 in malloc () from /lib64/libc.so.6
#3  0x00000000005263e3 in ?? ()
#4  0x00000000004592d5 in ?? ()
#5  0x00000000004b48be in ?? ()
#6  0x00000000004b7e3c in ?? ()
#7  0x0000000000516614 in ?? ()
#8  0x00000000005167c7 in ?? ()
#9  0x00000000005170fc in TclNRInterpProc ()
#10 0x000000000044e967 in ?? ()
#11 0x000000000044f68d in ?? ()
#12 0x000000000044fa06 in ?? ()
#13 0x0000000000513a8b in ?? ()
#14 0x0000000000513dd9 in ?? ()
#15 0x0000000000514597 in ?? ()
#16 0x000000000044e967 in ?? ()
#17 0x000000000044f68d in ?? ()
#18 0x00000000004fcd08 in ?? ()
#19 0x0000000000502b67 in ?? ()
#20 0x0000000000406d00 in ?? ()
#21 0x00000034d881ed5d in __libc_start_main () from /lib64/libc.so.6
#22 0x0000000000405479 in ?? ()
#23 0x00007fff81537fc8 in ?? ()
#24 0x000000000000001c in ?? ()
#25 0x0000000000000003 in ?? ()
#26 0x00007fff815383e2 in ?? ()
#27 0x00007fff815383f8 in ?? ()
#28 0x00007fff815383fb in ?? ()
#29 0x0000000000000000 in ?? ()
(gdb) 

comment:2 Changed 20 months ago by jeff

  • Owner changed from um_support to jeff
  • Status changed from new to accepted

Hi Alan

This problem occurs because of the way the fields are laid out in the PP file. test_xconv.pp has u-wind at 3 times: 2015/01/12:04.00, 2015/01/12:05.00, 2015/01/12:05.00. For the first 2 times the file has u-wind on 2 levels (model levels 1 and 11), and at the 3rd time it has u-wind on 27 model levels. xconv first scans the PP file to work out the dimensions, but it assumes that all fields of a particular type have the same number of levels. In this case it concludes that u-wind has 2 levels and 3 times, so when it reads in the 27 levels it overruns the array bounds, and then random bad things happen.
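The failure mode above can be sketched in Python. This is a hypothetical model, not xconv's actual code: each PP record is reduced to a (time, level) pair, `naive_capacity` sizes the array from the level count found at the first time (the assumption described above), and `has_uniform_levels` is a check that would catch the mismatch. The time labels "T1" to "T3" and the levels 1 to 27 at the third time are illustrative stand-ins for the real u-wind records, and all function names are made up for this sketch.

```python
from collections import defaultdict


def levels_per_time(records):
    """Group the set of model levels present at each validity time."""
    grouped = defaultdict(set)
    for time, level in records:
        grouped[time].add(level)
    return grouped


def naive_capacity(records):
    """Number of (time, level) slots allocated if the level count is taken
    from the first time only: a hypothetical model of the assumption above."""
    grouped = levels_per_time(records)
    first_time = records[0][0]
    return len(grouped) * len(grouped[first_time])


def has_uniform_levels(records):
    """True only if every time carries exactly the same set of levels."""
    level_sets = list(levels_per_time(records).values())
    return all(s == level_sets[0] for s in level_sets)


if __name__ == "__main__":
    # Illustrative stand-in for the u-wind records: 2 levels at the first
    # two times, then 27 levels (hypothetically 1..27) at the third time.
    records = [("T1", 1), ("T1", 11), ("T2", 1), ("T2", 11)]
    records += [("T3", level) for level in range(1, 28)]
    print(len(records), naive_capacity(records))  # 31 6
    print(has_uniform_levels(records))  # False
```

In this model 31 records are written into 6 slots, which is the kind of overrun that would corrupt the heap and produce the inconsistent crashes reported above.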

First, I should ask: is this how the file is meant to be laid out? If so, I would suggest using cf-python to convert the files to netCDF, as major xconv development is unlikely to happen now. Alternatively, if the first 2 times aren't needed, they could be removed from the PP file, and xconv should then work.

Jeff.

comment:3 Changed 20 months ago by iwi

Jeff,

Thank you. That is very useful to know. I will discuss options with the user.

The test file I supplied is heavily truncated compared with the full file, but I believe the same issue is at play. This situation can arise in the UM when the same diagnostic is set up to be written on many levels at one time frequency and also on fewer levels at a higher frequency, so I suspect this file is part of such a time series. David has obviously gone to some effort to ensure that cf-python can handle this, and when I wrote the C code to help optimise it, we made sure that this would still be the case. If you later look into supporting this in xconv, it is not overly difficult to detect that such a set of PP records needs to be expressed as more than one variable: sort the records by time and then by level, and test whether they form a 2D grid in Z,T space. If they don't, it is trivial to break them up into variables that will work (one per time, or one per level), but it is much harder to work out the *minimum* set of variables needed, which is what David's fancy Python code does.
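The detection step described above might look like the following Python sketch, again reducing each PP record to a hypothetical (time, level) pair. `forms_zt_grid` tests whether the records fill a complete Z,T grid, `split_by_time` is the trivial one-variable-per-time fallback, and `group_by_level_set` merges times that share an identical level list. That grouping is a simple heuristic, not cf-python's minimum-variable algorithm, and all function names and example data are illustrative assumptions.

```python
from collections import defaultdict


def forms_zt_grid(records):
    """True if the (time, level) records fill a complete, duplicate-free
    grid of every distinct time x every distinct level."""
    times = {time for time, _ in records}
    levels = {level for _, level in records}
    return len(set(records)) == len(records) == len(times) * len(levels)


def split_by_time(records):
    """Trivial fallback: one variable per time, with levels in sorted order."""
    by_time = defaultdict(list)
    for time, level in sorted(records):  # sort by time, then by level
        by_time[time].append(level)
    return dict(by_time)


def group_by_level_set(records):
    """Merge times that share an identical level list; assuming no duplicate
    records, each group is then a full Z,T grid. A simple heuristic that is
    not guaranteed to give the minimum number of variables."""
    groups = defaultdict(list)
    for time, levels in split_by_time(records).items():
        groups[tuple(levels)].append(time)
    return dict(groups)


if __name__ == "__main__":
    records = [("T1", 1), ("T1", 11), ("T2", 1), ("T2", 11)]
    records += [("T3", level) for level in range(1, 28)]
    print(forms_zt_grid(records))  # False
    for levels, times in group_by_level_set(records).items():
        print(len(levels), "levels x", len(times), "times")
    # 2 levels x 2 times
    # 27 levels x 1 times
```

On these example records the heuristic yields two variables, whereas the cf-python output earlier in this ticket splits the real u-wind records into 2 levels x 3 times plus 25 levels. More than one valid decomposition exists, which is why choosing the best one is the harder part.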

Anyway, for now I'll treat it as a limitation of xconv and explain this to the user.

Regards,
Alan

comment:4 Changed 19 months ago by jeff

  • Resolution set to answered
  • Status changed from accepted to closed

Thanks Alan, I'll close this ticket now.

Jeff.
