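This is a note.
While porting CESM, I figured my server was fairly powerful, so I wanted to run two cases at the same time.
The second case, with its compiler and machine settings, was created as follows:
./create_newcase --case 1850CLM50Bgc_gnu_cesm --res f19_g16 --compset I1850Clm50Bgc --run-unsupported --compiler gnu --mach mygnu
When I ran "./case.submit" for this second case, the following error appeared: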
2021-08-08 12:21:36 MODEL EXECUTION BEGINS HERE
run command is mpirun -np 4 /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/bld/cesm.exe >> cesm.log.$LID 2>&1
ERROR: RUN FAIL: Command 'mpirun -np 4 /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/bld/cesm.exe >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run/cesm.log.210808-122133
Run "cat /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run/cesm.log.210808-122133" to view the CESM log.
Invalid PIO rearranger comm max pend req (comp2io), 0
Resetting PIO rearranger comm max pend req (comp2io) to 64
PIO rearranger options:
comm type =p2p
comm fcd =2denable
max pend req (comp2io) = 0
enable_hs (comp2io) = T
enable_isend (comp2io) = F
max pend req (io2comp) = 64
enable_hs (io2comp) = F
enable_isend (io2comp) = T
(seq_comm_setcomm) init ID ( 1 GLOBAL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 2 CPL ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_setcomm) init ID ( 5 ATM ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 6 CPLATM ) join IDs = 2 5 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 3 ALLATMID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 4 CPLALLATMID ) join IDs = 2 3 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 9 LND ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 10 CPLLND ) join IDs = 2 9 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 7 ALLLNDID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 8 CPLALLLNDID ) join IDs = 2 7 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 13 ICE ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 14 CPLICE ) join IDs = 2 13 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 11 ALLICEID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 12 CPLALLICEID ) join IDs = 2 11 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 17 OCN ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 18 CPLOCN ) join IDs = 2 17 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 15 ALLOCNID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 16 CPLALLOCNID ) join IDs = 2 15 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 21 ROF ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 22 CPLROF ) join IDs = 2 21 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 19 ALLROFID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 20 CPLALLROFID ) join IDs = 2 19 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 25 GLC ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 26 CPLGLC ) join IDs = 2 25 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 23 ALLGLCID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 24 CPLALLGLCID ) join IDs = 2 23 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 29 WAV ) pelist = 0 3 1 ( npes = 4) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 30 CPLWAV ) join IDs = 2 29 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 27 ALLWAVID ) join multiple comp IDs ( npes = 4) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 28 CPLALLWAVID ) join IDs = 2 27 ( npes = 4) ( nthreads = 1)
(seq_comm_setcomm) init ID ( 33 ESP ) pelist = 0 0 1 ( npes = 1) ( nthreads = 1)( suffix =)
(seq_comm_joincomm) init ID ( 34 CPLESP ) join IDs = 2 33 ( npes = 4) ( nthreads = 1)
(seq_comm_jcommarr) init ID ( 31 ALLESPID ) join multiple comp IDs ( npes = 1) ( nthreads = 1)
(seq_comm_joincomm) init ID ( 32 CPLALLESPID ) join IDs = 2 31 ( npes = 4) ( nthreads = 1)
(seq_comm_printcomms) 1 0 4 1 GLOBAL:
(seq_comm_printcomms) 2 0 4 1 CPL:
(seq_comm_printcomms) 3 0 4 1 ALLATMID:
(seq_comm_printcomms) 4 0 4 1 CPLALLATMID:
(seq_comm_printcomms) 5 0 4 1 ATM:
(seq_comm_printcomms) 6 0 4 1 CPLATM:
(seq_comm_printcomms) 7 0 4 1 ALLLNDID:
(seq_comm_printcomms) 8 0 4 1 CPLALLLNDID:
(seq_comm_printcomms) 9 0 4 1 LND:
(seq_comm_printcomms) 10 0 4 1 CPLLND:
(seq_comm_printcomms) 11 0 4 1 ALLICEID:
(seq_comm_printcomms) 12 0 4 1 CPLALLICEID:
(seq_comm_printcomms) 13 0 4 1 ICE:
(seq_comm_printcomms) 14 0 4 1 CPLICE:
(seq_comm_printcomms) 15 0 4 1 ALLOCNID:
(seq_comm_printcomms) 16 0 4 1 CPLALLOCNID:
(seq_comm_printcomms) 17 0 4 1 OCN:
(seq_comm_printcomms) 18 0 4 1 CPLOCN:
(seq_comm_printcomms) 19 0 4 1 ALLROFID:
(seq_comm_printcomms) 20 0 4 1 CPLALLROFID:
(seq_comm_printcomms) 21 0 4 1 ROF:
(seq_comm_printcomms) 22 0 4 1 CPLROF:
(seq_comm_printcomms) 23 0 4 1 ALLGLCID:
(seq_comm_printcomms) 24 0 4 1 CPLALLGLCID:
(seq_comm_printcomms) 25 0 4 1 GLC:
(seq_comm_printcomms) 26 0 4 1 CPLGLC:
(seq_comm_printcomms) 27 0 4 1 ALLWAVID:
(seq_comm_printcomms) 28 0 4 1 CPLALLWAVID:
(seq_comm_printcomms) 29 0 4 1 WAV:
(seq_comm_printcomms) 30 0 4 1 CPLWAV:
(seq_comm_printcomms) 31 0 1 1 ALLESPID:
(seq_comm_printcomms) 32 0 4 1 CPLALLESPID:
(seq_comm_printcomms) 33 0 1 1 ESP:
(seq_comm_printcomms) 34 0 4 1 CPLESP:
(t_initf) Read in prof_inparm namelist from: drv_in
(t_initf) Using profile_disable= F
(t_initf) profile_timer= 4
(t_initf) profile_depth_limit= 4
(t_initf) profile_detail_limit= 2
(t_initf) profile_barrier= F
(t_initf) profile_outpe_num= 1
(t_initf) profile_outpe_stride= 0
(t_initf) profile_single_file= F
(t_initf) profile_global_stats= T
(t_initf) profile_ovhd_measurement= F
(t_initf) profile_add_detail= F
(t_initf) profile_papi_enable= F
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x151e7878232a
#1 0x151e78781503
#2 0x151e77dff03f
#3 0x55d58d7cc3ed
#4 0x55d58d7c29da
#5 0x55d58d7bf712
#6 0x55d58d67cb71
#7 0x55d58d6e40d0
#8 0x55d58cf41888
#9 0x55d58cf3bab7
#10 0x55d58cec4b2e
#11 0x55d58ceb5d8f
#12 0x55d58cec20e0
#13 0x151e77de1bf6
#14 0x55d58cea84d9
#15 0xffffffffffffffff
#0 0x155370fb532a
#1 0x155370fb4503
#2 0x15537063203f
#3 0x5590caf743c1
#4 0x5590caf6a9da
#5 0x5590caf67712
#6 0x5590cae2619f
#7 0x5590cae8c0d0
#8 0x5590ca6e9888
#9 0x5590ca6e3ab7
#10 0x5590ca66cb2e
#11 0x5590ca65dd8f
#12 0x5590ca66a0e0
#13 0x155370614bf6
#14 0x5590ca6504d9
#15 0xffffffffffffffff
#0 0x14d229d6232a
#1 0x14d229d61503
#2 0x14d2293df03f
#3 0x55bbac8b93c1
#4 0x55bbac8af9da
#5 0x55bbac8ac712
#6 0x55bbac769b71
#7 0x55bbac7d10d0
#8 0x55bbac02e888
#9 0x55bbac028ab7
#10 0x55bbabfb1b2e
#11 0x55bbabfa2d8f
#12 0x55bbabfaf0e0
#13 0x14d2293c1bf6
#14 0x55bbabf954d9
#15 0xffffffffffffffff
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
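Most of the log above is routine initialization output, and the actual failure only shows up in the last lines. As a small convenience, the sketch below pulls out just the crash-related lines. The function name show_crash_lines is hypothetical, and the two search patterns are simply taken from the messages in this particular log, so other failures may need different patterns.

```shell
#!/usr/bin/env bash
# Print only the lines of a log file that mention a crash signal.
# show_crash_lines is a hypothetical helper name, not part of CESM.
show_crash_lines() {
    local log_file="$1"
    # -n prefixes each match with its line number;
    # -E enables the "A|B" alternation in the pattern.
    grep -n -E 'SIGSEGV|exited on signal' "$log_file"
}
```

For the run above it would be called as: show_crash_lines /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run/cesm.log.210808-122133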
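Solution
There was actually nothing wrong with the case itself; my workstation can simply only handle one submitted case at a time. Stopping the previous case and then running this one worked fine. I wrote this post first as a note for myself, and second in the hope that others will not, as I did, go searching for a fix and still end up without one.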
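To avoid running into this again, one option is to check whether a model run is still active before submitting the next case. The following is only a minimal sketch under a few assumptions: the executable is named cesm.exe (as in the mpirun command in the log), pgrep from the procps package is available, and the function names model_is_running and run_if_idle are hypothetical helpers, not part of CESM.

```shell
#!/usr/bin/env bash
# Return success (0) if a process with exactly this name is running.
model_is_running() {
    local exe_name="$1"
    # -x requires an exact match on the process name.
    pgrep -x "$exe_name" > /dev/null
}

# Run the given command only if no process named exe_name is active.
# Usage: run_if_idle <exe_name> <command> [args...]
run_if_idle() {
    local exe_name="$1"
    shift
    if model_is_running "$exe_name"; then
        echo "Another $exe_name is still running; wait for it or stop it first." >&2
        return 1
    fi
    "$@"
}
```

From inside the case directory, the submission would then be: run_if_idle cesm.exe ./case.submit. Note that this check sees every cesm.exe process on the machine, including those of other users.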