저번 14 에 이어진 내용이라고 볼 수 있다.
지난시간 왜 시스템 커널 분석에서 fc_stat, fc_lport 에 집중했는지 나오는 부분이다.
KERNEL:
/4.1.12-61.1.17.el6uek.x86_64/vmlinux
DUMPFILE:
2018-01-04-19:13:01/vmcore
CPUS: 20 [OFFLINE: 19]
DATE: Thu Jan 4 19:06:44 2018
UPTIME: 135 days, 07:36:31
LOAD AVERAGE: 1.68, 1.44, 1.03
TASKS: 802
NODENAME: *******
RELEASE: 4.1.12-61.1.17.el6uek.x86_64
VERSION: #2 SMP Mon Oct 31 18:17:37 PDT 2016
MACHINE: x86_64 (2494 Mhz)
MEMORY: 11.2 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at
00000000000003e8"
PID: 21090
COMMAND: "kworker/1:0"
TASK: ffff8803413ff000 [THREAD_INFO: ffff8801f9a9c000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash64> log
[ 882.760140] sd 1:0:0:19: [sdbp] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.760153] sd 1:0:0:19: [sdbp] tag#0 Sense Key : Aborted Command
[current]
[ 882.760157] sd 1:0:0:19: [sdbp] tag#0 Add. Sense: Internal target failure
[ 882.760162] sd 1:0:0:19: [sdbp] tag#0 CDB: Read(10) 28 00 00 00 00 00 00
00 01 00
[ 882.760165] blk_update_request: I/O error, dev sdbp, sector 0
[ 882.760243] device-mapper: multipath: Failing path 68:48.
[ 882.760773] sd 1:0:0:19: [sdbp] tag#1 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.760783] sd 1:0:0:19: [sdbp] tag#1 Sense Key : Aborted Command
[current]
[ 882.760787] sd 1:0:0:19: [sdbp] tag#1 Add. Sense: Internal target failure
[ 882.760791] sd 1:0:0:19: [sdbp] tag#1 CDB: Read(10) 28 00 00 00 00 00 00
00 08 00
[ 882.760794] blk_update_request: I/O error, dev sdbp, sector 0
[ 882.805194] sd 1:0:1:19: [sdcn] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.805206] sd 1:0:1:19: [sdcn] tag#0 Sense Key : Aborted Command
[current]
[ 882.805211] sd 1:0:1:19: [sdcn] tag#0 Add. Sense: Internal target failure
[ 882.805215] sd 1:0:1:19: [sdcn] tag#0 CDB: Read(10) 28 00 00 00 00 00 00
00 08 00
[ 882.805217] blk_update_request: I/O error, dev sdcn, sector 0
[ 882.805294] device-mapper: multipath: Failing path 69:176.
[ 882.806164] sd 1:0:1:19: [sdcn] tag#1 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.806175] sd 1:0:1:19: [sdcn] tag#1 Sense Key : Aborted Command
[current]
[ 882.806180] sd 1:0:1:19: [sdcn] tag#1 Add. Sense: Internal target failure
[ 882.806183] sd 1:0:1:19: [sdcn] tag#1 CDB: Read(10) 28 00 00 00 00 00 00
00 01 00
[[[ Snip ]]]
[5444058.471741] buffer_io_error: 633 callbacks suppressed
[5444058.471746] Buffer I/O error on dev sdae, logical block 15, async page
read
[5444058.473596] Buffer I/O error on dev sdx, logical block 0, async page
read
[5444058.476522] Buffer I/O error on dev sdcj, logical block 7, async page
read
[5444058.477537] Buffer I/O error on dev sdca, logical block 15, async page
read
[5444058.481552] Buffer I/O error on dev sdv, logical block 1, async page
read
[5444058.481781] Buffer I/O error on dev sdar, logical block 7, async page
read
[5444058.488460] Buffer I/O error on dev sdat, logical block 7, async page
read
[5444058.492502] Buffer I/O error on dev sdci, logical block 0, async page
read
[5444058.502637] Buffer I/O error on dev sdcn, logical block 7, async page
read
[5444058.504540] Buffer I/O error on dev sdx, logical block 0, async page
read
[[[ Snip ]]]
[11067977.352682] device-mapper: multipath: Failing path 68:128.
[11067977.353391] device-mapper: multipath: Failing path 69:176.
[11067977.354347] device-mapper: multipath: Failing path 69:208.
[11067977.354476] device-mapper: multipath: Failing path 69:224.
[11067977.354589] device-mapper: multipath: Failing path 69:240.
[11147703.066973] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11234102.896196] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-12
by loop10
[11320502.882931] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11406902.721147] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop11
[11493302.710790] ocfs2: Unaligned AIO/DIO on inode 2657546 on device dm-7 by
loop8
[11579702.526382] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11666102.387746] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11691591.650948] BUG: unable to handle kernel NULL pointer dereference at
00000000000003e8
[11691591.651178] IP: [<ffffffffa02955ba>] fc_rport_enter_prli+0x10a/0x270
[libfc] <<<
[11691591.651583] PGD 0
[11691591.651786] Oops: 0000 [#1] SMP
[11691591.651997] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support
pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si
ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc
scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4
cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs
libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: ip_tables]
[11691591.654474] CPU: 1 PID: 21090 Comm: kworker/1:0 Not tainted
4.1.12-61.1.17.el6uek.x86_64 #2
[11691591.654873] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4,
BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[11691591.655282] Workqueue: events fc_rport_timeout [libfc]
[11691591.655492] task: ffff8803413ff000 ti: ffff8801f9a9c000 task.ti:
ffff8801f9a9c000
[11691591.655886] RIP: e030:[<ffffffffa02955ba>] [<ffffffffa02955ba>]
fc_rport_enter_prli+0x10a/0x270 [libfc] <<
[11691591.656288] RSP: e02b:ffff8801f9a9fd78 EFLAGS: 00010202
[11691591.656492] RAX: 0000000000000000 RBX: ffff880103e64600 RCX:
000000000000008c
[11691591.656884] RDX: 00000000807f1c0d RSI: 000000000000002c RDI:
ffff88011c83ac00
[11691591.657273] RBP: ffff8801f9a9fda8 R08: 0000000000000300 R09:
ffff88011c83ba00
[11691591.657661] R10: 0000000000007ff0 R11: 0000000000000001 R12:
ffff88011c83ac00
[11691591.662102] R13: ffff880103e67a00 R14: ffff880355c5b800 R15:
0000000000250100
[11691591.662487] FS: 0000000000000000(0000) GS:ffff880355c40000(0000)
knlGS:ffff880355c40000
[11691591.662870] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[11691591.663072] CR2: 00000000000003e8 CR3: 0000000001a8a000 CR4:
0000000000042660
[11691591.663452] Stack:
[11691591.663639] 0000000000000000 ffff880103e64670 ffff880103e64648
ffff880103e64600
[11691591.664035] ffff880355c5b800 0000000000000000 ffff8801f9a9fdd8
ffffffffa0295d88
[11691591.664432] ffff880216ce0f00 ffff880103e64670 ffff880175a45980
ffff880355c56ec0
[11691591.664830] Call Trace:
[11691591.665025] [<ffffffffa0295d88>] fc_rport_timeout+0x98/0xb0 [libfc]
[11691591.665232] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[11691591.665434] [<ffffffff8100c262>] ? check_events+0x12/0x20
[11691591.665638] [<ffffffff8109f500>] worker_thread+0x120/0x480
[11691591.665840] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[11691591.666043] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[11691591.666246] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[11691591.666449] [<ffffffff810a46de>] kthread+0xce/0xf0
[11691591.666649] [<ffffffff810a4610>] ?
kthread_freezable_should_stop+0x70/0x70
[11691591.667033] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[11691591.667231] [<ffffffff810a4610>] ?
kthread_freezable_should_stop+0x70/0x70
[11691591.667611] Code: 83 c1 18 48 83 c1 04 31 d2 be 10 00 00 00 48 89 df ff
10 49 8b 45 00 41 83 bc 24 80 00 00 00 17 44 8b 7b 28 48 8b 80 98 07 00 00
<44> 8b b0 e8 03 00 00 0f 86 d5 00 00 00 49 8b 84 24 d8 00 00 00
[11691591.668479] RIP [<ffffffffa02955ba>] fc_rport_enter_prli+0x10a/0x270
[libfc]
[11691591.668865] RSP <ffff8801f9a9fd78>
[11691591.669057] CR2: 00000000000003e8
[11691591.669559] ---[ end trace c5af00a9256fcc71 ]---
[11691591.770100] Kernel panic - not syncing: Fatal exception
[11691591.770380] Kernel Offset: disabled
crash64> bt -l
PID: 21090 TASK: ffff8803413ff000 CPU: 1 COMMAND: "kworker/1:0"
#0 [ffff8801f9a9fab0] panic at ffffffff816c6809
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/panic.c:
137
#1 [ffff8801f9a9fb30] oops_end at ffffffff8101a79c
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/dumps
tack.c: 247
#2 [ffff8801f9a9fb60] no_context at ffffffff8106d7a1
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
737
#3 [ffff8801f9a9fbb0] __bad_area_nosemaphore at ffffffff8106d99d
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
817
#4 [ffff8801f9a9fc00] bad_area_nosemaphore at ffffffff8106dab3
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
825
#5 [ffff8801f9a9fc10] __do_page_fault at ffffffff8106e028
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
1302
#6 [ffff8801f9a9fc80] do_page_fault at ffffffff8106e337
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
1320
#7 [ffff8801f9a9fcc0] page_fault at ffffffff816cd758
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/entry
_64.S: 1262
[exception RIP: fc_rport_enter_prli+266]
<<<
RIP: ffffffffa02955ba RSP: ffff8801f9a9fd78 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880103e64600 RCX: 000000000000008c
RDX: 00000000807f1c0d RSI: 000000000000002c RDI: ffff88011c83ac00
RBP: ffff8801f9a9fda8 R8: 0000000000000300 R9: ffff88011c83ba00
R10: 0000000000007ff0 R11: 0000000000000001 R12: ffff88011c83ac00
R13: ffff880103e67a00 R14: ffff880355c5b800 R15: 0000000000250100
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#8 [ffff8801f9a9fdb0] fc_rport_timeout at ffffffffa0295d88 [libfc]
#9 [ffff8801f9a9fde0] process_one_work at ffffffff8109f07e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/trace/events/
workqueue.h: 116
#10 [ffff8801f9a9fe40] worker_thread at ffffffff8109f500
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/linux/list.h:
189
#11 [ffff8801f9a9fec0] kthread at ffffffff810a46de
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/kthread.c:
207
#12 [ffff8801f9a9ff50] ret_from_fork at ffffffff816cbb62
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/entry
_64.S: 640
RIP 를 확인 해보자.
crash64> dis -rl ffffffffa02955ba
0xffffffffa02954b0 <fc_rport_enter_prli>: push %rbp
0xffffffffa02954b1 <fc_rport_enter_prli+1>: mov %rsp,%rbp
0xffffffffa02954b4 <fc_rport_enter_prli+4>: sub $0x30,%rsp
0xffffffffa02954b8 <fc_rport_enter_prli+8>: mov %rbx,-0x28(%rbp)
0xffffffffa02954bc <fc_rport_enter_prli+12>: mov %r12,-0x20(%rbp)
0xffffffffa02954c0 <fc_rport_enter_prli+16>: mov %r13,-0x18(%rbp)
0xffffffffa02954c4 <fc_rport_enter_prli+20>: mov %r14,-0x10(%rbp)
0xffffffffa02954c8 <fc_rport_enter_prli+24>: mov %r15,-0x8(%rbp)
0xffffffffa02954cc <fc_rport_enter_prli+28>: nopl 0x0(%rax,%rax,1)
0xffffffffa02954d1 <fc_rport_enter_prli+33>: mov 0x28(%rdi),%edx
0xffffffffa02954d4 <fc_rport_enter_prli+36>: mov (%rdi),%r13
0xffffffffa02954d7 <fc_rport_enter_prli+39>: mov %rdi,%rbx
0xffffffffa02954da <fc_rport_enter_prli+42>: cmp $0xfffbff,%edx
0xffffffffa02954e0 <fc_rport_enter_prli+48>: ja 0xffffffffa0295680
<fc_rport_enter_prli+464>
0xffffffffa02954e6 <fc_rport_enter_prli+54>: testb $0x8,0xbe73(%rip)
# 0xffffffffa02a1360
0xffffffffa02954ed <fc_rport_enter_prli+61>: jne 0xffffffffa02956b2
<fc_rport_enter_prli+514>
0xffffffffa02954f3 <fc_rport_enter_prli+67>: cmpl $0x4,0x14(%rbx)
0xffffffffa02954f7 <fc_rport_enter_prli+71>: je 0xffffffffa0295500
<fc_rport_enter_prli+80>
0xffffffffa02954f9 <fc_rport_enter_prli+73>: movl $0x0,0x38(%rbx)
0xffffffffa0295500 <fc_rport_enter_prli+80>: movl $0x4,0x14(%rbx)
0xffffffffa0295507 <fc_rport_enter_prli+87>: mov $0x14,%edi
0xffffffffa029550c <fc_rport_enter_prli+92>: callq 0xffffffffa02906a0
<_fc_frame_alloc>
0xffffffffa0295511 <fc_rport_enter_prli+97>: test %rax,%rax
0xffffffffa0295514 <fc_rport_enter_prli+100>: mov %rax,%r12
0xffffffffa0295517 <fc_rport_enter_prli+103>: je 0xffffffffa0295690
<fc_rport_enter_prli+480>
0xffffffffa029551d <fc_rport_enter_prli+109>: xor %eax,%eax
0xffffffffa029551f <fc_rport_enter_prli+111>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295528 <fc_rport_enter_prli+120>: jbe 0xffffffffa0295536
<fc_rport_enter_prli+134>
0xffffffffa029552a <fc_rport_enter_prli+122>: mov 0xd8(%r12),%rax
0xffffffffa0295532 <fc_rport_enter_prli+130>: add $0x18,%rax
0xffffffffa0295536 <fc_rport_enter_prli+134>: movq $0x0,(%rax)
0xffffffffa029553d <fc_rport_enter_prli+141>: movl $0x0,0x10(%rax)
0xffffffffa0295544 <fc_rport_enter_prli+148>: movq $0x0,0x8(%rax)
0xffffffffa029554c <fc_rport_enter_prli+156>: movb $0x20,(%rax)
0xffffffffa029554f <fc_rport_enter_prli+159>: movb $0x10,0x1(%rax)
0xffffffffa0295553 <fc_rport_enter_prli+163>: movw $0x1400,0x2(%rax)
0xffffffffa0295559 <fc_rport_enter_prli+169>: movb $0x8,0x4(%rax)
0xffffffffa029555d <fc_rport_enter_prli+173>: movb $0x20,0x6(%rax)
0xffffffffa0295561 <fc_rport_enter_prli+177>: mov 0x360(%r13),%edx
0xffffffffa0295568 <fc_rport_enter_prli+184>: bswap %edx
0xffffffffa029556a <fc_rport_enter_prli+186>: mov %edx,0x10(%rax)
0xffffffffa029556d <fc_rport_enter_prli+189>: mov 0xb46c(%rip),%rax
# 0xffffffffa02a09e0
0xffffffffa0295574 <fc_rport_enter_prli+196>: test %rax,%rax
0xffffffffa0295577 <fc_rport_enter_prli+199>: je 0xffffffffa02955a2
<fc_rport_enter_prli+242>
0xffffffffa0295579 <fc_rport_enter_prli+201>: xor %ecx,%ecx
0xffffffffa029557b <fc_rport_enter_prli+203>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295584 <fc_rport_enter_prli+212>: jbe 0xffffffffa0295592
<fc_rport_enter_prli+226>
0xffffffffa0295586 <fc_rport_enter_prli+214>: mov 0xd8(%r12),%rcx
0xffffffffa029558e <fc_rport_enter_prli+222>: add $0x18,%rcx
0xffffffffa0295592 <fc_rport_enter_prli+226>: add $0x4,%rcx
0xffffffffa0295596 <fc_rport_enter_prli+230>: xor %edx,%edx
0xffffffffa0295598 <fc_rport_enter_prli+232>: mov $0x10,%esi
0xffffffffa029559d <fc_rport_enter_prli+237>: mov %rbx,%rdi
0xffffffffa02955a0 <fc_rport_enter_prli+240>: callq *(%rax)
0xffffffffa02955a2 <fc_rport_enter_prli+242>: mov 0x0(%r13),%rax
0xffffffffa02955a6 <fc_rport_enter_prli+246>: cmpl $0x17,0x80(%r12)
0xffffffffa02955af <fc_rport_enter_prli+255>: mov 0x28(%rbx),%r15d
0xffffffffa02955b3 <fc_rport_enter_prli+259>: mov 0x798(%rax),%rax
0xffffffffa02955ba <fc_rport_enter_prli+266>: mov 0x3e8(%rax),%r14d
<<<
crash64>
fc_rport_enter_prli+242 라인에 rax 를 r13 으로 호출하는 것이 있다.
r13 = ffff880103e67a00.
확인해 보도록 한다.
역시 소스코드의 라인이 나타나지 않는다. 모듈을 로드하자.
crash7latest> mod -s libfc usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
crash7latest> dis -rl ffffffffa02955ba
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1090
0xffffffffa02954b0 <fc_rport_enter_prli>: push %rbp
0xffffffffa02954b1 <fc_rport_enter_prli+1>: mov %rsp,%rbp
0xffffffffa02954b4 <fc_rport_enter_prli+4>: sub $0x30,%rsp
0xffffffffa02954b8 <fc_rport_enter_prli+8>: mov %rbx,-0x28(%rbp)
0xffffffffa02954bc <fc_rport_enter_prli+12>: mov %r12,-0x20(%rbp)
0xffffffffa02954c0 <fc_rport_enter_prli+16>: mov %r13,-0x18(%rbp)
0xffffffffa02954c4 <fc_rport_enter_prli+20>: mov %r14,-0x10(%rbp)
0xffffffffa02954c8 <fc_rport_enter_prli+24>: mov %r15,-0x8(%rbp)
0xffffffffa02954cc <fc_rport_enter_prli+28>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1103
0xffffffffa02954d1 <fc_rport_enter_prli+33>: mov 0x28(%rdi),%edx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1091
0xffffffffa02954d4 <fc_rport_enter_prli+36>: mov (%rdi),%r13
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1090
0xffffffffa02954d7 <fc_rport_enter_prli+39>: mov %rdi,%rbx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1103
0xffffffffa02954da <fc_rport_enter_prli+42>: cmp $0xfffbff,%edx
0xffffffffa02954e0 <fc_rport_enter_prli+48>: ja 0xffffffffa0295680 <fc_rport_enter_prli+464>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1108
0xffffffffa02954e6 <fc_rport_enter_prli+54>: testb $0x8,0xbe73(%rip) # 0xffffffffa02a1360 <fc_debug_logging>
0xffffffffa02954ed <fc_rport_enter_prli+61>: jne 0xffffffffa02956b2 <fc_rport_enter_prli+514>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 231
0xffffffffa02954f3 <fc_rport_enter_prli+67>: cmpl $0x4,0x14(%rbx)
0xffffffffa02954f7 <fc_rport_enter_prli+71>: je 0xffffffffa0295500 <fc_rport_enter_prli+80>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 232
0xffffffffa02954f9 <fc_rport_enter_prli+73>: movl $0x0,0x38(%rbx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 233
0xffffffffa0295500 <fc_rport_enter_prli+80>: movl $0x4,0x14(%rbx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 139
0xffffffffa0295507 <fc_rport_enter_prli+87>: mov $0x14,%edi
0xffffffffa029550c <fc_rport_enter_prli+92>: callq 0xffffffffa02906a0 <_fc_frame_alloc>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1114
0xffffffffa0295511 <fc_rport_enter_prli+97>: test %rax,%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 139
0xffffffffa0295514 <fc_rport_enter_prli+100>: mov %rax,%r12
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1114
0xffffffffa0295517 <fc_rport_enter_prli+103>: je 0xffffffffa0295690 <fc_rport_enter_prli+480>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 208
0xffffffffa029551d <fc_rport_enter_prli+109>: xor %eax,%eax
0xffffffffa029551f <fc_rport_enter_prli+111>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295528 <fc_rport_enter_prli+120>: jbe 0xffffffffa0295536 <fc_rport_enter_prli+134>
0xffffffffa029552a <fc_rport_enter_prli+122>: mov 0xd8(%r12),%rax
0xffffffffa0295532 <fc_rport_enter_prli+130>: add $0x18,%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 664
0xffffffffa0295536 <fc_rport_enter_prli+134>: movq $0x0,(%rax)
0xffffffffa029553d <fc_rport_enter_prli+141>: movl $0x0,0x10(%rax)
0xffffffffa0295544 <fc_rport_enter_prli+148>: movq $0x0,0x8(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 665
0xffffffffa029554c <fc_rport_enter_prli+156>: movb $0x20,(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 666
0xffffffffa029554f <fc_rport_enter_prli+159>: movb $0x10,0x1(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 667
0xffffffffa0295553 <fc_rport_enter_prli+163>: movw $0x1400,0x2(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 668
0xffffffffa0295559 <fc_rport_enter_prli+169>: movb $0x8,0x4(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 669
0xffffffffa029555d <fc_rport_enter_prli+173>: movb $0x20,0x6(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/uapi/linux/swab.h: 60
0xffffffffa0295561 <fc_rport_enter_prli+177>: mov 0x360(%r13),%edx
0xffffffffa0295568 <fc_rport_enter_prli+184>: bswap %edx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 670
0xffffffffa029556a <fc_rport_enter_prli+186>: mov %edx,0x10(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1121
0xffffffffa029556d <fc_rport_enter_prli+189>: mov 0xb46c(%rip),%rax # 0xffffffffa02a09e0 <fc_passive_prov+64>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1122
0xffffffffa0295574 <fc_rport_enter_prli+196>: test %rax,%rax
0xffffffffa0295577 <fc_rport_enter_prli+199>: je 0xffffffffa02955a2 <fc_rport_enter_prli+242>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 208
0xffffffffa0295579 <fc_rport_enter_prli+201>: xor %ecx,%ecx
0xffffffffa029557b <fc_rport_enter_prli+203>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295584 <fc_rport_enter_prli+212>: jbe 0xffffffffa0295592 <fc_rport_enter_prli+226>
0xffffffffa0295586 <fc_rport_enter_prli+214>: mov 0xd8(%r12),%rcx
0xffffffffa029558e <fc_rport_enter_prli+222>: add $0x18,%rcx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1124
0xffffffffa0295592 <fc_rport_enter_prli+226>: add $0x4,%rcx
0xffffffffa0295596 <fc_rport_enter_prli+230>: xor %edx,%edx
0xffffffffa0295598 <fc_rport_enter_prli+232>: mov $0x10,%esi
0xffffffffa029559d <fc_rport_enter_prli+237>: mov %rbx,%rdi
0xffffffffa02955a0 <fc_rport_enter_prli+240>: callq *(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1127
0xffffffffa02955a2 <fc_rport_enter_prli+242>: mov 0x0(%r13),%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 173 >> lport->host
0xffffffffa02955a6 <fc_rport_enter_prli+246>: cmpl $0x17,0x80(%r12)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1127 <<
0xffffffffa02955af <fc_rport_enter_prli+255>: mov 0x28(%rbx),%r15d
0xffffffffa02955b3 <fc_rport_enter_prli+259>: mov 0x798(%rax),%rax >> scsi_host->host_data
0xffffffffa02955ba <fc_rport_enter_prli+266>: mov 0x3e8(%rax),%r14d
crash7latest> >> fc_port_attr->port_id
패닉은 fc_rport_enter_prli+266 에서 발생했다.
해당 어셈블리 내용은 아래 코드와 매치된다.
1089 static void fc_rport_enter_prli(struct fc_rport_priv *rdata)
1090 {
1091 struct fc_lport *lport = rdata->local_port;
1092 struct {
1093 struct fc_els_prli prli;
1094 struct fc_els_spp spp;
1095 } *pp;
1096 struct fc_frame *fp;
1097 struct fc4_prov *prov;
1098
1099 /*
1100 * If the rport is one of the well known addresses
1101 * we skip PRLI and RTV and go straight to READY.
1102 */
1103 if (rdata->ids.port_id >= FC_FID_DOM_MGR) {
1104 fc_rport_enter_ready(rdata);
1105 return;
1106 }
1107
1108 FC_RPORT_DBG(rdata, "Port entered PRLI state from %s state\n",
1109 fc_rport_state(rdata));
1110
1111 fc_rport_state_enter(rdata, RPORT_ST_PRLI);
1112
1113 fp = fc_frame_alloc(lport, sizeof(*pp));
1114 if (!fp) {
1115 fc_rport_error_retry(rdata, fp);
1116 return;
1117 }
1118
1119 fc_prli_fill(lport, fp);
1120
1121 prov = fc_passive_prov[FC_TYPE_FCP];
1122 if (prov) {
1123 pp = fc_frame_payload_get(fp, sizeof(*pp));
1124 prov->prli(rdata, sizeof(pp->spp), NULL, &pp->spp);
1125 }
1126
1127 fc_fill_fc_hdr(fp, FC_RCTL_ELS_REQ, rdata->ids.port_id,
1128 fc_host_port_id(lport->host), FC_TYPE_ELS,
1129 FC_FC_FIRST_SEQ | FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0); <<
1130
1131 if (!lport->tt.exch_seq_send(lport, fp, fc_rport_prli_resp,
1132 NULL, rdata, 2 * lport->r_a_tov))
1133 fc_rport_error_retry(rdata, NULL);
1134 else
1135 kref_get(&rdata->kref);
1136 }
자, 이전에 본 14번 내용과 비슷하게 fc 관련 header 를 가져와 채우는 부분이 문제가 되고 있다.
r13 의 내용을 확인해 보자. fc_port 데이타를 가리키고 있기 때문이다.
주소는 0xffff880103e67a00 이지만 그보다 16 더 앞부터 총 32비트의 내용까지 살펴본다.
crash7latest> dis -l 0xffff880103e679f0 32
dis: WARNING: ffff880103e679f0: no associated kernel symbol found
0xffff880103e679f0: add %al,(%rax)
0xffff880103e679f2: add %al,(%rax)
0xffff880103e679f4: add %al,(%rax)
0xffff880103e679f6: add %al,(%rax)
0xffff880103e679f8: add %al,(%rax)
0xffff880103e679fa: add %al,(%rax)
0xffff880103e679fc: add %al,(%rax)
0xffff880103e679fe: add %al,(%rax)
0xffff880103e67a00: add %ah,-0x1a(%rax)
0xffff880103e67a03: add (%rcx),%eax
0xffff880103e67a05: mov %bh,%bh
0xffff880103e67a07: incl 0x61(%rdx)
0xffff880103e67a0a: jae 0xffff880103e67a71
0xffff880103e67a0c: rex.W push %rsp
0xffff880103e67a0e: push %rsp
0xffff880103e67a0f: push %rax
0xffff880103e67a10: (bad)
0xffff880103e67a11: xor %ch,(%rsi)
0xffff880103e67a13: xor (%rax),%esp
0xffff880103e67a15: push %rax
0xffff880103e67a16: jns 0xffff880103e67a8c
0xffff880103e67a18: pushq $0x322f6e6f
0xffff880103e67a1d: cs
0xffff880103e67a1e: ss
0xffff880103e67a1f: cs
0xffff880103e67a20: ss
0xffff880103e67a21: or $0x3030300a,%eax
0xffff880103e67a26: xor %dh,(%rax)
0xffff880103e67a28: (bad)
0xffff880103e67a29: fs
0xffff880103e67a2a: gs
0xffff880103e67a2b: jbe 0xffff880103e67a96
crash7latest>
실재로 해당 내용은 bad address 를 표시하고 있다.
crash7latest> struct fc_lport ffff880103e67a00
struct fc_lport {
host = 0xffff880103e66000,
ema_list = {
next = 0x5054544865736142,
prev = 0x74795020332e302f
},
dns_rdata = 0x2e362e322f6e6f68,
ms_rdata = 0x30303030300a0d36,
ptp_rdata = 0x2f6563697665642f,
scsi_priv = 0x3137006973637376,
.............
crash7latest> rd ffff880103e679f0 32
ffff880103e679f0: 0000000000000000 0000000000000000 ................
ffff880103e67a00: ffff880103e66000 5054544865736142 .`......BaseHTTP
ffff880103e67a10: 74795020332e302f 2e362e322f6e6f68 /0.3 Python/2.6.
ffff880103e67a20: 30303030300a0d36 2f6563697665642f 6..00000/device/
ffff880103e67a30: 3137006973637376 6e656b6361622f32 vscsi.712/backen
ffff880103e67a40: 21005b0064692d64 00450008729c64f6 d-id.[.!.d.r..E.
ffff880103e67a50: 0640004018a37600 0a0a29240a0a4b3b .v..@.@.;K..$)..
ffff880103e67a60: 51c01bc4f105e223 1880f57adb746b52 #......QRkt.z...
ffff880103e67a70: 01010000875c4a01 39310cfb80b80a08 .J\...........19
ffff880103e67a80: 00060000420058b4 fe8e020400000000 .X.B............
ffff880103e67a90: 020100007b050200 0000000000000300 ...{............
ffff880103e67aa0: 0000000000000000 302d41524f190101 ...........ORA-0
ffff880103e67ab0: 6f6e203a33303431 6f66206174616420 1403: no data fo
ffff880103e67ac0: 0000000000000001 0000000000000000 ................
ffff880103e67ad0: 0000000000000000 0000000000000000 ................
ffff880103e67ae0: 0000000600000001 ffffc90006064528 ........(E......
해당 값에는 fc_lport 의 정보라기 보기 어려운 다른 값이 들어가 있음을 확인할 수 있다.
흥미롭게도 이번과 저번 모두 fc_lport 의 주소가 물리적으로 동일하다.
crash7latest> vtop 0xffff88019891ea00
VIRTUAL PHYSICAL
ffff88019891ea00 19891ea00
PML4 DIRECTORY: ffffffff81a8a000
PAGE DIRECTORY: 4009a8b067 [machine]
PAGE DIRECTORY: 1a8b000
PUD: 1a8b030 => 4002cd3067 [machine]
PUD: 2cf6c5000
PMD: 2cf6c5620 => 4002c0e067 [machine]
PMD: 2cf600000
PTE: 2cf6008f0 => 801000215072c067 [machine]
PTE: 19891e067
PAGE: 215072c000 [machine]
PAGE: 19891e000
PTE PHYSICAL FLAGS
19891e067 19891e000 (PRESENT|RW|USER|ACCESSED|DIRTY)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0006624780 19891e000 0 2 0 2fffff80000000
이전 게시글에서와 동일하게 ffff880103e67a00 를 참조하는 부분에서 문제가 발생하였다.
따라서 fc_lport 에 관련되어 메모리를 사용함에 있어, 이미 해제된 메모리를 재참조하거나,
메모리 corruption 조건이 발생되고 있다는 것을 의미한다.
역시 저번 게시글과 같이 해당 코드에 논리적 오류나 Race condition 발생 여부,
Used-after-freed 현상에 대해서 더 살펴봐야 하는 상태이지만,
코드 상 특별히 문제될만한 부분은 발견되지 않았다는 점이 현 문제의 가장 큰 난관이다.
지금은 개발팀에서 in-house 로 동일 환경을 구성 후 재현테스트를 하고 있는 중이다.
이는 장비에 대한 특성 ( 펌웨어 문제 또는 하드웨어 관련 된 정보의 괴리 ),
즉 하드웨어 문제를 배제할 수 없는 의심스러운 부분이 있다는 것을 의미한다.
'Skills > mY Technutz' 카테고리의 다른 글
libfc: Update rport reference counting bug - 1368175 (0) | 2018.03.29 |
---|---|
Kernel Dump Analysis #16 (0) | 2018.02.22 |
Kernel Dump Analysis #14 (0) | 2018.02.13 |
Kernel Crash dump Analysis - #13 (0) | 2017.12.20 |
커널이 지원하는 기능을 확인하는 습관. (1) | 2017.02.06 |