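By now this series feels less like kernel dump analysis
and more like a guide to using the crash tool and to approaching an analysis according to the issue at hand.
In earlier posts I rambled quite a bit, so today I will mostly walk through the flow of the code and the assembly.
A system suddenly rebooted, leaving a core dump behind.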
KERNEL: /4.1.12-61.1.17.el6uek.x86_64/vmlinux
DUMPFILE: vmcore
CPUS: 20 [OFFLINE: 18]
DATE: Thu Feb 1 10:44:31 2018
UPTIME: 27 days, 15:00:47
LOAD AVERAGE: 0.68, 1.75, 1.43
TASKS: 721
NODENAME: ******
RELEASE: 4.1.12-61.1.17.el6uek.x86_64
VERSION: #2 SMP Mon Oct 31 18:17:37 PDT 2016
MACHINE: x86_64 (2494 Mhz)
MEMORY: 11.2 GB
PANIC: "BUG: unable to handle kernel paging request at fffffffc464d0838"
PID: 9647
COMMAND: "kworker/u40:2"
TASK: ffff88034a9ed400 [THREAD_INFO: ffff88000888c000]
CPU: 16
STATE: TASK_RUNNING (PANIC)
crash7latest> log
[[[ Snip ]]]
[2387223.097904] ------------[ cut here ]------------
[2387223.097934] WARNING: CPU: 6 PID: 9647 at include/linux/kref.h:47 fc_rport_enter_plogi+0x1cc/0x1e0 [libfc]()
[2387223.097936] Modules linked in: iptable_filter ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[2387223.098027] CPU: 6 PID: 9647 Comm: kworker/u40:2 Not tainted 4.1.12-61.1.17.el6uek.x86_64 #2
[2387223.098029] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[2387223.098040] Workqueue: fc_rport_eq fc_rport_work [libfc]
[2387223.098043] 0000000000000000 ffff88000888fca8 ffffffff816c6a80 0000000000000000
[2387223.098048] 000000000000002f ffff88000888fce8 ffffffff810845e5 0000000000000000
[2387223.098060] ffff88019891ea00 ffff8803442327f8 ffff88019891ea48 ffff8803442327f8
[2387223.098064] Call Trace:
[2387223.098074] [<ffffffff816c6a80>] dump_stack+0x63/0x83
[2387223.098081] [<ffffffff810845e5>] warn_slowpath_common+0x95/0xe0
[2387223.098086] [<ffffffff8108464a>] warn_slowpath_null+0x1a/0x20
[2387223.098096] [<ffffffffa0295abc>] fc_rport_enter_plogi+0x1cc/0x1e0 [libfc]
[2387223.098110] [<ffffffffa0295b75>] fc_rport_enter_flogi+0xa5/0x130 [libfc]
[2387223.098117] [<ffffffff816c9386>] ? mutex_lock+0x16/0x40
[2387223.098125] [<ffffffffa0297a3d>] fc_rport_work+0x3cd/0x690 [libfc]
[2387223.098133] [<ffffffffa02c9b26>] ? fnic_handle_frame+0x76/0xf0 [fnic]
[2387223.098139] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[2387223.098147] [<ffffffff8109f500>] worker_thread+0x120/0x480
[2387223.098154] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[2387223.098159] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387223.098163] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387223.098170] [<ffffffff810a46de>] kthread+0xce/0xf0
[2387223.098178] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387223.098185] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[2387223.098188] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387223.098202] ---[ end trace b16b4e42f8133221 ]---
[[[ Snip ]]]
[2387224.712025] BUG: unable to handle kernel paging request at fffffffc464d0838
[2387224.712244] IP: [<ffffffff810cd83e>] osq_lock+0x4e/0x120
[2387224.712453] PGD 1a8d067 PUD 0
[2387224.712655] Oops: 0000 [#1] SMP
[2387224.712858] Modules linked in: iptable_filter ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[2387224.715204] CPU: 16 PID: 9647 Comm: kworker/u40:2 Tainted: G W 4.1.12-61.1.17.el6uek.x86_64 #2
[2387224.715598] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[2387224.716002] Workqueue: fnic_event_wq fnic_handle_frame [fnic]
[2387224.716209] task: ffff88034a9ed400 ti: ffff88000888c000 task.ti: ffff88000888c000
[2387224.716596] RIP: e030:[<ffffffff810cd83e>] [<ffffffff810cd83e>] osq_lock+0x4e/0x120
[2387224.716988] RSP: e02b:ffff88000888fb38 EFLAGS: 00010286
[2387224.717191] RAX: ffffffff9891ea77 RBX: ffff88019891ea60 RCX: ffff8803560181c0
[2387224.717578] RDX: ffff8803560181d0 RSI: 00000000000181c0 RDI: ffff88019891ea80
[2387224.717963] RBP: ffff88000888fb58 R08: ffff88000888fb48 R09: 0000000000000001
[2387224.718349] R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
[2387224.718732] R13: ffff88019891ea60 R14: 0000000000000000 R15: ffff88019891ea78
[2387224.719121] FS: 0000000000000000(0000) GS:ffff880356000000(0000) knlGS:ffff880356000000
[2387224.719511] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[2387224.719714] CR2: fffffffc464d0838 CR3: 0000000341bbc000 CR4: 0000000000042660
[2387224.720105] Stack:
[2387224.720293] ffff880351803700 ffffffffa028b17b ffff880344232830 ffff88000888fc10
[2387224.720689] ffff88000888fbe8 ffffffff810cd235 ffff88000888fb88 ffff88019891ea80
[2387224.721086] ffff88034a9ed400 ffffffffa028b17b ffff88019891ea78 ffffffff815ded14
[2387224.721484] Call Trace:
[2387224.721681] [<ffffffffa028b17b>] ? fc_disc_recv_rscn_req+0x9b/0x3d0 [libfc]
[2387224.722061] [<ffffffff810cd235>] mutex_optimistic_spin+0x75/0x1f0
[2387224.722267] [<ffffffffa028b17b>] ? fc_disc_recv_rscn_req+0x9b/0x3d0 [libfc]
[2387224.722654] [<ffffffff815ded14>] ? __kfree_skb+0x34/0x90
[2387224.722854] [<ffffffff815de03f>] ? kfree_skb+0x4f/0xc0
[2387224.723057] [<ffffffff816c928b>] __mutex_lock_slowpath+0x2b/0x110
[2387224.723259] [<ffffffff816cb040>] ? _raw_spin_unlock_irqrestore+0x20/0x50
[2387224.723463] [<ffffffff816c9393>] mutex_lock+0x23/0x40
[2387224.723669] [<ffffffffa028aef6>] fc_disc_gpn_id_resp+0x36/0x220 [libfc]
[2387224.723877] [<ffffffffa028aec0>] ? fc_disc_start+0x60/0x60 [libfc]
[2387224.724083] [<ffffffffa028c0eb>] fc_invoke_resp+0x8b/0xf0 [libfc]
[2387224.724292] [<ffffffffa028d87e>] fc_exch_recv_seq_resp+0x18e/0x290 [libfc]
[2387224.724501] [<ffffffffa028ed91>] fc_exch_recv+0x231/0x290 [libfc]
[2387224.724705] [<ffffffffa02c9b1e>] fnic_handle_frame+0x6e/0xf0 [fnic]
[2387224.724909] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[2387224.725110] [<ffffffff8109f500>] worker_thread+0x120/0x480
[2387224.725311] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[2387224.725510] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387224.725710] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387224.725912] [<ffffffff810a46de>] kthread+0xce/0xf0
[2387224.726110] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387224.726316] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[2387224.726518] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
Let's check the RIP.
crash7latest> dis -rl osq_lock+0x4e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 85
0xffffffff810cd7f0 <osq_lock>: push %rbp
0xffffffff810cd7f1 <osq_lock+1>: mov %rsp,%rbp
0xffffffff810cd7f4 <osq_lock+4>: sub $0x20,%rsp
0xffffffff810cd7f8 <osq_lock+8>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 86
0xffffffff810cd7fd <osq_lock+13>: mov $0x181c0,%rsi
0xffffffff810cd804 <osq_lock+20>: mov %rsi,%rcx
0xffffffff810cd807 <osq_lock+23>: add %gs:0x7ef3c919(%rip),%rcx # 0xa128
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 88
0xffffffff810cd80f <osq_lock+31>: mov %gs:0x7ef3c91a(%rip),%eax # 0xa130
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 21
0xffffffff810cd816 <osq_lock+38>: add $0x1,%eax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 91
0xffffffff810cd819 <osq_lock+41>: movl $0x0,0x10(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 92
0xffffffff810cd820 <osq_lock+48>: movq $0x0,(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 93
0xffffffff810cd827 <osq_lock+55>: mov %eax,0x14(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/include/asm/atomic.h: 182
0xffffffff810cd82a <osq_lock+58>: xchg %eax,(%rdi)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 96
0xffffffff810cd82c <osq_lock+60>: test %eax,%eax
0xffffffff810cd82e <osq_lock+62>: je 0xffffffff810cd88a <osq_lock+154>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 28
0xffffffff810cd830 <osq_lock+64>: cltq
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 112
0xffffffff810cd832 <osq_lock+66>: lea 0x10(%rcx),%rdx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/linux/compiler.h: 204
0xffffffff810cd836 <osq_lock+70>: lea -0x10(%rbp),%r8
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 28
0xffffffff810cd83a <osq_lock+74>: sub $0x1,%rax
0xffffffff810cd83e <osq_lock+78>: add -0x7e424b80(,%rax,8),%rsi
crash7latest>
Check the source code:
24 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
25 {
26 int cpu_nr = encoded_cpu_val - 1;
27
28 return per_cpu_ptr(&osq_node, cpu_nr);
29 }
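To see how this line leads to the faulting address, the arithmetic can be replayed from the register dump. The sketch below is only an illustration in Python. It assumes that the displacement -0x7e424b80 in the faulting `add` instruction is the base of the table that per_cpu_ptr() indexes with cpu_nr (the symbol name does not appear in the output above), and it takes RAX and CR2 from the Oops.

```python
MASK64 = (1 << 64) - 1

def to_u64(value):
    """Wrap a Python int to an unsigned 64-bit value, as the CPU would."""
    return value & MASK64

# Values copied from the Oops output above.
rax = 0xffffffff9891ea77          # cpu_nr after "sub $0x1,%rax"
cr2 = 0xfffffffc464d0838          # faulting address reported by the kernel

# Assumption: the displacement of the faulting instruction is the base of
# the table that per_cpu_ptr() indexes with cpu_nr (8 bytes per entry).
table_base = to_u64(-0x7e424b80)

fault_addr = to_u64(table_base + rax * 8)

# Undo "sub $0x1" and the cltq sign extension to get the 32-bit value that
# xchg read out of the lock (encoded_cpu_val in decode_cpu()).
encoded_cpu_val = to_u64(rax + 1) & 0xffffffff

print(hex(table_base))       # 0xffffffff81bdb480
print(hex(fault_addr))       # 0xfffffffc464d0838
print(hex(encoded_cpu_val))  # 0x9891ea78
print(fault_addr == cr2)     # True
```

The recovered value 0x9891ea78 is far outside the 20 CPUs of this system, and it matches the low 32 bits of ffff88019891ea78 (R15). So the field that osq_lock read apparently held part of a pointer rather than an encoded CPU number.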
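This looks related to per-CPU lock handling, but the analysis is starting to drift off course.
Let's narrow the focus.
Since the fnic and libfc modules show up the most in the traces, that is where we should look.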
crash7latest> bt
PID: 9647 TASK: ffff88034a9ed400 CPU: 16 COMMAND: "kworker/u40:2"
#0 [ffff88000888f870] panic at ffffffff816c6809
#1 [ffff88000888f8f0] oops_end at ffffffff8101a79c
#2 [ffff88000888f920] no_context at ffffffff8106d7a1
#3 [ffff88000888f970] __bad_area_nosemaphore at ffffffff8106d99d
#4 [ffff88000888f9c0] bad_area_nosemaphore at ffffffff8106dab3
#5 [ffff88000888f9d0] __do_page_fault at ffffffff8106e028
#6 [ffff88000888fa40] do_page_fault at ffffffff8106e337
#7 [ffff88000888fa80] page_fault at ffffffff816cd758
[exception RIP: osq_lock+78]
RIP: ffffffff810cd83e RSP: ffff88000888fb38 RFLAGS: 00010286
RAX: ffffffff9891ea77 RBX: ffff88019891ea60 RCX: ffff8803560181c0
RDX: ffff8803560181d0 RSI: 00000000000181c0 RDI: ffff88019891ea80
RBP: ffff88000888fb58 R8: ffff88000888fb48 R9: 0000000000000001
R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88019891ea60 R14: 0000000000000000 R15: ffff88019891ea78
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#8 [ffff88000888fb60] mutex_optimistic_spin at ffffffff810cd235
#9 [ffff88000888fbf0] __mutex_lock_slowpath at ffffffff816c928b
#10 [ffff88000888fc50] mutex_lock at ffffffff816c9393
#11 [ffff88000888fc70] fc_disc_gpn_id_resp at ffffffffa028aef6 [libfc]
#12 [ffff88000888fcb0] fc_invoke_resp at ffffffffa028c0eb [libfc]
#13 [ffff88000888fd00] fc_exch_recv_seq_resp at ffffffffa028d87e [libfc]
#14 [ffff88000888fd50] fc_exch_recv at ffffffffa028ed91 [libfc]
#15 [ffff88000888fd90] fnic_handle_frame at ffffffffa02c9b1e [fnic]
#16 [ffff88000888fde0] process_one_work at ffffffff8109f07e
#17 [ffff88000888fe40] worker_thread at ffffffff8109f500
#18 [ffff88000888fec0] kthread at ffffffff810a46de
#19 [ffff88000888ff50] ret_from_fork at ffffffff816cbb62
Check libfc:
crash7latest> dis -lr ffffffffa028aef6
0xffffffffa028aec0 <fc_disc_gpn_id_resp>: push %rbp
0xffffffffa028aec1 <fc_disc_gpn_id_resp+1>: mov %rsp,%rbp
0xffffffffa028aec4 <fc_disc_gpn_id_resp+4>: sub $0x30,%rsp
0xffffffffa028aec8 <fc_disc_gpn_id_resp+8>: mov %rbx,-0x28(%rbp)
0xffffffffa028aecc <fc_disc_gpn_id_resp+12>: mov %r12,-0x20(%rbp)
0xffffffffa028aed0 <fc_disc_gpn_id_resp+16>: mov %r13,-0x18(%rbp)
0xffffffffa028aed4 <fc_disc_gpn_id_resp+20>: mov %r14,-0x10(%rbp)
0xffffffffa028aed8 <fc_disc_gpn_id_resp+24>: mov %r15,-0x8(%rbp)
0xffffffffa028aedc <fc_disc_gpn_id_resp+28>: nopl 0x0(%rax,%rax,1)
0xffffffffa028aee1 <fc_disc_gpn_id_resp+33>: mov (%rdx),%rbx
0xffffffffa028aee4 <fc_disc_gpn_id_resp+36>: mov %rsi,%r15
0xffffffffa028aee7 <fc_disc_gpn_id_resp+39>: mov %rdx,%r12
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13
0xffffffffa028aeee <fc_disc_gpn_id_resp+46>: mov %r13,%rdi
0xffffffffa028aef1 <fc_disc_gpn_id_resp+49>: callq 0xffffffff816c9370 <mutex_lock>
0xffffffffa028aef6 <fc_disc_gpn_id_resp+54>: cmp $0xfffffffffffffffe,%r15
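I passed the -l option to see the source file and line for the module code, but nothing was printed.
That is because the module's debug symbols have not been loaded into crash. Check with mod, then load them.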
crash7latest> mod
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 (not loaded) [CONFIG_KALLSYMS]
ffffffffa02b9260 libfcoe 56522 (not loaded) [CONFIG_KALLSYMS]
ffffffffa02db540 fnic 103995 (not loaded) [CONFIG_KALLSYMS]
crash7latest> mod -s libfc usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
Disassemble again:
crash7latest> dis -lr ffffffffa028aef6
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 583
0xffffffffa028aec0 <fc_disc_gpn_id_resp>: push %rbp
0xffffffffa028aec1 <fc_disc_gpn_id_resp+1>: mov %rsp,%rbp
0xffffffffa028aec4 <fc_disc_gpn_id_resp+4>: sub $0x30,%rsp
0xffffffffa028aec8 <fc_disc_gpn_id_resp+8>: mov %rbx,-0x28(%rbp)
0xffffffffa028aecc <fc_disc_gpn_id_resp+12>: mov %r12,-0x20(%rbp)
0xffffffffa028aed0 <fc_disc_gpn_id_resp+16>: mov %r13,-0x18(%rbp)
0xffffffffa028aed4 <fc_disc_gpn_id_resp+20>: mov %r14,-0x10(%rbp)
0xffffffffa028aed8 <fc_disc_gpn_id_resp+24>: mov %r15,-0x8(%rbp)
0xffffffffa028aedc <fc_disc_gpn_id_resp+28>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 592
0xffffffffa028aee1 <fc_disc_gpn_id_resp+33>: mov (%rdx),%rbx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 583
0xffffffffa028aee4 <fc_disc_gpn_id_resp+36>: mov %rsi,%r15
0xffffffffa028aee7 <fc_disc_gpn_id_resp+39>: mov %rdx,%r12
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 595
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13
0xffffffffa028aeee <fc_disc_gpn_id_resp+46>: mov %r13,%rdi
0xffffffffa028aef1 <fc_disc_gpn_id_resp+49>: callq 0xffffffff816c9370 <mutex_lock>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 596
0xffffffffa028aef6 <fc_disc_gpn_id_resp+54>: cmp $0xfffffffffffffffe,%r15
Let's follow RBX through the source.
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13 <-- fc_lport
581 static void fc_disc_gpn_id_resp(struct fc_seq *sp, struct fc_frame *fp,
582 void *rdata_arg)
583 {
584 struct fc_rport_priv *rdata = rdata_arg;
585 struct fc_rport_priv *new_rdata;
586 struct fc_lport *lport;
587 struct fc_disc *disc;
588 struct fc_ct_hdr *cp;
589 struct fc_ns_gid_pn *pn;
590 u64 port_name;
591
592 lport = rdata->local_port;
593 disc = &lport->disc;
594
595 mutex_lock(&disc->disc_mutex);
mutex_lock is called with the address computed from the value assigned to RBX above.
Let's look at the fc_lport and fc_disc structures:
crash7latest> struct -o fc_lport
struct fc_lport {
[0] struct Scsi_Host *host;
[8] struct list_head ema_list;
[24] struct fc_rport_priv *dns_rdata;
[32] struct fc_rport_priv *ms_rdata;
[40] struct fc_rport_priv *ptp_rdata;
[48] void *scsi_priv;
[56] struct fc_disc disc;
[288] struct list_head vports;
.......
SIZE: 1256
crash7latest> struct -o fc_disc
struct fc_disc {
[0] unsigned char retry_count;
[1] unsigned char pending;
[2] unsigned char requested;
[4] unsigned short seq_count;
[6] unsigned char buf_len;
[8] u16 disc_id;
[16] struct list_head rports;
[32] void *priv;
[40] struct mutex disc_mutex;
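The two layouts above can be tied back to the `lea 0x60(%rbx),%r13` instruction with a little arithmetic. The following Python sketch is only illustrative. The offsets are copied from the `struct -o` output, and the constant names are mine.

```python
# Offsets copied from the "struct -o" output above.
OFFSET_DISC_IN_LPORT = 56        # fc_lport.disc
OFFSET_MUTEX_IN_DISC = 40        # fc_disc.disc_mutex

# &lport->disc.disc_mutex is lport plus the sum of both offsets.
mutex_offset = OFFSET_DISC_IN_LPORT + OFFSET_MUTEX_IN_DISC

print(mutex_offset, hex(mutex_offset))   # 96 0x60
```

0x60 is exactly the displacement used by the lea, so at that point RBX holds lport and R13 (then RDI) holds &disc->disc_mutex.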
Based on this offset layout, check the contents of the stack.
The reason for looking at the stack is to infer the address of the fc_lport:
crash7latest> rd -s ffff88000888fb38 56
ffff88000888fb38: ffff880351803700 fc_disc_recv_rscn_req+155
ffff88000888fb48: ffff880344232830 ffff88000888fc10
ffff88000888fb58: ffff88000888fbe8 mutex_optimistic_spin+117
ffff88000888fb68: ffff88000888fb88 ffff88019891ea80
ffff88000888fb78: ffff88034a9ed400 fc_disc_recv_rscn_req+155
ffff88000888fb88: ffff88019891ea78 __kfree_skb+52
ffff88000888fb98: 0000000000000000 ffff88034e567100
ffff88000888fba8: ffff88000888fbd8 kfree_skb+79
ffff88000888fbb8: ffff88000888fc18 ffff88019891ea60
ffff88000888fbc8: ffff88034a9ed400 ffff88019891ea60
ffff88000888fbd8: ffff88034cced31c ffff88034e567b00
ffff88000888fbe8: ffff88000888fc48 __mutex_lock_slowpath+43
ffff88000888fbf8: ffff88034cced388 ffff88035600eb40
ffff88000888fc08: _raw_spin_unlock_irqrestore+32 ffff88034cced388
ffff88000888fc18: ffff88000888fc58 ffff88019891ea60
ffff88000888fc28: ffff88019891ea00 ffff88019891ea60
ffff88000888fc38: ffff88034cced31c ffff88034e567b00
ffff88000888fc48: ffff88000888fc68 mutex_lock+35
ffff88000888fc58: ffff88000888fc98 ffff88019891ea00
ffff88000888fc68: ffff88000888fca8 fc_disc_gpn_id_resp+54
ffff88000888fc78: ffff88000888fcb0 0000000000000001
ffff88000888fc88: ffff88034cced2c0 fc_disc_gpn_id_resp
ffff88000888fc98: ffff88034cced31c ffff88034e567b00
ffff88000888fca8: ffff88000888fcf8 fc_invoke_resp+139
ffff88000888fcb8: ffff88000888fcd8 ffff88019891ea00
ffff88000888fcc8: ffffe8ffff600000 ffff88034cced2c0
ffff88000888fcd8: ffff88034e567b00 ffff88034cced31c
ffff88000888fce8: 0000000000000000 0000000000180000
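mutex_lock is given lport + 0x60 (offset 56 of disc within fc_lport plus offset 40 of disc_mutex within fc_disc), and the mutex pointer ffff88019891ea60 shows up repeatedly on the stack.
Therefore ffff88019891ea00, which sits right next to it on the stack, can be taken to be the fc_lport.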
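The step from the mutex address to the fc_lport address can be written out explicitly. A small Python sketch, assuming that ffff88019891ea60 (RBX and R13 in the register dump, and repeated on the stack) is the disc_mutex pointer that was passed to mutex_lock:

```python
MUTEX_OFFSET = 0x60   # fc_lport.disc (56) + fc_disc.disc_mutex (40)

# Assumed mutex address, taken from RBX/R13 in the register dump.
mutex_addr = 0xffff88019891ea60

lport_addr = mutex_addr - MUTEX_OFFSET
print(hex(lport_addr))   # 0xffff88019891ea00

# Candidate pointers copied from the "rd -s" stack dump above.
stack_values = [0xffff88019891ea80, 0xffff88019891ea78,
                0xffff88019891ea60, 0xffff88019891ea00]
print(lport_addr in stack_values)   # True
```

The computed address also appears on the stack, which is consistent with the compiler having saved RBX (lport) when mutex_lock was entered.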
crash7latest> struct fc_lport ffff88019891ea00
struct fc_lport {
host = 0xffff88019891ea00,
ema_list = {
next = 0x2f302f6e69616d6f,
prev = 0x2f646e656b636163
},
dns_rdata = 0x31352f352f646276,
ms_rdata = 0x657079742f323137,
ptp_rdata = 0x300250100,
scsi_priv = 0x800000000ff0005,
disc = {
retry_count = 0 '\000',
pending = 0 '\000',
requested = 0 '\000',
seq_count = 0,
buf_len = 0 '\000',
disc_id = 2000,
rports = {
next = 0x1,
prev = 0xffff88019891ea50
},
priv = 0xffff88019891ea50,
disc_mutex = {
count = {
counter = -1
},
wait_lock = {
{
rlock = {
raw_lock = {
{
head_tail = 0,
tickets = {
head = 0,
tail = 0
}
}
}
}
......
crash7latest> rd ffff88019891ea00 32
ffff88019891ea00: ffff88019891ea00 2f302f6e69616d6f ........omain/0/
ffff88019891ea10: 2f646e656b636163 31352f352f646276 cackend/vbd/5/51
ffff88019891ea20: 657079742f323137 0000000300250100 712/type..%.....
ffff88019891ea30: 0800000000ff0005 0000000000000000 ................
ffff88019891ea40: 00002710000007d0 0000000000000001 .....'..........
ffff88019891ea50: ffff88019891ea50 ffff88019891ea50 P.......P.......
ffff88019891ea60: 00000000ffffffff 0000000000000000 ................
ffff88019891ea70: 0000000fffffffe0 ffff88019891ea78 ........x.......
ffff88019891ea80: ffff880100000011 ffffffffa0295cf0 .........\).....
ffff88019891ea90: 0000000000000000 dead000000000200 ................
ffff88019891eaa0: 000000018e3fd3b6 ffff880355d8eb42 ..?.....B..U....
ffff88019891eab0: ffffffff8109df80 ffff88019891ea70 ........p.......
ffff88019891eac0: 0000000000000000 0000000000000000 ................
ffff88019891ead0: 0000000000000000 0000000000000000 ................
ffff88019891eae0: ffff880300000001 0000000000000006 ................
ffff88019891eaf0: 0000000000000000 0000000000000000 ................
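The ASCII column of the rd output hints that the start of this object holds text rather than pointers. A short Python sketch that decodes the same words makes this easier to read. The word values are copied from the dump above, and the helper name is made up for illustration.

```python
import struct

# First five 64-bit words of the suspected fc_lport, copied from "rd" above.
words = [
    0xffff88019891ea00,
    0x2f302f6e69616d6f,
    0x2f646e656b636163,
    0x31352f352f646276,
    0x657079742f323137,
]

def words_to_bytes(values):
    """Pack 64-bit words the way x86_64 stores them (little-endian)."""
    return b"".join(struct.pack("<Q", v) for v in values)

raw = words_to_bytes(words)

# Skip the first word (it looks like a pointer) and decode the rest as text.
text = raw[8:].decode("ascii")
print(text)   # omain/0/cackend/vbd/5/51712/type
```

The fields ema_list, dns_rdata and ms_rdata therefore contain something that looks like a path string, not list or rport pointers, which suggests that this memory is no longer a valid fc_lport.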
crash7latest> vtop 0xffff88019891ea00
VIRTUAL PHYSICAL
ffff88019891ea00 19891ea00
PML4 DIRECTORY: ffffffff81a8a000
PAGE DIRECTORY: 4009a8b067 [machine]
PAGE DIRECTORY: 1a8b000
PUD: 1a8b030 => 4002cd3067 [machine]
PUD: 2cf6c5000
PMD: 2cf6c5620 => 4002c0e067 [machine]
PMD: 2cf600000
PTE: 2cf6008f0 => 801000215072c067 [machine]
PTE: 19891e067
PAGE: 215072c000 [machine]
PAGE: 19891e000
PTE PHYSICAL FLAGS
19891e067 19891e000 (PRESENT|RW|USER|ACCESSED|DIRTY)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0006624780 19891e000 0 0 0 2fffff80008000 tail
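The table-entry addresses in the vtop output can be cross-checked by splitting the virtual address into its page-table indexes. A sketch in Python, assuming standard x86_64 4-level paging with 4 KiB pages (the function name is made up for illustration):

```python
def page_table_indexes(vaddr):
    """Split an x86_64 virtual address (4-level paging, 4 KiB pages)."""
    return {
        "pgd": (vaddr >> 39) & 0x1ff,
        "pud": (vaddr >> 30) & 0x1ff,
        "pmd": (vaddr >> 21) & 0x1ff,
        "pte": (vaddr >> 12) & 0x1ff,
        "offset": vaddr & 0xfff,
    }

idx = page_table_indexes(0xffff88019891ea00)

# Each table entry is 8 bytes, so index * 8 is the byte offset in the table.
print(hex(idx["pud"] * 8))   # 0x30
print(hex(idx["pmd"] * 8))   # 0x620
print(hex(idx["pte"] * 8))   # 0x8f0
print(hex(idx["offset"]))    # 0xa00
```

These offsets line up with the entries vtop printed: 1a8b030 in the table at 1a8b000, 2cf6c5620 in the table at 2cf6c5000, and 2cf6008f0 in the table at 2cf600000. The page offset 0xa00 added to the page at 19891e000 gives the physical address 19891ea00.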
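For reference, this issue occurred in connection with another issue.
It has not been resolved yet, and so far I have not explained why I went looking for the fc_lport value.
There will of course be a follow-up dump analysis, and the reason will become clear there.
The important part here is the two stack-trace lines that were marked in red in the log above.
What I currently suspect is that the memory loaded into RBX was either corrupted or used after being freed (use-after-free).
Judging from those two lines, use-after-free is the closest match in this case.
In other words, something that was already freed is being used again, or a double-free race condition or the like is occurring.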
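As a rough illustration of what use-after-free means for a reference-counted object, here is a toy Python model. It is not kernel code. It assumes that the earlier WARNING at include/linux/kref.h:47 comes from a sanity check that fires when a reference is taken on an object whose count had already dropped to zero; the class and its methods are hypothetical.

```python
import warnings

class RefCounted:
    """Toy model of a reference-counted object (not kernel code)."""

    def __init__(self):
        self.refcount = 1
        self.freed = False

    def get(self):
        # Assumption: mimic a kref_get()-style sanity check that warns when
        # the count was zero before the increment.
        self.refcount += 1
        if self.refcount < 2:
            warnings.warn("get() on an object whose refcount was 0")

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # the real object would be released here


rdata = RefCounted()
rdata.put()                     # last reference dropped, object is freed

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    rdata.get()                 # another path still holds a stale pointer

print(rdata.freed, len(caught))   # True 1
```

In the real system the freed memory can be handed to another user in the meantime, which would explain why the fields read back later look like unrelated data.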
Of course, I have not yet been able to find the bug in the source code.
I will draw a more detailed, bigger picture together with you in the next analysis.