간단히 요약하자면,
- Global Network Security
망분리 및 Cloud network routing 뿐만 아니라 데이타 자체에 대한 보안을 지원하는것.
사실 별로 관심도 없다.
어차피 데이타센터 레벨 이야기나 마찬가지기 때문에....
역시 늘 똑같은 레퍼토리로 지속적인 LDC (Local DataCentre) 또는
HDC (Hub DataCentre) 구축에 박차를 가할거라고 '공약' 만 "공략" 한다.
- Machine Learning/AI protection
ML 을 통해 AI 를 이용하여 고객의 장애에 자동적/즉각적으로 대처하고
장애 요소를 사전 파악하여 장애가 발생하기 전에 사전 처리도
자동화하여 서비스 무중단을 지속적으로 제공하는것.
더보기
*** '암웨이' 라는 다국적 다단계 방문팡매를 전문으로 하던 상사(기업) 에 대한
한국식 발음이자 무시하는 의미의 단어이다.
*** 다단계의 경우 '입소문' 이라는 기반을 바탕으로 제품의 우수성을 홍보하던 방식을 많이 썼는데,
사실상 거짓과 사실이 혼재하였고 검증하기 어려웠기 때문에 무시당하는 결과에 적잖은 영향을 미쳤다.
'Skills > Cloud Computing' 카테고리의 다른 글
2018 Oracle Open World - Cloud Generation 2 (0) | 2018.11.01 |
---|---|
VXbench 라는 툴을 아십니까~? (3) | 2012.02.21 |
Swiss army knife 에 비견되는 BtrFS - 2부 (0) | 2012.02.20 |
Swiss army knife 에 비견되는 BtrFS - 1부 (0) | 2012.02.17 |
클라우드 시스템에서의 DBMS 방식 (2) | 2011.03.30 |
그대들은 유칼립투스를 아는가? (0) | 2009.12.23 |
Trackback :
결국 같은 이슈였다.
내가 전에 올렸던 Kernel Dump Analysis #14 / #15 두개의 이슈..
결국 #15 와 #14 모두 동일한 rport reference count 관련 race condition bug 였다.
비록 양상이 다르긴 했지만 모두 Kref 를 감소시키며 0 으로 만드는 과정에서
Race condition 으로 인해 다른 메모리 값을 참조하게 되는 버그였다.
아래는 해당 버그에 대한 Patchwork (GIT) 링크 이다.
https://patchwork.kernel.org/patch/9132823/
대략 reference count 를 초기화 하는 과정에서 Mutex lock 을 걸어
Violation 을 방지하려고 했는데, 매 타임마다 Locking 을 하고 그 값을 대기하다보니
그 대기시간중에 다른 스레드가 해당 구조체에 엑세스 할 기회를 만들어주게 되어
결국 해당 값이 바뀌어 버리는 일이 생길 수 있다는 것이고, ( Race condition )
이 과정을 단순히 Remote port 에 대한 참조카운트를 추가하여
해당 값을 Lookup 할때 (디스크 제거시에만) 참조값을 줄이면 되기 때문에
디스크를 제거/검색 하는 과정에서 매번 디스크 뮤텍스를 걸 필요가 없게 수정한 이다.
개 빡치는것은, 내가 분명 두가지 모두 최종 Call trace 는 다르지만 모두 libfc 모듈에서
fc_lport / fc_rport 관련 구조체에 대한 Referencing 에서 나오는 이슈이므로
동일한 것같다고 이야기 했지만,
커널 개발팀에서 계속 "아니다, 다르다, 재현해서 다시 살펴봐야 한다" 라고 우기는 바람에
고객이 너무 빡쳐서 "그냥 다 때려쳐 씨밤, 너네 구려!" 이랬다는 것...
졸라 중요고객이라고 지랄들을 해대서,
어떻게 해야 할지 몰라 냅뒀더니 두달이 지난 지금에서야
We are unable to reproduce the crash in QA setup which is often hard to do
because it is not possible to replicate customer environment in the lab.
However, The two core dumps even though showed different symptoms, but the
underlying root cause is the same, the kref count being zero.
'응 그래, 재현이 졸라 안되서 미안하지만...그래도 동일 이슈맞는것 같아
우린 이미 앞선 패치를 통해 솔루션을 제공했어. 우린 잘했다고'이지랄을 떨고 있다.
개발자들 일안하냐? 내가 호구로 보이냐? ㅎㅎㅎㅎㅎ 아몰랑!
'Skills > mY Technutz' 카테고리의 다른 글
libfc: Update rport reference counting bug - 1368175 (0) | 2018.03.29 |
---|---|
Kernel Dump Analysis #16 (0) | 2018.02.22 |
Kernel Dump Analysis #15 (0) | 2018.02.19 |
Kernel Dump Analysis #14 (0) | 2018.02.13 |
Kernel Crash dump Analysis - #13 (0) | 2017.12.20 |
커널이 지원하는 기능을 확인하는 습관. (1) | 2017.02.06 |
Trackback :
댓글을 달아 주세요
와.. 대박... 일이 엄청나게 들어와..
공격적으로 오라클 리눅스를 장사하면 내가 쓰일만한 일이 있긴 하구나 하면서도
일이 이렇게 많아지면 개빡쳐서 아무것도 하기 싫어지는 현자타임같은게 온다는..
무튼, 요즘 핫하게 밀려들어오는 업무량 덕에 코어분석 일도 숭풍숭풍 막 쳐 들어오는..
이번에는 레드햇 커널 즉, Oracle Linux 에서 Cmompatibility 를 위해 제공하는
Redhat Compatible Kernel ( RHCK ) 와 관련된 커널 덤프 분석을 해주시겠다.
더보기
오늘은 보너스로 debug info 를 이용하는 것을 조금 더 알려주겠다.(사실 이런건 기본이잖아... 젖먹는거와도 같다고....)
$ rpm2cpio /share/linuxrpm/debuginfo-rpms/build-output-6-debuginfo/kernel-debuginfo-2.6.32-642.el6.x86_64.rpm |cpio -ivd
./usr/lib/debug
./usr/lib/debug/.build-id
./usr/lib/debug/.build-id/00
....
대강 풀어 준 뒤, 자 이제 들어가자 Crash 로!!
crash7latest> sys
KERNEL: /share/linuxrpm/vmlinux_repo/64/2.6.32-642.el6.x86_64/vmlinux
DUMPFILE: vmcore
CPUS: 16
DATE: Sun Feb 11 02:05:27 2018
UPTIME: 237 days, 23:45:59
LOAD AVERAGE: 5.72, 5.15, 3.66
TASKS: 3093
NODENAME: ****02
RELEASE: 2.6.32-642.el6.x86_64
VERSION: #1 SMP Tue May 10 12:00:56 PDT 2016
MACHINE: x86_64 (3495 Mhz)
MEMORY: 127.9 GB
PANIC: "[20568855.389826] kernel BUG at fs/fscache/operation.c:460!"
이젠 뭐 귀찮아서 그냥 sys 로만 간단히 본다
2월 11일 오전 2시경 리붓되었고, 사용중인 커널버젼은 2.6.32-642, OL 6.8 GA 이다.
당연하게 로그를 봐야겠지?
[5242336.027593] nfs: server 10.219.1.249 not responding, timed out
[5242339.028547] nfs: server 10.219.1.249 not responding, still trying
[5242339.029133] nfs: server 10.219.1.249 OK
[5251518.769273] nfs: server 10.219.1.249 not responding, still trying
[5251524.772508] nfs: server 10.219.1.249 OK
...
[20568855.389184] FS-Cache:
[20568855.389343] FS-Cache: Assertion failed
[20568855.389495] FS-Cache: 0 > 0 is false
[20568855.389674] ------------[ cut here ]------------
[20568855.389826] kernel BUG at fs/fscache/operation.c:460!
[20568855.389977] invalid opcode: 0000 [#1] SMP
[20568855.390134] last sysfs file: /sys/devices/system/cpu/online
[20568855.390284] CPU 11
[20568855.390290]
Modules linked in: iptable_filter ip_tables joydev nfs lockd
auth_rpcgss nfs_acl sunrpc symap_rh_ES_6_2_6_32_431_el6_x86_64(P)(U)
symev_rh_ES_6_2_6_32_431_el6_x86_64(U) cpufreq_ondemand freq_table
pcc_cpufreq cachefiles fscache(T) bonding ipv6 iTCO_wdt
iTCO_vendor_support microcode serio_raw power_meter acpi_ipmi ipmi_si
ipmi_msghandler sb_edac edac_core sg i2c_i801 i2c_core lpc_ich mfd_core
tg3 hpilo hpwdt ioatdma ixgbe dca ptp pps_core mdio ext4 jbd2 mbcache
sr_mod cdrom sd_mod crc_t10dif ahci xhci_hcd hpsa wmi dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[20568855.391789]
[20568855.391934]
Pid: 78, comm: events/11 Tainted: P -- ------------ T
2.6.32-642.el6.x86_64 #1 HP ProLiant DL360 Gen9/ProLiant DL360 Gen9
[20568855.392240] RIP: 0010:[<ffffffffa03797fd>] [<ffffffffa03797fd>] fscache_op_complete+0x13d/0x1b0 [fscache]
[20568855.392540] RSP: 0018:ffff881069673d50 EFLAGS: 00010282
[20568855.392688] RAX: 0000000000000031 RBX: ffff88104973c980 RCX: 00000000000079b4
[20568855.392975] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
[20568855.393265] RBP: ffff881069673d70 R08: 0000000000000000 R09: ffff8800000b99a0
[20568855.393554] R10: 0000000000000000 R11: 0000000000000029 R12: ffff8810452b0540
[20568855.393843] R13: ffff8810452b0540 R14: 0000000000000001 R15: 0000160000000000
[20568855.394140] FS: 0000000000000000(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
[20568855.394429] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[20568855.394577] CR2: 00007f541c71ccd8 CR3: 000000078f346000 CR4: 00000000001407e0
[20568855.394868] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[20568855.395159] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20568855.395446] Process events/11 (pid: 78, threadinfo ffff881069670000, task ffff88106965e040)
[20568855.395739] Stack:
[20568855.395883] ffff881069670000 ffff8807945bfa40 ffff8810452b0540 6db6db6db6db6db7
[20568855.396041] <d> ffff881069673e10 ffffffffa03935f5 ffff8800283d9d00 ffff88104973c980
[20568855.396345] <d> ffff881069673fd8 ffff880000000000 ffff8807945bfa80 ffff88104973c980
[20568855.396953] Call Trace:
[20568855.397102] [<ffffffffa03935f5>] cachefiles_read_copier+0x385/0x440 [cachefiles]
[20568855.397395] [<ffffffffa037a760>] ? fscache_retrieval_work+0x0/0x30 [fscache]
[20568855.397688] [<ffffffffa037a77b>] fscache_retrieval_work+0x1b/0x30 [fscache]
[20568855.397980] [<ffffffff8109fdc0>] worker_thread+0x170/0x2a0
[20568855.398130] [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
[20568855.398280] [<ffffffff8109fc50>] ? worker_thread+0x0/0x2a0
[20568855.398430] [<ffffffff810a662e>] kthread+0x9e/0xc0
[20568855.398580] [<ffffffff8100c28a>] child_rip+0xa/0x20
[20568855.398730] [<ffffffff810a6590>] ? kthread+0x0/0xc0
[20568855.398878] [<ffffffff8100c280>] ? child_rip+0x0/0x20
[20568855.399026]
Code: a0 31 c0 e8 d9 d6 1c e1 48 c7 c7 58 d2 37 a0 31 c0 e8 cb d6 1c e1
49 63 74 24 20 31 d2 48 c7 c7 78 d2 37 a0 31 c0 e8 b6 d6 1c e1
<0f> 0b eb fe 48 c7 c7 eb e9 37 a0 31 c0 e8 a4 d6 1c e1 48 c7 c7
[20568855.399656] RIP [<ffffffffa03797fd>] fscache_op_complete+0x13d/0x1b0 [fscache]
[20568855.399951] RSP <ffff881069673d50>
역시 로그만 보면 큰 의미가 없다.
일단, NFS 랑 fscache / cachefiles 요녀석들이 말썽의 원인같다.
이 fscache / cachefiles 는 커널 2.6.32.X 부터 지원되기 시작했으며,
Redhat/Oracle Linux 6.x 부터 Technical Preview 로 제공되는 커널 기능이다.
자세한건 LWN 기사 또는 커널문서를 참조하자. 찾아보면 많다. 좋다.
( https://lwn.net/Articles/176487/ )
다음으로 진행하기 전에 이 fscache 모듈들을 로드하자.
crash7latest> mod -s fscache usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/kernel/fs/fscache/fscache.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa037fce0 fscache 60813 usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/kernel/fs/fscache/fscache.ko.debug
crash7latest> mod -s fscache usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/kernel/fs/fscache/fscache.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa037fce0 fscache 60813 usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/kernel/fs/fscache/fscache.ko.debug
crash7latest> bt -l
PID: 78 TASK: ffff88106965e040 CPU: 11 COMMAND: "events/11"
#0 [ffff881069673a10] machine_kexec at ffffffff8103fdcb
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
#1 [ffff881069673a70] crash_kexec at ffffffff810d1fe2
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/kernel/kexec.c: 1110
#2 [ffff881069673b40] oops_end at ffffffff8154bc40
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/dumpstack.c: 245
#3 [ffff881069673b70] die at ffffffff8101102b
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/dumpstack.c: 325
#4 [ffff881069673ba0] do_trap at ffffffff8154b494
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/traps.c: 244
#5 [ffff881069673c00] do_invalid_op at ffffffff8100cd95
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/traps.c: 285
#6 [ffff881069673ca0] invalid_op at ffffffff8100c01b
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/entry_64.S: 1144
[exception RIP: fscache_op_complete+317]
RIP: ffffffffa03797fd RSP: ffff881069673d50 RFLAGS: 00010282
RAX: 0000000000000031 RBX: ffff88104973c980 RCX: 00000000000079b4
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff881069673d70 R8: 0000000000000000 R9: ffff8800000b99a0
R10: 0000000000000000 R11: 0000000000000029 R12: ffff8810452b0540
R13: ffff8810452b0540 R14: 0000000000000001 R15: 0000160000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 460
#7 [ffff881069673d78] cachefiles_read_copier at ffffffffa03935f5 [cachefiles]
#8 [ffff881069673e18] fscache_retrieval_work at ffffffffa037a77b [fscache]
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/page.c: 269
#9 [ffff881069673e38] worker_thread at ffffffff8109fdc0
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/kernel/workqueue.c: 316
#10 [ffff881069673ee8] kthread at ffffffff810a662e
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/kernel/kthread.c: 88
#11 [ffff881069673f48] kernel_thread at ffffffff8100c28a
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/kernel/entry_64.S: 1238
crash7latest>
다 귀찮다 그냥 쭉 본다. 돈줄거냐? 돈 필요없다. 그냥 설명 안한다. ㅎ
crash7latest> dis -rl ffffffffa03797fd
......
0xffffffffa03796ee <fscache_op_complete+46>: jne 0xffffffffa0379801 <fscache_op_complete+321>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 460
0xffffffffa03796f4 <fscache_op_complete+52>: mov 0x20(%r12),%eax
0xffffffffa03796f9 <fscache_op_complete+57>: test %eax,%eax
0xffffffffa03796fb <fscache_op_complete+59>: jle 0xffffffffa03797cc <fscache_op_complete+268>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 461
0xffffffffa0379701 <fscache_op_complete+65>: testb $0x20,0x40(%rdi)
0xffffffffa0379705 <fscache_op_complete+69>: jne 0xffffffffa037977f <fscache_op_complete+191>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 466
0xffffffffa0379707 <fscache_op_complete+71>: lea 0x2c(%r12),%r13
0xffffffffa037970c <fscache_op_complete+76>: mov %r13,%rdi
0xffffffffa037970f <fscache_op_complete+79>: callq 0xffffffff8154a9b0 <_spin_lock>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 468
0xffffffffa0379714 <fscache_op_complete+84>: test %r14b,%r14b
0xffffffffa0379717 <fscache_op_complete+87>: jne 0xffffffffa0379770 <fscache_op_complete+176>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 469
0xffffffffa0379719 <fscache_op_complete+89>: movl $0x4,0x48(%rbx)
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 475
0xffffffffa0379720 <fscache_op_complete+96>: testb $0x20,0x40(%rbx)
0xffffffffa0379724 <fscache_op_complete+100>: jne 0xffffffffa0379760 <fscache_op_complete+160>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 477
0xffffffffa0379726 <fscache_op_complete+102>: mov 0x20(%r12),%eax
0xffffffffa037972b <fscache_op_complete+107>: sub $0x1,%eax
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 478
0xffffffffa037972e <fscache_op_complete+110>: test %eax,%eax
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 477
0xffffffffa0379730 <fscache_op_complete+112>: mov %eax,0x20(%r12)
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 478
0xffffffffa0379735 <fscache_op_complete+117>: jne 0xffffffffa037973f <fscache_op_complete+127>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 479
0xffffffffa0379737 <fscache_op_complete+119>: mov %r12,%rdi
0xffffffffa037973a <fscache_op_complete+122>: callq 0xffffffffa03795e0 <fscache_start_operations>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/arch/x86/include/asm/spinlock.h: 174
0xffffffffa037973f <fscache_op_complete+127>: mov %r13,%rax
0xffffffffa0379742 <fscache_op_complete+130>: incw (%rax)
0xffffffffa0379745 <fscache_op_complete+133>: data32 xchg %ax,%ax
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 483
0xffffffffa0379748 <fscache_op_complete+136>: mov (%rsp),%rbx
0xffffffffa037974c <fscache_op_complete+140>: mov 0x8(%rsp),%r12
0xffffffffa0379751 <fscache_op_complete+145>: mov 0x10(%rsp),%r13
0xffffffffa0379756 <fscache_op_complete+150>: mov 0x18(%rsp),%r14
0xffffffffa037975b <fscache_op_complete+155>: leaveq
0xffffffffa037975c <fscache_op_complete+156>: retq
0xffffffffa037975d <fscache_op_complete+157>: nopl (%rax)
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 476
0xffffffffa0379760 <fscache_op_complete+160>: subl $0x1,0x24(%r12)
0xffffffffa0379766 <fscache_op_complete+166>: jmp 0xffffffffa0379726 <fscache_op_complete+102>
0xffffffffa0379768 <fscache_op_complete+168>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 471
0xffffffffa0379770 <fscache_op_complete+176>: mov %rbx,%rdi
0xffffffffa0379773 <fscache_op_complete+179>: callq *0x60(%rbx)
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 472
0xffffffffa0379776 <fscache_op_complete+182>: movl $0x5,0x48(%rbx)
0xffffffffa037977d <fscache_op_complete+189>: jmp 0xffffffffa0379720 <fscache_op_complete+96>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 461
0xffffffffa037977f <fscache_op_complete+191>: cmpl $0x0,0x24(%r12)
0xffffffffa0379785 <fscache_op_complete+197>: jle 0xffffffffa0379837 <fscache_op_complete+375>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 463
0xffffffffa037978b <fscache_op_complete+203>: cmp $0x1,%eax
0xffffffffa037978e <fscache_op_complete+206>: je 0xffffffffa0379707 <fscache_op_complete+71>
0xffffffffa0379794 <fscache_op_complete+212>: mov $0xffffffffa037e9eb,%rdi
0xffffffffa037979b <fscache_op_complete+219>: xor %eax,%eax
0xffffffffa037979d <fscache_op_complete+221>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797a2 <fscache_op_complete+226>: mov $0xffffffffa037d258,%rdi
0xffffffffa03797a9 <fscache_op_complete+233>: xor %eax,%eax
0xffffffffa03797ab <fscache_op_complete+235>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797b0 <fscache_op_complete+240>: movslq 0x20(%r12),%rsi
0xffffffffa03797b5 <fscache_op_complete+245>: mov $0x1,%edx
0xffffffffa03797ba <fscache_op_complete+250>: mov $0xffffffffa037d2a0,%rdi
0xffffffffa03797c1 <fscache_op_complete+257>: xor %eax,%eax
0xffffffffa03797c3 <fscache_op_complete+259>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797c8 <fscache_op_complete+264>: ud2
0xffffffffa03797ca <fscache_op_complete+266>: jmp 0xffffffffa03797ca <fscache_op_complete+266>
/usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/fs/fscache/operation.c: 460
0xffffffffa03797cc <fscache_op_complete+268>: mov $0xffffffffa037e9eb,%rdi
0xffffffffa03797d3 <fscache_op_complete+275>: xor %eax,%eax
0xffffffffa03797d5 <fscache_op_complete+277>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797da <fscache_op_complete+282>: mov $0xffffffffa037d258,%rdi
0xffffffffa03797e1 <fscache_op_complete+289>: xor %eax,%eax
0xffffffffa03797e3 <fscache_op_complete+291>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797e8 <fscache_op_complete+296>: movslq 0x20(%r12),%rsi
0xffffffffa03797ed <fscache_op_complete+301>: xor %edx,%edx
0xffffffffa03797ef <fscache_op_complete+303>: mov $0xffffffffa037d278,%rdi
0xffffffffa03797f6 <fscache_op_complete+310>: xor %eax,%eax
0xffffffffa03797f8 <fscache_op_complete+312>: callq 0xffffffff81546eb3 <printk>
0xffffffffa03797fd <fscache_op_complete+317>: ud2
crash7latest> evval 0x13d
crash7latest: command not found: evval
crash7latest> eval 0x13d
hexadecimal: 13d
decimal: 317
octal: 475
binary: 0000000000000000000000000000000000000000000000000000000100111101
crash7latest> x/s 0xffffffffa037d258
0xffffffffa037d258:
"<3>FS-Cache: Assertion failed\n"
<<
crash7latest> x/s 0xffffffffa037d278
0xffffffffa037d278:
"<3>FS-Cache: %lx > %lx is false\n"
<<
crash7latest> x/s 0xffffffffa037e9eb
0xffffffffa037e9eb:
"<3>FS-Cache: \n"
<<
crash7latest> x/s ffff8810452b0540
No symbol "ffff8810452b0540" in current context.
gdb: gdb request failed: x/s ffff8810452b0540
crash7latest> x/s 0xffff8810452b0540
0xffff8810452b0540: "\240\314\067\240\377\377\377\377@\314\067\240\377\377\377\377\tF\024"
crash7latest> rd ffff8810452b0540 32
ffff8810452b0540: ffffffffa037cca0 ffffffffa037cc40 ..7.....@.7.....
ffff8810452b0550: 0000000000144609 0000000000000001 .F..............
ffff8810452b0560: 0000000000000000 0006000600000001 ................
ffff8810452b0570: 00000005c99820b8 0000000000000060 . ......`.......
ffff8810452b0580: 000000000000006d 0000000000000010 m...............
ffff8810452b0590: 0000000000000038 ffff8810452b0718 8.........+E....
ffff8810452b05a0: ffff882065bf5678 0000000000000000 xV.e ...........
ffff8810452b05b0: ffff8810489b5b08 ffff882065bf5600 .[.H.....V.e ...
ffff8810452b05c0: ffff8810489b5af0 ffff880b7baa5b00 .Z.H.....[.{....
ffff8810452b05d0: 0000000000000000 0000000000000008 ................
ffff8810452b05e0: ffffffffa037c940 ffff8810452b05e8 @.7.......+E....
ffff8810452b05f0: ffff8810452b05e8 ffff8810452b05f8 ..+E......+E....
ffff8810452b0600: ffff8810452b05f8 ffff8810452b0608 ..+E......+E....
ffff8810452b0610: ffff8810452b0608 ffff8810452b0618 ..+E......+E....
ffff8810452b0620: ffff8810452b0618 0000000000000023 ..+E....#.......
ffff8810452b0630: 000000000002295e 0000000000000000 ^)..............
crash7latest> dis -rl ffffffffa037cca0
0xffffffffa037cca0
<fscache_osm_WAIT_FOR_CMD>: push %rdi
<<
자, Disassembly 로 까보니 printk 가 문제 코드 근처에 존재했을 뿐이고,
printk 가 뭔가 출력해 주는 짓을 할테니 그 곳에 무언가 스트링이나
값이 들어가있을 것이기 때문에, 각 변수들의 내용을 String 으로 출력해 보니,
대략의 코드내용을 얻을 수 있었다.
해당 내용은 위에 log 에서 cut here 바로 앞에 나온다 :
[20568855.389184] FS-Cache:
[20568855.389343] FS-Cache: Assertion failed
[20568855.389495] FS-Cache: 0 > 0 is false
딱 봐도 뭔가 개 구린 코드다. 개띠해라고 개 구리냐..
코드를 보자. 오라클 리눅스에서는 gitlab 을 운영하므로 웹에서 쉽게 볼 수 있는데,
레드햇은 참 나쁘다. 안보여준다. 계정 돈내고 가입하라고 한다.
열심히 찾아본다. 소스코드 받는다... 제길 귀찮다...
무튼 아래와 같은 오퍼레이션 코드와 문제의 Function 이 확인된다.
--- include/linux/fscache-cache.h
enum fscache_operation_state {
FSCACHE_OP_ST_BLANK, /* Op is not yet submitted */
<<<
FSCACHE_OP_ST_INITIALISED, /* Op is initialised */
FSCACHE_OP_ST_PENDING, /* Op is blocked from running */
FSCACHE_OP_ST_IN_PROGRESS, /* Op is in progress */
FSCACHE_OP_ST_COMPLETE, /* Op is complete */
FSCACHE_OP_ST_CANCELLED, /* Op has been cancelled */
FSCACHE_OP_ST_DEAD /* Op is now dead */
};
--- kernel/fs/fscache/operation.c
void fscache_op_complete(struct fscache_operation *op, bool cancelled)
{
struct fscache_object *object = op->object;
_enter("OBJ%x", object->debug_id);
ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
ASSERTCMP(object->n_in_progress, >, 0);
ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
object->n_exclusive, >, 0);
ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags),
object->n_in_progress, ==, 1);
object->n_in_progress 가 0 보다 크면 무언가 하는 매크로가 존재한다.
--- fs/fscache/internal.h
#define ASSERTCMP(X, OP, Y) do {} while (0)
단순하다. 0이 될 때 까지 해당 값을 감소시키는 코드로 보여진다.
Disassem 으로 본 내용역시 460 에서 483 갔다가 476 등으로 가는등
루프백 점핑이 확인된다.
0 이라는 in_progress 값이 FSCACHE_OP_ST_BLANK 이며,
설명에 Op is not yet submitted 라고되어 있다.
Race condition 등이 발생하여 해당 값을 제대로 처리하지 못한 부분이
패닉으로 발생한 것으로 판단된다.
실재 맞는 프로그레스 상태인지 확인해 보도록 하자.
스택을 일일히 확인하는 방법 또는 Memory Dump 와 Structure Offset 을 이용해
직접 계산해 보는 방법이 있는데, 스택으로 들어있는 주소를 그냥 찾아보는게
훨씬 빠르고 간편하고 틀린 방법이 아니므로, 그냥 대입한다.
스택에 대한 부분은 아까 위에 로그나 BT 등에서 Stack 이라고 표시되거나 색처리 했다.
(훌륭한 학습 방법은 직접 따라해보거나 흐름을 잘 눈 크게 뜨고 살피는것이다.)
crash7latest> struct fscache_operation ffff881069670000
struct fscache_operation {
{
fast_work = {
data = {
counter = -131870907572160
},
entry = {
next = 0xffffffff81aabb80 <default_exec_domain>,
prev = 0x0
},
func = 0xb
},
slow_work = {
owner = 0xffff88106965e040,
flags = 18446744071590034304,
ops = 0x0,
link = {
next = 0xb,
prev = 0xffffffffffffffff
}
}
},
pend_link = {
next = 0xffffffff810917e0 <do_no_restart_syscall>,
prev = 0x0
},
object = 0x0,
flags = 0,
state = FSCACHE_OP_ST_BLANK,
usage = {
counter = 0
},
debug_id = 0,
processor = 0x0,
cancel = 0x0,
release = 0x57ac6e9d
}
crash7latest> struct fscache_operation ffffffffa037cca0
struct fscache_operation {
{
fast_work = {
data = {
counter = 5931036609520157015
},
entry = {
next = 0x444d435f,
prev = 0x0
},
func = 0x444d433f
},
slow_work = {
owner = 0x524f465f54494157,
flags = 1145914207,
ops = 0x0,
link = {
next = 0x444d433f,
prev = 0x0
}
}
},
pend_link = {
next = 0x8,
prev = 0xffffffffa037cde0 <fscache_osm_INVALIDATE_OBJECT>
},
object = 0x4,
flags = 18446744072102595872,
state = FSCACHE_OP_ST_INITIALISED,
usage = {
counter = 0
},
debug_id = 2688011360,
processor = 0x0,
cancel = 0x0,
release = 0x0
}
crash7latest> struct fscache_operation ffffffffa037cc40
struct fscache_operation {
{
fast_work = {
data = {
counter = 96
},
entry = {
next = 0xffffffffa037c960 <fscache_osm_KILL_OBJECT>,
prev = 0x0
},
func = 0x0
},
slow_work = {
owner = 0x60,
flags = 18446744072102594912,
ops = 0x0,
link = {
next = 0x0,
prev = 0x52415453504d554a
}
}
},
pend_link = {
next = 0x535045445f54,
prev = 0x0
},
object = 0x504d554a,
flags = 18446744072102578672,
state = FSCACHE_OP_ST_BLANK,
usage = {
counter = 0
},
debug_id = 0,
processor = 0x0,
cancel = 0x524f465f54494157,
release = 0x444d435f
}
Task 에 대한 구조체 값을 확인함으로 써 state 변화를 확인할 수 있다.
역시 in_progress 상태를 지속적으로 확인하고 변화시키는 과정에서 문제가 발생했다.
Technical Preview 이기 때문에 해당 부분에 대한 코드는
ASSERTCMP(op->state, ==, FSCACHE_OP_ST_DEAD);
로 수정되는 식으로 상황을 정리하는 코드가 추가된 것 같다.
( 수정된 코드 다운받거나 찾기 귀찮다! 바쁘다. 점심안에 써야한다 글을. )
물론 픽스는 해당 fscache 모듈을 사용하지 않도록 하거나,
( cachefilesd 데몬을 Disable 한다. )
ELSA-2017-0307 또는 ELSA-2017-0817 ( RHSA-2017-0817 대응) 에서 권고하는
kernel-2.6.32-696.el6 이후의 최신 버젼으로 업그레이드 하거나
6.8 GA kernel-2.6.32-642.el6 이하 커널로 다운그레이드 하면 된다고
Redhat 에서 이야기 하는 것 같다.
사실 개인생각으로는 커널 업그레이드를 해봤자 동일할것 같긴 해서
구지 필요하지 않다면 Cachefilesd 를 Off 하는게 더 좋을것 같다고 생각된다.
mount option 에서 fsc 가 생겼는데 이게 없으면 자동으로 fscache 가 활성화 된다고 하는 것 같다.
레드햇 이슈라 명확하게 솔루션 코드가 확인되지 않는 상태라 잘 모르겠으나
분석 방법이랑 워크어라운드 기입되 있음 이미 난 충분히 세계를 위해 기여한거시다...
'Skills > mY Technutz' 카테고리의 다른 글
libfc: Update rport reference counting bug - 1368175 (0) | 2018.03.29 |
---|---|
Kernel Dump Analysis #16 (0) | 2018.02.22 |
Kernel Dump Analysis #15 (0) | 2018.02.19 |
Kernel Dump Analysis #14 (0) | 2018.02.13 |
Kernel Crash dump Analysis - #13 (0) | 2017.12.20 |
커널이 지원하는 기능을 확인하는 습관. (1) | 2017.02.06 |
Trackback :
댓글을 달아 주세요
저번 14 에 이어진 내용이라고 볼 수 있다.
지난시간 왜 시스템 커널 분석에서 fc_stat, fc_lport 에 집중했는지 나오는 부분이다.
더보기
KERNEL:
/4.1.12-61.1.17.el6uek.x86_64/vmlinux
DUMPFILE:
2018-01-04-19:13:01/vmcore
CPUS: 20 [OFFLINE: 19]
DATE: Thu Jan 4 19:06:44 2018
UPTIME: 135 days, 07:36:31
LOAD AVERAGE: 1.68, 1.44, 1.03
TASKS: 802
NODENAME: *******
RELEASE: 4.1.12-61.1.17.el6uek.x86_64
VERSION: #2 SMP Mon Oct 31 18:17:37 PDT 2016
MACHINE: x86_64 (2494 Mhz)
MEMORY: 11.2 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at
00000000000003e8"
PID: 21090
COMMAND: "kworker/1:0"
TASK: ffff8803413ff000 [THREAD_INFO: ffff8801f9a9c000]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash64> log
[ 882.760140] sd 1:0:0:19: [sdbp] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.760153] sd 1:0:0:19: [sdbp] tag#0 Sense Key : Aborted Command
[current]
[ 882.760157] sd 1:0:0:19: [sdbp] tag#0 Add. Sense: Internal target failure
[ 882.760162] sd 1:0:0:19: [sdbp] tag#0 CDB: Read(10) 28 00 00 00 00 00 00
00 01 00
[ 882.760165] blk_update_request: I/O error, dev sdbp, sector 0
[ 882.760243] device-mapper: multipath: Failing path 68:48.
[ 882.760773] sd 1:0:0:19: [sdbp] tag#1 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.760783] sd 1:0:0:19: [sdbp] tag#1 Sense Key : Aborted Command
[current]
[ 882.760787] sd 1:0:0:19: [sdbp] tag#1 Add. Sense: Internal target failure
[ 882.760791] sd 1:0:0:19: [sdbp] tag#1 CDB: Read(10) 28 00 00 00 00 00 00
00 08 00
[ 882.760794] blk_update_request: I/O error, dev sdbp, sector 0
[ 882.805194] sd 1:0:1:19: [sdcn] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.805206] sd 1:0:1:19: [sdcn] tag#0 Sense Key : Aborted Command
[current]
[ 882.805211] sd 1:0:1:19: [sdcn] tag#0 Add. Sense: Internal target failure
[ 882.805215] sd 1:0:1:19: [sdcn] tag#0 CDB: Read(10) 28 00 00 00 00 00 00
00 08 00
[ 882.805217] blk_update_request: I/O error, dev sdcn, sector 0
[ 882.805294] device-mapper: multipath: Failing path 69:176.
[ 882.806164] sd 1:0:1:19: [sdcn] tag#1 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 882.806175] sd 1:0:1:19: [sdcn] tag#1 Sense Key : Aborted Command
[current]
[ 882.806180] sd 1:0:1:19: [sdcn] tag#1 Add. Sense: Internal target failure
[ 882.806183] sd 1:0:1:19: [sdcn] tag#1 CDB: Read(10) 28 00 00 00 00 00 00
00 01 00
[[[ Snip ]]]
[5444058.471741] buffer_io_error: 633 callbacks suppressed
[5444058.471746] Buffer I/O error on dev sdae, logical block 15, async page
read
[5444058.473596] Buffer I/O error on dev sdx, logical block 0, async page
read
[5444058.476522] Buffer I/O error on dev sdcj, logical block 7, async page
read
[5444058.477537] Buffer I/O error on dev sdca, logical block 15, async page
read
[5444058.481552] Buffer I/O error on dev sdv, logical block 1, async page
read
[5444058.481781] Buffer I/O error on dev sdar, logical block 7, async page
read
[5444058.488460] Buffer I/O error on dev sdat, logical block 7, async page
read
[5444058.492502] Buffer I/O error on dev sdci, logical block 0, async page
read
[5444058.502637] Buffer I/O error on dev sdcn, logical block 7, async page
read
[5444058.504540] Buffer I/O error on dev sdx, logical block 0, async page
read
[[[ Snip ]]]
[11067977.352682] device-mapper: multipath: Failing path 68:128.
[11067977.353391] device-mapper: multipath: Failing path 69:176.
[11067977.354347] device-mapper: multipath: Failing path 69:208.
[11067977.354476] device-mapper: multipath: Failing path 69:224.
[11067977.354589] device-mapper: multipath: Failing path 69:240.
[11147703.066973] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11234102.896196] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-12
by loop10
[11320502.882931] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11406902.721147] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop11
[11493302.710790] ocfs2: Unaligned AIO/DIO on inode 2657546 on device dm-7 by
loop8
[11579702.526382] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11666102.387746] ocfs2: Unaligned AIO/DIO on inode 2657545 on device dm-7 by
loop7
[11691591.650948] BUG: unable to handle kernel NULL pointer dereference at
00000000000003e8
[11691591.651178] IP: [<ffffffffa02955ba>] fc_rport_enter_prli+0x10a/0x270
[libfc] <<<
[11691591.651583] PGD 0
[11691591.651786] Oops: 0000 [#1] SMP
[11691591.651997] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5
auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback
xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support
pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si
ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc
scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4
cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs
libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log
dm_mod [last unloaded: ip_tables]
[11691591.654474] CPU: 1 PID: 21090 Comm: kworker/1:0 Not tainted
4.1.12-61.1.17.el6uek.x86_64 #2
[11691591.654873] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4,
BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[11691591.655282] Workqueue: events fc_rport_timeout [libfc]
[11691591.655492] task: ffff8803413ff000 ti: ffff8801f9a9c000 task.ti:
ffff8801f9a9c000
[11691591.655886] RIP: e030:[<ffffffffa02955ba>] [<ffffffffa02955ba>]
fc_rport_enter_prli+0x10a/0x270 [libfc] <<
[11691591.656288] RSP: e02b:ffff8801f9a9fd78 EFLAGS: 00010202
[11691591.656492] RAX: 0000000000000000 RBX: ffff880103e64600 RCX:
000000000000008c
[11691591.656884] RDX: 00000000807f1c0d RSI: 000000000000002c RDI:
ffff88011c83ac00
[11691591.657273] RBP: ffff8801f9a9fda8 R08: 0000000000000300 R09:
ffff88011c83ba00
[11691591.657661] R10: 0000000000007ff0 R11: 0000000000000001 R12:
ffff88011c83ac00
[11691591.662102] R13: ffff880103e67a00 R14: ffff880355c5b800 R15:
0000000000250100
[11691591.662487] FS: 0000000000000000(0000) GS:ffff880355c40000(0000)
knlGS:ffff880355c40000
[11691591.662870] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[11691591.663072] CR2: 00000000000003e8 CR3: 0000000001a8a000 CR4:
0000000000042660
[11691591.663452] Stack:
[11691591.663639] 0000000000000000 ffff880103e64670 ffff880103e64648
ffff880103e64600
[11691591.664035] ffff880355c5b800 0000000000000000 ffff8801f9a9fdd8
ffffffffa0295d88
[11691591.664432] ffff880216ce0f00 ffff880103e64670 ffff880175a45980
ffff880355c56ec0
[11691591.664830] Call Trace:
[11691591.665025] [<ffffffffa0295d88>] fc_rport_timeout+0x98/0xb0 [libfc]
[11691591.665232] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[11691591.665434] [<ffffffff8100c262>] ? check_events+0x12/0x20
[11691591.665638] [<ffffffff8109f500>] worker_thread+0x120/0x480
[11691591.665840] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[11691591.666043] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[11691591.666246] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[11691591.666449] [<ffffffff810a46de>] kthread+0xce/0xf0
[11691591.666649] [<ffffffff810a4610>] ?
kthread_freezable_should_stop+0x70/0x70
[11691591.667033] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[11691591.667231] [<ffffffff810a4610>] ?
kthread_freezable_should_stop+0x70/0x70
[11691591.667611] Code: 83 c1 18 48 83 c1 04 31 d2 be 10 00 00 00 48 89 df ff
10 49 8b 45 00 41 83 bc 24 80 00 00 00 17 44 8b 7b 28 48 8b 80 98 07 00 00
<44> 8b b0 e8 03 00 00 0f 86 d5 00 00 00 49 8b 84 24 d8 00 00 00
[11691591.668479] RIP [<ffffffffa02955ba>] fc_rport_enter_prli+0x10a/0x270
[libfc]
[11691591.668865] RSP <ffff8801f9a9fd78>
[11691591.669057] CR2: 00000000000003e8
[11691591.669559] ---[ end trace c5af00a9256fcc71 ]---
[11691591.770100] Kernel panic - not syncing: Fatal exception
[11691591.770380] Kernel Offset: disabled
crash64> bt -l
PID: 21090 TASK: ffff8803413ff000 CPU: 1 COMMAND: "kworker/1:0"
#0 [ffff8801f9a9fab0] panic at ffffffff816c6809
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/panic.c:
137
#1 [ffff8801f9a9fb30] oops_end at ffffffff8101a79c
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/dumps
tack.c: 247
#2 [ffff8801f9a9fb60] no_context at ffffffff8106d7a1
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
737
#3 [ffff8801f9a9fbb0] __bad_area_nosemaphore at ffffffff8106d99d
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
817
#4 [ffff8801f9a9fc00] bad_area_nosemaphore at ffffffff8106dab3
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
825
#5 [ffff8801f9a9fc10] __do_page_fault at ffffffff8106e028
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
1302
#6 [ffff8801f9a9fc80] do_page_fault at ffffffff8106e337
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/mm/fault.c:
1320
#7 [ffff8801f9a9fcc0] page_fault at ffffffff816cd758
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/entry
_64.S: 1262
[exception RIP: fc_rport_enter_prli+266]
<<<
RIP: ffffffffa02955ba RSP: ffff8801f9a9fd78 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880103e64600 RCX: 000000000000008c
RDX: 00000000807f1c0d RSI: 000000000000002c RDI: ffff88011c83ac00
RBP: ffff8801f9a9fda8 R8: 0000000000000300 R9: ffff88011c83ba00
R10: 0000000000007ff0 R11: 0000000000000001 R12: ffff88011c83ac00
R13: ffff880103e67a00 R14: ffff880355c5b800 R15: 0000000000250100
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#8 [ffff8801f9a9fdb0] fc_rport_timeout at ffffffffa0295d88 [libfc]
#9 [ffff8801f9a9fde0] process_one_work at ffffffff8109f07e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/trace/events/
workqueue.h: 116
#10 [ffff8801f9a9fe40] worker_thread at ffffffff8109f500
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/linux/list.h:
189
#11 [ffff8801f9a9fec0] kthread at ffffffff810a46de
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/kthread.c:
207
#12 [ffff8801f9a9ff50] ret_from_fork at ffffffff816cbb62
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/kernel/entry
_64.S: 640
RIP 를 확인 해보자.
crash64> dis -rl ffffffffa02955ba
0xffffffffa02954b0 <fc_rport_enter_prli>: push %rbp
0xffffffffa02954b1 <fc_rport_enter_prli+1>: mov %rsp,%rbp
0xffffffffa02954b4 <fc_rport_enter_prli+4>: sub $0x30,%rsp
0xffffffffa02954b8 <fc_rport_enter_prli+8>: mov %rbx,-0x28(%rbp)
0xffffffffa02954bc <fc_rport_enter_prli+12>: mov %r12,-0x20(%rbp)
0xffffffffa02954c0 <fc_rport_enter_prli+16>: mov %r13,-0x18(%rbp)
0xffffffffa02954c4 <fc_rport_enter_prli+20>: mov %r14,-0x10(%rbp)
0xffffffffa02954c8 <fc_rport_enter_prli+24>: mov %r15,-0x8(%rbp)
0xffffffffa02954cc <fc_rport_enter_prli+28>: nopl 0x0(%rax,%rax,1)
0xffffffffa02954d1 <fc_rport_enter_prli+33>: mov 0x28(%rdi),%edx
0xffffffffa02954d4 <fc_rport_enter_prli+36>: mov (%rdi),%r13
0xffffffffa02954d7 <fc_rport_enter_prli+39>: mov %rdi,%rbx
0xffffffffa02954da <fc_rport_enter_prli+42>: cmp $0xfffbff,%edx
0xffffffffa02954e0 <fc_rport_enter_prli+48>: ja 0xffffffffa0295680
<fc_rport_enter_prli+464>
0xffffffffa02954e6 <fc_rport_enter_prli+54>: testb $0x8,0xbe73(%rip)
# 0xffffffffa02a1360
0xffffffffa02954ed <fc_rport_enter_prli+61>: jne 0xffffffffa02956b2
<fc_rport_enter_prli+514>
0xffffffffa02954f3 <fc_rport_enter_prli+67>: cmpl $0x4,0x14(%rbx)
0xffffffffa02954f7 <fc_rport_enter_prli+71>: je 0xffffffffa0295500
<fc_rport_enter_prli+80>
0xffffffffa02954f9 <fc_rport_enter_prli+73>: movl $0x0,0x38(%rbx)
0xffffffffa0295500 <fc_rport_enter_prli+80>: movl $0x4,0x14(%rbx)
0xffffffffa0295507 <fc_rport_enter_prli+87>: mov $0x14,%edi
0xffffffffa029550c <fc_rport_enter_prli+92>: callq 0xffffffffa02906a0
<_fc_frame_alloc>
0xffffffffa0295511 <fc_rport_enter_prli+97>: test %rax,%rax
0xffffffffa0295514 <fc_rport_enter_prli+100>: mov %rax,%r12
0xffffffffa0295517 <fc_rport_enter_prli+103>: je 0xffffffffa0295690
<fc_rport_enter_prli+480>
0xffffffffa029551d <fc_rport_enter_prli+109>: xor %eax,%eax
0xffffffffa029551f <fc_rport_enter_prli+111>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295528 <fc_rport_enter_prli+120>: jbe 0xffffffffa0295536
<fc_rport_enter_prli+134>
0xffffffffa029552a <fc_rport_enter_prli+122>: mov 0xd8(%r12),%rax
0xffffffffa0295532 <fc_rport_enter_prli+130>: add $0x18,%rax
0xffffffffa0295536 <fc_rport_enter_prli+134>: movq $0x0,(%rax)
0xffffffffa029553d <fc_rport_enter_prli+141>: movl $0x0,0x10(%rax)
0xffffffffa0295544 <fc_rport_enter_prli+148>: movq $0x0,0x8(%rax)
0xffffffffa029554c <fc_rport_enter_prli+156>: movb $0x20,(%rax)
0xffffffffa029554f <fc_rport_enter_prli+159>: movb $0x10,0x1(%rax)
0xffffffffa0295553 <fc_rport_enter_prli+163>: movw $0x1400,0x2(%rax)
0xffffffffa0295559 <fc_rport_enter_prli+169>: movb $0x8,0x4(%rax)
0xffffffffa029555d <fc_rport_enter_prli+173>: movb $0x20,0x6(%rax)
0xffffffffa0295561 <fc_rport_enter_prli+177>: mov 0x360(%r13),%edx
0xffffffffa0295568 <fc_rport_enter_prli+184>: bswap %edx
0xffffffffa029556a <fc_rport_enter_prli+186>: mov %edx,0x10(%rax)
0xffffffffa029556d <fc_rport_enter_prli+189>: mov 0xb46c(%rip),%rax
# 0xffffffffa02a09e0
0xffffffffa0295574 <fc_rport_enter_prli+196>: test %rax,%rax
0xffffffffa0295577 <fc_rport_enter_prli+199>: je 0xffffffffa02955a2
<fc_rport_enter_prli+242>
0xffffffffa0295579 <fc_rport_enter_prli+201>: xor %ecx,%ecx
0xffffffffa029557b <fc_rport_enter_prli+203>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295584 <fc_rport_enter_prli+212>: jbe 0xffffffffa0295592
<fc_rport_enter_prli+226>
0xffffffffa0295586 <fc_rport_enter_prli+214>: mov 0xd8(%r12),%rcx
0xffffffffa029558e <fc_rport_enter_prli+222>: add $0x18,%rcx
0xffffffffa0295592 <fc_rport_enter_prli+226>: add $0x4,%rcx
0xffffffffa0295596 <fc_rport_enter_prli+230>: xor %edx,%edx
0xffffffffa0295598 <fc_rport_enter_prli+232>: mov $0x10,%esi
0xffffffffa029559d <fc_rport_enter_prli+237>: mov %rbx,%rdi
0xffffffffa02955a0 <fc_rport_enter_prli+240>: callq *(%rax)
0xffffffffa02955a2 <fc_rport_enter_prli+242>: mov 0x0(%r13),%rax
0xffffffffa02955a6 <fc_rport_enter_prli+246>: cmpl $0x17,0x80(%r12)
0xffffffffa02955af <fc_rport_enter_prli+255>: mov 0x28(%rbx),%r15d
0xffffffffa02955b3 <fc_rport_enter_prli+259>: mov 0x798(%rax),%rax
0xffffffffa02955ba <fc_rport_enter_prli+266>: mov 0x3e8(%rax),%r14d
<<<
crash64>
fc_rport_enter_prli+242 라인에 rax 를 r13 으로 호출하는 것이 있다.
r13 = ffff880103e67a00.
확인해 보도록 한다.
역시 소스코드의 라인이 나타나지 않는다. 모듈을 로드하자.
crash7latest> mod -s libfc usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
crash7latest> dis -rl ffffffffa02955ba
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1090
0xffffffffa02954b0 <fc_rport_enter_prli>: push %rbp
0xffffffffa02954b1 <fc_rport_enter_prli+1>: mov %rsp,%rbp
0xffffffffa02954b4 <fc_rport_enter_prli+4>: sub $0x30,%rsp
0xffffffffa02954b8 <fc_rport_enter_prli+8>: mov %rbx,-0x28(%rbp)
0xffffffffa02954bc <fc_rport_enter_prli+12>: mov %r12,-0x20(%rbp)
0xffffffffa02954c0 <fc_rport_enter_prli+16>: mov %r13,-0x18(%rbp)
0xffffffffa02954c4 <fc_rport_enter_prli+20>: mov %r14,-0x10(%rbp)
0xffffffffa02954c8 <fc_rport_enter_prli+24>: mov %r15,-0x8(%rbp)
0xffffffffa02954cc <fc_rport_enter_prli+28>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1103
0xffffffffa02954d1 <fc_rport_enter_prli+33>: mov 0x28(%rdi),%edx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1091
0xffffffffa02954d4 <fc_rport_enter_prli+36>: mov (%rdi),%r13
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1090
0xffffffffa02954d7 <fc_rport_enter_prli+39>: mov %rdi,%rbx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1103
0xffffffffa02954da <fc_rport_enter_prli+42>: cmp $0xfffbff,%edx
0xffffffffa02954e0 <fc_rport_enter_prli+48>: ja 0xffffffffa0295680 <fc_rport_enter_prli+464>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1108
0xffffffffa02954e6 <fc_rport_enter_prli+54>: testb $0x8,0xbe73(%rip) # 0xffffffffa02a1360 <fc_debug_logging>
0xffffffffa02954ed <fc_rport_enter_prli+61>: jne 0xffffffffa02956b2 <fc_rport_enter_prli+514>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 231
0xffffffffa02954f3 <fc_rport_enter_prli+67>: cmpl $0x4,0x14(%rbx)
0xffffffffa02954f7 <fc_rport_enter_prli+71>: je 0xffffffffa0295500 <fc_rport_enter_prli+80>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 232
0xffffffffa02954f9 <fc_rport_enter_prli+73>: movl $0x0,0x38(%rbx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 233
0xffffffffa0295500 <fc_rport_enter_prli+80>: movl $0x4,0x14(%rbx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 139
0xffffffffa0295507 <fc_rport_enter_prli+87>: mov $0x14,%edi
0xffffffffa029550c <fc_rport_enter_prli+92>: callq 0xffffffffa02906a0 <_fc_frame_alloc>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1114
0xffffffffa0295511 <fc_rport_enter_prli+97>: test %rax,%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 139
0xffffffffa0295514 <fc_rport_enter_prli+100>: mov %rax,%r12
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1114
0xffffffffa0295517 <fc_rport_enter_prli+103>: je 0xffffffffa0295690 <fc_rport_enter_prli+480>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 208
0xffffffffa029551d <fc_rport_enter_prli+109>: xor %eax,%eax
0xffffffffa029551f <fc_rport_enter_prli+111>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295528 <fc_rport_enter_prli+120>: jbe 0xffffffffa0295536 <fc_rport_enter_prli+134>
0xffffffffa029552a <fc_rport_enter_prli+122>: mov 0xd8(%r12),%rax
0xffffffffa0295532 <fc_rport_enter_prli+130>: add $0x18,%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 664
0xffffffffa0295536 <fc_rport_enter_prli+134>: movq $0x0,(%rax)
0xffffffffa029553d <fc_rport_enter_prli+141>: movl $0x0,0x10(%rax)
0xffffffffa0295544 <fc_rport_enter_prli+148>: movq $0x0,0x8(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 665
0xffffffffa029554c <fc_rport_enter_prli+156>: movb $0x20,(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 666
0xffffffffa029554f <fc_rport_enter_prli+159>: movb $0x10,0x1(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 667
0xffffffffa0295553 <fc_rport_enter_prli+163>: movw $0x1400,0x2(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 668
0xffffffffa0295559 <fc_rport_enter_prli+169>: movb $0x8,0x4(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 669
0xffffffffa029555d <fc_rport_enter_prli+173>: movb $0x20,0x6(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/uapi/linux/swab.h: 60
0xffffffffa0295561 <fc_rport_enter_prli+177>: mov 0x360(%r13),%edx
0xffffffffa0295568 <fc_rport_enter_prli+184>: bswap %edx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_encode.h: 670
0xffffffffa029556a <fc_rport_enter_prli+186>: mov %edx,0x10(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1121
0xffffffffa029556d <fc_rport_enter_prli+189>: mov 0xb46c(%rip),%rax # 0xffffffffa02a09e0 <fc_passive_prov+64>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1122
0xffffffffa0295574 <fc_rport_enter_prli+196>: test %rax,%rax
0xffffffffa0295577 <fc_rport_enter_prli+199>: je 0xffffffffa02955a2 <fc_rport_enter_prli+242>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 208
0xffffffffa0295579 <fc_rport_enter_prli+201>: xor %ecx,%ecx
0xffffffffa029557b <fc_rport_enter_prli+203>: cmpl $0x2b,0x80(%r12)
0xffffffffa0295584 <fc_rport_enter_prli+212>: jbe 0xffffffffa0295592 <fc_rport_enter_prli+226>
0xffffffffa0295586 <fc_rport_enter_prli+214>: mov 0xd8(%r12),%rcx
0xffffffffa029558e <fc_rport_enter_prli+222>: add $0x18,%rcx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1124
0xffffffffa0295592 <fc_rport_enter_prli+226>: add $0x4,%rcx
0xffffffffa0295596 <fc_rport_enter_prli+230>: xor %edx,%edx
0xffffffffa0295598 <fc_rport_enter_prli+232>: mov $0x10,%esi
0xffffffffa029559d <fc_rport_enter_prli+237>: mov %rbx,%rdi
0xffffffffa02955a0 <fc_rport_enter_prli+240>: callq *(%rax)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1127
0xffffffffa02955a2 <fc_rport_enter_prli+242>: mov 0x0(%r13),%rax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/scsi/fc_frame.h: 173 >> lport->host
0xffffffffa02955a6 <fc_rport_enter_prli+246>: cmpl $0x17,0x80(%r12)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_rport.c: 1127 <<
0xffffffffa02955af <fc_rport_enter_prli+255>: mov 0x28(%rbx),%r15d
0xffffffffa02955b3 <fc_rport_enter_prli+259>: mov 0x798(%rax),%rax >> scsi_host->host_data
0xffffffffa02955ba <fc_rport_enter_prli+266>: mov 0x3e8(%rax),%r14d
crash7latest> >> fc_port_attr->port_id
패닉은 fc_rport_enter_prli+266 에서 발생했다.
해당 어셈블리 내용은 아래 코드와 매치된다.
1089 static void fc_rport_enter_prli(struct fc_rport_priv *rdata)
1090 {
1091 struct fc_lport *lport = rdata->local_port;
1092 struct {
1093 struct fc_els_prli prli;
1094 struct fc_els_spp spp;
1095 } *pp;
1096 struct fc_frame *fp;
1097 struct fc4_prov *prov;
1098
1099 /*
1100 * If the rport is one of the well known addresses
1101 * we skip PRLI and RTV and go straight to READY.
1102 */
1103 if (rdata->ids.port_id >= FC_FID_DOM_MGR) {
1104 fc_rport_enter_ready(rdata);
1105 return;
1106 }
1107
1108 FC_RPORT_DBG(rdata, "Port entered PRLI state from %s state\n",
1109 fc_rport_state(rdata));
1110
1111 fc_rport_state_enter(rdata, RPORT_ST_PRLI);
1112
1113 fp = fc_frame_alloc(lport, sizeof(*pp));
1114 if (!fp) {
1115 fc_rport_error_retry(rdata, fp);
1116 return;
1117 }
1118
1119 fc_prli_fill(lport, fp);
1120
1121 prov = fc_passive_prov[FC_TYPE_FCP];
1122 if (prov) {
1123 pp = fc_frame_payload_get(fp, sizeof(*pp));
1124 prov->prli(rdata, sizeof(pp->spp), NULL, &pp->spp);
1125 }
1126
1127 fc_fill_fc_hdr(fp, FC_RCTL_ELS_REQ, rdata->ids.port_id,
1128 fc_host_port_id(lport->host), FC_TYPE_ELS,
1129 FC_FC_FIRST_SEQ | FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0); <<
1130
1131 if (!lport->tt.exch_seq_send(lport, fp, fc_rport_prli_resp,
1132 NULL, rdata, 2 * lport->r_a_tov))
1133 fc_rport_error_retry(rdata, NULL);
1134 else
1135 kref_get(&rdata->kref);
1136 }
자, 이전에 본 14번 내용과 비슷하게 fc 관련 header 를 가져와 채우는 부분이 문제가 되고 있다.
r13 의 내용을 확인해 보자. fc_port 데이타를 가리키고 있기 때문이다.
주소는 0xffff880103e67a00 이지만 그보다 16 더 앞부터 총 32비트의 내용까지 살펴본다.
crash7latest> dis -l 0xffff880103e679f0 32
dis: WARNING: ffff880103e679f0: no associated kernel symbol found
0xffff880103e679f0: add %al,(%rax)
0xffff880103e679f2: add %al,(%rax)
0xffff880103e679f4: add %al,(%rax)
0xffff880103e679f6: add %al,(%rax)
0xffff880103e679f8: add %al,(%rax)
0xffff880103e679fa: add %al,(%rax)
0xffff880103e679fc: add %al,(%rax)
0xffff880103e679fe: add %al,(%rax)
0xffff880103e67a00: add %ah,-0x1a(%rax)
0xffff880103e67a03: add (%rcx),%eax
0xffff880103e67a05: mov %bh,%bh
0xffff880103e67a07: incl 0x61(%rdx)
0xffff880103e67a0a: jae 0xffff880103e67a71
0xffff880103e67a0c: rex.W push %rsp
0xffff880103e67a0e: push %rsp
0xffff880103e67a0f: push %rax
0xffff880103e67a10: (bad)
0xffff880103e67a11: xor %ch,(%rsi)
0xffff880103e67a13: xor (%rax),%esp
0xffff880103e67a15: push %rax
0xffff880103e67a16: jns 0xffff880103e67a8c
0xffff880103e67a18: pushq $0x322f6e6f
0xffff880103e67a1d: cs
0xffff880103e67a1e: ss
0xffff880103e67a1f: cs
0xffff880103e67a20: ss
0xffff880103e67a21: or $0x3030300a,%eax
0xffff880103e67a26: xor %dh,(%rax)
0xffff880103e67a28: (bad)
0xffff880103e67a29: fs
0xffff880103e67a2a: gs
0xffff880103e67a2b: jbe 0xffff880103e67a96
crash7latest>
실재로 해당 내용은 bad address 를 표시하고 있다.
crash7latest> struct fc_lport ffff880103e67a00
struct fc_lport {
host = 0xffff880103e66000,
ema_list = {
next = 0x5054544865736142,
prev = 0x74795020332e302f
},
dns_rdata = 0x2e362e322f6e6f68,
ms_rdata = 0x30303030300a0d36,
ptp_rdata = 0x2f6563697665642f,
scsi_priv = 0x3137006973637376,
.............
crash7latest> rd ffff880103e679f0 32
ffff880103e679f0: 0000000000000000 0000000000000000 ................
ffff880103e67a00: ffff880103e66000 5054544865736142 .`......BaseHTTP
ffff880103e67a10: 74795020332e302f 2e362e322f6e6f68 /0.3 Python/2.6.
ffff880103e67a20: 30303030300a0d36 2f6563697665642f 6..00000/device/
ffff880103e67a30: 3137006973637376 6e656b6361622f32 vscsi.712/backen
ffff880103e67a40: 21005b0064692d64 00450008729c64f6 d-id.[.!.d.r..E.
ffff880103e67a50: 0640004018a37600 0a0a29240a0a4b3b .v..@.@.;K..$)..
ffff880103e67a60: 51c01bc4f105e223 1880f57adb746b52 #......QRkt.z...
ffff880103e67a70: 01010000875c4a01 39310cfb80b80a08 .J\...........19
ffff880103e67a80: 00060000420058b4 fe8e020400000000 .X.B............
ffff880103e67a90: 020100007b050200 0000000000000300 ...{............
ffff880103e67aa0: 0000000000000000 302d41524f190101 ...........ORA-0
ffff880103e67ab0: 6f6e203a33303431 6f66206174616420 1403: no data fo
ffff880103e67ac0: 0000000000000001 0000000000000000 ................
ffff880103e67ad0: 0000000000000000 0000000000000000 ................
ffff880103e67ae0: 0000000600000001 ffffc90006064528 ........(E......
해당 값에는 fc_lport 의 정보라기 보기 어려운 다른 값이 들어가 있음을 확인할 수 있다.
흥미롭게도 이번과 저번 모두 fc_lport 의 주소가 물리적으로 동일하다.
crash7latest> vtop 0xffff88019891ea00
VIRTUAL PHYSICAL
ffff88019891ea00 19891ea00
PML4 DIRECTORY: ffffffff81a8a000
PAGE DIRECTORY: 4009a8b067 [machine]
PAGE DIRECTORY: 1a8b000
PUD: 1a8b030 => 4002cd3067 [machine]
PUD: 2cf6c5000
PMD: 2cf6c5620 => 4002c0e067 [machine]
PMD: 2cf600000
PTE: 2cf6008f0 => 801000215072c067 [machine]
PTE: 19891e067
PAGE: 215072c000 [machine]
PAGE: 19891e000
PTE PHYSICAL FLAGS
19891e067 19891e000 (PRESENT|RW|USER|ACCESSED|DIRTY)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0006624780 19891e000 0 2 0 2fffff80000000
이전 게시글에서와 동일하게 ffff880103e67a00 를 참조하는 부분에서 문제가 발생하였다.
따라서 fc_lport 에 관련되어 메모리를 사용함에 있어, 이미 해제된 메모리를 재참조하거나,
메모리 corruption 조건이 발생되고 있다는 것을 의미한다.
역시 저번 게시글과 같이 해당 코드에 논리적 오류나 Race condition 발생 여부,
Used-after-freed 현상에 대해서 더 살펴봐야 하는 상태이지만,
코드 상 특별히 문제될만한 부분은 발견되지 않았다는 점이 현 문제의 가장 큰 난관이다.
지금은 개발팀에서 in-house 로 동일 환경을 구성 후 재현테스트를 하고 있는 중이다.
이는 장비에 대한 특성 ( 펌웨어 문제 또는 하드웨어 관련 된 정보의 괴리 ),
즉 하드웨어 문제를 배제할 수 없는 의심스러운 부분이 있다는 것을 의미한다.
'Skills > mY Technutz' 카테고리의 다른 글
libfc: Update rport reference counting bug - 1368175 (0) | 2018.03.29 |
---|---|
Kernel Dump Analysis #16 (0) | 2018.02.22 |
Kernel Dump Analysis #15 (0) | 2018.02.19 |
Kernel Dump Analysis #14 (0) | 2018.02.13 |
Kernel Crash dump Analysis - #13 (0) | 2017.12.20 |
커널이 지원하는 기능을 확인하는 습관. (1) | 2017.02.06 |
Trackback :
댓글을 달아 주세요
이젠 뭐, 커널에 대한 덤프 분석이라기 보다는
CRASH tool 의 사용법과, 이슈에 따라서 분석을 진행하는 방법에 대한 Guide 가 되어가는 것 같다.
이전엔 주절주절 말을 많이한것 같았는데, 오늘은 주로 코드나 어셈의 흐름을 위주로 설명해 볼까 한다.
시스템이 갑작스럽게 코어와 함께 리붓되었단다.
더보기
KERNEL: /4.1.12-61.1.17.el6uek.x86_64/vmlinux
DUMPFILE: vmcore
CPUS: 20 [OFFLINE: 18]
DATE: Thu Feb 1 10:44:31 2018
UPTIME: 27 days, 15:00:47
LOAD AVERAGE: 0.68, 1.75, 1.43
TASKS: 721
NODENAME: ******
RELEASE: 4.1.12-61.1.17.el6uek.x86_64
VERSION: #2 SMP Mon Oct 31 18:17:37 PDT 2016
MACHINE: x86_64 (2494 Mhz)
MEMORY: 11.2 GB
PANIC: "BUG: unable to handle kernel paging request at fffffffc464d0838"
PID: 9647
COMMAND: "kworker/u40:2"
TASK: ffff88034a9ed400 [THREAD_INFO: ffff88000888c000]
CPU: 16
STATE: TASK_RUNNING (PANIC)
crash7latest> log
[[[ Snip ]]]
[2387223.097904] ------------[ cut here ]------------
[2387223.097934] WARNING: CPU: 6 PID: 9647 at include/linux/kref.h:47 fc_rport_enter_plogi+0x1cc/0x1e0 [libfc]()
[2387223.097936] Modules linked in: iptable_filter ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[2387223.098027] CPU: 6 PID: 9647 Comm: kworker/u40:2 Not tainted 4.1.12-61.1.17.el6uek.x86_64 #2
[2387223.098029] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[2387223.098040] Workqueue: fc_rport_eq fc_rport_work [libfc]
[2387223.098043] 0000000000000000 ffff88000888fca8 ffffffff816c6a80 0000000000000000
[2387223.098048] 000000000000002f ffff88000888fce8 ffffffff810845e5 0000000000000000
[2387223.098060] ffff88019891ea00 ffff8803442327f8 ffff88019891ea48 ffff8803442327f8
[2387223.098064] Call Trace:
[2387223.098074] [<ffffffff816c6a80>] dump_stack+0x63/0x83
[2387223.098081] [<ffffffff810845e5>] warn_slowpath_common+0x95/0xe0
[2387223.098086] [<ffffffff8108464a>] warn_slowpath_null+0x1a/0x20
[2387223.098096] [<ffffffffa0295abc>] fc_rport_enter_plogi+0x1cc/0x1e0 [libfc]
[2387223.098110] [<ffffffffa0295b75>] fc_rport_enter_flogi+0xa5/0x130 [libfc]
[2387223.098117] [<ffffffff816c9386>] ? mutex_lock+0x16/0x40
[2387223.098125] [<ffffffffa0297a3d>] fc_rport_work+0x3cd/0x690 [libfc]
[2387223.098133] [<ffffffffa02c9b26>] ? fnic_handle_frame+0x76/0xf0 [fnic]
[2387223.098139] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[2387223.098147] [<ffffffff8109f500>] worker_thread+0x120/0x480
[2387223.098154] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[2387223.098159] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387223.098163] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387223.098170] [<ffffffff810a46de>] kthread+0xce/0xf0
[2387223.098178] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387223.098185] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[2387223.098188] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387223.098202] ---[ end trace b16b4e42f8133221 ]---
[[[ Snip ]]]
[2387224.712025] BUG: unable to handle kernel paging request at fffffffc464d0838
[2387224.712244] IP: [<ffffffff810cd83e>] osq_lock+0x4e/0x120
[2387224.712453] PGD 1a8d067 PUD 0
[2387224.712655] Oops: 0000 [#1] SMP
[2387224.712858] Modules linked in: iptable_filter ip_tables nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd sunrpc grace ocfs2 xen_netback xen_blkback xen_gntalloc xen_gntdev xen_evtchn xenfs xen_privcmd ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc fcoe bridge 8021q mrp garp stp llc bonding iTCO_wdt iTCO_vendor_support pcspkr sb_edac edac_core lpc_ich mfd_core sg ipmi_devintf ipmi_si ipmi_msghandler ext4 jbd2 mbcache dm_round_robin sd_mod fnic libfcoe libfc scsi_transport_fc enic wmi crc32c_intel be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi dm_multipath dm_mirror dm_region_hash dm_log dm_mod
[2387224.715204] CPU: 16 PID: 9647 Comm: kworker/u40:2 Tainted: G W 4.1.12-61.1.17.el6uek.x86_64 #2
[2387224.715598] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.2.2.6a.0.080520150014 08/05/2015
[2387224.716002] Workqueue: fnic_event_wq fnic_handle_frame [fnic]
[2387224.716209] task: ffff88034a9ed400 ti: ffff88000888c000 task.ti: ffff88000888c000
[2387224.716596] RIP: e030:[<ffffffff810cd83e>] [<ffffffff810cd83e>] osq_lock+0x4e/0x120
[2387224.716988] RSP: e02b:ffff88000888fb38 EFLAGS: 00010286
[2387224.717191] RAX: ffffffff9891ea77 RBX: ffff88019891ea60 RCX: ffff8803560181c0
[2387224.717578] RDX: ffff8803560181d0 RSI: 00000000000181c0 RDI: ffff88019891ea80
[2387224.717963] RBP: ffff88000888fb58 R08: ffff88000888fb48 R09: 0000000000000001
[2387224.718349] R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
[2387224.718732] R13: ffff88019891ea60 R14: 0000000000000000 R15: ffff88019891ea78
[2387224.719121] FS: 0000000000000000(0000) GS:ffff880356000000(0000) knlGS:ffff880356000000
[2387224.719511] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[2387224.719714] CR2: fffffffc464d0838 CR3: 0000000341bbc000 CR4: 0000000000042660
[2387224.720105] Stack:
[2387224.720293] ffff880351803700 ffffffffa028b17b ffff880344232830 ffff88000888fc10
[2387224.720689] ffff88000888fbe8 ffffffff810cd235 ffff88000888fb88 ffff88019891ea80
[2387224.721086] ffff88034a9ed400 ffffffffa028b17b ffff88019891ea78 ffffffff815ded14
[2387224.721484] Call Trace:
[2387224.721681] [<ffffffffa028b17b>] ? fc_disc_recv_rscn_req+0x9b/0x3d0 [libfc]
[2387224.722061] [<ffffffff810cd235>] mutex_optimistic_spin+0x75/0x1f0
[2387224.722267] [<ffffffffa028b17b>] ? fc_disc_recv_rscn_req+0x9b/0x3d0 [libfc]
[2387224.722654] [<ffffffff815ded14>] ? __kfree_skb+0x34/0x90
[2387224.722854] [<ffffffff815de03f>] ? kfree_skb+0x4f/0xc0
[2387224.723057] [<ffffffff816c928b>] __mutex_lock_slowpath+0x2b/0x110
[2387224.723259] [<ffffffff816cb040>] ? _raw_spin_unlock_irqrestore+0x20/0x50
[2387224.723463] [<ffffffff816c9393>] mutex_lock+0x23/0x40
[2387224.723669] [<ffffffffa028aef6>] fc_disc_gpn_id_resp+0x36/0x220 [libfc]
[2387224.723877] [<ffffffffa028aec0>] ? fc_disc_start+0x60/0x60 [libfc]
[2387224.724083] [<ffffffffa028c0eb>] fc_invoke_resp+0x8b/0xf0 [libfc]
[2387224.724292] [<ffffffffa028d87e>] fc_exch_recv_seq_resp+0x18e/0x290 [libfc]
[2387224.724501] [<ffffffffa028ed91>] fc_exch_recv+0x231/0x290 [libfc]
[2387224.724705] [<ffffffffa02c9b1e>] fnic_handle_frame+0x6e/0xf0 [fnic]
[2387224.724909] [<ffffffff8109f07e>] process_one_work+0x14e/0x4b0
[2387224.725110] [<ffffffff8109f500>] worker_thread+0x120/0x480
[2387224.725311] [<ffffffff816c6eb9>] ? __schedule+0x309/0x890
[2387224.725510] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387224.725710] [<ffffffff8109f3e0>] ? process_one_work+0x4b0/0x4b0
[2387224.725912] [<ffffffff810a46de>] kthread+0xce/0xf0
[2387224.726110] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
[2387224.726316] [<ffffffff816cbb62>] ret_from_fork+0x42/0x70
[2387224.726518] [<ffffffff810a4610>] ? kthread_freezable_should_stop+0x70/0x70
RIP 체크를 해보자.
crash7latest> dis -rl osq_lock+0x4e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 85
0xffffffff810cd7f0 <osq_lock>: push %rbp
0xffffffff810cd7f1 <osq_lock+1>: mov %rsp,%rbp
0xffffffff810cd7f4 <osq_lock+4>: sub $0x20,%rsp
0xffffffff810cd7f8 <osq_lock+8>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 86
0xffffffff810cd7fd <osq_lock+13>: mov $0x181c0,%rsi
0xffffffff810cd804 <osq_lock+20>: mov %rsi,%rcx
0xffffffff810cd807 <osq_lock+23>: add %gs:0x7ef3c919(%rip),%rcx # 0xa128
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 88
0xffffffff810cd80f <osq_lock+31>: mov %gs:0x7ef3c91a(%rip),%eax # 0xa130
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 21
0xffffffff810cd816 <osq_lock+38>: add $0x1,%eax
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 91
0xffffffff810cd819 <osq_lock+41>: movl $0x0,0x10(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 92
0xffffffff810cd820 <osq_lock+48>: movq $0x0,(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 93
0xffffffff810cd827 <osq_lock+55>: mov %eax,0x14(%rcx)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/arch/x86/include/asm/atomic.h: 182
0xffffffff810cd82a <osq_lock+58>: xchg %eax,(%rdi)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 96
0xffffffff810cd82c <osq_lock+60>: test %eax,%eax
0xffffffff810cd82e <osq_lock+62>: je 0xffffffff810cd88a <osq_lock+154>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 28
0xffffffff810cd830 <osq_lock+64>: cltq
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 112
0xffffffff810cd832 <osq_lock+66>: lea 0x10(%rcx),%rdx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/include/linux/compiler.h: 204
0xffffffff810cd836 <osq_lock+70>: lea -0x10(%rbp),%r8
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/kernel/locking/osq_lock.c: 28
0xffffffff810cd83a <osq_lock+74>: sub $0x1,%rax
0xffffffff810cd83e <osq_lock+78>: add -0x7e424b80(,%rax,8),%rsi
crash7latest>
Check Source code :
24 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
25 {
26 int cpu_nr = encoded_cpu_val - 1;
27
28 return per_cpu_ptr(&osq_node, cpu_nr);
29 }
cpu lock 관련된 것인것 같은데 이런.. 산으로 가는 분위기다.
보다 내부적으로 집중해 보자.
fnic 과 libfc 모듈이 주로 나온것으로 보아, 이쪽으로 살펴봐야 할 것 같다.
crash7latest> bt
PID: 9647 TASK: ffff88034a9ed400 CPU: 16 COMMAND: "kworker/u40:2"
#0 [ffff88000888f870] panic at ffffffff816c6809
#1 [ffff88000888f8f0] oops_end at ffffffff8101a79c
#2 [ffff88000888f920] no_context at ffffffff8106d7a1
#3 [ffff88000888f970] __bad_area_nosemaphore at ffffffff8106d99d
#4 [ffff88000888f9c0] bad_area_nosemaphore at ffffffff8106dab3
#5 [ffff88000888f9d0] __do_page_fault at ffffffff8106e028
#6 [ffff88000888fa40] do_page_fault at ffffffff8106e337
#7 [ffff88000888fa80] page_fault at ffffffff816cd758
[exception RIP: osq_lock+78]
RIP: ffffffff810cd83e RSP: ffff88000888fb38 RFLAGS: 00010286
RAX: ffffffff9891ea77 RBX: ffff88019891ea60 RCX: ffff8803560181c0
RDX: ffff8803560181d0 RSI: 00000000000181c0 RDI: ffff88019891ea80
RBP: ffff88000888fb58 R8: ffff88000888fb48 R9: 0000000000000001
R10: 0000000000007ff0 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88019891ea60 R14: 0000000000000000 R15: ffff88019891ea78
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#8 [ffff88000888fb60] mutex_optimistic_spin at ffffffff810cd235
#9 [ffff88000888fbf0] __mutex_lock_slowpath at ffffffff816c928b
#10 [ffff88000888fc50] mutex_lock at ffffffff816c9393
#11 [ffff88000888fc70] fc_disc_gpn_id_resp at ffffffffa028aef6 [libfc]
#12 [ffff88000888fcb0] fc_invoke_resp at ffffffffa028c0eb [libfc]
#13 [ffff88000888fd00] fc_exch_recv_seq_resp at ffffffffa028d87e [libfc]
#14 [ffff88000888fd50] fc_exch_recv at ffffffffa028ed91 [libfc]
#15 [ffff88000888fd90] fnic_handle_frame at ffffffffa02c9b1e [fnic]
#16 [ffff88000888fde0] process_one_work at ffffffff8109f07e
#17 [ffff88000888fe40] worker_thread at ffffffff8109f500
#18 [ffff88000888fec0] kthread at ffffffff810a46de
#19 [ffff88000888ff50] ret_from_fork at ffffffff816cbb62
libfc check :
crash7latest> dis -lr ffffffffa028aef6
0xffffffffa028aec0 <fc_disc_gpn_id_resp>: push %rbp
0xffffffffa028aec1 <fc_disc_gpn_id_resp+1>: mov %rsp,%rbp
0xffffffffa028aec4 <fc_disc_gpn_id_resp+4>: sub $0x30,%rsp
0xffffffffa028aec8 <fc_disc_gpn_id_resp+8>: mov %rbx,-0x28(%rbp)
0xffffffffa028aecc <fc_disc_gpn_id_resp+12>: mov %r12,-0x20(%rbp)
0xffffffffa028aed0 <fc_disc_gpn_id_resp+16>: mov %r13,-0x18(%rbp)
0xffffffffa028aed4 <fc_disc_gpn_id_resp+20>: mov %r14,-0x10(%rbp)
0xffffffffa028aed8 <fc_disc_gpn_id_resp+24>: mov %r15,-0x8(%rbp)
0xffffffffa028aedc <fc_disc_gpn_id_resp+28>: nopl 0x0(%rax,%rax,1)
0xffffffffa028aee1 <fc_disc_gpn_id_resp+33>: mov (%rdx),%rbx
0xffffffffa028aee4 <fc_disc_gpn_id_resp+36>: mov %rsi,%r15
0xffffffffa028aee7 <fc_disc_gpn_id_resp+39>: mov %rdx,%r12
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13
0xffffffffa028aeee <fc_disc_gpn_id_resp+46>: mov %r13,%rdi
0xffffffffa028aef1 <fc_disc_gpn_id_resp+49>: callq 0xffffffff816c9370 <mutex_lock>
0xffffffffa028aef6 <fc_disc_gpn_id_resp+54>: cmp $0xfffffffffffffffe,%r15
Module 에 대한 소스파일을 확인하려고 -l 옵션을 주었으나 나오지않는다.
모듈이 로드되지않았기 때문이다. 확인 후 로드해준다.
crash7latest> mod
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 (not loaded) [CONFIG_KALLSYMS]
ffffffffa02b9260 libfcoe 56522 (not loaded) [CONFIG_KALLSYMS]
ffffffffa02db540 fnic 103995 (not loaded) [CONFIG_KALLSYMS]
crash7latest> mod -s libfc usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffa02a1100 libfc 114450 usr/lib/debug/lib/modules/4.1.12-61.1.17.el6uek.x86_64/kernel/drivers/scsi/libfc/libfc.ko.debug
다시 disassembly 해보자 :
crash7latest> dis -lr ffffffffa028aef6
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 583
0xffffffffa028aec0 <fc_disc_gpn_id_resp>: push %rbp
0xffffffffa028aec1 <fc_disc_gpn_id_resp+1>: mov %rsp,%rbp
0xffffffffa028aec4 <fc_disc_gpn_id_resp+4>: sub $0x30,%rsp
0xffffffffa028aec8 <fc_disc_gpn_id_resp+8>: mov %rbx,-0x28(%rbp)
0xffffffffa028aecc <fc_disc_gpn_id_resp+12>: mov %r12,-0x20(%rbp)
0xffffffffa028aed0 <fc_disc_gpn_id_resp+16>: mov %r13,-0x18(%rbp)
0xffffffffa028aed4 <fc_disc_gpn_id_resp+20>: mov %r14,-0x10(%rbp)
0xffffffffa028aed8 <fc_disc_gpn_id_resp+24>: mov %r15,-0x8(%rbp)
0xffffffffa028aedc <fc_disc_gpn_id_resp+28>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 592
0xffffffffa028aee1 <fc_disc_gpn_id_resp+33>: mov (%rdx),%rbx
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 583
0xffffffffa028aee4 <fc_disc_gpn_id_resp+36>: mov %rsi,%r15
0xffffffffa028aee7 <fc_disc_gpn_id_resp+39>: mov %rdx,%r12
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 595
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13
0xffffffffa028aeee <fc_disc_gpn_id_resp+46>: mov %r13,%rdi
0xffffffffa028aef1 <fc_disc_gpn_id_resp+49>: callq 0xffffffff816c9370 <mutex_lock>
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.1.17.el6uek/drivers/scsi/libfc/fc_disc.c: 596
0xffffffffa028aef6 <fc_disc_gpn_id_resp+54>: cmp $0xfffffffffffffffe,%r15
RBX 의 흐름을 소스를 통해 살펴보자.
0xffffffffa028aeea <fc_disc_gpn_id_resp+42>: lea 0x60(%rbx),%r13 <-- fc_lport
581 static void fc_disc_gpn_id_resp(struct fc_seq *sp, struct fc_frame *fp,
582 void *rdata_arg)
583 {
584 struct fc_rport_priv *rdata = rdata_arg;
585 struct fc_rport_priv *new_rdata;
586 struct fc_lport *lport;
587 struct fc_disc *disc;
588 struct fc_ct_hdr *cp;
589 struct fc_ns_gid_pn *pn;
590 u64 port_name;
591
592 lport = rdata->local_port;
593 disc = &lport->disc;
594
595 mutex_lock(&disc->disc_mutex);
아까 RBX 의 할당으로 mutex lock 이 호출된다.
구조체 fc_lport 와 fc_disc 를 살펴보자 :
crash7latest> struct -o fc_lport
struct fc_lport {
[0] struct Scsi_Host *host;
[8] struct list_head ema_list;
[24] struct fc_rport_priv *dns_rdata;
[32] struct fc_rport_priv *ms_rdata;
[40] struct fc_rport_priv *ptp_rdata;
[48] void *scsi_priv;
[56] struct fc_disc disc;
[288] struct list_head vports;
.......
SIZE: 1256
crash7latest> struct -o fc_disc
struct fc_disc {
[0] unsigned char retry_count;
[1] unsigned char pending;
[2] unsigned char requested;
[4] unsigned short seq_count;
[6] unsigned char buf_len;
[8] u16 disc_id;
[16] struct list_head rports;
[32] void *priv;
[40] struct mutex disc_mutex;
해당 오프셋 구성을 바탕으로 스택의 내용을 확인해 본다
스택을 확인하는 이유는 fc_lport 의 주소를 유추해 내기 위해서 이다 :
crash7latest> rd -s ffff88000888fb38 56
ffff88000888fb38: ffff880351803700 fc_disc_recv_rscn_req+155
ffff88000888fb48: ffff880344232830 ffff88000888fc10
ffff88000888fb58: ffff88000888fbe8 mutex_optimistic_spin+117
ffff88000888fb68: ffff88000888fb88 ffff88019891ea80
ffff88000888fb78: ffff88034a9ed400 fc_disc_recv_rscn_req+155
ffff88000888fb88: ffff88019891ea78 __kfree_skb+52
ffff88000888fb98: 0000000000000000 ffff88034e567100
ffff88000888fba8: ffff88000888fbd8 kfree_skb+79
ffff88000888fbb8: ffff88000888fc18 ffff88019891ea60
ffff88000888fbc8: ffff88034a9ed400 ffff88019891ea60
ffff88000888fbd8: ffff88034cced31c ffff88034e567b00
ffff88000888fbe8: ffff88000888fc48 __mutex_lock_slowpath+43
ffff88000888fbf8: ffff88034cced388 ffff88035600eb40
ffff88000888fc08: _raw_spin_unlock_irqrestore+32 ffff88034cced388
ffff88000888fc18: ffff88000888fc58 ffff88019891ea60
ffff88000888fc28: ffff88019891ea00 ffff88019891ea60
ffff88000888fc38: ffff88034cced31c ffff88034e567b00
ffff88000888fc48: ffff88000888fc68 mutex_lock+35
ffff88000888fc58: ffff88000888fc98 ffff88019891ea00
ffff88000888fc68: ffff88000888fca8 fc_disc_gpn_id_resp+54
ffff88000888fc78: ffff88000888fcb0 0000000000000001
ffff88000888fc88: ffff88034cced2c0 fc_disc_gpn_id_resp
ffff88000888fc98: ffff88034cced31c ffff88034e567b00
ffff88000888fca8: ffff88000888fcf8 fc_invoke_resp+139
ffff88000888fcb8: ffff88000888fcd8 ffff88019891ea00
ffff88000888fcc8: ffffe8ffff600000 ffff88034cced2c0
ffff88000888fcd8: ffff88034e567b00 ffff88034cced31c
ffff88000888fce8: 0000000000000000 0000000000180000
Mutex 와 fc_disc 의 스택사이의 offset 은 16 차이.
따라서 ffff88019891ea00 곧 fc_lport 를 나타낸다고 볼 수 있다.
crash7latest> struct fc_lport ffff88019891ea00
struct fc_lport {
host = 0xffff88019891ea00,
ema_list = {
next = 0x2f302f6e69616d6f,
prev = 0x2f646e656b636163
},
dns_rdata = 0x31352f352f646276,
ms_rdata = 0x657079742f323137,
ptp_rdata = 0x300250100,
scsi_priv = 0x800000000ff0005,
disc = {
retry_count = 0 '\000',
pending = 0 '\000',
requested = 0 '\000',
seq_count = 0,
buf_len = 0 '\000',
disc_id = 2000,
rports = {
next = 0x1,
prev = 0xffff88019891ea50
},
priv = 0xffff88019891ea50,
disc_mutex = {
count = {
counter = -1
},
wait_lock = {
{
rlock = {
raw_lock = {
{
head_tail = 0,
tickets = {
head = 0,
tail = 0
}
}
}
}
......
crash7latest> rd ffff88019891ea00 32
ffff88019891ea00: ffff88019891ea00 2f302f6e69616d6f ........omain/0/
ffff88019891ea10: 2f646e656b636163 31352f352f646276 cackend/vbd/5/51
ffff88019891ea20: 657079742f323137 0000000300250100 712/type..%.....
ffff88019891ea30: 0800000000ff0005 0000000000000000 ................
ffff88019891ea40: 00002710000007d0 0000000000000001 .....'..........
ffff88019891ea50: ffff88019891ea50 ffff88019891ea50 P.......P.......
ffff88019891ea60: 00000000ffffffff 0000000000000000 ................
ffff88019891ea70: 0000000fffffffe0 ffff88019891ea78 ........x.......
ffff88019891ea80: ffff880100000011 ffffffffa0295cf0 .........\).....
ffff88019891ea90: 0000000000000000 dead000000000200 ................
ffff88019891eaa0: 000000018e3fd3b6 ffff880355d8eb42 ..?.....B..U....
ffff88019891eab0: ffffffff8109df80 ffff88019891ea70 ........p.......
ffff88019891eac0: 0000000000000000 0000000000000000 ................
ffff88019891ead0: 0000000000000000 0000000000000000 ................
ffff88019891eae0: ffff880300000001 0000000000000006 ................
ffff88019891eaf0: 0000000000000000 0000000000000000 ................
crash7latest> vtop 0xffff88019891ea00
VIRTUAL PHYSICAL
ffff88019891ea00 19891ea00
PML4 DIRECTORY: ffffffff81a8a000
PAGE DIRECTORY: 4009a8b067 [machine]
PAGE DIRECTORY: 1a8b000
PUD: 1a8b030 => 4002cd3067 [machine]
PUD: 2cf6c5000
PMD: 2cf6c5620 => 4002c0e067 [machine]
PMD: 2cf600000
PTE: 2cf6008f0 => 801000215072c067 [machine]
PTE: 19891e067
PAGE: 215072c000 [machine]
PAGE: 19891e000
PTE PHYSICAL FLAGS
19891e067 19891e000 (PRESENT|RW|USER|ACCESSED|DIRTY)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0006624780 19891e000 0 0 0 2fffff80008000 tail
참고로 현재 이슈는 다른 이슈와 연결되어 발생한 이슈로써,
아직 해결되지는 않은 이슈이며, fc_lport 의 값을 찾기 위한 이유에 대해서 밝히지 않고 있었다.
물론 이어지는 덤프분석이 또 있을 것이며, 거기서 이유가 밝혀질 것이다.
여기서 중요한 부분은 위의 로그에서 빨간색으로 표현한 두줄의 스택로그이다.
현재 의심되는 부분은 RBX 로 들어가는 메모리가 Corruption 되거나 used-after-freed 현상인데,
두줄의 로그를 보았을 경우 used-after-freed 가 되지 않을까 싶다.
이번 경우에는 확실히 used-after-freed 현상이 가장 가깝기 때문이다.
즉, 해제된 것을 다시 사용하거나, Double free race condition 등이 발생하고 있다는 것이다.
물론... 소스코드상에서는 아직 오류를 찾을 수 없는 상황이다.
보다 자세하고 큰 그림은 다음 분석을 통해 함께 그려보도록 하겠다.
'Skills > mY Technutz' 카테고리의 다른 글
Kernel Dump Analysis #16 (0) | 2018.02.22 |
---|---|
Kernel Dump Analysis #15 (0) | 2018.02.19 |
Kernel Dump Analysis #14 (0) | 2018.02.13 |
Kernel Crash dump Analysis - #13 (0) | 2017.12.20 |
커널이 지원하는 기능을 확인하는 습관. (1) | 2017.02.06 |
Kexec/Kdump 의 제약사항에 대해서 (4) | 2016.01.14 |
댓글을 달아 주세요