재밌는 리턴코드가 발견되어서 간만에 좀 디깅을 해봤다.
KERNEL: /share/linuxrpm/vmlinux_repo/64/3.10.0-693.el7.x86_64/vmlinux
DUMPFILE: 127.0.0.1-2019-01-17-00_38_36.zip_extract/vmcore [PARTIAL DUMP]
CPUS: 40
DATE: Thu Jan 17 00:38:38 2019
UPTIME: 39 days, 18:26:52
LOAD AVERAGE: 0.19, 0.20, 0.24
TASKS: 1420
NODENAME: **********
RELEASE: 3.10.0-693.el7.x86_64
VERSION: #1 SMP Wed Aug 2 06:49:08 PDT 2017
MACHINE: x86_64 (2199 Mhz)
MEMORY: 63.9 GB
PANIC: "BUG: unable to handle kernel paging request at ffffffffc0500790"
PID: 72639
COMMAND: "kworker/15:0"
TASK: ffff880dc2744f10 [THREAD_INFO: ffff881052fb8000]
CPU: 15
STATE: TASK_RUNNING (PANIC)
crash64> log
[ 37.351715] ip6_tables: (C) 2000-2006 Netfilter Core Team
[236233.529248] perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[346636.325549] perf: interrupt took too long (3127 > 3126), lowering kernel.perf_event_max_sample_rate to 63000
[602033.211929] perf: interrupt took too long (3916 > 3908), lowering kernel.perf_event_max_sample_rate to 51000
[2952215.839114] perf: interrupt took too long (4941 > 4895), lowering kernel.perf_event_max_sample_rate to 40000
[3436125.331154] BUG: unable to handle kernel paging request at ffffffffc0500790
[3436125.331196] IP: [<ffffffffc0500790>] 0xffffffffc050078f
[3436125.331224] PGD 19f5067 PUD 19f7067 PMD 8598de067 PTE 0
[3436125.331251] Oops: 0010 [#1] SMP
[3436125.331270] Modules linked in: ip6table_filter ip6_tables sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc iptable_filter bonding sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass ipmi_ssif crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt dcdbas iTCO_vendor_support mxm_wmi sg ipmi_si ipmi_devintf ipmi_msghandler pcspkr mei_me mei lpc_ich shpchp wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe ahci crct10dif_pclmul crct10dif_common igb crc32c_intel libahci megaraid_sas libata mdio i2c_algo_bit ptp i2c_core pps_core dca dm_mirror dm_region_hash
[3436125.331620] dm_log dm_mod
[3436125.331629] CPU: 15 PID: 72639 Comm: kworker/15:0 Not tainted 3.10.0-693.el7.x86_64 #1
[3436125.331661] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.5.5 08/16/2017
[3436125.331721] Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
[3436125.331746] task: ffff880dc2744f10 ti: ffff881052fb8000 task.ti: ffff881052fb8000
[3436125.331776] RIP: 0010:[<ffffffffc0500790>] [<ffffffffc0500790>] 0xffffffffc050078f
[3436125.331810] RSP: 0018:ffff881052fbbca0 EFLAGS: 00010286
[3436125.331832] RAX: ffffc90007396450 RBX: ffff88100ce8dc08 RCX: 0000000000000000
[3436125.331860] RDX: 0000000000000480 RSI: ffff880c9fbd7980 RDI: ffffc900073968d0
[3436125.331887] RBP: ffff881052fbbd38 R08: 0000000000000000 R09: ffffc90007396450
[3436125.331915] R10: 0000000000000480 R11: 0000000000018d8c R12: 0000000000000480
[3436125.331943] R13: 0000000000000033 R14: 0000000000000002 R15: ffff880c9fbd7000
[3436125.331971] FS: 0000000000000000(0000) GS:ffff88105dfc0000(0000) knlGS:0000000000000000
[3436125.332002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3436125.332026] CR2: ffffffffc0500790 CR3: 000000067136f000 CR4: 00000000003407e0
[3436125.332053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3436125.332081] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3436125.332113] Stack:
[3436125.332124] ffff880a3aa68010 0000000000000000 ffff88085522f400 000000000c041837
[3436125.332158] 0000000000018d8c 0000000000000480 ffff88085522f528 0000000000000000
[3436125.332192] 0000000000000480 ffff881000006244 ffff880c9fbd7038 0000625000000001
[3436125.332224] Call Trace:
[3436125.332254] [<ffffffffc04021d8>] xlog_cil_push+0x2a8/0x430 [xfs]
[3436125.332293] [<ffffffffc0402375>] xlog_cil_push_work+0x15/0x20 [xfs]
[3436125.332322] [<ffffffff810a881a>] process_one_work+0x17a/0x440
[3436125.332347] [<ffffffff810a9638>] worker_thread+0x278/0x3c0
[3436125.332371] [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[3436125.332398] [<ffffffff810b098f>] kthread+0xcf/0xe0
[3436125.332420] [<ffffffff8108ddeb>] ? do_exit+0x6bb/0xa40
[3436125.332443] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[3436125.332470] [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
[3436125.332493] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[3436125.332518] Code: Bad RIP value.
[3436125.332538] RIP [<ffffffffc0500790>] 0xffffffffc050078f
[3436125.332564] RSP <ffff881052fbbca0>
[3436125.332580] CR2: ffffffffc0500790
일단 xfs 모듈이 올라가 있으므로, xfs 모듈을 불러와야한다.
Debuginfo 패키지에서 모듈을 추출하는 것은 이전 포스트를 참조하라.
crash64> mod -s xfs 3.10.0-693.el7.x86_64/kernel/fs/xfs/xfs.ko.debug
MODULE NAME SIZE OBJECT FILE
ffffffffc044e280 xfs 978100 3.10.0-693.el7.x86_64/kernel/fs/xfs/xfs.ko.debug
backtrace 를 해서 펑션들의 코드가 있는 소스파일이 무엇인지 확인해 보자.
crash64> bt -l
PID: 72639 TASK: ffff880dc2744f10 CPU: 15 COMMAND: "kworker/15:0"
#0 [ffff881052fbb948] machine_kexec at ffffffff8105c4cb
/linux-3.10.0-693.el7.x86_64/arch/x86/kernel/machine_kexec_64.c: 320
#1 [ffff881052fbb9a8] __crash_kexec at ffffffff81104a32
/linux-3.10.0-693.el7.x86_64/kernel/kexec_core.c: 875
#2 [ffff881052fbba78] crash_kexec at ffffffff81104b20
/linux-3.10.0-693.el7.x86_64/arch/x86/include/asm/atomic.h: 38
#3 [ffff881052fbba90] oops_end at ffffffff816ad278
/linux-3.10.0-693.el7.x86_64/arch/x86/kernel/dumpstack.c: 225
#4 [ffff881052fbbab8] no_context at ffffffff8169d29a
/linux-3.10.0-693.el7.x86_64/arch/x86/mm/fault.c: 703
#5 [ffff881052fbbb08] __bad_area_nosemaphore at ffffffff8169d330
/linux-3.10.0-693.el7.x86_64/arch/x86/mm/fault.c: 782
#6 [ffff881052fbbb50] bad_area_nosemaphore at ffffffff8169d49a
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/arch/x86/mm/fault.c: 790
#7 [ffff881052fbbb60] __do_page_fault at ffffffff816b013e
/linux-3.10.0-693.el7.x86_64/arch/x86/mm/fault.c: 1227
#8 [ffff881052fbbbc0] do_page_fault at ffffffff816b02e5
/linux-3.10.0-693.el7.x86_64/arch/x86/mm/fault.c: 1237
#9 [ffff881052fbbbf0] page_fault at ffffffff816ac508
/linux-3.10.0-693.el7.x86_64/arch/x86/kernel/entry_64.S: 1316
[exception RIP: unknown or invalid address]
RIP: ffffffffc0500790 RSP: ffff881052fbbca0 RFLAGS: 00010286
RAX: ffffc90007396450 RBX: ffff88100ce8dc08 RCX: 0000000000000000
RDX: 0000000000000480 RSI: ffff880c9fbd7980 RDI: ffffc900073968d0
RBP: ffff881052fbbd38 R8: 0000000000000000 R9: ffffc90007396450
R10: 0000000000000480 R11: 0000000000018d8c R12: 0000000000000480
R13: 0000000000000033 R14: 0000000000000002 R15: ffff880c9fbd7000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff881052fbbd40] xlog_cil_push at ffffffffc04021d8 [xfs]
/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 717
#11 [ffff881052fbbe10] xlog_cil_push_work at ffffffffc0402375 [xfs]
/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 799
#12 [ffff881052fbbe20] process_one_work at ffffffff810a881a
/linux-3.10.0-693.el7.x86_64/kernel/workqueue.c: 2252
#13 [ffff881052fbbe68] worker_thread at ffffffff810a9638
/linux-3.10.0-693.el7.x86_64/include/linux/list.h: 188
#14 [ffff881052fbbec8] kthread at ffffffff810b098f
/linux-3.10.0-693.el7.x86_64/kernel/kthread.c: 202
#15 [ffff881052fbbf50] ret_from_fork at ffffffff816b4f18
/linux-3.10.0-693.el7.x86_64/arch/x86/kernel/entry_64.S: 369
crash64>
물론 xfs 모듈을 로드하지 않으면 소스파일을 안알랴준다.
일단 해당 스택을 stack tracer 에 넣어보았다.
오라클에서는 stack checker 라고 해서 스택을 넣어주면 비슷하다고 생각되는 버그를 찾아준다.
나오는 몇가지 케이스를 살펴보았으나 일치하는것은 없는것같다.
좀 더 digging 해 보자... 코드를 좀 살펴보자.
레드햇에서는 매우 유용하게도 웹페이지로 된 Code browser 를 제공한다.
진짜진짜 좋고 유용하지만 난 레드햇 서브스크립션이 없기때문에
직접 ctag 등으로 구축한 코드를 직접 살펴보겠다.
linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 717
716 error = xlog_write(log, &lvhdr, tic, &ctx->start_lsn, NULL, 0);
717 if (error)
718 goto out_abort_free_ticket;
Disassembly 코드도 확인해볼까?
crash64> dis -rl ffffffffc04021d8
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 555
0xffffffffc0401f30 <xlog_cil_push>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc0401f35 <xlog_cil_push+5>: push %rbp
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 566
0xffffffffc0401f36 <xlog_cil_push+6>: mov $0x7,%ecx
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 555
0xffffffffc0401f3b <xlog_cil_push+11>: mov %rsp,%rbp
0xffffffffc0401f3e <xlog_cil_push+14>: push %r15
0xffffffffc0401f40 <xlog_cil_push+16>: push %r14
0xffffffffc0401f42 <xlog_cil_push+18>: push %r13
0xffffffffc0401f44 <xlog_cil_push+20>: push %r12
0xffffffffc0401f46 <xlog_cil_push+22>: push %rbx
0xffffffffc0401f47 <xlog_cil_push+23>: mov %rdi,%rbx
0xffffffffc0401f4a <xlog_cil_push+26>: sub $0x98,%rsp
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 556
0xffffffffc0401f51 <xlog_cil_push+33>: mov 0x10(%rdi),%r14
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 555
0xffffffffc0401f55 <xlog_cil_push+37>: mov %rdi,-0xc0(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 566
0xffffffffc0401f5c <xlog_cil_push+44>: lea -0x68(%rbp),%rdi
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 555
0xffffffffc0401f60 <xlog_cil_push+48>: mov %gs:0x28,%rax
0xffffffffc0401f69 <xlog_cil_push+57>: mov %rax,-0x30(%rbp)
0xffffffffc0401f6d <xlog_cil_push+61>: xor %eax,%eax
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 566
0xffffffffc0401f6f <xlog_cil_push+63>: xor %eax,%eax
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 570
0xffffffffc0401f71 <xlog_cil_push+65>: test %r14,%r14
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 566
0xffffffffc0401f74 <xlog_cil_push+68>: rep stos %rax,%es:(%rdi)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 570
0xffffffffc0401f77 <xlog_cil_push+71>: je 0xffffffffc0402039 <xlog_cil_push+265>
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/kmem.h: 77
0xffffffffc0401f7d <xlog_cil_push+77>: mov $0x15,%esi
0xffffffffc0401f82 <xlog_cil_push+82>: mov $0x70,%edi
0xffffffffc0401f87 <xlog_cil_push+87>: callq 0xffffffffc03fd010 <kmem_alloc>
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 52
0xffffffffc0401f8c <xlog_cil_push+92>: xor %esi,%esi
0xffffffffc0401f8e <xlog_cil_push+94>: xor %r8d,%r8d
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 703
0xffffffffc040217d <xlog_cil_push+589>: movl $0x5452414e,-0x78(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 704
0xffffffffc0402184 <xlog_cil_push+596>: movl $0x28,-0x74(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 705
0xffffffffc040218b <xlog_cil_push+603>: mov 0x18(%r12),%eax
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 710
0xffffffffc0402190 <xlog_cil_push+608>: subl $0x1c,0x20(%r12)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 716
0xffffffffc0402196 <xlog_cil_push+614>: mov %r12,%rdx
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 706
0xffffffffc0402199 <xlog_cil_push+617>: mov %r13d,-0x6c(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 708
0xffffffffc040219d <xlog_cil_push+621>: movl $0x10,-0x80(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 709
0xffffffffc04021a4 <xlog_cil_push+628>: movl $0x13,-0x7c(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 712
0xffffffffc04021ab <xlog_cil_push+635>: movl $0x1,-0x60(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 705
0xffffffffc04021b2 <xlog_cil_push+642>: mov %eax,-0x70(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 707
0xffffffffc04021b5 <xlog_cil_push+645>: lea -0x78(%rbp),%rax
0xffffffffc04021b9 <xlog_cil_push+649>: mov %rax,-0x88(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 713
0xffffffffc04021c0 <xlog_cil_push+656>: lea -0x88(%rbp),%rax
0xffffffffc04021c7 <xlog_cil_push+663>: mov %rax,-0x58(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 714
0xffffffffc04021cb <xlog_cil_push+667>: mov 0x40(%rbx),%rax
0xffffffffc04021cf <xlog_cil_push+671>: mov %rax,-0x68(%rbp)
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 716
0xffffffffc04021d3 <xlog_cil_push+675>: callq 0xffffffffc0400320 <xlog_write>
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log_cil.c: 717
0xffffffffc04021d8 <xlog_cil_push+680>: test %eax,%eax
끄트머리를 살펴보자...
crash64> eval 0xffff881052fbbd38-0x68
hexadecimal: ffff881052fbbcd0
decimal: 18446612202425924816 (-131871283626800)
octal: 1777774201012276736320
binary: 1111111111111111100010000001000001010010111110111011110011010000
crash64>
코드 714 라인을 보면 rbx 에서 0x40 만큼 더한 주소를 rax 에 저장시키고,
그 rax 를 rbp 의 -0x68 만큼 이동한 주소에 저장시킨다.
코드는 rax 가 0xffff881052fbbcd0 이여야 한다고 하지만 실질적인 값을 보면 전혀 다른값으로 되어 있다.
RAX: ffffc90007396450 RBX: ffff88100ce8dc08 RCX: 0000000000000000
일단 최종적으로 호출된 xlog_writer 로 가보자.
crash64> dis -rl 0xffffffffc0400320
/usr/src/debug/kernel-3.10.0-693.el7/linux-3.10.0-693.el7.x86_64/fs/xfs/xfs_log.c: 2299
0xffffffffc0400320 <xlog_write>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
xfs_log.c line 2299 :
2292 int
xlog_write(
struct xlog *log,
struct xfs_log_vec *log_vector,
struct xlog_ticket *ticket,
xfs_lsn_t *start_lsn,
struct xlog_in_core **commit_iclog,
uint flags)
2299 {
여기엔 별 다른게 없다.
Assemble 내용을 보면 nopl 오퍼랜드가 수행되었다는것을 보았을 때 실질적으로 xlog_write 는 호출하였으나
아무것도 기록하지 않았음을 알 수 있다.
메모리 번지가 임의로 변경된 정황을 확인하였으므로 이젠 xfs 에 관련된 버그를 찾아봐야 한다.
다행히 커널 버젼 3.10.0-853 에서 아래와 같은 버그픽스들이 있었고 의심스러운 부분들이 보인다 :
[3.10.0-853]
- [sound] alsa: seq: Make ioctls race-free (CVE-2018-1000004) (Jaroslav Kysela) [1537203] {CVE-2018-1000004}
- [gpu] drm/i915/gvt: move write protect handler out of mmio emulation function (Paul Lai) [1525419]
- [gpu] drm/i915/gvt: Factor intel_vgpu_page_track (Paul Lai) [1525419]
- [fs] xfs: eliminate duplicate icreate tx reservation functions (Brian Foster) [1397653]
- [fs] xfs: refactor inode chunk alloc/free tx reservation (Brian Foster) [1397653]
- [fs] xfs: include an allocfree res for inobt modifications (Brian Foster) [1397653]
- [fs] xfs: truncate transaction does not modify the inobt (Brian Foster) [1397653]
- [fs] xfs: fix up agi unlinked list reservations (Brian Foster) [1397653]
- [fs] xfs: include inobt buffers in ifree tx log reservation (Brian Foster) [1397653]
- [fs] xfs: print transaction log reservation on overrun (Brian Foster) [1397653]
- [fs] xfs: dump transaction usage details on log reservation overrun (Brian Foster) [1397653]
- [fs] xfs: refactor xlog_cil_insert_items() to facilitate transaction dump (Brian Foster) [1397653]
- [fs] xfs: separate shutdown from ticket reservation print helper (Brian Foster) [1397653]
- [s390] gs: add compat regset for the guarded storage broadcast control block (Hendrik Brueckner) [1537067]
- [x86] intel_rdt/cqm: avoid negative static key counts (Joe Lawrence) [1524901]
- [x86] efi: Fix boot crash by always mapping boot service regions into new EFI page tables (Lenny Szubowicz) [1535243]
안타깝게 레드햇 가이가 아니라 버그 내용을 볼수 없었으나,
대략적으로 xfs 의 로그 기능 및 transaction 에 대한 메모리 처리과정에서
잘못된 메모리 참조 및 변환이 발생 할 수 있다는 내용으로 판단된다.
아쉽지만, 여기까지만 하고 다음에는 uek 와 관련된 내용으로 포스팅을 해야할것 같다.
내용을 볼수 없어서 비교가 불가하기때문이다 흑... 갑자기 용두사미가 된거같은데 뭐 내맴.