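These days I'm working alone again, so my workload has jumped. Analysis involves a lot of repetitive steps that still take as long as ever, so I looked for a way to streamline them and started writing my own Python extension for the crash utility.
But there is nothing new under the sun: I discovered that one had already been built.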
It is a crash extension called mPyKdump.
https://sourceforge.net/p/pykdump/wiki/Home/
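Some background first. crash, the tool for analyzing kernel core dumps, can load and run external C or Python code from inside a session.
Most extensions are written in C and compiled into modules, which are loaded after crash starts.
mPyKdump is written in Python instead. You can modify a script and the change takes effect immediately, with no compilation step.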
Let's see how much this module can cut down on the usual digging.
(Host information has been removed.)
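Installation is very simple and is documented in the project's git repository, so I'll skip it.
mPyKdump broadly provides two tools, crashinfo and xportshow. Here I'll take a closer look at crashinfo.
Start crash, load the extension, and run help to see the newly added commands.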
crash64> extend /usr/local/lib64/mpykdump64.so
Setting scroll off while initializing PyKdump
/usr/local/lib64/mpykdump64.so: shared object loaded
crash64> help
* extend mach scsi tslog
alias files mod scsishow union
ascii foreach mount search vm
bpf fregs net set vtop
bt fuser nfsshow sig waitq
btop gdb p struct whatis
crashinfo hanginfo ps swap wr
dev help pte sym xportshow
dis ipcs ptob sys q
dmshow irq ptov task
epython kmem rd taskinfo
eval list repeat timer
exit log runq tree
Several new commands have been added, including crashinfo, dmshow, fregs, scsishow, tslog, nfsshow, hanginfo, taskinfo, xportshow, and epython.
Let's take a look at crashinfo.
crash64> crashinfo -h
Usage: crashinfo [options]
Options:
-h, --help show this help message and exit
-v verbose output
-q quiet mode - print warnings only
--fast Fast mode - do not run potentially slow tests
--sysctl Print sysctl info.
--ext3 Print EXT3 info.
--blkreq Print Block I/O requests
--blkdevs Print Block Devices Info
--filelock Print filelock info.
--stacksummary Print stacks (bt) categorized summary.
--findstacks=FINDSTACKS
Print stacks (bt) containing functions that match the provided pattern
--checkstacks Check stacks of all threads for corruption
--decodesyscalls=DECODESYSCALLS
Decode Syscalls on the Stack
--keventd_wq Decode keventd_wq
--kblockd_wq Decode kblockd_workqueue
--lws Print Locks Waitqueues and Semaphores
--devmapper Print DeviceMapper Tables
--runq Print Runqueus
--semaphore=SEMA Print 'struct semaphore' info
--rwsemaphore=RWSEMA Print 'struct rw_semaphore' info
--mutex=MUTEX Print Mutex info
--umem Print User-space Memory Usage
--ls=LS Emulate 'ls'. You can specify either dentry address or full pathname
--workqueues Print Workqueues - just for some kernels
--radix_tree_element=root offset
Find and print a radix tree element
--pci Print PCI Info
--version Print program version and exit
** Execution took 1.97s (real) 1.89s (CPU)
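The options in this help text alone should make your eyes widen.
Previously you had to check each of these by hunting down memory addresses one by one, and here each one is a single option.
First, let's just run it with no options.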
crash64> crashinfo
************************ crashinfo *************************
/ovs1disk/nfs_local/vmcore/OS_var_crash_File/******/127.0.0.1-2017-12-17-18:34:31/vmcore (4.1.12-94.3.8.el6uek.x86_64)
+==========================+
| *** Crashinfo v1.3.4 *** |
+==========================+
+++WARNING+++ PARTIAL DUMP with size(vmcore) < 25% size(RAM)
KERNEL: vmlinux
DUMPFILE: 127.0.0.1-2017-12-17-18:34:31/vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Sun Dec 17 18:34:01 2017
UPTIME: 97 days, 02:15:47
LOAD AVERAGE: 1.39, 1.62, 1.34
TASKS: 1190
NODENAME: ******
RELEASE: 4.1.12-94.3.8.el6uek.x86_64
VERSION: #2 SMP Fri Jun 30 11:00:28 PDT 2017
MACHINE: x86_64 (3492 Mhz)
MEMORY: 71.7 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000068"
+--------------------------+
>------------------------| Per-cpu Stacks ('bt -a') |------------------------<
+--------------------------+
-- CPU#0 --
PID=0 CPU=0 CMD=swapper/0
#0 crash_nmi_callback+0x38
#1 nmi_handle+0x87
#2 default_do_nmi+0x5e
#3 do_nmi+0xf5
#4 end_repeat_nmi+0x1a
#-1 intel_idle+0xb4, 507 bytes of data
#5 intel_idle+0xb4
#6 cpuidle_enter_state+0x8d
#7 cpuidle_enter+0x17
#8 cpuidle_idle_call+0xe4
#9 cpu_idle_loop+0x1f5
#10 cpu_startup_entry+0x5f
#11 rest_init+0x7c
#12 start_kernel+0x427
#13 x86_64_start_reservations+0x2a
#14 x86_64_start_kernel+0x19c
-- CPU#1 --
PID=0 CPU=1 CMD=swapper/1
#0 crash_nmi_callback+0x38
#1 nmi_handle+0x87
#2 default_do_nmi+0x5e
#3 do_nmi+0xf5
#4 end_repeat_nmi+0x1a
#-1 intel_idle+0xb4, 507 bytes of data
#5 intel_idle+0xb4
#6 cpuidle_enter_state+0x8d
#7 cpuidle_enter+0x17
#8 cpuidle_idle_call+0xe4
#9 cpu_idle_loop+0x1f5
#10 cpu_startup_entry+0x5f
#11 start_secondary+0xbb
-- CPU#2 --
PID=0 CPU=2 CMD=swapper/2
#0 crash_nmi_callback+0x38
#1 nmi_handle+0x87
#2 default_do_nmi+0x5e
#3 do_nmi+0xf5
#4 end_repeat_nmi+0x1a
#-1 intel_idle+0xb4, 507 bytes of data
#5 intel_idle+0xb4
#6 cpuidle_enter_state+0x8d
#7 cpuidle_enter+0x17
#8 cpuidle_idle_call+0xe4
#9 cpu_idle_loop+0x1f5
#10 cpu_startup_entry+0x5f
#11 start_secondary+0xbb
-- CPU#3 --
PID=2064 CPU=3 CMD=kworker/3:3
#0 machine_kexec+0x1e0
#1 crash_kexec+0x68
#2 oops_end+0xe8
#3 no_context+0x161
#4 __bad_area_nosemaphore+0x12d
#5 bad_area_nosemaphore+0x13
#6 __do_page_fault+0x328
#7 do_page_fault+0x37
#8 page_fault+0x28
#-1 kernfs_find_ns+0x19, 477 bytes of data
#9 kernfs_find_and_get_ns+0x3c
#10 sysfs_unmerge_group+0x1d
#11 dpm_sysfs_remove+0x2c
#12 device_del+0x58
#13 attribute_container_class_device_del+0x1e
#14 transport_remove_classdev+0x59
#15 attribute_container_device_trigger+0xa0
#16 transport_remove_device+0x15
#17 scsi_target_reap_ref_release+0x32
#18 scsi_target_reap+0x2c
#19 scsi_remove_target+0xeb
#20 fc_starget_delete+0x26
#21 process_one_work+0x151
#22 worker_thread+0x120
#23 kthread+0xce
#24 ret_from_fork+0x42
+--------------------------------+
>---------------------| How This Dump Has Been Created |---------------------<
+--------------------------------+
+---------------+
>------------------------------| Tasks Summary |------------------------------<
+---------------+
Number of Threads That Ran Recently
-----------------------------------
last second 232
last 5s 357
last 60s 514
----- Total Numbers of Threads per State ------
TASK_INTERRUPTIBLE 1185
TASK_RUNNING 2
+++WARNING+++ There are 1 threads running in their own namespaces
Use 'taskinfo --ns' to get more details
+-----------------------+
>--------------------------| 5 Most Recent Threads |--------------------------<
+-----------------------+
PID CMD Age ARGS
----- -------------- ------ ----------------------------
7 rcu_sched 0 ms (no user stack)
10121 crsd.bin 0 ms (no user stack)
8736 ocssd.bin 0 ms (no user stack)
2064 kworker/3:3 0 ms (no user stack)
9313 ocssd.bin 0 ms (no user stack)
+------------------------+
>-------------------------| Memory Usage (kmem -i) |-------------------------<
+------------------------+
PAGES TOTAL PERCENTAGE
TOTAL MEM 18411269 70.2 GB ----
FREE 893464 3.4 GB 4% of TOTAL MEM
USED 17517805 66.8 GB 95% of TOTAL MEM
SHARED 10688869 40.8 GB 58% of TOTAL MEM
BUFFERS 314037 1.2 GB 1% of TOTAL MEM
CACHED 15464058 59 GB 83% of TOTAL MEM
SLAB 379184 1.4 GB 2% of TOTAL MEM
TOTAL HUGE 0 0 ----
HUGE FREE 0 0 0% of TOTAL HUGE
TOTAL SWAP 4194303 16 GB ----
SWAP USED 16335 63.8 MB 0% of TOTAL SWAP
SWAP FREE 4177968 15.9 GB 99% of TOTAL SWAP
COMMIT LIMIT 13399937 51.1 GB ----
COMMITTED 5643195 21.5 GB 42% of TOTAL LIMIT
+-------------------------------+
>----------------------| Scheduler Runqueues (per CPU) |----------------------<
+-------------------------------+
---+ CPU=0 <struct rq 0xffff88127fc17640> ----
| CURRENT TASK <struct task_struct 0xffffffff81ab54e0>, CMD=swapper/0
---+ CPU=1 <struct rq 0xffff88127fc97640> ----
| CURRENT TASK <struct task_struct 0xffff88122a7aaa00>, CMD=swapper/1
---+ CPU=2 <struct rq 0xffff88127fd17640> ----
| CURRENT TASK <struct task_struct 0xffff88122a7ab800>, CMD=swapper/2
---+ CPU=3 <struct rq 0xffff88127fd97640> ----
| CURRENT TASK <struct task_struct 0xffff881191d4f000>, CMD=kworker/3:3
+------------------------+
>-------------------------| Network Status Summary |-------------------------<
+------------------------+
TCP Connection Info
-------------------
ESTABLISHED 117
TIME_WAIT 92
LISTEN 40
NAGLE disabled (TCP_NODELAY): 90
UDP Connection Info
-------------------
550 UDP sockets, 0 in ESTABLISHED
+++WARNING+++ UDP buffer fill >=75% rcv=1 snd=0
Unix Connection Info
------------------------
ESTABLISHED 1626
CLOSE 32
LISTEN 89
Raw sockets info
--------------------
None
Interfaces Info
---------------
How long ago (in seconds) interfaces trasmitted/received?
Name RX TX
---- ---------- ---------
lo n/a n/a
eth8 n/a n/a
eth9 n/a n/a
eth10 n/a n/a
eth11 n/a n/a
eth0 n/a 1.2
eth1 n/a n/a
eth2 n/a n/a
eth3 n/a n/a
eth6 n/a 0.0
eth7 n/a n/a
eth4 n/a n/a
eth5 n/a n/a
bond0 n/a n/a
bond1 n/a n/a
RSS_TOTAL=71867908 pages, %mem= 93.4
+------------+
>-------------------------------| Mounted FS |-------------------------------<
+------------+
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff88122a69c000 ffff88127e810800 rootfs rootfs /
ffff88122a36cc00 ffff88127e815000 proc proc /proc
ffff88122a36cd80 ffff88122a327000 sysfs sysfs /sys
ffff88122a36cf00 ffff88122a320800 devtmpfs devtmpfs /dev
ffff88122a36d080 ffff88122a322800 devpts devpts /dev/pts
ffff88122a36d200 ffff88122a327800 tmpfs tmpfs /dev/shm
ffff88122a36d380 ffff88121f8cc000 ext4 /dev/mapper/vg00-lvroot /
ffff88122a69cc00 ffff881223114800 ext4 /dev/mapper/vg00-lvadmin /Admin
ffff881229af4780 ffff88121ffe6800 ext4 /dev/sda1 /boot
ffff881229af4480 ffff8812256b0000 ext4 /dev/mapper/vg00-lvhome /home
ffff881229af4900 ffff8812256b1800 ext4 /dev/mapper/vg00-lvtmp /tmp
ffff88122a69cf00 ffff881223117800 ext4 /dev/mapper/vg00-lvvar /var
ffff88122a69d080 ffff8812267e8800 ext4 /dev/mapper/vg00-lvcrash /var/crash
ffff881229af4a80 ffff8812256b3000 ext4 /dev/mapper/vg00-lvnetback /Netbackup
ffff881229af4c00 ffff8812256b4800 ext4 /dev/mapper/racoravg-lvracdb /IPMSRAC/oracle/DB
ffff88122a69d200 ffff8812267ea000 ext4 /dev/mapper/racoravg-lvracgrid /IPMSRAC/oracle/GRID
ffff88122a69d380 ffff8812267eb800 ext4 /dev/mapper/racarchvg-lvracarch /IPMSRAC/oracle/ARCH
ffff88122a69d500 ffff8812267ed000 ext4 /dev/mapper/racexpvg-lvracexp /IPMSRAC/oracle/EXP
ffff88122a36dc80 ffff88121fe66000 binfmt_misc none /proc/sys/fs/binfmt_misc
ffff881229af5800 ffff881224fcf000 autofs /etc/auto.misc /misc
ffff881229af5980 ffff881224fc9000 autofs -hosts /net
ffff88122254dc80 ffff881222ca8800 oracleasmfs oracleasmfs /dev/oracleasm
+-------------------------------+
>----------------------| Last 40 lines of dmesg buffer |----------------------<
+-------------------------------+
[8381884.056078] R10: 0000000000000003 R11: 0000000000000001 R12: ffffffff81771ce0
[8381884.056080] R13: 0000000000000000 R14: ffffffff814782e0 R15: ffff880c7c2b0000
[8381884.056083] FS: 0000000000000000(0000) GS:ffff88127fd80000(0000) knlGS:0000000000000000
[8381884.056086] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[8381884.056088] CR2: 0000000000000068 CR3: 0000001226fe3000 CR4: 00000000003406e0
[8381884.056090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[8381884.056092] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[8381884.056093] Stack:
[8381884.056095] ffff880b736c7bb8 ffffffff816e0c0d ffff880b736c7bd8 0000000000000000
[8381884.056099] ffffffff81771ce0 0000000000000000 ffff880b736c7c08 ffffffff8128934c
[8381884.056103] ffff880b736c7c18 ffffffff81ba8700 ffff880c7c2b2c28 ffffffffa00d5020
[8381884.056107] Call Trace:
[8381884.056111] [<ffffffff816e0c0d>] ? _cond_resched+0x1d/0x30
[8381884.056116] [<ffffffff8128934c>] kernfs_find_and_get_ns+0x3c/0x70
[8381884.056120] [<ffffffff8128d2dd>] sysfs_unmerge_group+0x1d/0x60
[8381884.056124] [<ffffffff8147a28c>] dpm_sysfs_remove+0x2c/0x70
[8381884.056127] [<ffffffff8146f4e8>] device_del+0x58/0x230
[8381884.056131] [<ffffffff8146f329>] ? device_remove_file+0x19/0x20
[8381884.056135] [<ffffffff814782e0>] ? transport_add_device+0x20/0x20
[8381884.056138] [<ffffffff81477bbe>] attribute_container_class_device_del+0x1e/0x30
[8381884.056141] [<ffffffff81478339>] transport_remove_classdev+0x59/0x70
[8381884.056144] [<ffffffff81477de0>] attribute_container_device_trigger+0xa0/0xe0
[8381884.056148] [<ffffffff81478295>] transport_remove_device+0x15/0x20
[8381884.056151] [<ffffffff814c9ba2>] scsi_target_reap_ref_release+0x32/0x50
[8381884.056154] [<ffffffff814c9bec>] scsi_target_reap+0x2c/0x40
[8381884.056157] [<ffffffff814cd1ab>] scsi_remove_target+0xeb/0x120
[8381884.056164] [<ffffffffa00ce116>] fc_starget_delete+0x26/0x30 [scsi_transport_fc]
[8381884.056168] [<ffffffff810a14e1>] process_one_work+0x151/0x4b0
[8381884.056173] [<ffffffff810a1960>] worker_thread+0x120/0x480
[8381884.056176] [<ffffffff816e05cb>] ? __schedule+0x30b/0x890
[8381884.056180] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
[8381884.056184] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
[8381884.056188] [<ffffffff810a6b3e>] kthread+0xce/0xf0
[8381884.056191] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
[8381884.056196] [<ffffffff816e52a2>] ret_from_fork+0x42/0x70
[8381884.056199] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
[8381884.056201] Code: 40 00 eb ef 89 c2 eb de 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 30 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 0f 1f 44 00 00 <0f> b7 47 68 48 8b 5f 48 49 89 f1 c1 e8 05 83 e0 01 48 85 d2 0f
[8381884.056244] RIP [<ffffffff81288f49>] kernfs_find_ns+0x19/0x110
[8381884.056248] RSP <ffff880b736c7ba8>
[8381884.056250] CR2: 0000000000000068
******************************************************************************
************************ A Summary Of Problems Found *************************
******************************************************************************
-------------------- A list of all +++WARNING+++ messages --------------------
PARTIAL DUMP with size(vmcore) < 25% size(RAM)
There are 1 threads running in their own namespaces
Use 'taskinfo --ns' to get more details
UDP buffer fill >=75% rcv=1 snd=0
------------------------------------------------------------------------------
** Execution took 6.40s (real) 4.21s (CPU), Child processes: 2.41s
This alone already covers most of a crash dump analysis, because most of the system's state can be checked from this single output.
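Here is one example of how much can be read straight off this text. In the Per-cpu Stacks section, CPUs 0 to 2 were idling in intel_idle, while CPU#3 was in the page-fault path. The rough Python sketch below is my own and not part of mPyKdump. It picks the non-idle CPUs out of that section. The set of idle function names is an assumption based on this one dump.

```python
import re

# Frames that mark a CPU as idle. This set is an assumption based on the
# x86_64 stacks above; other kernels/CPUs may use different idle functions.
IDLE_FRAMES = {"intel_idle", "cpuidle_enter", "cpu_idle_loop", "cpu_startup_entry"}

def busy_cpus(text):
    """Return {cpu: (pid, cmd, frames)} for CPUs whose stack has no idle frame."""
    cpus = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"-- CPU#(\d+) --$", line)
        if header:
            current = {"pid": None, "cmd": None, "frames": []}
            cpus[int(header.group(1))] = current
            continue
        if current is None:
            continue
        task = re.match(r"PID=(\d+) CPU=\d+ CMD=(\S+)", line)
        if task:
            current["pid"], current["cmd"] = int(task.group(1)), task.group(2)
            continue
        # '#N func+0xOFF' frames; the '#-1 ...' exception-frame lines do not match.
        frame = re.match(r"#\d+\s+([\w.]+)\+0x[0-9a-f]+", line)
        if frame:
            current["frames"].append(frame.group(1))
    return {
        cpu: (info["pid"], info["cmd"], info["frames"])
        for cpu, info in cpus.items()
        if not IDLE_FRAMES.intersection(info["frames"])
    }
```

Fed the Per-cpu Stacks text above, it should report only CPU#3 (PID 2064, kworker/3:3).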
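The final summary reports three things. The dump is partial. One thread is running in its own namespace, with a suggestion to use 'taskinfo --ns' to dig further. It also gives its own take on the UDP buffer state.
Admittedly, that final summary isn't very useful here. But if you go through the rest of the crashinfo output section by section, you'll be nothing short of impressed...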
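The "A Summary Of Problems Found" block appears to be a de-duplicated list of the +++WARNING+++ lines scattered through the report. I haven't checked how crashinfo actually builds it. Below is a minimal Python sketch of the same idea, useful when you save crashinfo output to a file and only want the warnings.

```python
MARKER = "+++WARNING+++"

def collect_warnings(text):
    """Return unique +++WARNING+++ messages in first-seen order."""
    seen = []
    for line in text.splitlines():
        idx = line.find(MARKER)
        if idx == -1:
            continue
        msg = line[idx + len(MARKER):].strip()
        if msg and msg not in seen:
            seen.append(msg)
    return seen
```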
Memory usage, a task summary, the five most recent threads, and each CPU's runqueue are all there.
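The Memory Usage (kmem -i) table is simple page arithmetic. The sketch below assumes 4 KiB pages, the x86_64 default. With that assumption, 18411269 pages works out to the 70.2 GB shown. The percentages look truncated rather than rounded: CACHED is 15464058 / 18411269 = 83.99%, yet it is shown as 83%. I'm inferring that from the numbers, not from crash's source.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86_64 default)

def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Convert a page count to GiB (crash labels this 'GB')."""
    return pages * page_size / 2**30

def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB (crash labels this 'MB')."""
    return pages * page_size / 2**20

def percent_of(part, total):
    """Integer percentage, truncated like the values in the table above."""
    return part * 100 // total

if __name__ == "__main__":
    total = 18411269  # TOTAL MEM pages from the kmem -i table above
    for label, pages in [("FREE", 893464), ("USED", 17517805), ("CACHED", 15464058)]:
        print(f"{label:7} {pages_to_gib(pages):5.1f} GB {percent_of(pages, total):3d}% of TOTAL MEM")
```

Run directly, it should print 3.4 / 66.8 / 59.0 GB and 4 / 95 / 83 percent for FREE, USED, and CACHED.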
You can also inspect block I/O requests by passing an option, as below,
crash64> crashinfo --blkreq
-- Request Queues Analysis: Count=0, in_flight=0
-- Requests on blk_cpu_done: Count=0
-- Requests from SLAB Analysis: Count=0, STARTED=0 WRITE=0
** Execution took 0.40s (real) 0.38s (CPU), Child processes: 0.05s
and --stacksummary shows how many threads share each call stack, broken down by command.
crash64> crashinfo --stacksummary
------- 391 stacks like that: ----------
#0 __schedule
#1 schedule
#2 futex_wait_queue_me
#3 futex_wait
#4 do_futex
#5 sys_futex
#6 system_call_fastpath
youngest=0s(pid=10121), oldest=8381851s(pid=7410)
........................
ReportAgent 2 times
auditd 1 times
automount 2 times
busagt 3 times
console-kit-dae 1 times
crsd.bin 29 times
cssdagent 13 times
cssdmonitor 13 times
... (omitted)
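--stacksummary appears to group threads whose call stacks are identical, then count commands within each group. I don't know how it's implemented internally. Below is a minimal Python sketch of the same grouping idea. It assumes you've already extracted (pid, command, frames) for each thread, for example by parsing 'foreach bt' output; that parsing step is left out.

```python
from collections import Counter, defaultdict

def summarize_stacks(tasks):
    """Group tasks by identical call stack.

    tasks: iterable of (pid, comm, frames), frames listed from #0 outward.
    Returns [(frames, sorted_pids, Counter(comm))], largest group first.
    """
    groups = defaultdict(list)
    for pid, comm, frames in tasks:
        groups[tuple(frames)].append((pid, comm))
    summary = []
    for frames, members in groups.items():
        pids = sorted(pid for pid, _ in members)
        comms = Counter(comm for _, comm in members)
        summary.append((frames, pids, comms))
    summary.sort(key=lambda item: len(item[1]), reverse=True)
    return summary
```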
Locks, wait queues, and semaphores normally take a long time to search, so checking them one at a time is a burden.
--lws shows them all at once, as below.
crash64> crashinfo --lws
-- rw_semaphores with count > 0 --
-- rw_semaphores with count <= 0 --
uts_sem 0
umhelper_sem 0
css_set_rwsem 0
all_cpu_access_lock 0
trace_event_sem 0
oom_sem 0
shrinker_rwsem 0
memcg_cache_ids_sem 0
namespace_sem 0
key_types_sem 0
keyring_serialise_link_sem 0
crypto_alg_sem 0
asymmetric_key_parsers_sem 0
pci_bus_sem 0
bus_type_sem 0
pcistub_sem 0
dmar_global_lock 0
pcmcia_socket_list_rwsem 0
ehci_cf_port_reset_rwsem 0
minor_rwsem 0
companions_rwsem 0
cpufreq_rwsem 0
leds_list_lock 0
triggers_list_lock 0
dquirks_rwsem 0
cb_lock 0
-- Non-empty wait_queue_head --
log_wait
PID: 8532 TASK: ffff88122378e200 CPU: 3 COMMAND: "rsyslogd"
kauditd_wait
PID: 7275 TASK: ffff881221ff7000 CPU: 0 COMMAND: "kauditd"
ksm_thread_wait
PID: 40 TASK: ffff88122a364600 CPU: 3 COMMAND: "ksmd"
destroy_waitq
PID: 52 TASK: ffff88122a354600 CPU: 3 COMMAND: "fsnotify_mark"
vt_event_waitqueue
PID: 13616 TASK: ffff8811efd87000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13615 TASK: ffff8811efd86200 CPU: 0 COMMAND: "console-kit-dae"
PID: 13576 TASK: ffff8811efd15400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13618 TASK: ffff881223238e00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13614 TASK: ffff8811efd85400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13585 TASK: ffff8812066c7000 CPU: 0 COMMAND: "console-kit-dae"
PID: 13595 TASK: ffff8811efe3c600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13579 TASK: ffff881206613800 CPU: 1 COMMAND: "console-kit-dae"
PID: 13577 TASK: ffff8811efd12a00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13591 TASK: ffff8811efe38e00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13596 TASK: ffff8811efe3d400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13590 TASK: ffff8811efe38000 CPU: 3 COMMAND: "console-kit-dae"
PID: 13603 TASK: ffff8811efceb800 CPU: 2 COMMAND: "console-kit-dae"
PID: 13584 TASK: ffff8812066c5400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13574 TASK: ffff8811efd16200 CPU: 3 COMMAND: "console-kit-dae"
PID: 13589 TASK: ffff881224395400 CPU: 2 COMMAND: "console-kit-dae"
PID: 13617 TASK: ffff881223238000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13588 TASK: ffff88121fbe4600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13578 TASK: ffff8811efd11c00 CPU: 2 COMMAND: "console-kit-dae"
PID: 13602 TASK: ffff8811efceaa00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13601 TASK: ffff8811efce9c00 CPU: 3 COMMAND: "console-kit-dae"
PID: 13628 TASK: ffff881223b99c00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13629 TASK: ffff881223b9aa00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13632 TASK: ffff881223b9d400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13631 TASK: ffff881223b9c600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13630 TASK: ffff881223b9b800 CPU: 1 COMMAND: "console-kit-dae"
PID: 13633 TASK: ffff881223b9e200 CPU: 0 COMMAND: "console-kit-dae"
PID: 13627 TASK: ffff881223b98e00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13597 TASK: ffff8811efe3e200 CPU: 0 COMMAND: "console-kit-dae"
PID: 13581 TASK: ffff8812066c4600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13625 TASK: ffff88122323f000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13626 TASK: ffff881223b98000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13600 TASK: ffff8811efce8e00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13624 TASK: ffff88122323e200 CPU: 1 COMMAND: "console-kit-dae"
PID: 13623 TASK: ffff88122323d400 CPU: 1 COMMAND: "console-kit-dae"
PID: 13587 TASK: ffff88121fbe2a00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13583 TASK: ffff8812066c3800 CPU: 1 COMMAND: "console-kit-dae"
PID: 13592 TASK: ffff8811efe39c00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13608 TASK: ffff8811efd80000 CPU: 0 COMMAND: "console-kit-dae"
PID: 13613 TASK: ffff8811efd84600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13612 TASK: ffff8811efd83800 CPU: 1 COMMAND: "console-kit-dae"
PID: 13594 TASK: ffff8811efe3b800 CPU: 0 COMMAND: "console-kit-dae"
PID: 13621 TASK: ffff88122323b800 CPU: 1 COMMAND: "console-kit-dae"
PID: 13609 TASK: ffff8811efd80e00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13619 TASK: ffff881223239c00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13599 TASK: ffff8811efce8000 CPU: 0 COMMAND: "console-kit-dae"
PID: 13620 TASK: ffff88122323aa00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13598 TASK: ffff8811efe3f000 CPU: 0 COMMAND: "console-kit-dae"
PID: 13622 TASK: ffff88122323c600 CPU: 1 COMMAND: "console-kit-dae"
PID: 13580 TASK: ffff8812066c0000 CPU: 0 COMMAND: "console-kit-dae"
PID: 13586 TASK: ffff88121fbe1c00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13607 TASK: ffff8811efcef000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13610 TASK: ffff8811efd81c00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13611 TASK: ffff8811efd82a00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13593 TASK: ffff8811efe3aa00 CPU: 0 COMMAND: "console-kit-dae"
PID: 13606 TASK: ffff8811efcee200 CPU: 1 COMMAND: "console-kit-dae"
PID: 13605 TASK: ffff8811efced400 CPU: 3 COMMAND: "console-kit-dae"
PID: 13634 TASK: ffff881223b9f000 CPU: 1 COMMAND: "console-kit-dae"
PID: 13582 TASK: ffff8812066c1c00 CPU: 1 COMMAND: "console-kit-dae"
PID: 13604 TASK: ffff8811efcec600 CPU: 0 COMMAND: "console-kit-dae"
PID: 13575 TASK: ffff8811efd17000 CPU: 3 COMMAND: "console-kit-dae"
md_event_waiters
PID: 8715 TASK: ffff881222bd4600 CPU: 2 COMMAND: "hald"
-- Non-empty struct work_struct --
** Execution took 179.91s (real) 179.87s (CPU)
This looks useful for issues with RDBMSs, which rely heavily on semaphores.
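In this dump, almost the whole "Non-empty wait_queue_head" section is console-kit-dae threads on vt_event_waitqueue. When that list gets long, a tally per queue and per command is easier to read. The Python sketch below parses the text layout shown above. That layout is inferred from this one sample, so treat the parsing rules as an assumption.

```python
import re
from collections import Counter

WAITER = re.compile(r'PID:\s*(\d+)\s+TASK:\s*([0-9a-f]+)\s+CPU:\s*(\d+)\s+COMMAND:\s*"(.*)"')
SECTION = "-- Non-empty wait_queue_head --"

def waiters_per_queue(text):
    """Map wait-queue name -> Counter of waiting commands."""
    counts = {}
    in_section = False
    queue = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("--"):
            # Every '-- ... --' header switches to a new section.
            in_section = (line == SECTION)
            queue = None
            continue
        if not in_section or not line or line.startswith("**"):
            continue
        waiter = WAITER.match(line)
        if waiter:
            if queue is not None:
                counts[queue][waiter.group(4)] += 1
        else:
            # Any non-'PID:' line inside the section is taken as a queue name.
            queue = line
            counts.setdefault(queue, Counter())
    return counts
```

On the output above, this would show vt_event_waitqueue with dozens of console-kit-dae waiters and every other queue with a single waiter.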
You can also view workqueues,
crash64> crashinfo --workqueues
-----------------------WorkQueues - Active only-----------------------
--------fc_wq_13--------- <struct workqueue_struct 0xffff881220187c00>
<struct pool_workqueue 0xffff88127fda1300> active=1 delayed=0
<struct worker_pool 0xffff88127fd96e40> nr_workers=2 nr_idle=1
<struct worker 0xffff88040a93b6c0> kworker/3:3 fc_starget_delete
** Execution took 0.59s (real) 0.60s (CPU)
PCI information is surprisingly tedious to dig up when you suspect a hardware problem, and crashinfo organizes it as neatly as it can.
crash64> crashinfo --pci
ff:08.0 0880: 8086:6f80 (rev 01)
ff:08.2 1101: 8086:6f32 (rev 01)
ff:08.3 0880: 8086:6f83 (rev 01)
83:00.1 0c04: 10df:f100 (rev 03)
... (snip) ...
84:00.0 0c04: 10df:f100 (rev 03)
84:00.1 0c04: 10df:f100 (rev 03)
============================iomem_resource============================
00000000-00000fff : reserved
00001000-0009d3ff : System RAM
0009d400-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c7fff : Video ROM
000c8000-000cfbff : Adapter ROM
000d0000-000d17ff : Adapter ROM
000d1800-000d2fff : Adapter ROM
000d3000-000d47ff : Adapter ROM
000d4800-000d5fff : Adapter ROM
000e0000-000fffff : reserved
000f0000-000fffff : System ROM
00100000-6c0f3fff : System RAM
01000000-016ea9c4 : Kernel code
016ea9c5-01c23a7f : Kernel data
01e03000-020d1fff : Kernel bss
25000000-354fffff : Crash kernel
6c0f4000-6d679fff : reserved
6c55e018-6c55e018 : APEI ERST
6c55e01c-6c55e021 : APEI ERST
6c55e028-6c55e039 : APEI ERST
6c55e040-6c55e04c : APEI ERST
6c55e050-6c56004f : APEI ERST
6d67a000-6d6dbfff : ACPI Tables
6d6dc000-71810fff : ACPI Non-volatile Storage
71811000-8fffffff : reserved
80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
90000000-c7ffbfff : PCI Bus 0000:00
90000000-900fffff : PCI Bus 0000:02
90000000-9001ffff : 0000:02:00.0
90020000-9003ffff : 0000:02:00.0
90040000-9005ffff : 0000:02:00.1
90060000-9007ffff : 0000:02:00.1
90100000-904fffff : PCI Bus 0000:03
90100000-901fffff : 0000:03:00.0
90200000-902fffff : 0000:03:00.0
90300000-903fffff : 0000:03:00.1
90400000-904fffff : 0000:03:00.1
90500000-908fffff : PCI Bus 0000:04
90500000-905fffff : 0000:04:00.0
90600000-906fffff : 0000:04:00.0
90700000-907fffff : 0000:04:00.1
90800000-908fffff : 0000:04:00.1
c5000000-c5ffffff : PCI Bus 0000:07
c5000000-c5ffffff : 0000:07:00.0
c6000000-c61fffff : PCI Bus 0000:05
c6000000-c601ffff : 0000:05:00.3
c6000000-c601ffff : be2net
c6020000-c603ffff : 0000:05:00.3
c6020000-c603ffff : be2net
c6040000-c605ffff : 0000:05:00.2
c6040000-c605ffff : be2net
c6060000-c607ffff : 0000:05:00.2
c6060000-c607ffff : be2net
c6080000-c609ffff : 0000:05:00.1
c6080000-c609ffff : be2net
c60a0000-c60bffff : 0000:05:00.1
c60a0000-c60bffff : be2net
c60c0000-c60dffff : 0000:05:00.0
c60c0000-c60dffff : be2net
c60e0000-c60fffff : 0000:05:00.0
c60e0000-c60fffff : be2net
c6100000-c6103fff : 0000:05:00.3
c6100000-c6103fff : be2net
c6104000-c6107fff : 0000:05:00.2
c6104000-c6107fff : be2net
c6108000-c610bfff : 0000:05:00.1
c6108000-c610bfff : be2net
c610c000-c610ffff : 0000:05:00.0
c610c000-c610ffff : be2net
c6200000-c63fffff : PCI Bus 0000:04
c6200000-c627ffff : 0000:04:00.1
c6200000-c627ffff : ixgbe
c6280000-c62fffff : 0000:04:00.0
c6280000-c62fffff : ixgbe
c6300000-c6303fff : 0000:04:00.1
c6300000-c6303fff : ixgbe
c6304000-c6307fff : 0000:04:00.0
c6304000-c6307fff : ixgbe
c6400000-c65fffff : PCI Bus 0000:03
c6400000-c647ffff : 0000:03:00.1
c6400000-c647ffff : ixgbe
c6480000-c64fffff : 0000:03:00.0
c6480000-c64fffff : ixgbe
c6500000-c6503fff : 0000:03:00.1
c6500000-c6503fff : ixgbe
c6504000-c6507fff : 0000:03:00.0
c6504000-c6507fff : ixgbe
c6800000-c70fffff : PCI Bus 0000:07
c6800000-c6ffffff : 0000:07:00.0
c7000000-c701ffff : 0000:07:00.1
c7020000-c7023fff : 0000:07:00.0
c7024000-c70240ff : 0000:07:00.1
c7100000-c73fffff : PCI Bus 0000:02
c7100000-c71fffff : 0000:02:00.1
c7100000-c71fffff : igb
c7200000-c72fffff : 0000:02:00.0
c7200000-c72fffff : igb
c7300000-c737ffff : 0000:02:00.0
c7380000-c7383fff : 0000:02:00.1
c7380000-c7383fff : igb
c7384000-c7387fff : 0000:02:00.0
c7384000-c7387fff : igb
c7400000-c76fffff : PCI Bus 0000:01
c7400000-c74fffff : 0000:01:00.0
c7500000-c75fffff : 0000:01:00.0
c7600000-c760ffff : 0000:01:00.0
c7600000-c760ffff : megasas: LSI
c7700000-c78fffff : PCI Bus 0000:05
c7700000-c777ffff : 0000:05:00.3
c7780000-c77fffff : 0000:05:00.2
c7800000-c787ffff : 0000:05:00.1
c7880000-c78fffff : 0000:05:00.0
c7900000-c790ffff : 0000:00:14.0
c7900000-c790ffff : xhci-hcd
c7911000-c79110ff : 0000:00:1f.3
c7912000-c79127ff : 0000:00:1f.2
c7912000-c79127ff : ahci
c7913000-c79133ff : 0000:00:1d.0
c7913000-c79133ff : ehci_hcd
c7914000-c79143ff : 0000:00:1a.0
c7914000-c79143ff : ehci_hcd
c7916000-c791600f : 0000:00:16.1
c7917000-c791700f : 0000:00:16.0
c7918000-c79187ff : 0000:00:11.4
c7918000-c79187ff : ahci
c7919000-c7919fff : 0000:00:05.4
c7ffc000-c7ffcfff : dmar1
c8000000-fbffbfff : PCI Bus 0000:80
c8000000-c80fffff : PCI Bus 0000:81
c8000000-c801ffff : 0000:81:00.0
c8020000-c803ffff : 0000:81:00.0
c8040000-c805ffff : 0000:81:00.1
c8060000-c807ffff : 0000:81:00.1
fb900000-fbbfffff : PCI Bus 0000:81
fb900000-fb9fffff : 0000:81:00.1
fb900000-fb9fffff : igb
fba00000-fbafffff : 0000:81:00.0
fba00000-fbafffff : igb
fbb00000-fbb7ffff : 0000:81:00.0
fbb80000-fbb83fff : 0000:81:00.1
fbb80000-fbb83fff : igb
fbb84000-fbb87fff : 0000:81:00.0
fbb84000-fbb87fff : igb
fbc00000-fbcfffff : PCI Bus 0000:84
fbc00000-fbc3ffff : 0000:84:00.1
fbc40000-fbc7ffff : 0000:84:00.0
fbc80000-fbc83fff : 0000:84:00.1
fbc80000-fbc83fff : lpfc
fbc84000-fbc87fff : 0000:84:00.0
fbc84000-fbc87fff : lpfc
fbc88000-fbc88fff : 0000:84:00.1
fbc88000-fbc88fff : lpfc
fbc89000-fbc89fff : 0000:84:00.0
fbc89000-fbc89fff : lpfc
fbd00000-fbdfffff : PCI Bus 0000:83
fbd00000-fbd3ffff : 0000:83:00.1
fbd40000-fbd7ffff : 0000:83:00.0
fbd80000-fbd83fff : 0000:83:00.1
fbd80000-fbd83fff : lpfc
fbd84000-fbd87fff : 0000:83:00.0
fbd84000-fbd87fff : lpfc
fbd88000-fbd88fff : 0000:83:00.1
fbd88000-fbd88fff : lpfc
fbd89000-fbd89fff : 0000:83:00.0
fbd89000-fbd89fff : lpfc
fbe00000-fbefffff : PCI Bus 0000:82
fbe00000-fbe3ffff : 0000:82:00.1
fbe40000-fbe7ffff : 0000:82:00.0
fbe80000-fbe83fff : 0000:82:00.1
... (snip) ...
fbe80000-fbe83fff : lpfc
fbe84000-fbe87fff : 0000:82:00.0
fbe84000-fbe87fff : lpfc
fbe88000-fbe88fff : 0000:82:00.1
fbe88000-fbe88fff : lpfc
fbe89000-fbe89fff : 0000:82:00.0
fbe89000-fbe89fff : lpfc
fbf00000-fbf00fff : 0000:80:05.4
fbffc000-fbffcfff : dmar0
fec00000-fecfffff : PNP0003:00
fec00000-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec40000-fec403ff : IOAPIC 2
fed00000-fed003ff : HPET 0
fed00000-fed003ff : PNP0103:00
fed12000-fed1200f : pnp 00:01
fed12010-fed1201f : pnp 00:01
fed1b000-fed1bfff : pnp 00:01
fed1c000-fed44fff : reserved
fed1c000-fed3ffff : pnp 00:01
fed1f410-fed1f414 : iTCO_wdt
fed45000-fed8bfff : pnp 00:01
fee00000-feefffff : pnp 00:01
fee00000-fee00fff : Local APIC
... (snip) ...
ff000000-ff3fffff : reserved
ff500000-ffffffff : reserved
100000000-127fffffff : System RAM
===========================ioport_resource============================
0000-0cf7 : PCI Bus 0000:00
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-0060 : keyboard
0061-0061 : PNP0800:00
0064-0064 : keyboard
0070-0071 : rtc0
... (snip) ...
70e0-70e3 : ahci
70f0-70f7 : 0000:00:11.4
70f0-70f7 : ahci
7100-7103 : 0000:00:11.4
7100-7103 : ahci
7110-7117 : 0000:00:11.4
7110-7117 : ahci
8000-ffff : PCI Bus 0000:80
** Execution took 0.51s (real) 0.43s (CPU)
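The first block of --pci output uses the familiar "bus:dev.fn class: vendor:device (rev xx)" layout. That makes it easy to filter. The Python sketch below pulls those lines out and flags Fibre Channel HBAs, which are PCI class 0x0c04 (serial bus controller, Fibre Channel subclass). The vendor table holds only the two vendors seen here: 0x8086 is Intel and 0x10df is Emulex. Any other mapping you add would be your own.

```python
import re
from collections import Counter

# Well-known PCI vendor IDs seen in the listing above.
VENDORS = {0x8086: "Intel", 0x10DF: "Emulex"}
CLASS_FIBRE_CHANNEL = 0x0C04  # base class 0x0c (serial bus), subclass 0x04 (Fibre Channel)

# 'bus:dev.fn class: vendor:device (rev xx)', as printed at the top of 'crashinfo --pci'
PCI_LINE = re.compile(
    r"([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\s+([0-9a-f]{4}):\s+"
    r"([0-9a-f]{4}):([0-9a-f]{4})(?:\s+\(rev ([0-9a-f]{2})\))?"
)

def parse_pci(text):
    """Return a list of dicts for each PCI device line; other lines are ignored."""
    devices = []
    for raw in text.splitlines():
        m = PCI_LINE.match(raw.strip())
        if not m:
            continue  # iomem/ioport resource lines do not match
        bus, dev, fn, cls, vendor, device, rev = m.groups()
        devices.append({
            "bdf": f"{bus}:{dev}.{fn}",
            "class": int(cls, 16),
            "vendor": int(vendor, 16),
            "device": int(device, 16),
            "rev": int(rev, 16) if rev else None,
        })
    return devices

def vendor_summary(devices):
    """Count devices per vendor name (unknown vendors shown as hex IDs)."""
    return Counter(VENDORS.get(d["vendor"], hex(d["vendor"])) for d in devices)
```

On this machine, the 0c04 devices at 83:00.x and 84:00.x are the ones bound to lpfc in the iomem_resource listing. That ties in with the fc_starget_delete path in the panic stack.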
User-space memory usage is also totaled up simply, as below.
crash64> crashinfo --umem
RSS_TOTAL=71867908 pages, %mem= 93.4
** Execution took 0.57s (real) 0.30s (CPU), Child processes: 0.30s
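fregs, one of my favorite commands, neatly lists the key registers for each frame on the stack of the process found with bt.
Normally you'd have to scroll back and forth through the output to find them, so it saves a lot of time.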
crash64> fregs
PID: 2064 TASK: ffff881191d4f000 CPU: 3 COMMAND: kworker/3:3
#0 machine_kexec called from 0xffffffff81112c98 <crash_kexec+104>
+R12: 0xffff880b736c7898
+R13: 0xffff880b736c7af8
+R14: 0x9
+R15: 0xffff881191d4f000
+RBP: 0xffff880b736c7958
+RBX: 0xffff880b736c7af8
#1 crash_kexec called from 0xffffffff8101a7f8 <oops_end+232>
+R12: 0x96
+RBP: 0xffff880b736c7988
+RBX: 0x9
#2 oops_end called from 0xffffffff8106d931 <no_context+353>
+R12: 0x96
+R13: 0x68
+RBP: 0xffff880b736c79d8
+RBX: 0xffff880b736c7af8
1 RDI: 0x96
2 RSI: 0xffff880b736c7af8
#3 no_context called from 0xffffffff8106db2d <__bad_area_nosemaphore+301>
+R12: 0x68
+R13: 0xffff880b736c7af8
+R14: 0xffff881191d4f000
+R15: 0x30001
+RBP: 0xffff880b736c7a28
+RBX: 0x0
1 RCX: 0xb
#4 __bad_area_nosemaphore called from 0xffffffff8106dc43 <bad_area_nosemaphore+19>
+R12: 0x0
+R13: 0x0
+R14: 0xffffffff814782e0
+R15: 0xffff881191d4f000
+RBP: 0xffff880b736c7a38
+RBX: 0x68
1 RCX: 0x30001
#5 bad_area_nosemaphore called from 0xffffffff8106e1c8 <__do_page_fault+808>
+RBP: 0xffff880b736c7aa8
2 RDI: 0xffff880b736c7af8
3 RSI: 0x0
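A common next step is to ask where a particular value shows up, such as 0x68, the faulting address in CR2. The Python sketch below parses fregs output into frames and registers. It keeps the leading "+" or digit marker on each register line exactly as printed. I haven't confirmed what those markers mean, so the code doesn't interpret them.

```python
import re

FRAME = re.compile(r"#(\d+)\s+(\S+)\s+called from\s+(0x[0-9a-f]+)")
REG = re.compile(r"([+\d])\s*(R[0-9A-Z]+):\s+(0x[0-9a-f]+)")

def parse_fregs(text):
    """Return [{'num', 'func', 'caller', 'regs': {name: (marker, value)}}]."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        m = FRAME.match(line)
        if m:
            frames.append({"num": int(m.group(1)), "func": m.group(2),
                           "caller": m.group(3), "regs": {}})
            continue
        m = REG.match(line)
        if m and frames:
            marker, name, value = m.groups()
            frames[-1]["regs"][name] = (marker, int(value, 16))
    return frames

def find_value(frames, value):
    """Return (frame number, register name) pairs that hold the given value."""
    return [(f["num"], name)
            for f in frames
            for name, (_, v) in f["regs"].items()
            if v == value]
```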
With taskinfo, you can even view the process list as a tree!
crash64> taskinfo --pstree
init(1)-+-ReportAgent(3635)---4*[{ReportAgent}]
|-SVRemoteConnect(1095)
|-abrtd(13382)
|-atd(13409)
|-auditd(8495)---{auditd}
|-automount(8980)---4*[{automount}]
|-biosagt(25634)
|-bonobo-activati(13691)---{bonobo-activati}
|-bpcd(27885)
|-busagt(25574)---5*[{busagt}]
|-certmonger(13454)
|-console-kit-dae(13572)---63*[{console-kit-dae}]
|-crond(13394)
|-crsd.bin(9712)---46*[{crsd.bin}]
|-cssdagent(8648)---15*[{cssdagent}]
|-cssdmonitor(8625)---15*[{cssdmonitor}]
|-dbus-daemon(8641)
|-dbus-daemon(13643)
|-dbus-launch(13642)
|-devkit-power-da(13647)
|-eecd(24802)---40*[{eecd}]
|-etheragt(25619)
|-evmd.bin(9430)---16*[{evmd.bin}]
|-gconfd-2(13669)
|-gdm-binary(13513)-+-gdm-simple-slav(13542)-+-Xorg(13545)---4*[{Xorg}]
| | |-gdm-session-wor(13734)
| | |-gnome-session(13644)-+-at-spi-registry(13688)
| | | |-gdm-simple-gree(13714)
| | | |-gnome-power-man(13713)
| | | |-metacity(13711)---{metacity}
| | | |-plymouth-log-vi(13712)
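The tree view is essentially a parent-to-children map walked from init. The Python sketch below builds one from (pid, ppid, comm) tuples, which you could take from crash's 'ps' output (PID, PPID, and COMM columns). It renders a simplified indented tree, not the exact pstree layout taskinfo prints. The thread suffix imitates the "---N*[{comm}]" notation seen above, and the per-PID thread counts are assumed to be supplied separately.

```python
from collections import defaultdict

def build_children(procs):
    """procs: iterable of (pid, ppid, comm). Returns ppid -> sorted [(comm, pid)]."""
    children = defaultdict(list)
    for pid, ppid, comm in procs:
        children[ppid].append((comm, pid))
    for kids in children.values():
        kids.sort()
    return children

def thread_suffix(comm, nthreads):
    """pstree-style suffix for extra threads: ---{comm} or ---N*[{comm}]."""
    if nthreads <= 0:
        return ""
    if nthreads == 1:
        return f"---{{{comm}}}"
    return f"---{nthreads}*[{{{comm}}}]"

def render(children, pid, comm, threads=None, depth=0):
    """Return the subtree rooted at pid as indented lines."""
    threads = threads or {}
    lines = ["  " * depth + f"{comm}({pid})" + thread_suffix(comm, threads.get(pid, 0))]
    for child_comm, child_pid in children.get(pid, []):
        lines.extend(render(children, child_pid, child_comm, threads, depth + 1))
    return lines
```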
Per-task memory usage is just as easy to check:
crash64> taskinfo --memory
==== First 8 Tasks reverse-sorted by RSS+SHM ====
PID= 27510 CMD=oracle RSS=11.023 Gb shm=15.002 Gb
PID= 10887 CMD=oracle RSS=4.986 Gb shm=15.002 Gb
PID= 11163 CMD=oracle RSS=4.545 Gb shm=15.002 Gb
PID= 10873 CMD=oracle RSS=4.388 Gb shm=15.002 Gb
PID= 10859 CMD=oracle RSS=4.359 Gb shm=15.002 Gb
PID= 10863 CMD=oracle RSS=4.359 Gb shm=15.002 Gb
PID= 10879 CMD=oracle RSS=4.179 Gb shm=15.002 Gb
PID= 5269 CMD=oracle RSS=3.901 Gb shm=15.002 Gb
==== First 8 Tasks Reverse-sorted by RSS only ====
PID= 27510 CMD=oracle RSS=11.023 Gb shm=15.002 Gb
PID= 10887 CMD=oracle RSS=4.986 Gb shm=15.002 Gb
PID= 11163 CMD=oracle RSS=4.545 Gb shm=15.002 Gb
PID= 10873 CMD=oracle RSS=4.388 Gb shm=15.002 Gb
PID= 10859 CMD=oracle RSS=4.359 Gb shm=15.002 Gb
PID= 10863 CMD=oracle RSS=4.359 Gb shm=15.002 Gb
PID= 10879 CMD=oracle RSS=4.179 Gb shm=15.002 Gb
PID= 5269 CMD=oracle RSS=3.901 Gb shm=15.002 Gb
=== Total Memory in RSS 68.539 Gb
=== Total Memory in SHM 15.004 Gb
** Execution took 2.83s (real) 1.24s (CPU), Child processes: 0.69s
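The two lists above are identical because every task shown has the same shm value (15.002 Gb). Adding a constant doesn't change the order. The Python sketch below reproduces the two orderings. The task with PID 1234 in the test is hypothetical; it has a large shm value and shows when the lists would diverge.

```python
from typing import NamedTuple

class Task(NamedTuple):
    pid: int
    cmd: str
    rss_gb: float
    shm_gb: float

def top_tasks(tasks, n=8, include_shm=True):
    """Return the n largest tasks by RSS+SHM (default) or by RSS alone."""
    if include_shm:
        key = lambda t: t.rss_gb + t.shm_gb
    else:
        key = lambda t: t.rss_gb
    return sorted(tasks, key=key, reverse=True)[:n]
```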
Isn't that great?
I feel like my path of practicing crash-analysis techniques and sharing them has come to an end, lol.
Here is what hanginfo offers:
crash64> hanginfo -h
Usage: hanginfo [options]
Options:
-h, --help show this help message and exit
-v verbose output
--version Print program version and exit
--maxpids=MAXPIDS Maximum number of PIDs to print
--sortbypid Sort by pid (the default is by ran_ago)
--syslogger Print info about hangs on AF_UNIX sockets (such as used by syslogd
--tree Print tree of resources owners (experimental!)
--saphana Print recommendations for SAP HANA specific hangs
Unfortunately, hanginfo didn't show much useful information for the sample I'm using.
One interesting point is that it has a dedicated option for SAP HANA, haha.
tslog, in turn, prefixes the existing dmesg kernel log with wall-clock timestamps.
You can see at a glance when each message occurred and how much time passed between messages.
2017-12-17 18:27:22 [8381485.416618] st 13:0:8:0: reservation conflict
2017-12-17 18:28:05 [8381528.355935] rport-13:0-2: blocked FC remote port time out: removing target and saving binding
2017-12-17 18:28:05 [8381528.356115] lpfc 0000:83:00.0: 2:(0):0203 Devloss timeout on WWPN 50:05:07:60:44:41:c6:01 NPort x010e00 Data: x40000 x1 x0
2017-12-17 18:28:32 [8381555.333221] rport-13:0-3: blocked FC remote port time out: removing target and saving binding
2017-12-17 18:28:32 [8381555.333376] lpfc 0000:83:00.0: 2:(0):0203 Devloss timeout on WWPN 50:05:07:60:44:41:c6:05 NPort x011200 Data: x100 x5 x0
2017-12-17 18:28:35 [8381558.332433] scsi 13:0:0:0: Sequential-Access IBM ULT3580-TD5 F990 PQ: 0 ANSI: 6
2017-12-17 18:28:39 [8381562.334719] st 13:0:0:0: Attached scsi tape st0
2017-12-17 18:28:39 [8381562.334727] st 13:0:0:0: st0: try direct i/o: yes (alignment 4 B)
2017-12-17 18:28:39 [8381562.334933] st 13:0:0:0: Attached scsi generic sg49 type 1
2017-12-17 18:29:02 [8381585.306996] lpfc 0000:83:00.0: 2:(0):2756 LOGO failure DID:011200 Status:x3/x2
2017-12-17 18:29:19 [8381602.293186] st 13:0:10:0: [st10] Error e0008 (driver bt 0x0, host bt 0xe).
2017-12-17 18:29:25 [8381608.287600] rport-13:0-2: blocked FC remote port time out: removing target and saving binding
2017-12-17 18:29:25 [8381608.287693] st 13:0:0:0: rejecting I/O to offline device
2017-12-17 18:29:25 [8381608.287706] scsi 13:0:0:1: rejecting I/O to offline device
2017-12-17 18:29:25 [8381608.287711] scsi 13:0:0:1: killing request
2017-12-17 18:34:01 [8381884.050200] rport-13:0-12: blocked FC remote port time out: removing target and saving binding
2017-12-17 18:34:01 [8381884.050323] lpfc 0000:83:00.0: 2:(0):0203 Devloss timeout on WWPN 50:05:07:60:44:41:c6:0b NPort x010600 Data: x40000 x1 x0
2017-12-17 18:34:01 [8381884.055245] scsi 13:0:10:0: scsi scan: 70 byte inquiry failed. Consider BLIST_INQUIRY_36 for this device
2017-12-17 18:34:01 [8381884.055364] ------------[ cut here ]------------
2017-12-17 18:34:01 [8381884.055377] WARNING: CPU: 3 PID: 2064 at fs/kernfs/dir.c:1253 kernfs_remove_by_name_ns+0xac/0xc0()
2017-12-17 18:34:01 [8381884.055380] kernfs: can not remove 'node_name', no directory
2017-12-17 18:34:01 [8381884.055381] Modules linked in: bridge stp llc ipmi_devintf seos(POE) tcp_diag inet_diag oracleasm autofs4 cpufreq_powersave bonding ipv6 scsi_dh_alua dm_round_robin dm_multipath uinput iTCO_wdt iTCO_vendor_support pcspkr ch osst st sb_edac edac_core i2c_i801 lpc_ich mfd_core xhci_pci xhci_hcd ixgbe mdio igb dca i2c_algo_bit ptp pps_core sg ipmi_ssif i2c_core ipmi_si ipmi_msghandler ext4 jbd2 mbcache2 sd_mod lpfc scsi_transport_fc ahci libahci be2net vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mxm_wmi wmi dm_mirror dm_region_hash dm_log dm_mod
2017-12-17 18:34:01 [8381884.055448] CPU: 3 PID: 2064 Comm: kworker/3:3 Tainted: P OE 4.1.12-94.3.8.el6uek.x86_64 #2
2017-12-17 18:34:01 [8381884.055450] Hardware name: FUJITSU PRIMERGY RX2540 M2/D3289-B1, BIOS V5.0.0.11 R1.15.0 for D3289-B1x 02/24/2017
2017-12-17 18:34:01 [8381884.055462] Workqueue: fc_wq_13 fc_starget_delete [scsi_transport_fc]
2017-12-17 18:34:01 [8381884.055465] 0000000000000000 ffff880b736c7b78 ffffffff816e018f ffff880b736c7bc8
2017-12-17 18:34:01 [8381884.055470] 00000000000004e5 ffff880b736c7bb8 ffffffff810868c5 ffff880b736c7ba8
2017-12-17 18:34:01 [8381884.055474] 0000000000000000 ffffffffa00d22fd 0000000000000000 ffffffff814782e0
2017-12-17 18:34:01 [8381884.055478] Call Trace:
2017-12-17 18:34:01 [8381884.055488] [<ffffffff816e018f>] dump_stack+0x63/0x84
2017-12-17 18:34:01 [8381884.055497] [<ffffffff810868c5>] warn_slowpath_common+0x95/0xe0
2017-12-17 18:34:01 [8381884.055507] [<ffffffff814782e0>] ? transport_add_device+0x20/0x20
2017-12-17 18:34:01 [8381884.055512] [<ffffffff810869c6>] warn_slowpath_fmt+0x46/0x50
2017-12-17 18:34:01 [8381884.055519] [<ffffffff811ea590>] ? kfree+0x130/0x170
2017-12-17 18:34:01 [8381884.055525] [<ffffffff8128a33c>] kernfs_remove_by_name_ns+0xac/0xc0
2017-12-17 18:34:01 [8381884.055528] [<ffffffff8128c3c5>] sysfs_remove_file_ns+0x15/0x20
2017-12-17 18:34:01 [8381884.055533] [<ffffffff8146f329>] device_remove_file+0x19/0x20
2017-12-17 18:34:01 [8381884.055537] [<ffffffff81477b6b>] attribute_container_remove_attrs+0x5b/0x90
2017-12-17 18:34:01 [8381884.055542] [<ffffffff81477bb6>] attribute_container_class_device_del+0x16/0x30
2017-12-17 18:34:01 [8381884.055545] [<ffffffff81478339>] transport_remove_classdev+0x59/0x70
2017-12-17 18:34:01 [8381884.055549] [<ffffffff81477de0>] attribute_container_device_trigger+0xa0/0xe0
2017-12-17 18:34:01 [8381884.055553] [<ffffffff81478295>] transport_remove_device+0x15/0x20
2017-12-17 18:34:01 [8381884.055559] [<ffffffff814c9ba2>] scsi_target_reap_ref_release+0x32/0x50
2017-12-17 18:34:01 [8381884.055562] [<ffffffff814c9bec>] scsi_target_reap+0x2c/0x40
2017-12-17 18:34:01 [8381884.055567] [<ffffffff814cd1ab>] scsi_remove_target+0xeb/0x120
2017-12-17 18:34:01 [8381884.055587] [<ffffffffa00ce116>] fc_starget_delete+0x26/0x30 [scsi_transport_fc]
2017-12-17 18:34:01 [8381884.055596] [<ffffffff810a14e1>] process_one_work+0x151/0x4b0
2017-12-17 18:34:01 [8381884.055601] [<ffffffff810a1960>] worker_thread+0x120/0x480
2017-12-17 18:34:01 [8381884.055604] [<ffffffff816e05cb>] ? __schedule+0x30b/0x890
2017-12-17 18:34:01 [8381884.055608] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
2017-12-17 18:34:01 [8381884.055613] [<ffffffff810a1840>] ? process_one_work+0x4b0/0x4b0
2017-12-17 18:34:01 [8381884.055617] [<ffffffff810a6b3e>] kthread+0xce/0xf0
2017-12-17 18:34:01 [8381884.055621] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
2017-12-17 18:34:01 [8381884.055628] [<ffffffff816e52a2>] ret_from_fork+0x42/0x70
2017-12-17 18:34:01 [8381884.055632] [<ffffffff810a6a70>] ? kthread_freezable_should_stop+0x70/0x70
2017-12-17 18:34:01 [8381884.055639] ---[ end trace 614896ece49fce3d ]---
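At its core, the conversion tslog performs adds each message's seconds-since-boot value to the boot time. The sketch below only illustrates that arithmetic and is not tslog's code. It assumes you already know the wall-clock boot time and that log lines look like [seconds] message. Printk timestamps come from the kernel's local clock, which can drift from real time (for example across suspend), so converted times should be treated as approximate.

```python
import re
from datetime import datetime, timedelta

# Matches the "[seconds.micro]" prefix of a printk line, e.g. "[8381485.416618] msg"
PRINTK_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")

def add_timestamps(lines, boot_time):
    """Yield (wallclock, seconds_since_boot, gap_from_previous, text) per log line."""
    prev = None
    for line in lines:
        m = PRINTK_RE.match(line)
        if not m:
            continue  # skip lines without a printk timestamp
        secs = float(m.group(1))
        when = boot_time + timedelta(seconds=secs)
        gap = 0.0 if prev is None else secs - prev
        prev = secs
        # strftime drops the fractional second, so times are truncated, not rounded
        yield when.strftime("%Y-%m-%d %H:%M:%S"), secs, gap, m.group(2)
```

For example, with a hypothetical boot time of 2017-12-17 00:00:00, a line [3642.500000] second that follows [3600.000000] first comes out as 2017-12-17 01:00:42 with a gap of 42.5 seconds.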
Another advantage of PyKdump is the epython command.
It runs Python extension scripts, whether you wrote them yourself or got them from somewhere else.
crash64> epython -h
Usage:
epython [epythonoptions] [progname [--ehelp] [progoptions] [progargs]]
epythonoptions:
---------------
[-h|--help]
[-v|--version] - report versions
[-d|--debug n] - set debugging level
[-p|--path] - show Python version and syspath
[--ehelp] - show extra options, common for all programs
Let's write a small test script in Python and run it.
# cat test.py
#!/usr/bin/env python
# Minimal option-parsing test script, run inside crash via epython
from optparse import OptionParser

use = "Usage: %prog [options] argument1 argument2"
parser = OptionParser(usage=use)
parser.add_option("-v", "--verbose", dest="verbose", action="store_true",
                  default=False, help="Set mode to verbose.")
parser.add_option("-f", "--filename", dest="write", metavar="FILE",
                  help="write output to FILE")

# The -n option is only registered when no URL is configured
encode_url = ""
if encode_url == "":
    parser.add_option("-n", "--noaction",
                      action="store_true",
                      dest="noaction",
                      default=False,
                      help="Only colorise the output and do not connect to the server")

options, args = parser.parse_args()
# Exactly two positional arguments are expected; print usage and exit otherwise
if len(args) != 2:
    parser.error("exactly two arguments are required")

if options.verbose:
    print("Mode is set to verbose!")
if options.noaction:
    print("Noaction Added!")
print("filename :", options.write)
print("ARGs :", args[0], args[1])
crash64> epython test.py -v -n -f myfile test1 test2
Mode is set to verbose!
Noaction Added!
filename : myfile
ARGs : test1 test2
** Execution took 0.00s (real) 0.00s (CPU)
crash64> epython test.py -v -f myfile test1 test2
Mode is set to verbose!
filename : myfile
ARGs : test1 test2
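As a slightly more practical epython example, here is a sketch of a script that counts tasks by state, using the ST column of crash's ps output. This gives a quick first check for a pile of UN (uninterruptible) tasks when hanginfo shows little. It assumes PyKdump's pykdump.API module provides exec_crash_command(), which returns a crash command's output as a string. Check this against your PyKdump version. The script name ststat.py is hypothetical.

```python
#!/usr/bin/env python
# ststat.py (hypothetical): count tasks per state from crash's "ps" output
from collections import Counter

def count_states(ps_text):
    """Count tasks per ST column value in crash 'ps' output."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        if fields and fields[0] == ">":
            fields = fields[1:]          # drop the active-task marker
        # Data rows start with a numeric PID; this skips the header row
        if len(fields) >= 5 and fields[0].isdigit():
            counts[fields[4]] += 1       # columns: PID PPID CPU TASK ST ...
    return counts

try:
    # Assumed PyKdump API; only importable inside crash via epython
    from pykdump.API import exec_crash_command
except ImportError:
    exec_crash_command = None

if exec_crash_command is not None:
    for state, n in count_states(exec_crash_command("ps")).most_common():
        print("%3s %d" % (state, n))
```

It would be run just like the test script above: crash64> epython ststat.py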
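That covers the main features of the PyKdump module.
Kernel know-how is now so widely published and packaged into tools like this that it has become a tough world for those of us who make a living from it.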
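Extra tip: to load the mPyKdump module automatically, put the extend command in an rc file in your home directory, as shown below.
crash runs it at startup.
The rc file is named after the crash binary you run.
So for crash64 it is .crash64rc rather than .crashrc.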
# cat ~/.crash64rc
extend /usr/local/lib64/mpykdump64.so
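If you set this up on several analysis hosts, a small sketch like the following can add the line idempotently. It derives the rc file name from the program name as described above. The function name and its defaults are hypothetical and simply reuse the paths from this post.

```python
import os

def ensure_extend_line(program="crash64",
                       ext="/usr/local/lib64/mpykdump64.so", home=None):
    """Append 'extend <ext>' to ~/.<program>rc unless that line is already present."""
    home = home or os.path.expanduser("~")
    rc = os.path.join(home, "." + os.path.basename(program) + "rc")
    line = "extend " + ext
    text = ""
    if os.path.exists(rc):
        with open(rc) as f:
            text = f.read()
    if line not in (l.strip() for l in text.splitlines()):
        with open(rc, "a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # don't glue onto an unterminated last line
            f.write(line + "\n")
    return rc
```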