Linux soft lockup analysis
Keywords: watchdog, soft lockup, percpu thread, lockdep, etc.
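I recently ran into a soft lockup problem, with a log message like "[ 56.032356] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [cat:153]".
This message comes from the kernel's lockup detection mechanism, which consists of a soft lockup detector and a hard lockup detector.
I took the opportunity to analyze how the soft lockup mechanism works, what conditions trigger a soft watchdog report, how the watchdog is configured, and how to locate the offending code.
The hard lockup detector is not covered here.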
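As a first step in locating the offending code, it helps to pull the fields out of the report line: the stuck CPU, the stuck duration, and the name and PID of the task that was running. The following is a minimal userspace sketch that assumes the line has the same shape as the message quoted above; `struct soft_lockup_report` and `parse_soft_lockup()` are hypothetical names introduced for illustration, not kernel APIs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fields carried by a "BUG: soft lockup" report line (illustrative type). */
struct soft_lockup_report {
    int cpu;               /* CPU whose watchdog thread did not get to run */
    unsigned int seconds;  /* how long the CPU has been stuck */
    char comm[64];         /* name of the task running on that CPU */
    int pid;               /* PID of that task */
};

/*
 * Parse one log line. Returns 0 on success, -1 if the line is not a
 * soft lockup report. Task names may contain ':' (e.g. "kworker/0:1"),
 * so the name/PID split uses the last ':' inside the brackets.
 */
int parse_soft_lockup(const char *line, struct soft_lockup_report *out)
{
    const char *p = strstr(line, "soft lockup - CPU#");
    const char *open, *close, *colon = NULL, *q;
    size_t len;
    int consumed = 0;

    if (!p)
        return -1;
    /* %n records how many characters were consumed up to and including "s!". */
    if (sscanf(p, "soft lockup - CPU#%d stuck for %us!%n",
               &out->cpu, &out->seconds, &consumed) != 2 || consumed == 0)
        return -1;

    open = strchr(p + consumed, '[');
    close = open ? strrchr(open, ']') : NULL;
    if (!open || !close)
        return -1;

    for (q = open + 1; q < close; q++)
        if (*q == ':')
            colon = q;
    if (!colon)
        return -1;

    len = (size_t)(colon - open - 1);
    if (len == 0 || len >= sizeof(out->comm))
        return -1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';
    out->pid = atoi(colon + 1);
    return 0;
}
```

Applied to the message above, this yields CPU 0, 23 seconds, task `cat`, PID 153, which tells you which task to look at in the accompanying stack dump.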
1. Analysis of the soft lockup mechanism
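lockup_detector_init() first obtains sample_period and watchdog_cpumask. Then, if the watchdog is enabled, it creates the per-CPU threads that feed the dog, and an hrtimer is created to act as the watchdog.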
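The relationship between watchdog_thresh, the soft lockup threshold, and sample_period can be worked through as plain arithmetic. The sketch below is a userspace model of that calculation, assuming the threshold is twice watchdog_thresh and the sample period is one fifth of the threshold; the function names are illustrative stand-ins rather than the kernel's own definitions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Soft lockup threshold in seconds: twice watchdog_thresh. */
int softlockup_thresh_sec(int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/*
 * Sample period in nanoseconds: one fifth of the soft lockup threshold,
 * so the watchdog thread gets several chances to run before a report fires.
 */
uint64_t sample_period_ns(int watchdog_thresh)
{
    return (uint64_t)softlockup_thresh_sec(watchdog_thresh) * (NSEC_PER_SEC / 5);
}
```

With the default watchdog_thresh of 10, the threshold is 20 seconds and the sample period is 4 seconds. This is consistent with the "stuck for 23s" report above, which is printed only after the 20-second threshold has been exceeded.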
Two things deserve attention here: the API used to create the per-CPU kernel threads, and struct smp_hotplug_thread.
void __init lockup_detector_init(void)
{
    /* Compute sample_period = watchdog_thresh * 2 / 5 seconds; with the
     * default watchdog_thresh of 10, the dog is fed every 4 seconds. */
    set_sample_period();
    ...
    cpumask_copy(&watchdog_cpumask, cpu_possible_mask);

    if (watchdog_enabled)
        watchdog_enable_all_cpus();
}

static int watchdog_enable_all_cpus(void)
{
    int err = 0;

    if (!watchdog_running) {
        /* The watchdog is not running yet: create one watchdog/x thread
         * per CPU. Each thread feeds the dog every sample_period.
         * watchdog_threads is the main input describing the watchdog/x
         * threads; watchdog_cpumask selects the CPUs that get one. */
        err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
                                                     &watchdog_cpumask);
        if (err)
            pr_err("Failed to create watchdog threads, disabled\n");
        else
            watchdog_running = 1;
    } else {
        err = update_watchdog_all_cpus();
        if (err) {
            watchdog_disable_all_cpus();
            pr_err("Failed to update lockup detectors, disabled\n");
        }
    }

    if (err)
        watchdog_enabled = 0;

    return err;
}

static void watchdog_disable_all_cpus(void)
{
    if (watchdog_running) {
        watchdog_running = 0;
        smpboot_unregister_percpu_thread(&watchdog_threads);
    }
}

static int update_watchdog_all_cpus(void)
{
    int ret;

    ret = watchdog_park_threads();
    if (ret)
        return ret;

    watchdog_unpark_threads();

    return 0;
}

static int watchdog_park_threads(void)
{
    int cpu, ret = 0;

    atomic_set(&watchdog_park_in_progress, 1);

    for_each_watchdog_cpu(cpu) {
        /* Sets the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the
         * watchdog/x thread then calls its park callback and parks itself. */
        ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
        if (ret)
            break;
    }

    atomic_set(&watchdog_park_in_progress, 0);

    return ret;
}

static void watchdog_unpark_threads(void)
{
    int cpu;

    for_each_watchdog_cpu(cpu)
        /* Clears the KTHREAD_SHOULD_PARK bit in struct kthread->flags; the
         * watchdog/x thread then calls its unpark callback and resumes. */
        kthread_unpark(per_cpu(softlockup_watchdog, cpu));
}
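The park/unpark handshake used by update_watchdog_all_cpus() is easier to follow as a small state machine. The sketch below is a userspace model only: `struct hotplug_thread_model`, `struct thread_model`, and every `model_*` function are hypothetical names introduced for illustration, with the `should_park` field standing in for the KTHREAD_SHOULD_PARK bit. It assumes the per-CPU thread loop checks the flag on each pass and calls the matching callback once per transition.

```c
#include <stdbool.h>

/* Callbacks modeled on the park/unpark members of struct smp_hotplug_thread. */
struct hotplug_thread_model {
    void (*park)(unsigned int cpu);
    void (*unpark)(unsigned int cpu);
};

/* Per-thread state; should_park stands in for KTHREAD_SHOULD_PARK. */
struct thread_model {
    unsigned int cpu;
    bool should_park;
    bool parked;
};

/* Call counters so the effect of each step can be observed. */
int park_calls;
int unpark_calls;

void model_park_cb(unsigned int cpu)   { (void)cpu; park_calls++; }
void model_unpark_cb(unsigned int cpu) { (void)cpu; unpark_calls++; }

/* Controller side: request a park or an unpark by flipping the flag. */
void model_kthread_park(struct thread_model *t)   { t->should_park = true; }
void model_kthread_unpark(struct thread_model *t) { t->should_park = false; }

/*
 * Thread side: one pass of the per-CPU thread loop. A pending park request
 * runs the park callback; a cleared request on a parked thread runs the
 * unpark callback.
 */
void model_thread_step(const struct hotplug_thread_model *ht,
                       struct thread_model *t)
{
    if (t->should_park && !t->parked) {
        if (ht->park)
            ht->park(t->cpu);
        t->parked = true;
    } else if (!t->should_park && t->parked) {
        if (ht->unpark)
            ht->unpark(t->cpu);
        t->parked = false;
    }
}
```

Calling model_kthread_park() followed by model_thread_step() runs the park callback exactly once; a subsequent model_kthread_unpark() and another step run the unpark callback. This mirrors why parking and then unparking every watchdog thread is enough to make each one pick up updated settings.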