The ``edac`` kernel module's goal is to detect and report hardware errors that occur within the computer system running under linux.
(quoted from Documentation/admin-guide/ras.rst)
EDAC can detect both physical memory errors and PCIe errors. This article focuses on the latter.
edac_init();
    -> edac_workqueue_setup();
        -> alloc_ordered_workqueue();
edac_pci_add_device();
    -> INIT_DELAYED_WORK(&pci->work, edac_pci_workq_function);
edac_pci_workq_function();
    -> pci->edac_check();
        -> edac_pci_generic_check();
            -> edac_pci_do_parity_check();
                -> edac_pci_dev_parity_test();
    -> edac_queue_work(&pci->work, delay); // re-queues the work so that edac_pci_workq_function() runs periodically; the period is 1 second by default
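The parity test at the bottom of this call chain, edac_pci_dev_parity_test(), reads the Status register in the configuration space header of each PCIe device and extracts the specific error information from it.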
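To make the Status register check more concrete, here is a minimal user-space sketch of how the error bits could be classified. The three bit masks are the standard ones from the PCI specification (the same values as in the kernel's include/uapi/linux/pci_regs.h). The struct pci_err_counts type and the classify_pci_status() helper are hypothetical names introduced only for this illustration, and the mapping of bits to the parity and non-parity counters reflects my reading of the EDAC source, so treat it as an assumption rather than the exact kernel logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Error bits of the PCI Status register (config space offset 0x06). */
#define PCI_STATUS_PARITY            0x0100 /* Master Data Parity Error */
#define PCI_STATUS_SIG_SYSTEM_ERROR  0x4000 /* Signaled System Error (SERR#) */
#define PCI_STATUS_DETECTED_PARITY   0x8000 /* Detected Parity Error */

/* Hypothetical counters, standing in for pci_parity_count and
 * pci_nonparity_count. */
struct pci_err_counts {
    unsigned int parity;
    unsigned int nonparity;
};

/* Hypothetical helper: update the counters from one Status register value.
 * Assumption: a signaled system error counts as a non-parity error, and
 * each of the two parity bits counts as one parity error. */
static void classify_pci_status(uint16_t status, struct pci_err_counts *counts)
{
    assert(counts != NULL);

    if (status & PCI_STATUS_SIG_SYSTEM_ERROR)
        counts->nonparity++;
    if (status & PCI_STATUS_PARITY)
        counts->parity++;
    if (status & PCI_STATUS_DETECTED_PARITY)
        counts->parity++;
}
```

For example, calling classify_pci_status(0x8100, &c) on a zeroed struct leaves c.parity at 2 and c.nonparity at 0, because both parity bits are set in that value.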
/sys/module/edac_core/parameters/check_pci_errors
/sys/devices/system/edac/pci/pci_nonparity_count
/sys/devices/system/edac/pci/pci_parity_count
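The sysfs files listed above can also be read from a small user-space program. The following is a rough sketch, assuming each file contains a single decimal integer. The read_sysfs_long() helper and the macro names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Paths taken from the list above. */
#define EDAC_CHECK_PCI_ERRORS    "/sys/module/edac_core/parameters/check_pci_errors"
#define EDAC_PCI_NONPARITY_COUNT "/sys/devices/system/edac/pci/pci_nonparity_count"
#define EDAC_PCI_PARITY_COUNT    "/sys/devices/system/edac/pci/pci_parity_count"

/* Hypothetical helper: read one decimal integer from a file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed
 * (for instance when the EDAC module is not loaded). */
static int read_sysfs_long(const char *path, long *out)
{
    FILE *f;
    int parsed;

    assert(path != NULL && out != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    parsed = (fscanf(f, "%ld", out) == 1);
    fclose(f);

    return parsed ? 0 : -1;
}
```

A caller would pass one of the macros, for example read_sysfs_long(EDAC_PCI_PARITY_COUNT, &count), and treat a return value of -1 as "EDAC PCI statistics not available on this system".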