Problem Description
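Users running a server with a Linux operating system such as Red Hat or CentOS sometimes find EDAC memory errors in /var/log/messages, even though the event log of the host's BMC shows no memory error.
For example, a customer reported the following log: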
Feb 4 21:00:06 localhost kernel: EDAC MC6: 1 CE memory read error on CPU_SrcID#3_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x9a97c19 offset:0xac0 grain:32 syndrome:0x0 - err_code:0101:0091 socket:3 imc:0 rank:1 bg:1 ba:1 row:6cb5 col:2f8)
Scope
Hardware: M5 and M6 platform servers
Software: Linux distributions such as RHEL (Red Hat), CentOS, SLES (SUSE), and Ubuntu
Solution
Add the EDAC modules to the kernel module blacklist:
1. List the EDAC modules:
lsmod | grep edac
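Step 1 is normally done with `lsmod | grep edac`. As a minimal sketch of the same check in Python, assuming a Linux host where `/proc/modules` (the file `lsmod` reads) is available, the function below picks out loaded module names that contain "edac". The function name `edac_modules` is illustrative only.

```python
from __future__ import annotations

from pathlib import Path


def edac_modules(proc_modules_text: str) -> list[str]:
    """Return loaded module names containing "edac", in /proc/modules order.

    Each /proc/modules line starts with the module name, followed by its
    size, use count, dependents, state and load address.
    """
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and "edac" in fields[0]:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    proc = Path("/proc/modules")  # the file lsmod reads on Linux
    if proc.exists():
        for name in edac_modules(proc.read_text()):
            print(name)
```

On an affected host this would typically print names such as `skx_edac`; the exact list depends on the platform and kernel.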
2. Edit the blacklist:

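Add the relevant EDAC modules to the end of the blacklist file (a .conf file under /etc/modprobe.d/).
For example, if the query returns skx_edac and edac_core, add only the following entries: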
blacklist skx_edac
blacklist edac_core
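As a sketch of step 2, assuming the modules found in step 1 are `skx_edac` and `edac_core`, the helpers below build one `blacklist <module>` line per module and append them to the end of an existing conf file's text without duplicating entries. The function names are hypothetical, and the code only manipulates strings; writing the result to the chosen file under `/etc/modprobe.d/` is left to the administrator.

```python
from __future__ import annotations


def blacklist_lines(modules: list[str]) -> str:
    """Return one "blacklist <name>" line per unique module, order preserved."""
    seen = []
    for name in modules:
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return "".join(f"blacklist {name}\n" for name in seen)


def append_to_conf(existing: str, modules: list[str]) -> str:
    """Append entries to the end of a conf file's text, skipping ones already present."""
    present = {line.strip() for line in existing.splitlines()}
    new = [m for m in modules if f"blacklist {m.strip()}" not in present]
    # Make sure the new entries start on their own line.
    body = existing if existing.endswith("\n") or not existing else existing + "\n"
    return body + blacklist_lines(new)
```

For example, `append_to_conf("", ["skx_edac", "edac_core"])` yields exactly the two entries shown above.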
3. Reboot, then verify with the command from step 1; the EDAC modules should no longer be listed.
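The verification in step 3 amounts to checking that none of the blacklisted modules appears among the loaded modules. A minimal sketch, assuming the contents of `/proc/modules` and of the blacklist conf file are passed in as strings; the helper names are hypothetical.

```python
from __future__ import annotations


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names are the first field of each non-empty /proc/modules line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def blacklisted(conf_text: str) -> set[str]:
    """Names from "blacklist <module>" lines; comment lines (starting with "#") never match."""
    names = set()
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            names.add(fields[1])
    return names


def still_loaded(proc_modules_text: str, conf_text: str) -> list[str]:
    """Blacklisted modules that are still loaded; an empty list means step 3 passed."""
    return sorted(loaded_modules(proc_modules_text) & blacklisted(conf_text))
```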
¹ÊÕϸùÒò
EDAC¼´error detection and correction£¨´íÎó¼ì²âÓë¾ÀÕý£©£¬ËüÊÇLinuxϵͳÄÚ²¿µÄÒ»ÖÖÕï¶Ï»úÖÆ¡£ÔÚÉÏÃæµÄÈÕÖ¾ÖУ¬¿ÉÒÔÇå³þµØ¿´³öÊÇÄÚ´æ¶Á´íÎó¡£
ÆäÖÐMC¼´memory controller£¨ÄÚ´æ¿ØÖÆPGµç¾º¹ÙÍø£©¡£CEÔò´ú±ícorrectable error£¬ÊÇECCÄÚ´æÖпÉÒÔ¾ÀÕýµÄ´íÎó£¬Ïà¶ÔµØ»¹ÓÐUE£¨uncorrectable error£©¡£
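To make these fields concrete, the sketch below splits the sample log line into the memory controller number, error count, error type (CE/UE), and the key:value pairs in parentheses. The regular expressions are an assumption fitted to the sample line above; other EDAC drivers and kernel versions print different fields.

```python
from __future__ import annotations

import re
from typing import Optional

# Fitted to the sample: "EDAC MC<n>: <count> <CE|UE> <text> on <location> (<fields>)"
HEADER = re.compile(r"EDAC MC(\d+): (\d+) (CE|UE) (.+?) on (\S+) \((.*)\)")
# key:value pairs inside the parentheses, e.g. "socket:3" or "err_code:0101:0091"
FIELD = re.compile(r"(\w+):(\S+)")


def parse_edac(line: str) -> Optional[dict]:
    """Return the decoded fields of an EDAC error line, or None if it does not match."""
    m = HEADER.search(line)
    if m is None:
        return None
    mc, count, kind, desc, location, inner = m.groups()
    record = {
        "mc": int(mc),         # memory controller index (MC6 -> 6)
        "count": int(count),   # number of errors reported by this line
        "type": kind,          # CE = correctable, UE = uncorrectable
        "error": desc,         # e.g. "memory read error"
        "location": location,  # e.g. "CPU_SrcID#3_MC#0_Chan#1_DIMM#0"
    }
    record.update(FIELD.findall(inner))  # field values are kept as strings
    return record
```

Applied to the sample line, this reports MC 6, one CE, on socket 3, channel 1, slot 0.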
ÓÉÓڼĴæPGµç¾º¹ÙÍøÄ¬ÈÏΪֻ¶ÁÒ»´Îºóɾ³ýÊý¾Ý£¬µ±EDACÄ£¿é±ÈBMCÏȴӼĴæPGµç¾º¹ÙÍøÖжÁÈ¡ÁË´íÎóºó£¬BMC½«ÎÞ·¨¶ÁÈ¡µ½´íÎóÐÅÏ¢£¬ËùÒÔÈç·Ç¿Í»§Ã÷È·ÐèÇó½¨Òé½ûÓÃEDACÄ£¿é¡£
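Recommendations and Summary
The host BIOS/BMC has its own fault-handling mechanism that handles CPU, memory, and PCIe device faults in a unified way. For memory errors it also applies a threshold with leaky-bucket filtering: when the error count reaches the threshold, an SMI (System Management Interrupt) is triggered, the BIOS collects information about the memory where the CEs occurred and sends it to the BMC, and the BMC records in the system event log the location of the faulty memory and the error type (correctable error).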
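The threshold and leaky-bucket filtering can be sketched as follows: each CE adds 1 to a level that drains at a fixed rate, and only when the level reaches the threshold is a report produced (in the real system, an SMI followed by a BMC log entry). The threshold and leak rate are hypothetical parameters; the actual firmware values and bucket semantics are not given in this article.

```python
from __future__ import annotations


class LeakyBucket:
    """Each CE adds 1; the level drains by leak_per_sec; reaching threshold fires and resets."""

    def __init__(self, threshold: int, leak_per_sec: float) -> None:
        self.threshold = threshold
        self.leak_per_sec = leak_per_sec
        self.level = 0.0
        self.last_t = 0.0

    def on_ce(self, t: float) -> bool:
        """Record a CE at time t (seconds); return True when a report should fire."""
        elapsed = max(0.0, t - self.last_t)
        self.level = max(0.0, self.level - elapsed * self.leak_per_sec)
        self.last_t = t
        self.level += 1
        if self.level >= self.threshold:
            self.level = 0.0
            return True
        return False


def reports(times: list[float], threshold: int, leak_per_sec: float) -> int:
    """Number of reports produced for CEs arriving at the given times."""
    bucket = LeakyBucket(threshold, leak_per_sec)
    return sum(bucket.on_ce(t) for t in times)
```

With a threshold of 10 and no leakage, a burst of 10 CEs yields one report, while CEs spaced 10 s apart with a leak of 0.5 per second never reach the threshold. A path with no threshold would instead log every one of them.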
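Memory errors recorded through MCE in the OS behave differently: every correctable ECC error triggers a CMCI (Corrected Machine Check Interrupt) so that the OS logs it. There is no threshold, and no memory error isolation is performed, which can confuse customers. We therefore recommend ignoring memory MCE error records in the OS.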
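If the OS-level CE records are to be ignored as recommended, one option when reviewing logs is to count and set aside the EDAC CE lines. A minimal sketch, assuming lines in the format of the sample log; the pattern and the function name `split_ce_noise` are assumptions, and keeping UE lines is a choice of this sketch, since the recommendation concerns correctable errors.

```python
import re

# Matches EDAC correctable-error lines such as "EDAC MC6: 1 CE memory read error on ..."
CE_LINE = re.compile(r"EDAC MC\d+: \d+ CE ")


def split_ce_noise(lines):
    """Return (kept_lines, ce_line_count); EDAC CE lines are counted, not kept."""
    kept, ce_lines = [], 0
    for line in lines:
        if CE_LINE.search(line):
            ce_lines += 1
        else:
            kept.append(line)  # UE lines and unrelated messages are kept
    return kept, ce_lines
```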
