System Log: RAM Test - calab-ntu/gpu-cluster GitHub Wiki


2020/04/07

Node Logs
eureka00
eureka01
[2020/05/14] Tested each memory module individually (6 rounds)
* P000143: Fail, 1 error occurred
* P000137: Fail, 85 errors occurred
* P000144: Pass
* P000140: Pass
* P000135: Pass
* P000136: Pass
* P000139: Pass
* P000138: Pass
[2020/05/19] Reproducibility test (6 rounds)
* P000143: Pass
* P000137: Fail, 73 errors occurred
[2020/05/19] Replaced the failed RAM with healthy RAM from eureka13
eureka02
[2020/04/10] Failed (66 hr, CentOS 7 boot USB)
[2020/04/15] Tested each memory module individually (4 rounds)
* T000364: Pass
* T000368: Fail, 86 errors occurred
* P000047: Pass
* P000048: Pass
* P002249: Pass
* P002250: Pass
* P000051: Pass
* P000052: Pass
[2020/04/20] Reproducibility test (4 rounds)
* T000364: Pass
* T000368: Fail, 108 errors occurred
* P000047: Pass
* P000048: Pass
* P002249: Pass
* P002250: Pass
* P000051: Pass
[2020/04/21] Reproducibility test (4 rounds)
* T000368: Fail, 103 errors occurred
[2020/04/21] Replaced the failed RAM with healthy RAM from eureka33
eureka03
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka04
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka05
[2020/04/06] Passed (120 hr, CentOS 7 boot USB)
[2020/04/10] Failed (6 rounds, Memtest86)
[2020/04/15] Tested each memory module individually (4 rounds)
* P000033: Pass
* P000034: Pass
* P000366: Pass
* P000367: Pass
* P000363: Pass
* P000358: Pass
* P000154: Pass
* P000153: Pass
[2020/04/17] Tested each memory module individually (12 rounds)
* P000033: Pass
* P000034: Pass
* P000366: Pass
* P000367: Pass
* P000363: Pass
* P000358: Pass
* P000154: Pass
* P000153: Pass
[2020/07/06] After replacing the MB, Memtest86 passed (8 rounds)
eureka06
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka07
[2020/04/10] Passed (49 hr, CentOS 7 boot USB)
eureka08
[2020/05/14] Tested each memory module individually (6 rounds)
* P000496: Fail, 3 errors occurred
* P000497: Pass
* P000190: Pass
* P000189: Pass
* P000180: Pass
* P000179: Pass
* P000478: Pass
* P000479: Pass
[2020/05/19] Reproducibility test (6 rounds)
* P000496: Fail, 2 errors occurred
* P000497: Pass
[2020/05/19] Replaced the failed RAM with healthy RAM from eureka21
eureka09
[2020/05/15] Tested each memory module individually (6 rounds)
* P000012: Pass
* P000011: Pass
* P000014: Pass
* P000013: Pass
* P000071: Pass
* P000072: Pass
* P000017: Pass
* P000018: Pass
[2020/05/19] Reproducibility test (6 rounds)
* P000011: Fail, 1 error occurred
* P000012: Pass
[2020/05/19] Replaced the failed RAM with healthy RAM from eureka13
eureka10
[2020/05/05] Tested each memory module individually (6 rounds)
* P000365: Fail, 1 error occurred
* P000178: Fail, 62 errors occurred
* P000177: Fail, 1 error occurred
* P000188: Pass
* P000187: Pass
* P000052: Pass
* P000051: Pass
* P000368: Pass
[2020/05/12] Reproducibility test (6 rounds)
* P000178: Fail, 61 errors occurred
* P000365: Fail, 1 error occurred
* P000177: Fail, 2 errors occurred
[2020/05/13] Replaced the failed RAM with healthy RAM from eureka21
eureka11
[2020/04/10] Failed (66 hr, CentOS 7 boot USB)
[2020/04/16] Tested each memory module individually (4 rounds)
* P000155: Pass
* P000156: Fail, 1 error occurred
* P000073: Pass
* P000074: Pass
* P000157: Pass
* P000158: Pass
* P000008: Pass
* P000009: Pass
[2020/04/20] Reproducibility test (4 rounds)
* P000156: Pass
[2020/04/21] Reproducibility test (4 rounds)
* P000155: Pass
* P000009: Pass
* P000008: Pass
* P000156: Pass
* P000158: Pass
* P000157: Pass
* P000073: Pass
[2020/04/21] Replaced the failed RAM with healthy RAM from eureka33
eureka12
[2020/03/30] Passed (70 hr, CentOS 7 boot USB)
eureka13
[2020/05/18] Tested each memory module individually (6 rounds)
* P000028: Pass
* P000027: Pass
* P000070: Pass
* P000069: Pass
* P000022: Pass
* P000021: Pass
* P000015: Pass
* P000016: Pass
[2020/05/19] Reproducibility test (6 rounds)
* P000028: Pass
* P000027: Pass
eureka14
[2020/05/06] Tested each memory module individually (6 rounds)
* P000481: Fail, 5 errors occurred
* P000480: Pass
* P000486: Pass
* P000466: Pass
* P000107: Pass
* P000108: Pass
* P000101: Pass
* P000102: Pass
[2020/05/12] Reproducibility test (6 rounds)
* P000481: Pass
[2020/05/13] Replaced the failed RAM with healthy RAM from eureka21
eureka15
[2020/05/06] Tested each memory module individually (6 rounds)
* P000133: Fail, 100 errors occurred
* P000111: Fail, 14 errors occurred
* P000104: Pass
* P000114: Pass
* P000103: Pass
* P000110: Pass
* P000109: Pass
* P000112: Pass
[2020/05/12] Reproducibility test (6 rounds)
* P000111: Fail, 1 error occurred
* P000133: Fail, 103 errors occurred
[2020/05/13] Replaced the failed RAM with healthy RAM from eureka21
eureka16
[2020/04/22] Tested each memory module individually (8 rounds)
* P000095: Pass
* P000098: Pass
* P000097: Pass
* P000096: Pass
* P000105: Pass
* P000106: Pass
* P000099: Pass
* P000100: Pass
eureka17
[2020/05/07] Tested each memory module individually (6 rounds)
* P000018: Pass
* P000017: Pass
* P000031: Pass
* P000030: Pass
* P000020: Pass
* P000019: Pass
* P000035: Pass
* P000036: Pass
eureka18
[2020/05/08] Tested each memory module individually (6 rounds)
* P000007: Fail, 5 errors occurred
* P000006: Pass
* P000021: Pass
* P000022: Pass
* P000201: Pass
* P000010: Pass
* P000004: Pass
* P000005: Pass
[2020/05/12] Reproducibility test (6 rounds)
* P000007: Fail, 1 error occurred
[2020/05/13] Replaced the failed RAM with healthy RAM from eureka21
eureka19
[2020/05/11] Tested each memory module individually (6 rounds)
* P000027: Pass
* P000028: Pass
* P000030: Pass
* 25-P000029: Pass
* 25-P000025: Pass
* 25-P000026: Pass
* 25-P000024: Pass
* 25-P000023: Pass
eureka20
[2020/05/11] Tested each memory module individually (6 rounds)
* 35-P000025: Pass
* 35-P000026: Pass
* 35-P000029: Pass
* P000034: Pass
* P000033: Pass
* P000032: Pass
* 35-P000023: Pass
* 35-P000024: Pass
[2020/05/12] Reproducibility test (6 rounds)
* 35-P000025: Pass
eureka21
[2020/05/12] Tested each memory module individually (6 rounds)
* P002264: Pass
* P002263: Pass
* P002247: Pass
* P002248: Pass
* P002234: Pass
* P002266: Pass
* P002265: Pass
* P002233: Pass
eureka22
[2020/04/30] Tested each memory module individually (6 rounds)
* P002244: Pass
* P002243: Pass
* P002256: Pass
* P002255: Pass
* P002240: Pass
* P002239: Pass
* P002246: Pass
* P002245: Pass
eureka23
[2020/04/24] Tested each memory module individually (8 rounds)
* P000467: Pass
* P000495: Fail, 14304 errors occurred
* P000493: Pass
* P000494: Pass
* P000469: Pass
* P000492: Pass
* P000468: Pass
* P000470: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000495: Fail, 14065 errors occurred
[2020/05/04] Replaced the RAM that failed memtest with healthy RAM from eureka33 (P000258)
eureka24
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka25
[2020/04/24] Tested each memory module individually (6 rounds)
* P000146: Fail, 75 errors occurred
* P000361: Fail, 24 errors occurred
* P000152: Pass
* P000145: Pass
* P000360: Pass
* P000150: Pass
* P000149: Pass
* P000151: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000146: Fail, 27 errors occurred
* P000361: Fail, 133 errors occurred
[2020/05/04] Replaced the RAM that failed memtest with healthy RAM from eureka33 (P000264, P000268)
eureka26
[2020/04/22] Tested each memory module individually (6 rounds)
* P000455: Fail, 5 errors occurred
* P000166: Fail, 10 errors occurred
* P000454: Pass
* P000485: Pass
* P000163: Pass
* P000165: Pass
* P000484: Pass
* P000164: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000455: Fail, 12 errors occurred
* P000166: Fail, 3 errors occurred
[2020/05/04] Replaced the RAM that failed memtest with healthy RAM from eureka33 (P000262, P000263)
eureka27
[2020/04/28] Tested each memory module individually (6 rounds)
* P000489: Pass
* P000488: Pass
* P000257: Pass
* P000261: Pass
* P000259: Pass
* P000260: Pass
* P000282: Pass
* P000285: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000489: Pass
eureka28
[2020/04/28] Tested each memory module individually (6 rounds)
* P000040: Pass
* P000039: Pass
* P000462: Pass
* P000463: Pass
* P000045: Pass
* P000046: Pass
* P000464: Pass
* P000465: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000045: Pass
eureka29
[2020/04/28] Tested each memory module individually (6 rounds)
* P000265: Pass
* P000266: Pass
* P000288: Pass
* P000287: Passed 2 rounds, then eureka25 crashed
* P000276: Pass
* P000279: Pass
* P000491: Pass
* P000490: Pass
[2020/04/30] Reproducibility test (8 rounds)
* P000287: Pass
eureka30
[2020/04/29] Tested each memory module individually (6 rounds)
* P000176: Pass
* P000175: Pass
* P000003: Pass
* P000002: Pass
* P000174: Pass
* P000173: Pass
* P000200: Pass
* P000199: Pass
eureka31
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka32
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
eureka33
[2020/04/10] Passed (66 hr, CentOS 7 boot USB)
[2020/04/20] Tested each memory module individually (4 rounds)
* P000258: Pass
* P000263: Pass
* P000262: Pass
* P000264: Pass
* P000268: Pass
* P000267: Pass
* P000452: Pass

Links