Analysis of the ctlog1.txt file by Hale Landis, 23 September 2003

This log file is the result of a 94 hour run of ATACT on a SATA drive. There are 31 errors in this log: 2 data compare errors (wrong sectors read) and 29 other errors (5 of these resulted in an ATADRVR timeout error).

The first thing to notice is that these errors are grouped around various times - a few errors over a minute or two, then a few hours of no errors, followed by another group of errors over a few minutes. The errors do not seem to be grouped around ranges of LBA locations on the media (there is no indication of a media problem on this drive).

You must be very careful when looking at errors on a SATA device. The status reported to the host software by the host controller hardware can be (usually is?) out of sync with the actual device status. SATA seems to have the ability to randomize the ATA status information, making it very difficult to determine what really happened.

High level summary:

* There are 12 errors that appear to be the result of the host or device causing a "reset" (SATA COMINIT?) in the middle of a command.

* There are 8 strange status errors where it appears that the device has "disappeared" from the interface.

* There are 4 errors where the device appears to be hung with BSY=1 status during a data transfer command.

* There are 4 errors where the device appears to be hung with DRQ=1 status during a data transfer command.

* There are 2 data compare errors because the wrong sectors were read from the device. In both cases the retry of the same command did not fail. Could this be a case of the command parameters received by the drive being corrupted? See errors #23 and #27.

* There is 1 ABRT error that appears to be the result of the device receiving a corrupted command. In this case the Sector Number register appears to have changed to FFH, making the command invalid. See error #30.
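The BSY=1, DRQ=1 and ABRT conditions named in the summary come from the ST (Status) and ER (Error) values logged with each error. The following is a minimal Python sketch for decoding a Status byte, assuming the classic ATA Status register bit layout. It is not part of ATACT or ATADRVR, and the names STATUS_BITS and decode_status are made up for illustration. Keep in mind that when BSY=1 the other Status bits are not meaningful.

```python
# Classic ATA Status register bits, most significant first.
# (Assumption: the older bit names DSC, CORR and IDX are used here;
# later ATA revisions mark some of these obsolete.)
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
]


def decode_status(value):
    """Return the names of the bits set in an ATA Status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


# Status values that appear in this log:
print(decode_status(0x50))  # ['DRDY', 'DSC']
print(decode_status(0xD0))  # ['BSY', 'DRDY', 'DSC']
print(decode_status(0x58))  # ['DRDY', 'DSC', 'DRQ']
print(decode_status(0x51))  # ['DRDY', 'DSC', 'ERR']
```

A status of 7FH decodes to every bit except BSY, which is not a combination a working device would report.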
These are all very serious errors, and errors of this kind are extremely unlikely on a PATA interface. If these errors happened on PATA, this system or device would never qualify to be shipped to customers. Why are SATA devices with these error conditions in your local computer store?

Here is a summary of each error...

#1 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 15:42:52 2003
Device Primary 0, command 430058, WRITE MULTIPLE EXT, started at LBA 75283093 47CBA95H (CHS 9149 9 47) for 38188 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 2C 95 BA 7C 40 39 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#2 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 15:43:53 2003
Device Primary 0, command 430071, WRITE DMA EXT, started at LBA 75368719 47E090FH (CHS 9234 8 56) for 424 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- A8 0F 09 7E 40 35 -- -- 00
After Cmd:     -- 00 A8 04 00 00 40 -- D0 D0 --
Device hung BSY=1 during command. Probably some packet was lost on the SATA interface causing the host and device sides to get out of sync.

#3 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 17:13:28 2003
Device Primary 0, command 548731, WRITE SECTORS EXT, started at LBA 97129092 5CA1284H (CHS 30822 3 40) for 27173 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 25 84 12 CA 40 34 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#4 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 17:14:18 2003
Device Primary 0, command 548736, WRITE DMA, started at LBA 97156265 5CA7CA9H (CHS 30849 2 60) for 226 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    BA -- E2 A9 7C CA 40 CA -- -- 00
After Cmd:     -- 00 E2 A9 7C CA 45 -- D0 D0 --
Device hung BSY=1 during command.
Probably some packet was lost on the SATA interface causing the host and device sides to get out of sync.

#5 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 17:39:15 2003
Device Primary 0, command 581590, WRITE MULTIPLE EXT, started at LBA 103134662 625B5C6H (CHS 36780 2 9) for 61071 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 8F C6 B5 25 40 39 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#6 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 17:40:11 2003
Device Primary 0, command 581600, WRITE DMA, started at LBA 103196696 626A818H (CHS 36841 10 51) for 229 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    D2 -- E5 18 A8 26 40 CA -- -- 00
After Cmd:     -- 00 E5 18 A8 26 46 -- D0 D0 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#7 !!!!!!!!!! ERROR !!!!!!!!!! -- Thu Sep 18 18:39:35 2003
Device Primary 0, command 663235, WRITE SECTORS, started at LBA 117408384 6FF8280H (CHS 50940 9 10) for 34 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    39 -- 22 80 82 FF 40 30 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 7F 7F --
Strange status. How does this happen? Here it looks like the device has disappeared.

#8 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 07:26:49 2003
Device Primary 0, command 1732668, WRITE MULTIPLE EXT, started at LBA 301207372 11F40F4CH (CHS 36672 13 26) for 31969 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- E1 4C 0F F4 40 39 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#9 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 07:27:40 2003
Device Primary 0, command 1732675, WRITE DMA EXT, started at LBA 301239564 11F48D0CH (CHS 36704 12 25) for 35 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 23 0C 8D F4 40 35 -- -- 00
After Cmd:     -- 00 23 11 00 00 40 -- D0 D0 --
Device hung BSY=1 during command. Probably some packet was lost on the SATA interface causing the host and device sides to get out of sync.

#10 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 08:22:40 2003
Device Primary 0, command 1814442, READ SECTORS EXT, started at LBA 1159677 11B1FDH (CHS 1150 7 37) for 16468 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 54 FD B1 11 40 24 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#11 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 08:36:48 2003
Device Primary 0, command 1838912, READ MULTIPLE EXT, started at LBA 3792538 39DE9AH (CHS 3762 7 2) for 61873 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- B1 9A DE 39 40 29 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#12 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 08:37:01 2003
Note: This is the retry of the previous error.
Device Primary 0, command 1838916, READ MULTIPLE EXT, started at LBA 3792538 39DE9AH (CHS 3762 7 2) for 61873 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- B1 9A DE 39 40 29 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
And the retry of the previous error also failed! The host and device "reset" themselves - note the "reset signature" values in the registers.

#13 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 11:11:55 2003
Device Primary 0, command 2035524, READ SECTORS, started at LBA 39081183 25454DFH (CHS 38771 0 16) for 80 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    3C -- 50 DF 54 54 40 20 -- -- 02
After Cmd:     -- 00 50 DF 54 54 42 -- 58 58 --
Device hung with DRQ=1 during data transfer - What happened here?
The status would indicate some problem during the PIO data transfer. Is this another case of a lost packet on the SATA interface?

#14 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 11:12:07 2003
Device Primary 0, command 2035538, READ SECTORS EXT, started at LBA 39083486 2545DDEH (CHS 38773 4 51) for 50549 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 75 DE 5D 54 40 24 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#15 !!!!!!!!!! ERROR !!!!!!!!!! -- Fri Sep 19 11:12:26 2003
Note: This is the retry of the previous error.
Device Primary 0, command 2035542, READ SECTORS EXT, started at LBA 39083486 2545DDEH (CHS 38773 4 51) for 50549 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 75 DE 5D 54 40 24 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#16 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 12:23:06 2003
Device Primary 0, command 4136500, READ MULTIPLE EXT, started at LBA 169468225 A19E141H (CHS 37051 3 53) for 26 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 1A 41 E1 19 40 29 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 80 80 --
Strange status. How does this happen? Here it looks like the device has disappeared.

#17 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 12:23:36 2003
Device Primary 0, command 4136949, READ MULTIPLE EXT, started at LBA 2343973 23C425H (CHS 2325 5 59) for 3399 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 47 25 C4 23 40 29 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#18 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 12:25:21 2003
Device Primary 0, command 4137912, WRITE MULTIPLE EXT, started at LBA 211399016 C99B168H (CHS 13113 3 60) for 477 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- DD 68 B1 99 40 39 -- -- 02
After Cmd:     -- 01 01 01 00 00 00 -- 50 50 --
The host and device "reset" themselves - note the "reset signature" values in the registers.

#19 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 12:26:09 2003
Device Primary 0, command 4137967, READ DMA, started at LBA 79494053 4BCFBA5H (CHS 13327 2 24) for 189 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    29 -- BD A5 FB BC 40 C8 -- -- 00
After Cmd:     -- 00 BD A5 FB BC 44 -- D0 D0 --
Device hung BSY=1 during command. Probably some packet was lost on the SATA interface causing the host and device sides to get out of sync.

#20 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:00:54 2003
Device Primary 0, command 4299887, WRITE MULTIPLE, started at LBA 170677617 A2C5571H (CHS 38251 0 34) for 5 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    20 -- 05 71 55 2C 40 C5 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 7F 7F --
Strange status. How does this happen? Here it looks like the device has disappeared.

#21 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:05:43 2003
Device Primary 0, command 4304392, WRITE SECTORS, started at LBA 178284018 AA065F2H (CHS 45797 1 4) for 1 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    82 -- 01 F2 65 A0 40 30 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 7F 7F --
Strange status. How does this happen? Here it looks like the device has disappeared.

#22 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:10:51 2003
Device Primary 0, command 4311557, WRITE MULTIPLE, started at LBA 176167937 A801C01H (CHS 43697 12 30) for 1 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    A4 -- 01 01 1C 80 40 C5 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 7F 7F --
Strange status. How does this happen? Here it looks like the device has disappeared.

#23 !!!!!!!!!! DATA COMPARE ERROR !!!!!!!!!! -- Sat Sep 20 14:12:52 2003
Device Primary 0, command 4313862, READ DMA EXT, started at LBA 18004275 112B933H (CHS 17861 6 10) for 93 sectors, 0 sectors ok, 93 sectors bad, 0 sectors bypassed.
The wrong sectors were read.

#24 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:15:14 2003
Device Primary 0, command 4316129, READ SECTORS, started at LBA 202776557 C161FEDH (CHS 4559 3 33) for 23 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    61 -- 17 ED 1F 16 40 20 -- -- 02
After Cmd:     -- 00 17 ED 1F 16 4C -- 58 58 --
Device hung with DRQ=1 during data transfer - What happened here? The status would indicate some problem during the PIO data transfer. Is this another case of a lost packet on the SATA interface?

#25 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:15:34 2003
Device Primary 0, command 4316423, READ MULTIPLE EXT, started at LBA 246389894 EAF9C86H (CHS 47826 6 45) for 149 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 95 86 9C AF 40 29 -- -- 02
After Cmd:     -- 00 95 0E 00 00 40 -- 58 58 --
Device hung with DRQ=1 during data transfer - What happened here? The status would indicate some problem during the PIO data transfer. Is this another case of a lost packet on the SATA interface?

#26 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:16:04 2003
Device Primary 0, command 4316958, READ MULTIPLE EXT, started at LBA 77265782 49AFB76H (CHS 11116 8 63) for 452 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- C4 76 FB 9A 40 29 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 80 80 --
Strange status. How does this happen? Here it looks like the device has disappeared.

#27 !!!!!!!!!! DATA COMPARE ERROR !!!!!!!!!! -- Sat Sep 20 14:16:25 2003
Device Primary 0, command 4317162, READ DMA EXT, started at LBA 119261875 71BCAB3H (CHS 52779 5 41) for 230 sectors, 0 sectors ok, 230 sectors bad, 0 sectors bypassed.
The wrong sectors were read.

#28 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:19:21 2003
Device Primary 0, command 4319511, SET MULTIPLE MODE, started at LBA 165107559 9D75767H (CHS 32725 2 58) for 8 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    28 -- 08 67 57 D7 40 C6 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 7F 7F --
Strange status. How does this happen? Here it looks like the device has disappeared.

#29 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:19:33 2003
Device Primary 0, command 4319647, READ SECTORS EXT, started at LBA 158577967 973B52FH (CHS 26247 6 38) for 6 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 06 2F B5 73 40 24 -- -- 02
After Cmd:     -- FF FF FF FF FF FF -- 80 80 --
Strange status. How does this happen? Here it looks like the device has disappeared.

#30 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:19:55 2003
Device Primary 0, command 4319926, READ MULTIPLE EXT, started at LBA 76665263 491D1AFH (CHS 10520 12 60) for 6 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    00 -- 06 AF D1 91 40 29 -- -- 02
After Cmd:     -- 04 00 FF 00 00 40 -- 51 51 --
A real error? Probably not. The retry of the same command was successful. Notice how the SN register seems to have changed to FFH.

#31 !!!!!!!!!! ERROR !!!!!!!!!! -- Sat Sep 20 14:20:15 2003
Device Primary 0, command 4319986, WRITE MULTIPLE, started at LBA 260662924 F89668CH (CHS 61986 2 47) for 2 sectors.
ATA Intf Regs: FR ER SC SN CL CH DH CM ST AS DC
Cmd Params:    08 -- 02 8C 66 89 40 C5 -- -- 02
After Cmd:     -- 00 02 8C 66 89 4F -- 58 58 --
Device hung with DRQ=1 during data transfer - What happened here? The status would indicate some problem during the PIO data transfer. Is this another case of a lost packet on the SATA interface?

(This is REALLY REALLY BAD! Hale)
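The register patterns called out in the comments above can be checked mechanically. The following is a rough Python sketch, not ATACT or ATADRVR code: it parses an "After Cmd" line in the register order of the "ATA Intf Regs" header and guesses a category from the values. The function names, the category strings and the order of the checks are assumptions made for illustration, and data compare errors (which log no register dump) are not covered.

```python
# Register column order, taken from the "ATA Intf Regs" header in the log.
REG_NAMES = "FR ER SC SN CL CH DH CM ST AS DC".split()


def parse_regs(line):
    """Parse a 'Cmd Params:' or 'After Cmd:' line into {register: int or None}."""
    fields = line.split(":", 1)[1].split()
    return {name: None if f == "--" else int(f, 16)
            for name, f in zip(REG_NAMES, fields)}


def classify(after):
    """Guess an error category from the 'After Cmd' register snapshot."""
    regs = [after[r] for r in ("ER", "SC", "SN", "CL", "CH", "DH")]
    st = after["ST"]
    if all(v == 0xFF for v in regs):
        return "device disappeared"      # every register reads FFH
    if regs == [0x01, 0x01, 0x01, 0x00, 0x00, 0x00] and st == 0x50:
        return "reset signature"         # values seen after a device reset
    if st & 0x80:
        return "hung with BSY=1"
    if (st & 0x01) and (after["ER"] & 0x04):
        return "ABRT"                    # ERR=1 in Status, ABRT in Error
    if st & 0x08:
        return "hung with DRQ=1"
    return "unclassified"


# "After Cmd" lines from errors #1, #2, #16, #13 and #30 in this log.
for line in [
    "After Cmd: -- 01 01 01 00 00 00 -- 50 50 --",
    "After Cmd: -- 00 A8 04 00 00 40 -- D0 D0 --",
    "After Cmd: -- FF FF FF FF FF FF -- 80 80 --",
    "After Cmd: -- 00 50 DF 54 54 42 -- 58 58 --",
    "After Cmd: -- 04 00 FF 00 00 40 -- 51 51 --",
]:
    print(classify(parse_regs(line)))
```

The all-FFH check comes first because a status of 80H would otherwise be reported as a BSY hang, even though the rest of the registers show the device is not responding at all.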