Deeper research of a broken HDD

Date: November 09, 2020

Introduction

In the previous article, I got access to the data on a failing HDD and even temporarily fixed its NTFS table. The HDD was then gifted for tech experiments, which was a good occasion to run deeper diagnostics on the device. Before the research, the drive’s SMART data looked like this.

SMART before research

Preparing tools

I was surprised to notice that there aren’t many tools for HDD testing, especially for Linux. Some of them (Victoria or MHDD) are based on MS-DOS and were written over 15 years ago, but they are still usable. To use MHDD, the PC must support running hard drives (even SATA ones) in legacy IDE mode. The software author recommends MS-DOS as the testing platform; however, I managed to run it from FreeDOS, an open-source DOS-compatible operating system.

To create a bootable pen drive, I used the command sudo dd if=PATH_TO_FD12LITE.img of=/dev/sda status=progress && sync (on my machine, /dev/sda was the pen drive). The first partition is the bootable one from the image; the second was created manually to hold the MHDD files and store its logs. Since the program is old (its last release dates back to 2005), I made this data partition a small FAT32 one.
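What that dd invocation does can be sketched in a few lines of Python: copy the image onto the target block by block, then flush and fsync, which plays the role of the trailing sync. This is a minimal sketch, not a dd replacement; the function name write_image and the 4 MiB block size are my own assumptions. Like dd, it overwrites whatever is on the target, and writing to a real device node needs root.

```python
import os

def write_image(image_path: str, device_path: str, block_size: int = 4 * 1024 * 1024) -> int:
    """Copy image_path onto device_path block by block, then flush it to disk.

    Roughly mirrors `dd if=IMAGE of=DEVICE` followed by `sync`.
    Returns the number of bytes written.
    """
    written = 0
    # "r+b" opens an existing target (e.g. a block device) without truncating it
    with open(image_path, "rb") as src, open(device_path, "r+b") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
        dst.flush()               # empty Python's userspace buffer
        os.fsync(dst.fileno())    # ask the kernel to push the data to the device, like `sync`
    return written
```

A hypothetical call would be write_image("FD12LITE.img", "/dev/sdX"), where /dev/sdX stands for the pen drive's device node.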

Bootable pen drive

Starting the research

After the bootable drive is created and FreeDOS has started, choose the language and select the option to run without installation, “No, return to DOS”.

FreeDOS

MHDD’s interface, shown below, looks quite intimidating to new users. However, there is a manual explaining how to use it.

MHDD Interface

Step one is connecting the HDD to MHDD: press Shift + F3 to select the drive for scanning.

MHDD choose drive

The next step was a simple scan of the HDD surface for bad blocks, without any parameters. The result of the SCAN command is shown on screen and written to the LOG/MHDD.LOG file.

03.2012  19:15:47   MHDD>SCAN 
03.2012  19:15:53   Scan started
03.2012  19:15:53   MODE: IDE
03.2012  19:15:53   Device: ST1000LM024 HN-M101MBB
03.2012  19:15:53   -------------------------------
03.2012  19:15:53   Lap : 1
03.2012  19:15:53   LBA scan: 0 to 1953525167
03.2012  19:18:46   ■ LBA warning: 37671660
03.2012  19:19:16   ■ LBA warning: 44103015
03.2012  19:19:16   ■ LBA Warning: 44103270
03.2012  19:19:38   ■ LBA Error: 48350664
03.2012  19:19:41   ■ LBA Error: 48482768
...
03.2012  22:47:42   Time spent: 03:31:46
03.2012  22:47:42    Blocks <   3ms = 7560949
03.2012  22:47:42    Blocks <  10ms = 749635
03.2012  22:47:42    Blocks <  50ms = 1382
03.2012  22:47:42    Blocks < 150ms = 36
03.2012  22:47:42    Blocks < 500ms = 19
03.2012  22:47:42    Blocks > 500ms = 6
03.2012  22:47:42   Errors: 32, Warnings: 25
03.2012  22:47:42   Done
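Since MHDD keeps the same information in LOG/MHDD.LOG, the flagged LBAs and the latency histogram can be pulled out with a small parser. This is a sketch based only on the line formats visible in the excerpts here; the regular expressions and the function name parse_scan_log are my own, not part of MHDD.

```python
import re

# Event lines look like "03.2012  19:19:38   ■ LBA Error: 48350664".
# The marker glyph before "LBA" depends on the decoding (■, or þ when misread),
# so it is not matched literally.
EVENT_RE = re.compile(r"LBA (warning|error|timeout): (\d+)", re.IGNORECASE)
# Histogram lines look like "Blocks <  10ms = 749635".
BUCKET_RE = re.compile(r"Blocks ([<>])\s*(\d+)ms = (\d+)")

def parse_scan_log(lines):
    """Collect flagged LBAs per event type and the latency histogram of a SCAN log."""
    events = {"warning": [], "error": [], "timeout": []}
    histogram = {}
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            events[m.group(1).lower()].append(int(m.group(2)))
            continue
        m = BUCKET_RE.search(line)
        if m:
            histogram[f"{m.group(1)}{m.group(2)}ms"] = int(m.group(3))
    return events, histogram
```

The DOS log is presumably written in code page 437, so opening it with open("LOG/MHDD.LOG", encoding="cp437") and passing the file object to parse_scan_log should work, though I have not checked every MHDD version.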

After the scan, I fully erased the HDD with the ERASE command. This can help because a drive sometimes develops “soft” bad blocks, which are caused by software or logical errors (for example, an interrupted write) rather than a hardware fault of those sectors. That wasn’t quite my case (the tested HDD had 0 bad blocks, but continuous read errors), but it was still worth trying.

SMART attributes (F8 shortcut) before the HDD erase:

03.2012  19:04:12   MHDD>SMART ATT
03.2012  19:04:12   HDD: ST1000LM024 HN-M101MBB
03.2012  19:04:12   --------------------------------------------------------
03.2012  19:04:12   SMART attributes:
03.2012  19:04:12               Name                        Val Worst Raw
03.2012  19:04:12   Att #   1 : Read error rate           : 100  100  4927  
03.2012  19:04:12   Att #   5 : Reallocated sectors count : 252  252  0  
03.2012  19:04:12   Att #   9 : Power-on time             : 100  100  10604  
03.2012  19:04:12   Att # 194 : HDA Temperature           :  60   46  40  
03.2012  19:04:12   Att # 195 : Hardware ECC recovered    : 100  100  0  
03.2012  19:04:12   Att # 196 : Reallocate event count    : 252  252  0  
03.2012  19:04:12   Att # 197 : Current pending sectors   :  99   99  230  
03.2012  19:04:36   Att # 200 : Write error rate          : 100  100  25719  

 6.03.2012  19:06:27   MHDD>ERASE 
 6.03.2012  19:06:28   ST1000LM024 HN-M101MBB  LBA: 1,953,525,168

SMART attribute #200 (Write error rate) did not change after the erase, and neither did the other parameters, so I started a scan with the erase option (ERASE DELAYS in the log). This option tries to overwrite with zeros the blocks that are unreadable or slower to read than the set delay (350 ms here).

03.2012  19:09:37   MHDD>SCAN 
03.2012  19:09:40   Scan started
03.2012  19:09:40   MODE: IDE
03.2012  19:09:40   ERASE DELAYS: ON, TIME=350 msec
03.2012  19:09:40   Device: ST1000LM024 HN-M101MBB
03.2012  19:09:40   -------------------------------
03.2012  19:09:40   Lap : 1
03.2012  19:09:40   LBA scan: 0 to 1000000000
03.2012  19:12:07   Erase 255 sectors starting from 32136885
03.2012  19:12:11   Erase 255 sectors starting from 32136885
03.2012  19:12:14   ■ LBA Error: 32137140
...
03.2012  19:16:29   Erase 255 sectors starting from 32152185
03.2012  19:16:29   ■ LBA Timeout: 32152185
03.2012  19:16:29   Last scanned LBA: 32152439
03.2012  19:16:29   Blocks erased with EraseWaits: 72
03.2012  19:16:29    Blocks <   3ms = 125868
03.2012  19:16:29    Blocks <  10ms = 161
03.2012  19:16:29    Blocks <  50ms = 5
03.2012  19:16:29    Blocks < 150ms = 2
03.2012  19:16:29    Blocks < 500ms = 5
03.2012  19:16:29    Blocks > 500ms = 0
03.2012  19:16:29   Errors: 11, Warnings: 5
03.2012  19:16:29   Done
03.2012  19:16:32   ST1000LM024 HN-M101MBB  LBA: 1,953,525,168
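The erase-delays idea can be modelled roughly as follows: read each 255-sector block, time the read, and zero-fill the block if the read fails or exceeds the threshold. This is a hypothetical model of the behaviour described above, not MHDD’s actual code; read_block and write_block are placeholder callbacks, and the 512-byte sector size is an assumption. The clock is injectable so the logic can be tried without real hardware.

```python
import time
from typing import Callable, List

SECTORS_PER_BLOCK = 255   # matches the "Erase 255 sectors" lines in the log
SECTOR_BYTES = 512        # assumed logical sector size
THRESHOLD_S = 0.350       # "ERASE DELAYS: ON, TIME=350 msec"

def scan_with_erase_delays(
    read_block: Callable[[int, int], bool],
    write_block: Callable[[int, bytes], None],
    total_sectors: int,
    threshold_s: float = THRESHOLD_S,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Zero-fill every block whose read fails or takes longer than threshold_s.

    read_block(lba, count) returns True on success; write_block(lba, data) writes data.
    Returns the starting LBAs of the blocks that were overwritten.
    """
    erased: List[int] = []
    for lba in range(0, total_sectors, SECTORS_PER_BLOCK):
        count = min(SECTORS_PER_BLOCK, total_sectors - lba)
        started = clock()
        ok = read_block(lba, count)
        elapsed = clock() - started
        if not ok or elapsed > threshold_s:
            # A fresh write lets the drive rewrite the sectors and their ECC,
            # which can clear "soft" bad blocks and pending-sector entries.
            write_block(lba, bytes(count * SECTOR_BYTES))
            erased.append(lba)
    return erased
```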

A while later, I resumed the scan, and the result looked like this.

03.2012  19:16:54   MHDD>SCAN 
03.2012  19:17:06   Scan started
03.2012  19:17:06   MODE: IDE
03.2012  19:17:06   ERASE DELAYS: ON, TIME=350 msec
03.2012  19:17:06   Device: ST1000LM024 HN-M101MBB
03.2012  19:17:06   -------------------------------
03.2012  19:17:06   Lap : 1
03.2012  19:17:06   LBA scan: 32146065 to 1000000000
03.2012  19:17:08   Erase 255 sectors starting from 32146065
03.2012  19:17:20   Erase 255 sectors starting from 32146065
03.2012  19:17:20   ■ LBA Timeout: 32146065
03.2012  19:17:20   ■ LBA Warning: 32146320
03.2012  19:17:21   ■ LBA Error: 32147850
03.2012  19:17:21   ■ LBA Warning: 32148105
03.2012  19:17:21   Erase 255 sectors starting from 32148360
03.2012  19:17:25   Erase 255 sectors starting from 32148360
03.2012  19:17:25   ■ LBA Warning: 32148615
03.2012  19:17:26   Erase 255 sectors starting from 32152440
03.2012  19:17:34   Erase 255 sectors starting from 32152440
03.2012  19:17:36   ■ LBA Warning: 32152695
03.2012  19:17:37   ■ LBA Warning: 32154735
03.2012  19:25:59   Erase 255 sectors starting from 140557530
03.2012  19:26:04   Erase 255 sectors starting from 140557530
03.2012  19:26:07   ■ LBA Error: 140557785
03.2012  19:26:07   Erase 255 sectors starting from 140558040
03.2012  19:26:11   Erase 255 sectors starting from 140558040
...
03.2012  19:31:32   Erase 255 sectors starting from 182151345
03.2012  19:31:33   ■ LBA Warning: 182152875
03.2012  19:31:34   Erase 255 sectors starting from 182153640
03.2012  19:31:37   Erase 255 sectors starting from 182153640
03.2012  20:41:10   Blocks erased with EraseWaits: 44
03.2012  20:41:12   Time spent: 01:24:03
03.2012  20:41:12    Blocks <   3ms = 3566480
03.2012  20:41:12    Blocks <  10ms = 228878
03.2012  20:41:12    Blocks <  50ms = 103
03.2012  20:41:12    Blocks < 150ms = 10
03.2012  20:41:12    Blocks < 500ms = 8
03.2012  20:41:12    Blocks > 500ms = 0
03.2012  20:41:12   Errors: 5, Warnings: 8
03.2012  20:41:12   Done

I noticed something strange: the addresses of the bad blocks were completely different from those found during the first scan. For example, there were no problems reading blocks after 100,000,000. The overall number of errors was also smaller than during the first scan. Since I knew where the poorly readable blocks were, I decided to try remapping them.
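The “different addresses” observation can be checked with plain set operations. The LBAs below are only the ones visible in the log excerpts of the first scan and the resumed scan (the full logs are truncated with “...”), so this is an illustration rather than a complete comparison.

```python
# Flagged LBAs visible in the excerpts of the first SCAN and the resumed SCAN
first_scan = {37671660, 44103015, 44103270, 48350664, 48482768}
resumed_scan = {32146065, 32146320, 32147850, 32148105, 32148615,
                32152695, 32154735, 140557785, 182152875}

def compare(a: set, b: set) -> dict:
    """Split two sets of LBAs into shared and scan-specific addresses."""
    return {
        "both": sorted(a & b),
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }

result = compare(first_scan, resumed_scan)
print(result["both"])  # []
```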

03.2012  20:52:38   MHDD>SCAN 
03.2012  20:53:14   Scan started
03.2012  20:53:14   MODE: IDE
03.2012  20:53:14   REMAP: ON
03.2012  20:53:14   Device: ST1000LM024 HN-M101MBB
03.2012  20:53:14   -------------------------------
03.2012  20:53:14   Lap : 1
03.2012  20:53:14   LBA scan: 0 to 182146760
03.2012  20:55:40   ■ LBA Error: 32137140
03.2012  20:59:01   └> Remap try...
03.2012  20:59:01   ■ LBA Error: 32137141
03.2012  21:02:22   └> Remap try...
03.2012  21:02:22   ■ LBA Error: 32137142
03.2012  21:05:42   └> Remap try...
...
03.2012  22:16:09   └> Remap try...
03.2012  22:16:09   ■ LBA Error: 32137164
03.2012  22:19:29   └> Remap try...
03.2012  22:19:30   ■ LBA Timeout: 32137165
03.2012  22:19:30   Last scanned LBA: 32137419
03.2012  22:19:30    Blocks <   3ms = 125867
03.2012  22:19:30    Blocks <  10ms = 147
03.2012  22:19:30    Blocks <  50ms = 13
03.2012  22:19:30    Blocks < 150ms = 1
03.2012  22:19:30    Blocks < 500ms = 0
03.2012  22:19:30    Blocks > 500ms = 0
03.2012  22:19:30   Errors: 25, Warnings: 0
03.2012  22:19:30   Done

It seems the HDD had a serious problem with remapping bad blocks, which is a worrying sign. However, it handled writing data to these blocks well, since the ERASE command didn’t cause any new SMART errors.
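The timestamps in the remap log show how slow each attempt was. A small sketch using only the standard datetime module computes the gaps between the first error line and the following “Remap try...” lines, plus the average over the whole pass (20:55:40 to 22:19:30, 25 errors reported). It assumes all timestamps fall on the same day.

```python
from datetime import datetime

def intervals(stamps):
    """Seconds between consecutive HH:MM:SS timestamps (same day assumed)."""
    parsed = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]

# First error line and the following "Remap try..." lines of the REMAP scan
print(intervals(["20:55:40", "20:59:01", "21:02:22", "21:05:42"]))  # [201, 201, 200]

# Whole remap pass divided by the 25 errors reported in the summary
total = intervals(["20:55:40", "22:19:30"])[0]
print(total, round(total / 25, 1))  # 5030 201.2
```

In other words, each failing sector cost a bit over three minutes, which is why 25 sectors took almost an hour and a half.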

SMART showed the following parameters:

03.2012  23:26:35   MHDD>SMART ATT
03.2012  23:26:35   Getting SMART attributes...
03.2012  23:26:35   SMART READ ATTRIBUTES
03.2012  23:26:35   HDD: ST1000LM024 HN-M101MBB
03.2012  23:26:35   --------------------------------------------------------
03.2012  23:26:35   SMART attributes:
03.2012  23:26:35               Name                        Val Worst Raw
03.2012  23:26:35   Att #   1 : Read error rate           : 100  100  5270  
03.2012  23:26:35   Att #   5 : Reallocated sectors count : 252  252  0  
03.2012  23:26:35   Att #   9 : Power-on time             : 100  100  10615  
03.2012  23:26:35   Att # 194 : HDA Temperature           :  56   45  44  
03.2012  23:26:35   Att # 196 : Reallocate event count    : 252  252  0  
03.2012  23:26:35   Att # 197 : Current pending sectors   : 100   99  24  
03.2012  23:27:15   Att # 200 : Write error rate          : 100  100  25719  

The number of pending sectors (those that could not be read) dropped from 230 to 24, but there still weren’t any reallocated (bad) sectors.
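Comparing two SMART snapshots by hand is error-prone, so here is a small sketch that parses MHDD’s “SMART ATT” lines and reports attributes whose raw value changed. The regular expression is written against the line format shown in these logs and is an assumption about MHDD’s output, not a documented format.

```python
import re

# e.g. "03.2012  19:04:12   Att # 197 : Current pending sectors   :  99   99  230"
ATTR_RE = re.compile(r"Att #\s*(\d+) : (.+?)\s*:\s*(\d+)\s+(\d+)\s+(\d+)")

def parse_smart(lines):
    """Map attribute number -> {name, value, worst, raw}."""
    attrs = {}
    for line in lines:
        m = ATTR_RE.search(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2).strip(),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "raw": int(m.group(5)),
            }
    return attrs

def diff_raw(before, after):
    """Attributes present in both snapshots whose raw value changed: num -> (old, new)."""
    return {
        num: (before[num]["raw"], after[num]["raw"])
        for num in sorted(before.keys() & after.keys())
        if before[num]["raw"] != after[num]["raw"]
    }

before = parse_smart(["Att # 197 : Current pending sectors   :  99   99  230"])
after = parse_smart(["Att # 197 : Current pending sectors   : 100   99  24"])
print(diff_raw(before, after))  # {197: (230, 24)}
```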

The hard disk was then powered off, and after a while the SCAN command was launched again. The result was completely unexpected: 0 read errors.

03.2012  20:08:20   MHDD>SCAN 
03.2012  20:08:41   Scan started
03.2012  20:08:41   MODE: IDE
03.2012  20:08:41   ERASE DELAYS: ON, TIME=350 msec
03.2012  20:08:41   Device: ST1000LM024 HN-M101MBB
03.2012  20:08:41   -------------------------------
03.2012  20:08:41   Lap : 1
03.2012  20:08:41   LBA scan: 0 to 1000000000
03.2012  21:32:18   Blocks erased with EraseWaits: 0
03.2012  21:32:19   Time spent: 01:23:36
03.2012  21:32:19    Blocks <   3ms = 3692574
03.2012  21:32:19    Blocks <  10ms = 228916
03.2012  21:32:19    Blocks <  50ms = 74
03.2012  21:32:19    Blocks < 150ms = 5
03.2012  21:32:19    Blocks < 500ms = 0
03.2012  21:32:19    Blocks > 500ms = 0
03.2012  21:32:19   No warnings, no errors
03.2012  21:32:19   Done
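The clean scan is also a convenient sanity check of the log itself. Assuming MHDD reads in 255-sector blocks (inferred from the “Erase 255 sectors” lines) and 512-byte sectors, the histogram buckets should add up to the number of blocks in the scanned range, and the elapsed time gives the average read speed. For this scan the numbers line up exactly.

```python
SECTORS_PER_BLOCK = 255  # inferred from the "Erase 255 sectors" log lines (assumption)
SECTOR_BYTES = 512       # assumed logical sector size

histogram = [3692574, 228916, 74, 5, 0, 0]  # buckets from the clean scan above
first_lba, last_lba = 0, 1_000_000_000      # "LBA scan: 0 to 1000000000"
elapsed_s = 1 * 3600 + 23 * 60 + 36         # "Time spent: 01:23:36"

sectors = last_lba - first_lba + 1
expected_blocks = -(-sectors // SECTORS_PER_BLOCK)  # integer ceiling division
print(sum(histogram), expected_blocks)  # 3921569 3921569

mb_per_s = sectors * SECTOR_BYTES / elapsed_s / 1e6
print(f"{mb_per_s:.0f} MB/s")  # 102 MB/s
```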

SMART still showed zero reallocated sectors, and the pending sector count was now 0 as well.

03.2012  21:38:19   MHDD>SMART ATT
03.2012  21:38:19   Getting SMART attributes...
03.2012  21:38:19   SMART READ ATTRIBUTES
03.2012  21:38:19   HDD: ST1000LM024 HN-M101MBB
03.2012  21:38:19   --------------------------------------------------------
03.2012  21:38:19   SMART attributes:
03.2012  21:38:19               Name                        Val Worst Raw
03.2012  21:38:19   Att #   1 : Read error rate           : 100  100  5345  
03.2012  21:38:19   Att #   5 : Reallocated sectors count : 252  252  0  
03.2012  21:38:19   Att #   9 : Power-on time             : 100  100  10625  
03.2012  21:38:19   Att # 194 : HDA Temperature           :  43   43  57  
03.2012  21:38:19   Att # 196 : Reallocate event count    : 252  252  0  
03.2012  21:38:19   Att # 197 : Current pending sectors   : 252   99  0  
03.2012  21:38:28   Att # 200 : Write error rate          : 100  100  25719  

As an experiment, I installed a clean Debian-based system on this HDD, and after a while I saw errors during OS boot and higher error counts in SMART (the Write error rate raw value rose from 25719 to 25756).

03.2012  19:11:15   MHDD>SMART ATT
03.2012  19:11:15   Getting SMART attributes...
03.2012  19:11:15   SMART READ ATTRIBUTES
03.2012  19:11:15   HDD: ST1000LM024 HN-M101MBB
03.2012  19:11:15   --------------------------------------------------------
03.2012  19:11:15   SMART attributes:
03.2012  19:11:15               Name                        Val Worst Raw
03.2012  19:11:15   Att #   1 : Read error rate           : 100  100  5349  
03.2012  19:11:15   Att #   9 : Power-on time             : 100  100  10635  
03.2012  19:11:19   Att # 200 : Write error rate          : 100  100  25756  

Conclusion of the research

This HDD seems to have trouble reading blocks in varying sectors, which looks like a problem with the read/write heads.
A hard disk may return data from sectors that previously failed to read after a restart or during a later scan. This means a full disk image can be created with dd_rescue over multiple read passes, even without a costly replacement of the read/write heads with ones taken from another HDD of the exact same model.
NEVER use a broken HDD to store data. This HDD is unstable, causes system file errors even on a clean OS, and further use could lead to complete failure and data loss.
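The multi-pass imaging idea from the conclusion can be sketched as follows: read every block once, remember which ones failed, and retry only those on later passes. This illustrates the general approach, not dd_rescue’s actual implementation; read_block is a hypothetical callback that returns the block’s bytes or None on a read error, and a real tool would write recovered blocks into an image file at the right offsets instead of a dictionary.

```python
from typing import Callable, Dict, List, Optional, Tuple

def multipass_rescue(
    read_block: Callable[[int], Optional[bytes]],
    n_blocks: int,
    max_passes: int = 3,
) -> Tuple[Dict[int, bytes], List[int]]:
    """Return (recovered blocks, blocks still unreadable after max_passes)."""
    recovered: Dict[int, bytes] = {}
    pending = list(range(n_blocks))
    for _ in range(max_passes):
        still_bad: List[int] = []
        for block in pending:
            data = read_block(block)
            if data is None:
                still_bad.append(block)   # retry on the next pass
            else:
                recovered[block] = data
        pending = still_bad
        if not pending:
            break                         # everything was read
    return recovered, pending
```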

Always make backups and check the health of your HDD, and the risk of data loss will be drastically reduced!