When I originally wrote, and then republished on Dan’s Data, my Ground Zero column about hard drives wearing out, I was puzzled by something.
The famous Google study (PDF) of a large population of hard drives found, oddly, that the Self-Monitoring, Analysis, and Reporting Technology (not dreadfully helpfully abbreviated “S.M.A.R.T.”) that’s built into all modern hard drives was pretty much useless for its intended purpose. It just doesn’t often tell you when a drive is on the way out and should be replaced.
Any drive that’s been in service for a couple of years will have a couple of S.M.A.R.T. warning flags thanks to the basic hour counters built into the standard. Those warnings, by themselves, don’t mean much at all. But despite those largely useless warnings that all older drives have, 36% of the drives that failed in the Google study had no warnings at all!
Technically, S.M.A.R.T. should work much better than this. The drive controller board knows when it has to repeatedly retry reads or writes, for instance; that’s the most basic kind of ominous error. S.M.A.R.T. is just a standardised interface to allow drives to tell monitoring software how often stuff like that is happening.
And yet, very often, no such report happens.
S.M.A.R.T. monitoring isn’t completely useless; a drive that actually does report any of the more serious S.M.A.R.T. problems should indeed be replaced. So you should still run some S.M.A.R.T. monitoring utility or other.
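If you'd rather not install a full monitoring utility, here's a rough sketch of the sort of check one does. It assumes you've captured the attribute table that smartmontools prints for `smartctl -A /dev/sda` (the device name is just an example) and it looks for non-zero raw counts on a few attributes. Which attributes count as "the more serious" ones is my assumption for illustration (IDs 5, 197 and 198 here), the function name is made up, and the sample table contains invented numbers, not readings from a real drive.

```python
# Attribute IDs treated as "serious" here are an assumption for illustration,
# not an official list: 5 = reallocated sectors, 197 = sectors pending
# reallocation, 198 = offline uncorrectable sectors.
WATCHED_IDS = {5, 197, 198}


def serious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes whose raw count is non-zero."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # this skips the header row and any blank lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if int(fields[0]) not in WATCHED_IDS:
            continue
        try:
            raw = int(fields[9])  # RAW_VALUE column
        except ValueError:
            continue  # some drives report raw values that aren't plain integers
        if raw > 0:
            flagged[fields[1]] = raw
    return flagged


# Invented sample in the layout `smartctl -A` uses; not real drive data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   078   078   000    Old_age   Always       -       19632
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(serious_attributes(SAMPLE))
```

Note that the big power-on-hours number is deliberately ignored; that's the kind of "warning" every older drive accumulates. And an empty result only means the drive hasn't admitted to anything, which, as the Google numbers show, is not the same as the drive being healthy.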
What I didn’t know was why so many drives die without S.M.A.R.T. reporting anything at all.
Now, however, I’ve got a clue, and I’ve added a piece to the Ground Zero column to mention it.
This Slashdot comment led me to this Usenet post from (someone who says he is) a former Seagate engineer. He alleges that the hard drive manufacturers’ marketing departments just overruled the engineers and made them, in essence, secretly turn off S.M.A.R.T.’s early warning features, to make the drives look more reliable.
Until those drives failed without warning, of course.
But until that happened, they looked super-reliable!
I don’t know whether this is actually true, but it sure does fit the evidence.
Great work, marketroids!