S.M.A.R.T. (for Self-Monitoring, Analysis, and Reporting Technology) is a technology embedded in storage devices like hard disk drives or SSDs, whose goal is to monitor their health status.

S.M.A.R.T. monitors several disk parameters during normal drive operations, like the number of read errors, the drive startup times, or even the environmental conditions. It can also perform on-demand tests on the drive.

S.M.A.R.T. would allow anticipating predictable failures such as those caused by mechanical wear or degradation of the disk surface, as well as unpredictable failures caused by an unexpected defect. Since drives usually don’t fail abruptly, S.M.A.R.T. gives the operating system or the system administrator an opportunity to identify soon-to-fail drives so they can be replaced before any data loss occurs.

What isn’t S.M.A.R.T.?

All that seems wonderful. However, S.M.A.R.T. cannot predict a failure with 100% accuracy nor, on the other hand, guarantee that a drive will not fail without any early warning. It should be used to estimate the likelihood of a failure.

Given the statistical nature of failure prediction, S.M.A.R.T. technology particularly interests companies using a large number of storage units, and field studies have been conducted to estimate how accurately S.M.A.R.T.-reported issues anticipate disk replacement needs in data centers or server farms.

In 2016, Microsoft and The Pennsylvania State University conducted a study focusing on SSDs. According to that study, it appears some S.M.A.R.T. attributes are good indicators of imminent failure.

Reallocated sector count:

While the underlying technology is radically different, this indicator seems as significant in the SSD world as it was in the hard drive world. It is worth mentioning that, because of the wear-leveling algorithms used in SSDs, when several blocks start failing, chances are many more will fail soon.

Program/Erase (P/E) fail count:

This is a symptom of a problem with the underlying flash hardware where the drive was unable to clear or store data in a block. Because of imperfections in the manufacturing process, a few such errors can be anticipated. However, flash memories have a limited number of clear/write cycles. So, once again, a sudden increase in the number of events might indicate the drive has reached its end-of-life limit, and we can anticipate many more memory cells to fail soon.

CRC and Uncorrectable errors (“Data Error”):

These events can be caused either by storage errors or by issues with the drive’s internal communication link. This indicator takes into account both corrected errors (thus without any issue reported to the host system) and uncorrected errors (thus blocks the drive has reported to the host system as unreadable). In other words, correctable errors are invisible to the host operating system, but they nevertheless impact drive performance since data has to be corrected by the drive firmware, and a sector relocation might occur.

SATA downshift count:

Because of temporary disturbances, issues with the communication link between the drive and the host, or internal drive issues, the SATA interface can switch to a lower signaling rate. Downgrading the link below the nominal link rate has an obvious impact on observed drive performance. Selecting a lower signaling rate is not uncommon, especially on older drives. So this indicator is most significant when correlated with the presence of one or several of the preceding ones.

According to the study, 62% of the failed SSDs showed at least one of the above symptoms. However, if you reverse that statement, that also means 38% of the failed SSDs showed none of the above symptoms. The study did not mention, though, whether the failed drives exhibited any other S.M.A.R.T.-reported issues. So this cannot be directly compared to the 36% failure-without-prior-notice mentioned for hard drives in the Google paper.

The Microsoft/Pennsylvania State University paper does not disclose the exact drive models studied, but according to the authors, most of the drives come from the same vendor and span several generations.

The study noticed significant differences in reliability between the different models. For example, the “worst” model studied exhibits a 20% failure rate nine months after the first relocation error and up to a 36% failure rate nine months after the first occurrence of data errors. The “worst” model also happens to be from the older drive generation studied in the paper. On the other hand, for the same symptoms, drives belonging to the youngest generation of devices show only 3% and 20% failure rates, respectively.
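To make the four indicators discussed in this article more concrete, here is a minimal Python sketch that scans an attribute table and reports which of them have a non-zero raw counter. Several things here are assumptions and not part of the study: the table layout mimics what `smartctl -A` from smartmontools typically prints, the attribute names vary between vendors and drive models, the sample values are made up, and the "raw value above zero" rule is a deliberate simplification (the study is about how these counters evolve over time, not about a single snapshot).

```python
# Attribute names as they commonly appear in smartctl output.
# ASSUMPTION: names differ between vendors; adapt this mapping to your drive.
WATCHED = {
    "Reallocated_Sector_Ct": "reallocated sectors",
    "Program_Fail_Count": "P/E fail",
    "Erase_Fail_Count": "P/E fail",
    "Reported_Uncorrect": "data error",
    "UDMA_CRC_Error_Count": "data error",
    "SATA_Downshift_Count": "SATA downshift",
}

# Made-up sample table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       3
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       2
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
"""


def parse_raw_values(table: str) -> dict:
    """Map attribute name -> raw value for every data row of the table."""
    values = {}
    for line in table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have ten columns;
        # the raw value is the tenth one. The header row is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value not a plain integer, ignore the row
    return values


def flagged_symptoms(values: dict) -> list:
    """Return the labels of watched indicators with a non-zero raw counter."""
    hits = {label for name, label in WATCHED.items() if values.get(name, 0) > 0}
    # A SATA downshift on its own is not uncommon, so it is only reported
    # when at least one of the other indicators is present too.
    if hits == {"SATA downshift"}:
        return []
    return sorted(hits)


if __name__ == "__main__":
    print(flagged_symptoms(parse_raw_values(SAMPLE)))
```

Keep in mind that an empty result does not mean the drive is safe: according to the study, 38% of the failed SSDs showed none of these symptoms.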