[kwlug-disc] SSD Failure Symptoms and Recovering Data?

Chris Irwin chris at chrisirwin.ca
Thu Nov 8 12:10:57 EST 2018


On Thu, Nov 8, 2018 at 9:35 AM Ron Singh <ronsingh149 at gmail.com> wrote:

> Sorry to hear of your SSD woes Khalid, must be a ton of read retries
> causing the slow copy?
>

Khalid, presumably you're using ext4 (based on the other thread about
filesystem resizing). Do you have any method of verifying integrity of
files recovered and/or finding files which are missing/truncated due to
read errors?

I'd be especially wary of any data you pull from the drive, particularly if
it was on ext4, which doesn't checksum file data.
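
If you have a backup or another known-good copy to compare against, something
like the following could flag missing, truncated, or silently corrupted files.
This is only a rough sketch: the helper names (sha256_of, compare_trees) and
the directory layout are made up for illustration, and it assumes the
reference copy is itself trustworthy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(reference_dir, recovered_dir):
    """Compare each file under reference_dir with its counterpart in recovered_dir.

    Returns three lists of relative paths: (missing, mismatched, unreadable).
    Both directory arguments are hypothetical; point them at your own copies.
    """
    reference_dir, recovered_dir = Path(reference_dir), Path(recovered_dir)
    missing, mismatched, unreadable = [], [], []
    for ref in sorted(reference_dir.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(reference_dir)
        rec = recovered_dir / rel
        if not rec.is_file():
            missing.append(rel.as_posix())
            continue
        try:
            # A size difference catches truncation cheaply; the hash catches
            # same-size corruption.
            if (rec.stat().st_size != ref.stat().st_size
                    or sha256_of(rec) != sha256_of(ref)):
                mismatched.append(rel.as_posix())
        except OSError:
            # Read errors on either side are reported rather than aborting.
            unreadable.append(rel.as_posix())
    return missing, mismatched, unreadable
```

Calling compare_trees("/path/to/backup", "/path/to/recovered") returns the
three lists. Files that exist only on the recovered side are not reported, and
anything changed since the backup was taken will show up as a mismatch, so
expect some false positives.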

> We use a ton of Samsung 860 Pro and EVO in my world for
> laptops/workstations and cache devices on NAS units. The 860 PRO with its
> MLC tech is likely overkill for an end-user situation and the prices are
> brutal.
>
> The 860 EVO is the perfect blend of decent pricing and unparalleled
> reliability compared to its peers in the TLC realm.
>

I have a Kingston 120GB in my work machine, which has held up well. 3 years
7 months power-on time. It doesn't report lifetime writes via SMART,
however. Also, sample size of 1.
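
For drives that do report it, lifetime writes can often be estimated from the
output of "smartctl -A". The sketch below assumes the drive exposes attribute
241 (commonly labelled Total_LBAs_Written) and that the raw value counts
512-byte units. Both are assumptions: the attribute number and its unit vary
by vendor, so check the drive's documentation before trusting the figure. The
function names are made up for illustration.

```python
def total_lbas_written(smartctl_output):
    """Return the raw value of SMART attribute 241, or None if it is absent.

    Expects the text printed by `smartctl -A`, where each attribute row has
    ten whitespace-separated columns and the raw value is the last one.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "241":
            try:
                return int(fields[9])
            except ValueError:
                return None  # raw value is not a plain integer on this drive
    return None


def lbas_to_terabytes(lbas, bytes_per_unit=512):
    """Convert a raw count to decimal terabytes.

    bytes_per_unit=512 is an assumption; some vendors count larger units.
    """
    return lbas * bytes_per_unit / 1e12
```

A drive like my Kingston, which has no such attribute, would simply give None
from total_lbas_written().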

Going forward, we're pretty much just using Samsung EVOs though. Their
specs, reliability, and price are all excellent, which isn't a combination
found often.


> We have used Intel, Sandisk, Crucial/Micron, and Samsung since about 2009
> and the Samsung are the ones holding up best, only about 4-5 failures in 9
> years within a population of at least 3000 units in play. The Intel units
> were the worst.
>

Regarding the Intel SSDs, they intentionally grenade themselves when they
hit their lifetime writes. The thought being that you can look at the
lifetime writes as the actual drive lifetime, and it would avoid any "kinda
failing" mode like Khalid is currently experiencing. Personally, I think
they should have failed to read-only, but Intel's target consumer wasn't
really desktop users anyway.

This article is from 2015, and Intel may have changed behaviour since:
https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead

From TFA: "Intel's 335 Series failed much earlier, though to be fair, it
pulled the trigger itself. The drive's media wear indicator ran out shortly
after 700TB, signaling that the NAND's write tolerance had been exceeded.
Intel doesn't have confidence in the drive at that point, so the 335 Series
is designed to shift into read-only mode and then to brick itself when the
power is cycled. Despite suffering just one reallocated sector, our sample
dutifully followed the script. Data was accessible until a reboot prompted
the drive to swallow its virtual cyanide pill."

Personally, I've only had one SSD outright fail (an old first-gen OCZ), but
I have had an additional SSD that went flaky after a few years. That was
my recently replaced Windows 10 SSD (which didn't report any errors, but
sometimes didn't show up at boot, and sometimes caused bluescreens,
possibly due to the disk disappearing while the system is up). I can't
remember what brand it was, but it was probably a Crucial or Sandisk
(because I'm cheap). I replaced the Windows 10 SSD with an ADATA NVMe drive
(and tested my backups...).

-- 
Chris Irwin
<chris at chrisirwin.ca>