Regular consumer drives in RAID are an accident waiting to happen
Verified Purchase
This review is from: WD Red 3 TB NAS Hard Drive: 3.5 Inch, SATA III, 64 MB Cache - WD30EFRX (Personal Computers)
Here is a quote from a review at pcper.com:
I'm going to let the cat out of the bag right here and now. Everyone's home RAID is likely an accident waiting to happen. If you're using regular consumer drives in a large array, there are some very simple (and likely) scenarios that can cause it to completely fail. I'm guilty of operating under this same false hope - I have an 8-drive array of 3TB WD Caviar Greens in a RAID-5. For the uninitiated, RAID-5 is where one drive's worth of capacity is volunteered for use as parity data, which is distributed amongst all drives in the array. This trick allows for no data loss in the case where a single drive fails. The RAID controller can simply figure out the missing data by running the extra parity through the same formula that created it. This is called redundancy, but I propose that it's not.
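The parity trick described above boils down to XOR, which is essentially what RAID-5 does per stripe. Here is a minimal, hypothetical sketch (real controllers rotate parity across drives and operate on whole stripes, not any vendor's actual code):

```python
# Simplified RAID-5 parity demo: XOR-ing all data blocks yields the parity
# block, and XOR-ing the parity with the surviving blocks rebuilds a lost
# one. Hypothetical illustration only.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three "drives" worth of data in one stripe, plus computed parity.
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d1, d2, d3])

# Drive 2 fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([d1, d3, parity])
assert rebuilt == d2
```

The same property explains why losing a second block during a rebuild is fatal: with two unknowns in the XOR, the missing data can no longer be solved for.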
Since I'm also guilty here with my huge array of Caviar Greens, let me also say that every few weeks I have a batch job that reads *all* data from that array. Why on earth would I need to occasionally and repeatedly read 21TB of data from something that should already be super reliable? Here's the failure scenario for what might happen to me if I didn't:
* Array starts off operating as normal, but drive 3 has a bad sector that cropped up a few months back. This has gone unnoticed because the bad sector was part of a rarely accessed file.
* During operation, drive 1 encounters a new bad sector.
* Since drive 1 is a consumer drive it goes into a retry loop, repeatedly attempting to read and correct the bad sector.
* The RAID controller exceeds its timeout threshold waiting on drive 1 and marks it offline.
* Array is now in degraded status with drive 1 marked as failed.
* User replaces drive 1. RAID controller initiates rebuild using parity data from the other drives.
* During rebuild, RAID controller encounters the bad sector on drive 3.
* Since drive 3 is a consumer drive it goes into a retry loop, repeatedly attempting to read and correct the bad sector.
* The RAID controller exceeds its timeout threshold waiting on drive 3 and marks it offline.
* Rebuild fails.
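The cascade above can be sketched as a toy model: a drive whose error-recovery loop outlasts the controller's patience gets dropped, while a TLER-style drive reports the error in time. (A hypothetical simulation; the timeout values are illustrative, not any vendor's specs.)

```python
# Toy model of the failure cascade above. A consumer drive retries a bad
# sector far longer than the controller is willing to wait; a TLER drive
# gives up quickly and reports the error instead. Timings are made up.

CONTROLLER_TIMEOUT_S = 8      # controller drops a drive after this long
CONSUMER_RECOVERY_S = 120     # consumer-drive deep-recovery retry loop
TLER_LIMIT_S = 7              # TLER caps recovery below the timeout

def read_sector(is_bad, tler):
    """Return 'ok', 'error' (reported in time), or 'dropped' (timed out)."""
    if not is_bad:
        return "ok"
    recovery_time = TLER_LIMIT_S if tler else CONSUMER_RECOVERY_S
    return "error" if recovery_time <= CONTROLLER_TIMEOUT_S else "dropped"

def rebuild_survives(bad_sector_on_surviving_drive, tler):
    """A rebuild dies only if a surviving drive gets dropped outright,
    rather than reporting a read error the controller can log."""
    return read_sector(bad_sector_on_surviving_drive, tler) != "dropped"

# Consumer drives: drive 3's sleeper bad sector kills the rebuild.
assert rebuild_survives(True, tler=False) is False
# TLER drives: the controller sees a read error, logs it, carries on.
assert rebuild_survives(True, tler=True) is True
```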
At this point the way forward varies from controller to controller, but the long and short of it is that the data is at extreme risk of loss. There are ways to get it all back (most likely without that one bad sector on drive 3), but none of them are particularly easy. Now you may be asking yourself how enterprises run huge RAIDs without seeing this sort of problem. The answer is Time Limited Error Recovery (TLER) - where the hard drive assumes it is part of an array, assumes there is redundancy, and is not afraid to quickly tell the host controller that it just can't complete the current I/O request.
Here's how that scenario would have played out if the drives implemented some form of TLER:
* Array starts off operating as normal, but drive 3 has developed a bad sector several weeks ago. This went unnoticed because the bad sector was part of a rarely accessed file.
* During operation, drive 1 encounters a new bad sector.
* Drive 1 makes a few read attempts and then reports a CRC error to the RAID controller.
* The RAID controller maps out the bad sector, locating it elsewhere on the drive. The missing sector is rebuilt using parity data from the other drives in the array.
* Array continues normal operation, with the error added to its event log.
The above scenario is what would play out with an Areca RAID controller (I've verified this personally). Other controllers may behave differently. A controller unable to do a bad-sector remap might have just marked drive 1 as bad, but the key is that the rebuild would be much less likely to fail, as drive 3 would not drop completely offline once the controller ran into the additional bad sector. The moral of this story is that typical consumer-grade drives have data error timeouts far longer than the drive-offline timeout of typical RAID controllers, and without some form of TLER, two bad sectors (totaling just 1,024 bytes) are all it takes to put multiple terabytes of data in grave danger.
The solution should be simple - just get some drives with TLER. The problem is that until now those were prohibitively expensive. Enterprise drives have all sorts of added features like accelerometers and pressure sensors to compensate for sliding in and out of a server rack while operating, as well as dealing with rapid pressure changes that take place when the server room door opens and the forced air circulation takes a quick detour. Those features just aren't needed in that home NAS sitting on your bookshelf. What *is* needed is a WD Caviar Green that has TLER, and Western Digital delivers that in their new Red drives.
End quote and back to reviewer.
I've got 5 of these in a Synology DiskStation 5-Bay (Diskless) Network Attached Storage (DS1512+). It is really a sweet setup.
The Synology software has a S.M.A.R.T. test that can do surface scans to detect bad sectors. I have their Quick Test check every disk daily and the Extended Test set to run automatically on each of the 5 disks every weekend. (The Extended Test takes about 5 hours per disk, so I stagger the tests by 12 hours.)
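The staggering described above (five disks, one extended test each, starts spaced 12 hours apart so the ~5-hour scans never overlap) can be sketched like this. Purely illustrative - Synology's DSM does this through its GUI scheduler, not a script, and the disk names here are made up:

```python
# Sketch of a staggered weekly surface-scan schedule: one extended test
# per disk, start times spaced 12 hours apart so ~5-hour scans never
# overlap. Illustrative only; DSM's own scheduler handles the real thing.

def staggered_schedule(disks, start_hour=0, gap_hours=12):
    """Map each disk to the hour-of-week its extended test starts."""
    return {disk: (start_hour + i * gap_hours) % (7 * 24)
            for i, disk in enumerate(disks)}

disks = ["sda", "sdb", "sdc", "sdd", "sde"]  # hypothetical device names
schedule = staggered_schedule(disks)
# Five disks at 12-hour spacing fit comfortably inside one week.
assert schedule["sde"] - schedule["sda"] == 48
```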
Initial post: Aug 21, 2012 9:22:16 PM PDT
awesome review. very informative. thanks.
Posted on Aug 21, 2012 11:53:55 PM PDT
Last edited by the author on Aug 21, 2012 11:54:50 PM PDT
Not sure these are worth the price. A RAID that goes down for a short time in a home environment is usually not that critical (i.e., TLER isn't essential there).
Side note: I use ZFS, which is much, much better than RAID; here is a quote I found on hardforum:
Q. Do i need to use TLER or RAID edition harddrives?
A. No, and if you use TLER drives with ZFS you should disable it. TLER is only useful for mission-critical servers that cannot afford to be frozen for 10-60 seconds, and for coping with bad-quality RAID controllers that panic when a drive is unresponsive for multiple seconds because it's performing recovery on some sector. Do not use TLER with ZFS!
Instead, allow the drive to recover its errors. ZFS will wait, the wait time can be configured. You won't have broken RAID arrays, which is common with Windows-based FakeRAID arrays.
Posted on Aug 28, 2012 7:59:23 AM PDT
Last edited by the author on Aug 28, 2012 8:02:26 AM PDT
Brent P. says:
This is EXACTLY what happened to a 14TB 10-disk RAID 5 array I have. A sleeper bad sector on one drive... an active file on another drive hit a bad sector... I replaced that drive and the rebuild failed due to the sleeper bad sector tanking its drive. It led to 2 months of data recovery with Synology remote-accessing the box... but they WERE able to get it back up, and I was able to back up the data, replace all bad drives, and reload the data.
I hadn't yet heard of WD's RED line... so that's awesome, but if you already have greens or some other drives, there are still some steps you can take to ensure a bit more safety.
The new setup I have: I am now using SHR-2 instead of RAID 5. Think of it as RAID 6... 2-disk redundancy to start with, then it adds on some further variable drive size/striping benefits to maximize drive space used when your drives aren't all the same size.
On top of that, I have gone into the DSM software interface on the array, and if you go to Storage Manager, you can click S.M.A.R.T. test on the drive... and you can actually SCHEDULE full disk checks as you see fit. I have each drive checked with a full surface scan weekly. This should keep any sleeper bad sectors from lying dormant for months until another drive fails. And even if it doesn't, I have 2 full disks of redundancy now as well.
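The capacity trade-off between one- and two-disk redundancy is simple arithmetic when all drives are the same size. A rough sketch (a simplified model - SHR's real mixed-size layout is more involved than this):

```python
# Usable capacity with equal-size drives: RAID 5 gives up one drive to
# parity, RAID 6 / SHR-2 gives up two. Simplified model; SHR's actual
# mixed-size allocation is more complex.

def usable_tb(num_drives, drive_tb, parity_drives):
    assert num_drives > parity_drives
    return (num_drives - parity_drives) * drive_tb

# Five 3TB Reds in a DS1512+, for example:
raid5 = usable_tb(5, 3, parity_drives=1)   # one drive of parity
shr2 = usable_tb(5, 3, parity_drives=2)    # two drives of redundancy
assert raid5 == 12 and shr2 == 9
```

So stepping up from SHR-1 to SHR-2 costs one drive's worth of space in exchange for surviving the exact two-bad-sector scenario from the review.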
Posted on Aug 31, 2012 9:28:02 PM PDT
I know WD has targeted the Reds at the vibration and heat of arrays of 5 disks or fewer, but I REALLY want the Synology DiskStation DS1812+ (8 bay) - you think I could get away with it?
In reply to an earlier post on Sep 1, 2012 8:10:42 AM PDT
Jim M. says:
I have performed a lot of research on the subject and I am comfortable with purchasing an 1812+ and eight 3TB "red" drives (I'm ordering it later this week). I haven't seen anything that shows the "Reds" as performing poorly with more than five drives in an array. The 1812+ has two fans that can kick in as needed. And, at least in my case with home use, I won't have all eight drives being accessed all the time at the same time. The definitive answer, I think, comes from Synology - the WD30EFRX is a recommended drive on the Synology web site for the 1812+ model.
Posted on Sep 13, 2012 10:22:02 AM PDT
Bhavesh K. Patel says:
Outstanding explanation, Gary. Thank you so much. I have the 2411+ (awesome as well) with six 2TB Green drives in it. While doing a bunch of file copying, I had Drive 6 drop out. Fortunately, I was able to remount and rebuild successfully. Now, with another apparently huge file operation (adding thousands of MP3s to iTunes), I had Drive 2 drop out. I'm 55% into a rebuild now. Unfortunately, I built this with SHR-1. I'm definitely wishing I'd built it with SHR-2.
I plan on buying 6 of the 3TB Red drives (when I can find them...they seem to be out of stock everywhere) and building it with 2 drive redundancy.
The next question is, do I sell the Green drives, or should I see about adding them to the Red set? They probably don't support TLER, but the guys at TechGage gave me an awesome set of links on how to reprogram them if it's possible. Here is the link to their help: http://forums.techgage.com/showthread.php
If I could reprogram them, it might be worth adding them back in.
In reply to an earlier post on Sep 13, 2012 10:24:58 AM PDT
Bhavesh K. Patel says:
Brent, this is a very reassuring comment. Thanks. I've SHR with a 1 Drive dropout. So far, I've been able to successfully rebuild. But, I'm really glad to know that with Synology's help, you were able to recover the data.
If you have a chance, it would be great to know the essential steps performed that allowed you to recover your data.
I'm going to take your advice and schedule the full surface scans from now on as well.
In reply to an earlier post on Sep 26, 2012 10:34:11 PM PDT
Boyd Waters says:
I've used ZFS for at least 5 years now. I've lost plenty of consumer-grade drives (including half of my Seagate Green 2TB drives), but I haven't lost data.
WD Red will be my next set of drives.
Posted on Oct 14, 2012 10:46:26 PM PDT
Consumer drives work just fine in software RAID or firmware [Intel RST] RAID.
Been running a pair of Black AALS drives in RAID 1 for years without an issue, and a pair of RE3s in RAID 1 likewise.
TLER only matters for, and only benefits, a dedicated hardware RAID controller.
Greens, with their aggressive low-power idle (head-parking) behavior, are the absolute worst drives to put into any RAID system - hardware, software, or firmware; this has been documented all over the net. If you like your drives to rack up a million head parks in a few months, by all means put Greens in RAID and then watch as they die.
Posted on Oct 20, 2012 10:36:34 AM PDT