
Server crash



Yesterday the server that all my models and materials live on crashed. It's a RAID system with 6 drives, and we've tried recovering it but haven't had any luck. We're going to send it out to a company that specializes in data recovery in the hope that some or all of it can be saved. I don't want to get my hopes up. Has anyone here had this procedure done on one of their systems, and how did it turn out?


My deepest sympathy!

I think I have lost only two HDDs in my life. One was user error, because the 2.5" IDE adapter cable was not protected against polarity reversal. The second was an encrypted external USB drive that was too expensive to restore (beware of external 2.5" WD drives: they come with no internal SATA connector and with an encryption chip!!!). I don't think any important data was lost on them, or at least nothing that I have missed so far.

 

What exactly crashed? One or more drives? The controller? Or something else? I think professional RAID recovery is the most expensive kind of recovery job.

Do you have a backup?

 

Speaking for myself, I've been a bit paranoid since I lost that second drive. I use only RAID 1, because I can still read the data without the RAID controller after a crash. And I have two external backups, one of them at a different location.

 

Good luck!

Edited by numerobis

It was a RAID 5 and we're not exactly sure what went wrong. The server is supposed to let you know when a drive goes out, but this one never did. We replaced one of the drives and it should have started rebuilding, but it just asked if we wanted to format everything.

 

The backups were being done every 2 months due to a miscommunication between me and IT; it should have been bi-weekly. On top of that, the backups for the last 12 months have been corrupted and others are missing. I'm left with a backup from last October. I don't even want to think about all the data that's missing or how I'm going to replace it.

 

As if all of that weren't bad enough, no one seems to understand how bad this is. People are still asking me today if I can update some renderings from a few weeks ago!
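A schedule that quietly slips from bi-weekly to every 2 months is the kind of thing a small automated check can catch. The following is only a minimal sketch, assuming the backups land as ordinary files in one folder; the 14-day limit and the function names are made up for illustration and are not tied to any particular backup product.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # assumed limit, matching the intended bi-weekly schedule


def newest_backup_age_days(backup_dir, now=None):
    """Age in days of the most recently modified file, or None if there are no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 86400.0


def backup_is_stale(backup_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """True if the folder holds no backup at all or the newest one is too old."""
    age = newest_backup_age_days(backup_dir, now)
    return age is None or age > max_age_days
```

Run from a scheduled task that sends a mail when `backup_is_stale(...)` returns True, something like this would flag a missed backup within days instead of months. It only checks file dates, not whether the backup can actually be restored.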

I am sorry to hear that. That really sounds very bad. I hope the recovery will be successful!

Concerning the corrupted backup... this is one reason why I don't trust any backup solution that stores the data in some proprietary, compressed file format, and why I don't use incremental backups. I only do direct file copies.

But maybe you can contact the maker of the backup software and ask whether they have any special way to restore a damaged backup.
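To illustrate the "direct file copies" approach, here is a rough sketch in Python of a copy followed by a checksum comparison. The function names are my own, and it assumes both folders are reachable as normal paths; it is not how any specific backup tool works.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large assets do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree(src_dir, dst_dir):
    """Plain file-by-file copy of src_dir into dst_dir, keeping the folder layout."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    for src in sorted(src_dir.rglob("*")):
        if src.is_file():
            dst = dst_dir / src.relative_to(src_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps


def verify_backup(src_dir, dst_dir):
    """Return relative paths that are missing from the backup or differ from the source."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    problems = []
    for src in sorted(src_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(src_dir)
            dst = dst_dir / rel
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_backup` means every source file has a byte-identical copy. Note that a check run right after copying may be answered from the operating system's cache, so re-running it later (or after reconnecting the drive) is a better test of the disk itself.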

For the record, what server was that?


So after a week I've finally gotten some answers. It turns out that only one of the 6 drives was bad. The recovery company is still working on recovering the files after re-striping the drive. They've only been able to recover about 2 gigs of data so far, but they claim they will eventually get it all; there's about 500 GB total that I absolutely need. They're not giving me a timeframe for this process, but I kind of feel like it's going to take weeks. They're saying the total cost of recovery will be about $16K; I'm not sure yet whether that number is solid. We're going to let them continue their work because we really have no other options at this point.

 

After doing some research on backups, it's clear that tape backups, while cheap, aren't the most reliable. I plan on backing up to external hard drives in addition to tape. Online backups seem to be too slow, at least from what I've read.


Devin, isn't the whole point of RAID 5 that it can tolerate a single drive failure? What happened there?

That's what I was thinking. Makes my primitive backup system seem quite good... I just manually copy the drive every now and then, and I have an external 2TB WD drive continually backing up my job files and asset folders. It just backs up when the server is idle. I also swap out both 2TB drives in the server at the end of each year, so I end up with 2 HDDs per year in the cupboard.


I have a Synology NAS in RAID 5 and my understanding is if a drive goes bad you just swap it out and the array "heals" itself. That's the whole point. On top of that I back up the NAS itself to an external hard drive that I swap out weekly for a second one at the safe deposit box so in a catastrophic event I only ever lose a week's worth of assets/project files which is all I store on there. If RAID 5 can fail with one drive then I may need to rethink my strategy here.
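For anyone wondering how the array can "heal" itself, and why that stops working once a second drive is gone, here is a toy sketch of single-parity striping in Python. It is a simplification: real RAID 5 rotates the parity block across the drives and works on much larger stripes, and the function names are invented for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (one block per drive)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which the block of a failed drive is marked as None."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot rebuild more than one missing block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the one missing block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

With five data blocks and one parity block (six drives), setting any one entry to None and calling `rebuild` gives back the original stripe. Setting two entries to None raises the error, which is the situation where only a backup helps.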

One of the drives went bad and somehow broke the whole RAID. The server was only about 3 years old, and every indicator said it was healthy even after it failed. I think we all assume that redundancy = safety, but it's like anything else: the more you complicate something, the more weak spots begin to appear. In this case RAID failed completely, and it kills me that a $150 drive could have saved us all this pain.

 

We will eventually get all the data back; right now they're saying it could take weeks. Once we do, I'll be backing up to both tape and external hard drive on a weekly basis.

Maybe start from scratch? I know it sounds painful, but 16k would buy a lot of pro assets. My asset library is unintelligible to anyone but me. I'd love to start again but I don't have time!

I guess project files are a different story.


The thought of starting over makes me cringe. When I think of the hundreds of hours that went into collecting and configuring all those assets, the money seems inconsequential. Even if I wanted to do that, I've got old projects piling onto new ones; there's not enough time to rebuild the old and work on the new.

RAID 5 is safe, but if something goes wrong and the system itself or more than one drive goes bad too fast for you to react, you can still lose a lot, if not everything. Technically you are still keeping your eggs in a single basket if you are not backing up your NAS.

 

 

How big is your current array?

 

 

I think the best strategy with newer Synology systems is to connect a USB 2.0/3.0 drive every now and then, back up the array, and then leave that HDD in "cold storage", completely offline. Your NAS stays online serving you and other clients.

 

 

The issue is that if you have a really big library, with RAID 5/6 spanning multiple 2TB or bigger drives, getting a single HDD to hold all that data is impossible. Maaaaaybe with those newer 6-8TB drives, arrays of 2TB disks are doable. Otherwise backing up the backup becomes complicated, and over the long run it will stop happening regularly enough, defeating the purpose.


  • 2 weeks later...

So I figured I'd give a final report on the server. It turns out that two of the drives went bad at the same time. After two weeks the company handling the recovery finally gave up after getting only 3 gigs out of 2.5 TB. They wanted $2500 for this data, but after looking at it there were only a few dozen files that I could actually use and didn't already have, so we declined their offer. I suppose I am fortunate to at least have a 12-month-old backup, so it's not a total loss, but it's still considerable. I will now be backing up to tape and external hard drive every week. I don't ever want to go through this again.


Sheesh, the whole thing just sounds bizarre. I mean really, what's the point of a 5 drive RAID array? That should have redundancy built in; you should be able to lose multiple drives. I'm assuming it's a hardware failure and not a virus, so I just don't get it.

Cheaper and simpler solutions seem better than RAID.


RAID 5 only has one-drive fault tolerance. With RAID 6 you could lose 2 drives without losing any data, which seems like the way to go if you're working with six HDDs like he was. I'd probably even opt for RAID 10 with those six disks.
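To put rough numbers on that trade-off, here is a small sketch in Python. It assumes equal-size disks and treats RAID 10 as a stripe over two-way mirrors; the 1 TB disk size in the usage note is a placeholder, since the thread never states the actual drive size.

```python
def raid_summary(level, disks, disk_tb):
    """Return (usable capacity in TB, number of drive failures that is always survivable)."""
    if level == "RAID 1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb, disks - 1
    if level == "RAID 5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb, 1
    if level == "RAID 6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb, 2
    if level == "RAID 10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Only one failure is guaranteed; more survive only if they hit different mirror pairs.
        return (disks // 2) * disk_tb, 1
    raise ValueError("unknown RAID level: " + level)
```

For six hypothetical 1 TB disks, `raid_summary("RAID 5", 6, 1.0)` gives `(5.0, 1)`, RAID 6 gives `(4.0, 2)` and RAID 10 gives `(3.0, 1)`. RAID 10 can survive up to three failures in the best case, but a second failure in the same mirror pair still loses the array.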

 

Devin, I would consider backing up to the cloud: either Glacier, S3 or even CrashPlan, depending on whether you want easy access to your data or just need it in a catastrophic scenario like this. They even offer seed services nowadays where you can ship them a physical HDD that they copy straight to their servers, so you don't have to deal with the initial months of upload time.
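As a back-of-the-envelope check on that upload time, the arithmetic can be sketched like this. The 10 Mbit/s uplink is purely an assumed figure for illustration, and the estimate ignores protocol overhead, throttling and interruptions, so real uploads will take longer.

```python
def upload_days(size_tb, uplink_mbit_per_s):
    """Days needed to upload size_tb (decimal terabytes) over a fully used uplink."""
    bits = size_tb * 1e12 * 8                     # terabytes -> bytes -> bits
    seconds = bits / (uplink_mbit_per_s * 1e6)    # Mbit/s -> bit/s
    return seconds / 86400.0
```

For the 2.5 TB mentioned earlier in the thread, `upload_days(2.5, 10)` comes out at roughly 23 days of saturating the line around the clock, and the figure doubles at 5 Mbit/s. That is where shipping a seed drive starts to look attractive.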


I think your "easy" bet, since you got burned and do not want to deal with multilayered RAID arrays, would be a 2-bay RAID 1 setup with a couple of large disks, plus a USB 3.0 external HDD that you back the array up to every week or two (if you are lazy).

 

 

Something like a Synology DS214+ (it has 2x GBit for aggregation; otherwise the vanilla DS214 is pretty much the same) with 5-6TB drives should handle all your needs on the NAS side, and you can hook a 4-6TB USB 3.0 drive directly to it and have it back up with the push of a button.

 

 

I don't know how many users or how much workload your network has, but this should be "safe" enough, with the huge benefit of very fast rebuild times in case of a failure, something that of course you don't get with parity solutions without lots of hot spares.
