What is end-to-end data protection and why you need it

In an ideal world, when you write something to your storage device (HDD, SSD…), you should be able to read back exactly the same data later. In reality this is not always true, and as storage devices grow in size, the data corruption problem grows too. Data gets corrupted all the time, and sometimes you will not even notice it.

Bit rot – silent data corruption

Silent data corruption is when one or more bits on an HDD flip their state. Disk manufacturers of course know about it, and they rate (and price) their HDDs according to how often this happens, quoted as URE (Unrecoverable Read Error) or UBER (Unrecoverable Bit Error Rate). According to the manufacturers' own figures, if you have three 4TB drives and read them in full, you should expect at least one bit to have flipped. The European Organization for Nuclear Research (CERN) published a paper in 2008 reporting approximately 38,000 corrupted files in the 15,000TB of data they generated.
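To put a rough number on the three-drive claim, here is a minimal sketch of the arithmetic. It assumes an error rate of one unrecoverable bit per 10^14 bits read, which is a figure commonly quoted on consumer HDD datasheets and not a value taken from this article; the function names are made up for illustration.

```python
import math

# Drive labels use decimal terabytes: 1 TB = 10**12 bytes = 8 * 10**12 bits.
BITS_PER_TB = 8 * 10**12

# ASSUMPTION: one unrecoverable bit per 10**14 bits read, a rate commonly
# quoted for consumer HDDs. Check the datasheet of your own drive.
ASSUMED_BIT_ERROR_RATE = 1e-14


def expected_errors(terabytes_read, bit_error_rate):
    """Average number of bad bits when reading this much data once."""
    return terabytes_read * BITS_PER_TB * bit_error_rate


def prob_at_least_one(terabytes_read, bit_error_rate):
    """Chance of hitting one or more bad bits, treating bits as independent."""
    bits = terabytes_read * BITS_PER_TB
    # 1 - (1 - p)**n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    total_tb = 3 * 4  # three 4TB drives, each read in full once
    print("expected bad bits:", expected_errors(total_tb, ASSUMED_BIT_ERROR_RATE))
    print("P(at least one):  ", prob_at_least_one(total_tb, ASSUMED_BIT_ERROR_RATE))
```

Under that assumed rate, reading 12TB comes out to roughly one expected error. That is a likelihood, not a guarantee, and the numbers scale directly with how much data you read.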

So bit rot happens more often than you think. Were you ever listening to your favorite MP3 when suddenly: CHIRP!? Your MP3 file was probably corrupted. If you think one bit will not make a difference, look at this example image with a single bit flipped:

bit-rot
Thanks to Jim Salter, arstechnica.com, for providing the image

Even if all HDDs were 100% reliable as data storage, data could still be silently corrupted by RAID cards, loose cabling, controller bugs, DMA parity errors, power supply problems and so on.

You might think buying the most expensive hardware will fix this, but it will only reduce the amount of data that gets corrupted.

A better solution: End-to-end data protection

End-to-end data protection requires each data block to be verified against a checksum at the filesystem level. If the checksum does not match, the filesystem then needs to:

  1. read the data from a different copy (in the case of a mirrored RAID) or reconstruct it from the remaining drives in the stripe (RAID 5)
  2. fix the corrupted data on the fly
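The two steps above can be sketched in a few lines. This is a toy in-memory model of a two-way mirror, not how any particular filesystem implements it: the class `MirroredBlockStore` and the helper `flip_bit` are hypothetical names invented for this illustration, and SHA-256 is simply an assumed choice of checksum.

```python
import hashlib


def flip_bit(data, bit_index):
    """Return a copy of data with a single bit inverted (simulated bit rot)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


class MirroredBlockStore:
    """Hypothetical toy mirror: every block is stored on each 'drive',
    and its checksum is kept separately from the data itself."""

    def __init__(self, copies=2):
        self.copies = [{} for _ in range(copies)]  # one dict per mirrored drive
        self.checksums = {}                        # block_id -> SHA-256 digest

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for copy in self.copies:
            copy[block_id] = bytes(data)

    def read(self, block_id):
        expected = self.checksums[block_id]
        bad_copies = []
        for index, copy in enumerate(self.copies):
            data = copy[block_id]
            if hashlib.sha256(data).digest() == expected:
                # Step 2: overwrite the corrupted copies seen so far
                # with the copy that passed verification.
                for bad_index in bad_copies:
                    self.copies[bad_index][block_id] = data
                return data
            # Step 1: checksum mismatch, so try the next mirror copy.
            bad_copies.append(index)
        raise IOError(f"block {block_id}: every copy failed its checksum")


if __name__ == "__main__":
    store = MirroredBlockStore()
    store.write(0, b"important customer data")
    store.copies[0][0] = flip_bit(store.copies[0][0], 5)  # rot one bit on drive 0
    print(store.read(0))                                  # served from drive 1
    print(store.copies[0][0] == store.copies[1][0])       # drive 0 was repaired
```

The key design point is that the checksum lives apart from the block it describes, so a damaged block cannot vouch for itself. A parity-based layout such as RAID 5 would replace the "try the next copy" step with a reconstruction from the other drives in the stripe.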

But very few filesystems actually do this. This means data gets corrupted daily, and users are left wondering what is wrong with their files.

For this to work, the filesystem needs to communicate directly with your storage device, without any ‘middle men’ like a RAID card, LVM, partitions, external storage etc.

Web hosting

Is it important for web hosting? Of course it is. Maybe your website once stopped working, and after you re-uploaded the same file everything was OK again. That could have been just one rotten bit.

Why take your chances in the data corruption lottery when you do not have to? Use a hosting provider that offers web hosting with end-to-end data protection.

 
