A Shock to the System, Part II

John V. Hinshaw

September 1, 2008

LCGC North America

Volume 26

Issue 9

Pages: 922–928

John Hinshaw concludes his two-part series on electrical issues in the laboratory.

At the end of part I of this two-part series, I thought that all of the failures induced by a large electrical spike were accounted for, and I was ready to submit my insurance claim and effect repairs. But then, something else happened.

Delayed Failures

About a year ago, I had set up an extra computer to handle file storage and backups, among other tasks. I had enlisted an older computer, vintage 2002, for this purpose. This system had been chugging along for months before the surge event, serving up files and writing daily backups of itself plus our other computers to a 750-gigabyte (GB, 10⁹ bytes) external hard drive. The evening after the surge occurred I ran a thorough check on the external backup hard drive: no problems were detected. I also scheduled a check of the main hard drive to take place when it next rebooted — the check required that the main drive be "offline" for the extensive checkout process. It was too late that night, so I waited until the next evening to restart the system. I gave the command, the PC shut itself down and restarted, but then it asked me to "insert a bootable disk." A sinking feeling began to make itself known. This request was not good because it meant that the main hard drive was not being detected as containing an operating system.

I restarted the PC again with the intention of running the BIOS configuration program and investigating the disk settings, but when the system started up this time, the hard drive emitted a high-pitched scream (literally) and subsided into making sporadic clicking noises. This definitely was not good. To test the computer separately from the drive, I substituted a spare hard drive that I knew was functional, but it was not recognized in the BIOS either. Conclusion: Both the hard drive and the PC's motherboard had failed as a direct result of the power surge. It took a couple of power cycles to discover the full extent of the failure, but the evidence was clear: Do not assume that a potentially compromised device is functioning normally on the basis of only cursory short-term tests.

The delayed failure was another eye-opener for me. I understood right away that, again, I had been lucky. If the power surge had taken out the external hard drive, which also was plugged into a surge protection device (SPD), I would have lost all the backed-up files and faced the prospect of paying, and waiting, for a data retrieval service to recover them, if that was even possible. I had made plans to start archiving the files onto removable media but had not yet imposed the discipline on myself to do so. In the worst-case scenario, the original copies on our desktop computers would have been lost as well, leaving us with only partial digital records of the past ten years or so: whatever I could salvage from the collection of data CDs and older backup tapes I had accumulated during previous attempts at conserving data.

What about the other PCs that were running during the surge? Both are more recent, built in 2005 and 2007, and both have power supplies that clamp their output voltages to safe levels in the event of a voltage spike; the failed PC's power supply had no such clamping feature. I therefore conclude that the newer systems survived because of two layers of protection: the SPD limited the input spike to around 330 V, and their power supplies, in turn, kept the voltages delivered to the computer components within safe limits.

Rebuild and Restore

Now, I faced the prospect of rebuilding the PC and restoring the data from the backed up files on the external drive. Due to the electrical stress so recently imposed upon them, I elected to replace all of the computer hardware components, especially the power supply, rather than take the chance that other parts might fail at any time. I decided to use my desktop PC temporarily in the role of the destroyed system until the replacement PC was ready.

I did not have a full binary disk image of the failed system, but in a case like this one, a full disk image would not have helped much. In fact, it is not a good idea to rely on an image alone. For one thing, with only the binary image I would not have been able to copy the files directly onto my desktop system for temporary use; I would have had to restore the image to an empty drive and then copy the files from it. Because it is almost impossible to obtain an exact duplicate of a failed computer, especially an older one, the utility of restoring from a disk image is limited: the system configuration and drivers for the replacement system will almost always differ from those on the original. The only time that an image can be applied easily is when the drive alone has failed, and that situation is better addressed with a redundant array of independent disks (RAID) configuration, which makes it a simple matter to remove and replace a failed drive. (The failed PC did not use RAID.)

The replacement system arrived in due course. I chose to use a pair of hard drives in a RAID configuration and, of course, a modern power supply with over-voltage protection. Once the system was operational with a fresh installation of the operating system, I moved the files from my desktop PC over to the new one. Now, however, I had to consider how best to back up all of the systems and how to better ameliorate the potential impact of a catastrophic failure that would take out all of the computers.

Backup Strategies

Keeping current data backups in a secure location is a difficult and potentially expensive task. If you are a corporation with the company brain trust at stake, then spending thousands of dollars on a high-speed automated tape archive system and secure offsite vault services is appropriate, and you can afford to set up redundant servers with automatic failover on critical resources. At the other end of the scale, if you have a decent internet connection and only a few gigabytes of data to mind, there are a number of good online backup services that can be very inexpensive, on the order of five dollars per home computer per month. These services run in the background and use spare internet bandwidth to send files for backup, and restoration is a simple matter of copying the files back to your drive. Online backup costs are higher for a commercial account but still quite reasonable. These backups are completely automated and transparent to the user, especially if you are willing to leave the computer on at night and if it does not bother you to entrust your data to a company that might or might not be operating a few months or years in the future.

My data backup needs lie in the middle of this range. Online backups are not practical because I have too much data: today my backup system reports that it has 1.1 terabytes (TB, 10¹² bytes) of data on record. At a typical maximum cable modem upload speed of 2 megabits per second (Mbps), and given that you rarely get the maximum upload speed, a full initial backup could easily take a month or more! Updating the online data after that would take considerably less time because only a fraction of the files change on a daily basis, but for me, a month's delay in deploying my data to an outside archive obviously is unacceptable.
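As a rough check on the upload-time estimate, the arithmetic is simple enough to script. This sketch assumes a perfectly steady upload rate with no protocol overhead, treats 1 Mbps as a hypothetical "realistic" sustained rate, and, for the second case, assumes an online service would upload the data in the same compressed, deduplicated form that the local backup uses (roughly 57% of a 750-GB drive).

```python
def upload_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to send size_bytes at a sustained rate, ignoring protocol overhead."""
    seconds = size_bytes * 8 / rate_bits_per_s   # 8 bits per byte
    return seconds / 86_400                      # seconds per day

RAW = 1.1e12              # 1.1 TB of source data, as reported by the backup system
STORED = 0.57 * 750e9     # ~428 GB occupied on the backup drive (assumed upload size)

for label, size in [("raw 1.1 TB", RAW), ("compressed ~428 GB", STORED)]:
    for mbps in (2.0, 1.0):   # 2 Mbps nominal; 1 Mbps is an assumed realistic rate
        print(f"{label} at {mbps:g} Mbps: {upload_days(size, mbps * 1e6):.1f} days")
```

At the full 2 Mbps, the raw 1.1 TB works out to about 51 days and the compressed form to about 20 days; halving the effective rate doubles both figures.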

Instead, I have elected to use a two-tiered approach that keeps backed-up files, with two weeks of version history, available on my network and also allows me to carry archived copies of the latest file versions to a different location for safe storage. I continue to schedule weekly full backups of all files of interest, plus daily backups of changed files, and save them onto the external hard drive. I do not usually back up system files or temporary files, but be aware that skipping the system files means you must keep the original installation CD or DVD media on hand so that you can restore the operating system and basic computer configuration from scratch.
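The weekly-full, daily-changes schedule with a two-week window can be sketched in a few lines. This is a hypothetical illustration, not the behavior of the author's backup software, which the column does not name. It assumes full backups run on Sunday and treats the daily backups as incrementals that chain back to the preceding full backup, so pruning must keep everything back to the newest full backup on or before the start of the window.

```python
from datetime import date, timedelta

FULL_BACKUP_WEEKDAY = 6   # assumption: full backup on Sunday (Monday == 0)
RETENTION_DAYS = 14       # two weeks of version history

def backup_type(day: date) -> str:
    """'full' once a week; 'incremental' (changed files only) on the other days."""
    return "full" if day.weekday() == FULL_BACKUP_WEEKDAY else "incremental"

def sets_to_keep(existing: list[date], today: date) -> list[date]:
    """Backup sets to retain so that any day in the window can still be restored."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # Incrementals depend on the preceding full backup, so extend the window
    # back to the newest full backup on or before the cutoff.
    anchors = [d for d in existing if d <= cutoff and backup_type(d) == "full"]
    # With no such anchor, nothing can be pruned safely: keep every set.
    start = max(anchors) if anchors else min(existing, default=cutoff)
    return sorted(d for d in existing if d >= start)

today = date(2008, 9, 1)
history = [today - timedelta(days=i) for i in range(30)]   # 30 nightly backup sets
kept = sets_to_keep(history, today)
print(backup_type(today), kept[0], len(kept))
```

Extending the window back to a full backup, rather than cutting it off exactly at 14 days, is the price of incremental backups: an incremental set is useless without the full backup it builds on.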

The backup system is efficient: it compresses the files and keeps only a single copy of duplicate files. For example, if a file is identical on two systems, only one copy is stored; if a new version then appears on one system, the backup program keeps both versions. As a result of these efficiencies, only 57% of the 750-GB external drive (roughly 430 GB) is required to hold the 1.1 TB of data distributed across the computers. Daily backups take only about 20–40 min to run and consume almost no resources on the PCs being backed up, and the backup processes use only about 20% of the resources on the system that runs the backup itself.
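The two space-saving techniques described here, compression plus storing one copy of identical files, can be illustrated with a toy content-addressed store. This is a minimal sketch assuming whole-file SHA-256 hashing and zlib compression; the class and machine names are hypothetical, the column does not say how the author's software detects duplicates, and many real products deduplicate at the block level instead.

```python
import hashlib
import zlib

class DedupStore:
    """Toy backup store: each unique file body is compressed and stored once,
    keyed by its SHA-256 digest; a catalog records each file's version history."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}                    # digest -> compressed data
        self.catalog: dict[tuple[str, str], list[str]] = {}  # (machine, path) -> digests

    def back_up(self, machine: str, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:                  # identical content stored only once
            self.blobs[digest] = zlib.compress(data)
        versions = self.catalog.setdefault((machine, path), [])
        if not versions or versions[-1] != digest:    # new version only if content changed
            versions.append(digest)
        return digest

    def restore(self, machine: str, path: str, version: int = -1) -> bytes:
        """Return a stored version (default: the latest) of one machine's file."""
        return zlib.decompress(self.blobs[self.catalog[(machine, path)][version]])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
report = b"GC run 42 results\n" * 1000
store.back_up("lab-pc-1", "results.txt", report)
store.back_up("lab-pc-2", "results.txt", report)                 # duplicate: no new blob
store.back_up("lab-pc-2", "results.txt", report + b"amended\n")  # changed: new version
print(len(store.blobs), "unique blobs,", store.stored_bytes(), "bytes stored")
```

Three file copies are backed up, but only two blobs are stored, and both are compressed.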

An archived copy of the data on the backup drive guards against the scenario in which a single event takes out both the main drives and the backup drive. I will never know how close I came to this situation, but now I keep a separate archive copy of my files in another location and update it as often as possible, usually daily.

For this purpose, the backup system includes an archive function that copies all of the backed-up files from each PC onto removable media, in this case onto USB 2.0 (universal serial bus) hard drives. Traditionally, this type of archiving has been the province of digital tape drives or hard drives with removable media. Recently, however, the cost of pluggable external hard drives has plummeted: notebook-sized USB drives with capacities of 200–500 GB now can be bought for under $100. Several of these together cost less than the least expensive tape or removable-cartridge drive and its media, and they have considerably faster transfer rates. For archiving my backed-up files, I chose a pair of 320-GB drives that measure only 5 in. × 3 in. × 0.6 in. (126 mm × 80 mm × 15 mm) and are powered through the USB interface. I fill one drive in the evening with that day's archival data and exchange it for the other drive the next day, keeping the most recent copy offsite in a secure location. This arrangement gives me only two days' worth of file versions, and only one day once the archiving process starts anew. Additional drives could be added for longer retention of file versions, but this much retention is all I really need in the event of another drive-destroying event.
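The two-drive swap generalizes to a round-robin over any number of drives, which is what adding drives would buy. A minimal sketch, assuming each drive holds exactly one day's archive and is overwritten in full each time its turn comes; the function names are hypothetical.

```python
def drive_for_day(day: int, n_drives: int) -> int:
    """Drive (0-based) that is filled on evening `day` in a round-robin rotation."""
    return day % n_drives

def versions_available(n_drives: int, archiving_in_progress: bool) -> int:
    """Days of archived versions on hand: one per drive, minus the drive
    currently being overwritten."""
    return n_drives - 1 if archiving_in_progress else n_drives

N_DRIVES = 2
for day in range(4):
    filled = drive_for_day(day, N_DRIVES)
    returning = drive_for_day(day + 1, N_DRIVES)
    print(f"evening {day}: fill drive {filled}; next day take it offsite "
          f"and bring back drive {returning}")
```

With two drives this reproduces the figures above: two days of versions normally, and one while the returning drive is being overwritten.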

The final piece of any backup strategy is the restoration process. Assume that you will need to perform some kind of restoration every so often, and practice restoring some or all of the backed-up files and also the archived files, preferably monthly, but at least a few times a year. Empty space on any hard drive in your system will do for practice restores; just be sure to tell the restore process to put the selected files in a different location from where they originated, or tell it not to overwrite newer files in case a file was changed after the backup copy was made.
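A practice restore can be checked with a short script. The sketch below is a hypothetical stand-in: it treats a plain folder of backed-up files as the "backup" (real backup software stores files in its own format and restores them through its own interface), copies them to a separate location, skips any destination file newer than its backup copy, and then compares the results byte for byte.

```python
import filecmp
import shutil
from pathlib import Path

def practice_restore(backup_dir: Path, target_dir: Path,
                     overwrite_newer: bool = False) -> list[Path]:
    """Copy every file under backup_dir into target_dir, keeping relative paths.
    Unless overwrite_newer is True, a destination file newer than its backup
    copy is left alone. Returns the destination files that were skipped."""
    skipped = []
    for src in backup_dir.rglob("*"):
        if not src.is_file():
            continue
        dst = target_dir / src.relative_to(backup_dir)
        if dst.exists() and not overwrite_newer and dst.stat().st_mtime > src.stat().st_mtime:
            skipped.append(dst)          # changed after the backup was made: keep it
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)           # copy2 also preserves modification times
    return skipped

def verify(backup_dir: Path, target_dir: Path) -> list[Path]:
    """Backup files whose restored copy is missing or differs byte for byte."""
    bad = []
    for src in backup_dir.rglob("*"):
        if src.is_file():
            dst = target_dir / src.relative_to(backup_dir)
            if not dst.exists() or not filecmp.cmp(src, dst, shallow=False):
                bad.append(src)
    return bad
```

An empty list from `verify` after a restore into scratch space is a good sign; any file it reports deserves a closer look before the real thing happens.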

Back to the Laboratory

I am confident that my personal computer systems are in better shape now to resist data loss due to an electrical event or other equipment-destroying occurrence. The other electrical equipment is functioning again, but it is not much better protected than before. Of course, there is always room for improvement, and I will continue to think about better ways to safeguard the equipment and the data I have. For example, I am considering asking the electric company to install a whole-house surge protector at the power drop outside. But what about a laboratory situation? Analytical data is one of the primary products that come out of a laboratory, and conservation of the data for required retention periods is one of the most important postanalysis tasks. Another class of mission-critical data is the information in the analytical methods and procedures used to generate results data. The instruments and computers, while more easily replaced than results data, also benefit from protection against electrical and other damaging faults.

Here is a list of practical suggestions to help keep laboratory equipment safe from the hazards that I have discussed in this column.

Install new SPDs

• after a serious electrical fault event

• when installing new equipment

• with appropriate power ratings

• as replacements for older SPDs that do not meet the most recent standards

• in consultation with IT and electrical personnel in your company.

Follow manufacturers' electrical guidelines and local electrical codes.

Train staff in, and enforce, good electrostatic discharge (ESD) safety measures for anyone who has access to internal instrument or PC components.

Enlist newer PCs to handle critical data.

Replace older PCs that cannot reasonably handle newer data and processing requirements. The cost of a new PC is much less than the potential costs from the compromise of critical data.

Where there are multiple data systems in the laboratory, store critical data on a redundant network storage server that is backed up rigorously.

Assume that each of the computers in the building could fail at any moment and be prepared for the eventual failures.

Follow an appropriate backup and offsite archive scheme.

• In a large company, ask IT personnel about how backups are performed and how you can best use that service to meet laboratory data requirements.

• In a small company, document your current backup scheme and consider improving it.

Practice restoring data regularly so that you are ready for the real thing when it occurs.

Train employees on data retention, safeguarding, and electrical protection procedures.

If there is not already one in place, consider implementing a data safeguard policy that encompasses and expands on the previously listed items.

"GC Connections" editor John V. Hinshaw is senior research scientist at Serveron Corp., Hillsboro, Oregon, and a member of LCGC's editorial advisory board. Direct correspondence about this column to "GC Connections," LCGC, Woodbridge Corporate Plaza, 485 Route 1 South, Building F, First Floor, Iselin, NJ 08830, e-mail lcgcedit@lcgcmag.com. For an ongoing discussion of GC issues with John Hinshaw and other chromatographers, visit the Chromatography Forum discussion group at http://www.chromforum.com.
