Numb3rs 111: Sacrifice

In this episode the data from a murdered researcher's computer is recovered after being "wiped". After a review of the basics of magnetism and binary encoding, we will explore how the two are combined to store data on hard disk drives and how this data might be recovered after being overwritten with other data.

Magnetism

bar magnet plus filings

You have probably played with bar magnets, like the one on the right, before, discovering that they have two distinct poles and that like poles of two bar magnets repel while opposite attract. What you might have not realized is that magnetism originates at the subatomic level. Electrons (and other subatomic particles) have an intrinsic property misleadingly called spin; it does not refer to the particles actually spinning in the everyday sense of the word--it's more accurate to think of spin as a property of a particle akin to mass or charge. Nevertheless, as spinning a larger charged object creates a magnetic dipole (i.e. makes the spinning object produce a magnetic field), so does the spin of an electron make it behave like a tiny bar magnet. Additionally, when the electron also orbits an atom, this gives it another distinct magnetic dipole created by this orbiting, usually called the orbital dipole moment. The magnitude and axis of both of these magnetic moments can be expressed as vectors; adding those up we get the total magnetic dipole contributed by one electron. If we sum up these dipoles for each electron in each atom of a small object, and they do not cancel each other out, then that object, like the bar magnet above, produces a magnetic field and is known as a magnet.

Activity 1:
  1. Use wikipedia or other internet resources to find out about the differences between diamagnetic, paramagnetic, and ferromagnetic elements.
  2. What would happen if you take a bar magnet and break it in half near the midway point dividing the north and south poles? Can you use what you've read about ferromagnetism and the discussion above to explain why this happens?
magnetic field induced by current in straight
	wire

In addition to electrons producing magnetic fields by their spin and orbit around an atomic nucleus, moving electrons also create a magnetic field "around" the direction of motion, a phenomenon called electromagnetic induction. For instance, if a current flows through a straight wire, as on the right, a magnetic field is induced according to the right hand rule. Such a magnetic field can be used to realign the polarity of ferromagnetic materials, allowing the possibility of storing information in magnetic alignment, which is the basis of digital storage media like hard drives and floppy disks.

Binary Encoding

Throughout history humans used symbols, whether pictures, pictograms, or alphabets, to communicate, replicate, and store information. What differentiates older ways of storing information in books or manuscripts from what happens when you select "Save" in your word processor is the form of the encoding. By encoding I mean the representation of information in a way different from its original form. These sentences, for instance, encode information that originally exists as ideas in my mind. Encryption is another example of encoding, as is compression.

Activity 2:
  1. Julius Caesar used a simple shift encoding to communicate with his generals. In this encoding a letter of the Roman alphabet was replaced by one appearing n after it, for some positive integer n, wrapping at the end of the alphabet to the beginning. Although Caesar originally used n=3, the most famous version of this scheme is rot13, in which n=13, so that A becomes N, B becomes O, etc. The next part of this activity is encoded in rot13. Decode and do it.
  2. N fgnaqneq jnl gb rapbqr gur Ratyvfu nycunorg hfrq ol pbzchgref vf NFPVV, Nzrevpna Fgnaqneq Pbqr sbe Vasbezngvba Vagrepunatr. Vg rapbqrf rnpu yrggre nf n guerr-qvtvg ahzore. Ybbx hc na NFPVV gnoyr bayvar, naq qrpbqr gur jbeq va cneg 3 bs guvf npgvivgl.
  3. 067 111 110 103 114 097 116 117 108 097 116 105 111 110 115 !

Once we've encoded our alphabet as ASCII numbers, the next step is to encode these numbers using a series of blocks each with only two states, on or off, as in case of a ferromagnetic material which can be aligned along only one axis, with the north pole pointing either up or down. This is accomplished by converting ASCII codes into binary.

Note that 2008=8*100 + 0*101 + 0*102 + 2*103. By writing "2008" in everyday use we mean for the digits to simply enumerate how many 1s, 10s, 100s, and 1000s are in the number. But if we look at it this way, a natural question arises: why use the powers of 10, instead of say, powers of 3 or 7? Indeed, there's absolutely no objective reason to chose 10--resulting in what's called the decimal or base 10 numerical system--over other possible bases. The Sumerians, for example, used base 60, hence the 60 seconds per minute, 60 minutes per hour. The following activity should give you a feel for how different bases work and how to convert from one to another.

Activity 3:
Let mn denote the number m in base n. For instance, 510 = 1012 = 1*20 + 0*21 + 1*22.
  1. Here's an algorithm for converting from base 10 to base 4: take your number, say 10310, divide by 4, recording and discarding the remainder, in this case 3 (103=4*25+3). Divide the result by 4 again, recording and discarding remainder, rinse and repeat until you get 0. (25=4*6+1, 6=4*1+2, 1=4*0+1). Concatenate the remainders in reverse order to get the number base 4, 12134=10310. Why does this algorithm work?
  2. Design an algorithm to convert from base 4 to base 16.
Activity 4:
Besides binary (base 2), another common base used by computer scientists and programmers is hexadecimal, or base 16. Why do you think this is? (Hint: see part 2 of the previous activity).

Hard Drives and Data Recovery

Your computer hard drive consists of layered, circular platters of ferromagnetic material that quickly spin while read/write heads hover above. Those read/write heads are able to read and change the magnetic polarity of tiny areas on a platter's surface. The more such areas are squeezed onto the platters, the larger the storage capacity of the hard drive. The reading and writing operations are accomplished using the principle of electromagnetic induction discussed previously: a current passing through the tip of the head is used to change the polarity of the area below, and the reverse effect--the appearance of a current in response to a changing magnetic field--is used to measure the existing magnetic alignment.

Normally, when you delete something in Windows or on a Mac (and in most modern Linux systems), the files are not deleted but their records are moved from the folder they originally resided into a designated "Trash/Recycle Bin" area. During this operation the actual data on the disk is left untouched, and the files are easily recoverable from trash. If you empty the trash bin, the records of the files get destroyed: the filesystem designates the physical space in which the file resides as empty. However, the actual file, the bits encoded in the magnetic alignment, do not get realigned in any way. Thus the information is actually kept completely intact until it is overwritten, there's just no way to readily access or restore it. However, restoring such non-overwritten files is not particularly difficult; there exist both commercial and free software tools to do exactly that. Things get much more interesting, and contentious, when it comes to restoring information that has been overwritten one or more times.

The most important paper on this subject is Peter Gutmann's Secure Deletion of Data from Magnetic and Solid-State Memory. It describes the following method of recovering overwritten data. Suppose we zoom onto a hard drive's surface and discover that the initial alignment within each area are as illustrated below.
zoom on disk magnetic
	alignment

Suppose we now overwrite the middle area: change the alignment to point down.
zoom on disk magnetic
	alignment

Due to microscopic imperfections in the read/write head and tiny imprecisions in its movements, there will be a small band of the previous up aligned particles left, separated from the new down aligned particles by a thin buffer band of randomly aligned particles. Overwriting the same area again will create additional microscopic bands of molecules aligned according to the previous magnetic alignment. These bands will trace out the history of previous magnetic alignment of this area of the hard disk, and when examined with a powerful electron microscope can be read. Naturally, the more times an area is overwritten, the "fainter" the outer bands become and the harder it becomes to reconstruct long erased alignments.

This is the theory presented in the paper above, much of which has been confirmed. However, the most important and contentious claim made in the paper, that all this can be done quickly and efficiently enough to actually recover any nontrivial amount of overwritten data, and that such techniques are being used by intelligence agencies, remains unproven. Nonetheless, many software packages have been written which wipe files by overwriting the areas in which they are located repeatedly, thus making it virtually impossible to detect the original alignment.