The Coffee Place's Joke Stack
The Truth About Data Compression
by Pablo Biannuci
Most of the people using personal computers have been amazed by the prodigy of data compression. It's very difficult to imagine how that long, long file filled with nonsense numbers could get squeezed into a small, small file filled with nonsense numbers. In this article I'm going to explain all the secrets about the compression of computer data, even those that the software companies do not want to see published. There are several data compression algorithms. An explanation of each follows, along with a brief summary of its applications.
Non-Repeat Packing
---------- -------
This is the simplest (and worst) compression method. It is based on the fact that if you write one 'A' instead of twenty, you'll save a lot of space. Actually, its savings aren't too impressive, but it is known to have saved a life when a message saying "I'll kick yer butt" was compressed to "I'l kick yer, but" (There is a committee investigating the sudden appearance of the comma, but they don't agree that a comma is there yet.)
Huffman Tree
------- ----
This is one of the first algorithms that yielded a relatively acceptable compression ratio. (relatively means E=mc^2, where E is the compressed file size, m is the original file size and c is the speed of light).
It is quite simple; all you have to do is to count the appearances of each character (A, B, C, D, and so on) and then build a binary tree with them so that the characters will be its leaves. Once the tree is built, just get a sharp axe and cut it down. Chop it in small fragments, and pile them neatly. It will occupy just a fraction of the space it occupied before, so it will be compressed. Currently the Huffman algorithm is not so frequently used because there must be some kind of identification in each piece of tree, and that takes so much space that it's almost bigger than the tree. MNP modems use a modified version of this method, but instead of chopping the tree with an axe they chop it with a MicroCom Chop-O-Matic machine, which is much faster.
LZW
---
LZW is a more modern algorithm. It is widely used, sometimes together with Huffman. Its name (kinda cryptic, isn't it?) derives from its authors initials: Lempel, Ziv and Welch.
The idea behind this method is revolutionary: To reference part of the contents to things before it. (It is revolutionary in data compression, it has been widely used in most other life aspects.) To be more clear to those of you not accustomed to technical jargon, what this algorithm does is to insert footnotes instead of the actual data. Let's see an example. The original text is : "As I was looking at my reflection in the mirror, I was playing reflections with my look, and I broke the mirror." (Just a selected text sample. It's not a reflection of the author's mental state. In fact, mine is a bit worse.) The compressed text would be: "As *1* *3* my *2* in the *4*, *1* playing *2* with my *3* and *1* broke the *4*."
Footnotes: "1:I (was) / 2:reflection(s) / 3:look(ing) / 4:mirror. As everybody can see, it is tightly compressed, and now it fits into a pocket book's page.
Shannon-Fano Trees
------------ -----
This is another vegetal algorithm. The difference is that this method uses the so-called "Sliding Dictionaries," which make it better. To compress data this way, you have to take it, dig some holes in fertile ground and scatter parts of it (the data) in the holes. After some nice trees have grown up (the Shannon-Fano trees, and they grow quite fast), you tie some ropes to their branches and place a dictionary so that it is able to slide up and down the rope. (Be sure the rope is strong enough, or don't use an unabridged Webster's dictionary.) I cannot figure out why this compresses the data yet, but it works, so I'll leave it alone for now.
There are more compression methods out in the computer world, but I didn't have access to the confidential information about them. (Lunch time in the software company that uses it is an example.) Also, some methods aren't single algorithms, but a mixture of two or more of them. For example, the world-wide known Imploding is a LZW algorithm with a Shannon-Fano Tree performed after, which leads to a data collapse and posterior implosion, but I won't be going in much farther detail.
So here ends our lesson about data compression. I *1* this *6* *7* *2* useful *8* you *2* it *6* *7* *8* me, *2* a way *5* spend my time at *3* *4* hospital. Hey! Who *6* turned on *3* LZW?
1:hope / 2:as / 3:the / 4:psychiatric / 5:to / 6:has / 7:been / 8:for
--------------
Pablo Biannuci is the sysop of Atomic World BBS in Avellaneda, Buenos Aires, Argentina. FidoNet> 4:901/225
This page is maintained by: mark@thecoffeeplace.com
Changes were last made on 05-05-2002
Return to The Coffee Place's Joke Stack