Wolfram|Alpha: Systematic knowledge, immediately computable.

Thursday, May 6, 2010

Steganography, GKNOVA6 Style.

The recent viral marketing campaign of GKNOVA6, believed by many to be related to the upcoming Call of Duty game, has used cleverly disguised messages to hint at something. What exactly, no one is sure of at this point. One of the messages, itself some kind of hint, has a further message hidden within it, revealed by temporal frequency analysis. Details can be seen in the entry at BashandSlash. Hiding information in plain sight (plain 'hearing' in this case), using techniques such as changing bits of sound or image files or otherwise embedding the information, is known as steganography.

Frankly, it's pretty obvious listening to GKNOVA6 that there's something under the covers. I decided to try my hand at this in a less obvious way. Here's a clip of Amazon Rain Forest Sounds. Click to listen, or right click and 'save as' to download if you want to analyze it. There is a repeating message hidden in it: the name of the site that first made me aware of GKNOVA6. The message is hidden as sounds that, when appropriately analyzed, reveal the actual text of my message. I think the low-level synthesized audio I used for these 'messages' would pretty easily pass the natural sound test and go undetected by most listeners. The louder one stands out a bit on purpose, so you can hear what the 'message' sounds like embedded with the real rain forest sounds.

My message signal was produced in short order using Matlab and Mathematica and was then injected into the real jungle sounds. Basically, imagine scanning a 2-D array that represents the characters you want to inject, where one dimension maps into signal time (the left to right of the message) and the other maps to frequencies (the points occupied by the message in the vertical for a given point in time). As you scan the array in time, you generate some combination of one or more frequencies for that point in time that you inject into your 'cover' signal. Playing a bit loose with the terminology, you are in essence doing an inverse short-time Fourier transform.
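The scanning scheme above can be sketched in a few lines. This is a minimal Python/numpy stand-in for what I did interactively in Matlab and Mathematica, not the original session; the sample rate, frequency range, and column timing are arbitrary choices for illustration:

```python
import numpy as np

def bitmap_to_audio(bitmap, fs=8000, col_dur=0.05, f_lo=1000.0, f_hi=3000.0):
    """Scan a 2-D bitmap column by column (left to right = time) and sum a
    sine tone for every occupied row (bottom row -> f_lo, top row -> f_hi)."""
    rows, cols = bitmap.shape
    freqs = np.linspace(f_lo, f_hi, rows)      # map each row to a frequency
    n = int(fs * col_dur)                      # samples per time column
    t = np.arange(n) / fs
    out = []
    for c in range(cols):
        col = np.zeros(n)
        for r in range(rows):
            if bitmap[r, c]:
                # bitmap row 0 is the TOP of the glyph, so flip the row index
                col += np.sin(2 * np.pi * freqs[rows - 1 - r] * t)
        out.append(col)
    sig = np.concatenate(out)
    peak = np.abs(sig).max()
    return sig / peak if peak > 0 else sig     # normalize to [-1, 1]

# A crude 5x3 letter 'T': its shape will appear in a spectrogram of the audio
T = np.array([[1, 1, 1],
              [0, 1, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 1, 0]])
audio = bitmap_to_audio(T)
```

Scale the result down and add it to the cover signal, and you have the injected message. The column duration trades off message width in the spectrogram against how long the tones linger in the cover audio.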

The spectrogram you can use to recover the message does the reverse: mapping the frequencies and amplitudes within a narrow (in the time dimension) window that scans the signal over time. The result is nothing more than the short-time Fourier transform of the signal: snapshots of the Fourier transform of the windowed signal, stacked in time. The video below is a screen recording of the WavePad sound editor with the clip loaded and analyzed. All three repeats of the message BASHANDSLASH stand out plain as day in the spectrogram.
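The windowed-snapshot idea is simple enough to sketch directly; a sound editor like WavePad does essentially this before rendering the result as an image. A bare-bones Python/numpy version (window length and hop size are arbitrary illustrative choices):

```python
import numpy as np

def stft_magnitude(sig, fs=8000, win=256, hop=128):
    """Short-time Fourier transform magnitude: Hann-windowed FFT snapshots
    of the signal, stacked in time (rows = time frames, cols = freq bins)."""
    w = np.hanning(win)
    frames = []
    for start in range(0, len(sig) - win + 1, hop):
        seg = sig[start:start + win] * w
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.array(frames), np.fft.rfftfreq(win, 1 / fs)

# Sanity check: a pure 1 kHz tone should light up one frequency bin
fs = 8000
t = np.arange(fs) / fs
S, freqs = stft_magnitude(np.sin(2 * np.pi * 1000 * t), fs=fs)
peak_bin = S.mean(axis=0).argmax()
print(freqs[peak_bin])   # ~1000 Hz
```

Plot `S` as an image (time on one axis, `freqs` on the other) and you have a spectrogram; any shapes injected the way described above reappear as bright regions.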

Pretty cool, I think! If you have something that can do temporal frequency analysis (many sound editors have this functionality), or have a package like Mathematica, Matlab, or Maple and know your way around the Fourier transform, have a look at the clip yourself. For a readily accessible, well-written book on the basics, see Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications by Stanford researcher Julius O. Smith III.

[Update: While making the effort to learn the low-level mathematics and techniques to do this sort of thing is great exercise for the brain, you may not have the time or interest to do so. If you just want to play around with the concepts, I've located a program that you can use to experiment with synthesis of signals from shapes. While the intent of this program is research into analysis and synthesis via partials, it provides a simple means of experimenting with the concepts involved. See S P E A R - Sinusoidal Partial Editing Analysis and Resynthesis for details. ]

Simple shape drawn in S P E A R then converted to audio, with temporal frequency analysis of that audio done in WavePad. (Click to enlarge)

If you'd rather view some results instead of making your own, click on the video below to see the amplitude envelope, temporal frequency analysis, and frequency distribution of the Rain Forest clip.

Rain forest sounds recording with secret message embedded analyzed, showing amplitude over time, STFT (Spectrogram) and frequency distribution. Click on the video while playing to see a larger version in a new window on YouTube.

[Update: Reading the first comment, it occurred to me that an analogy might help some visualize what's happening here.]

If you've ever seen a player piano (I mean a real player piano, with the perforated paper rolls, not a modern digitally based one), you've basically seen a sort of 'mechanical' transform at work (I'm really going to be playing loose with terms and analogies here to simplify this). The roll has positions across the paper that correspond to each note of the piano. As the roll is drawn over the sensor, any place there's a hole, the corresponding piano key is actuated.

So if you look at a point along the length of the roll, going at right angles to the length, you would see what note(s) were actuated at that time. The length of the roll is the time dimension, the width is the frequency dimension. You could almost map the holes in the roll, as it is wound onto the take-up spool while the piano plays, into the points of a spectrogram recording of the piano (not literally, but surprisingly close).

Let's imagine you had a special 'punch player piano' that would take a blank roll of paper and punch the holes corresponding to the keys you played over time. That roll would be like the transforms over time of the notes you played. Just like the spectrogram, but having far less detail and not accounting for anything other than key press and duration. Now, if we took the roll the 'punch player piano' made and put it on a regular player piano, it would play your 'song' perfectly, mimicking the keys you pressed as time flows. Keys to holes, holes to keys.

Now imagine we take a blank roll, and we manually punch holes in it to make letters and shapes. We could still put that roll into our player piano, and it would play the notes corresponding to your manually made 'song'. It might sound like trash, but who cares. That 'song' would correspond to the 'sound' of my letters and shapes. We do this and record it. This is like my 'hidden' message sound. We could mix in some other real notes or melodies on the paper if we wanted, we could still see our shapes and letters.

Next, say we had someone that could hear any combination of notes and press all the right keys on a piano to mimic them. We play them this 'song' we recorded from our player piano that played the roll we manually punched to form shapes and letters.  As they manually 'replay' our song, their piano punches the holes corresponding to the keys they press into a fresh paper roll. They are effectively acting like a 'human spectrogram recorder', producing a spectrogram (the punched holes on their roll). The end result? Their roll of paper will have the same shapes and letters as the one we manually made. We went from holes (time) to keys (frequency) then from keys (frequency) back to holes (time). Shapes and letters to sounds, sounds to shapes and letters.

So in the main part of this blog entry, we knew what 'shape' we wanted the spectrogram of our hidden message to be (what holes in the roll we needed to make our shapes), and from that made the 'sounds' of it. We mixed that in with real sounds to mask it, and passed it along. Given the right tools, we can take the sounds and see the shapes. Our hidden message sounds produce the shapes we wanted in the spectrogram (the holes in the roll make our shapes). The real sounds that mask our hidden message matter not, but our shapes (the message) are revealed.


  1. I am a bit cornfused by this, but it is interesting. If I knew more about GKNOVA6 it might make more sense to me.

  2. @ Anonymous May 6, 2010 7:42 AM :
I added a reference link that may help you visualize this. See the spectrogram link. Once you understand how the spectrogram shows the frequencies and amplitudes of signals as they vary in time, then just imagine doing the reverse: knowing what 'shapes' you want the spectrogram to show, and building the needed sound(s) to get them.

    I also added an explanation from a different angle after the image, that might help to clarify what is happening.

  3. yeah, the piano roll thing makes sense. It's just like an NC tape for a machine tool. If you cut words into the tape, it might crash the machine, but the tape is all that matters in this instance, i.e., making the tape have words. It would be like showing an ugly piece of machined metal and someone reverse engineering the metal into the original tape with a pattern that looks like the words.

    I still don't quite get what GKNOVA6 means though.. heh

  4. That is the simplest explanation of what a Fourier transform is that I've ever seen. I get it now! Can you post the code you used? I don't have the programs you used, but I'd like to see if I can do it in VB.

  5. @ May 9, 2010 6:53 AM :
    I am *really* sorry I didn't reply in a timely fashion to your question: I must have just glanced at the comment when approving it, and I just now really read the details.

    I don't think I archived the 'code', but I'll look and post a link if I did. There was no real 'code', per se, since I just cobbled up the desired stuff interactively. Since both applications have very sophisticated graphics & mathematical functionality (e.g., all the Fourier stuff I needed is just built-in functions) it ends up being a few dozen lines of interaction.

    VB certainly does not have such functionality built-in, though there may be some libraries for it (I'm not a VB developer, so I may be off base here.)

    That said, if you just want to generate the sounds for a text message, you could do it in VB as follows:

    1) Create an array / list of matrices that represent the shape of the letters (like the dot-matrix shape).

    2) Assign some list of numeric frequencies (to be used to create pure sine wave audio) to the vertical dimension of the matrices, where the bottom of each letter is some frequency N, and the frequency increases as you go 'up' the vertical dimension of the letter.

    3) For each letter in your message, get the associated matrix, and for each point in the horizontal dimension, add the 'tones' corresponding to the occupied dots for the letter in the vertical dimension.

    4) Repeat 3 as needed for all of your message.

    Adjust the timing for each horizontal time slice, spaces, etc. to get the output you need. You can use something like WavePad to analyze it, and when it looks good, use the same to mix it in with some masking sound(s).