Codes and Cyphers

By David A. Cooper

Some codes like Morse Code and Semaphore are used to enable communications at a distance, while others are used to hide the content of messages from outsiders. I will take a little look at the former type of code, and then move on to the main event.

There are a number of different systems used by deaf people to spell out words using their hands, but most hearing people can't be bothered learning any of them because they are too hard to learn and too easy to forget. However, I've noticed a pattern in the alphabet which could change all that. Take a look for yourself: a (b c d) e (f g h) i (j k l m n) o (p q r s t) u (v w x y z). The five vowels are spaced out through the alphabet, and each is followed by a group of three or five consonants: perfect for an animal with five fingers on each hand. You just use one hand to indicate which vowel you're starting at, and the other hand to say how far to go on through the alphabet from there to get a consonant, so if you extend three fingers on one hand (representing "i") and then hold your other hand lower down with five fingers extended, that would represent "n". This system is very easy to learn, and very hard to forget.
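The vowel-plus-count idea can be sketched in a few lines of Python. This is only an illustration of the pattern described above: the function names `letter_to_signal` and `signal_to_letter` are made up for the example, and the "signal" is simply a pair of numbers (which vowel, and how many steps past it).

```python
VOWELS = "aeiou"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_to_signal(letter):
    """Return (vowel_number, steps) for a letter.

    vowel_number is 1-5 (a, e, i, o, u); steps is how far to count on
    through the alphabet from that vowel (0 means the vowel itself).
    """
    position = ALPHABET.index(letter.lower())
    # The starting vowel is the last vowel at or before this letter.
    vowel = max(v for v in VOWELS if ALPHABET.index(v) <= position)
    return VOWELS.index(vowel) + 1, position - ALPHABET.index(vowel)


def signal_to_letter(vowel_number, steps):
    """Inverse of letter_to_signal."""
    start = ALPHABET.index(VOWELS[vowel_number - 1])
    return ALPHABET[start + steps]


print(letter_to_signal("n"))   # three fingers, then five fingers
print(signal_to_letter(3, 5))
```

The same pair of numbers serves for fingers, arm positions or flashes, which is why only the one counting skill has to be learned.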
Semaphore is also hard to learn and easy to forget, but it's also hard to read because everything's suddenly reversed left to right when you watch someone else signalling to you. Again the solution is to use that pattern in the alphabet instead: one arm can be used to select a vowel, while the other can optionally be used to convert it into a consonant. The letter "a" is formed by putting one arm into the "teapot handle" position with your hand on your hip; "e" is formed by holding your arm out to the side, but sloping down at an angle of 45 degrees; "i" is formed by holding your arm out to the side horizontally; "o" is formed by holding your arm out to the side, but sloping up at 45 degrees, and "u" is formed by holding your arm straight up in the air. The other arm is held by your side if a vowel is being signed, but it can go through the same five positions as the vowels to indicate consonants, so if you imitate an aeroplane (both arms held out to opposite sides horizontally), then that's the letter L. It doesn't matter which arm you use for vowels and which you use for consonants because you will always put the vowel arm into position slightly before the consonant arm.
Morse Code is again hard to learn and remember, so in most emergency situations it is practically useless because you can be sure that one or other party won't know it. That pattern in the alphabet can come to the rescue yet again: just flash a series of dots to indicate a vowel, but turn the last of these into a dash if you are going to convert it into a consonant, and then send more dots to count through the alphabet from that vowel to the consonant you require. "Dot dot dash dot dot (..-..)" represents the letter K (aei, jk). "Dot dash dot dot dot (.-... ae, fgh); dot dot (.. ae); dot dot dash dot dot dot (..-... aei, jkl); dot dot dot dash dot (...-. aeio, p)" represents the word "help". You can store the number of flashes on your fingers as you read the signals and then take your time to decode them before signalling with a single flash that you are ready to receive the next one.
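As a rough check on the dot-and-dash scheme, here is a minimal Python sketch that encodes and decodes letters the way the paragraph above describes. The function names are invented for the example, and a space is assumed as the pause between letters.

```python
VOWELS = "aeiou"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def encode_letter(letter):
    """Dots count up to the vowel; the last becomes a dash for a consonant."""
    position = ALPHABET.index(letter.lower())
    vowel = max(v for v in VOWELS if ALPHABET.index(v) <= position)
    count = VOWELS.index(vowel) + 1            # 1 for a, 2 for e, ...
    steps = position - ALPHABET.index(vowel)   # 0 if the letter is a vowel
    if steps == 0:
        return "." * count
    return "." * (count - 1) + "-" + "." * steps


def decode_letter(signal):
    """Reverse encode_letter: count flashes up to the dash, then the rest."""
    head, dash, tail = signal.partition("-")
    count = len(head) + (1 if dash else 0)
    start = ALPHABET.index(VOWELS[count - 1])
    return ALPHABET[start + len(tail)]


def encode_word(word):
    # A space stands in for the pause between letters (an assumption).
    return " ".join(encode_letter(c) for c in word)


print(encode_word("help"))
print("".join(decode_letter(s) for s in encode_word("help").split()))
```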

Try out the three communication systems above with a friend to make sure you know how to use them correctly. The key skill you need to acquire is the ability to count your way through the alphabet starting from the places where the vowels are, but once you've got the hang of that you will find the three systems are extremely easy to use, and you'll be hard pushed to forget them.

Cyphers

A cypher is a kind of code where all the letters in a message are changed in an attempt to keep the content secret from people who aren't meant to read it. A simple cypher might replace "a" with "b", "b" with "c", "c" with "d", and so on. The message "meet me at midnight" would therefore become "nffu nf bu njeojhiu". Of course, if that message fell into enemy hands, it would be fairly easy to crack this cypher because it's such an obvious one to use. It would be much better to write the alphabet out in a less predictable order and then use that as the basis for a more secure cypher:-
a b c d e f g h i j k l m n o p q r s t u v w x y z
r e w q p o i u t y v c f d x z s a j k m n l b g h
Now "meet me at midnight" becomes "fppk fp rk ftqdtiuk". Unfortunately, this is still not a safe way to keep your messages secure against outsiders. There are patterns which give clues as to what some of the letters might be, such as the "pp" in "fppk" which could help an enemy crack the cypher, and there is also the problem that some letters are more common than others, so all an enemy needs to do is count how many times each letter appears in a message and he can guess that the most common ones are likely to represent "e" and "t". Unless the message is very short and has an unusual lack of the most common letters, it can usually be cracked fairly easily: if you use a computer to help, you can often crack a message in a matter of minutes (or under a second if the program has access to a dictionary and can do the entire job for you).
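Both cyphers described so far can be reproduced with a short Python sketch. The function names are chosen for illustration only; `KEY` is the scrambled alphabet given above, and anything that isn't a lower-case letter (spaces, punctuation) is assumed to pass through unchanged.

```python
import string

PLAIN = string.ascii_lowercase
KEY = "rewqpoiutyvcfdxzsajkmnlbgh"  # the scrambled alphabet given above


def shift_cypher(message, shift=1):
    """Replace each letter with the one `shift` places further on."""
    shifted = PLAIN[shift % 26:] + PLAIN[:shift % 26]
    return message.translate(str.maketrans(PLAIN, shifted))


def encypher(message, key=KEY):
    """Replace each plain letter with the key letter written beneath it."""
    return message.translate(str.maketrans(PLAIN, key))


def decypher(message, key=KEY):
    """Read the table the other way round to recover the message."""
    return message.translate(str.maketrans(key, PLAIN))


print(shift_cypher("meet me at midnight"))
print(encypher("meet me at midnight"))
print(decypher("fppk fp rk ftqdtiuk"))
```

Note that the double letter in "meet" survives both cyphers as a double letter, which is exactly the kind of pattern that gives the game away.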
To show you how easy it is to crack a cypher, I have written a couple of programs for you to play with. The first one enables you to encode messages using the cypher of your choice. Once you have written and encoded your message, you can highlight it and press ctrl+C to copy it. You can then use ctrl+V to paste it into the other program designed for decoding and cracking messages. Here are the two programs:-

Encode a message using a cypher.

Decode or crack an encyphered message.

And here's a message for you to try to crack:-

ejrpxxlgdw mcmaimuupo mulpeqmlcp lnmapumpuk xgpurwkuje mqmlpvumrw mcmkuplmcr pkepijverj zxvlnkehjx hmakekrfwm erwmgpcmrw kuuwjcrkwp hmamxkqmcp rmxgdvrrwm fjcarwmwmc mfwmcmkrfj vxajrwmcfk umejrwphmj llvccmasvu rrjwmxdgjv ujxhmkrqvr pxjeomcdkm lmjzrmyrfj vxawphmrwm fjcarwmipe grkimujhmc.

The most common letter (assuming the message is written in English) is usually "e", though "t" is normally close behind. If you look at all the places where "t_e" appears in the message, you may be able to guess which letter represents "h", and you can then look out for words containing those three letters (then, there, here, where, three, other, etc). The next most common letters tend to be "a", "h", "i", "n", "o", "r" and "s". When trying to crack the message above, you may reach a point where you find "...where_two..." but it is unlikely that the words "where" and "two" are both parts of the message given the lack of room between them for anything other than an unlikely one-letter word, so you need to think about where a split might actually occur: perhaps the missing letter is "i" and it joins up with the "t" to make the word "it"? You can then think about what sort of word might start with "wo": it is likely to be a five-letter word ending in "d", because the "d" can't join up with the next word and there aren't many three-letter words starting with "wo" or two-letter words ending in "d". If you want to get a better idea of how common all the different letters are in a typical piece of text written in English, just copy this paragraph and paste it into the input box of the second program, then when you click on "Post" you'll see how often each letter occurs in it. The letter "l" comes tenth, so whenever you get stuck decoding a message it may be worth trying the ninth, tenth or eleventh most common letter as an "l" and seeing if it produces any words.
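If you would rather do the counting in your own code, the following Python sketch shows the two basic tools a cracker needs: a letter count, and a way of filling in guesses so far. It is not the program linked above, just a minimal stand-in; the names `letter_frequencies` and `apply_guesses` are invented for the example.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, count) pairs, most common first, ignoring non-letters."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    return counts.most_common()


def apply_guesses(cyphertext, guesses):
    """Show guessed letters and put "_" wherever a letter is still unknown.

    `guesses` maps cypher letters to the plain letters you think they are.
    """
    return "".join(
        guesses.get(c, "_") if c.isalpha() else c for c in cyphertext
    )


message = "fppk fp rk ftqdtiuk"
print(letter_frequencies(message))
print(apply_guesses(message, {"p": "e", "k": "t"}))
```

Run `letter_frequencies` on the cyphertext, guess that the top one or two letters are "e" and "t", and then keep extending the `guesses` dictionary as words start to emerge.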

So, simple cyphers can be cracked and are therefore insecure, but you can make your messages more difficult for outsiders to crack if you throw a little chaos into them. One trick is to use more than 26 symbols so that you can have more than one letter for "e" to make it behave as if it is a less common letter. By doing this for the other common letters too, you can make most of them appear equally frequently. You can also throw in lots of "nulls": letters which are not a real part of the message and which can be ignored by the person you're sending it to once it has been decoded. As you can szee fromq thqis izt is stillq possizble xto maxke out what isx beizng saiqd evenq with lotzs of nuqlls addedx to it, though the way you add them could actually make it easier to decypher if you do it badly (an artificially regular spacing between the nulls may flag them up as just that: nulls).
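Adding and removing nulls is easy to sketch in Python. This example assumes, as in the sentence above, that "q", "x" and "z" are reserved as nulls and never appear in the real message; the function names and the `rate` parameter are made up for illustration. The nulls are dropped in at random rather than at regular intervals, for the reason just given.

```python
import random


def add_nulls(message, nulls="qxz", rate=0.2, seed=None):
    """Insert null letters at random points (roughly `rate` per character)."""
    rng = random.Random(seed)
    out = []
    for ch in message:
        if rng.random() < rate:
            out.append(rng.choice(nulls))
        out.append(ch)
    return "".join(out)


def strip_nulls(message, nulls="qxz"):
    """Remove the nulls again; only safe if the message never uses them."""
    return "".join(ch for ch in message if ch not in nulls)


padded = add_nulls("it is still possible to make out what is being said")
print(padded)
print(strip_nulls(padded))
```

In practice the nulls would go in before the message is encyphered and come out after it has been decyphered, so that they are disguised along with everything else.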
Another way to make your messages more difficult to crack is to scramble them. You could, for example, divide the message up into blocks of five letters and change the order within each block from 12345 to 25314, and that would be enough to keep most people out, but there are advanced computer programs which may still be able to crack your codes by spotting points where the same word has been scrambled in exactly the same way at different places in your message. You could make this less likely by using a different cypher for each letter in the message, with the pattern repeating after a large number of letters, while your scrambling pattern would repeat after a different number of letters. You would need to write your own computer programs to handle the processing for all that, and even then there would be no guarantee that it couldn't be cracked if you were to use the same system too often. The safest bet is to write your own program to generate random numbers, done in such a way that two machines running the same program would create exactly the same sequence of numbers. Contained within each message would be a number of letters used to initialise the program that generates the random numbers so that it comes up with a unique set of numbers for each message, and no one could possibly crack the code unless they had their own copy of that program. The random numbers would be used to change the cypher used for each letter of the message, and it wouldn't repeat at any stage. Of course, if you were to connect the computer running your program to the internet, someone could hack into it and copy the entire program, so you need to be extremely careful if you are trying to protect important industrial secrets.
Don't rely on encryption systems involving private and public keys for anything more important than buying things on line: the people who sell you the keys can decode all your messages, as can the governments they pass them on to, and governments will think nothing of stealing your ideas.
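The scrambling and random-number ideas can both be tried out with the Python sketch below. Several things here are assumptions made for the example: "12345 to 25314" is read as "the new block is made of the old 2nd, 5th, 3rd, 1st and 4th letters"; short final blocks are padded with "x"; and Python's built-in `random` module stands in for a home-made generator. That module is designed for simulations rather than secrecy, so treat this purely as a demonstration of how a shared seed lets two machines produce the same sequence.

```python
import random
import string

ORDER = (2, 5, 3, 1, 4)  # new block = old letters 2, 5, 3, 1, 4 (assumed reading)


def scramble(message, order=ORDER):
    """Reorder the letters within each block of len(order) characters."""
    size = len(order)
    padded = message + "x" * (-len(message) % size)  # pad the last block
    blocks = (padded[i:i + size] for i in range(0, len(padded), size))
    return "".join("".join(block[p - 1] for p in order) for block in blocks)


def unscramble(message, order=ORDER):
    """Put each letter back where it came from (padding is left in place)."""
    size = len(order)
    out = []
    for i in range(0, len(message), size):
        block = message[i:i + size]
        plain = [""] * size
        for slot, p in enumerate(order):
            plain[p - 1] = block[slot]
        out.append("".join(plain))
    return "".join(out)


def stream_cypher(message, seed, decode=False):
    """Shift every letter by a fresh pseudo-random amount drawn from `seed`."""
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    out = []
    for ch in message:
        if ch in letters:
            shift = rng.randrange(26)
            if decode:
                shift = -shift
            ch = letters[(letters.index(ch) + shift) % 26]
        out.append(ch)
    return "".join(out)


secret = stream_cypher(scramble("meetmeatmidnight"), seed="kzqv")
print(secret)
print(unscramble(stream_cypher(secret, seed="kzqv", decode=True)))
```

The seed plays the part of the "number of letters used to initialise the program": sender and receiver must both know it, and a different seed should be used for every message.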

Phonetic Cyphers

My favourite kind of cypher is the phonetic cypher: although they are not secure, phonetic cyphers can be spoken and they can easily be taken for foreign languages. If someone recorded your conversations, they could crack your cypher fairly easily, but that's unlikely to happen (and there is a way of stopping them which we'll come to later). When you design a phonetic cypher you have to replace vowels with other vowels and consonants with other consonants so that the result can still be spoken, but you will need to learn a little about phonetics if you are to do the job well. English uses ten main vowels, as you can see (or hear) from the following ten words: moon; blow; saw; last; man; ten; main; tree; hit; bird. When I write phonetically, I spell these ten words as: mwn; blo; sq; last; mxn; ten; mjn; tri; hyt; burd. Vowels can also be short or long, so some words are only different from each other in their vowel lengths: "come" has a short vowel, while "calm" has a long vowel, but the vowels are otherwise identical (unless you live in northern parts of England where "come" often sounds more like "coom" with a very short, malformed "oo" sound). I spell these words phonetically as "cam" and "caam". The words "stock" and "stalk" ("stqc" and "stqqc") are another such pair, differing only in their vowel lengths. "Pull" and "pool" also have different vowel lengths, though because the difference feels less extreme I just spell them both as "pwl" (though again people in northern parts of England can use a rather different vowel for "pull"). The vowel "ee" also comes in long and short forms, but the difference is only rarely important: "I need a salesman in the Netherlands" has a very different meaning from "I knee-ed a salesman in the netherlands". All of these vowel length differences can be completely ignored when you make a phonetic cypher, so you should treat "come" and "calm" as if they sound the same.
There is another complication with vowels: the vowel in "fly" is actually a slide from one vowel to another: you start on "ah" and slide up to "ay". There are other kinds of slide between vowels in words such as "cow" ("ah" to "oh") and "boy" ("aw" to "ay"). I spell these three words: flaj; cao; bqj. Some people pronounce the word "down" differently depending on whether it means feathers or the opposite of "up": the former is "daon" while the latter is "dxon" (remembering that "x" is the letter I use for the vowel in "cat"). Two other words which can be pronounced with "xo" in them are "now" and "round". When making a phonetic cypher, "xo" can be treated as if it is "ao". There is one standard English vowel which I haven't mentioned yet: the second vowel in "woolly". It never occurs in stressed syllables, so it goes almost unnoticed. Its sound lies somewhere between "ay", "ee" and "i". In phonetic cyphers it can be treated as if it is the sound "ay".
The consonants are easier to handle. At the front of the mouth there are p, b, m, f and v. Further back are t, d, n, s and z, plus a variety of s/z variants: th ("th" as in "thin"), dh ("th" as in "then"), sh, zh (vision), hr (used in Welsh), r, hl (used in Welsh) and l. At the back of the mouth are c, g, ng, ch (loch) and gh (used in Gaelic). "H" is a special consonant which is really just an increase in pressure: there is actually an "h" built into many of the other consonants, where t = hd, s = hz, f = hv, c = hg, etc. "Y" is really the vowel "ee" behaving like a consonant, while "w" is the vowel "oo" behaving like a consonant. With an "h" sound placed in front of them, they become the initial sounds in "huge" and "when" (though many English people just say "wen" for the latter). I spell the words "yes", "wood", "huge" and "when" as "'es", "wwd", "h'wdzh" and "hwen". The other consonants are all clusters, so x = c + s, qu = c + w, ch = t + sh and j = d + zh. I usually spell the last two of these as "tt" and "dd" instead of "tsh" and "dzh" because it's quicker. My phonetic spelling system also uses the letter "k" to represent the sound normally spelt as "ng", so "long" becomes "lqk", and "longer" becomes "lqkgur". You can read a full explanation of my phonetic spelling system here.
We can now use our knowledge of phonetics to help us create a phonetic cypher, and the trick is to pair off each sound with another so that, for example, the sound "ee" could be used to replace "oo" while at the same time "oo" is used to replace "ee". Similarly, "ay" and "o" could be paired up, as could "aw" and "e". These particular pairings are not necessarily the best ones to use because they are all opposites, and you may not want to be that obvious, but it will allow you to speak less precisely without the meaning being lost, and it also makes the translation easier for your brain to perform. Even if you happen to use the same vowel pairs as someone else, the difference between the consonant pairs will still make your cyphers so different that they will be mutually unintelligible. Anyway, it's up to you. Using my phonetic spellings, you need to make pairs out of: w, o, q, a/aa, x, e, j, i, y, u, aj, qj and ao/xo. You may wish to treat y and u as the same vowel as they are pretty well interchangeable. I will leave it to you to create your own pairings and to fine-tune things to create an overall sound for your phonetic cypher which you like (if you get it wrong it can sound very ugly indeed, so do take your time to get it right). Next you need to do the same job with all the consonants: h', ', hw, w, h, p, b, m, f, v, t, d, n, th, dh, s, z, l, sh, zh, r, c, g, k (ng), tt (=tsh), dd (=dzh), plus any other sounds you may wish to add (such as the "ch" in "loch" which actually works well if you just count it as a form of h and lump them both together). There is, however, a complication because the first five of these (h', ', hw, w and h) can only come at the start of syllables, and most people only find the k (ng, not c) comfortable to say at the end of syllables. The same thing happens with consonant clusters: you can't put "nt" at the start of a word, and you can't put "spl" at the end, though "st" is happy in either position.
So, the trick is to sort them all into three separate groups and then to try to swap single consonants with other single consonants within their group, and consonant clusters with other consonant clusters within their group, with just the odd single consonant like k (ng) having to pair up with a consonant cluster. The three groups are: h', ', hw, w, h, pl, pr, bl, br, fl, fr, tw, tr, dw, dr, ni, spl, spr, sm, sf, str, sn, sm, sl, sw, scl, scr, li, shr, cr, cw, gr; p, b, m, f/th, v/dh, t, d, n, s, z, l, sh, zh, r, c, g, sp, st, sc, tt/tsh/ti, dd/dzh/di; k, pt, ps, bd, bz, mp, mpz, mt/mpt, md, ms, mz, ft, fz, vz, ts, tth, dz, nt, nts, nth, nthz, nd, ndz, ns, nz, nsh, lp, lps, lb, lbz, lm, lf, lfs, lv, lvd, lvz, lt, lts, ld, ldz, lsh, ldzh, lc, shd, rp, rpz, rb, rbz, rm, rmz, rf/rth, rfz/rthz, rt, rtz, rd, rdz, rn, rnz, rs, rz, rl, rld, rldz, rlz, rsh, rc, rcz, rtsh, ct, ctz, cs, cst, csd, cth, gd, gz, kc. There are ways of simplifying these lists to make things easier, particularly with the consonant clusters ending in s/z/t/d, but I'll leave that for you to work out for yourself. It's also possible to eliminate many of the consonant clusters if they happen to convert well as individual sounds in the first place, though that will depend on the pairings you choose for the single consonants: eg. if you pair "s" with "g" and "t" with "r", then "st" will turn neatly into "gr" and "gr" into "st", so you eliminate the need for "st" and "gr" to be in your list of pairings. I would strongly recommend that you leave all the consonant clusters out of the list to begin with when you create your cypher and just concentrate on pairing up the single consonants: you can always break up the clusters into individual consonants and put a "u" vowel between them when speaking your cypher (eg. change "sprynt" into "supurynut"), though you can treat the consonant clusters tsh and dzh as if they are single consonants because they work so well that they feel as if they are single sounds. 
As you fine-tune your pairings you will find that some of the extra "u" sounds can be left out as the consonants will happen to combine well together and be easy to pronounce as clusters. You can then pair up any remaining awkward consonant clusters at a later stage to speed up your cypher and make it flow better. Most clusters are actually too rare to be worth bothering to cover at all (eg. jt-th (8th) = jtuth), but I'll leave that for you to decide for yourself.
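The pairing idea is simple to try out in Python. The sketch below uses only the example pairings mentioned above ("ee" with "oo", "ay" with "o", "aw" with "e", "s" with "g" and "t" with "r"), written in the phonetic spelling as i/w, j/o, q/e, s/g and t/r. It is a deliberate simplification: every sound is assumed to be a single letter, so two-letter spellings such as "th", "sh" or "tt" are not handled, the letter "w" is treated only as the vowel "oo", and any sound without a partner is left unchanged. The names `make_swap` and `phonetic_cypher` are invented for the example.

```python
def make_swap(pairs):
    """Build a two-way lookup so each sound is replaced by its partner."""
    table = {}
    for first, second in pairs:
        table[first] = second
        table[second] = first
    return table


# Example pairings only; you would choose your own.
VOWEL_PAIRS = [("i", "w"), ("j", "o"), ("q", "e")]
CONSONANT_PAIRS = [("s", "g"), ("t", "r")]
SWAP = make_swap(VOWEL_PAIRS + CONSONANT_PAIRS)


def phonetic_cypher(word, swap=SWAP):
    """Swap every paired sound in a phonetically spelt word."""
    return "".join(swap.get(sound, sound) for sound in word)


print(phonetic_cypher("tri"))                    # "tree" in phonetic spelling
print(phonetic_cypher(phonetic_cypher("tri")))   # swapping twice undoes it
```

Because every sound swaps with its partner, the same function both encodes and decodes, which is what makes the translation manageable in your head.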

Word-Pair Codes

Phonetic cyphers are fine in speech, but they are easy to crack if you write them down, and someone who is really keen to understand your conversations could record them and analyse them later, but you can get round this problem by combining your phonetic cypher with a word-pair code. The idea of a word-pair code is simply to pair up words so that you use each word in place of its partner. It is best to pair nouns with nouns, verbs with verbs, adjectives with adjectives, etc. so that everything still flows well grammatically. You could pair "horse" and "bicycle", for example, though these concepts are perhaps rather too similar and an outsider could guess the real meaning. The main difficulty you will have if you try to make up a code like this is the sheer number of words involved, though you can start out with just the most common words and build your system slowly over time. Word-pair codes are practically impossible for outsiders to crack, but they do sound rather silly, so it's best to use a phonetic cypher at the same time, and this has the advantage of hiding the fact that you are using a code as it will just sound as if you're speaking a foreign language. The phonetic cypher is also invaluable for disguising names of people and places which you are unlikely ever to pair up for swapping. If you want to make up a word-pair code for use with a phonetic cypher, there is no need to pair up all the common little words such as I, we, them, no, of, at, etc. because they aren't the biggest giveaway to what you're saying: when you learn a foreign language you learn all these little words early on, and yet they aren't enough in themselves to make any useful sense of any texts written in that language. Instead you should put your effort directly into pairing up all the words like: house, tree, pen, idea, time, blue, smooth, eat, sing, etc.
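A word-pair code is just a two-way dictionary, as this small Python sketch shows. Only "horse" and "bicycle" are paired in the text above; the other two pairings are hypothetical ones made from the example words, keeping nouns with nouns and verbs with verbs. Words are assumed to be lower-case and separated by spaces, and any word without a partner is left alone.

```python
def make_word_pairs(pairs):
    """Build a two-way lookup so each word stands in for its partner."""
    table = {}
    for first, second in pairs:
        table[first] = second
        table[second] = first
    return table


WORD_PAIRS = make_word_pairs([
    ("horse", "bicycle"),  # pairing suggested in the text
    ("house", "tree"),     # hypothetical noun pairing
    ("eat", "sing"),       # hypothetical verb pairing
])


def word_pair_code(sentence, pairs=WORD_PAIRS):
    """Swap every paired word; the little unpaired words pass through."""
    return " ".join(pairs.get(word, word) for word in sentence.split())


coded = word_pair_code("i eat in the house")
print(coded)
print(word_pair_code(coded))  # applying the code again restores the original
```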

If you are really ambitious, you might prefer to make up an entire new language, but I should warn you that it is an absolute monster of a task and you will almost certainly give up: it took me over ten years just to get mine to the point where it was usable even at a basic level, though I probably did make things unnecessarily difficult for myself by insisting on grammatical and semantic perfection. I will publish the grammar of my language some day, but it's filled with so many industrial secrets relating to artificial intelligence that I'll probably have to keep it tightly under wraps for many years to come. If you do try to create your own language, don't try to derive all the vocabulary entirely from a finite set of fundamental components of meaning, because if you do, you'll get bogged down in it for decades. If I was starting out again, I would begin with a simple phonetic cypher, then add to that a word-pair code, and then I would replace some of the words with completely new ones of my own, gradually evolving it bit by bit into a completely new language. There are some areas where I would design large chunks of the language in one go, and dealing with all the different tenses would be one of those areas. You can find an analysis of all the verb tenses in the linguistics section of this Web site. Only then would I try to create a regular system for groups of related words such that "hot", "warm", "cool", "cold", "heat" and "temperature" could all be derived from a single word. Anyway, however far you take it, have fun with it: it's all really good for the brain. Good luck!