From msuinfo!agate!ihnp4.ucsd.edu!usc!rand.org!jim Mon Apr 18 23:03:20 1994 Path: msuinfo!agate!ihnp4.ucsd.edu!usc!rand.org!jim From: jim@rand.org (Jim Gillogly) Newsgroups: sci.crypt Subject: Re: PUZZLE or exercise with phi-test Date: 16 Apr 1994 15:12:33 GMT Organization: Banzai Institute Lines: 104 Message-ID: <2oov91$ovf@rand.org> References: NNTP-Posting-Host: mycroft.rand.org In article , Stewart C. Strait wrote: > Let the plaintext, ciphertext, and key be respectively > p[0],p[1], ...; c[0],c[1], ...; and k[0],k[1], ... . > > Then c[i]=p[i]+k[0]+ (sum over all p such that (2**p) & i is nonzero) > of k[p+1]. > > Plaintext: THIS ISQU ITEA WEAK SYST EM > Keys used: YYYY YYYY YYYY YYYY YYYY YY > P P P P P P P P P P P > MM MM MM MM MM > IIII IIII II > WWWW WWWW > XXXX XX > --------------------------- > Ciphertext: RUSR ONIB CCKV YVON NIZP HE and offered a challenge cipher starting > straits.3.002; VIGROTOR; 1024 Ciphertext letters. > ACGUR UMWKH HIARZ RWNTW SRNFG LLMVB LDNQG FCPYB ZICUR YUDKG MDDQJ XKSMN Probably a number of different approaches would work with this. As a one-off, I tried quick-and-dirty attacks that took more time overall than an elegant attack would have used. Step 1 was observing that subtracting the first 512 bytes from the last 512 gives a distribution of differences that depends only on the 11th key byte, k[10]: c[512] - c[0] = p[512] - p[0] + k[10] c[513] - c[1] = p[513] - p[1] + k[10] c[514] - c[2] = p[514] - p[2] + k[10] c[515] - c[3] = p[515] - p[3] + k[10] c[516] - c[4] = p[516] - p[4] + k[10] If a word from the first half happens to line up with a word in the second half, a string of equal values will show up in the ciphertext differences, and that value will be k[10]. The longer the string, the more confidence we have that it wasn't due to chance. Given the underlying roughness of English, an E in one half will wind up adjacent to an E in the second half more often than average (as with the other high frequency letters); so k[10] should be one of the most frequent values for the ciphertext difference to take. The same concept works for other key-letters with decreasing confidence as the segments get shorter: k[10]=D (strong confidence) k[9] =P or A, maybe E; k[8] =F (matching string of 3) or VGSX k[7] =H (strong) k[6] =K (very strong) k[5] =M or X, maybe B or H k[4] =maybe HQSW k[3] =perhaps GW k[2] =BIL?? k[1] =who knows k[0] =no information, since it's applied to all plaintext For the first few bytes, exhaustive search is good enough. With five bytes of key we get 16 bytes of output; with six bytes of key we get 32. To recognize when we're done, adding logarithms of English digraphs works well enough, as would other standard techniques like looking for English words using a hash table. With an exhaustive search of 6 bytes, the correct result gets the highest score. With 5 bytes, it's close enough to the top to spot, and a much shorter run: wgskb easaliosnesnteam: 99 wyxta eindchamonorlenh: 91 hovgz theeeteaengtorsw: 94 wcssb eesedegonisrlasi: 91 pzzur loshimepecceasat: 93 ubigm gherrteneattoefw: 91 pfajk liratrotldiesere: 93 tbjgn hiersueneassoeev: 91 nmtcx ndacctecaletotua: 93 oazvr mothileofcdearas: 91 larss presorschenofwnf: 93 mdobu ongreelseynlthet: 91 gewxx useoonnthaifandr: 93 hovro theetitppyreorsw: 91 abcfs abermofosonndtar: 93 hovgb theeeteaclermpqu: 91 xebgc dbisoniolehevith: 92 herqs trisutyelerolyth: 91 worhj esttodsofolyorwa: 92 hergc trisediobuhelyth: 91 sgwob iesalikorisntewi: 92 eylfr warhinsepobeatoi: 91 pjpcr lechaughesmesacl: 92 eqpjg winlerkeahithirt: 91 pfnwn liengeotiasocoob: 92 zszkd blieitewingponoo: 90 jtpan raidisofosweecon: 92 wpgrn eredestobjseactw: 90 iafvg suthoreowtoprild: 92 wovgk espppeplengtorsw: 90 hovsb theeshsocleradei: 92 ---> vqxww fromanystatenopr: 90 ekonc woosathhernehosa: 92 uemtj geakednthasperrf: 90 With the first 5 key bytes in hand, I tried each of the other likely keys from the first phase in turn, and ended up with VQXWWXKHFAD. Each of the higher key bytes was on the list of possibles, though not always first. Simpler would have been to eliminate the first stage altogether, and given the first few key bytes, get each subsequent byte in turn: with each correct new choice, we double the length of exposed plaintext, so it's a linear search for the correct one -- as in the movie War Games, where WOPR found digits of the secret code one at a time. Stewart said: > If I end up posting a lot of these I'll probably use rec.puzzles > instead of sci.crypt and just give a brief announcement in sci.crypt. > (Unless there is a consensus that they belong here.) IMHO sci.crypt is appropriate for the initial posting; probably rec.puzzles is best for subsequent ones. -- Jim Gillogly Trewesday, 25 Astron S.R. 1994, 15:12 From msuinfo!agate!ames!elroy.jpl.nasa.gov!sdd.hp.com!vixen.cso.uiuc.edu!howland.reston.ans.net!EU.net!uknet!comlab.ox.ac.uk!pcl Mon Apr 18 23:03:55 1994 Newsgroups: sci.crypt Path: msuinfo!agate!ames!elroy.jpl.nasa.gov!sdd.hp.com!vixen.cso.uiuc.edu!howland.reston.ans.net!EU.net!uknet!comlab.ox.ac.uk!pcl From: pcl@foo.oucs.ox.ac.uk (Paul C Leyland) Subject: Re: PUZZLE or exercise with phi-test Message-ID: In-reply-to: straits@netcom.com's message of Thu, 14 Apr 1994 07:17:51 GMT References: Date: 18 Apr 1994 15:43:57 GMT Lines: 92 In article straits@netcom.com (Stewart C. Strait) writes: The cryptosystem used below can be broken with unknown plaintext using the phi test together with techniques familiar from the Vigenere cipher. Description deleted: The puzzle itself: Plaintext is public domain, English, and may begin or end in mid-word. straits.3.002; VIGROTOR; 1024 Ciphertext letters. ACGUR UMWKH HIARZ RWNTW SRNFG LLMVB LDNQG FCPYB ZICUR YUDKG MDDQJ XKSMN Remainder of ciphertext deleted. Solved it. I used exhaustive search on the first 32 bytes of the message, i.e. 6 key characters. A psychological (rather than cryptanalytical) optimization was to run from Z down to A as I guessed that Strait would expect people to use exhaustive search in the naive manner 8-) The key popped out quite quickly. For each proposed decryption, I compared the trigram statistics against those for a large body of English, with the optimization that any decryption with a trigram that did not appear in my tables was instantly rejected. I had to build my own trigram tables: I used the Oxford Text Archive to help me. My tables are now built from 10Mb of the writings of Trollope, possibly not the best of sources. After an initial run to see what sort of values were reasonable, I printed out all those which exceeded a threshold and scanned the output. The key begins VQXWWX and the plaintext FROM ANY STATE NO PREFERENCE SHALL BEG Digging out the rest of the text was then very easy: for each new keyletter, write out the next block of text and all 26 decryptions; pick the obvious one. Nice challenge: took me a few hours, largely because I had to write all my code from scratch. At least I have a nice trigram analyser in my toolbox now. Here's the trigram statistics code: int tri_stats (len, s) char *s; int len; { int j, c1, c2, c3; int sum = 0; int v; c1 = s[0]; c2 = s[1]; j = 2; while (j < len) { c3 = s[j++]; v = tris[c1][c2][c3]; if (v == 0) return 0; sum += v; c1 = c2; c2 = c3; } return sum; } read_tris () { int c1, c2, c3; for (c1=0; c1 < 26; c1++) for (c2=0; c2 < 26; c2++) for (c3=0; c3 < 26; c3++) { scanf ("%d", &(tris[c1][c2][c3])); } } Please continue to post challenges of this level of difficulty. Paul -- Paul Leyland | Hanging on in quiet desperation is Oxford University Computing Services | the English way. 13 Banbury Road, Oxford, OX2 6NN, UK | The time is gone, the song is over. Tel: +44-865-273200 Fax: +44-865-273275 | Thought I'd something more to say. Finger pcl@black.ox.ac.uk for PGP key | From msuinfo!uwm.edu!spool.mu.edu!darwin.sura.net!convex!cs.utexas.edu!howland.reston.ans.net!math.ohio-state.edu!jussieu.fr!univ-lyon1.fr!news-rocq.inria.fr!couchey.inria.fr!doligez Mon Apr 18 23:05:02 1994 Path: msuinfo!uwm.edu!spool.mu.edu!darwin.sura.net!convex!cs.utexas.edu!howland.reston.ans.net!math.ohio-state.edu!jussieu.fr!univ-lyon1.fr!news-rocq.inria.fr!couchey.inria.fr!doligez From: doligez@couchey.inria.fr (Damien Doligez) Newsgroups: sci.crypt Subject: Re: PUZZLE or exercise with phi-test Date: 18 Apr 1994 18:47:34 GMT Organization: I.N.R.I.A Rocquencourt Lines: 62 Distribution: world Message-ID: <2oukk6$922@news-rocq.inria.fr> References: <2oov91$ovf@rand.org> NNTP-Posting-Host: couchey.inria.fr Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit In article <2oov91$ovf@rand.org>, jim@rand.org (Jim Gillogly) writes: |> For the first few bytes, exhaustive search is good enough. With five |> bytes of key we get 16 bytes of output; with six bytes of key we get 32. In article , pcl@foo.oucs.ox.ac.uk (Paul C Leyland) writes: |> Solved it. I used exhaustive search on the first 32 bytes of the |> message, i.e. 6 key characters. Hey, you guys seem fond of brute force. I only had to brute-force 3 characters of the key, and the recognition could be easily automated. This is how I did it: > Plaintext: THIS ISQU ITEA WEAK SYST EM > Keys used: YYYY YYYY YYYY YYYY YYYY YY > P P P P P P P P P P P > MM MM MM MM MM > IIII IIII II > WWWW WWWW > XXXX XX > Ciphertext: RUSR ONIB CCKV YVON NIZP HE Notice that guessing the first 4 letters of the key gives you 8 characters of plaintext. But you can apply the same trick to any block of 8 characters starting at a multiple of 8. Guessing 4 letters of the key on the "CCKV YVON" cyphertext will give you k[1], k[2] and k[3], and k[0]+k[4] (instead of the k[0] you get for the first 8 characters). So I guessed 4 letters of key for each 8-letter block, retained the blocks with no "unlikely" trigrams, and sorted by the 3 last letters of the partial key. The only (k[1], k[2], k[3]) that appeared more than twice was represented 15 times, and it was QXW. I recovered k[0] by listing the 26 possible decryptions of the first 8 characters, and the other characters of the key by the same method, doubling the amount of plaintext for each key. I could also have solved the 15 equations (with 8 unknowns) given by the 8-character blocks. [also from Paul:] |> I had to build my own trigram tables: I used the |> Oxford Text Archive to help me. My tables are now built from 10Mb of |> the writings of Trollope, possibly not the best of sources. My tables come from 60k of text, mostly the PGP documentation. That's all I had on my hard disk. Next time, I'll use the US Constitution. The many "shall" and the absence of "key" and "you" are no good for the statistics... This took me the best part of the week-end, but then, it is my first successful cryptanalysis. I tried the first approach described by Jim, but I gave up when it seemed to get nowhere. I figured that an experience cryptanalyst should be able to recover both pieces of plaintext from the character-wise difference of the two 512-character pieces. Is it possible ? Is it easy ? |> Please continue to post challenges of this level of difficulty. Seconded. This is more relevant to this newsgroup than all the political discussions about the EES. If you post these to rec.puzzles, I will have to subscribe that group.