From msuinfo!agate!howland.reston.ans.net!vixen.cso.uiuc.edu!sdd.hp.com!hp-cv!hp-pcd!hpcvsnz!ericb Sat Mar 19 00:12:19 1994 Newsgroups: sci.crypt Path: msuinfo!agate!howland.reston.ans.net!vixen.cso.uiuc.edu!sdd.hp.com!hp-cv!hp-pcd!hpcvsnz!ericb From: ericb@lsid.hp.com (Eric Backus) Subject: Encryption from Hash Functions Sender: news@hpcvsnz.cv.hp.com (News ) Message-ID: Date: Fri, 18 Mar 1994 21:26:10 GMT Organization: Hewlett-Packard X-Newsreader: TIN [version 1.2 021193BETA PL3] Lines: 171 Encryption from Hash Functions ============================== This topic has been visited once or twice before in sci.crypt. In particular, I remember Phil Karn proposing a three-round MD5-based encryption scheme. But I haven't heard anything recently, and the encryption that I propose below is different from the one Phil proposed. Perhaps it's obviously flawed, in which case I need help seeing the flaws. Introduction ------------ Due to rather stupid export laws in the US, it is illegal to export encryption software. However, it is NOT illegal to export hash functions. This got me to thinking: suppose we design an encryption system that is extremely simple, and relies on a cryptographic hash function for its strength? There are a number of advantages to this: 1. We can import and export all the complicated hash functions we want to, without worry. The encryption program other than the hash functions is trivial, so that someone outside the US can easily implement it from the description provided here (oops, am I now breaking the law?), and then everyone will have access to it. 2. There are several different hash functions available, all of which are supposedly "cryptographically secure". If the encryption program could easily switch between hash functions, we could easily move to more secure hash functions as time goes on. If one hash function is broken, we just switch to a different hash function. 3. Since hash functions allow arbitrary length input, it is trivial to allow arbitrary length passwords for the encryption program. Encryption Method ----------------- 1. Get a password from the user. This should be an arbitrary length block of data. The password is sent through the cryptographic hash function, and the output of the hash function is called the "hashed password". 2. Create a unique initial output block. No two messages should ever be encrypted using the same initial output block. The initial output block could come from a hardware random number generator if one is available. Alternatively, it could be a combination of the current time of day, the hostname of the computer, and the process ID of the encryption program, all appended together and run through the cryptographic hash function: out[1] = hash(time + hostname + pid + whatever_else) We could even include the password as part of the input to the hash. 3. In a loop: Combine the most recently generated output block with the hashed password from step 1 (either by XORing the two, or by appending the two), and use this as input the to cryptographic hash function. Take the output of the hash function, and XOR it with the next block of data to be encrypted. This becomes the next output block: out[i+1] = in[i] XOR hash(out[i] XOR hash(password)) Decryption is nearly the same as encryption: out[i] = in[i+1] XOR hash(in[i] XOR hash(password)) That's it. Quite simple, except for the hash function itself. Strength -------- If the cryptographic hash function is really secure, then I believe only brute force can be used to break the above method. Therefore, 128-bit MD5 or 160-bit SHA should be much more secure than DES. But I'm not an expert - does anyone else see flaws in the method? Hash Functions -------------- By "hash function", I mean a function which takes as input an arbitrary amount of data, and produces as output a fixed size "hash" value. By "cryptographic hash function", I mean a hash function which has the following properties: 1. Given a "hash", there is no way to gain any information about the input which produced the hash. 2. There is no way to find two different inputs that produce the same hash value. Of course, when I say "no way", I really mean "no way other than brute force". By "brute force", I mean trying all possible inputs to the hash function. Given an n-bit hash, it should on average take 2^(n-1) attempts to find an input that produces a given hash. It should on average take 2^(n/2) attempts to find two different inputs that produce the same hash. Because of this, we need to further qualify "cryptographic hash functions": 3. The fixed size "hash" must contain enough bits so that a brute force attack is sufficiently impractical that it need not be considered a risk for the cryptosystem. As far as I know, the MD2, MD4, MD5, SHA, Snefru-128, and Snefru-256 hash functions all meet the above criteria for "cryptographically secure". In addition, I believe one could make a cryptographically secure hash function by running IDEA in CBC mode, keeping the last output block as the hash. I'm sure other hash functions are also available. Using DES in CBC mode is also a possible hash, but the hash size would then only be 64 bits. This means that on average only 2^32 inputs would have to be tried in a brute force attempt to find two inputs with the same hash, which may not be secure enough. Note that for additional security, the "hash function" could really be a chain of hash functions. The input string is fed to the first hash function, the output of the first hash function is fed to the second hash function, and so on for however many hash functions are used. The output of the last hash function would be the output that is used. Comments -------- Assuming we can always generate a unique initial output block, I don't see how anyone could do known-plaintext or chose-plaintext attacks, since even identical inputs will not encrypt to the same output. I haven't implemented this and timed it, but I suspect that an MD5-based hash encryption would be much faster than DES, and potentially more secure. Phil Karn's three-round MD5-based encryption scheme required three rounds of MD5 to process half a block of input data. The above method requires only one round of hash function to process a full block of input data. Thus, this method should be six times faster than Phil's. The requirement for a unique initial output block is really only a requirement that the initial output block be unique for a given password. This requirement comes from the following. Suppose two messages use the same initial output block. If these two messages also use the same password, then the second output block (which comes from the first input block) from each of these messages has been XORed with the same value. If these second blocks are XORed together, the result is the XOR of the two first input blocks (the plaintext). If one of the plaintexts were known, then the other would be trivially found. This encryption method adds one output block (the initial output block) to the length of the message. In addition, some fraction of an output block is added to the end of the message, just like with all block-based encryption methods. Since a block (the hash output size) is typically between 128 and 256 bits, this does not seem excessive to me. Even for me, who is not trained in cryptography, this encryption method seems fairly trivial and obvious. Therefore, the method itself should not be patentable. The hash functions, of course, are a separate matter. -- Eric Backus ericb@lsid.hp.com (206) 335-2495