The Pure Crypto Project Version 2.0
Remarks on Security

Attacks on the Cipher RSA

With the usual notation, PCP is based on a cryptographic function (ModExp) that transforms a message M in the following way:

{{ M } Ka } Ka-1 = M

where Ka is Alice's RSA public key and Ka-1 is her corresponding RSA private key. This scheme can be used to create a signature by applying the keys in the reverse order.

How RSA Works

The RSA cryptosystem basically uses three different and very large numbers (e,d,n). Although one of those numbers (e) can actually be quite small to speed up encryption, the security of the cipher will not be reduced. Two of them, the public key e and the modulus n are used to encrypt a message. The message to be encrypted is converted into a very large number, which will be raised to the power of e, the recipient's public key, and the modulus n will then be subtracted from the result until it gets smaller than the modulus n. This remainder (mod n) is the cryptogram (or the encrypted message) ready to be sent to the recipient.

Long messages cannot be encrypted in one go, as the number representing the plain text must be smaller than the modulus for RSA to work correctly. Thus long messages must be split into a bunch of smaller messages and each of them must be encrypted separately. This will lead to some problems that are addressed later.

The recipient applies the same procedure to the cryptogram to decrypt it. Again the encrypted message is converted into a very large number which is raised to the power of d, the private key, which miraculously yields the original message after the remainder has been computed using the modulus n.

Just one single number d has to be kept secret at all costs. On the other hand, the method of computation and both publicly known numbers e and n as well as the cryptogram need not to be kept secret and can safely be analyzed by everybody.

But the RSA cryptosystem only works correctly, if the three numbers fit together, which can be assured by a careful process of generating a user's RSA key pair.

Basically, to generate a pair of RSA keys, you have to find two very large prime numbers, p and q, which form the modulus as a product (n = p*q). You can derive both interesting numbers, e and d, which form the actual pair of keys, very quickly from these two prime numbers. So it is important that the primes are thrown away after the RSA key pair has been generated and only the modulus n remains.

An attacker who knows the modulus n, "only" needs to find out which primes the modulus is composed of, in other words he just needs to factor the modulus. After that computing the private key is a matter of seconds. Fortunately, the problem of factoring is a job causing an awful lot of work if prime numbers are used, which consist of several hundred decimal digits. So factoring sufficiently large moduli is regarded as a practically insoluble problem .

How it really works:

RSA

Take an integer value M that represents a message text and a large prime p > M and start calculating powers of M, say M1, M2, M3, M4 and so on. Taking the remainder of these powers of M modulo p will result in a sequence of numbers between 0 and (p-1), which will appear in a strange and unpredictable order.

But within this sequence the number 1 will show up when the exponent is (p-1), regardless which integer M you have chosen as a base. With a different prime q the number 1 will appear exactly when you finish to calculate M(q-1) mod q.

Now we start calculating powers of M modulo the composite number p*q. Which exponent would you expect to produce the number 1 again ? If (p-1)*(q-1) seems to be the only natural answer to you, you have reinvented a theorem due to EULER, stating that for every M < n = p*q

M phi(n) mod n = 1

where (p-1)*(q-1) is usually called phi(n) (Euler's totient function).

We now are only one step away from RSA.

If you continue to multiply the power by M once again you get

M phi(n) + 1 mod n = M And if you choose your public/private key pair to be e * d = phi(n) + 1 then you have:

M e * d mod n = M

Thus applying the simple operation of modular exponentiation to a message twice, first using the public key for encryption and secondly using the private key will recover the original message !

It may be worth noting, that the security of RSA also depends on the absence of any "short-cut", so that there is no way to decrypt or sign a message without using the private key. Because of the fact that the cryptogram was created by taking the message to the e-th power (modulo n) one might think that computing the e-th root of the cryptogram (modulo n) will produce the message.

On one hand nobody has found a way to perform this calculation yet, given that really large numbers are involved, but on the other hand it has not been proved that calculations of e-th roots modulo n is of the same complexity and requires as much effort as factoring the modulus. So there might be a short-cut not yet known to the cryptographic community that can render the use of RSA useless in future. It would be wise not only to have a close eye on the advances in factoring, but also on the possibility of short-cuts like calculations of roots or even some completely different approach.

Minimum Key Length

One of the obvious problems follows from the fact that keys must have a sufficient length, which means effectively that it is computationally infeasible for an opponent to deduce the private key from the public key, i.e to factor the modulus (or use a shortcut unknown to the crypto crowd). The history of factoring shows that it is possible to factor moduli of 850 bit and further improvements can be expected. There has to be a lower limit to the modulus length which every key must meet to be used in PCP, which is currently set to 1300 bit. PCP will reject every key which is shorter. Of course, there are different keys for encryption and signing, so the long-term signing key will be considerably longer (minimum 2150 bit) than the current encryption key.

Looking at the Source Code:

Modular Exponentiation
def ModExp (Base, Exp, Mod): Hash = 1 X = Exp Factor = Base while X > 0 : Remainder = X % 2 X = X / 2 if Remainder == 1: Hash = Hash * Factor % Mod Factor = Factor * Factor % Mod return Hash
This function of 11 lines is the implementation of the basic function that performs modular exponentiation of very large integer numbers, the heart of the RSA cryptosystem.

The basic idea behind this function is to avoid to cancel out the multiples of n in the result of a very, very large number at the end of the process. Instead it ensures to get rid of multiples of n early during the step-wise process of computing the power, so that the resulting numbers never grow too large. This is done by extracting the binary bits of the exponent beginning with the least significant one. If the bit is 1, a factor representing a certain number of bases, which corresponds to the value of the bit, is multiplied to the rest that has accumulated so far. When the most significant bit is reached, exactly as much bases have been multiplied as defined by the exponent and the computation is complete. In every step of the computation, only remainders (mod n) are used subsequently, so that none of the numerous factors of n remain in the calculation unnecessarily.

Multiplicative Homomorphism

A bunch of problems arise from the so called "multiplicative homomorphism" which is a property of the core-encryption function ModExp above.

Unfortunately, it is generally insecure to apply this function to blocks of plain text directly or to sign a message directly, because cryptograms and more important signatures are multiplicative. That means that you can generate a "new" valid signature simply by multiplying two old ones and the same holds for encrypted blocks. To defeat this multiplicative characteristics of RSA, a one-way hash function is used to convert a message into a large number, which is being signed instead of the message itself. Multiplication is no longer a threat, because multiplying signatures would destroy the format of the hash function and its proper padding too.

Preparing a plain text for safe encryption is more difficult and requires a mechanism called hash chain encryption padding which ensures that some random data is attached to the message and that a wilful or accidental manipulation of encrypted blocks can be detected to thwart oracle attacks.

The multiplicative homomorphism of RSA leads to another problem which comes into play when you encrypt messages. A method called Optimal Asymmetric Encryption Padding (OAEP) is used to protect encrypted plain text from being forged and this usually applies to short pieces of data only, like key packets. Extending OAEP to all encrypted plain text would roughly be equivalent to signing the whole message, because a hash of the whole message body is needed in OAEP.

That means we need a padding scheme that uses randomness to protect the plain text blocks before encryption. Because without a carefully crafted manipulation of the plain text the RSA encrypted blocks are exposed to a variety of attacks, which I will describe before I explain PCP's padding scheme (hash chain padding).

Key Related Attacks

Key Spoofing Attacks

Some carefully crafted key spoofing attacks have been described in detail by Ross Anderson and Roger Needham (see [AN-95a], [AN-95b]). Especially when users try to sign something that contains already encrypted data, an opponent who is able to factor the modulus used for encryption (i.e. at least Bob) can work out a different Message M' and publish a modified public key for Bob so that the new Message M' looks like a valid signature made by Alice on something that was encrypted for Bob. (see [AN-95a] p.2 and [Bon] p.4 for details)

To prevent this kind of attack, PCP checks every message the user is about to sign whether it contains lines of cryptographic data - which is everything made out of 0..9 on a line by itself - and issues an unequivocal warning that the user should not proceed with the signing process unless he knows what he is doing. The warning makes clear that it is prudent to sign before encrypting anything.

Another form of key spoofing attacks require compatible weak keys (see [And-2000] p.4 for details) that will work if an attacker is able to modify the modulus in a public key published in an unprotected place. This will show up in PCP, because before any key is used the security hash is printed, so that the vigilant user will notice that the security hash value has changed. If the manipulation has occured before the user signs the key and the correct, unchanged security hash is on the key and has been checked with the user, then PCP will notice discrepancies between the signed key material and the security hash that was signed. This will lead to another unequivocal warning not to use the key unless it has been checked reliably with first-hand knowledge.

Low Public Encryption Exponent Attack

There is a serious problem arising from the fact that most public keys that are commonly used have a very small public encryption exponent e as part of the public key (e,N). These small values of e facilitate practical attacks based on Coppersmith's theorem. Basically his theorem provides the means to find small roots of polynomials modulus a composite N. An attack requires, that the same message M is encrypted with a number of different public keys. If for instance the public encryption exponent is 17, which is a number used in very many public keys today, an attacker can obtain the plain text after he has collected more than 17 different ciphertexts of the same message and has used the method to compute 17th roots based on Coppersmith's theorem. This attack can be defeated by carefully padding the message before it is encrypted with the RSA primitive (ModExp) but only, if randomness is introduced in the process of encryption. Anyway, the encryption exponent used by PCP is set to 65537.

Random Fault Attacks

If signature verification fails, PCP will report this fact to the user without revealing details of the mismatch between the freshly made hash of the message and the decrypted hash value stored in the signature. On the other hand, if a signature is being created and the private key is being used, any randomly occuring dysfunction of the hardware may affect the created signature so that at least one single bit may be wrong. To prevent any possible exploitation of those random faults (see [Bon] p.13) PCP will always check, whether a signature verifies correctly before it is written to the file system.

Oracle Attacks

Recently oracle attacks have become very popular (see [JKS] and [Bon] p.3) and they pose a real threat to the RSA encryption scheme, when being used without proper padding of the plain text. Generally speaking, if any process of decryption involving the private key leads to incorrect plain text, i.e. if the decryption fails, the decrypted data must be discarded reliably. There should never be a way to access this faulty cryptogram, not even by the user himself. Because if the opponent can learn whether a decryption was correct or not, he can feed the user with crafted cryptograms and abuse him as an oracle. This may put him into a position to recover an encrypted message once he has got the faulty decrypted data. Therefore, PCP will never return any output of an unsuccessful use of any private key and the time for an unsuccessful decryption will exactly be the same as for successful decryption to prevent timing attacks.

Oracle attacks will work in a way similar to blinding attacks on signatures (see [Bon] p.4 and 14) An attacker uses a random number r and computes C'=C*r mod N and sends C' for decryption to the victim. If the attacker knows that the plain text will have a certain known structure, he can deduce information about the plain text from the fact whether or not the victim accepts the message as a valid plain text or not.

Hash-Chain Encryption Padding

In order to defeat all these possible attacks, it is obvious that we cannot use RSA's ModExp function on the plain text blocks directly. PCP must provide a padding method that introduces randomness in addition to every plain text block before encryption is done. It is very much desirable that the padding method can also provide a means to check, whether a block can be decrypted successfully or not.

The method used for padding starts with the secure transmission of an initial number of 256 bit which will subsequently be used to generate a hash chain that provides a piece of information to blind the current message block. Additionally the hash chain produces a 256 bit challenge, that links consecutive message blocks so that chosen ciphertext attacks can be detected.

How it really works:

Hash-Chain Encryption Padding
  1. Use standard OAEP to transfer a single Nonce of 256 bit in the first encrypted data block to the recipient.
    The recipient decodes the first block with the correct private key and learns the initial Nonce.

  2. Prepare a hash chain for the next block
    PAD1 = initial Nonce
    PAD2 = hash(PAD1)
    CHALLENGE = PAD1 XOR PAD2

  3. Create a large number to be encrypted with the RSA primitive (ModExp) as follows:
    Concatenate two numbers, the 256 bit CALLENGE and the padded message with PAD1 extended to the whole length of the current message block. The current message block is 1024 bit long, so every input to ModExp is 1280 bit. That's why the minimal mudulus length is 1300 bit for all public keys.
    i-th Block : Message XOR (repeated PAD1) || CHALLENGE
    apply ModExp() to the i-th block

  4. Prepare the hash chain for Block i+1:
    PAD1 = PAD2
    PAD2 = hash(PAD2)
    CHALLENGE = PAD1 XOR PAD2

  5. Continue at 3) until all plain text is processed.

The recipient can form the same hash chain during decryption, because the recipient is in possession of the private key and can decrypt the first block to learn the initial Nonce. The encrypted message in the following block can then be recovered with PAD1. Additionally the recipient can check, whether the current challenge is the XOR of both pads at stage i or not. If the challenge that links adjacent blocks is not valid, the integrity of the decryption process is violated. In this case, PCP continues to decrypt (to prevent timing attacks) and discards the decrypted data (to prevent oracle attacks). The user will never see the decrypted data, because he will only find a warning in the plain text file saying that decryption has failed.

If an attacker wants to manipulate a cryptogram, he must replace one encrypted block by another one which has the same lower 256 bits inside the encrypted block. Otherwise the hash chain is broken and the decrypted data will disappear forever.

Fixing the lower 256 bits (the challenge) in a block which is encrypted under the recipient's public key is impossible, unless the attacker knows the beginning of the hash chain, the initial random Nonce.

Hashing within PCP

The hash function used by PCP (SDLH) was proposed years ago by Adi Shamir, which is not only based on an intuitive simple idea, it also comes with a prove of collision-resistance given by Ronald L. Rivest in a posting to the cryptography discussion forum.

Shamir's Discrete Logarithm Hash Function (SDLH)

The SDLH is base on a simple idea that once the message is converted into a long integer a hash of the message can be computed as follows:

hash(x) = g x (mod p*q)
given, that both p and q are large primes which are being kept secret so that factoring n = p*q is computationally infeasible.

This hash function is provably collision-resistant, I quote the prove Ronald L. Rivest presented in his posting:

Adi Shamir once proposed the following hash function: Let n = p*q be the product of two large primes, such that factoring n is believed to be infeasible. Let g be an element of maximum order in Z_n^* (i.e. an element of order lambda(n) = lcm(p-1,q-1)). Assume that n and g are fixed and public; p and q are secret. Let x be an input to be hashed, interpreted as a non-negative integer. (Of arbitrary length; this may be considerably larger than n.) Define hash(x) = g^x (mod n). Then this hash function is provably collision-resistant, since the ability to find a collision means that you have an x and an x' such that hash(x) = hash(x') which implies that x - x' = k * lambda(n) for some k. That is a collision implies that you can find a multiple of lambda(n). Being able to find a multiple of lambda(n) means that you can factor n. I would suggest this meets the specs of your query above. Cheers, Ron Rivest Ronald L. Rivest Room 324, 200 Technology Square, Cambridge MA 02139 Tel 617-253-5880, Fax 617-258-9738, Email <rivest@mit.edu>

There are a number of issues to be addressed, especially when the SDLH is being used together with the RSA signature scheme and a full analysis of the SDLH's security can be found in the paper "A Discrete Logarithm Hash Function for RSA Signatures". The analysis shows, that SDLH can safely be used together with RSA once certain conditions are met with regard to the selection of the user's key material. For details I'd like to refer you to this paper about SDLH.

Looking at the Source Code:

SDLH
def SDLH(Message) : # Calculates hash(Message) = Generator^Message mod HashModulus # precomputing table for step (1) table = [] table.append(1) index = 1 Number = 1 while index < 256 : Number = (Number * Generator) % HashModulus table.append(Number) index = index + 1 # process the message byte by byte Hash = 1 for Byte in Message: A = table[Byte] # step (1) B = Hash # step (2) index = 8 while index > 0 : B = (B * B) % HashModulus index = index - 1 Hash = (A * B) % HashModulus # step (3) return Hash def SDLH256(Message) : Hash256 = 0 X = SDLH(Message) Size = pow(2,256) while X > 0 : Section = X % Size X = X / Size Hash256 = Hash256 ^ Section return Hash256
If the calculation of a hash value is done with a single call to the function ModExp() the whole message must be converted into a very large integer first. This would slow down the hashing even more because every character must be processed twice, first during conversion and second during exponentiation. As the hash function is already the slowest part of PCP, the above code performs conversion and exponentiation simultaneously while the message bytes are being fed into the loop.

A message M = c1 c2 c3 ... cn will be converted into a number by consecutive multiplications

m = (...((( 0 + c1 ) * 256 + c2 ) * 256 + c3 ) * 256 + ... + cn) a modular exponentiation using this number as the exponent can be performed in a similar way g m = (...((( 1 * g c1 ) 256 * g c2 ) 256 * g c3 ) 256 * ... * g cn) cancelling out any multiples of the hash modulus as soon as they appear in the intermediate results.

The process can be advanced further by pre-calculating the 256 powers of g in a look-up table that is being used inside the loop. Anyway, one ModExp() operation per message byte makes hashing expensive, but that is the price to be paid for a 1440 bit hash function which is collision-resistant and is based on an intuitive, clear and understandable concept.

Finally a 256 bit SDLH version is created by xoring all 256 bit sections of the hash value, beginning with the least significant 256 bits.

The Protection of Your Private Keys

The intention to keep the source code of PCP as concise as possible is not the only reason why there is no symmetric cipher in PCP. But at least when it comes to protecting your private keys with a passphrase some symmetric cipher is needed.

Let me describe in detail how the private decryption value d is protected on your local computer system if you use the program pcp2-protect-privatekey. The user's passphrase string as entered into the keyboard is hashed into a 256 bit number (using the private key's modulus). The result of 78 decimal digits is then split into 13 pieces consisting of 6 decimal digits each, providing 13 pointers into a file which consists of at least 1100000 bytes of truly random data, the entropy file.

Each pointer provides a different sequence of random data the length of the key's modulus. All 13 random sequences are then XORed and the resulting sequence is finally XORecd with the unprotected decryption value d and stored in the file that holds the protected private key. When the private key has to be used for signing or decryption, the user must enter the passphrase again and the 13 piece one-time-pad is reconstructed to unlock the stored (protected) decryption exponent.

The sequence of random data can safely be used as a one-time-pad because it will never be reused to encrypt anything else but the private key. Depending on your passphrase, every possible pointer into the entropy file is equally likely, because the passphrase is hashed with SDLH into a 256 bit hash value, an attacker does not know.

If an attacker wants to recover the one-time-pad that protects your private decryption exponent, he has to recover all 13 pointers used, i.e. he has to guess the passphrase, because SDLH is a one-way function. A brute force attack on the private key encryption based on some unknown 256 bit hash value is clearly not practical.

It seems much more rewarding to make assumptions on the passphrase and to try dictionary attacks instead of mounting a fully fledged brute force attack, hoping that the user has a weak passphrase. But if the passphrase has enough entropy, the strength of this encryption scheme is close to 256 bit, if the entropy file is made of random bytes.

The Security Hash Value

Even though the operation system controls access to the entropy file and to the (protected) private keys carefully, it is always possible that those files change due to file system bit rot or some program's undesired side-effects or worse. In order to detect such changes reliably, every key file stores a securityhash value that represents the five public pieces of information of every key file. (hashmodulus, generator, modulus, encryption exponent and finally the user's key identification string).

This hash value is always checked before any key is used. And it helps to convince other people that they have received the correct, unchanged public key of a user.

References

[ AN-95a ]
Ross Anderson and Roger Needham: Robustness principles for public key protocols.
in: Advances in Cryptology - CRYPTO '95, Springer LNCS v 1070 pp10-18

[ AN-95b ]
Ross Anderson and Roger Needham: Programming Satan's Computer
in: Computer Science Today - Recent Trends and Developments , J. van Leeuven (ed.), Springer LNCS vol 1000, pp 426-440

[ And-2000 ]
Ross Anderson : Two Remarks on Public Key Cryptography

[ And-93 ]
Ross Anderson : The Classification of Hash Functions
in: Codes and Ciphers (proceedings of fourth IMA Conference on Cryptography and Coding, December 1093), published by IMA (1995) pp 83-93

[ Bon ]
Boneh, Dan: Twenty Years of Attacks on the RSA Cryptosystem

[ JKS ]
Kahil Jallad, Jonathan Katz, Bruce Schneier : Implementation of Chosen-Ciphertext-Attacks against PGP and GnuPG

[ S-03 ]
Ralf Senderek : A Discrete Logarithm Hash Function for RSA Signatures, 2003