The Protection of Your Secret Key
and the Pure Crypto Project

Editorial Note

Version 1.0 - September 2003

Ralf Senderek

The Complexity Problem

Fortunately the use of cryptographic programs has become more widespread and more people use crypto software today protecting their privacy on the Internet than five years ago. Unfortunately with crypto products developing they have become more and more complex and in-transparent, leaving the user in fatal dependence on crypto code almost nobody fully understands nor analyses for security risks. Most people have inevitably accepted this situation and do not really know what exactly they are doing when they use crypto software, even though the basic principles are widely known and the source code is of course open to inspection and peer review. But as the code grows more and more complex, it is simply beyond the state of the art to analyse such complex code with respect to all its security implications and it is no wonder, that it is being used with nothing more than a faint hope that some expert had checked every single line and can be held responsible for the security of the software system. Although not every single bit of a software system is security relevant one can imagine the complexity and interdependency of current versions of GnuPG (54100 lines of C code) and PGP-6.5 (294100 lines of C code) compared to classic PGP (23600 lines of C code).

There really is a need for a secure crypto program that restricts itself to the basic functions, composed of only a small amount of highly readable code that will be clear and understandable not only for security experts helping to regain the transparency which is needed for users to get themselves into the driving seat again.

The main objective of the Pure Crypto Project is to provide such a program, consisting of just a few hundred lines of security relevant code based on a single, well understood cryptographic method (RSA) which can be fully analysed with respect to its security implications by a number of crypto-literate people. Because of the fact that the basic questions regarding the security of a crypto product remain the same I will explain the mechanisms behind PCP in a similar way as I have done for PGP years ago.

How secure is the Pure Crypto Program ?
Can a digital signature be forged
or
can a confidential message be read by unauthorized individuals ?

These are serious questions which cannot be answered in a simple way. First of all I would like to briefly summarize the essential results of my assessment of PCP's security before I give reasons for it in detail.

Summary
Explanation
The Security of the Cryptographic Methods Used by PCP
The Public Key Cryptosystem RSA
The Hash Function SDLH
How the Secret Key Is Stored
Practical Attacks on PCP
Security-relevant Actions
How to Deal with Your Passphrase
Attacks on Your Passphrase
What Is a Good Passphrase ?
Long Live the Signing Key !
Creating RSA Keys and Hash Keys
Keep An Eye on the Threat
References to Other Sources of Information
A Final Request

Summary

When using PCP you generally pursue the following objectives:
  • you intend to establish confidentiality of communication with other people, in a way that prevents other people reading the message in plain text except of the intended addressee,
  • you intend to guarantee the reliability of the source of information (authenticity), in a way that prevents someone masquerading as the author of a message actually having been created by somebody else (protection of intellectual property),
  • you intend to guarantee the integrity of a message, in a way that a composed message cannot be changed accidentally or deliberately without this alteration being noticed.
And last but not least you may want to have all this done in a reliable and transparent way you can understand with a clear awareness of what is really going on behind the scenes, knowing that manipulations of the cryptosystem you are using will certainly show up and that there will be no unwanted side-effects due to the complexity of the system you are using.

There are two possible strategies to attack the security of the Pure Crypto Program:

  • theoretical attacks
    which try to make use of the possible weakness of cryptographic methods used by PCP compromising PCP's reliability and security fundamentally, and
  • practical attacks
    which try to exploit weaknesses of the practical use of PCP. These attacks intend to compromise the passphrase to gain access to the user's secret key or try to deceive other people by distributing faked public keys.
A third strategy might be to launch an attack on the implementation of PCP. This could be done by changing the source code of PCP by altering the functionality of the cryptographic methods being used or by inserting new malicious code which makes security-related information accessible to somebody else. Even though the source code is signed digitally a cautious user will have to verify that the system he is actually using is really an unmanipulated secure version of PCP. As this is an attempt to undermine the practical use of PCP, I will deal with this problem in section II below.

Theoretical attacks on PCP will be unsuccessful, as I will prove subsequently, because the use of RSA-keys of sufficient length makes it practically impossible to gain the user's secret key. The security of cryptographic methods is not merely based on near religious faith but can be justified by the history of failed attempts to break them. As theoretical security is based on the results of recent research in the field of computer science it has to be constantly reviewed in the light of new knowledge.

With the pointlessness of attacks on the cryptographic methods the attention of potential datafiends is shifting towards the weaknesses due to the daily use of the secret key. You have to consider a huge number of practical problems if you do not want to put the security of PCP at risk unintentionally. Most important is maintaining the secrecy of your passphrase which protects your secret key because maintaining this secrecy is a problem most people do underestimate dramatically.

Explanation

The complexity of the Pure Crypto Program is considerably reduced because of the fact that there are only two different basic cryptographic methods used, the RSA cryptosystem and a discrete logarithm hash function. I would like to give a short description of their purpose, before I will value their reliability.

The asymmetric cipher RSA is used to enable persons to establish themselves as the only legitimate author or to ensure that only the intended person can read a message. Other crypto programs usually use a symmetric cipher and a randomly selected session key to ensure the confidentiality of a message because RSA is slow when long keys are being used to encrypt and decrypt relevant information. But a solution to the complexity-problem will make it necessary to reduce the number of different crypto-algorithms to a bare minimum, giving preference to the most concise and clear concepts available in the field. If one does not worry about performance, there is really no need for a symmetric cipher, which will burden the cryptosystem with a number of avoidable security weaknesses or risks only. Given a proper padding of the plain text and sufficient key lengths, the public key cipher RSA can do all encryption with a simple, intuitive operation, modular exponentiation. Basically, RSA is used to link to a single person the ability to individually decrypt a message or to sign a document.

The second basic method which is put to use by PCP is the hash function SDLH, which relies on a similar clear and intuitive concept, so that the security of both methods are based on the same foundations. The main purpose of the hash function is to convert a very long text into a single number of a few hundred decimal digits which is used as a fingerprint to replace the long text in a digital signature.

The Security of the Cryptographic Methods Used by PCP

One of the fundamental questions to be answered is of course:

Can you really be sure that only the legitimate recipient of your message is able to read it in plain text?

A satisfying answer will imply that one can be sure that someone who has really enormous computational powers will not be able to invert the encryption process in a reasonable time scale, i.e. that the clear text will remain inaccessible and secondly that only the person who is the intended addressee of the message has the necessary information to turn the encrypted message back into readable plain text.

Systems which use symmetric ciphers usually guarantee the first condition showing that a brute force attack on the session key is computationally infeasible, because of the large number of possible session keys that can be used. To estimate the amount of work we have to perform to find a session key by brute force it is helpful to realize the fact, that computers have limits and that there simply is not an endless source of improvement in the computational abilities of supercomputers.

EXCURSUS

Why should it remain infeasible for a computer to test all possible keys of a 128 bit symmetric cipher in the near future rather quickly ?

It remains infeasible because as a matter of principle a computer will not have the necessary efficiency.

According to today's state of physical research which in future could change the spreading of information in every imaginable computer system is limited by the speed of light.
Not even the results of current "experiments measuring velocities higher than the speed of light" do contradict this fact.

If you assume the size of a high-performance computer system performing key tests is 0.3 mm for electronic or optical transfer of information, it can only perform 1 000 000 000 000 operations (1012) per second, otherwise it has to be smaller. Over a period of 317 years or 10 003 759 200 seconds that will sum up to 10 003 759 200 000 000 000 000 operations. This ability to perform no more than 1022 operations within 317 years is truly not sufficient for testing 1022 different symmetric keys because every key test requires more than a single operation cycle. But even if it was possible to do a key test that fast, the large number of 2128 keys would have been searched for no more than 0.000 000 000 000 000 029 percent. Thus a specific key will not be found even if "lots" of those high-performance computers will work parallel to test the keys.

This argument is based on empirical assumptions and therefore it can be refuted by experience. But without any doubt it is definitely wrong to suppose "efficiency of any size" for future computer systems.

It is interesting to compare the ability of our proposed supercomputer to a real machine, "Deep Crack", which was invented in 1998 by EFF to show that a brute force attack on DES, the standard symmetric cipher used by banks and other commercial institutions for many years, was indeed possible. Deep Crack is a parallel computer using special hardware of 1500 Deep Crack chips and managed to test 90 billion DES-keys per second, so that any 56-bit DES-key can be found in about 5 days. This single machine today provides only 9 percent of the ability we assigned to our proposed 0.3 mm supercomputer.

Of course the above question must be answered differently if encryption is done with the public key cipher RSA because there are longer keys being used and different methods of attack against the security of the secret key are much more promising than brute force, but knowing that about 1022 operations is today's limit for a successful attack will help to assess the necessary key lengths for RSA.

The Public Key Cryptosystem RSA

To ensure a reliable link between a single person and the ability to decrypt or sign a message the public key cryptosystem (RSA) is of utmost importance for PCP.

The secure performance of the public key cryptosystem guarantees:

  • the confidentiality of the message, because the inversion of the encryption process should be possible for the intended recipient of the message only, and
  • the authenticity of the message, it enables a person to sign a document reliably. So a message can be scrutinized whether it has been created by a particular person or not.
To gain the original message in clear text the recipient has to apply his secret key, the counterpart to his public key, which must be in the possession of the recipient only to decrypt the message sent to him. The authenticity of the message therefore is only guaranteed if the pair of keys (public key, secret key) is assigned to a certain person, even if you do not know the recipient personally.

On the other hand the validity of a digital signature depends on the fact, that only the person itself was able to create the document by applying his or her secret key which also made him or her responsible for the contents of the document. A person's intellectual property can only be protected by proving the copyright in a document if nobody else is able to create an information that matches with this person's public key.

The RSA cryptosystem basically uses three different and very long numbers (e,d,n), even if one of those numbers actually is chosen very small due to the technical implementation of the cipher, the security of the cipher will not be reduced. Two of them, the public key e and the modulus n are used to encrypt a message. The message to be encrypted in principle is understood as a very long number itself which will be raised to the power of e, the recipient's public key, and the modulus n will then be subtracted from the result until it gets smaller than n. This remainder (mod n) is the cryptogram or the encrypted message ready to be sent to the recipient. The recipient himself applies the same procedure to the cryptogram to decrypt it. Again the encrypted message is understood as a very long number being raised to the power of d, the secret key, which miraculously yields the original message after the remainder has been computed using the modulus n. Just one single number d strictly has to be kept secret whereas the method of computation and both publicly known numbers e and n as well as the cryptogram need not to be kept secret and can safely be analysed by everybody.

But the RSA cryptosystem only works well, if the three numbers fit together "in an appropriate way". Matching this condition is ensured by the process of generating a pair of keys which also gives a fine recipe for cracking the whole cryptosystem.
To generate a pair of keys you have to find two very large primes, p and q, which make up the modulus as a product (n = p*q). You can derive both interesting numbers, e and d, which form the actual pair of keys, very quickly from these two primes which are thrown away finally.
An attacker who knows the modulus n, say p*q, only needs to find out which primes the modulus is composed of, in other words he or she just has to factor the modulus, after that computing the secret key is a matter of seconds. Fortunately the problem of factoring is a job causing an awful lot of work if numbers are concerned which consist of several hundred decimal digits, so this is a practically insoluble problem .

The security of the RSA cryptosystem therefore depends on the fact, that it is practically impossible to factor the large known modulus n and nobody can infer the two primes p and q from his or her knowledge of the publicly known modulus to gain the secret key.

How it really works:

RSA
Take an integer value M that represents a message text and a large prime p > M and start calculating powers of M, say M1, M2, M3, M4 and so on. Taking the remainder of these powers of M modulo p will result in a sequence of numbers between 0 and (p-1), which will appear in a strange and unpredictable order.

But within this sequence the number 1 will show up when the exponent is (p-1) regardless of the integer M you have chosen. With a different prime q the number 1 will appear exactly when you finish to calculate M(q-1) mod q.

Now we start calculating powers of M modulo the composite number p*q. Which exponent would you expect to produce the number 1 again ? If (p-1)*(q-1) seems to be the only natural answer to you, you have reinvented a theorem due to EULER, stating that for every M < n = p*q

M phi(n) mod n = 1

where (p-1)*(q-1) is usually called phi(n) (Euler's totient function).

We now are only one step away from RSA.

If you contiue to multiply the power by M once again you get

M phi(n) + 1 mod n = M And if you choose your public/secret key pair to be e * d = phi(n) + 1 then you have:

M e * d mod n = M

Thus applying the simple operation of modular exponentiation to a message twice, first using the public key for encryption and secondly using the secret key will recover the original message !

It may be worth noting, that the security of RSA also depends on the absence of any "short-cut", so that there is no way to decrypt or sign a message without using the secret key. Because of the fact that the cryptogram was created by taking the message to the e-th power (modulo n) one might think that computing the e-th root of the cryptogram (modulo n) will produce the message. On one hand nobody has found a way to perform this calculation yet, given that really large numbers are involved but on the other hand it has not been proved that calculations of e-th roots modulo n is of the same complexity and requires as much effort as factoring the modulus. So there might be a short-cut not yet known to the cryptographic community that can render the use of RSA useless in future and it would be wise not only to have a close eye on the advances in factoring but also on the possibility of short-cuts like calculations of roots or even some completely different approach.

The Previous History of Factoring

  • 70-digit numbers have been factored in 1998 on a workstation within 10 hours.

  • 100-digit numbers have been factored on a single workstation within 1 year.

  • 129-digit numbers :
    In August 1977 Martin Gardner asked the readers of Scientific American to factor
    114 381 625 757 888 867 669 235 779 967 146 612 010 218 296 721 242 362 562 561 842 935 706 935 245 733 897 830 597 123 563 958 705 058 989 075 147 599 290 026 879 543 541 .
    Some 16 years later, in April 1994, the factors were presented by Paul Leyland (University of Oxford), Michael Graff (University of Iowa) and Derek Atkins (MIT). They had been supported by over 600 volunteers running a computer program written by K. Lenstra (Bell Labs, Morristown, New Jersey) on their workstations at night sharing the work of factoring over the internet. Later the amount of work was estimated to approximately 5000 Mips years. One Mips year is equivalent to 3.15 * 1013 operations.

    This experience gave rise to the "RSA Factoring Challenge" a contest which aimed at encouraging the research into number theory and the exploration of the difficulty of factoring in practice.

  • A 155-digit number had been factored on August 22, 1999
    by a group of researchers using the general number field sieve method.
    The work started on March 17, 1999 and kept a total of 292 computers busy for little more than 5 months. The process of sieving required some preparations in which CRAY supercomputers were involved and it took nearly 4 months time and was worth approximately 8000 Mips years of effort. This complies excellently with the prediction made in 1996.
  • 160-digit numbers:
    In 1996 experts expected factoring to be possible within about 5 years using a new method of factoring known as number field sieve.

    Factoring of RSA-155 in 1999 was important, since it made RSA keys insecure which had a length of 512 bit, a key length which was considered as "Low commercial grade, fast but less secure" since PGP was first introduced in 1991.

  • 200-digit numbers:
    The time for factoring was estimated at 52 000 000 years in 1998.

Conclusion Using a minimal key length of 1024 bit corresponding to a RSA-modulus of at least 309 decimal digits, it is guaranteed that RSA can be considered safe for the near future as long as there will be no fundamental advance in the factoring of large numbers. Today lots of RSA keys are certified with key lengths of even 2048 bit corresponding to 617 decimal digits.

Without any doubt mathematics has made considerable progress within recent years in factoring very large numbers which has been discussed in detail by J. Buchmann in the June-issue of Scientific American 1996. Using advanced computer technology massively, completely different methods of factoring have been developed which essentially differ from conventional methods.

In 1985 W. Lenstra developed a method based on elliptic curves, which made it possible to find factors of 30 decimal digits size within three weeks using current workstation technology. This method is successful even if the number which has to be factored consists of several thousand decimal digits. If the RSA-modulus is composed of a small and a large factor, the small one could be found within an acceptable period of time. So it is essential that two sufficiently large primes of almost the same size are used for RSA-key generation to prevent fast factoring of the modulus.

A different method known as quadratic sieve requires the solution of a system of linear equations consisting of a huge number of linear equations and unknown quantities. Using this method a 120-digit RSA-number was successfully factored in 1993, although 252222 equations with a total of 245810 unknown quantities had to be solved. This work which would have taken almost 50 years using a single workstation could be run in parallel, so that it will be done within a much shorter period of time. The method of number field sieves which is based on a similar principle but works much faster than the quadratic sieve was successfully used in 1996 for the factoring of a 130-digit RSA-number.

Considering the progress of factoring over the last ten years one could expect 170-digit numbers to be factored in the near future even if today no known algorithm is able to perform this job within a reasonable amount of time. But it is still possible that a completely new algorithm will be found which would make today's secure RSA-keys absolutely worthless. So it will be essential to observe new developments in the field of factoring and to consider them carefully when evaluating the length of RSA-keys.

RSA has long been subject to a patent (No. 4,405,829) under the law of the United States of America which turned out to be an impediment to the future development of PGP and it seems to be the reason why newer versions proceeded to introduce new Diffie-Hellman/ElGamal keys. But this patent has expired in September 2000 and RSA now is in the public domain even in the US.

Looking at the Source Code:

Modular Exponentiation
def ModExp (Base, Exp, Mod): Hash = 1 X = Exp Factor = Base while X > 0 : Remainder = X % 2 X = X / 2 if Remainder == 1: Hash = Hash * Factor % Mod Factor = Factor * Factor % Mod return Hash
This function of 11 lines is the implementation of the basic function that performs modular exponentiation of very large integer numbers, the heart of the RSA cryptosystem.

The basic idea behind this function is not to cancel out the multiples of n in the result of a very, very large number in the end but to get rid of multiples of n early during the stepwise process of computing the power so that the resulting numbers do not exceed a certain value. This is done by extracting the binary bits of the exponent beginning with the least significant one. If the bit is 1 a factor representing a certain number of bases multiplied which corresponds to the value of the bit is multiplied to the rest which has accumulated so far. When the most significant bit is reached, exactly as much bases have been multiplied as defined by the exponent and the computation is complete.

Attacks on the RSA Cryptosystem

Unfortunately it is generally insecure to apply this function to blocks of plain text directly or to sign a message directly because cryptograms and more important signatures are multiplicative. That implies that you can generate a "new" valid signature simply by multiplying two old ones and the same holds for crypto blocks. To defeat this multiplicative characteristics of RSA a one-way hash function is used to convert a message into a large number which is being signed instead of the message, so multiplication is no longer a threat because no-one knows which plain text corresponds to the result of the multiplication as the function is one-way.

Preparing a plain text for safe encryption is more difficult and requires a mechanism called hash chain encryption padding which ensures that some random data is attached to the message and that a wilful or accidental manipulation of encrypted blocks can be detected to thwart oracle attacks which I will explain now.

Oracle Attacks

Recently oracle attacks have become very popular and they pose a real threat to the RSA encryption scheme, when being used without proper padding of the plain text. Generally speaking, if any process of decryption involving the secret key leads to incorrect plain text, i.e. if the decryption fails, the decrypted material must be discarded reliably. There should never be a way to access this faulty cryptographic material, not even by the user himself, because if the opponent can learn whether a decryption was correct or not he can feed the user with crafted cipher text and abuse him as an oracle. This may put him into a position to recover an encrypted message once he has got the faulty decrypted material. Therefore PCP will never return any output of an unsuccessful use of the secret key and the time for unsuccessful decryption will be exactly the same as for successful decryption to prevent timing attacks.

Oracle attacks will work in a way similar to blinding attacks on signatures which were described by Boneh in detail. An attacker uses a random number r and computes C'=C*r mod N and sends C' for decryption to the victim. If the attacker knows that the plain text will have a certain known structure, he can deduce information about the plain text from the fact whether or not the victim accepts the message as a valid plain text.

Hash-Chain Encryption Padding

Due to a number of possible attacks we cannot use RSA on the plain text blocks directly and have to provide a padding method that introduces randomness in addition to every plain text block before encryption is done. It is very much desirable that the padding method will provide a means to check, whether a block can be decrypted successfully or not. The method used for padding relies on the secure transmission of an initial number of 256 bit, a Nonce, which will subsequently be used to generate a hash-chain that provides a piece of information to blind the current message block. Additionally the hash-chain gives a challenge, that links consecutive message blocks so that chosen cipher text attacks can be detected.

How it really works:

Hash-Chain Encryption Padding
  1. Use standard OAEP (Optimal Asymmetric Encryption Padding) to transfer a single Nonce of 256 bit (a number used once) in the first crypto packet to the recipient.
    OAEP is a method normally used to blind a small piece of information like a session key prior to encryption with RSA. It consists of two steps:
    • Calculate the hash of a seed-value xored with the data block to be transmitted:
      C1 = hash(seed) XOR message
    • Calculate the hash of the first step xored with the seed:
      C2 = hash(C1) XOR seed

    The recipient can learn the data block by recovering the seed from C2 XOR hash(C1) and finally gets the message from C1 XOR hash(seed).

  2. Prepare a hash-chain for the second block (first encrypted message block)
    PAD1 = initial Nonce
    PAD2 = hash(PAD1)
    CHALLENGE = PAD1 XOR PAD2

  3. Create a long number to be encrypted with the RSA primitive ModExp as follows:
    Concatenate two numbers, the 256 bit CALLENGE and the padded message with PAD1 extended to the whole length of the message.
    i-th Block : Message XOR (repeated PAD1) || CHALLENGE

  4. Prepare the hash-chain for following block i+1:
    PAD1 = PAD2
    PAD2 = hash(PAD2)
    CHALLENGE = PAD1 XOR PAD2

  5. Continue at step 3 until all plain text is processed.
The recipient can form the same hash-chain during decryption and can recover the message at any stage with the current PAD1. And she can check, if the current CHALLENGE is the XOR of both pads at stage i. If not, an oracle attack is imminent and anyway integrity is violated. In this case PCP now continues to decrypt (to prevent timing attacks) and throws away the decrypted material leaving a warning in the output file, that an oracle attack may have been launched and the user will never ever see the decrypted material.

An attacker will only be able to replace an encrypted block by another one, if she can manage to keep the lower 256 bits inside the encrypted packet the same. Otherwise the hash-chain is broken and the decrypted material will disappear forever. But that would imply that she could fix the lower 256 bits (the CHALLENGE) of the new forged plain text encrypted under the recipient's public key without knowing its value.

Chosen Cipertext-Attacks and Signatures

Another similar threat emanates from attacks directed at the signing process which presuppose the attacker being able
to submit a chosen message to the victim for signing. To protect yourself against these attacks you should avoid signing a document which could contain arbitrary hidden data, for instance an unknown word-processor document containing information (on formatting) which is not to be seen in the plain text. As the chosen-ciphertext-attack is based on the fact, that an attacker is able to smuggle some skilful chosen data into the message, so that an encryption with the secret key turns out to be actually a decryption of the message or at least part of it. But even if the attacker was successful he would not get the secret key itself. The existence of the chosen-ciphertext-attack is one reason why you do not sign the whole document, but a fingerprint of it created by the one-way hash function SDLH to avoid such attacks. The signature made on this representation of the document is smaller now but the security of digital signatures now does not depend on RSA only but on the hash function as well.

The Hash Function SDLH

The integrity of a message or the security that the message has not been changed during transportation over networks depends on the hash function SDLH used to generate a digital fingerprint of the document.

What Is a Fingerprint ?

A fingerprint is a large number (its length depends on the hash modulus) which fits to a message unambiguously. PCP uses hash values of two different lengths, a long one which is as long as the hash modulus (usually longer than 1000 bits) and a compressed one of 256 bits length. Both values are being produced by the same hash function SDLH based on exponentiation. The fingerprint is computed by a one-way-function from the characters of the message, ensuring that changing a single bit within the message would produce a completely different result. In this way even a minimal change of the message could be detected reliably. Because of the fact that the hash function is irreversible there is no way of reconstructing the message from the fingerprint. It is practically impossible as well to design a new message (as a forgery) in a way, that it computes the same fingerprint as the original message, because there are 1078 different hash values (or fingerprints) even in the compressed version of SDLH and the one which is generated with the new message is unpredictable.

Of course it is possible that some other text will accidentally produce the same hash value because even if the number of different hash values is very large, it is also limited. But such a collision could not be constructed intentionally because you have to try about 1077 variations to find one. The one-way function does not give any clue for an appropriate choice of the forgery and the possibility to get the same hash value from any two different texts is almost close to zero (1/2256 or even less than 1/21024 for the hash values used in signatures).

The Birthday-Attack

How many people are required to make the possibility for two of them having their birthday on the same day reach 50 percent ?

The wrong answer is: 365/2 = 182 people.

Actually you only need 23 people which is far less than expected, and related to forging a digital signature, the forger can benefit from this situation considerably. If she does not intend to forge a certain existing document but rather confines herself to submit a document to the potential victim for signing and she has already found a forged document which hashes to the same fingerprint, forging someone's digital signature is possible, using such a "suitable pair of documents" (benign and malicious document).

The number of pairs of documents she has to check until she will probably find one working pair is no longer 2256 (115 792 089 237 316 195 423 570 985 008 687 907 853 269 984 665 640 564 039 457 584 007 913 129 639 936), "but merely" 2128 (170 141 183 460 469 231 731 687 303 715 884 105 728), that is 1039 pairs of documents. Finding a working pair of documents still is "bloody lotta work", and way beyond today's computational abilities (see my argumentation concerning the brute-force attack) but anyway, it far less work than expected.

Proof

With N different documents (benign and malicious all in all) there are N(N-1)/2 pairs. Given that there are K = 2160 different hash values the probability of two different documents sharing the same hash value is 1/K. To be on the safe side one must try at least half of N2 pairs of documents, for if the probability for a collision should be 50 percent, 280 pairs of documents will be sufficient.
(N2 * 1/K) / 2 = 0,5
N = sqrt(K) = 280.

Among those 1 208 925 819 614 629 174 706 176 different pairs one will probably be suitable for forging a digital signature. It might be necessary to check slightly more than 280 documents, because of the fact that there are half malicious and half benign documents a matching pair can be within one of the groups which does not help to make a forgery. But since there are twice as many "cross-community" pairs than "single-community" pairs, 281 documents will provide a fair chance for a forgery.

Note that this is a calculation which would apply for a hash function like SHA-1 with 160 bit hash values. In PCP the hash values used in signatures are all longer than 1000 bits as shorter hash moduli are not accepted by PCP.

How it really works:

Shamir Discrete Logarithm Hash Function
The SDLH is based on a simple idea that once the message is converted into a long integer a hash of the message can be computed as follows:
hash(x) = g x (mod p*q)
given, that both p and q are large primes which are being kept secret so that factoring n = p*q is computationally infeasible.

This hash function is provably collision-resistant, as Ronald L. Rivest showed in his posting to the cryptography discussion forum:

Adi Shamir once proposed the following hash function: Let n = p*q be the product of two large primes, such that factoring n is believed to be infeasible. Let g be an element of maximum order in Z_n^* (i.e. an element of order lambda(n) = lcm(p-1,q-1)). Assume that n and g are fixed and public; p and q are secret. Let x be an input to be hashed, interpreted as a non-negative integer. (Of arbitrary length; this may be considerably larger than n.) Define hash(x) = g^x (mod n). Then this hash function is provably collision-resistant, since the ability to find a collision means that you have an x and an x' such that hash(x) = hash(x') which implies that x - x' = k * lambda(n) for some k. That is a collision implies that you can find a multiple of lambda(n). Being able to find a multiple of lambda(n) means that you can factor n. I would suggest this meets the specs of your query above. Cheers, Ron Rivest Ronald L. Rivest Room 324, 200 Technology Square, Cambridge MA 02139 Tel 617-253-5880, Fax 617-258-9738, Email <rivest@mit.edu>
There are a number of issues to be addressed, especially when the SDLH is used together with the RSA signature scheme and a full analysis of the SDLH's security can be found in the paper "A Discrete Logarithm Hash Function for RSA Signatures". The analysis shows, that SDLH can safely be used together with RSA once certain conditions are met with regard to the selection of the user's key material. For details I like to refer to the paper about SDLH.
Looking at the Source Code:

SDLH
################################################################ def SDLH(Message) : # # Calculates hash(x) = Generator^x mod HashModulus # # precomputing table for step (1) table = [] table.append(1L) index = 1 Number = 1L while index < 256 : Number = (Number * Generator) % HashModulus table.append(Number) index = index + 1 # process the message character by character Hash = 1 for Character in Message: A = table[ord(Character)] # step (1) B = Hash # step (2) index = 8 while index > 0 : B = (B * B) % HashModulus index = index - 1 Hash = (A * B) % HashModulus # step (3) return Hash ################################################################ def SDLH256(Message) : Hash256 = 0L X = SDLH(Message) Size = pow(2,256L) while X > 0 : Section = X % Size X = X / Size Hash256 = Hash256 ^ Section return Hash256
Although calculating a hash value would require a single call to the function ModExp() only the message has to be converted into a long integer first. This would slow down the hashing even more because every character will be processed twice, first during conversion and second during exponentiation. As the hash is already the slowest part of PCP the above code performs conversion and exponentiation simultaneously while the characters are being fed into the loop.

A message M = c1 c2 c3 ... cn will be converted into a number by consecutive multiplications

m = (...((( 0 + c1 ) * 256 + c2 ) * 256 + c3 ) * 256 + ... + cn) a modular exponentiation using this number as the exponent can be performed in a similar way g m = (...((( 1 * g c1 ) 256 * g c2 ) 256 * g c3 ) 256 * ... * g cn) cancelling out any multiples of the hash modulus as soon as they appear in the intermediate results.

The process can be speeded up further by pre-calculating the 256 powers of g stored in a look-up table used inside the loop. Anyway one ModExp() operation per character makes hashing expensive but that is the price to be paid for a 1024 bit hash function which is collision-resistant and is based on an intuitive, clear and understandable concept. (If you don't believe me, look at the 400 lines of C code for SHA-1)

Finally a 256 bit version is created by xoring all 256 bit pieces of the hash value beginning with the least significant bit.

Attacks on the Hash Function SDLH

For the creation of signatures the most important property of a hash function is its "collision-resistance", it must be practically impossible to create a different document deliberately that hashes to the same value, otherwise the hash function is compromised and can no longer be used for digital signatures. Please keep in mind that always there will be collisions for every hash function because the number of possible documents which can be an input of a hash function are digested into a fixed number of hash values, but those collisions which are inevitable cannot be created at will and therefore it is extremely unlikely but nevertheless possible that collisions can be found accidentally. Producing collisions is not a problem as long as it is an extremely unlikely event and it is unpredictable. But any chance to create a collision using a certain method would certainly be the death of the hash function. As we saw SDLH is provably collision-resistant but an adversary who is able to factor the hash modulus can compute phi(n) and can create collisions. These collisions will certainly not make much sense but to protect signatures made by PCP it is essential that factoring of the hash modulus must be infeasible for the lifetime of the signature.

How Your Secret Keys Are Stored

Up to now we have not seen any necessity for a symmetric cipher in PCP as all encryption and decryption is safely done with RSA, but when you are trying to store your secret RSA signingkey on your local system, which is probably some six hundred decimal digits long, you desperately need some reliable means to encrypt your secret key with a symmetric cipher using a passphrase to unlock it. If we do need a passphrase based protection scheme for the secret key why shouldn't we use this method instead of the painfully slow encryption mechanism with RSA keys? The answer is simple. If you can generate a truly random sequence of data depending on your passphrase alone you can use this sequence as a one-time-pad to protect your secret key simply by xoring the secret decryption value with this pad of random data. And because of the fact that this one-time-pad never has to leave your local system this method will be secure as long as it is computationally infeasible to recover the sequence of random data used to protect your secret key.

But the same method suitable to protect your secret key cannot be used to encrypt a message that is meant to be sent to another person for practical reasons which are generally known and accepted. So the intuitive and secure one-time-pad method can be used locally only without the need to transfer the pad to another party.

Let me first describe in detail how the secret decryption value d is protected on your local computer system before I will discuss the reasons against the plan to engage this method as a general symmetric substitution for the RSA-encryption of plain text.

The following two functions are used to prepare the secret key stored encrypted in the local file system for use by PCP:

Looking at the Source Code:

Secret Key Decryption
def load_secretkey(): import os, sys global Decryption # makes secretkey available in Decryption if protected == "otp" : print EOL + "Your secret key is protected." +EOL print "Please enter your passphrase to unlock it." Flags = Mode = 0 try: CONSOLE = os.open(Console, Flags, Mode) PASS = "" if OS == "unix" : os.system("stty -F " + Console + " -echo") CHAR = os.read(CONSOLE, 1) while CHAR != EOL : PASS = PASS + CHAR CHAR = os.read(CONSOLE, 1) if OS == "unix" : os.system("stty -F " + Console + " echo") except: print "Cannot open console for reading." print "Your secretkey is unavailable!" sys.exit(3) Hash = hash256(PASS) print EOL + "Unlocking your secret key." Decryption = unlock_secretkey(Hash) # Burn sensitive information PASS = str(Modulus) Hash = Modulus else: print EOL + "Your secret key is not protected !" + EOL # check if secretkey works well with a challenge Number = 1893947706487L EncryptedNumber = 0L EncryptedNumber = ModExp(Number, Encryption, Modulus) Challenge = 0L Challenge = ModExp(EncryptedNumber, Decryption, Modulus) if Challenge == Number : print "Secret key is good." else : print "Error : Your secret key is unavailable." sys.exit(4)
def unlock_secretkey(H): import os, sys try : File = open(Home + "entropy", "r") Entropy = File.read()[:-1] File.close() # check if entropyfile is long enough if len(Entropy) < 1099999 : print "FATAL ERROR: There is not enough entropy", print " available!" print len(Entropy) sys.exit(4) except: print EOL + "Your entropy file is unavailable !" + EOL sys.exit(4) X = H IndexList = [] while X > 0 : Remainder = X % 1000000 X = X / 1000000 IndexList.append(Remainder) OTP = Modulus # Initial value of PAD for index in IndexList : PadString = Entropy[int(index) : int(index) + int(Block+1)] NewPad = StringToLong(PadString) print "*", OTP = OTP ^ NewPad print return Decryption ^ OTP # otp-method
If the secret key is protected as it usually should be, the user's passphrase string is read in from the console and then hashed into a 256 bit number using the secret key's modulus for hashing. To unlock the secret key the resulting number of 78 decimal digits is then split into 13 pieces consisting of 6 decimal digits each, providing 13 pointers into a file which consists of at least 1100000 bytes of truly random data. Each pointer provides a different sequence of random data with the length of the secret key's modulus. All 13 random sequences are then xored producing the one-time-pad that had originally been used to encrypt the secret decryption value which is finally xored with the protected decryption value to recover the secret key.

But before the secret key is actually used it is checked if some challenge can be decrypted correctly after having been encrypted with the corresponding public key,

The sequence of random data is clearly used as a one-time-pad because it will never be reused to encrypt anything else but the secret key and if the hash function does not cluster, every possible pointer is equally likely, depending only on the passphrase used, which must contain as much entropy as the pointers.

Entropy measures the amount of uncertainty being included in a passphrase. The amount of entropy can be reduced drastically if you can be certain that the passphrase is taken from a dictionary comprising no more than say 130 000 different words. Even if the passphrase is hashed the entropy is not 2256 but merely 217 due to the fact that only a few of all possible combinations of characters are being used as a passphrase. But once the user has selected a passphrase with enough entropy to recover the pad one has to recover all 13 pointers used, i.e. one has to guess the passphrase, if the hash is a one-way function.

A brute force attack on the secret key encryption would roughly require to check all possible combinations of 13 sequences out of a million, which is roughly proportional to enumerating all 256 bit hash values. Usually it seems much more rewarding to make assumptions on the passphrase and to try dictionary attacks instead of mounting a fully fledged brute force attack, hoping that the user has a weak passphrase. But if the passphrase has enough entropy the strength of this encryption scheme is almost 256 bit, provided that the entropy file is made out of truly random data, which is not an easy job to do.

I would like to stress that the quality of the entropy file is of utmost importance for the security of the secret key so that it is prudent to spend as much care as possible to select the 1100000 Bytes of the random pool using a truly random source of entropy and to protect this file along with the secret key file using the operation system's protection mechanisms carefully. It can even work out to be a wise decision to store these sensible data on a removable storage medium to be locked away while not in use.

EXCURSUS

Why would a randomized stream cipher not be suitable to encrypt messages in transit ?

Ferguson, Schneier and Wagner published their cryptoanalytic research on a randomized stream cipher implemented by TriStrata which was designed as a symmetric encryption algorithm also using a 1 MByte random data pool but only 5 instead of 13 pointers to generate a key stream for encryption. They analysed this smaller variation of an encryption scheme first introduced by Maurer which promised to be both provably secure and at the same time practical using a very huge amount of publicly available random data with 50 pointers. In their analysis Ferguson et. al. showed that several attacks against the smaller Maurer-like variant of the cipher are possible with a complexity of 293 for an exhaustive search attack on the pointer space and even 257 when a meet-in-the-middle attack is being mounted. They also showed that improved attacks with a complexity of only 239 are possible if one could induce faults into the random pool or if one could find related keys whose contributions to the key stream cancel out. They concluded that if the number of pointers used is increased to 14 the complexity of an attack would rise to 2128 but with only 5 pointers the TriStrata system will not always be secure.

All attacks against the TriStrata variant are based on two assumptions which are realistic for a symmetric cipher. Firstly, that the random pool is publicly known, a fact we can suppose for PCP's entropy file as well, even if it is protected by the operation system's protection and ACL mechanisms and secondly, that the stream of encrypted data can be observed freely by the attacker who knows some bytes of the plain text because of some known structures of certain plain texts like WORD-documents and the like. Ferguson et. al. suppose that the adversary knows at least 16 bytes of plain text revealing 16 bytes of the key stream. But this assumption of known plain text is unrealistic in the context of PCP's protection of the secret key as the secret key does not only lack a known structure but also even if a fraction of the secret key is known there are different ways to reconstruct the rest of the secret decryption value and the protection is already broken. Thus the special purpose the Maurer-like encryption scheme is used for in PCP to protect the secret key will render the known-plaintext attacks described by Ferguson et. al. useless.

But at the same time their cryptanalysis shows that such an encryption scheme cannot be safely used as a general substitution for RSA encryption inside PCP to speed up its performance. As long as the random pool and the number of pointers are not increased considerably randomized stream ciphers a la Maurer cannot be regarded as a secure cipher suitable for the exchange of strongly encrypted messages.

But anyway there is one aspect of Ferguson's analysis which may affect even PCP's protection of the secret key. If the attacker is able to tamper with the stored pool and the adversary can observe enough cipher text encrypted with the manipulated random pool she might locate the position of the pointers (see section 5.2). But in the context of PCP's protection mechanism the attacker will not be able to observe the faulty ciphertext, because PCP checks if the secret decryption value works correctly with the public key, and if not, it will display a warning that the secret key is unavailable and will simply stop working. On the other hand as long as the attacker constantly keeps changing the random pool and at the same time observes that Alice is happily signing her files she knows that the manipulated bytes of the pool are not inside the 13 sections which are used to create the one-time-pad. But if she hits a significant byte Alice will notice that her secret key does no longer work and if she is careful, she will create a different entropy file to protect her secret key taken from the safe again bringing the attacker back to square one.

RESULT Using sufficient long keys an attack on cryptographic methods used by PCP is practically pointless. This "theoretical security" has to be evaluated with respect to current results of cryptographic research.

Practical Attacks on PCP

Because of the fact that attacks on the cryptographic methods used by PCP are practically impossible the attention of potential attackers is shifted towards the practical use of PCP.

Security-relevant Actions

Practical use of PCP sometimes requires the use of the secret key to sign a document or to decrypt a confidential message. There are dangers lying in wait when using the secret key (security-relevant action) that could make the cryptographic methods worthless all at once, so you have to be extremely cautious in using your secret key. It is essential to develop an awareness of the possible vulnerabilities of PCP's security, which is usually extremely high.

Fortunately most of the everyday tasks can be carried out without any difficulties, because only public keys are used. Checking the signatures on your emails you have received will require the use of the sender's public key only and even if you send confidential information to other people, you have to encrypt them using the recipient's public key.

But adding a new public key of an unknown person to your pool of trusted keys, is definitely a security-relevant action, because by making the new public key a valid key for use by PCP you definitely recognize that the available public key you often have received from non-reliable sources (i.e. key-servers) is actually identical with the public key as it was initially generated by that person itself. This could be a problem if you do not know that person or if you can only communicate with him or her using an unreliable channel like the internet which frequently happens.

How can you assess the reliability of an unknown public key? There are two different ways to solve this problem.

The first one relies on some first-hand knowledge about the new public key. You may be able to obtain the public key's digital fingerprint through a direct and reliable contact with that person comparing this first-hand information with the public key's fingerprint you have received from unreliable sources. Having confirmed the public key's authenticity you can certify this key yourself signing the new key with your secret signing key and adding it to your pool of trusted keys.

As PCP can use public keys being stored in different key formats, including RSA keys in PGP-version-2, PGP-version-3, SSH and X509 formats at the moment, checking the new public key's fingerprint will also depend on the reliability of the corresponding software as well. But once you have established trust in the authenticity of a new key you can use PCP's extraction tools to extract the public key's modulus and encryption values in decimal digits together with the user's identification string from the binary or armored key file. These three lines build the foundation of a new PCP key file which has to be supplemented with a hash key (hash modulus and generator) the key owner is currently using. Of course you will need first-hand knowledge about the hash key too, to be able to complete the key file which will eventually look like this:

Looking behind the scenes:

Public Key Structure
Modulus = 60543512488910112606602583998 ... 671880494444421949488339311 Encryption = 65537 HashModulus = 9839391050497052426120625 ... 899354547842152832236191997 Generator = 115277462192246252358083703 ... 011718422799012403067626981 Ralf Senderek, Wassenberg PCP signingkey 2003 <ralf@senderek.com> Securityhash = 494660087634075087624428 ... 086648493733366295231438448
After having stored this file in the trusted-keys directory you have to sign it in order to enable its use.

Looking behind the scenes:

Public Key Certificates
-----BEGIN PURE-CRYPTO SIGNED MESSAGE----- Modulus = 60543512488910112606602583998 ... 671880494444421949488339311 Encryption = 65537 HashModulus = 9839391050497052426120625 ... 899354547842152832236191997 Generator = 115277462192246252358083703 ... 011718422799012403067626981 Ralf Senderek, Wassenberg PCP signingkey 2003 <ralf@senderek.com> Securityhash = 494660087634075087624428 ... 086648493733366295231438448 -----BEGIN PURE-CRYPTO SIGNATURE----- Hash: SDLH *** based on modular exponentiation and RSA alone *** Ralf Senderek, Wassenberg PCP signingkey 2003 <ralf@senderek.com> 196776708728465316800282963884340869037 ... 143076856763231124797789521 -----END PURE-CRYPTO SIGNATURE-----
Note, that there is a single end-of-line character in line 8 following the message body in order to allow the signing of files that do not end with an EOL character.

Securityhash

You may have noticed line 6 which is the place designed to store the public key's security hash being a hash of the decimal values stored in line 1 to 4 with the user's identity string concatenated. The security hash will detect any manipulation of the key material reliably and of course - learning from the ADK problem - nothing else but these 6 lines will be an acceptable public key body, becoming valid only if it bears a signature made with your signing key. To ensure that your signature can be validated reliably PCP will always check the security hash of your signing key before anything else is done. Although this step will take time while using PCP it provides the necessary information confirming, that your signing key has not been tampered with during storage in the local file system.

Despite the clear advantages of the discrete logarithm hash function, which is used to compute the security hashes, there has not been much practical experience collected by now, a fact which raised concerns by some people that nothing new should be used but the standard hash functions like SHA-1 which are supposed to have been subject to intense scrutiny and review. Consequently PCP is designed to run in two different modes. The "conservative mode", being the default mode, which addresses the concerns of those who will only trust standard algorithms uses SHA-1 as the hash algorithm under all circumstances. Clearly this is a compromise solution because at this point the program relies on a pre-compiled module implementing SHA-1 breaking with the idea of pure crypto which should entirely be understandable by the user.

But the "pure mode", which can be selected on the command line or set permanently in a configuration file, will use the discrete logarithm hash function SDLH, which being used with moduli longer than 1024 bit, will produce hash values used in signatures of at least that length. With more scrutiny and experience I am convinced that sooner or later SDLH can be the default mode of PCP, making the "SHA-1 concession code" obsolete in the near future.

Meanwhile the public key files of users who prefer the conservative mode can contain the number 1 for the hash modulus and the generator indicating that the user has no hash key. Using such a key in PURE mode to verify a signature will cause PCP to switch to conservative hashing while everything else will be done with the discrete logarithm hash function.

The second way to establish trust in a public key is to receive a signed key body, usually called a certificate, which can be verified using an already trusted key. It will depend on your assessment of the person's ability to check the authenticity of the signed key material with care whether you will decide to replace your trusted introducer's signature with your own to enable the key's use within PCP or not. Accordingly you can simply remove your signature on a key to revoke the key, rendering it useless for PCP, if you have reasons to mistrust an already signed key.

Key Spoofing Attacks

Some carefully crafted key spoofing attacks have been described in detail by Ross Anderson and Roger Needham. Especially when users try to sign something that contains encrypted material already an opponent who is able to factor the modulus used for encryption (i.e. at least the recipient Bob) can work out a different message M' and publish a modified public key for Bob so that the new message M' looks like a valid signature made by Alice on something that was encrypted for Bob.

To prevent this kind of attack PCP checks every message the user is about to sign whether it contains lines of cryptographic material - which is everything made out of 0..9 on a line by itself - and issues an unequivocal warning that the user should not proceed with the signing process unless he knows what he is doing. The warning makes clear that it always is prudent first to sign before encrypting something.

Another form of key spoofing attacks require compatible weak keys that will work if an attacker is able to modify the modulus in a public key published in an unprotected place. This will show up in PCP because before any key is used the security hash is printed, so that the vigilant user will notice that the hash has changed. If the manipulation has occurred before the user signs the key and the correct unmanipulated security hash is on the key and has been checked with the user, then PCP will notice that the signed key material gives a different security hash than the one which is signed. This will lead to another unequivocal warning not to use the key before it has been checked with first-hand knowledge. But if the security hash is replaced as well - a daring attack, given the fact that he will have to trick you into signing this thing - everything looks fine, except that the length of the modulus is considerably longer than the original and this will be calculated and displayed to the user as well before the key is used. So if you know that your partner uses a 2003 bit signing key and the length of the public key shows more when it is used to verify, you know that something is rotten with this key.

It is of utmost importance to protect your secret keys and to be aware of its central role in the whole system but during your everyday use of different people's public keys your own secret key will never be used. So security-relevant actions do reduce to signing own documents (as the "crowning finish" of your creative work) and to decrypting messages meant only for your eyes and of course accepting new public keys.

Please pay attention to the fact, that whenever such a security-relevant action is carried out, you always have to enter your passphrase!

If somebody got to know your passphrase he or she just would need to get your encrypted signing key and your entropy file to be able to sign in your name or read your private data. Practically you could try to prevent someone copying your secret data files by using file-permissions provided by the operating system. Although with a suitable networking environment, one could restrict the access to sensible data, it cannot be guaranteed that the encrypted secret key will not fall into the hands of other people, i.e as a result of burglary. Of course during normal operation there is a fundamental difference between a responsibly administered mult-iuser system and an utterly unprotected desktop PC. But because of the fact, that in the worst case the theft of your secret signing data is to be expected, the quality of your passphrase is essential which protects your secret key from being (mis)used by others.

How to Deal with Your Passphrase

Even if you have found a sophisticated passphrase (better two of them, so you can change it when necessary) you should avoid making the following mistakes in order to get your secret key secure for eternity.
  • Don't ever write your passphrase down (in plain text)! Don't even think about doing so.

  • Don't use your passphrase for something else (i.e as login-password). Even similarities would make drawing conclusions much easier, if you just weren't cautious some time.

  • Don't communicate your passphrase to other people, come hell or high water! You would destroy the link between your pair of keys and a single individual, that is you. Nobody, except you, has the right to use your secret key, so you have to resist any attempt to "squeeze the passphrase out of you" (maybe except some tough rubber-hose-cryptanalysis). You should not consider attempts of somebody to obtain the passphrase of other people as a peccadillo but as a perfidious attack on privacy in computer networks. Such attempts of "social engineering" destroy trust in the reliability of cryptographic methods and have to be seen as social-harmful behaviour, especially as the right of an individual to use a private key is threatened by the exploitation of relationships of dependency (i.e in a company). Privacy is a right, not a generous concession.

  • Take care that you are not observed when entering your passphrase. Everyone who is aware of the value of cryptography will avoid "peeping at your fingers". You can easily recognize people without any awareness of the problem or with suspicious intention by their behaviour in those situations. You do not have to defend your right "to enter your passphrase unobserved", claim it as a matter of course.

  • Don't ever forget your passphrase. If you don't remember your passphrase you will be in exactly the same situation as someone who has his eyes on your passphrase but in contrast to him you cannot catch yourself entering your passphrase. This situation is desperate and without any help. But it is rather unlikely that you will forget your passphrase if you use PCP regularly.

  • Please make a backup of your secret keyring and lock it away in a safe place.

Attacks on Your Passphrase

Vulnerabilities of the Operating System

If you received a confidential message it will usually be stored on a disk in plain text after decryption with your secret key. After deleting this file containing sensitive data it has surely disappeared from the file system but it is still possible to recover the physically stored data, because deleting a file only cancels the link between the destroyed file name and the original blocks of data on the medium, which are still existent. To actually destroy the confidential data the file has to be overwritten with some other data in its full length before it is deleted, preferably a dozen times. PCP has an option "-wipe" for this purpose which overwrites the contents of an existing plain text file with random data at least once. Special "file wiping utilities" do that job by overwriting the file several times.

Some text editors create temporary files on their own, which hold a copy of the text during the process of editing. After finishing those "tempfiles" will usually only be deleted not "wiped", leaving sensitive data undestroyed on the medium.

Moreover your passphrase is at risk to appear on a storage medium, if your operating system is extending its memory by "swapping" . While UNIX does separate this swap-space from the rest of the file system, so that someone has to gain root access to analyse it, MS-WINDOWS directly stores parts of the memory to disk. Therefore with some luck everyone who has access to your hard disk could find your passphrase or parts of confidential information in the swapfile in plain text. Especially in a networking environment a swapfile is a serious danger to your passphrase. It definitely would be the least to avoid swapping while using WINDOWS or at least to wipe the swapfile before turning off your computer, because deleting does not destroy the contents of your swapfile.

UNIX restricts the ability to examine any blocks of data to the person of the system administrator. In principle he or she also has access to the memory (/dev/kmem) and can search for your passphrase in the memory during the execution of your PCP-process. Do not hesitate to speak to your system administrator about this problem. A responsible system administrator will put himself to a lot of trouble to prevent others from using the superuser's password and he or she will be able to assure you that your passphrase will not be at risk by root access. At this point the security of your secret key depends on the ethos and proficiency of another person who is responsible for the UNIX system. At least since Bruce Schneier's epiphany we begin to understand that security is a process and not a product. Therefore do not bother to ask your system administrator about monitoring, intrusion detection and risk management to make sure that the integrity of your working environment is carefully observed.

Peter Gutmann has examined the reliability of safe destruction of data stored on magnetic disks and solid-state memory and found surprisingly that data seem to be more resistant to destruction on these media than expected. In case your computer falls into the hands of an attacker procedures including "Magnetic Force Microscopy" can be applied to retain stored data which had even been overwritten several times. There are different reasons for the persistence of formerly stored data on magnetic media ranging from the inability of writing data to exactly the same track so that old data remains at the edges to a lack of overwrite performance which leaves progressively diminishing images of older written data in present patterns on the disk. Based on an analysis of the way raw data is written to disks Gutmann proposes a method of truly erasing data which uses a sequence of 35 consecutive write circles to get rid of old data reliably.

He also found out that data present in RAM is recoverable after power is switched off depending on the time the data is held in memory unchanged. Interestingly not only static RAM could "remember" last stored states over days but also dynamic RAM can hold such information because the charge of cells is retained while the electric field stresses the thin oxide that makes up storage capacitor dielectric and thus changes its electric properties remaining in the cells. This effect is not exploitable unless the data remains constant for several minutes but after that the information remains for days depending on temperature once it has reached a critical threshold. Unfortunately overwriting RAM does not have the same effect as with magnetic media, rather the data used for erasure should be applied unchanged for as long as possible and should not be changed as often as possible to be effective. As a precaution Gutmann proposes that security relevant data like passphrases and secret keys should be held in specially designed RAM which has to flip constantly to prevent "burning in" of secret information at all.

Snooping Attacks

Spending large technical effort, different possibilities could be realized which all can lead to a revelation of your passphrase but which also have to be considered as "rather unlikely" with respect to the normal situation of use. It is possible to capture all your keyboard input using hidden video surveillance or a specially prepared keyboard, which emits a different radio signal with every key press, to compromise the whole cryptosystem all at once. Recently methods have been disclosed to reconstruct information from the electromagnetic radiation of computing equipment (van Eck Radiation). Whether or not this is a realistic scenario is hard to say but the necessary technical effort is not too big to rule out such a danger basically. In December 2000 it was reported that FBI agents installed a bugging device on a suspect's business computer to find out the password the suspect was using.

But without any big effort "key-logging utilities" or "key-press snooping tools" can be installed in operating systems without access control (DOS/WINDOWS) which will detect every key pressed, storing them somewhere or sending them over the network. Those key-loggers replace some parts of the operating system, which control the keyboard input by tampered code thus leading to an alteration of the operating system. With computers running DOS or WINDOWS everybody will be able to install key-loggers without any difficulties, if he or she has access to your hard disk. But such an alteration could be done by a boot virus as well or based on a network attack with Back Orifice giving an adversary control over your machine using the network connection.

To get an efficient protection against these alterations of the operating system it is necessary to check the integrity of the corresponding routines of the operating system regularly before you use PCP. This could be done by generating fingerprints of relevant parts of the operating system comparing them with the actual software-fingerprints (may be automatically) to ensure a change of the operating system not to remain undetected. Tripwire can be of help in those situations.

With UNIX "key-logging" requires access to the corresponding device files (terminals) which normally is not permitted to other people. An alteration of the input routines of the operating system is possible only with root access. But the X Window System (X11R6) which is the graphical user interface for UNIX features a conceptual design error, which could be exploited for "key-press-snooping" without root access. You will find comprehensive information about this in Rune Braathen's Crash Course in X Windows Security. You should always avoid entering your passphrase during a X Window session and you should switch to the character based console if possible. But if you cannot do so, you can restrict access to your X server to your localhost using the UNIX command "xhost -", so you can only be threatened by users who have already logged into your local computer.

Convenience and "Brand-new Features"

On the one hand working in a network environment is quite profitable but on the other hand it also bears some severe risks concerning the use of PCP. In TCP/IP-networks "packet sniffers" are a real threat. Dumping any packet of data transmitted over the network can be made ineffective with the use of cryptographic methods. By using the Secure Shell (SSH) as a replacement for rlogin one can perform authentication with a remote host using RSA and subsequently all data will be transferred over the network conventionally encrypted (i.e. with 3DES or IDEA) making packet sniffers completely useless. This encrypted connection to a remote host requires very small additional effort only; for every host participating and for every user a pair of RSA-keys has to be generated and exchanged between the systems reliably. So an arbitrary connection to any remote host (as with telnet or rlogin) would not be possible without exchanging trusted RSA-keys. This emphasizes the fact, that a gain of security can be reached at the expense of practical convenience only. But using the secure shell allows really long and secure passwords to protect the secret key.

How Secure is Your Source for PCP ?

As the source code of PCP is publicly known, it is possible that a faked version of PCP will be distributed as the legitimate original, which might store passphrases or send it over the internet via email. I have released the source code of PCP into the public domain for use but I insist on the content not being changed, so I signed the code with my PGP signing key which is certified within the PKI of the German Research Network. You can check the source code's integrity any time verifying my PGP signature. There is also a separate PCP signature available for all relevant parts of PCP coming with the distribution.

Following the support of JAVA, JAVA-Script and ACTIVE-X by the new generation of web browsers a severe lack of security has become known, because unauthorized access to private local data over the network has been permitted due to the implementation of these new features. The concerted action of the software industry to deliver "brand-new features" in a premature and imperfect condition will certainly damage the security of the browsers in their further development - particularly, if you take into account the rivalry between Microsoft and Netscape.
The demand for comfort and user-friendliness lots of users call for, causes suspicion of an increase of security-relevant attacks using applets and "hostile webpages" in future. Even today some websites intend to cause destructive effects based on JAVA-applets. So in future those pages could also be dedicated to the purpose of hidden information gathering. You can face these dangers best, if you do not use your system for PCP and as a platform for testing web browsers simultaneously.

Usually a firewall is useful to protect against intruders, if configured correctly, which has become much of an art today. Some products that offer personal firewalls have been found to leak information out into the internet, a problem called "internal extrusion". To check the ability of personal firewalls against such leakages Steve Gibson developed a free utility called LeakTest which tries to pass out through the firewall. Unfortunately most of the current products today fail the leak test.
But even if firewalls work reliably they do not block faked DNS-requests which can be used to send confidential information to the name server of a hostile domain, as was reported by Richard Cox in a prominent security mailinglist.

What Is a Good Passphrase ?

In the following section I will discuss some criteria, you might consider while choosing your passphrase if you do not want to put the security of your secret key at risk.

How Long Should a Passphrase Be ?

The passphrase protects your secret key from being (mis)used by other individuals. If you like having as much security as PCP's protection scheme offers, a 256-bit-string of 0's and 1's would be an appropriate solution. Replacing any 4 bits by a hexadecimal digit, this solution will boil down to remembering a passphrase like "A9 52 4A AF 55 B3 D4 2E 86 B1 2F 73 7B 5F D6 C4 BF 6E 35 17 CA A8 94 37 9C FB AA 86 55 07 0F AD ", which probably is expecting too much of the digital Joe Blow.

Because of the fact that you should never forget your passphrase, otherwise your secret key is useless and the protected data is lost, solutions have to be found that will work safely. At this point two different strategies can be offered to guarantee the memorability ot the passphrase, both leading to a loss of security.

  • You can reduce the number of characters until you can remember the passphrase
  • or instead of using random characters you can try to introduce "meaning" into your passphrase.

Both strategies do raise the question what length will be required to make sure that a passphrase is still secure. My argument concerning the efficiency of a future computer system has shown, that it will be sufficient to ensure, that the work for "systematic guessing" the passphrase will reach about 1022 operations. If you choose your passphrase in a way that in principle there are 1022 possible passphrases, you certainly are on the safe side. To cause this amount of work 72 bits or 9 Bytes of truly random data will be sufficient, because this will leave 272 or 4,7 * 1021 variations.

Strategy 1 : Random Character Strings

A shorter, but still secure passphrase would be "4C 72 1A FD 94 5B BA 39 E6". If you think, you can remember such a thing this solution is strongly recommended.

The number of characters could be reduced further by using more than those 16 hexadecimal characters (0-9A-F). Using not only all lower and upper case letters but also every special character you can enter through the keyboard resulting in some 70 different symbols, 12 of those randomly chosen characters will be sufficient to cause the required work (7012 = 1,4 * 1022).
Then may be your passphrase would be e8ZWr!ySFt?m.

By using the <SHIFT>, <CTRL> and <ALT>-keys and the combination <CTRL><ALT> as well you can easily increase the number of different characters to 160, which results in an unbeatable secure 10-character passphrase like
G<ALT>)a<CTRL>#Hs<CTRL><ALT>e.3<ALT>wY .

So the gain you have achieved using the meta keys is rather modest and you simply cannot construct a secure passphrase with less than 10 characters!

Strategy 2 : "Meaningful" Character Strings

If you try to fill your passphrase with meaning, you will rule out a lot of "meaningless" combinations from the amount of passphrases to test and therefore you either accept a considerable extension of your passphrase or a drastic deterioration.

A dictionary contains maybe 60000 different items, so an arbitrary choice of five different items would again produce plenty of combinations (600005 = 7,7 * 1023). But your secure passphrase salmagundi_latrine_warning_gowk_marathon now has a length of 36 characters, a price you have to pay for the fact, that you only need to remember the order of the five different words, which all have meaning now.

One single word chosen from a dictionary is clearly useless as a passphrase. R. T. Williams describes the attempt of testing PGP-passphrases using low-tech computer technology (PC-486 and PGP-2.6.2). Performing at least two key tests per second one easily covers 172800 words a day, so every dictionary will be searched exhaustively within a short period of time.

If you still want to use a single word as a passphrase, it is advisable to choose a random-looking word, which has no meaning for anyone except you. Twcae,pwcab. would be no poor candidate for my passphrase, if nobody could get the idea that it reflects the quotation of I. Kant "Thoughts without content are empty, perceptions without concepts are blind". But those who know that I am busy with philosophy professionally might get this idea, so this passphrase is completely useless for me. But if you can invent such an "abbreviation", nobody who knows you would possibly assume, this would be a good passphrase given an appropriate length and spending plenty of <CTRL><ALT>.

Useless Passphrases

Certainly useless passphrases are those, whose systematic testing would require far less than 1022 key tests. It is known that not only the NSA (National Security Agency) is able to crack keys of 40-bit (5 byte) length without any problems, corresponding to a random character string of 7 characters (out of 70), because this only results in 1012 different possibilities. In 1997 Bruce Schneier developed a screen saver that cracked 40 bit S/MIME-keys by brute force to show the insecurity of short keys.

RESULT The PROTECTION of your PASSPHRASE is the crucial factor which determines the security of PCP.

The trouble you are taking to choose a good passphrase and to make your working environment secure but also your awareness of the various possibilities of practical attacks on the use of PCP builds the basis of your security and the basis of the trust of others in the value of cryptographic applications like PCP as well.

Long Live the Signing Key !

RSA keys can be used to sign documents and to encrypt confidential messages. Even though it has become common practice to use one single key for both signing and encryption there is a pile of good reasons not to use one key for both purposes. For an attacker - or with a bit of bad luck your government - it makes a remarkable difference whether someone intends to forge your signature or tries to access the plain text of your encrypted communication.

A key used to create signatures usually has a very long lifetime, because trusted key distribution is an awful expensive procedure and in case of a compromise of your secret key the secure revocation of your signing key takes equal pain. Moreover you would probably use your signing key more frequently so that spending an extra effort to protect your secret key would be only prudent. On the other hand keys which protect confidential communication should have a short lifetime to minimize the damage caused by a key compromise. The change of even revocation of a short-life encryption key would be a pretty normal thing whereas you would probably go through much trouble to avoid the revocation of your long-life signing key. Governments have often toyed with the idea of demanding the disclosure of private encryption keys used to protect confidential communication but have confined themselves to include private signing keys in this demand up to now. Using two different keys for signing and encryption is terribly advisable in the light of those intentions of governments to establish the seizure of private keys as a natural means of law enforcement which are currently developing in different countries all around the globe and especially in the European Union.

You can easily specify the purpose of your long-life key by adding something like "high security signing key, not for encryption" to your User-ID and use this key to sign any short lifetime "encryption key [expires: date]" which can be published on a website and can be replaced even monthly without revocation so that everyone who has got your signing key reliably can obtain your current encryption key and check its validity before starting a confidential conversation. There is no need for a wide distribution of your encryption keys using key servers because only your signing key has to be certified properly to ensure the authenticity of all your other keys which are to be replaced from time to time. To complete this picture there is also a need for a safe data haven which is immune against demands for cryptographic keys.

Creating RSA Keys and Hash Keys

How do you create your RSA keys best ?

Well, to be honest, I do not really know. I have not (yet) included a key generator in PCP because there are several different software packages which generate RSA keys and it is a matter of trust which one you will employ to create your keys. All I can tell is, that as far as I know, it is important to get as much randomness as possible into the process of key generation and with respect to low public encryption exponent attacks it may be advisable not to select the lowest value for the encryption exponent, which is usually 17 to gain speed. As the openssl software uses both a large file containing random data to seed the random number generator and a longer encryption value I have created my signing key with openssl but, as I said, your mileage may vary. Anyway, generating keys is still a little bit like counting on the egg in the hen's arse.

With respect to the hash key the choice is much easier, because you simply have to select two strong prime numbers to build the hash modulus and to select another prime of highest order, i.e. a few bits shorter than the hash modulus, to be used as the generator. Here again you will rely on the software of your choice to produce primes but you can use a tool that finds strong primes which comes with the PCP distribution.

Keep An Eye on the Threat

The British Government has pushed the Regulations of Investigative Powers Act (RIPA) through legislation which provides for orders to disclose private encryption keys and threatens everyone served with those orders with two or even five years of jail who fails to comply with the demands. While some are still figuring out how far the powers provided by this act will reach - the conditions under which orders may be served or surveillance devices can be installed seem pretty much stretchable - others begin to protest while other EU governments seem to show the political will to adopt this famous legislation. The unequivocal obligation to reveal your private communication by law is an effective method to tempt the uninformed public into not using cryptography to protect their privacy and to criminalize those who do. Therefore it is important that some far-sighted people have begun to develop crypto-products that can help to face these threats and which may possibly lead away from software like PCP as well. I hope that giving PCP into the public domain will help to turn strong cryptography into an indispensable part of our digital lives.

i

References to Other Sources of Information

A Final Request

I have tried to turn all information I have come across into a well-founded valuation of the security of PCP. Maybe I have made some mistakes or I have drawn wrong conclusions. Please do not hesitate to send me your opinion, I am looking forward to your comments.

Copyright © 2003 Ralf Senderek
This document is signed digitally with PGP.