clAES: Using Cryptlib - Encryption Based On Solid Foundations

After a decent number of weeks of intense coding and testing, I have finally finished my latest program, clAES (sig). It is the result of my attempt to use Cryptlib as the foundation for a file-encryption program (written in Python 3) that does one thing only: safely encrypt or decrypt a file with password-based AES.
And nothing more.

File encryption is long solved. Why clAES?

While I was talking about the project during development, I was confronted with the obvious question "why not simply use gpg?" and the advice "use openssl directly, if you are brave". These objections to starting the project at all are legitimate and demand a convincing reply.
Both GnuPG and OpenSSL rely on a shared library that provides the basic functions for encrypting and decrypting data: "libgcrypt" for gpg and "libcrypto" for OpenSSL. Both libraries comprise a vast number of complex functions, and their correct and safe use is left to the C programmer, who often has neither the cryptographic expertise nor the experience needed to apply them without the risk of coding something that ends in disaster. Trying to add security to a project can easily end with shooting oneself in the foot.
More often than not, the prospect of undermining the very goal of securing one's own code puts the programmer off using cryptography at all. As a result, the code is less safe than it could be, and it would be much more reliable and privacy-protecting if there were a way to use encryption without the risk of doing something nasty unintentionally.

Cryptlib - The Well-designed Security Architecture

Obviously, the last thing needed is another collection of crypto functions. And Cryptlib, which is developed and maintained by Peter Gutmann, is much more than just another library of functions. It is designed to be integrated into any program that needs security features in a way that makes unsafe use of cryptography almost impossible.
More than 20 years ago, Peter Gutmann described a security architecture in his PhD thesis at the University of Auckland, which became the foundation of Cryptlib as it is available today for different programming languages (including Python).
It is quite difficult to describe Cryptlib in a few sentences, but what sets it apart from a collection of functions is its design. In layman's terms, the core of the software is a security kernel that is completely isolated from the code that initiates the desired cryptographic actions. The kernel communicates with the calling code in such a way that sensitive information does not leave the kernel until the task is completed, and only safe access to processes inside the kernel is allowed from the outside. The objects inside the kernel interact with one another according to principles that can be verified to ensure safe operation. Secure defaults are used at all times, so the programmer who uses Cryptlib does not have to supply such parameters themselves. All internal methods are safeguarded by strict parameter checking, so that no malformed input can jeopardize the functioning and secure operation of the architecture.
Such a well-designed system also offers ways of accessing its functionality that inexperienced programmers can use. This makes it possible to extend any project in need of reliable cryptography with Cryptlib.
With this extraordinary foundation at hand, I started to code a message encryption program that works reliably and is able to interact with both GnuPG and OpenSSL. Each of them stores its ciphertext in a format that cannot be interchanged with the other. GnuPG aims to implement the OpenPGP standard, with varying degrees of success, in a format that consists of packets described in RFC 4880, while OpenSSL (accessed via its command-line tools) can produce ciphertext in CMS format or in its own format. As so often, the two worlds cannot easily talk to each other, although both use AES.
But clAES is made to talk to both: it produces OpenPGP messages by default and can switch to the CMS or OpenSSL message formats if need be. In every case, the input data is encrypted with a passphrase that the user provides to the program, and AES is used as the default encryption algorithm.
Facing Unexpected Obstacles

Although it is possible to use Cryptlib safely with very few lines of Python code, I encountered a number of hurdles while finishing my program, most of which I didn't expect.
The lessons I learned while coding clAES seem worth describing, and I hope my experience with coding Cryptlib in Python will pave the way for those of you who have by now developed an appetite for using Cryptlib in your own projects.
Starting with a test program of only a few dozen lines, I ended up with a finished clAES of some 1083 lines of Python code. Part of it, of course, is needed to handle the options that select between the default OpenPGP message format and the other modes of operation. A large part of the code base is devoted to the safe input and output of text read from the file system or standard input.
Then there is the input, which is always expected as ASCII-armoured (base64-encoded) data and has a different structure in GnuPG and in OpenSSL. These structures had to be parsed and split into blocks containing the bare plain texts or ciphertexts.
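For the OpenPGP side, the parsing step might look roughly like the following minimal sketch. It assumes a single armored block laid out as in RFC 4880, section 6.2 (a BEGIN line, optional armor headers, an empty line, base64 lines, an optional "=" checksum line and an END line). The function name dearmor is hypothetical and not taken from clAES, and OpenSSL's base64 output, which has no BEGIN/END lines, is not covered here.

```python
import base64
from typing import Optional, Tuple

def dearmor(text: str) -> Tuple[bytes, Optional[bytes]]:
    """Split one ASCII-armored OpenPGP block into its binary body and raw CRC24 bytes."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if not (lines[0].startswith("-----BEGIN PGP ") and lines[-1].startswith("-----END PGP ")):
        raise ValueError("not an ASCII-armored OpenPGP block")
    # Armor headers such as "Version: ..." end at the first empty line.
    start = lines.index("", 1) + 1 if "" in lines[1:] else 1
    body = lines[start:-1]
    checksum = None
    if body and body[-1].startswith("="):
        # The line "=XXXX" carries the base64-encoded 3-byte CRC24 checksum.
        checksum = base64.b64decode(body.pop()[1:])
    return base64.b64decode("".join(body)), checksum
```

The checksum is returned rather than verified, so the caller can compare it with a freshly computed CRC24 of the body before handing the data to Cryptlib.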
Before I could encrypt or decrypt the data blocks, I had to code a checksum function for GnuPG messages (crc24) and a function that derives an AES session key from the user's passphrase for OpenSSL (pbkdf2).
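Python's standard library already offers PBKDF2 as hashlib.pbkdf2_hmac, so a sketch of the OpenSSL key derivation can be short. It assumes the scheme used by "openssl enc -pbkdf2" in OpenSSL 1.1.1 and later, as I understand it: PBKDF2-HMAC-SHA256 over the passphrase and an 8-byte salt with 10000 iterations (unless -iter says otherwise), producing key and IV in a single run. The function name openssl_key_iv is hypothetical, and clAES's own pbkdf2 may differ in its details.

```python
import hashlib
from typing import Tuple

AES_IV_LEN = 16  # AES block size in bytes, which is also the CBC IV length

def openssl_key_iv(passphrase: bytes, salt: bytes, key_len: int = 32,
                   iterations: int = 10000) -> Tuple[bytes, bytes]:
    """Derive an AES key and IV the way `openssl enc -pbkdf2` is assumed to do it."""
    if len(salt) != 8:
        raise ValueError("OpenSSL enc uses an 8-byte salt")
    if key_len not in (16, 24, 32):
        raise ValueError("AES keys are 16, 24 or 32 bytes long")
    # One PBKDF2 run yields key_len + 16 bytes: the key first, then the IV.
    material = hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations,
                                   dklen=key_len + AES_IV_LEN)
    return material[:key_len], material[key_len:]

key, iv = openssl_key_iv(b"my passphrase", b"\x01\x02\x03\x04\x05\x06\x07\x08")
print(key.hex(), iv.hex())
```

For a 128-bit key (aes-128-cbc) you would pass key_len=16; the IV stays 16 bytes either way.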
Finally, the rest of the program, roughly 460 lines (from line 624 to the end), deals with using Cryptlib for conventional encryption.
The definitive source of documentation
Before I draw you into the details of my program, I have to mention Cryptlib's excellent documentation. The 358 pages of its manual explain everything that can be achieved with Cryptlib, which goes far beyond mere conventional encryption. You don't have to digest the whole manual before you start using Cryptlib, but you'll find it a valuable source of information (including code examples) to come back to from time to time.
By the way, there is a Unix-style man page for clAES as well.
Starting to use Cryptlib Envelopes (High-Level)
Let's start with some peculiarities that are specific to the Python language.
Everything related to OpenPGP messages can be done with envelopes. Envelopes are Cryptlib objects that not only perform encryption but also produce output in the OpenPGP message format. The first step is to create an envelope of type CRYPT_FORMAT_PGP.
Envelope_object = cryptCreateEnvelope( cryptUser, CRYPT_FORMAT_PGP )
Envelope = int( Envelope_object )

One would expect the first line to suffice. But the functions that subsequently use this envelope expect an integer value (a handle to the envelope) as their first parameter. So in Python, the object has to be cast to an integer for all further use, including the destruction of the envelope.
Try-Except needs to be used instead of status codes
The following lines add a user-provided passphrase to the PGP envelope by setting the envelope's string attribute CRYPT_ENVINFO_PASSWORD. In C you would store the return code of the Cryptlib function cryptSetAttributeString() and check whether the operation succeeded by testing for (status == CRYPT_OK), CRYPT_OK being 0. The program would proceed only if the status is 0.
In Python, most functions don't return a status code, so it is essential to check the success of each Cryptlib function with a try-except block, in which you can extract the status (as an integer) and a message (as a real string) for debugging purposes. Let me stress this important point: before exiting the program because of an unrecoverable error, you need to clean up memory. Don't just exit without making sure that sensitive information in memory is reliably overwritten with random data.
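import sys

try:
    # add the encryption password to the envelope
    cryptSetAttributeString( Envelope, CRYPT_ENVINFO_PASSWORD, password )
except CryptException as e :
    status, message = e.args
    print( "Error: " + message )
    # wipe sensitive buffers and destroy the envelope before leaving
    clean_envelope()
    if status == CRYPT_ERROR_WRONGKEY :
        sys.exit( ERR_WRONGKEY )
    # any other Cryptlib error must not be silently ignored either
    sys.exit( 1 )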

Strings are always bytearrays

Some envelope attributes are numbers, for instance the maximum buffer size CRYPT_ATTRIBUTE_BUFFERSIZE. Others, like passwords, are strings. Only that these strings are not actually Python strings: a Python str is a sequence of characters that needs an encoding to become bytes, whereas Cryptlib interprets all strings as literal bytes. So wherever a string is needed, you have to use a bytearray() instead. A bytearray can be extended with bytes (for example a str converted with .encode()) or appended one byte at a time. len() and indexing work much as they do for strings, except that indexing a bytearray yields an integer rather than a one-character string.
Text input to the envelope has to be a bytearray() too, of course.
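A short, self-contained illustration of these bytearray rules in plain Python, with no Cryptlib involved. The helper name to_cryptlib_bytes and the choice of UTF-8 are my own assumptions, not part of Cryptlib's API.

```python
def to_cryptlib_bytes(text: str, encoding: str = "utf-8") -> bytearray:
    """Turn a Python str into the mutable byte buffer that Cryptlib expects."""
    return bytearray(text.encode(encoding))

password = to_cryptlib_bytes("geheim")
password.extend(b"123")        # extend with bytes, not with a str
password.append(0x21)          # append a single byte, here "!"
print(len(password))           # 10, one per byte
print(password[0])             # 103: indexing yields an int, not "g"

try:
    password.extend("xyz")     # a plain str is rejected
except TypeError as exc:
    print("TypeError:", exc)
```

Keeping the passphrase in a mutable bytearray from the start also has a practical benefit: unlike str or bytes, it can be overwritten in place before the program exits.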
Once the passphrase has been added to the envelope and CRYPT_ENVINFO_DATASIZE has been set to the length of the input data (which is only necessary with PGP envelopes), the actual encryption is done by:
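# push the plain text (a bytearray) into the envelope
bytesCopied = cryptPushData( Envelope, Data )
# signal that no more data follows, so the envelope can finish processing
cryptFlushData( Envelope )
# pop the result into envelopedData, a bytearray of DataBufferSize bytes allocated beforehand
bytesCopied = cryptPopData( Envelope, envelopedData, DataBufferSize )
Buffer = envelopedData[:bytesCopied]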
# Buffer holds the encrypted data

I have deliberately omitted the try-except blocks and all the safeguards described above to highlight the essential function calls. The bytearray "Buffer" can be written to the file system as a byte stream after it has been base64-encoded and decorated with the usual "-----BEGIN ..." lines. (And, uhh, I forgot to mention that the CRC24 checksum has to be added, too.)
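The armoring step could be sketched as follows. The CRC-24 routine is a direct Python transcription of the reference code in RFC 4880, section 6.1 (initial value 0xB704CE, polynomial 0x1864CFB), computed over the binary data before base64 encoding. The function names, the 64-character line length (RFC 4880 allows up to 76) and the absence of armor headers are my own choices for illustration, not necessarily what clAES does.

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """CRC-24 checksum as defined in RFC 4880, section 6.1."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor(binary: bytes, kind: str = "MESSAGE") -> str:
    """Wrap binary OpenPGP data in ASCII armor with a trailing CRC24 line."""
    b64 = base64.b64encode(binary).decode("ascii")
    lines = [f"-----BEGIN PGP {kind}-----", ""]          # empty line: no armor headers
    lines += [b64[i:i + 64] for i in range(0, len(b64), 64)]
    checksum = crc24(binary).to_bytes(3, "big")          # 24 bits -> 3 bytes
    lines.append("=" + base64.b64encode(checksum).decode("ascii"))
    lines.append(f"-----END PGP {kind}-----")
    return "\n".join(lines) + "\n"

print(armor(b"hello"))
```

Since the checksum covers the binary packets rather than the base64 text, the same crc24 function can be reused on the decryption side to verify an incoming message.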

s2k-Overkill
When OpenPGP messages are to be decrypted inside a PGP envelope, Cryptlib must detect the encryption algorithm that was used. In addition, it has to determine how often the envelope must hash the passphrase internally (using the s2k function) to derive the AES session key from the user-provided passphrase.
In fact, messages produced by gpg2 demand a hilariously large number of hash iterations, so Cryptlib puts an end to this nonsense by allowing only some eight million iterations. This can cause the decryption of some gpg messages to fail, because enough is enough: from a security point of view it is not prudent to force the maximum allowed number of iterations on every program that attempts to decrypt gpg messages. Maybe Cryptlib will accept hilariously large s2k counts in the future, but at the moment there is a generous limit that gpg2 should not exceed.
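To get a feeling for these numbers: in OpenPGP's iterated and salted S2K, the count is stored in a single octet and decoded as described in RFC 4880, section 3.7.1.3; strictly speaking, it counts the octets of salt and passphrase fed into the hash rather than individual hash calls. The sketch below decodes the octet and compares the result with a limit. The constant ILLUSTRATIVE_LIMIT simply reuses the "some eight million" from the text; it is an assumption, not a value taken from Cryptlib's source, and Cryptlib's actual check may be expressed differently.

```python
EXPBIAS = 6  # constant from RFC 4880, section 3.7.1.3

# Assumption for illustration only: the "some eight million" mentioned above.
ILLUSTRATIVE_LIMIT = 8_000_000

def s2k_count(coded: int) -> int:
    """Decode the one-octet S2K count into the number of octets to be hashed."""
    if not 0 <= coded <= 0xFF:
        raise ValueError("the coded count is a single octet")
    return (16 + (coded & 15)) << ((coded >> 4) + EXPBIAS)

def acceptable(coded: int, limit: int = ILLUSTRATIVE_LIMIT) -> bool:
    """Refuse counts that would make decryption arbitrarily expensive."""
    return s2k_count(coded) <= limit

for coded in (0x60, 0xC0, 0xFF):
    print(hex(coded), s2k_count(coded), acceptable(coded))
```

The largest encodable value, 0xFF, decodes to 65,011,712 octets, roughly eight times the illustrative limit, which shows how quickly a message can demand far more work than a decrypting program is prepared to do.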

Using Crypt-Contexts (Low-Level)
I found that envelopes are not enough when switching to OpenSSL messages. I had to do something which, although possible, should normally be avoided as much as possible: resorting to low-level use of Cryptlib. To control the decryption process at a low level, Cryptlib provides crypt-contexts that can be used as an alternative to envelopes.
Obviously, crypt-contexts are more error-prone than envelopes, because the programmer has to use them in the cryptographically correct way, which is not easy, as my code will show you.

crypt_object = cryptCreateContext( cryptUser , CRYPT_ALGO_AES )
AESContext = int ( crypt_object )

Instead of an envelope, you now create a context that is prepared to apply the AES cipher to its input. Like envelopes, the AES context has a number of attributes that have to be set correctly to ensure proper encryption.
First of all, the session key (128 or 256 bits) and the initialisation vector (128 bits) need to be derived from the passphrase in advance. I had to code the function pbkdf2 for this task. But this function expects an 8-byte salt value that is truly random and does not repeat. Generating such values is a complex issue in itself, so I decided to use the method of obtaining real random numbers that Peter Gutmann describes on page 287 of his manual.
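For illustration, here is how an 8-byte salt might be generated and stored in OpenSSL's own file format, which, as far as I know, begins with the eight ASCII bytes "Salted__", followed by the salt and then the ciphertext. This sketch swaps in Python's secrets module as the random source; clAES itself uses the Cryptlib method mentioned above, and the function names are hypothetical.

```python
import secrets
from typing import Tuple

MAGIC = b"Salted__"   # OpenSSL's marker for salted output
SALT_LEN = 8

def new_salt() -> bytes:
    """Return 8 bytes from the operating system's CSPRNG (not Cryptlib's generator)."""
    return secrets.token_bytes(SALT_LEN)

def add_salt_header(salt: bytes, ciphertext: bytes) -> bytes:
    """Prefix the ciphertext with the magic marker and the salt."""
    if len(salt) != SALT_LEN:
        raise ValueError("salt must be 8 bytes")
    return MAGIC + salt + ciphertext

def split_salt_header(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (salt, ciphertext) from OpenSSL-style salted data."""
    if len(blob) < len(MAGIC) + SALT_LEN or not blob.startswith(MAGIC):
        raise ValueError("missing 'Salted__' header")
    return blob[len(MAGIC):len(MAGIC) + SALT_LEN], blob[len(MAGIC) + SALT_LEN:]
```

On decryption, split_salt_header recovers the salt, which is then fed into the key derivation together with the passphrase to reproduce key and IV.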
Once key and IV are available, all necessary attributes can be set inside the AES context:
# OpenSSL-compatible CBC mode
cryptSetAttribute( AESContext, CRYPT_CTXINFO_MODE, CRYPT_MODE_CBC )
# key size in bytes (16 or 32): the length of the derived key, not the AES block size
cryptSetAttribute( AESContext, CRYPT_CTXINFO_KEYSIZE, len( sessionkey ) )
cryptSetAttributeString( AESContext, CRYPT_CTXINFO_KEY, sessionkey )
cryptSetAttributeString( AESContext, CRYPT_CTXINFO_IV, iv )
# encrypt Data in place; Data must already be padded to a multiple of 16 bytes
try:
    status = cryptEncrypt( AESContext, Data )
except CryptException as e :
    status, message = e.args
    print_debug( "Encryption error while encrypting data ...")
    ...

If cryptEncrypt() succeeds, the ciphertext can be taken from the bytearray() "Data" that originally held the clear text, because crypt-contexts always change data in place. Unlike with envelopes, no pop function is needed to extract the ciphertext (or the plain text).

The need for padding the clear text
But if you think that's it, you're wrong. Because the AES cipher works on blocks of 128 bits (16 bytes at a time), the length of the input data, a bytearray, must be a multiple of the block size; otherwise AES in CBC mode cannot process it.
To achieve this, all input has to be extended at the end by exactly the number of bytes needed. Even input whose length is already a multiple of the block size has to be padded. OpenSSL pads all messages using PKCS#7 padding, which appends a certain number of bytes to every input so that it reaches the desired length. For instance, if the input is seven bytes short of a multiple of 16 bytes, the padding adds seven bytes of value seven to the end. If it is nine bytes short, nine bytes of value nine are added, and if the length is already a multiple of 16, sixteen bytes of value sixteen are added.
With this clever method, every decrypted message can be cut back to its original length by reading the last byte of the decrypted message and chopping as many bytes off the plain text as that value demands. Because padding is always present and its last byte always lies between 1 and 16, it can be removed reliably after decryption.
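Both directions of PKCS#7 padding fit into a few lines of Python. This sketch works on plain bytes, independent of Cryptlib; the function names are hypothetical, and the unpadding step additionally checks that all padding bytes carry the same value, a common safeguard rather than something prescribed in the text above.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytearray:
    """Append n bytes of value n, with n between 1 and block_size (never 0)."""
    n = block_size - len(data) % block_size
    return bytearray(data) + bytearray([n] * n)

def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytearray:
    """Strip PKCS#7 padding after checking that it is well-formed."""
    if not data or len(data) % block_size:
        raise ValueError("length is not a positive multiple of the block size")
    n = data[-1]
    if not 1 <= n <= block_size or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return bytearray(data[:-n])
```

A 9-byte input gains seven bytes of value 7, and a 16-byte input grows to 32 bytes, exactly as described above.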

Randomize all sensitive data before exiting
Errors, even unrecoverable ones, can happen. Just think of a passphrase that is empty or does not produce the AES session key that was used to encrypt the message. My clAES code base contains 35 points where the program cannot continue and has to be aborted. In all these cases it would be irresponsible to just call exit() and be done. Sensitive data like the password bytearray or clear-text buffers are still present in memory. So the least you can do is clear the buffers by filling them with random data.
This is the task of the functions clean_envelope() and clean_context(), in addition to destroying the envelope or crypt-context.
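A minimal sketch of the overwriting part of such a cleanup, assuming the sensitive data is held in bytearrays. The names wipe and abort are hypothetical; they are not clAES's clean_envelope() or clean_context(), which additionally destroy the Cryptlib objects. This only helps for mutable buffers: immutable str and bytes objects, and any copies Python made of them, cannot be overwritten this way.

```python
import os
import sys

def wipe(*buffers: bytearray) -> None:
    """Overwrite each buffer in place with random bytes of the same length."""
    for buf in buffers:
        buf[:] = os.urandom(len(buf))

def abort(exit_code: int, *buffers: bytearray) -> None:
    """Wipe sensitive buffers first, then terminate the program."""
    wipe(*buffers)
    sys.exit(exit_code)

password = bytearray(b"correct horse battery staple")
plaintext = bytearray(b"secret message")
wipe(password, plaintext)
print(len(password), password != bytearray(b"correct horse battery staple"))
```

Assigning to the full slice with data of equal length keeps the same buffer object, so every reference to it now sees random bytes instead of the secret.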

Get Your Hands on Cryptlib
I hope you could gain some benefit from my explanations. Most of the difficulties I ran into are due to the fact that I tried to connect two very different worlds of practical file encryption, GnuPG and OpenSSL. But for every project that uses cryptography it is essential to determine who will receive its final results. That involves hard thinking about the formats in which encrypted data is presented to ordinary users. No amount of scrutiny and invention is wasted on this issue.
So let me finally tell you how you can get and use Cryptlib.
The direct way is to visit Peter Gutmann's Cryptlib download page. There you'll find the original zip archive of all the files that Peter releases from time to time. Once you have unzipped the archive, you are only a "make shared" away from building the shared library on your own OS.
Another path is to use my Cryptlib Hub to download a recent version of the library, together with bindings for Python, Perl and Java, in a single RPM or DEB package that can be installed on the many operating systems that handle these package formats (Fedora, CentOS, Ubuntu, etc.).
These packages will eventually contain all cryptlib-tools that I develop, including clAES of course.
If you happen to use the Fedora OS, everything mentioned above is available from the main repository.
As a last remark: if you have experience with Cryptlib, now or in the future, I'd really like to hear from you.

The Crypto Bone - privacy and secure communication under your control