Some hopefully constructive criticism of the Crypt::Lite module.
0. The module is pretty poorly commented for a CPAN module meant to be used by many others. Apart from boilerplate at the start, section headers, and two commented-out lines, there are only 8 comments in 213 lines. Most of these 8 comments are very short and uninformative. For example:
my $char = substr(pack('H8', substr($hex_string, $i, 2)), 0, 1); # 1 char
OK, to anyone who understands Perl at all, it is obvious that this will be 1 char. (At least in the C sense of a char, meaning a byte; but on a multi-byte character encoding, it might be only part of a character.) This comment is pointless. On the other hand, the definition of $scramble_left is preceded by the comment:
# Make sure to encrypt similar or equal text to different strings
Umm, OK, that makes it look like $scramble_left is salt, to randomise the encryption. Except that it isn't. The only bytes that $scramble_left changes are the first 4. The rest of the encryption is the same every time. So some of the most helpful looking comments are, in fact, wrong.
This wouldn't be so bad but some of the variable names are also not exactly literate coding. For example, $text_scrambled turns out not to be "scrambled" at all; it's just the plaintext with left and right padding.
The documentation (POD and README) claims that further discussion can be found at L<www.infocopter.com/perl/modules/crypt...; but by early 2012 at the latest this resource 404'ed, and still does. A second link at L<infocopter.blogspot.com/2006/09/crypt...; is found in the README but not in the POD, and hence does not appear on CPAN. This was the author's blog (not updated since 2010), and still includes a very brief debate on the security of Crypt::Lite.
1a. The strangest thing is that it requires MD5. If MD5 is available, there are straightforward and fairly well known ways to turn it into a cipher that is much stronger than this one (e.g. MD5-CTR, or a Luby-Rackoff construction). It won't be quite as fast as XORing, but if the available MD5 is an XS version, it will be extremely fast, and for short messages the difference will be negligible. However, because of the way XORing has been implemented here, it might actually be faster!
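For illustration, here is a minimal sketch of MD5-CTR as a stream cipher -- the function name and nonce handling are my own inventions, not anything Crypt::Lite provides:

use Digest::MD5 qw(md5);

# Sketch: MD5 in counter (CTR) mode. Encryption and decryption are the
# same operation; the nonce must be unique for every message.
sub md5_ctr {
    my ($key, $nonce, $text) = @_;
    my ($stream, $ctr) = ('', 0);
    $stream .= md5($key . $nonce . pack('N', $ctr++))
        while length($stream) < length($text);
    return $text ^ substr($stream, 0, length $text);
}

Calling md5_ctr($key, $nonce, $ciphertext) with the same key and nonce decrypts again, exactly as with any XOR-based stream cipher.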
1b. As a matter of style, I somewhat dislike the fact that the module internally implements two methods of output encoding, one of which has a 'require MIME::Base64'. Sure, it means the end-user doesn't need to remember how to call pack(), unpack() and decode_base64(), but there are disadvantages. If the end-user wants some other encoding, he first has to undo the one he has been given; and if MIME::Base64 is not installed, the module will fail even if you don't want to use that encoding. (So much for doing away with dependencies...) I might be happy with this approach in a private module, to be deployed in a specific application or framework; but a CPAN module should be more general.
2. There doesn't seem to be any good reason for making $priv a package global.
3. For some reason, the &Error and &Debug functions wrap their messages in HTML tags and then send them to STDOUT. While this may be what some end-users want, it is non-standard and inflexible. Most likely the end-user would like to configure both format and destination; if a default must be chosen, it would usually be syslog format, sent to STDERR (to be appended to the server logs). In this particular case, sending to STDOUT is also a little dangerous, because of the risk that the system might be tricked into dumping the secret key to the screen. It may also be noted that although &Error is defined, it is never used.
5a. The code in the constructor is reasonably correct, but kind of, umm, how do I put this, erm, screwy. I don't want to be too rude, but this is not good code. For example, the global $debug is *twice* set equal to the 'debug' parameter. This value is also assigned to an object property called 'debug'. However, the only times the object property is used are two points where we check whether &Debug should be called. This check is redundant, because &Debug does nothing unless the global $debug is set, and the two will always be equal! In effect the code has two different ways to control debugging, an object property and a global flag; but they are always the same!
Apart from being both wasteful and confusing, this has the potential to introduce bizarre bugs if someone ever updates this code and doesn't realise that these two signals are both required.
5b. A minor bug: a comment in the constructor suggests the parameter 'encoding' can only take the values 'hex8' and 'base64'. This is never checked, and in fact absolutely any value other than 'hex8' will cause Base64 encoding to be selected.
Now for the more serious errors ...
6a. The $scramble_left parameter is apparently intended to produce 4 random decimal digits to use as a sort of salt. It does this in part by calling 1048576 * rand(). Note that 1048576 == 2^20. On most systems, rand() has only 16 bits of randomness, so by multiplying by 2^20, the lowest 4 bits of 1048576 * rand() will always be 0000. That isn't fatal here, because he only looks at the left-hand end of the stringified version. Looking at only the left-hand end causes a different problem ...
6b. The expression "1048576 * rand()" returns numbers that are pretty nearly uniformly distributed from 0 to 1,048,560. (Not 1,048,575, because the last 4 bits must always be zeroes, as mentioned above.) However, the B<leading digits> of such numbers are B<not even approximately uniform>! Out of the 10,000 possible sets of these digits, there are nearly a thousand that do not occur at all, and a wide range of distributions for the rest: from 63 that occur only once in 65,535 trials (7 times too rare), to 3 (1008, 1024 and 1040) that occur 72 times (10 times too often). The bias is even worse with the initial digit. 2 - 9 each occur about 10,500 times in 65,535 trials, but 1 occurs 15,141 times and 0 only 84 times!! For the second digit, this is reversed: 0 comes in 14,103 times while everything else hovers around 9,500. The biases in the 3rd and 4th digits are smaller and more complicated, but still on the order of 12 - 17%.
We could explain in detail why this expression doesn't work as expected, but it is easier to replace it with one that does:
my $scramble_left = sprintf '%04d', rand 10_000;
6c. All this, however, is nearly moot. Whether or not the author intended to use the randomness of $scramble_left, he actually doesn't. Its only contribution to the cipher is to offset the text by 4 bytes so that a double encryption doesn't decrypt -- what he calls "the shifting concept." The actual values of these bytes have no effect here, and the shift would have worked just as well with a single byte. The only effect of randomising these bytes is that they are not known plaintext that would expose the first 4 bytes of the key. However, by limiting them to digits, they still reveal too much about the first 4 bytes of the key, even if those digits were well distributed. What he really should be doing here is adding a single random byte, uniformly distributed over all values from 0 to 255. Like this:
my $scramble_left = chr(int rand 256);
Note 1. The easiest effect to understand is why an initial zero is so rare. substr(1048576 * rand(), 0, 4) first stringifies the number in its first argument, and a normal stringification of an integer will start with a 0 only if it is exactly equal to zero. sprintf '%04d' will then pad this with zeroes on the left if it is less than 1000. However, that will not occur 1,000 times out of every 65,536; because the least significant 4 bits are always zero, it will only occur about 1,000 / 16 = 62.5 times.
7. The padding of $text with the "salt" and the key checksum, and their later removal, depend on splitting on a tab character. Thus if the text itself contains a tab, decryption will fail. $text will end up containing only the characters up to the first embedded tab, while $priv_wrapped will contain all the other characters, and thus will fail to match $priv_md5.
However, there is a much worse problem: it directly gives us two bytes of known plaintext. The fifth byte, and the 33rd-from-last byte, will always be tabs. By xoring 0x09 back into the ciphertext we get at least one and probably two key bytes completely for free.
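Concretely: assuming $ct holds the raw, un-encoded ciphertext, one key byte falls out immediately (the variable names here are mine):

my $key_byte = substr($ct, 4, 1) ^ "\t";   # the 5th plaintext byte is always a tab

and likewise for the 33rd-from-last byte.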
This issue is unnecessary, because both the added fields have a fixed width (4 bytes for the "salt", and 32 bytes for the hex-encoded hash) and can thus be removed without any delimiter; like so:
my $rand1 = substr($encryp_pack, 0, 4);
my $priv_wrapped = substr($encryp_pack, -32, 32);
my $result = substr($encryp_pack, 4, length($encryp_pack) - 36);
There would also then be no need to hex-encode the hash: since it is only dealt with by the internal code, and is treated as a fixed-width string, you could use the binary hash directly. This has two advantages:
a) it reduces the expansion of the string by 16 bytes; and
b) more importantly, the binary hash leaks no information about the key.
This last point is worth thinking about. Just as restricting the "salt" to decimal digits limits possible values for the first 4 bytes of the key, similarly restricting the appended hash to hex digits limits possible values for at least 32 bytes of the key. Each key byte will have at most 16 possible values allowed by this. However if we assume that the key is a phrase -- i.e. printable text -- that knocks out 25 ~ 75% and brings us down to 4 to 12 values for each position before we even seriously start attacking the cipher. If the phrase is less than 32 bytes long then some characters will be repeated and get more than one set of restrictions; in many cases this will be sufficient to fully identify that character, and then make educated guesses about its neighbours. At that point we have half broken the cipher with just pencil and paper, and we haven't even looked at the message text!!
4a. The &atob($str) function does exactly what unpack('B*', $str) does, but considerably slower. On my system it takes about 3 ~ 4 times longer for very short strings, 20 times longer for typical cookie lengths, and hundreds of times slower for strings of a couple of kilobytes. The same thing occurs with &iso2hex (same as unpack('H*', $str), but much slower) and &hex2iso (same as pack('H*', $str), but much slower.)
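For reference, the built-in equivalents, using nothing but standard pack/unpack templates:

my $bits = unpack 'B*', $str;    # same result as &atob($str)
my $hex  = unpack 'H*', $str;    # same result as &iso2hex($str)
my $orig = pack 'H*', $hex;      # same result as &hex2iso($hex)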
4b. It's also not clear why these functions are called iso2hex and hex2iso when they are encoding-agnostic. That is, there is no need for the input to iso2hex to be in an ISO encoding; it could be in EBCDIC, or even a JPEG image; and provided nothing goes awry elsewhere, hex2iso will reproduce it exactly on the way back out. It's also odd that the hex encoding is called hex8: each byte of hex output carries only 4 bits of information, not 8.
4c. The whole &atob concept is to turn a string into an ASCII binary representation of itself for ease of binary manipulation. That is, the internal representation is 8 times longer, with each 1 bit represented by an ASCII byte with value "1", and similarly for zeroes. Let's call this pseudo-binary. While possibly simpler to code, this approach is generally unnecessary, and it is *seriously* slow and memory-hungry compared to real binary manipulation.
In particular, &bin_add( atob($str1), atob($str2) ) is almost the same as $str1 ^ $str2. Differences include:
i) ^ gives its output in real binary, &bin_add uses pseudo-binary and outputs in pseudo-binary. However in actual usage this is then immediately packed back into real binary.
ii) &bin_add does take care of the case when $str1 is longer than $str2, as will often be the case. It does this by cyclically repeating $str2. ^ will normally pad the shorter string with zeroes, and requires a little extra logic around it to repeat cyclically. Something like:
my $repeats = int(length($str1) / length($str2)) + 1;
my $ct = $str1 ^ substr($str2 x $repeats, 0, length $str1);
iii) However, even with that bit of extra logic, ^ is *much* faster and uses about 1/8 to 1/4 as much memory.
9. Some of the claims in the documentation are incorrect or misleading. Some of these errors are minor but a couple are quite serious. Examples:
a) "It's very easy to install and use, anwhere where Perl runs."
Provided MD5 and MIME::Base64 are also installed. They usually are, but it is strange to say that this module can be used in places where you cannot meet cryptographic module dependencies, when ... it also has a cryptographic module dependency. (Furthermore, if MD5 is installed, you can use it to make a cipher that is stronger and faster than this one.)
b) "Especially block ciphers often return a partial plain text even if, let's say about 90 % of the passphrase was correct."
I really don't know what to make of this. Interpreted exactly as written, this statement is completely false. Absolutely any reasonably modern cipher -- including *every* function I have ever heard described as a block cipher -- will return complete garbage if even one bit of the key is wrong. (This is known as the "key avalanche property" and is considered essential for a modern cipher to pass even initial examination.)
I can only assume that the author has been stung by passing a long passphrase directly to a block cipher with a fixed key length, and having the cipher truncate the extra bytes of key. The response to this is "that's not what you're supposed to do." Raw keys are binary strings of a (usually) prescribed length. If you want to use an arbitrary passphrase for the key, you first hash it to the correct length for a raw key. This gets done for you automagically if you use something like Crypt::CBC or Crypt::Simple, but not if you directly call the underlying raw modules.
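A sketch of the usual approach, assuming Digest::MD5 (the variable names are mine; any decent digest of the right length will do):

use Digest::MD5 qw(md5);
my $raw_key = md5($passphrase);   # always exactly 16 bytes, however long the phrase is

Modules like Crypt::CBC do essentially this step internally before handing the key to the block cipher.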
c) "The goal of Crypt::Lite was to have a pretty simple way to encrypt and decrypt data without the need to install and compile huge packages with lots of dependencies."
Unfortunately it meets those goals significantly less well than alternatives that already existed. For example, Crypt::RC4 has B<no> dependencies at all (unlike this module, which requires two others), and can be run either as a blisteringly fast XS version of ~20 kB; or, for those without compilers, as a pure-Perl version of 5 kB. (Just over half the size of Crypt::Lite.) While the pure Perl version is much slower than the XS version, it is still much faster than Crypt::Lite. And also much, much more secure!!
d) "It generates every time a different encrypted hash when you re-encrypt the same data with the same secret string."
Hmm. There is a hash, and it is encrypted, but it is I<never> different for the same 'secret string' -- a fact which is important, and explicitly pointed out. I think the author here actually means ciphertext, rather than encrypted hash. There seem to be 2 reasonable interpretations of what the author means by this. Firstly, he may mean that if you do:
$ct1 = $crypt->encrypt($pt, $key);
$ct2 = $crypt->encrypt($pt, $key);
then $ct1 and $ct2 will (usually) be different. This is sort-of true, but misleading: only the first 4 bytes will be different (expanded to 8 or 6 depending on whether it is encoded in hex or base 64.) The fact that the two strings are very nearly the same, apart from the prefix, will be pretty obvious.
Secondly, he may mean that if you do:
$ct1 = $crypt->encrypt($pt, $key);
$ct2 = $crypt->encrypt($ct1, $key);
Then $ct2 is different from both $ct1 and $pt. This is true, and is not usually the case for cyclical XOR. With the usual cyclical XOR, $ct2 would have ended up decrypted, right back at $pt.
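This property of plain cyclical XOR is easy to verify directly, with no module involved:

my $ks   = substr($key x (int(length($pt) / length($key)) + 1), 0, length $pt);
my $ct   = $pt ^ $ks;    # first "encryption"
my $back = $ct ^ $ks;    # the second "encryption" undoes the first: $back eq $pt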
However as we will discuss in the next point, this double encryption is not nearly as strong as it may appear at first glance.
e) "In normal cases of XOR encryption, what Crypt::Lite is based on, double or tripple encryption does NOT increase the security."
Just to be clear here: with the usual cyclical XOR encryption, double encryption I<with the same key> is the same as decrypting and ending up back at plaintext; triple encryption is then the same as one encryption. However, double encryption with a different key, I<of different length>, actually is quite a bit stronger than single encryption. Not enormously so, not enough to enter the realm of strong modern ciphers, but it will defeat the standard attacks.
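For example, XOR-ing with a 5-byte key $a and then with a 7-byte key $b is equivalent to XOR-ing once with a combined key stream of period lcm(5, 7) = 35:

my $combined = ($a x 7) ^ ($b x 5);   # 35 bytes; the pattern repeats only every 35 positions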
f) "Because of the nature of Crypt::Lite I state (because of the shifting concept) double encryption *does* increase the challenge to decrypt it."
By "the shifting concept" he means that the text is padded on the left by 4 random digits and a tab each time it is encrypted, and hence the position where the XOR keys acts on each byte is shifted by 5 each time.
It does eliminate the best known trivial attack, but it doesn't make the cipher stronger; under some conditions it cripples it. First, it should be noted that if you directly execute a second encryption on the output of the first, analysis is made considerably more complicated by the intermediate encoding. Indeed in this simple mode, double encryption is (usually) appreciably stronger than single encryption, but the reason is not the "shifting concept" introduced by the author, it is the non-linear, non-uniform staggering caused by the encoding. However, usage in this mode introduces two new problems: message expansion, and plaintext restriction. The message expansion is considerable; even a twitter message (<140 bytes) can be expanded to over 3/4 of a kilobyte. And as mentioned above under point 7, the plaintext restriction can massively accelerate a keysearch, rapidly narrowing each letter of the phrase down to only a couple of possibilities.
So, someone doing double encryption might strip off the intermediate encoding and do the second encryption on the intermediate binary. This case is much more interesting: far from increasing the difficulty of the attack, for some combinations of parameters it is absolutely fatal. To understand this, consider the following diagram. Let d0, d1 and so on be padding digits on the left; p0, p1 and so on be plaintext; h0, h1 and so on be hash hex-digits; and k0, k1 and so on be key bytes. For the sake of concreteness, assume the key is 11 bytes long and the text is 8 bytes. Then after a single encryption we have:
d0 d1 d2 d3 \t p0 p1 p2 p3 p4 p5 p6 p7 \t h0 h1 ... h31
k0 k1 k2 k3 k4 k5 k6 k7 k8 k9 k10 k0 k1 k2 k3 k4 ... k1
where the first row is xor-ed with the second. Now doubly encrypting with the same 11-byte key (the middle row is the key stream left over from the first encryption, which starts 5 positions in; the bottom row is the key stream of the second encryption):
d4 d5 d6 d7 \t d0 d1 d2 d3 \t p0 p1 p2 p3 p4 p5 p6 p7 \t h0 h1 ... h31 \t h0 h1 ... h31
               k0 k1 k2 k3 k4 k5 k6 k7 k8 k9 k10 k0 k1 k2 k3 ... k1
k0 k1 k2 k3 k4 k5 k6 k7 k8 k9 k10 k0 k1 k2 k3 k4 k5 k6 k7 k8 ... k6
At this point, something horrible has happened. The attacker now knows the complete structure of the last 66 bytes: it's the tab and hash hex-digits repeated, with various different key bytes added in. He can easily manipulate them by xoring the last 33 cipher bytes into the second-last 33 cipher bytes, so the two hashes cancel out, like this:
\t h0 h1 ... h31
k2 k3 k4 ... k1
k7 k8 k9 ... k6
\t h0 h1 ... h31
k7 k8 k9 ... k6
k2 k3 k4 ... k1
Bwa-bwarrrr ... you lose. The entire key has been completely exposed in one trivial operation.
Now, I will admit that this example was slightly contrived. This particular totally catastrophic failure occurs only if the key length is equal to 1, 2, 3, 5, 11, 19, 33 or 38 bytes. A different but equally fatal trivial attack occurs if the key length is a multiple of 5: 5, 10, 15, etc. For other key lengths, the cipher is still severely weakened and laid open to other attacks, always fairly easy although not always quite this facile.
g) "What I really suggest is to use good passphrases not shorter than 6 characters, or better 16 characters length to encrypt."
6 characters is hardly a "passphrase", and even if totally randomised it is far too weak for any but the most trivial applications. Even a 6 character key composed randomly of upper case, lower case and digits can be broken in a few hours on a modern PC B<even used with a strong modern cipher>. With a weak cipher like this it is totally inadequate.
In fact you should use a randomised I<phrase> of at least 5 or 6 I<words>.
h) "A randomly generated passphrase that is used only once of the same length as the plain text will be the most secure encryption with Crypt::Lite."
Such a key makes it (almost) a one-time pad, but with all the hard work (generating, storing and managing the massive key) left for the end user instead of the module author! However, even then the advice is actually wrong. The pad has to be as long as the I<padded> message, not as long as the plaintext.
i) "In general, decryption works also on hashes that have been encrypted on a foreign host (try this with an unpatched IDEA installation ;-)."
A bad example. It is not surprising that IDEA hasn't been maintained much. In the late 1990s it was realised that it was patent encumbered, and it has generally been removed from Free Software distributions.
j) "Since last time has grown a harshly thread about XOR encryption I suggest to take a look from time to time on this URL to get the latest news and documentation on (URL)"
Unfortunately the URL now 404s, so the documentation gives little idea of just how extremely weak this cipher is.
This module really does nothing else but XOR-ing the plaintext with the key. The key is repeated if the message is longer than the key, which means the encryption can be broken if the message is long enough. The module even helpfully adds a (partially known) string to the message. Creating a good encryption method is a task for specialists; judging from his reply to the previous review, the author probably isn't one, and judging from his module, he certainly isn't. DON'T USE.
As emphasised on CPAN's documentation, Crypt::Lite is *not* designed to act as a competitor for the strong algorithms like Rijndael or Blowfish where Rijndael has been elected as the Advanced Encryption Standard (AES) by NIST (See csrc.nist.gov/CryptoToolkit/aes/aesfa....
That user opinion cited Bruce Schneier: "...won't stop a cryptanalyst for more than a few minutes." Well, that's true (and well-known) for trivial XOR encryption. Although I think the "few minutes" is a very imprecise conclusion, since certain requirements had to be met; I recommend Simon Singh's "Geheime Botschaften", ISBN 3-423-33071-6, as good reading on that matter (also available in English).
[ The documentation suggests using "double or tripple-encryption
with any data to increase the security." However, multiply
encrypting with XOR cannot possibly increase security -- it's the
same as XORing once with the XOR of the two keys used. ]
Wrong for Crypt::Lite.
I'd assume it is pretty challenging to decrypt, even for a crypto analyst and it would take weeks to make the first guesses. In the case the crypto analyst knows it's a German or English sentence, and not "any string".
Again, Crypt::Lite has many other useful purposes than to be a competitor for AES algorithms but in my humble opinion, it should be safe enough, even for sending encrypted passwords over the net.
[ Amazingly, the secret key is included as part of every encrypted
message. That can't be a good idea. ]
The usage of the secret string has a specific intention. This part of the procedure is being improved as of release 0.82.08.
[ Due to an apparent implementation bug, Crypt::Lite throws away
7/8ths of the secret key. ]
I don't understand the issue.
I never noticed such a problem.
DO NOT USE. This is a poor implementation of Simple XOR encryption. To quote Bruce Schneier from Applied Cryptography, "An XOR might keep your kid sister from reading your files, but it won't stop a cryptanalyst for more than a few minutes."
The documentation suggests using "double or tripple-encryption with any data to increase the security." However, multiply encrypting with XOR cannot possibly increase security -- it's the same as XORing once with the XOR of the two keys used.
Due to an apparent implementation bug, Crypt::Lite throws away 7/8ths of the secret key.
Amazingly, the secret key is included as part of every encrypted message. That can't be a good idea.
The module also can't tolerate tabs in the plaintext or secret key strings.