Data-Rand reviews

RSS | Module Info

Data-Rand (v0.0.2) *

Definitely *not* cryptographically secure. Uses one call to rand() (typically 15 bits of "randomness" but even that's questionable), the process id and the time of day to seed an array, and then shuffles that around to generate chars.

Not efficient, either.

If you want real random data read it from /dev/random, use an openssl binding, use the entropy gathering daemon, or something like that.

Cryptographically secure algorithms should be designed by cryptographers, and much more importantly rigorously reviewed by cryptographers before making such claims.

Updated: The actual amount of entropy does not rise more than what the seed can provide, and so this is not cryptographically secure. For pseudorandom numbers much more efficient and reviewed algorithms exist, for instance the Yarrow random number generator. The shuffling you do could only plausibly reduce the actual randomness of the data, so it is doing more harm than good.

You implement a PRNG that uses an array as it's internal structure, which is seeded with various inputs i stated earlier. srand has nothing to do with that, it's seeding a different PRNG with different values (perl's own one). This is the the upper bound on the actual random data being inputted. You could arguably package the small amount of entropy gathering into something standalone useful for seeding e.g. OpenSSL's entropy pool based RNG.

Implementing a pseudo random number generator for security purposes is isomorphic with designing a cipher. The PRNG you implemented is dubious at best. Furthermore not enough truly random data is being seeded, so regardless of the fact that it is potentially volunerable to various analysis it can't actually generate more than roughly 2^16 random sequences. Since rand is a prng with no cryptographical merit, calling it repeatedly in the shuffle function doesn't actually add any additional entropy. Testing that this is random with 10k unit tests is meaningless. If you tested that the collision rate is in effect lower than 2 / 2^(128-1) or so then that is pretty much the lower bound on what is considered acceptible today. Note that actually verifying this in practice is not practical even with a government budget, so that's why people who actually know cryptography design, analyze and review ciphers/prngs.

See the recent debian ssh key fiasco for why people who don't understand what they are doing shouldn't touch crypto stuff, let alone make claims about it. The implementation of the debian openssl code was correct, but there was a bug in just the estimation of the amount of actual random data. It led to there only being about 30,000 actual keys produced on debian systems in practice.

To elaborate on the efficiency point: Perl arrays involve many layers of pointer indirection. Shuffling them around is an expensive process. Suppose for a second your algorithm was indeed a secure one, it would be much more efficiently implemented using bit vectors, and not array access. On my machine `openssl rand` is about 100x faster and actually known to be at least as secure as Data::Rand claims to be.

In summary, my review's purpose was to warn others, not to get you to improve the code, because I believe it's fundamentally flawed.

Data-Rand (v0.0.2)

Let me start by saying thanks for your feedback. While I do appreciate it it'd be more useful to offer ideas for improvement instead of mostly just rants. I hope the ideas presented below will help improve it and get you to rate it higher in version 0.0.3

For future reference what I try to do when I comment on a module that I feel could be improved is address each item in a respectful way and offer a patch if that makes sense or at least a specific suggestion. For example:

"The code is crap, its not strict or warnings safe and is hard to read."

while that may be 100% true this is a more prodcutive way:

"The code format is difficult to follow since there is no whitespace an dthe variable names are all ambiguous and is not strict or warnings safe. I'd recommend adding use strict; and use warnings; and apply some perl best practice principles to it. Attached is a version I ran through perltidy with these settings: .... I also added strict and warnings but that breaks test 'xyz' and 'def' so that issue will still need addressed.

Can you see how much more productive that way is?

> Definitely *not* cryptographically secure.

Care to share what specifically that statement is based on and how it might be corrected?

My statement is based on info in perldoc, specifically this (and the part quoted further down):

perdoc -f srand:

Most programs won't even call srand() at all, except those that

need a cryptographically-strong starting point rather than the

generally acceptable default, which is based on time of day,

process ID, and memory allocation, or the /dev/urandom device,

if available.

I'll qualify that by changing 'cryptographically strong' to 'cryptographically-strong starting point' to be more clear. and otherwise make that point clearer where possible. Thanks.

I did just now notice that I forgot to add the tests for the 3 functions to the MANIFEST so they are not included. They include 37 tests to how it functions with different args as per the pod and 10,000 tests for actual randomness reliability. Perhaps that will help address your concerns as well.

> Uses one call to rand() (typically 15 bits of "randomness" but even that's questionable), the process id and the time of day to seed an array,

no, its to seed rand() ( IE srand() ) and has nothing to do with any arrays...

Again from perdoc -f srand :

Note that you need something much more random than the default

seed for cryptographic purposes. Checksumming the compressed

output of one or more rapidly changing operating system status

programs is the usual method. For example:

srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip`);

If you're particularly concerned with this, see the "Math::Tru-

lyRandom" module in CPAN.

Which is why the POD says:

A few examples in line with srand perldoc:

sub { return unpack '%L*', `ps axww | gzip` }


Additionally, this is what 'SEEDING RANDOM GENERATOR' describes and the the 'srand_seeder_coderef' and 'use_time_hires' keys are for.

Again, more indication of the need to RTFM and do it as per your needs will be included so as to avoid misconcetpion and ambiguity.

> and then shuffles that around to generate chars.

It does not shuffle the srand args it shuffles the list of items to use tobuild the random data out of. This adds a number of rand() calls (see shuffle() source) and adds to its effectiveness because not only do they have to figure out that the randomdate was indexes 9 2 and 5 the data at said indexes will be different as well.

> Not efficient, either.

What benchmark results are you basing that comment on? Please elaborate so I can benchmark improvements. If its a guess or opinion leave them out of comments or at least qualify the statements.

> If you want real random data read it from /dev/random, use an openssl binding, use the entropy gathering daemon, or something like that.

Exactly the point of the statement 'If a random int between 0 and 999,999,999,999,999 is not what you want for that part of the calulation, feel free to change it via the hashref argument described above.' If you want to privced coderef example of a 'real random data' producing srand_seeder_coderef I'd be happy to review and add to the docs as examples.

> Cryptographically secure algorithms

This is not an algorithm, neither does it claim to be such.

> should be designed by cryptographers, and much more importantly rigorously reviewed by cryptographers before making such claims.

Feedback is welcome, I'm happy to improve it with clearer info / better code but so far your feedback is very unhelpful.