from Hacker News

Ask HN: How to generate random test data (valid and noise too)?

by azatom on 2/17/23, 10:35 AM with 5 comments

I need random data in blocks of 700 bytes, filled with valid, common binary/plaintext data structures holding common data, sometimes with common errors that make them invalid. It should range from simple bit patterns (e.g. 0x55aa) to plain incompressible noise.

(So... just everything. I hope that is not too much to ask :)
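To make the requirement concrete, here is a rough sketch of what such a generator could look like using only the Python standard library. Everything in it is an assumption for illustration: the function names, the four block kinds, the sample record layouts, and the 30% error rate are hypothetical, and the hand-picked list of structures is exactly the limitation raised in the last bullet of the list below.

```python
import json
import random
import struct

BLOCK_SIZE = 700  # block size taken from the question


def pattern_block(rng):
    # Repeat a simple bit pattern such as 0x55aa until the block is full.
    pattern = rng.choice([b"\x55\xaa", b"\x00\xff", b"\x00", b"\xde\xad\xbe\xef"])
    return (pattern * (BLOCK_SIZE // len(pattern) + 1))[:BLOCK_SIZE]


def text_block(rng):
    # Plaintext structure: a JSON list of small records.
    # Cutting it at 700 bytes usually leaves it truncated, which is
    # itself one kind of "common error".
    records, data = [], b""
    while len(data) < BLOCK_SIZE:
        records.append({"id": rng.randrange(10**6),
                        "name": rng.choice(["alice", "bob", "carol"]),
                        "ok": rng.random() < 0.5})
        data = json.dumps(records).encode("ascii")
    return data[:BLOCK_SIZE]


def binary_block(rng):
    # Binary structure: packed little-endian records (uint32, uint16, double).
    fmt = "<IHd"
    count = BLOCK_SIZE // struct.calcsize(fmt) + 1
    data = b"".join(
        struct.pack(fmt, rng.randrange(2**32), rng.randrange(2**16), rng.random())
        for _ in range(count))
    return data[:BLOCK_SIZE]


def noise_block(rng):
    # Uniformly random bytes, effectively incompressible.
    return bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))


def corrupt(block, rng, flips=3):
    # Flip a few randomly chosen bits to simulate bit errors.
    data = bytearray(block)
    for _ in range(flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
    return bytes(data)


def random_block(rng, error_rate=0.3):
    # Pick a block kind at random and sometimes inject errors into it.
    maker = rng.choice([pattern_block, text_block, binary_block, noise_block])
    block = maker(rng)
    if rng.random() < error_rate:
        block = corrupt(block, rng)
    return block
```

Calling `random_block(random.Random(42))` returns one 700-byte block; seeding the generator keeps a failing input reproducible.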

What won't work:

- /dev/urandom: it is just not random... it is always noise :)

- A corpus like a Wikipedia dump: even with some random bit errors added, it is just too specific, and impractically big besides.

- A Markov chain of /dev/sda from a freshly installed system: still not enough real-world data/errors.

- A Markov chain of my own /dev/sda: it contains enough real data and errors :) but there is sensitive data in it.

- Painstakingly collecting common patterns myself: the whole point is to find something I don't expect.
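For reference, the Markov chain idea from the list above can be sketched as a first-order, byte-level chain. This is a minimal illustration under assumptions: `train` and `generate` are hypothetical names, and the training sample is whatever bytes you feed it (a disk image, a file, and so on). An order-1 chain only remembers which byte followed which, but short sensitive fragments from the sample can still surface in the output.

```python
import random
from collections import defaultdict


def train(sample: bytes):
    # Map each byte value to the list of bytes that followed it in the sample.
    if len(sample) < 2:
        raise ValueError("need at least two bytes to train on")
    followers = defaultdict(list)
    for cur, nxt in zip(sample, sample[1:]):
        followers[cur].append(nxt)
    return followers


def generate(followers, length, rng):
    # Walk the chain, choosing each next byte in proportion to how often
    # it followed the current byte in the training sample.
    states = list(followers)
    state = rng.choice(states)
    out = bytearray([state])
    while len(out) < length:
        options = followers.get(state)
        if options:
            state = rng.choice(options)
        else:
            # Dead end (byte seen only at the very end of the sample):
            # restart from a random known state.
            state = rng.choice(states)
        out.append(state)
    return bytes(out)
```

For example, `generate(train(open("sample.bin", "rb").read()), 700, random.Random(0))` would produce one 700-byte block, where `sample.bin` is a placeholder for your own training data.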

As always, there has to be an existing solution, I just can't find it.

  • by qsort on 2/17/23, 11:04 AM

    It might be that I don't understand your problem and if so I apologize, but from what I gather this is a classic XY problem.

    Use property-based tests instead, with something like QuickCheck/jqwik/Hypothesis.
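To illustrate what a property-based test looks like, here is a hand-rolled sketch using only the Python standard library. It does not use the API of QuickCheck, jqwik, or Hypothesis; those libraries add smarter generators and shrinking of failing inputs. The frame format (`encode_frame`, `parse_frame`) is a made-up stand-in for whatever code is actually under test.

```python
import random


def encode_frame(payload: bytes) -> bytes:
    # Hypothetical format: payload followed by a one-byte additive checksum.
    return payload + bytes([sum(payload) % 256])


def parse_frame(frame: bytes) -> bytes:
    # Return the payload, or raise ValueError if the frame is invalid.
    if not frame:
        raise ValueError("empty frame")
    payload, check = frame[:-1], frame[-1]
    if sum(payload) % 256 != check:
        raise ValueError("bad checksum")
    return payload


def random_payload(rng, max_len=699):
    # Random length and random content, so frames are at most 700 bytes.
    return bytes(rng.getrandbits(8) for _ in range(rng.randrange(max_len + 1)))


def check_properties(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        payload = random_payload(rng)

        # Property 1: encoding then parsing returns the original payload.
        assert parse_frame(encode_frame(payload)) == payload

        # Property 2: flipping any single bit is always rejected.
        frame = bytearray(encode_frame(payload))
        pos = rng.randrange(len(frame))
        frame[pos] ^= 1 << rng.randrange(8)
        try:
            parse_frame(bytes(frame))
        except ValueError:
            continue
        raise AssertionError(f"corrupted frame accepted: {bytes(frame)!r}")
    return trials
```

The point is that you state what must hold for every input, and the random inputs find the cases you did not think of, instead of you curating the test data up front.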