An implementation of Marsaglia’s Multiply-With-Carry generators, with an extra scrambling operation of folding the upper and lower halves of the multiplication result.
If you just want the generator, here it is in C, porting to other languages should be pretty trivial.
typedef struct {
uint64_t state[3];
uint64_t carry;
} Fmc256;
#define MUL 0xfffff6827807261d
Fmc256 Fmc256_new(uint64_t seed[4]) {
Fmc256 rng;
memcpy(&rng, seed, sizeof(rng));
rng.carry = rng.carry % (MUL - 2) + 1;
return rng;
}
uint64_t Fmc256_next(Fmc256 *rng) {
uint64_t result = rng->state[2] ^ rng->carry;
__uint128_t m = (__uint128_t)rng->state[0] * MUL + rng->carry;
rng->state[0] = rng->state[1];
rng->state[1] = rng->state[2];
rng->state[2] = m;
rng->carry = m >> 64;
return result;
}
Other than the minimal C version above, there is also the Zig version provided here. It has a richer seeding API:
var rng1 = Fmc256.fromBytes(&.{ 42 });
var rng2 = Fmc256.fromBytes("an arbitrarily long string");
// Use the operating system's entropy
var rng3 = Fmc256.fromSeed(seed: {
var seed: [4]u64 = undefined;
try std.posix.getrandom(@ptrCast(&seed));
break :seed seed;
});
And supports efficient jump-ahead:
rng.jump(.steps(n)); // runtime jump (O(log n))
rng.jump(comptime .steps(n)); // compile-time jump (O(1) at runtime)
// Creating parallel streams, a default jump is provided for this purpose
const rng1 = rng;
rng.jump(.default);
const rng2 = rng;
rng.jump(.default);
try std.Thread.spawn(.{}, work1, .{ args1, rng1 });
try std.Thread.spawn(.{}, work2, .{ args2, rng2 });
The rest of this repository contains code for configuring and testing the PRNG. They are not important unless you want to reproduce the result or check for correctness.
It’s hard to evaluate the speed of PRNGs as there are many factors that affect
performance. But here’s a rough estimation of how Fmc256 performs against
some other popular PRNG.
Test environment:
| Generator | Period | Monte Carlo PI estimation | Hamming weight distribution |
|---|---|---|---|
| Fmc256 | ≈2^255 | 4656ms | 3120ms |
| Lehmer64 | 2^126 | 4745ms | 3062ms |
| Xoshiro256++ | 2^256 | 6857ms | 3208ms |
| PcgDXSM | 2^128 | 8741ms | 6513ms |
| Pcg32x2 | 2^63 | 6257ms | 6177ms |
| Splitmix64 | 2^64 | 2267ms | 4155ms |
| Wyrand | 2^64 | 3262ms | 3679ms |
See the source code for more detail about the tests. We can see that Fmc256 is in the same vicinity as the fastest generators in the Hamming weight distribution test. For the Monte Carlo PI estimation test, Fmc256 is only slower than small period generators (period less than 2^128).
This project is licensed under the MIT License.