ARPA2 Common Libraries
2.6.2
|
Data Structures | |
union | rules_trunk |
Trunk identities are stored as 4-byte values in network order. More... | |
struct | rules_db |
The RuleDB structure keeps track of LMDB control. More... | |
typedef size_t | rules_dbkey_lmdbkey |
Integer value for the LMDB lookup key (beginning of rules_dbkey) More... | |
typedef uint8_t rules_dbkey[A2MD_OUTPUT_SIZE] | __attribute__((__aligned__(8))) |
#define | RULES_LMDBKEY_SIZE (sizeof (rules_dbkey_lmdbkey)) |
The number of bytes in the LMDB key, which is only part of a hash; the remainder is RULES_RESTKEY_SIZE and the start is given by digest2lmdbkey() | |
#define | RULES_RESTKEY_SIZE (A2MD_OUTPUT_SIZE - RULES_LMDBKEY_SIZE) |
The number of bytes stored in the LMDB value, matching the remainder of the has that comes after the RULES_LMDBKEY_SIZE and the start is given by digest2restkey() | |
#define | digest2lmdbkey(digest) ((rules_dbkey_lmdbkey *) ((uint8_t *) digest)) |
Extract a pointer to the LMDB key from a digest pointer; the size is RULES_LMDBKEY_SIZE. | |
#define | digest2restkey(digest) (((uint8_t *) digest) + RULES_LMDBKEY_SIZE) |
Extract a pointer to the RESTkey from a digest pointer; the size is RULES_RESTKEY_SIZE. | |
#define | RULES_ENVIRONMENT_PATH "/var/lib/arpa2/rules/" |
The directory that LMDB considers its (storage) environment which may still be overridden by envvar $ARPA2_RULES_DIR. More... | |
#define | RULES_DATABASE_NAME "RuleDB" |
The name of the database. | |
#define | RULES_DATABASE_SIZE 1048576000L |
The size of the database in bytes. | |
#define | RULES_DATABASE_COUNT_MAX 10 |
Maximum number of databases to store in the environment. | |
#define | RULES_TRUNK_ANY 0 |
Wildcard value for trunk identities. | |
#define | RULES_TRUNK_MIN 1 |
Trunk identities count up from 1. | |
#define | RULES_TRUNK_MAX 4294967295 |
Trunk identities are stored in 32 bits. | |
#define | RULES_TRUNK_SIZE 4 |
Trunk identities are stored in 32 bits. | |
Database entries are keyed by a secure hash, which helps to distribute keys evenly over any prefix of bits, but it is not efficient to store complete hashes as keys in the B-tree. Partial keys allow more entries per page, meaning less pages to load. But more importantly, it leads to less levels being loaded.
Say we use SHA-256, so keys are 32 bytes long. Compare that to a 64-bit key which is native to LMDB. Assume pages of 4kB. As alternatives, we may consider the MDB_INTEGERKEY offering of size_t and unsigned int types, 8 and 4 bytes on amd64.
Keys of 32 bytes need 32 bytes of storage (not counting length) and pointers are 8 bytes on a 64-bit machine. In 4 kB we can hold trunc ((4096-8) / (32+8)) = 102 keys and 103 downlinks. The number of levels is the 103-base log of the key count.
Keys of fixed 8 bytes need precisely 8 bytes of storage and pointers need the same amount. In 4 kB we can hold 255 keys and 256 downlinks. The number of levels is the 256-base log of the key count.
The second case requires 256/103 = 2.4852 times less pages in depth, and each page stores more keys, making the cache of index pages is 255/102 = 2.500 times more effictive.
Keys of fixed 4 bytes (common for unsigned int) need 4 bytes of storage, and 4 kB can hold 340 keys and 341 downlinks, so the number of levels is a 341-base log. That is a factor 1.332 more efficient than 8-byte keys and the cache is a factor 1.33 more efficient. We add an option to allow this as an explicit choice, namely RULES_TIGHT_LMDBKEY. This configuration flag impacts the storage format.
The hash value does add protection against pre-image attacks, so it must not be squandered. We shall add the remaining bytes as part of the value, namely in the beginning of each record. We therefore devise a derivation from hash value to (1) an LMDB key and (2) a rest key to be stored in the value field. And we define macros to extract those parts from a digest key.
The risk of running into one clash with 4-byte keys is 50% at sqrt(2^32) == 65536 entries, so it is likely, but still just once and generally rare on individual matches; but we need to handle clashes. We had to anyway.
The database should be opened as DUP_SORT, and accessed with a cursor until we find a match with the rest key. In DUP_SORT databases, considers key/value pairs are considered the unique entries, not merely the keys so we write and delete key/value pairs, which involves RESTkey information as it incorporated into the value.
Future versions of the database may add encryption; in that variant, it should be helpful if not all entropy of the lookup key is revealed as the database lookup key.
#define RULES_ENVIRONMENT_PATH "/var/lib/arpa2/rules/" |
The directory that LMDB considers its (storage) environment which may still be overridden by envvar $ARPA2_RULES_DIR.
Database configuration: environment path, name, size, maximum count.
Integer value for the LMDB lookup key (beginning of rules_dbkey)
Data structures, kept minimally sized under RULES_TIGHT_LMDBKEY.