ARPA2 Common Libraries  2.6.2
Data Structures
Collaboration diagram for Data Structures and Constants for Rules and RuleDB:

Data Structures

union  rules_trunk
 Trunk identities are stored as 4-byte values in network order. More...
 
struct  rules_db
 The RuleDB structure keeps track of LMDB control. More...
 
typedef size_t rules_dbkey_lmdbkey
 Integer value for the LMDB lookup key (beginning of rules_dbkey) More...
 
typedef uint8_t rules_dbkey[A2MD_OUTPUT_SIZE] __attribute__((__aligned__(8)))
 
#define RULES_LMDBKEY_SIZE   (sizeof (rules_dbkey_lmdbkey))
 The number of bytes in the LMDB key, which is only part of a hash; the remainder is RULES_RESTKEY_SIZE and the start is given by digest2lmdbkey()
 
#define RULES_RESTKEY_SIZE   (A2MD_OUTPUT_SIZE - RULES_LMDBKEY_SIZE)
 The number of bytes stored in the LMDB value, matching the remainder of the has that comes after the RULES_LMDBKEY_SIZE and the start is given by digest2restkey()
 
#define digest2lmdbkey(digest)   ((rules_dbkey_lmdbkey *) ((uint8_t *) digest))
 Extract a pointer to the LMDB key from a digest pointer; the size is RULES_LMDBKEY_SIZE.
 
#define digest2restkey(digest)   (((uint8_t *) digest) + RULES_LMDBKEY_SIZE)
 Extract a pointer to the RESTkey from a digest pointer; the size is RULES_RESTKEY_SIZE.
 
#define RULES_ENVIRONMENT_PATH   "/var/lib/arpa2/rules/"
 The directory that LMDB considers its (storage) environment which may still be overridden by envvar $ARPA2_RULES_DIR. More...
 
#define RULES_DATABASE_NAME   "RuleDB"
 The name of the database.
 
#define RULES_DATABASE_SIZE   1048576000L
 The size of the database in bytes.
 
#define RULES_DATABASE_COUNT_MAX   10
 Maximum number of databases to store in the environment.
 
#define RULES_TRUNK_ANY   0
 Wildcard value for trunk identities.
 
#define RULES_TRUNK_MIN   1
 Trunk identities count up from 1.
 
#define RULES_TRUNK_MAX   4294967295
 Trunk identities are stored in 32 bits.
 
#define RULES_TRUNK_SIZE   4
 Trunk identities are stored in 32 bits.
 

Detailed Description

Database entries are keyed by a secure hash, which helps to distribute keys evenly over any prefix of bits, but it is not efficient to store complete hashes as keys in the B-tree. Partial keys allow more entries per page, meaning less pages to load. But more importantly, it leads to less levels being loaded.

Say we use SHA-256, so keys are 32 bytes long. Compare that to a 64-bit key which is native to LMDB. Assume pages of 4kB. As alternatives, we may consider the MDB_INTEGERKEY offering of size_t and unsigned int types, 8 and 4 bytes on amd64.

Keys of 32 bytes need 32 bytes of storage (not counting length) and pointers are 8 bytes on a 64-bit machine. In 4 kB we can hold trunc ((4096-8) / (32+8)) = 102 keys and 103 downlinks. The number of levels is the 103-base log of the key count.

Keys of fixed 8 bytes need precisely 8 bytes of storage and pointers need the same amount. In 4 kB we can hold 255 keys and 256 downlinks. The number of levels is the 256-base log of the key count.

The second case requires 256/103 = 2.4852 times less pages in depth, and each page stores more keys, making the cache of index pages is 255/102 = 2.500 times more effictive.

Keys of fixed 4 bytes (common for unsigned int) need 4 bytes of storage, and 4 kB can hold 340 keys and 341 downlinks, so the number of levels is a 341-base log. That is a factor 1.332 more efficient than 8-byte keys and the cache is a factor 1.33 more efficient. We add an option to allow this as an explicit choice, namely RULES_TIGHT_LMDBKEY. This configuration flag impacts the storage format.

The hash value does add protection against pre-image attacks, so it must not be squandered. We shall add the remaining bytes as part of the value, namely in the beginning of each record. We therefore devise a derivation from hash value to (1) an LMDB key and (2) a rest key to be stored in the value field. And we define macros to extract those parts from a digest key.

The risk of running into one clash with 4-byte keys is 50% at sqrt(2^32) == 65536 entries, so it is likely, but still just once and generally rare on individual matches; but we need to handle clashes. We had to anyway.

The database should be opened as DUP_SORT, and accessed with a cursor until we find a match with the rest key. In DUP_SORT databases, considers key/value pairs are considered the unique entries, not merely the keys so we write and delete key/value pairs, which involves RESTkey information as it incorporated into the value.

Future versions of the database may add encryption; in that variant, it should be helpful if not all entropy of the lookup key is revealed as the database lookup key.

Macro Definition Documentation

◆ RULES_ENVIRONMENT_PATH

#define RULES_ENVIRONMENT_PATH   "/var/lib/arpa2/rules/"

The directory that LMDB considers its (storage) environment which may still be overridden by envvar $ARPA2_RULES_DIR.

Database configuration: environment path, name, size, maximum count.

Typedef Documentation

◆ rules_dbkey_lmdbkey

Integer value for the LMDB lookup key (beginning of rules_dbkey)

Data structures, kept minimally sized under RULES_TIGHT_LMDBKEY.