ARPA2 Common Libraries  2.3.1
Rules for Generic Policy Processing

The API for Access Control is split into layers that guide its processing.

Access Control can be considered from different angles:

Bottom layer: Loading and parsing database rules.

  • Generic handling of Access Domain, Access Type and Access Name.
  • Generic handling of #comment to ignore a (marker) word.
  • Generic handling of flags stored in bits and callback to upper layer.
  • Generic handling of =xval stored in variable x and callback to upper layer.
  • Generic handling of ^trigger via callback to upper layer.
  • Generic handling of ~selector in rulesets via callback to upper layer.
  • Generic handling of rule end via callback to upper layer.
  • Generic handling of ruleset end via callback to upper layer.

Upper layer: Semantics for a specific Access Type. This part is specific to an application, of which Communication Access is a possible example. These are separately documented because of the semantic differentiation.

  • Processes callbacks from the bottom layer.
  • Implements semantics for flags, ^trigger and =xval offerings.
  • Reaches semantics-informed conclusions at the end of rules and rulesets.
  • Is aware of the Access Name grammar and is expected to enforce it.
  • Is aware of the meaning of Access Rights and may translate from/to it.

Management view: Passes over databases and treats them as a bulk data store. (This work may not be completely done.)

  • May select for Trunk Identity numbers.
  • May select for #label markers.
  • May write into the database.

Rulesets have Rules

Policies are defined with a Ruleset; for the application of Access Control these are commonly known as an Access Control List or ACL. Each Ruleset consists of one or more Rules, each being a UTF-8 string with a terminating NUL character. (Note that NUL is not a separator but a terminator.)

Rules consist of words in the low-level grammar:

  • ^trigger to trigger callbacks from the bottom layer to the upper layer.
  • =xval for 26 kept variables like x that are provided with any callback.
  • FLAGS to spell up to 26 flags like F, L, ... as uppercase letters.
  • ~sel to capture ARPA2 Selectors to match (not in the database form).
  • #label to label a Rule so management tools may recognise them as theirs.
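
Purely as an illustration of these word forms, a Rule could be spelled as a C string; the words below are made up and do not describe a real policy:

/* Illustrative Rule only; every word form above appears once */
const char rule [] = "#example ^notify =xvalue LW ~john@example.com";
/* sizeof (rule) includes the terminating NUL that ends every Rule */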

Database versus Localised Rulesets

Localised Rulesets are perfect for manual management; database Rulesets are better suited for automation and scalable deployment.

Localised Rulesets are kept in an application context, and drive applications like Access Control. They are parsed by the bottom layer, and callbacks are fed into the upper layer as usual. The most concrete match is the one that wins, subject to application policies, but the order of Rules in a Ruleset is of no importance; do not depend on Rule order, or your policies may change in surprising ways. Such surprises may be caused by software upgrades at any time, and this implied non-determinism must be taken into account while creating Rulesets.

Database Rulesets are stored in a key-value database. If not overridden at build time, that database is located in /var/lib/arpa2/rules but the environment variable $ARPA2_RULES_DIR can override that. This is a directory which the database considers its working environment. Note that all utilities that employ ARPA2 Rules adhere to this setting, including the Group Logic and Access Control. It is not uncommon for an environment variable to influence program behaviour, but it should be noted as a security precaution that externally provided overrides may be a cause of unintended access.
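
As a minimal sketch of this resolution order (not the library's own code; the helper name is made up):

#include <stdlib.h>

/* Sketch only: the environment override wins, otherwise the build-time default */
static const char *rules_directory (void) {
        const char *rulesdir = getenv ("ARPA2_RULES_DIR");
        return (rulesdir != NULL) ? rulesdir : "/var/lib/arpa2/rules";
}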

The database is indexed with keys composed of an optional Database Secret and the Access Domain (together forming a value known as the Domain Key, which grants domain-specific administration), plus the Access Type (at this point yielding a value known as the Service Key, which can be distributed in hexadecimal form to service applications), and finally an Access Name and an ARPA2 Selector (forming the final Request Key used to query the database).

The Remote Selector goes through an iteration process, where the most concrete match is final. Note that empty Rulesets are removed from the database, so be sure to add at least one Rule when a Ruleset is meant to stop the iteration.

Remote Selectors are part of the database lookup key, so they are not explicitly stored in an Access Control database. They are however part of a Localised Ruleset.

Trunks and Labels for Bulk Management

To allow management of the database, two mechanisms are available:

  • Trunk numbers are 32-bit unsigned integers in the database value of each Ruleset. To the application, trunks are not meaningful and the first one that matches will be used. To bulk queries however, a trunk identifies a remote peer, or another technical reason for grouping Rulesets. One might use it to remove or update all Rulesets from a remote peer, for instance.
  • Comment labels are part of individual Rules and can be used to select them. This may be done to select #manual overrides or #pulley bot-inserted Rules.

Privacy of the Access Database

Iteration over content should be private; without a key, one should be left mostly in the dark about contents. This protects against harvesting of (identity) information.

To this end, keys are hashed and not stored in plain form. This also helps to make the database more efficient.

We plan to encrypt database contents too, based on parts of the key that are not present in the database lookup key. We currently use that to select a Ruleset, and insert it in the beginning of each (of multiple) values for a (partial) lookup key. We may end up using it as encryption "entropy", such as an IV or salt. The quotes emphasise that entropy may be rather thin, especially after already matching the database lookup key.

Meaning of Callbacks

Variables with =xval grammar are stored but do not lead to a callback. Storage takes the form of a char * to a UTF-8 entry and an unsigned length with the number of bytes (not code points). This usually points somewhere in the current Access Rules, so NUL termination does not apply.

Triggers with ^trigger grammar cause a callback with that string and the current set of 26 =xval variables. When the return is false, it indicates that the upcoming Access Rights are not considered an option.

Flags with FLAGS grammar are parsed into a 32-bit flag field, where A is in bit 0, up to Z in bit 25. They cause a callback that the upper layer may consider the final answer. Any ^trigger callbacks are specific for the next FLAGS and a failure-returning callback causes the suppression of the following FLAGS. Triggers are forgotten after this point and processing continues.
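
The letter-to-bit mapping is simple enough to write down directly; a small sketch:

#include <stdint.h>

/* 'A' maps to bit 0, up to 'Z' in bit 25 */
static uint32_t flagbit (char letter) {
        return ((uint32_t) 1) << (letter - 'A');
}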

Remote Selectors with ~rsel grammar cause a callback, but would be an error when passing over a database Ruleset.

Labels with #label grammar do not cause callbacks.

Endrule causes a callback, allowing the upper layer to summarise and reset. The 26 =xval variables and the FLAGS are available to the callback. After the callback returns, this data is cleared for a fresh start with the next Rule in the Ruleset, if any.

Endruleset causes a callback, allowing the upper layer to summarise and possibly trigger a final action. The 26 =xval variables are cleared at this point, and so are the FLAGS. It is up to the endrule summaries to provide application-specific information to this callback. During this callback, the space allocated for the ruleset is still locked in memory, so references into that are still valid; after this call returns this memory will be unlocked, so any such references are no longer guaranteed to be valid.
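
Because =xval storage is not NUL terminated and the Ruleset memory may be unlocked after the Endruleset callback returns, an upper layer that wants to keep such data should copy it while the callback runs. A minimal sketch, with made-up names:

#include <string.h>

/* Sketch only: copy an =xval variable into caller-owned storage during a callback */
static void keep_xval (char *dest, unsigned destsz, const char *xval, unsigned xvallen) {
        if (destsz == 0)
                return;
        unsigned len = (xvallen < destsz - 1) ? xvallen : destsz - 1;
        memcpy (dest, xval, len);
        dest [len] = '\0';   /* the source bytes carry no NUL terminator */
}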

Key Derivation

Various applications of Rules need unique keys to scope access patterns. These keys are derived top-to-bottom, using irreversible digest algorithms, so that a key to one scope cannot be used to derive the key for a peering scope, let alone a predecessor in the derivation chain. The keys that are derived can therefore be installed in applications with minimal leakage of credentials for others that may be managed differently or elsewhere.

API calls that make database lookups start with a scattering key, involving the following elements:

  • Database Secret, if desired;
  • Domain, in UTF-8 notation;
  • Access Type, mapped from UUID to 16 binary bytes.

The mapping is made in two stages, to allow maximum control and the ability to gradually delegate control.

The Database Secret is mixed with the Domain, and passed through a message digest (secure hash) to produce a byte sequence. This is the first stage. It yields the Domain Key:

#include <stdint.h>
#include <arpa2/rules.h>
#include <arpa2/rules_db.h>
bool rules_dbkey_domain (rules_dbkey domkey,
                         const uint8_t *opt_dbkey, int dbkeylen,
                         const char *xsdomain);

The inputs to this call are the optional Database Secret in opt_dbkey, which is skipped when it is NULL; its length must be set in dbkeylen. In addition, the Access Domain is provided in xsdomain, in UTF-8 notation and terminated with a NUL character. Note that Punycode is considered a local notation mechanism for DNS and is not used anywhere else in the ARPA2 infrastructure; it is simply too specific and too confusing in comparison to UTF-8. The xsdomain must be in all-lowercase notation.

When the function call is successful, it returns true and sets the key in domkey, with its size available through C macro sizeof. Upon error, the call returns false and errno is set to a com_err value.
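
A usage sketch without a Database Secret, for an illustrative domain:

#include <arpa2/rules.h>
#include <arpa2/rules_db.h>

rules_dbkey domkey;
if (!rules_dbkey_domain (domkey, NULL, 0, "example.com")) {
        /* errno carries a com_err code */
}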

The second stage mixes this output with the binary Access Type, again via a message digest. The output is called a Service Key and often represented in hexadecimal for configuration convenience, but it is to be mapped back to its binary form for continued use with the Rules system.

#include <stdint.h>
#include <arpa2/rules.h>
#include <arpa2/rules_db.h>
bool rules_dbkey_service (rules_dbkey svckey,
                          const uint8_t *domkey, unsigned domkeylen,
                          const uint8_t xstype [16]);

This routine takes in the domkey and domkeylen as produced by rules_dbkey_domain() and combines them with the UUID in binary form in xstype. None of these arguments is optional. The output is produced in svckey, which has a static length at compile time, to be derived with sizeof if needed.

The function returns true on success and false with a com_err code in errno on failure.
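
A usage sketch continuing from the Domain Key above; the zeroed UUID bytes are a placeholder for a real Access Type:

uint8_t xstype [16] = { 0 };   /* replace with the 16 binary bytes of the Access Type UUID */
rules_dbkey svckey;
if (!rules_dbkey_service (svckey, domkey, sizeof (domkey), xstype)) {
        /* errno carries a com_err code */
}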

There actually is a third level, but it is not normally used by programs. This is rules_dbkey_selector(), and it derives the binary key for database indexing when trying to locate a given ARPA2 Selector. The general strategy for locating a Rule that links to a given ARPA2 Identity in the database is to iterate from concrete to abstract forms for the Identity and derive a database index for each in turn; the first that matches will be used and the search stops.
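
As an illustration of that iteration order (the identity below is made up; the precise abstraction steps are defined by the ARPA2 Selector rules):

/* Most concrete first; the first derived key that matches ends the search:
 *   john+sales@example.com    (full identity with alias)
 *   john@example.com          (alias dropped)
 *   @example.com              (any user at the domain)
 *   ...                       (further domain abstractions, down to the most general Selector)
 */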

Reading from the Database

The first action is to open the database; its location is not passed as a parameter, but determined as described above.

#include <arpa2/rules_db.h>
bool rules_dbopen_rdonly (struct rules_db *ruledb);
bool rules_dbclose (struct rules_db *ruledb);

The default action for operations is to open the database for reading alone. Many of these are permitted to run in parallel, unlike editing operations which may be more constrained, but also much less frequent. All these operations return true on success, or false on failure with errno set to a com_err code.
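
A minimal open, use and close sketch:

#include <arpa2/rules_db.h>

struct rules_db ruledb;
if (rules_dbopen_rdonly (&ruledb)) {
        /* ...perform lookups with rules_dbget() here... */
        rules_dbclose (&ruledb);
}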

To iterate over values in the database, construct a loop with two operations, like

... rules_dbget (..., &dbdata);
char *rule;
if (rules_dbloop (&dbdata, &rule)) do {
        ...process (rule)...
} while (rules_dbnext (&dbdata, &rule));

The operations are typed as follows:

#include <arpa2/rules_db.h>
bool rules_dbget (struct rules_db *ruledb, rules_dbkey digkey, MDB_val *out_dbdata);
bool rules_dbloop (const MDB_val *dbdata, char **rule);
bool rules_dbnext (const MDB_val *dbdata, char **rule);

The out_dbdata from rules_dbget() represents the Ruleset, while the rule output from rules_dbloop() and rules_dbnext() represents only one Rule at a time. Since Rules end in a NUL character, their size is not part of the return. As soon as no further Rule is found, the latter two routines return false.

The rules_dbget() function points a database cursor at the Ruleset it found last. This pointer is moved when the function is called again, but also when the transactions change or the database closes. For safe code, do not assume that the database cursor is valid after a false return from rules_dbloop() or rules_dbnext().
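
Putting these calls together, a read sketch might look as follows; reqkey is assumed to be a Request Key derived as described under Key Derivation, and process_rule() stands for whatever the application does with a Rule:

MDB_val dbdata;
char *rule;
if (rules_dbget (&ruledb, reqkey, &dbdata)) {
        if (rules_dbloop (&dbdata, &rule)) do {
                process_rule (rule);   /* each rule is a NUL terminated UTF-8 Rule */
        } while (rules_dbnext (&dbdata, &rule));
}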

Changes in the Database

Applications normally use rules_dbopen_rdonly() to read from the database and remain ignorant about trunking. The general form, however, allows read/write mode by setting rdonly to false, and it also specifies a trunk to select for:

#include <arpa2/rules_db.h>
bool rules_dbopen_rdonly (struct rules_db *ruledb);
bool rules_dbrollback (struct rules_db *ruledb);
bool rules_dbcommit (struct rules_db *ruledb);
bool rules_dbsuspend (struct rules_db *ruledb);
bool rules_dbresume (struct rules_db *ruledb);
bool rules_dbclose (struct rules_db *ruledb);

These functions return true on success, or otherwise false with a com_err code in errno. This applies to the remainder of the functions too.
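
A sketch of the transaction pattern implied by these calls; the general read/write open call and the editing work itself are not shown, and edit_succeeded is a made-up flag:

/* Sketch only: commit on success, roll back otherwise, then close */
if (edit_succeeded) {
        rules_dbcommit (&ruledb);
} else {
        rules_dbrollback (&ruledb);
}
rules_dbclose (&ruledb);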

To add or delete rules in a database, use

#include <arpa2/rules_db.h>
bool rules_dbadd (rules_dbkey prekey, unsigned prekeylen, char *xskey, char *rules, unsigned ruleslen, a2sel_t *opt_selector);
bool rules_dbdel (rules_dbkey prekey, unsigned prekeylen, char *xskey, char *rules, unsigned ruleslen, a2sel_t *opt_selector);

In this, the prekey of size prekeylen is constructed from the Access Domain and Access Type and optionally a Database Secret. The xskey is the Access Name to use. The rules of size ruleslen form a Ruleset, concatenating a number of Rules, each of which ends in a NUL character.

When the opt_selector is provided, it will be used to index in the database; otherwise, ~selector bits in the rules/ruleslen are used to determine the database entries to update. In the latter case, knowledge from the rules/ruleslen is taken apart and may get distributed over multiple database records, so that the fast index mechanism based on Selector iteration can be used. Since the exact same procedure is used for addition and deletion, this mostly remains transparent to the caller. This mechanism simplifies editing the database content from input that takes the form of general Rules involving Selectors, such as might be configured in LDAP and automatically pulled in by a daemon; this is how we envision configuration data being exchanged between sites without the propagation delays caused by the trade-offs in simpler caching mechanisms. When security concerns play a role, it pays to be able to make fast updates.
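
A usage sketch, adding one made-up Rule under a previously derived Service Key; since opt_selector is NULL, the ~selector word inside the Rule determines where it is stored:

char xskey [] = "accessname";                  /* illustrative Access Name */
char rules [] = "~@example.com #example W";    /* placeholder Rule; sizeof includes its NUL */
if (!rules_dbadd (svckey, sizeof (svckey), xskey, rules, sizeof (rules), NULL)) {
        /* errno carries a com_err code */
}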

A final call mirrors the get operation with a set operation,

#include <arpa2/rules_db.h>
bool rules_dbget (struct rules_db *ruledb, rules_dbkey digkey, MDB_val *out_dbdata);
bool rules_dbset (struct rules_db *ruledb, MDB_val *in0_dbdata, MDB_val *in1_dbdata);

Note how it is possible to set two database values in one stroke. This is for convenience while offering efficient updates to longer stretches of Rules. The rules_dbset() function assumes that rules_dbget() was run before, and returned successfully. This is why no digest is required in rules_dbset(); it operates on the current database cursor position, which is locked.

To add rules with this function, set in0_dbdata to the result of rules_dbget() and set in1_dbdata to the newly added rules. In each of these segments, every contained Rule ends in a NUL character, including the last one. If you prefer to insert at the beginning, just reverse the use of in0_dbdata and in1_dbdata.

To delete rules with this function, set in0_dbdata to the part of rules_dbget() before the rule that will be removed, in1_dbdata to the part after the rule to be removed. Again, any Rule must end in a NUL character; it is however possible that either data field is empty when it contains no Rules at all.
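
A sketch of appending one Rule this way; the added Rule text is a placeholder and the MDB_val fields follow the usual LMDB conventions:

MDB_val old, extra;
char addition [] = "#example W";               /* placeholder Rule; sizeof includes its NUL */
if (rules_dbget (&ruledb, reqkey, &old)) {
        extra.mv_data = addition;
        extra.mv_size = sizeof (addition);
        rules_dbset (&ruledb, &old, &extra);
}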

System Administration for Access Control

Databases are indexed by keys that are specific to a Domain, Access Type, Access Name and a Selector. They may be further scattered if a Database Secret was initially incorporated. The scattering is random and uses long enough keys to allow for merging data from different sources, as long as they differ on at least one of these parameters.

Combining sources can be efficient. Remember that a database of size N usually needs only log(N) pages to find a target, and that this is done with a memory-mapped database. The base of the logarithm is around 250, so one page load delivers up to 250 keys, two page loads deliver 31250 up to 62500, three page loads reach 7.8 to 15.6 million keys, and so on. These numbers are approximate, but the exponential growth curve is accurate. Merging in an extra database barely impacts search efficiency. This is what you get when you design for scale!

The different uploads are not distinguished in any way; it is assumed that a full match on the keys (or, effectively, a secure hash computed over it) implies a fully warranted offer for those keys.

For administrative purposes, it is useful to be able to separate the various keys (and their associated Rulesets) by source. One might for example use this to reset the information held from one particular uplink. These uplinks are therefore identified with a Trunk Identifier, a number (in 32 bits by default) that is part of every Ruleset, but ignored by the Access Control logic. It is however useful for source-dependent bulk operations, in spite of having merged data sources.

When subscribing to an LDAP uplink, you would specify its Trunk Identifier. The value 1 is reserved for manual entries/overrides, but 2 and over are available for assignment to automated Trunk subscriptions.

TODO: Bulk operations have not been implemented yet; they would iterate over entries and include support for removal. Individual elements can be edited.