ARPA2 Common Libraries  2.2.25
ARPA2 Identity and Selector

We explain how to work with Identity under the InternetWide Architecture. Given the ARPA2 Common libraries, it is easy enough to integrate with the ambitious goal of freeing our online presence as users!

This document is about libarpa2identity.so, the library that implements Identity, Selector and Iterator concepts.

Below you will find:

  • Quick description of Concepts
  • API for ARPA2 Identity
  • Signatures for Constrained use of Identity
  • Library Initialisation and Finalisation
  • Example Code
  • Related Work

Examples of Identity and Selectors plus Iteration

In this image, the top nodes give four Identity examples, without aliases. We can only guess they are not groups because the user names sound human. Below that are Selectors, with arrows pointing from specific to generic. Had there been a node for example.com then it would have shown Iteration. Note that the top nodes are not just Identities; they are also Selectors and the first value produced during Iteration.

Quick description of Concepts

Identities are generally structured like email addresses, such as john@example.com, with a user name and a domain name. The four top nodes in the image are Identities.

Aliases are added to the local part of an identity, using a plus to separate the parts. For example, john might add an alias cooks to classify communication about food and cooking. Others would see this as john+cooks@example.com but services can use the added structure to better sort communications. The image does not show aliases, but john+cooks@example.com would have been drawn with an arrow pointing at john@example.com.

Groups represent multiple users. Think of mailing lists or shared document spaces as possible applications. To outsiders, a group looks just like any other address, and it is only the hosting domain that is aware that something like cooks@example.com is actually treated with group logic. We need our human knowledge of the World to guess that the addresses shown in the image are probably not groups.

Members are to groups what aliases are to users. When john is a member of the group, he might be known there as cooks+johann@example.com and others can address just him via this member address; it is like an extra alias. No examples shown in the image.
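The plus-separated structure shared by aliases and members can be pictured with a small standalone sketch. It does not use the library: toy_split and struct toy_parts are hypothetical names invented for this illustration, and the code ignores signatures, case folding and everything else that the real grammar covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, NOT part of libarpa2identity. */
struct toy_parts {
    char base[65];    /* user or group name, before the first '+' */
    char ext[65];     /* alias or member name after the '+', "" if none */
    char domain[256]; /* everything after the '@' */
};

/* Split "base+ext@domain" or "base@domain" into its parts.
 * Returns false unless there is exactly one '@' and every part fits. */
static bool toy_split(const char *addr, struct toy_parts *out)
{
    assert(addr != NULL && out != NULL);
    const char *at = strchr(addr, '@');
    if (at == NULL || strchr(at + 1, '@') != NULL)
        return false;
    size_t locallen = (size_t)(at - addr);
    const char *plus = memchr(addr, '+', locallen);
    size_t baselen = plus ? (size_t)(plus - addr) : locallen;
    size_t extlen = plus ? locallen - baselen - 1 : 0;
    size_t domlen = strlen(at + 1);
    if (baselen == 0 || baselen >= sizeof out->base ||
        extlen >= sizeof out->ext ||
        domlen == 0 || domlen >= sizeof out->domain)
        return false;
    memcpy(out->base, addr, baselen);
    out->base[baselen] = '\0';
    memcpy(out->ext, plus ? plus + 1 : "", extlen);
    out->ext[extlen] = '\0';
    memcpy(out->domain, at + 1, domlen + 1);
    return true;
}
```

Given john+cooks@example.com this yields base john, extension cooks and domain example.com; given cooks+johann@example.com it yields base cooks and extension johann. Nothing in the text tells the two cases apart, which is the point made above: only the hosting domain knows whether the base name is a user or a group.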

Selectors are patterns that can match multiple identities. They might drop parts from the username or domain name, and look like @example.com and go all the way up to @. which matches everything. The arrows point from specific to more generic Selectors; note that every Identity is also a (most specific) Selector.

Iteration is the process starting with an Identity like john@example.com and stepping up via @example.com, @.com to the most general form @. that is sure to match any identity. As you can see in the image, there is no node for example.com so the arrows do not show the complete iteration steps. Had no steps been skipped then Iteration would have meant following the arrows.

Access Control uses Selectors to decide who may do what. A typical example is who may do what with a group. We generally start with a remote identity and use iteration while looking for Access Rights that would apply. The most concrete find will be used, so @. can be used for defaults and nothing could override the concrete john@example.com form.
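Iteration and the "most concrete find wins" rule can be sketched together in a few lines of standalone C. This is a toy under stated assumptions, not the library's implementation: toy_generalise, toy_lookup, struct toy_rule and the rights strings are all hypothetical, the input is assumed to be lowercase already, and aliases and signatures are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule entry: a selector text and the rights it grants. */
struct toy_rule {
    const char *selector;
    const char *rights;
};

/* Rewrite sel in place to the next more general form, for instance
 * john@example.com, @example.com, @.com, @.  Returns false when no
 * further step exists. */
static bool toy_generalise(char *sel)
{
    assert(sel != NULL);
    char *at = strchr(sel, '@');
    if (at == NULL)
        return false;
    if (at != sel) {                       /* first drop the user name */
        memmove(sel, at, strlen(at) + 1);
        return true;
    }
    char *dom = sel + 1;
    if (dom[0] == '\0' || strcmp(dom, ".") == 0)
        return false;                      /* "@." is the most general */
    /* Find the dot that ends the leftmost remaining label. */
    char *dot = strchr(dom[0] == '.' ? dom + 1 : dom, '.');
    if (dot == NULL) {
        strcpy(dom, ".");                  /* last label gone: "@." */
        return true;
    }
    memmove(dom, dot, strlen(dot) + 1);    /* keep ".rest" */
    return true;
}

/* Walk from the identity towards "@." and return the rights of the
 * first rule found, which is the most concrete one; NULL if none. */
static const char *toy_lookup(const char *identity,
                              const struct toy_rule *rules, size_t nrules)
{
    char cursor[320];
    assert(identity != NULL && rules != NULL);
    if (strlen(identity) >= sizeof cursor)
        return NULL;
    strcpy(cursor, identity);
    do {
        for (size_t i = 0; i < nrules; i++)
            if (strcmp(cursor, rules[i].selector) == 0)
                return rules[i].rights;
    } while (toy_generalise(cursor));
    return NULL;
}
```

With rules for @. , @example.com and john@example.com, a lookup for john@example.com stops at the very first step and never sees the broader entries, while mary@example.net only finds the @. default after stepping through @example.net and @.net.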

If you want to read more about this, please check the Domain Owner's Manual and the Identity discussion on our InternetWide Architectural blog.

API for ARPA2 Identity

This is an introduction. Read the include file for extensive documentation of the functions, data types and more.

Identities and Selectors are not specific to a protocol. Your software is, of course. You will receive identities as strings, in some form. Rather than superficially testing the syntax (must have an @ sign in it) and forgetting things (oops, there are two, but we did not check) or deferring them (internationalisation is difficult, and we have no use for it), we have created a parser that completely parses the full grammar and sorts the parts of the string into the bits and pieces that we work with. While at it, the strings are put in canonical form; that is, in lowercase, except for any signature text, which we set clearly aside by printing it in uppercase. (Local parts are considered case-insensitive in ARPA2 Identity; this is designated as a local choice but, except for certificates, case never seems to matter.)

Parsing. The functions to use are

#include <arpa2/identity.h>
bool a2id_parse (a2id_t *out, const char *in, unsigned inlen);
bool a2sel_parse (a2sel_t *out, const char *in, unsigned inlen);

In goes a string and its length; out comes a fixed-sized identity buffer. No allocation is needed, you can allocate a2id_t or a2sel_t anywhere you like.

The functions represent completely different parsers. This is not true for all the API calls; some functions have an alias to allow you to not know, or to express your ideas more elegantly.

The a2id_t and a2sel_t are actually the same structure internally, but the latter allows much more freedom in what it holds. This is a consequence of the fact that every Identity is automatically a (most concrete) Selector. Internally, these structures hold a string buffer ->txt with the literal user@domain.name and an array ->ofs with monotonically rising offsets into this string that indicate where the parser found cut-off points.

Internationalisation. The domains in an ARPA2 Identity are defined to be in UTF-8 form, which includes the old ASCII representations except those that have labels starting with xn-- because those are reserved for embedding Punycode. Punycode is the DNS representation for non-ASCII characters, but it is not useful in any other protocol; an ARPA2 Identity wants to be equally useful to all protocols and therefore does not favour the DNS form of internationalisation. Remember to write your international domains in UTF-8 form if you want them to parse correctly.
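A caller that receives domain names from DNS-oriented sources might want to spot Punycode labels before handing the string to the parser. The following is a minimal standalone sketch of such a pre-check; toy_has_punycode_label is a hypothetical helper, not a library function, and converting such a label back to UTF-8 is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true when any dot-separated label of the domain starts with
 * the reserved "xn--" prefix (compared case-insensitively). */
static bool toy_has_punycode_label(const char *domain)
{
    assert(domain != NULL);
    const char *label = domain;
    while (*label != '\0') {
        /* The && chain stops at the first mismatch, so it never
         * reads past the terminating NUL. */
        if (tolower((unsigned char)label[0]) == 'x' &&
            tolower((unsigned char)label[1]) == 'n' &&
            label[2] == '-' && label[3] == '-')
            return true;
        const char *dot = strchr(label, '.');
        if (dot == NULL)
            break;
        label = dot + 1;
    }
    return false;
}
```

A domain such as xn--bcher-kva.example is flagged, whereas its UTF-8 spelling bücher.example and plain ASCII names like example.com are not.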

Comparing. To see if two parsed structures are the same, you can safely use memcmp(), or you might use strcmp() between the ->txt fields. But you should not need to work at that level, so the API offers the general idea as

#include <arpa2/identity.h>
bool a2id_equal (const a2id_t *left, const a2id_t *right);
bool a2sel_equal (const a2sel_t *left, const a2sel_t *right);

It's more interesting to see if an Identity matches a Selector. This is like testing whether an element is part of a set; that wording may not mean as much to a programmer, but the logic behind it is intuitive.

#include <arpa2/identity.h>
bool a2id_match (const a2id_t *identity, const a2sel_t *selector);
bool a2id_member (const a2id_t *identity, const a2sel_t *selector);

Even more interesting is the question whether one Selector is more specific than the other (or the other more general than the former). This may be false in both directions, because the mathematical concept behind it is a partially ordered set. Ah well, just remember to be careful with logic negations.

#include <arpa2/identity.h>
bool a2sel_special (const a2sel_t *special, const a2sel_t *general);
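Both the match test and the specialisation test can be pictured with one standalone string comparison. The sketch below is a toy under stated assumptions and not how the library works: toy_covers is a hypothetical helper that operates on lowercase text rather than on parsed structures, it ignores aliases and signatures, and it treats two equal selectors as covering each other, which may or may not agree with a2sel_special().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does the selector text `general` cover the selector (or identity)
 * text `specific`?  Both are expected in lowercase "user@domain" form,
 * where the user part of a selector may be empty. */
static bool toy_covers(const char *general, const char *specific)
{
    assert(general != NULL && specific != NULL);
    const char *gat = strchr(general, '@');
    const char *sat = strchr(specific, '@');
    if (gat == NULL || sat == NULL)
        return false;
    /* A user name in the general form must be repeated literally. */
    size_t gulen = (size_t)(gat - general);
    size_t sulen = (size_t)(sat - specific);
    if (gulen > 0 &&
        (gulen != sulen || memcmp(general, specific, gulen) != 0))
        return false;
    const char *gdom = gat + 1;
    const char *sdom = sat + 1;
    if (gdom[0] != '.')
        return strcmp(gdom, sdom) == 0;  /* concrete domain: exact */
    if (gdom[1] == '\0')
        return true;                     /* "@." covers everything */
    /* ".suffix" covers any domain text that ends in ".suffix". */
    size_t gdlen = strlen(gdom);
    size_t sdlen = strlen(sdom);
    return sdlen >= gdlen && strcmp(sdom + sdlen - gdlen, gdom) == 0;
}
```

With an Identity as the second argument this behaves like a match test: @.com covers john@example.com. With two Selectors it shows the partial order: @.com covers @example.com but not the other way around, and for @example.com and @example.org the answer is false in both directions, so a negated result must not be read as "the other one is more specific".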

All these functions benefit from the pre-parsed structures, so the small investment in an explicit parser call tends to work out positive. The parser, by the way, is generated with Ragel; this leads to simple parsers that look at one character and then jump to another place in the program. This tends to be highly efficient. You can compile the code to different models if you like; Ragel is quite supportive of that sort of thing.

Iteration. The general routines for Iteration maintain a Selector data structure, including its ->txt and ->ofs components, so all the usual calls work on the intermediate results. If that is not needed and you only want a username and domain part, then you can choose Quick Iteration instead.

#include <arpa2/identity.h>
bool a2sel_iterate_init (const a2sel_t *start, a2sel_t *cursor);
bool a2sel_iterate_next (const a2sel_t *start, a2sel_t *cursor);

or

#include <arpa2/identity.h>
bool a2sel_quickiter_init (a2sel_quickiter *iter);
bool a2sel_quickiter_next (a2sel_quickiter *iter);

The general procedure is to call _init, and if it returns true to process the first item; when done, call _next and loop back if its outcome is again true. Code examples are maintained in the header files.

Distance. One possible use of iteration is to count the number of steps needed to get from one Selector to another. This may fail of course, but when it succeeds the value can be used to select the closest option from a list of matches. This can be constructed from iterators, but doing it cleverly and testing it properly actually takes effort, so there is a simple abstraction for this work.

#include <arpa2/identity.h>
bool a2sel_abstractions (a2sel_t *specific, a2sel_t *generic, uint16_t *steps);

You can use the steps output from this function to learn which of two abstractions is nearer. You can even split the value into the steps that were needed for the domain and those needed for the username. This is possible because domain abstraction steps greatly outrank username abstraction steps. The split uses:

#include <arpa2/identity.h>
uint16_t a2sel_abstractions_domain (uint16_t steps);
uint16_t a2sel_abstractions_username (uint16_t steps);
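How a single number can rank domain steps above username steps is easiest to see with a concrete layout. The sketch below ASSUMES, purely for illustration, that the high byte counts domain steps and the low byte counts username steps; the library's actual encoding is documented in the include file and may well differ. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout, for illustration only:
 * high byte = domain steps, low byte = username steps. */
static uint16_t toy_steps_pack(unsigned domain_steps, unsigned username_steps)
{
    assert(domain_steps <= 255 && username_steps <= 255);
    return (uint16_t)((domain_steps << 8) | username_steps);
}

static unsigned toy_steps_domain(uint16_t steps)
{
    return (unsigned)(steps >> 8);
}

static unsigned toy_steps_username(uint16_t steps)
{
    return (unsigned)(steps & 0xff);
}

/* The nearer of two abstractions is simply the smaller number. */
static uint16_t toy_steps_nearest(uint16_t a, uint16_t b)
{
    return a <= b ? a : b;
}
```

Under this assumed layout, a single domain step (value 256) already counts as further away than three username steps (value 3), so a plain integer comparison picks the nearest match.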

Signatures for Constrained use of Identity

A novel feature that we built into this library is lightweight signing of some context from your application into extension characters for the local part. These are easily recognised by the trailing plus, but our parser also maps the letters to uppercase. An example of a signed address would be

john+7HA5JA2NTXJ26N5K6N5YMTOJZNWNN4OEVRLSCCQZG4RFFIL5SCR45QAPHE+@example.com

This looks awkward, but causes no dismay when simply replying or clicking on it. The code contains a few things that will be requested from the application during both signing and verifying. In this case, it assumes communication with a Remote Identity mary@example.net, some protocol-specific session identifier (like the Message-Id in an email) set to SessionID, a human-entered and possibly canonicalised subject string CanonicalSubject and a [Topic] derived from the communication.

Not shown, but cryptographically important, is the inclusion of the domain name example.com and the local identity john, so the signature cannot be passed on. Also included are the number of days until expiration of this Identity and the length of the signature part in this form.

Change anything and the code will flip. For instance, madelin@example.org would have had to use

john+7HABHW3IUCWTJFYJGHTOMF2IQTTNWPK3CSRWQ74V3RHEKP7LXNNO3MHVY6+@example.com

The local part will never exceed 64 characters, which is required for it to work as an email address. Other protocols tend to be less restrictive, because they were designed in times with less concern for buffer overflows.

To summarise, the signature binds information from the context of the protocol into an Identity, to deliberately constrain its usability and, more importantly, its abusability by others who get hold of the address. The intention is to make it easier for you to share an address over which (some) others can reach you.

Signatures in an address are optional, of course. And protocol-specific parts can be added or left out to facilitate a user's desires. Shopping online they might want an address that expires in 14 days but for their job this would not be required; then however, a restriction of the sender's domain might be desired.

To sign before sending, or verify before accepting, the following functions can be used:

#include <arpa2/identity.h>
bool a2id_sign (a2id_t *a2id_to_be_signed, a2id_sigdata_cb cb, void *cbdata);
bool a2id_verify (a2id_t *a2id_to_be_verified, a2id_sigdata_cb cb, void *cbdata);

The idea is to always use a2id_sign() for outgoing traffic from internal senders, and to always use a2id_verify() for incoming traffic to internal recipients. To make this possible, the functions return success when no work had to be done. Failures are only returned when work must be done but then fails. Note that even a2id_verify() does not require that a signature is present; it leaves that to later Access Control stages, but simply bars signatures that are invalid.

Before calling a2id_sign(), you can set the ->sigflags field to your desired minimum and maximum support for A2ID_SIGFLAG_ flags. After a2id_verify(), only the verified-correct flags will be set. Similarly, the ->expiration indicates the last valid time(NULL) value for the signature. This value is meaningless without the flag A2ID_SIGFLAG_EXPIRE.

These calls are simple enough. The callback allows your service to add some bytes to the mix. These must be the same values during signing and verification. A few base cases are covered by a standard function that may be supplied as the callback, or to which requests may be forwarded.

#include <arpa2/identity.h>
bool a2id_sigdata_base (a2id_sigdata_t sd, void *cbdata_rid, const a2id_t *lid, uint8_t *buf, uint16_t *buflen);

You can safely call a2id_sign() and a2id_verify() without loading keys for them to work on. The functions quietly ignore all traffic and work a bit on the flags to avoid false pretenses. To actually make these functions work with signatures, you should add a key, preferably before you drop privileges and start serving traffic.

#include <arpa2/identity.h>
bool a2id_addkey (int keyfd);

The secret must be presented over a file descriptor and adhere to a simple format. Simple ways of doing this with reasonable security include using a pipe or fifo, but it may also be read from a file. The files can be generated with the a2id-keygen utility, and we suggest /var/lib/arpa2/identity/default.key as storage location.

Details. Have a look at the include file. It has lengthy documentation on each of the functions. This is the place where the details are maintained, this document was merely written to show you around.

Library Initialisation and Finalisation

To initialise and clean up the ARPA2 Identity / Selector / Iteration concepts, call these functions on the library:

#include <arpa2/identity.h>
void a2id_init (void);
void a2id_fini (void);

The following is an example of how ARPA2 Identity reports an error:

#include <com_err/arpa2identity.h>
errno = A2ID_ERR_SELECTOR_SYNTAX;
return false;

The error can be printed with <arpa2/com_err.h>.

Example Code

Let's say your program has parsed strings for a remote user who wants to connect to a local user. An incoming email would be an example, where these values may be parsed from the SMTP commands MAIL FROM and RCPT TO. The following code parses these strings into a2rid and a2lid structures, and verifies any signatures that may have been found on the local identity:

//
// Initialise the system with key for signatures
int keyfd = open ("/var/lib/arpa2/identity/default.key", O_RDONLY);
if (keyfd >= 0) {
    a2id_addkey (keyfd);
    close (keyfd);
}
...
//
// Assume a NUL-terminated remote and local user@domain.name
bool ok = true;
const char *remote = ...;
const char *local = ...;
//
// Parse the identities
a2id_t rid, lid;
ok = ok && a2id_parse (&rid, remote, 0);
ok = ok && a2id_parse (&lid, local, 0);
//
// Check basic constraints imposed by the local address
ok = ok && a2id_verify (&lid, a2id_sigdata_base, &rid);
//
// Report whether communication can continue
printf ("Communicating from %s to %s\n", rid.txt, lid.txt);
printf ("%s this address combination\n", ok ? "Accepted" : "Rejected");
//
// Report on signature checks (if any)
if (lid.sigflags & A2ID_SIGFLAG_EXPIRATION) {
    printf ("Local address expiration day %d\n", lid.expireday);
    printf ("Local domain is restricted by the signature\n");
    printf ("Local userid is restricted by the signature\n");
}
if (lid.sigflags & A2ID_SIGFLAG_REMOTE_DOMAIN) {
    printf ("Remote domain is restricted by the signature\n");
}
if (lid.sigflags & A2ID_SIGFLAG_REMOTE_USERID) {
    printf ("Remote userid is restricted by the signature\n");
}
//
// A wrapper around a2id_sigdata_base could add application restrictions
if (lid.sigflags & A2ID_SIGFLAG_SESSIONID) {
    printf ("Application session is restricted by the signature\n");
}
if (lid.sigflags & A2ID_SIGFLAG_SUBJECT) {
    printf ("Application subject is restricted by the signature\n");
}
if (lid.sigflags & A2ID_SIGFLAG_TOPIC) {
    printf ("Application topic is restricted by the signature\n");
}
//
// Now check if the rid is granted the requested access
...

You will find a compiling version of this code in test/id/example.c and you can call it as test/id/example <remote> <local> to play.

Related Work

There are a few more functions for advanced things, but the above covers the basic flow. Identities are often authenticated as part of the protocol and then, when they are known to be valid, they progress into Access Control to decide what powers to bestow on this user.

This framework deliberately supports the ideas of Realm Crossover so your users can be offered to Bring Your Own IDentity. In other places of our software stack, we are working on tools and libraries that help you with that.

There is also an uplink to Access Control, for which libraries are included in this same ARPA2 Common package. These take great care to also support Realm Crossover, but information is also protected from scanning and browsing, as might be tried after a break-in. To be able to read the information, one must know the information provided by the user through a protocol, and one must hold certain secrets to be able to search and decode information. The idea here is to support a split in the hosting market, and allow Identity Providers to flourish in a fruitful collaboration with Service Providers. The latter will be handed down abilities to query Access Rights, but they will not be permitted to easily iterate the data and put it to other uses.