ARPA2 Common Libraries
2.6.2
We explain how to work with Identity under the InternetWide Architecture. Given the ARPA2 Common libraries, it is easy enough to integrate with the ambitious goal of freeing our online presence as users!
This document is about libarpa2identity.so, the library that implements the Identity, Selector and Iterator concepts.
Below you will find:
In this image, the top nodes give four Identity examples, without aliases. We can only guess they are not groups because the user names sound human. Below that are Selectors, with arrows pointing from specific to generic. Had there been a node for example.com then it would have shown Iteration. Note that the top nodes are not just Identities; they are also Selectors and the first value produced during Iteration.
Identities are generally structured like email addresses, such as john@example.com with a user name and a domain name. The four top nodes in the image are Identities.
Aliases are added to the local part of an identity, using a plus to separate the parts. For example, john might add an alias cooks to classify communication about food and cooking. Others would see this as john+cooks@example.com but services can use the added structure to better sort communications. The image does not show aliases, but john+cooks@example.com would have been drawn with an arrow pointing at john@example.com.
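The plus-separated structure of the local part can be illustrated with a minimal sketch in C. This is not the library's parser; `struct sketch_local` and `sketch_split_local()` are hypothetical names made up for this illustration, and the sketch simply treats everything between the first plus and the `@` as the alias, ignoring signatures and further structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type, not part of libarpa2identity.
 * All values are offsets or lengths into the input string. */
struct sketch_local {
	size_t userlen;   /* length of the part before the first '+' */
	size_t aliasofs;  /* offset of the alias text */
	size_t aliaslen;  /* 0 when there is no alias */
};

/* Split "user+alias@domain" at the first '+' that precedes the '@'.
 * Returns false when there is no '@', or when the user name or a
 * started alias is empty. */
static bool sketch_split_local (const char *addr, struct sketch_local *out) {
	assert (addr != NULL && out != NULL);
	const char *at = strchr (addr, '@');
	if (at == NULL) {
		return false;
	}
	size_t locallen = (size_t) (at - addr);
	const char *plus = memchr (addr, '+', locallen);
	if (plus == NULL) {
		/* No alias: the whole local part is the user name */
		out->userlen = locallen;
		out->aliasofs = locallen;
		out->aliaslen = 0;
		return locallen > 0;
	}
	out->userlen = (size_t) (plus - addr);
	out->aliasofs = out->userlen + 1;
	out->aliaslen = locallen - out->aliasofs;
	return out->userlen > 0 && out->aliaslen > 0;
}
```

A service could use the alias text to sort communication, while routing on the user name alone.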
Groups represent multiple users. Think of mailing lists or shared document spaces as possible applications. To outsiders, a group looks just like any other address, and only the hosting domain is aware that something like cooks@example.com is actually treated with group logic. We need our human knowledge of the world to guess that the addresses shown in the image are probably not groups.
Members are to groups what aliases are to users. When john is a member of the group, he might be known there as cooks+johann@example.com and others can address just him via this member address; it is like an extra alias. The image shows no examples of this.
Selectors are patterns that can match multiple identities. They might drop parts from the username or domain name, and look like @example.com and go all the way up to @. which matches everything. The arrows point from specific to more generic Selectors; note that every Identity is also a (most specific) Selector.
Iteration is the process starting with an Identity like john@example.com and stepping up via @example.com, @.com to the most general form @. that is sure to match any identity. As you can see in the image, there is no node for example.com so the arrows do not show the complete iteration steps. Had no steps been skipped, then Iteration would have meant following the arrows.
Access Control uses Selectors to decide who may do what. A typical example is who may do what with a group. We generally start with a remote identity and use iteration while looking for Access Rights that would apply. The most concrete find will be used, so @. can be used for defaults and nothing could override the concrete john@example.com form.
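The "most concrete find wins" rule can be sketched in C as a lookup that walks from the remote identity towards @. and stops at the first rule it meets. Everything here is an assumption made for illustration: `struct sketch_rule`, `sketch_generalise()` and `sketch_lookup()` are hypothetical and are not the ARPA2 Access Control API, the rights are plain strings, and input is assumed to be in canonical lowercase form already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a selector in text form and the rights it grants */
struct sketch_rule {
	const char *selector;
	const char *rights;
};

/* Take one generalisation step, in place:
 *   john@example.com -> @example.com -> @.com -> @.
 * Returns false when sel is already "@." and cannot be generalised. */
static bool sketch_generalise (char *sel) {
	char *at = strchr (sel, '@');
	assert (at != NULL);
	if (at != sel) {
		/* Drop the user name */
		memmove (sel, at, strlen (at) + 1);
		return true;
	}
	char *dom = sel + 1;
	if (strcmp (dom, ".") == 0) {
		return false;
	}
	/* Skip the leading dot left by a previous step, then drop a label */
	char *label = (*dom == '.') ? dom + 1 : dom;
	char *dot = strchr (label, '.');
	if (dot == NULL) {
		strcpy (dom, ".");    /* last label dropped: "@." */
	} else {
		memmove (dom, dot, strlen (dot) + 1);
	}
	return true;
}

/* Return the rights of the most concrete matching rule, or NULL */
static const char *sketch_lookup (const struct sketch_rule *rules,
			size_t count, const char *remote) {
	char sel [256];
	if (strchr (remote, '@') == NULL || strlen (remote) >= sizeof (sel)) {
		return NULL;
	}
	strcpy (sel, remote);
	do {
		for (size_t i = 0; i < count; i++) {
			if (strcmp (rules [i].selector, sel) == 0) {
				return rules [i].rights;
			}
		}
	} while (sketch_generalise (sel));
	return NULL;
}
```

With rules for @., @example.com and john@example.com, a lookup for john@example.com stops at the first step, so the default under @. never gets a chance to override it.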
If you want to read more about this, please check the Domain Owner's Manual and the Identity discussion on our InternetWide Architectural blog.
This is an introduction. Read the include file for extensive documentation of the functions, data types and more.
Identities and Selectors are not specific to a protocol. Your software is, of course. You will receive identities as strings, in some form. Rather than superficially testing the syntax (must have an @ sign in it) and forgetting things (oops, there are two, but we did not check) or deferring them (internationalisation is difficult, and we have no use for it) we have created a parser that completely parses the full grammar, and sorts the parts of the string into the bits and pieces that we work with. While at it, the strings are put in canonical form; meaning, in lowercase except for any signature text, which we set clearly aside by printing it in uppercase. (Local parts are considered case-insensitive in ARPA2 Identity; this is designated as a local choice but, except for certificates, case never seems to matter.)
Parsing. The functions to use are
In goes a string and its length; out comes a fixed-size identity buffer. No dynamic allocation is needed; you can place an a2id_t or a2sel_t anywhere you like.
The functions represent completely different parsers. This is not true for all the API calls; some functions have an alias to allow you to not know, or to express your ideas more elegantly.
The a2id_t and a2sel_t are actually the same structures internally, but the latter allows much more freedom in what it holds. This is a consequence of the fact that every Identity is automatically a (most concrete) Selector. Internally, these structures hold a string buffer ->txt with the literal user@domain.name and an array ->ofs with monotonically rising offsets into this string that indicate where the parser found cut-off points.
Internationalisation. The domains in an ARPA2 Identity are defined to be in UTF-8 form, which includes the old ASCII representations except those that have labels starting with xn-- because those are reserved for embedding Punycode. Punycode is the DNS representation for non-ASCII characters, but it is not useful in any other protocol; an ARPA2 Identity wants to be equally useful to all protocols and therefore does not favour the DNS form of internationalisation. Remember to write your international domains in UTF-8 form if you want them to parse correctly.
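If your application receives domain names from DNS-oriented sources, it may help to detect Punycode labels before handing a string to the parser. The following is a minimal sketch under that assumption; `sketch_has_punycode_label()` is a hypothetical helper and not a library function. It only detects the xn-- prefix and does not convert anything to UTF-8.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Check one label for the reserved "xn--" prefix, case-insensitively.
 * The && chain stops at the first mismatch, so it never reads past
 * the terminating NUL of a short label. */
static bool sketch_label_is_punycode (const char *label) {
	return tolower ((unsigned char) label [0]) == 'x'
	    && tolower ((unsigned char) label [1]) == 'n'
	    && label [2] == '-'
	    && label [3] == '-';
}

/* Return true when any dot-separated label of the domain starts
 * with "xn--", which signals the DNS form of internationalisation. */
static bool sketch_has_punycode_label (const char *domain) {
	assert (domain != NULL);
	const char *label = domain;
	while (label != NULL) {
		if (sketch_label_is_punycode (label)) {
			return true;
		}
		const char *dot = strchr (label, '.');
		label = (dot != NULL) ? dot + 1 : NULL;
	}
	return false;
}
```

A domain flagged by this check would need to be rewritten in UTF-8 form by other means before it can be expected to parse as an ARPA2 Identity.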
Comparing. To see if two parsed structures are the same, you can safely use memcmp(), or you might use strcmp() between the ->txt fields. But you don't want to know that level of things; so the API offers the general idea as
It's more interesting to see if an Identity matches a Selector. This is like testing whether an element is part of a set; programmers may not think of it in those terms, but the logic behind it is intuitive.
Even more interesting is the question whether one Selector is more specific than the other (or the other more general than the former). This may be false in both directions, because the mathematical concept behind it is a partially ordered set. Ah well, just remember to be careful with logic negations.
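The partial order can be made concrete with a small sketch in C that works on the text forms used in this document. It is a simplification, not the library's implementation: `sketch_covers()` is a hypothetical function, it ignores aliases and signatures, and it assumes canonical lowercase input. Because every Identity is also a Selector, the same test serves for matching an Identity against a Selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Does the generic selector cover the specific one?  Both have the
 * form "user@domain"; the user may be empty (any user), and a domain
 * starting with '.' matches by suffix, with "." matching everything. */
static bool sketch_covers (const char *generic, const char *specific) {
	const char *gat = strchr (generic, '@');
	const char *sat = strchr (specific, '@');
	assert (gat != NULL && sat != NULL);
	size_t gulen = (size_t) (gat - generic);
	size_t sulen = (size_t) (sat - specific);
	/* A user name in the generic form must be repeated literally */
	if (gulen > 0 && (gulen != sulen
			|| memcmp (generic, specific, gulen) != 0)) {
		return false;
	}
	const char *gdom = gat + 1;
	const char *sdom = sat + 1;
	if (strcmp (gdom, ".") == 0) {
		return true;                        /* "@." covers all */
	}
	if (*gdom != '.') {
		return strcmp (gdom, sdom) == 0;    /* literal domain */
	}
	/* Suffix match, for example ".com" against "example.com" */
	size_t glen = strlen (gdom);
	size_t slen = strlen (sdom);
	return slen >= glen && strcmp (sdom + slen - glen, gdom) == 0;
}
```

Take @example.com and john@.com: the first allows any user in one domain, the second one user in many domains, so neither covers the other and the test is false in both directions. Code that treats "not more specific" as "more general" would get this pair wrong.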
All these functions benefit from the pre-parsed structures, so the small investment in an explicit parser call tends to pay off. The parser, by the way, is generated with Ragel; this leads to simple parsers that look at one character and then jump to another place in the program. This tends to be highly efficient. You can compile the code to different models if you like; Ragel is quite supportive of that sort of thing.
Iteration. The general routines for Iteration maintain a Selector data structure, including the ->txt and ->ofs components, so all the usual calls work on the intermediate results. If that is not needed and you only want a username and domain part, then you can choose Quick Iteration instead.
or
The general procedure is to call _init, and if it returns true to process the first item; when done, call _next and loop back if its outcome is again true. Code examples are maintained in the header files.
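The shape of such a loop can be sketched with a stand-in iterator in C. The names `struct sketch_iter`, `sketch_iter_init()`, `sketch_iter_next()` and `sketch_iter_print()` are hypothetical and only mimic the _init and _next calling pattern described above; the library's own types and signatures are in the include file. In the spirit of Quick Iteration, this stand-in keeps pointers into the original string rather than building a Selector structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical iterator state, pointing into the caller's string */
struct sketch_iter {
	const char *user;    /* not NUL-terminated, see userlen */
	size_t userlen;      /* 0 means any user */
	const char *domain;  /* NUL-terminated tail; "" means any domain */
	bool subdomains;     /* true when domain is a suffix pattern */
};

/* Start at the identity itself; false when it has no user or domain */
static bool sketch_iter_init (struct sketch_iter *it, const char *identity) {
	assert (it != NULL && identity != NULL);
	const char *at = strchr (identity, '@');
	if (at == NULL || at == identity || at [1] == '\0') {
		return false;
	}
	it->user = identity;
	it->userlen = (size_t) (at - identity);
	it->domain = at + 1;
	it->subdomains = false;
	return true;
}

/* Step to the next, more general form; false after "@." was produced */
static bool sketch_iter_next (struct sketch_iter *it) {
	if (it->userlen > 0) {
		it->userlen = 0;             /* drop the user name */
		return true;
	}
	if (*it->domain == '\0') {
		return false;                /* already at "@." */
	}
	const char *dot = strchr (it->domain, '.');
	it->domain = (dot != NULL) ? dot + 1 : it->domain + strlen (it->domain);
	it->subdomains = true;
	return true;
}

/* Render the current item in text form, for display or lookup */
static void sketch_iter_print (const struct sketch_iter *it,
			char *out, size_t outlen) {
	snprintf (out, outlen, "%.*s@%s%s", (int) it->userlen, it->user,
			it->subdomains ? "." : "", it->domain);
}

/* The loop pattern: init, process, then next until it returns false */
static unsigned sketch_count_items (const char *identity) {
	struct sketch_iter it;
	unsigned count = 0;
	if (sketch_iter_init (&it, identity)) do {
		count++;                     /* "process" the current item */
	} while (sketch_iter_next (&it));
	return count;
}
```

The do/while form processes the first item before asking for the next one, which is why _init must be checked first.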
Distance. One possible use of iteration is to count the number of steps to come from one Selector to another. This may fail of course, but when it succeeds the value can be used to select the closest option from a list of matches. This can be constructed from iterators, but doing it cleverly and testing it properly actually takes effort, so there is a simple abstraction for this work.
You can use the steps output from this function to learn which of two abstractions is nearer. You can even split the steps into the steps that were needed for the domain and for the username. This is possible because domain abstraction steps greatly outrank username abstraction steps, and uses:
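Independently of the library's own definitions, the idea of domain steps outranking username steps can be sketched in C as a weighted sum. The weight of 256 and the names `SKETCH_DOMAIN_WEIGHT`, `sketch_steps_combine()`, `sketch_steps_domain()` and `sketch_steps_user()` are assumptions made for this illustration; the actual encoding is documented in the include file.

```c
#include <assert.h>

/* Hypothetical weight: any value larger than the highest possible
 * number of username steps keeps the two counts separable. */
#define SKETCH_DOMAIN_WEIGHT 256u

/* Combine both counts so that one extra domain step always weighs
 * more than any number of username steps. */
static unsigned sketch_steps_combine (unsigned domain_steps,
			unsigned user_steps) {
	assert (user_steps < SKETCH_DOMAIN_WEIGHT);
	return domain_steps * SKETCH_DOMAIN_WEIGHT + user_steps;
}

/* Recover the number of domain abstraction steps */
static unsigned sketch_steps_domain (unsigned steps) {
	return steps / SKETCH_DOMAIN_WEIGHT;
}

/* Recover the number of username abstraction steps */
static unsigned sketch_steps_user (unsigned steps) {
	return steps % SKETCH_DOMAIN_WEIGHT;
}
```

Under this encoding, going from john@example.com to @example.com costs 1, while going to @.com costs 257, so a plain integer comparison picks @example.com as the nearer match.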
A novel feature that we built into this library is lightweight signing of some context from your application into extension characters for the local part. These are easily recognised by the trailing plus, but our parser also maps the letters to uppercase. An example of a signed address would be
This looks awkward, but causes no dismay when simply replying to it or clicking on it. The code contains a few things that will be requested from the application during both signing and verifying. In this case, it assumes communication with a Remote Identity mary@example.net, some protocol-specific session identifier (like the Message-Id in an email) set to SessionID, a human-entered and possibly canonicalised subject string CanonicalSubject and a [Topic] derived from the communication.
Not shown, but cryptographically important, is the inclusion of the domain name example.com and local identity john (so the signature cannot be passed on to another identity), a number of days until expiration of this Identity, and the length of the signature part in this form.
Change anything and the code will flip. For instance, madelin@example.org would have had to use
The local part will never exceed 64 characters, which is required for it to work as an email address. Other protocols tend to be less restrictive, because they were designed in times with less concern for buffer overflows.
To summarise, the signature binds information from the context of the protocol into an Identity, to deliberately constrain its usability and, more importantly, its abusability by others who get hold of the address. The intention is to make it easier for you to share an address over which (some) others can reach you.
Signatures in an address are optional, of course. And protocol-specific parts can be added or left out to suit a user's wishes. When shopping online they might want an address that expires in 14 days; for their job this would not be required, but a restriction to the sender's domain might then be desired.
To sign before sending, or verify before accepting, the following functions can be used:
The idea is to always use a2id_sign() for outgoing traffic from internal senders, and to always use a2id_verify() for incoming traffic to internal recipients. To make this possible, the functions return success when no work had to be done. Failures are only returned when work must be done but then fails. Note that even a2id_verify() does not require that a signature is present; it leaves that to later Access Control stages, but simply bars signatures that are invalid.
Before calling a2id_sign(), you can set the ->sigflags field to your desired minimum and maximum support for A2ID_SIGFLAG_ flags. After a2id_verify(), only the verified-correct flags will be set. Similarly, the ->expiration field indicates the last valid time(NULL) value for the signature. This value is meaningless without the flag A2ID_SIGFLAG_EXPIRE.
These calls are simple enough. The callback allows your service to add some bytes to the mix. These must be the same values during signing and verification. A few base cases are covered by a standard function that may be supplied as the callback, or to which requests may be forwarded.
You can safely call a2id_sign() and a2id_verify() without loading keys for them to work on. The functions quietly ignore all traffic and work a bit on the flags to avoid false pretenses. To actually make these functions work with signatures, you should add a key, preferably before you drop privileges and start serving traffic.
The secret must be presented over a file descriptor and adhere to a simple format. Simple ways of doing this with reasonable security include using a pipe or fifo, but it may also be read from a file. The files can be generated with the a2id-keygen utility, and we suggest /var/lib/arpa2/identity/default.key as storage location.
Details. Have a look at the include file. It has lengthy documentation on each of the functions. This is the place where the details are maintained; this document was merely written to show you around.
To initialise and clean up the ARPA2 Identity / Selector / Iteration concepts, operate on the library,
The following is an example of how ARPA2 Identity reports an error:
The error can be printed with <arpa2/com_err.h>.
Let's say your program has parsed strings for a remote user who wants to connect to a local user. An incoming email would be an example, where these values may be parsed from the MAIL FROM and RCPT TO commands. The following code parses these strings into a2rid and a2lid structures, and verifies any signatures that may have been found on the local identity:
You will find a compiling version of this code in test/id/example.c and you can call it as test/id/example <remote> <local> to play.
There are a few more functions for advanced things, but the above covers the basic flow. Identities are often authenticated as part of the protocol and then, when they are known to be valid, they progress into Access Control to decide what powers to bestow on this user.
This framework deliberately supports the ideas of Realm Crossover so your users can be offered to Bring Your Own IDentity. In other places of our software stack, we are working on tools and libraries that help you with that.
There also is an uplink to Access Control, for which libraries are included in this same ARPA2 Common package. These take great care to also support Realm Crossover, but information is also protected from scanning and browsing, as might be tried after a break-in. To be able to read the information, one must know the information provided by the user through a protocol, and one must hold certain secrets to be able to search and decode information. The idea here is to support a split in the hosting market, and allow Identity Providers to flourish in a fruitful collaboration with Service Providers. The latter will be handed down abilities to query Access Rights, but they will not be permitted to easily iterate the data and put it to other uses.