Arbor models chat conversations as a group of Tree data structures. This particular collection of trees is called the "Arbor Forest", and this document specifies its structure.
There are three types of Node: Identity, Community, and Reply.
The Arbor Forest uses a number of common field types with specific meanings. These types are defined here and will be used to describe each field below.
parent field of nodes that are the root of a tree within the Arbor Forest.
All nodes in the forest share some fields. These are described generally here, though each node type's description may contain more detailed information about the legal values in each field for that node type.
Common fields:
version (SchemaVersion): the version of the node schema in use.
node_type (Node Type): the type of the node within the Forest.
parent (Qualified Hash): the hash of the parent tree node. For Identity and Community nodes, this will always be the null hash of all zeroes.
id_desc (Hash Descriptor): the hash algorithm and digest size that should be used to compute this node's ID.
depth (Tree Depth): the number of levels between this node and the root of its tree. Root nodes have depth 0, and their immediate child nodes have depth 1.
created (Timestamp): when this node was created.
metadata (Qualified Content): TWIG data. Only valid if ContentType is TWIG.
author (Qualified Hash): the ID of the Identity node that signed this node.
signature (Qualified Signature): the actual binary signature of the node. The structure of this field varies by the type of key in the author field. The ContentType of this field should be a signature type of some kind.
All node types are signed and hashed with their data laid out in a specific order on the wire. The procedure for constructing the layout used for hashing and signing is as follows:
Determine the values of these fields:
version
node_type
parent
id_desc
depth
created
metadata
author
Write them into a buffer in the order above, with all integers written in network byte order (big endian).
Sign the contents of the buffer using the key pointed to by author, and use the result as the value of signature.
Concatenate the value of signature to the end of the existing buffer, then hash the entire buffer with the algorithm and digest size specified by id_desc to determine the node's actual ID.
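The procedure above can be sketched roughly as follows. This is a minimal illustration, not the reference implementation: the function name build_signed_node is hypothetical, each field is assumed to be already serialized to bytes, and the signing and hashing algorithms are passed in as callables because their wire formats are not defined in this section. The 2-byte width used for the integer in the usage lines is likewise only an assumption chosen to show big-endian encoding.

```python
import hashlib
import struct
from typing import Callable, Sequence, Tuple


def build_signed_node(
    fields: Sequence[bytes],
    sign: Callable[[bytes], bytes],
    digest: Callable[[bytes], bytes],
) -> Tuple[bytes, bytes]:
    """Return (signature, node_id) for a node.

    `fields` holds the already-serialized field values in the order the
    spec lists them. `sign` and `digest` are hypothetical stand-ins for
    the author's key and the algorithm named by id_desc.
    """
    # Write the fields into one buffer in order.
    buffer = b"".join(fields)
    # Sign the buffer; the result becomes the value of `signature`.
    signature = sign(buffer)
    # Append the signature, then hash the whole buffer to get the node ID.
    node_id = digest(buffer + signature)
    return signature, node_id


# Integers go on the wire in network byte order (big endian).
# The 2-byte width here is an assumption for illustration only.
version_bytes = struct.pack(">H", 1)

# Toy stand-ins: a fake "signature" and plain SHA-512 as the hash.
signature, node_id = build_signed_node(
    [version_bytes, b"remaining-fields"],
    sign=lambda data: b"SIG" + data[:4],
    digest=lambda data: hashlib.sha512(data).digest(),
)
```

Passing the signer and hash in as callables keeps the ordering logic (serialize, sign, append, hash) separate from the choice of key type and hash algorithm, which vary per node.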
An Identity node has the following fields:
name (Qualified Content): must be of type UTF8. The name of the user who controls this key. Maximum length (in bytes) is 256.
public_key (Qualified Key): the binary representation of the public key.
These fields should be processed in the order given above when signing and hashing the node.
A Community node has the following fields:
name (Qualified Content): must be of type UTF8. The name of the community. Maximum length (in bytes) is 256.
These fields should be processed in the order given above when signing and hashing the node.
A Reply node has the following fields:
community_id (Qualified Hash): the node ID of the community node at the root of the tree containing this reply.
conversation_id (Qualified Hash): the node ID of the first reply node in the ancestry of this reply (depth 1).
content (Qualified Content): the message content.
These fields should be processed in the order given above when signing and hashing the node.
Often applications will want to represent Arbor node IDs as strings. The recommended string encoding for a node ID is the following components concatenated:
the name of the hash algorithm (SHA512 in the example below)
an underscore
the letter B followed by the digest length in bytes (B32 in the example below)
a double underscore
the base64url encoding of the hash digest
An example of such a string encoding is: SHA512_B32__CZMk9Gv5g4GYNAPcdvwkDNITsfYFFsTu95jM5Fe4Ekk
Values of this form can be decoded by breaking them at the first occurrence of a double underscore, then decoding the component before it to determine the algorithm and length. The data following the double underscore (when base64url-decoded) should be the length specified after the B prefix (implementations should check this).
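The decoding steps described above might look like the following sketch. The function names encode_node_id and decode_node_id are hypothetical, and the use of unpadded base64url is an assumption inferred from the example string, which carries no trailing "=" characters.

```python
import base64
from typing import Tuple


def encode_node_id(algorithm: str, digest: bytes) -> str:
    """Build a string such as 'SHA512_B32__<base64url digest>'."""
    # Assumption: padding is stripped, as in the spec's example string.
    data = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{algorithm}_B{len(digest)}__{data}"


def decode_node_id(text: str) -> Tuple[str, int, bytes]:
    """Split a node ID string into (algorithm, length, digest bytes)."""
    # Break at the FIRST double underscore; the base64url data that
    # follows may itself contain underscores.
    descriptor, separator, data = text.partition("__")
    if not separator:
        raise ValueError("missing double underscore")
    algorithm, separator, size = descriptor.rpartition("_B")
    if not separator or not algorithm or not (size.isascii() and size.isdigit()):
        raise ValueError("malformed algorithm/length descriptor")
    # Restore the padding that base64 decoding expects.
    digest = base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))
    # Check the decoded length against the length after the B prefix.
    if len(digest) != int(size):
        raise ValueError("digest length does not match descriptor")
    return algorithm, int(size), digest


example = "SHA512_B32__CZMk9Gv5g4GYNAPcdvwkDNITsfYFFsTu95jM5Fe4Ekk"
algorithm, size, digest = decode_node_id(example)
```

Here decode_node_id raises ValueError on a length mismatch, which is one way to perform the check the text asks implementations to make.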
TWIG is a simple data format for key-value pairs of data.
Keys and values are separated by NULL bytes (bytes of value 0). Keys and values may not contain a NULL byte. All other characters are allowed.
Keys have an additional constraint. Each key must contain a "name" and a "version" number. These describe the semantics of the data stored for that key, and the precise meaning is left to the user. The name and version are separated (in the binary format) by a delimiter, which is currently '/'.
The key name may not be empty, but values may be empty. Empty values must still be surrounded by NULL bytes.
In practice, TWIG keys look like (the final slash is the delimiter between key and version):
TWIG Key | Key name | Key Version |
---|---|---|
anexample/235 | anexample | 235 |
heres one with spaces/9 | heres one with spaces | 9 |
heres/one/with/slashes/9 | heres/one/with/slashes | 9 |
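As a rough illustration of the key rules above, the sketch below splits a TWIG key at its final slash. The function name parse_twig_key is hypothetical, and treating the version as a non-negative decimal integer is an assumption based on the examples in the table; the text itself only calls it a version number.

```python
from typing import Tuple


def parse_twig_key(key: str) -> Tuple[str, int]:
    """Split a TWIG key into (name, version) at the final '/'."""
    if "\x00" in key:
        raise ValueError("keys may not contain a NULL byte")
    # The final slash is the delimiter, so the name may contain slashes.
    name, separator, version = key.rpartition("/")
    if not separator or not name:
        raise ValueError("key needs a non-empty name followed by /version")
    # Assumption: versions are non-negative decimal integers.
    if not (version.isascii() and version.isdigit()):
        raise ValueError("version must be a decimal number")
    return name, int(version)


# The three keys from the table above.
parsed = [
    parse_twig_key("anexample/235"),
    parse_twig_key("heres one with spaces/9"),
    parse_twig_key("heres/one/with/slashes/9"),
]
```

Splitting from the right rather than the left is what allows names such as heres/one/with/slashes to keep their internal slashes.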