This document describes the m2dir format for storing a collection of emails on disk. For more information about m2dir, see the project page.
This specification is considered a draft. Changes, even breaking ones, are possible if feedback from actual implementations indicates they are necessary. This status will be updated accordingly once it stabilizes.
M2dir provides a standardized way to store a collection of email messages as files. It is similar to Maildir/Maildir++, but aims to be simpler and more thoroughly specified.
Its goal is to support both synchronization with other hierarchical remote mail stores (e.g. an IMAP account or another m2dir on a remote host) and delivery of new messages (e.g. SMTP delivery or system notifications).
M2dir only specifies the storage mechanism. Any indexing of messages (for their mapping to remote messages, full-text search, etc.) is left to applications.
The name of this specification is m2dir. It mainly defines two things:
The m2dir format has the following defining features:
An m2dir-compatible directory structure consists of a root directory (called the m2store root) and any number of folders (simply called m2dirs).
Such a directory structure in its entirety is called an m2store.
When synchronizing an m2store with remote mail storage, the folders must
accurately reflect the remote's hierarchy, nested according to the remote's
hierarchy delimiter. Specifically, this implies that an m2store mirroring an
IMAP account must not contain any emails in its m2store root. Instead, the
root will contain an m2dir INBOX.
The specification does not preclude an m2store root from also being an m2dir. However, at the current version of the specification, applications are strongly recommended to avoid such a setup.
The only restriction is that a folder name must not start with a period (.),
and any directory starting with a period must be ignored by m2dir-compliant
applications.
An m2store root must contain an empty marker file .m2store to enable
discovery by other m2dir-compatible applications.
The .m2store marker file must be empty. However, applications should
merely check for the file's presence. Future versions of the spec may use the
marker file's content, e.g. to indicate support for a revised version of the
spec.
The only constraints imposed on folder names by the m2dir specification are that
they must be representable as a valid UTF-8 string, must not be empty, and must
not start with a dot (.).
However, further constraints may be imposed by the underlying filesystem and/or operating system. In such circumstances, an application creating a folder may choose to perform percent-encoding of certain characters, as described in RFC 3986, section 2. In the name of legibility of directory names on the filesystem, applications should be conservative in their choice of characters to encode.
Due to the above rule, a percent sign (%) in a folder name must always
be percent-encoded (%25).
When creating a folder, an application may choose to throw an error instead, if the underlying filesystem does not accept a folder name. However, if an application chooses to do any kind of encoding, it must be percent-encoding. All applications performing synchronization to any kind of remote mail store must support percent-encoded folder names.
As stated in the RFC,
For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings.
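As a rough illustration of the encoding rules above, the following Python sketch encodes only a caller-supplied set of problematic characters plus the percent sign itself, using uppercase hexadecimal digits. The function names and the default set of characters to encode are assumptions made for this example; the specification leaves the choice of characters to the application.

```python
from urllib.parse import unquote


def encode_folder_name(name: str, unsafe: str = "/") -> str:
    """Percent-encode '%' and every character listed in `unsafe` (hypothetical helper)."""
    if not name or name.startswith("."):
        raise ValueError("folder names must be non-empty and must not start with a dot")
    encoded = []
    for char in name:
        if char == "%" or char in unsafe:
            # Encode each UTF-8 byte of the character with uppercase hex digits.
            encoded.extend(f"%{byte:02X}" for byte in char.encode("utf-8"))
        else:
            # Everything else is kept verbatim to preserve legibility.
            encoded.append(char)
    return "".join(encoded)


def decode_folder_name(directory_name: str) -> str:
    """Reverse the encoding; raises UnicodeDecodeError on invalid UTF-8."""
    return unquote(directory_name, errors="strict")
```

For example, `encode_folder_name("50% off/deals")` yields `50%25 off%2Fdeals`, and decoding that directory name restores the original folder name.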
An m2store root may contain an entry .delivery to indicate the user's
desired default folder for incoming mail. If present, the entry must meet
one of the following criteria:
- It is a link to an m2dir within the m2store.
- It is a regular file containing the name of an m2dir, relative to the m2store root (e.g. INBOX, not ./INBOX or ~/Mail/INBOX).
Applications must support the link variant. The regular file variant is intended as a backup solution for platforms or filesystems that do not support links. Applications are strongly recommended to support both.
The treatment of the default delivery target is covered in the Mail Delivery section. If configured this way, the m2store root is a valid delivery target, even if it is not itself an m2dir. Otherwise, applications must be configured to deliver to a valid m2dir.
The purpose of this is to allow the following hypothetical setup: a system
administrator configures an m2dir-compatible mail delivery agent to deliver
mails to ~/mail for all users. With the described mechanism, each user can
direct incoming mails to the folder of their choice.
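A minimal Python sketch of how an application might resolve the .delivery entry follows. It supports both the link variant and the regular file variant. The function name is hypothetical, and two details are assumptions of this example rather than statements of the specification: that the regular file holds a folder name relative to the m2store root, and that surrounding whitespace (such as a trailing newline) may be stripped.

```python
from pathlib import Path


def resolve_delivery_target(root: Path) -> Path:
    """Return the m2dir that the .delivery entry of an m2store root points to."""
    entry = root / ".delivery"
    if entry.is_symlink():
        # Link variant: follow the link to the folder it points to.
        target = entry.resolve()
    elif entry.is_file():
        # Regular file variant (assumed format: a name relative to the root).
        name = entry.read_text(encoding="utf-8").strip()
        if not name or Path(name).is_absolute() or name.startswith((".", "~")):
            raise ValueError(f"invalid delivery target: {name!r}")
        target = (root / name).resolve()
    else:
        raise FileNotFoundError(f"{root} has no usable .delivery entry")
    # The target must itself be an m2dir, i.e. contain the marker file.
    if not (target / ".m2dir").exists():
        raise ValueError(f"{target} is not an m2dir")
    return target
```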
The on-disk representation of an m2store that gets synchronized with a typical IMAP account (but also allows for local delivery of new mail) might therefore look like this:
\_ mail/
   \_ .m2store
   \_ .delivery -> INBOX/
   \_ INBOX/
      \_ .m2dir
      \_ .meta/
   \_ Sent/
      \_ .m2dir
      \_ .meta/
   \_ Work/
      \_ .m2dir
      \_ .meta/
   \_ Lists/
      \_ srht-dev/
         \_ .m2dir
         \_ .meta/
      \_ srht-discuss/
         \_ .m2dir
         \_ .meta/
Note: the name mail is just an example; the name of the m2store root
is user-defined.
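To illustrate how such a structure could be traversed, here is a small Python sketch that lists all folders of an m2store. It skips every directory whose name starts with a period and reports only directories containing a .m2dir marker. The function name and the choice to return each folder as a tuple of percent-decoded path components are assumptions of this example.

```python
import os
from urllib.parse import unquote


def list_folders(root: str) -> list[tuple[str, ...]]:
    """List the m2dirs below an m2store root as tuples of folder name components."""
    if not os.path.exists(os.path.join(root, ".m2store")):
        raise ValueError(f"{root} is not an m2store root")
    folders = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune dot-directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if dirpath != root and ".m2dir" in filenames:
            parts = os.path.relpath(dirpath, root).split(os.sep)
            # Folder names on disk may be percent-encoded.
            folders.append(tuple(unquote(part) for part in parts))
    return folders
```

For the example tree above, this would report INBOX, Lists/srht-dev, Lists/srht-discuss, Sent and Work, but not Lists itself, which carries no marker file.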
For more advanced use cases, an m2dir can exist outside the context of an m2store. An example could be backing up one specific mailbox of an IMAP account into a user-specified directory. An m2dir-compliant application can still work with the emails in that directory, but must not make any assumptions about the folder name. Synchronization of changes from or to that directory would for example require that the user explicitly specify the remote mailbox.
A directory that stores emails in m2dir format according to this specification
must contain a marker file .m2dir.
The .m2dir marker file must be empty. However, applications must
merely check for the file's presence. Future versions of the spec may use the
marker file's content, e.g. to indicate support for a revised version of the
spec.
Every file in the m2dir represents an email. Files starting with a period (.)
must be ignored, unless they are specified in this document.
Email metadata (such as flags) is stored in a subdirectory .meta (see
Metadata below). This directory may not exist, even in the presence of emails
in the m2dir, if no metadata about these emails has been recorded yet.
All directories in an m2dir should be ignored, unless the m2dir is embedded
in an m2dir-compliant m2store directory structure with a known m2store root.
Directories whose name starts with a period (.) must be ignored, unless they
are specified in this document.
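The rules above can be sketched in a few lines of Python. This example treats the m2dir as standalone (so all subdirectories are ignored) and returns the names of the message files. The function name is hypothetical.

```python
from pathlib import Path


def list_messages(m2dir: Path) -> list[str]:
    """Return the filenames of all messages stored in a standalone m2dir."""
    # Only the presence of the marker file is checked, not its content.
    if not (m2dir / ".m2dir").exists():
        raise ValueError(f"{m2dir} is not an m2dir")
    return sorted(
        entry.name
        for entry in m2dir.iterdir()
        # Skip dot-files (marker, temporary files) and all directories.
        if entry.is_file() and not entry.name.startswith(".")
    )
```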
New files must be created according to the Mail delivery section below.
A message's filename is structured as follows:
<HUMAN_CENTRIC_PART>,<UNIQUE_ID>
The unique ID part is structured as follows:
<CHECKSUM>[.<COUNTER>]
The checksum must be an RFC 4648 base64url-encoded string
representing 12 bytes of data (see Unique ID below). This implies that
it must not contain any padding characters and must contain only the
non-padding characters from the RFC's "URL and Filename safe" Base64 alphabet
([A-Za-z0-9_-]).
To handle checksum collisions, an integer greater than zero can be appended to
the checksum, separated by a dot (.).
The unique ID of a message must be generated according to the rules described in the Unique ID section.
An m2dir-compliant application must parse the ID by searching backwards
from the end of the filename for the first comma (,). This is because the
human-centric part may contain commas itself.
Note that applications must not attempt to parse the human-centric part or derive any properties from it.
Example filename of an email in an m2dir, using the specification's example naming scheme for the human-centric part of the filename:
2023-09-04_13:47_builds@sr.ht,GTfrlwJfN5vyR28R
Storing the same message twice leads to a hash collision. Therefore, the next copy would have the filename:
2023-09-04_13:47_builds@sr.ht,GTfrlwJfN5vyR28R.1
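A possible way to parse such filenames is sketched below in Python. The filename is split at the last comma, and the unique ID is validated against the structure described above (16 base64url characters, optionally followed by a dot and a positive integer). The names `MessageName` and `parse_filename` are made up for this example, and rejecting counters with leading zeros is an assumption.

```python
import re
from typing import NamedTuple, Optional

# 12 bytes of data encode to exactly 16 base64url characters.
UNIQUE_ID = re.compile(r"([A-Za-z0-9_-]{16})(?:\.([1-9][0-9]*))?")


class MessageName(NamedTuple):
    human_part: str
    checksum: str
    counter: Optional[int]


def parse_filename(filename: str) -> MessageName:
    """Split a message filename into human-centric part, checksum and counter."""
    # Search from the end: the human-centric part may itself contain commas.
    human_part, separator, unique_id = filename.rpartition(",")
    if not separator:
        raise ValueError(f"no comma in filename: {filename!r}")
    match = UNIQUE_ID.fullmatch(unique_id)
    if match is None:
        raise ValueError(f"malformed unique ID: {unique_id!r}")
    counter = int(match.group(2)) if match.group(2) else None
    return MessageName(human_part, match.group(1), counter)
```

The human-centric part is returned untouched; in line with the specification, the sketch does not try to interpret it.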
Metadata about emails is stored in separate files in the .meta subdirectory
of an m2dir. Each type of metadata is stored in its own file, following the
naming convention:
.meta/<UNIQUE_ID>.<EXTENSION>
Currently, the following types of metadata are defined:
.meta/<UNIQUE_ID>.flags
The unique ID must be generated according to the following specification.
The value S is defined as the little-endian representation of the 32-bit
integer size of the message in bytes.
The entire message is hashed, using the FNV64a hash function, salted with S.
The final checksum is the base64url-encoded representation of the four
bytes of S concatenated with the eight bytes of hash output (which is also
assumed to be little-endian).
As the input for the base64url-encoding is exactly 12 bytes, the resulting string will be 16 characters long and not contain any padding.
If, and only if, a checksum collision is detected (which likely means a
duplicate message), the ID is made unique by appending a dot (.) followed by
the first integer starting with 1 that will prevent a collision.
Example:
- The first message with checksum X gets the ID X
- The second message with checksum X gets the ID X.1
- The third message with checksum X gets the ID X.2
With this scheme, changes to a message (which should not occur) can be detected by re-computing the checksum and comparing it to the value extracted from the filename.
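The following Python sketch shows one way the unique ID could be computed. It rests on an important assumption: the text does not say how the salt is applied, so this example feeds the four bytes of S into the hash before the message bytes. An implementation that needs to interoperate should confirm this against an existing implementation. The function names are hypothetical; the FNV-1a constants are the standard 64-bit offset basis and prime.

```python
import base64
import struct

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv64a(data: bytes, state: int = FNV64_OFFSET_BASIS) -> int:
    """64-bit FNV-1a; `state` allows continuing a previous hash."""
    for byte in data:
        state ^= byte
        state = (state * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return state


def checksum(message: bytes) -> str:
    """Return the 16-character base64url checksum of a message."""
    # S: message size as a little-endian 32-bit integer
    # (struct.error is raised if the size does not fit).
    size = struct.pack("<I", len(message))
    # ASSUMPTION: "salted with S" is taken to mean hashing S first.
    digest = fnv64a(message, fnv64a(size))
    raw = size + struct.pack("<Q", digest)  # 4 + 8 = 12 bytes, so no padding
    return base64.urlsafe_b64encode(raw).decode("ascii")


def unique_id(message: bytes, existing_ids: set[str]) -> str:
    """Pick the first free ID, appending .1, .2, ... on collision."""
    base = checksum(message)
    if base not in existing_ids:
        return base
    counter = 1
    while f"{base}.{counter}" in existing_ids:
        counter += 1
    return f"{base}.{counter}"
```

Here `existing_ids` stands for the IDs already present in the target m2dir. Verifying a stored message then amounts to recomputing `checksum` over the file contents and comparing it with the checksum part of the filename.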
M2dir allows associating a set of arbitrary flags with a message. These flags are considered metadata and stored in a separate file as defined in the Metadata section. This section defines the format of this file.
The flags file must contain a set of flags, one flag per line, lines separated by a single newline character (ASCII character LF, 0x0A). The empty set of flags may be represented either by an empty file or the absence of a flags file. Each flag must be a valid, non-empty UTF-8 string. Flags must not contain any control characters.
The m2dir specification is only concerned with storage. Therefore, it imposes no further restrictions on the permitted flag names, but it is strongly recommended that applications limit the flag names to a conservative subset (such as alphanumeric ASCII characters only, or the allowed characters for IMAP keywords), for interoperability.
Similarly, with m2dir being concerned with storage only, it treats flags as case-sensitive. It is up to an application to normalize flags or compare them in a case-insensitive manner if the use-case calls for it (IMAP for example considers flags case-insensitive).
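As an illustration of the flags file format, the Python sketch below reads and writes the flags of a message. The helper names are hypothetical. Two points are assumptions of this example: flags are written in sorted order without a trailing newline, and the reader tolerates empty lines (such as one caused by a trailing newline) by skipping them.

```python
import unicodedata
from pathlib import Path


def flags_path(m2dir: Path, unique_id: str) -> Path:
    return m2dir / ".meta" / f"{unique_id}.flags"


def read_flags(m2dir: Path, unique_id: str) -> set[str]:
    """Return the flags of a message; a missing file means no flags."""
    try:
        data = flags_path(m2dir, unique_id).read_bytes()
    except FileNotFoundError:
        return set()
    # Flags are case-sensitive, so no normalization is applied here.
    return {line for line in data.decode("utf-8").split("\n") if line}


def write_flags(m2dir: Path, unique_id: str, flags: set[str]) -> None:
    """Store one flag per line, separated by a single LF."""
    for flag in flags:
        # "Cc" is the Unicode category for control characters (includes LF).
        if not flag or any(unicodedata.category(c) == "Cc" for c in flag):
            raise ValueError(f"invalid flag: {flag!r}")
    path = flags_path(m2dir, unique_id)
    path.parent.mkdir(exist_ok=True)  # .meta may not exist yet
    path.write_bytes("\n".join(sorted(flags)).encode("utf-8"))
```

The write in this sketch is not atomic; an application that cares about concurrent readers might write to a temporary file and rename it instead.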
While the m2dir specification does allow arbitrary flags, it also specifies a set of standard flags for very common use-cases. It is strongly recommended that an application synchronizing a remote mail store with an m2store map whatever flags the remote storage may be using for these common use-cases to the ones defined here (and vice versa). This will help to preserve semantics, even if mail were to be replicated to yet another remote store that potentially uses different flags.
These flags are used for special purposes and are usually not presented
verbatim to the user (though they may trigger certain visual cues in the
presentation, such as the highlighting of unread messages). As such, they start
with a dollar sign ($) to avoid conflicts with user-defined flags. Note that
it is technically possible to have user-defined flags starting with a dollar
sign, but it is strongly recommended that applications do not allow this.
The standard flags are all IANA-defined IMAP keywords, verbatim,
minus the reserved $recent, plus the IMAP flag \Deleted, but
with the leading \ replaced with a $.
At the time of this writing, these are:
$seen - Message has been read.
$answered - Message has been answered.
$Forwarded - Message has been forwarded.
$flagged - Message is "flagged" (by the user) for urgent/special attention.
$Deleted - Message is marked "deleted", for later removal.
$draft - Message has not completed composition (marked as a draft).
$Important - Message is marked as "important".
$MDNSent - A Message Disposition Notification has been sent.
$Junk - Message definitely contains junk.
$NotJunk - Message does definitely not contain junk.
$Phishing - Message is likely a phishing attempt.
New flags may be defined later. Any new keywords added to the IANA registry automatically become a standard m2dir flag.
An application delivering a new message which originates from a remote without
a well-defined folder hierarchy (for example SMTP-delivery) must perform
the following steps to determine the final storage location for the message. It
is assumed that the application has a configured target directory for mail for
a certain user (e.g. ~/Mail):
1. If the target directory is an m2dir (i.e. it contains a .m2dir marker file), deliver the message into it; done
2. Otherwise, check that the target directory is an m2store root (i.e. it contains a .m2store marker file)
3. Check that the m2store root contains an entry .delivery that is valid according to the rules described in the Default Target section.
4. Deliver the message into the m2dir indicated by the .delivery entry; done
When delivering a new message into an m2dir, it is first written to a temporary
file in the target directory. This temporary file's name must start with a
period (.) in order to be ignored by compliant applications. In addition, it
is strongly recommended that applications employ established mechanisms for
secure temporary file creation (such as mkstemp(3)). Once the file
is complete, it is renamed ("moved") to its final destination according to the
specification. As the final destination is in the same directory, this
operation can reasonably be assumed to be atomic.
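The write-then-rename procedure could look roughly like the following Python sketch, which uses `tempfile.mkstemp` (the Python counterpart of mkstemp(3)) with a dot prefix. The function name `deliver` is hypothetical, and the caller is assumed to have already chosen a final filename with a unique ID that does not collide with an existing message, since a rename on POSIX systems silently replaces an existing file.

```python
import os
import tempfile


def deliver(m2dir: str, filename: str, message: bytes) -> str:
    """Write a message to a hidden temporary file, then rename it into place."""
    # The "." prefix makes compliant applications ignore the unfinished file.
    fd, tmp_path = tempfile.mkstemp(prefix=".", dir=m2dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(message)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        final_path = os.path.join(m2dir, filename)
        # Same directory, hence same filesystem: the rename is atomic.
        os.rename(tmp_path, final_path)
    except BaseException:
        # Do not leave temporary files behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

Note that `mkstemp` creates the file readable and writable only by the owner; whether those permissions are appropriate for the final message file is left to the application.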
The purpose of the human-centric part is solely to provide some context to a human operator to differentiate emails in a meaningful way. Applications must not attempt to parse the human-centric part or derive any properties from it. It is purely for human consumption.
The actual contents of the human-centric part of the filename are intentionally unspecified. Applications are free to come up with their own naming schemes, or even offer users a choice between different ones.
The only requirement is that the human-centric part of the filename must not change, unless explicitly requested by the user. This is to prevent unexpected breakage of any index the user may have on top of the message store.
The following example shall illustrate the purpose of the human-centric part of the filename. It is purely informational.
An application might choose the following naming scheme for the human-centric part:
<DATE>_<FROM>
Where
<DATE> is the date from the email's Date header in the following format: YYYY-MM-DD_hh:mm or, in other words, equivalent to the output of date '+%Y-%m-%d_%H:%M'
<FROM> is the address part of the email's From header
Example:
2023-09-04_13:47_builds@sr.ht
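For illustration only, this example naming scheme could be implemented in Python with the standard library's email utilities. The function name is made up, and the sketch assumes that the date is rendered in the time zone given in the Date header, which the example scheme does not specify.

```python
from email.utils import parseaddr, parsedate_to_datetime


def human_centric_part(date_header: str, from_header: str) -> str:
    """Build <DATE>_<FROM> from the raw Date and From header values."""
    # ASSUMPTION: the timestamp is kept in the header's own time zone.
    date = parsedate_to_datetime(date_header)
    _display_name, address = parseaddr(from_header)
    return f"{date.strftime('%Y-%m-%d_%H:%M')}_{address}"
```

Calling `human_centric_part("Mon, 04 Sep 2023 13:47:12 +0200", "builds.sr.ht <builds@sr.ht>")` produces the example name shown above. A real application would also need to handle missing or malformed headers and characters that are not allowed in filenames.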
Using the date in the specified notation as first part will naturally sort the
messages by date if alphabetic sorting is applied (as is common e.g. in the
output of ls). Pretty much any email client presents messages sorted by date,
so the (easy to establish) alphabetical order of files would nicely match the
chronological order which users are used to.
Due to this common presentation, the date is also something that many people "mentally index" their mail by, consciously or not (think e.g. "I got this mail yesterday", or "Rob sent this last week"). Therefore, making the date easily readable would be another human-centric feature.
The "From:" address is also considered (by the author of this example) to be an important distinguishing feature of an email. The idea is, given a moderate number of messages (say fewer than 50), to enable a user to find the right one just by looking at the filenames.
This work is marked with CC0 1.0 🅭 🄍
commit 72f8841a7c39f3ea51418476bd5eae4ccb3cbe5b Author: Conrad Hoffmann <ch@bitfehler.net> Date: 2024-04-18T16:20:55+02:00 Specify encoding for folder names if required