A number of heated arguments have taken place on Gale about category-related issues. Endless debate rages about the use of dots or slashes for intracategory separators, and proposals are often floated to make the category structure "out of band", to change the way it relates to key names, and otherwise to change the conventions, common practice, and system support that Gale users rely on.
But the dot-versus-slash issue is only the surface of the matter. It is not my purpose here to support one particular position over another, but instead to attempt to enumerate the choices under consideration, propose some common terminology, and present the standard arguments for common positions.
I'd like to introduce the concept of a model. This is an overloaded term, so I'll describe local usage. A model is an abstract conceptual structure for some class of objects (such as categories) with some semantics attached. Models are independent of the concrete syntax used to express objects in ASCII text (or any other medium). Models may be understood and interpreted by automated systems, human beings, or both.
For example, in the model for Gale messages, a message has a category and message contents; the contents are a sequence of message fragments; and each fragment has a name, a type (text, integer, time, binary data, or nested list of fragments), and data (depending on the type). This model is understood by Gale servers (which route messages according to their category), clients (which interpret fragments for display to the users), and users (who have some vague notion of the attributes of a message and how to manipulate them with tools). Associated with this model are several concrete representations, including the format used to transmit messages over the network, the format used to represent messages in process memory, the format used to communicate messages from "gsub" to "gsubrc", and many formats used to display messages to the user.
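To make the distinction between model and representation concrete, here is a minimal sketch of the message model as Python data types. It is only one possible in-memory representation; the class names, the `kind` method, and the fragment names in the usage note are hypothetical and are not taken from the Gale source.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Union

# The five fragment types named in the text. The nested-list case refers
# back to Fragment itself, which is what makes fragments nestable.
FragmentData = Union[str, int, datetime, bytes, List["Fragment"]]


@dataclass
class Fragment:
    name: str
    data: FragmentData

    def kind(self) -> str:
        """Report the fragment type, derived from the Python type of the data."""
        if isinstance(self.data, str):
            return "text"
        if isinstance(self.data, int):
            return "integer"
        if isinstance(self.data, datetime):
            return "time"
        if isinstance(self.data, bytes):
            return "binary data"
        return "list"


@dataclass
class Message:
    category: str  # kept as an opaque string here; it has its own model
    fragments: List[Fragment] = field(default_factory=list)
```

A message could then be built as `Message("pub/beer", [Fragment("body", "Cheers"), Fragment("count", 3)])`. The network format, the "gsubrc" format, and the display formats would each be a separate serialization of this same structure.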
Models may be nested; the model for messages includes a "category", which has its own model. (Most of this document concerns itself with the category model!) As mentioned, models can have concrete syntax associated with them. A model may have multiple syntaxes, and syntaxes can be in terms of other models; the syntax for the model of interaction with a Gale "bot" is built on the model used to describe individual Gale messages.
All this philosophical pedantry may seem gratuitous, but I think it will help clarify some issues later.
Most people view a category as consisting of an optional Internet hostname (for "directed categories") and an ordered sequence of tokens. The category as a whole represents a topic of discussion, the hostname indicates which server is the primary clearinghouse for messages on that topic, and the sequence of tokens is used to describe the topic's position in an ontological hierarchy.
For example, the category which is represented in the syntax used by the current system (and the "slash" camp) as "@ofb.net/pub/me/almccon" has a hostname of "ofb.net" and tokens "pub", "me", and "almccon", establishing the message as public-interest, personally-relevant, and pertaining to Alan McConchie, respectively. The "dot" camp uses a different syntax, and would represent the same category as "@ofb.net/pub.me.almccon".
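The mapping from either syntax to the same abstract category can be sketched as a small parser. This is an illustration only: `parse_category` is a hypothetical helper, and it assumes that the hostname of a directed category always ends at the first slash, as in both examples above.

```python
from typing import List, Optional, Tuple


def parse_category(text: str, delimiter: str = "/") -> Tuple[Optional[str], List[str]]:
    """Split concrete category text into (hostname, tokens)."""
    host = None
    sub = text
    if text.startswith("@"):
        # Directed category: the hostname runs up to the first slash,
        # regardless of which delimiter separates the tokens.
        host, _, sub = text[1:].partition("/")
    # Drop empty tokens so a trailing delimiter does not add a token.
    tokens = [t for t in sub.split(delimiter) if t]
    return host, tokens


slash_form = parse_category("@ofb.net/pub/me/almccon", delimiter="/")
dot_form = parse_category("@ofb.net/pub.me.almccon", delimiter=".")
```

Both calls produce the hostname "ofb.net" and the tokens "pub", "me", "almccon": two concrete syntaxes for one object in the model.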
Most software in the Gale system operates on a slightly different model. In this model, categories consist of an optional Internet hostname (the domain) and an ordered sequence of characters (the subcategory). Category hierarchy is determined by prefix comparison; "pub/foo" "contains" "pub/foobar" as well as "pub/footsie" and "pub/foo/bar". The software dictates a single concrete syntax for categories, either "@dom.ain/subcategory" or just "subcategory" (for nondirected categories). This is the only category model the Gale server understands.
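The prefix comparison described here amounts to a single string test. The sketch below is not the server's code; `char_contains` is a hypothetical name, and the domain part of directed categories is ignored for brevity.

```python
def char_contains(outer: str, inner: str) -> bool:
    """Character model: one category contains another if it is a prefix of it."""
    return inner.startswith(outer)


# The examples from the text: "pub/foo" contains all three.
examples = ["pub/foobar", "pub/footsie", "pub/foo/bar"]
results = [char_contains("pub/foo", c) for c in examples]
```

Note that the slash plays no special role in this test; it is just another character.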
I'll call the token-based human interpretation the word model, and the character-based software interpretation the character model.
Categories in the word model must be represented in terms of the character model. If the word-model tokens are concatenated with delimiter characters, the semantics of the character model's hierarchy will match the semantics of the word model's hierarchy. However, since the software doesn't dictate which delimiter to use, people are free to use different ones, and indeed this is where the famous dot-slash issue arises.
To make matters more confusing, some Gale client software uses a model of "user categories" which are associated with a specific user in a domain. This model is actually built on the word model. These categories use the domain of the user, and at least two tokens: the literal string "user", and the user's ID. Extra tokens (such as "receipt") are added in some cases. The software chooses to use the "slash" convention to represent these categories in characters.
Categories may be combined. Messages may have several categories, and user subscriptions list all the categories of interest. In both cases, some of the categories can be positive, and some can be negative; the order of categories is significant. This is the model used for subscriptions. This model has a concrete syntax which uses ":", "+", and "-", along with the aforementioned concrete syntax for the individual categories themselves.
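As a rough sketch of how such a combined expression might be handled, the code below splits on ":" and reads a leading "+" or "-" as the sign of each category. The exact grammar and the matching rule are assumptions made for illustration: here an unmarked category counts as positive, and the last matching entry decides, which is one way for order to be significant. The real system may resolve matches differently.

```python
from typing import List, Tuple


def parse_subscription(text: str) -> List[Tuple[bool, str]]:
    """Split a subscription into ordered (is_positive, category) entries."""
    entries = []
    for item in text.split(":"):
        if not item:
            continue  # ignore empty pieces such as a trailing ":"
        positive = not item.startswith("-")
        if item[0] in "+-":
            item = item[1:]  # strip the sign marker
        entries.append((positive, item))
    return entries


def wanted(subscription: str, category: str) -> bool:
    """ASSUMPTION: the last entry that prefix-matches the category wins."""
    result = False
    for positive, prefix in parse_subscription(subscription):
        if category.startswith(prefix):
            result = positive
    return result
```

Under these assumptions, `wanted("pub:-pub/spam", "pub/beer")` is true and `wanted("pub:-pub/spam", "pub/spam/offers")` is false.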
The existing system uses concrete syntax that represents categories and subscriptions using human-readable Unicode text. This text is then embedded in the Gale protocol (using UCS-2), stored in memory as a text string (as a counted array of wchar_t), and otherwise manipulated as text.
Some have proposed transmitting categories using more complex structure without text encoding. For example, a binary encoding could be used, or something like XML. This approach relies more heavily on an abstract model and multiple representations of the model (one for protocol transmission, one for memory representation, etc.).
It's worth noting that out-of-band signalling can still use text encoding, as long as the delimiter characters are rare, or their occurrence in other components is automatically escaped, and the text is automatically parsed and unparsed. The text representation becomes a transfer format rather than a canonical syntax.
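Automatic escaping of this kind might look like the following sketch. The backslash as escape character and the function names are assumptions chosen for illustration; nothing here reflects an existing Gale format.

```python
from typing import List

ESCAPE = "\\"  # hypothetical escape character


def unparse(tokens: List[str], delimiter: str = "/") -> str:
    """Join tokens into text, escaping any delimiter or escape character."""
    escaped = []
    for token in tokens:
        token = token.replace(ESCAPE, ESCAPE + ESCAPE)
        token = token.replace(delimiter, ESCAPE + delimiter)
        escaped.append(token)
    return delimiter.join(escaped)


def parse(text: str, delimiter: str = "/") -> List[str]:
    """Inverse of unparse: split on unescaped delimiters only."""
    tokens, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == ESCAPE:
            # Take the next character literally, whatever it is.
            current.append(next(chars, ESCAPE))
        elif ch == delimiter:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


tokens = ["pub", "X/Open", "news"]
wire = unparse(tokens)
```

Because `parse(unparse(tokens))` returns the original tokens, a token such as "X/Open" survives the trip, and users never need to see the escaped form unless the interface chooses to show it.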
The choice for in-band vs. out-of-band signalling can be made independently for intercategory separation (":"), directed category structure ("@.../..."), and word-model structure ("/").
It works, why fix it?
Text strings provide a natural way to refer to a category or subscription in a wide variety of media. For example, people may wish to talk about categories in the body of a Gale puff; this gives them a single, canonical way to refer to them. It also makes simple user interfaces easy for client implementors, since they can just read and print category strings directly (and most user interface systems have well-developed text I/O facilities).
With in-band signalling, the "metacharacters" (such as "@", "/", ":", etc.) used to describe the structure of a category are forbidden from use in category components (subcategory tokens or characters). They can be "escaped", but this requires additional mechanism. Out-of-band signalling can free up the entire Unicode character set for use in category components.
Furthermore, by not mandating or encouraging a single text representation, user interface designers are free to represent the abstract model of a category in the most natural way possible. Users could even configure their interface to use the representation they prefer for input and output.
Last, and least, since any particular protocol can use an encoding optimized for the situation, parsing and unparsing time is minimized.
The server uses the character model to match categories with subscriptions; most users think in terms of the word model. Some have proposed eliminating the discrepancy and reprogramming the server to match categories with subscriptions using a token hierarchy. (Assuming in-band signalling is retained, this means formalizing an intracategory delimiter in code.)
It works, why fix it?
For users of ideographic languages, "words" and "characters" are equivalent, and it would be redundant to mandate delimiters.
The word model can be built on top of the character model, and this gives groups of users the most freedom to choose the delimiter that pleases them. In some cases, it may be useful to use categories which don't contain words at all; for example, one could represent geographical locations by encoded binary category strings, and allow people to construct variable-size regions of interest with category expressions. The character model is a simple basis for almost any use.
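One way such a geographic encoding could work is sketched below, using a quadtree-style key in which each character narrows the region to one quarter of the previous one. The scheme, the function name, and the digit alphabet are all assumptions for illustration; the text does not specify any particular encoding.

```python
def quadkey(lat: float, lon: float, depth: int) -> str:
    """Encode a location as a string of digits 0-3, one per quadtree level."""
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        digit = 0
        if lat >= mid_lat:
            digit |= 2          # northern half
            south = mid_lat
        else:
            north = mid_lat
        if lon >= mid_lon:
            digit |= 1          # eastern half
            west = mid_lon
        else:
            east = mid_lon
        digits.append(str(digit))
    return "".join(digits)


fine = quadkey(34.14, -118.13, 12)   # a small cell
coarse = quadkey(34.14, -118.13, 4)  # a much larger cell containing it
```

Because the coarse key is a prefix of the fine key, a character-model subscription to the coarse key would receive messages posted to any finer cell inside it, with no delimiter anywhere in the string.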
The character model is actually not a good substrate for the word model. To see why, assume people want to talk about bats on "pub/bat". Others want to talk about bathtubs, and use "pub/bathtub". In the word model, this makes perfect sense. However, with a character-based subscription system, people subscribed to "pub/bat" will receive messages sent to "pub/bathtub".
This problem can be eliminated by subscribing and posting to "pub/bat/" and "pub/bathtub/", but this is annoying. The trailing slash could be added automatically, but this is effectively an implementation of the word model (including the requirement to formalize the choice of delimiter).
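The difference between the two matching rules, and the trailing-slash workaround, can be seen in a short sketch. Both function names are hypothetical, and the slash is assumed as the formalized delimiter for the word-model version.

```python
def char_match(subscription: str, category: str) -> bool:
    """Character model: plain prefix comparison."""
    return category.startswith(subscription)


def word_match(subscription: str, category: str, delimiter: str = "/") -> bool:
    """Word model: the subscription's tokens must be a prefix of the category's tokens."""
    sub_tokens = subscription.split(delimiter)
    cat_tokens = category.split(delimiter)
    return cat_tokens[:len(sub_tokens)] == sub_tokens


# The bat subscriber receives bathtub traffic under the character model...
leak = char_match("pub/bat", "pub/bathtub")
# ...but not under the word model, and not with the trailing-slash workaround.
no_leak_word = word_match("pub/bat", "pub/bathtub")
no_leak_slash = char_match("pub/bat/", "pub/bathtub/")
```

The word-model version still delivers "pub/bat/fruit" to a "pub/bat" subscriber, so the hierarchy is preserved while the accidental overlap disappears.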
Furthermore, it may be a bad idea to offer choice of delimiter; it only provokes dot-vs-slash-style issues. Mandating a single delimiter will enforce consistency, and it will make it easier for people to write tools which reason about the structure of Gale categories (such as statistics collectors, autocategorization systems, log browsers, and other utilities).
The domain portion of a directed category is already word-delimited (and has an Internet-standard delimiter of "."), so it would be more consistent for the subcategory portion to also be word-delimited.
Gale clients already use the word model to construct user categories.
Even users of ideographic languages will want tokens which are composed of multiple words. Despite the term "word-based", each token can contain more than one word: "pub/movies/fight-club". If each word were represented by a single character, this would be "p/m/fc", and the delimiters would still be useful.
Clients currently use the user category model to associate personal categories with key IDs. This is built on the word model, and turns key IDs like "egnor@ofb.net" into categories like "@ofb.net/user/egnor/".
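The transformation can be sketched as follows. This is not the client's actual code: `user_category` is a hypothetical helper, it assumes a well-formed "user@domain" key ID, and the placement of extra tokens (such as "receipt") before the trailing slash is a guess.

```python
def user_category(key_id: str, *extra: str) -> str:
    """Build the personal category for a key ID of the form user@domain."""
    user, _, domain = key_id.partition("@")
    # Word-model tokens: the literal "user", the user's ID, then any extras.
    tokens = ["user", user, *extra]
    # Slash convention, with a trailing slash as in the example above.
    return "@" + domain + "/" + "/".join(tokens) + "/"


personal = user_category("egnor@ofb.net")
```

Here `personal` is "@ofb.net/user/egnor/", matching the example in the text.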
Because these personal categories are automatically generated, users tend to model Gale messages as either "private" or "public", where public messages have a category, and private messages have one or more recipients. Clients manage the transformation of this model into the model understood by the system, where Gale messages always have a category, and are sometimes encrypted for one or more keys.
This difference often leads to confusion. The interaction between command-line arguments, the "-c" flag, and the "-C" flag to gsend is bewildering to inexperienced Gale users. Subscribing to personal messages from a different domain requires magic incantations, as does setting up and participating in an encrypted message group. Experienced users must eventually learn the system's model as well as the transformation from the users' model.
There are two proposals for making the system more natural and easy to use.
For one, Gale could eliminate the two models and adopt one hybrid model for both system and user use; in this model, categories and keys would have the same syntax (probably the e-mail-like "user@dom.ain", with subcategories tacked on somewhere), and whenever sending a message to a category, the system encrypts the message with the matching key. This allows users to send to "private", "public", and "group" destinations with the same syntax.
The other proposal suggests keeping the system's category model as it is, but extending the support for the user's model so that most users need never be aware of the transformation. This means making it easy for users to subscribe from other domains, operate and use encrypted groups, and perform other common tasks that currently require deep understanding of the Gale category and security system.
It works, why fix it?
The system is confusing to users now, so it should be changed. There's no sense having a system-under-the-system if nobody ever sees it, so the system's implementation should match the user's model as closely as possible to minimize needless complexity.
Users think of categories ("public discussion areas"), encrypted groups ("private discussion areas") and keys (individual people) as interchangeable destinations for messages. We should support this directly. Categories should correspond to keys.
Associating keys with categories offers a natural place for administrative control of discussion areas, e.g. for moderation. Currently, categories are a big sea, and there's no particular datum that represents the "policy" associated with an area. Category-key unification would provide this.
The current directed category syntax ("@dom.ain/sub/cat") is weird; moving to something that looks more like an e-mail address will be more natural for users.
Architectural upheaval is neither necessary nor useful. The current system architecture clearly and cleanly represents what is possible for a system like Gale to do: route messages through an untrusted server mesh based on categories, and encrypt and decrypt messages at endpoints. Anything else will ultimately have to be built on such a foundation. Changing the foundation to a different, higher-level system can only reduce the flexibility of the system, can't introduce any new functionality, and introduces additional software layers; the architectural simplicity claimed by the unification arguments is bogus.
Under the unification proposal, it will be necessary to figure out how to create public categories. Do they have a domain, but no username? Do they have an actual key that's special in some way? It will also be necessary to figure out how to append flexible subcategories on keys ("pub@ofb.net/beer"? "pub/beer@ofb.net"?). No matter how this is solved, the result will still be somewhat alien to people used to simple e-mail addresses, so the syntactic familiarity claimed by the unification arguments is bogus.
Users do have some problems, but that's because their tools are insufficient. They should be given scripts or options or wizards to accomplish common tasks (like cross-domain subscription or encrypted group management). This will solve the problem without further setting back Gale's progress to release. While "GALE_OTHERS" is a bad name, it is a first step on this path, making cross-domain subscription much easier. Symlink keys are another example of such a mechanism.
Slash and Burn, by Seth LaForge.
The Delimiter Manifesto, by John Reese.
Traditionally, Gale users have used a slash-based syntax to represent word-model categories in the basic character-model system. More recently, a group of self-described "revolutionaries" have agitated to use a period as the delimiter character. This is only an issue if in-band signalling and a character-based subscription model are retained. (If a word-based subscription model is adopted, the issue will flare up briefly, as the software implementor will have to choose a delimiter, once and for all.)
There are two primary kinds of arguments for specific delimiters. Legacy arguments compare Gale's category structure to other widely-known hierarchical namespaces, and attempt to find a delimiter which evokes systems that have similar semantics. Typographical arguments examine the physical look and common English usage of the available characters, and attempt to find a delimiter which lends the right visual structure and intuitive meaning to categories.
Gale's category namespace has three major characteristics worth comparing to other systems. It is explicitly hierarchical; if you subscribe to "pub*beer", you receive messages sent to "pub*beer*fosters". Furthermore, the direction of the hierarchy is left-to-right; the most general components appear on the left, the most specific on the right. Last, Gale categories are generally not rooted; there is no leading delimiter -- we use "pub/beer", not "/pub/beer".
Let's compare systems.
| System | Example | Hierarchical? | Direction? | Rooted? |
|---|---|---|---|---|
| Gale | pub*beer*fosters | Yes | Left-to-Right | No** |
| Filesystems | /etc/passwd, C:\WINDOWS\DESKTOP | Yes | Left-to-Right | Yes |
| Filenames | foo.txt | No | Right-to-Left | No |
| URLs | http://gale.org/users/index.html | Yes | Left-to-Right | Yes |
| USENET | alt.sex.aluminum.baseball.bat | No* | Left-to-Right (by convention) | No |
| DNS | vo.mit.edu | Yes | Right-to-Left | No** |
| OOP | class::object.pointer->member.method() | Yes | Left-to-Right | No |
| Zephyr | krak.discuss.ksh (delimiter varies) | No* | Left-to-Right (by convention) | No |
| LDAP | cn=Quality Control, ou=manufacturing, o=Ace Industry, c=US | Yes | Right-to-Left | No |
| Arabic numerals | 12.345, -175 | Yes | Left-to-Right | No |
* While USENET and Zephyr are often used in a hierarchical way, their technology enforces no hierarchy. People who subscribe to "alt.sex" don't get messages posted to "alt.sex.aluminum.baseball.bat". Furthermore, because the hierarchy is purely social, it is often violated; the aforementioned sluggerphiles' newsgroup should really have been "alt.sex.baseball-bat.aluminum", or maybe "alt.sex.aluminum-baseball-bat", but it wasn't (this is a real example!).
** It's worth noting that Gale is only unrooted as a matter of convention, and that subcategories of directed categories do effectively have a leading slash. Similarly, DNS is technically rooted ("www.slashdot.org.") but the root is not commonly used outside of zone files.
In any case, no perfect match exists, if only because warring proponents kept introducing characteristics until this was true. The factions assign varying import to these results. Dot advocates claim that slashes evoke filesystems, and that the absence of a visible root in Gale is confusing and leads to messages accidentally posted to the incorrect category. Slash advocates assign more import to other characteristics, call the legacy arguments a wash, and focus on the typographical arguments.
While legacy arguments are mostly used by dot advocates, slash advocates also find reason to draw these comparisons. Some make the legacy argument that Gale categories are colon-separated, and that there is precedent for using colons to separate slash-delimited lists (such as UNIX $PATH). Indeed, some of them draw the filesystem analogy further, suggesting that filename-like "relative" and "absolute" (slash-rooted) paths could be used to conveniently navigate long directed categories.
Since in-band delimiters make it difficult or impossible to use the delimiter character in category units, another kind of legacy argument tries to avoid the use of common characters. For example, slash advocates note that it's difficult to include an Internet hostname in a dot-delimited category; dot advocates argue that slashes are actually more common in names (X/Open, PL/SQL, and of course filenames and such) than dots are.
Dot critics also make the point that the directed category structure includes a domain, which should be treated as a single "word". Using dot as a subcategory delimiter might confuse people into thinking that a subscription to "@gale" would receive messages sent to "@gale.org/stuff", which (architecturally) it can never do.
Slash is universally considered "heavier" and more visible than dot. Proponents of the slash consider this a feature, since it makes the delimiter visible, and note that its asymmetric orientation conveys a sense of hierarchy or at least directionality. Proponents of the dot argue that slash is too heavy, and that it outweighs the colon used to separate categories. The dot, they argue, is clearly subordinate to the colon.
Some have proposed other delimiters, mostly argued on a typographical basis. Spaces could be used, either for intracategory delimiters or intercategory separators. If intercategory separators are replaced with spaces or commas, the colon could be repurposed for intracategory use. Most other suggestions for delimiter characters are made in jest.