What are Catalogs?

Catalogs contain:

Mappings of human-readable names to WareIDs.
Mappings of either the human-readable names or the WareIDs to Warehouse Addresses — URLs of places on the network that should be able to provide the content matching those WareIDs.
Metadata attached to the human-readable names — open ended, general purpose.
1. May contain simple informational content, like an “author” name.
2. May contain semi-structured metadata that helps other tooling produce “automatic updates”.
3. May contain special metadata that references rebuild instructions (e.g. the Warpforge Plot document which should completely reproduce things) — this is called a “replay”.
  - (Fun fact: Because Plot documents also include the catalog reference names for all the Plot’s inputs… one can then look up those catalog entries, and find their replay plots, and... so on: this is recursively walkable, and can produce total explanations for where any data or applications came from!)
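
The recursive walk described in the fun fact can be sketched roughly as follows. This is only an illustration in Python, not how Warpforge implements it: the in-memory `REPLAYS` table, the idea of reducing a replay plot to a plain list of input references, the `example.org` names, and the `explain` function are all hypothetical stand-ins for real catalog lookups and Plot documents.

```python
# Hypothetical in-memory stand-in for a catalog: each catalog reference maps
# to the catalog references that its replay plot consumes as inputs.
# A release with no recorded replay is simply absent from the table.
REPLAYS = {
    "example.org/app:v1.0:linux-amd64": [
        "example.org/lib:v2.1:src",
        "example.org/cc:v9:linux-amd64",
    ],
    "example.org/lib:v2.1:src": [],
    "example.org/cc:v9:linux-amd64": ["example.org/cc:v8:linux-amd64"],
}


def explain(ref, replays, seen=None):
    """Return every catalog reference reachable from `ref` via replay inputs.

    The result is in depth-first discovery order and starts with `ref` itself.
    References without a replay entry are still listed, but not expanded.
    """
    if seen is None:
        seen = []
    if ref in seen:
        return seen  # already visited; this also guards against cycles
    seen.append(ref)
    for input_ref in replays.get(ref, []):
        explain(input_ref, replays, seen)
    return seen


lineage = explain("example.org/app:v1.0:linux-amd64", REPLAYS)
for entry in lineage:
    print(entry)
```

A real walk would fetch each release document, follow its replay link to a Plot, and read the Plot's inputs; the traversal shape stays the same.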

Catalogs have a schema (like all parts of Warpforge APIs!), but also come with a standard definition for how to project them onto a filesystem, because this is often operationally useful. See Catalogs on the Filesystem for more details on the filesystem projection part; keep reading below for the schema and other general info about Catalogs.

Schema

(Note: the actual load-bearing schema is in https://github.com/warptools/warpforge/blob/master/wfapi/wfapi.ipldsch and may have drifted somewhat from what we have in Notion. Conceptually, they haven’t diverged.)

## Catalog is a large data structure that maps human readable names to WareIDs
## (as well as a variety of metadata).
##
## The catalog tree itself uses a large sharded merkle-tree structure,
## which allows scalable, parallelizable verification,
## as well as efficient verification of sub-trees.
##
## Within that large tree structure, a simple structure of
## "moduleName:releaseName:itemName" can be seen as pathing over the catalog,
## and is sufficient to look up a WareID.
##
## Catalogs are defined foremost by this schema,
## and their IPLD hashed document structure.
## However, they also have a standardized projection to a filesystem,
## which is often operationally handy.
type Catalog {ModuleName:CatalogModule}
  representation advanced ProllyTree

# Remember, `ModuleName` has already been declared earlier.
# It's just a string -- something that looks vaguely like a URL.

type ReleaseName string # with some limits: roughly [a-zA-Z0-9-], just to keep it simple.
type ItemLabel string   # with some limits: roughly [a-zA-Z0-9-], just to keep it simple.
  # ^ This is the same range as OutputName, but we define different types for them.

## CatalogRef is a tuple that allows lookup of a WareID in a Catalog.
##
## A typical value might look something like "foobar.org/frob:v1.2.3:linux-amd64-zapp".
## CatalogRef values are often seen in serialized documents with a "catalog:" prefix,
## in the same way that WareIDs are often seen with a "ware:" prefix;
## they're usually used with a wrapper type with that prefix for clarity purposes.
type CatalogRef struct {
	moduleName ModuleName
	releaseName ReleaseName
	itemLabel ItemLabel
} representation stringjoin {
	join ":"
}

## CatalogModuleCapsule is a small wrapper type used for versioning
## the CatalogModule type when serialized.
type CatalogModuleCapsule union {
	| CatalogModule "catalogmodule.v1"
} representation keyed

## CatalogModule is the first level of data in a Catalog,
## after having navigated the large sharded module name map.
##
## A CatalogModule contains a map of named releases,
## and also some free-form metadata.
##
## When projected into a filesystem: this data appears in the `{moduleName}/_module.json` file.
##
## Note that the releases map is order-preserving.
## Newer content is generally placed at the "top" of the map.
## Some tooling may choose to rely on this property.
##
## Note that the values of the releases map are CIDs --
## these link to another, separate document.
## This is so subtrees can be trimmed down and only partially transmitted,
## and to ensure the serial size of CatalogModule itself does not grow
## unduly quickly as releases are appended to the structure.
type CatalogModule struct {
	name ModuleName
	releases {ReleaseName:&CatalogRelease}
	metadata {String:String}
}

## CatalogRelease is part of a Catalog's tree structure,
## and is one of the discrete documents within a Catalog.
## The CIDs in the releases map in the CatalogModule type point to this.
##
## The releaseName value here reiterates the same name
## indicated by the CatalogModule pointing at this.
##
## The "items" map associates item labels
## (which are freetext, but by convention contain phrases like
## "linux-amd64-zapp" or "darwin-amd64-static" or "src", etc)
## with WareIDs (which are usually IDs of filesystem snapshots).
##
## Freeform metadata may appear in each release document.
## Some of this metadata is "well-known".
## For example, the key "replay", if present, may be expected
## to contain a CID to a document that's a Warpforge plot that can reproduce
## the contents of this release.
##
## Note the lack of a "capsule" type for versioning this structure;
## this is because we assume that if this part of the protocol evolves,
## then it will do so in tandem with the CatalogModule type,
## and therefore CatalogModuleCapsule provides enough versioning hints for this area too.
##
## When projected into a filesystem: this data appears in the `{moduleName}/_releases/{releaseName}.json` files.
type CatalogRelease struct {
	releaseName ReleaseName
	items {ItemLabel:WareID}
	metadata {String:String}
}
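
To make the schema a little more concrete, here is a small Python sketch of how a CatalogRef string could be parsed, mapped onto the filesystem projection paths mentioned in the schema comments, and resolved to a WareID. It is an illustration under assumptions, not Warpforge's own code: the `CatalogRef` class, the `lookup` and `newest_release` helpers, the nested-dict catalog with releases inlined (the real schema links them by CID), and the `placeholder-ware-id-*` values are all hypothetical. It also assumes module names contain no ":" character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRef:
    module_name: str
    release_name: str
    item_label: str

    @classmethod
    def parse(cls, text):
        parts = text.split(":")
        # Accept the optional "catalog:" prefix seen in serialized documents.
        if len(parts) == 4 and parts[0] == "catalog":
            parts = parts[1:]
        if len(parts) != 3:
            raise ValueError(f"expected module:release:item, got {text!r}")
        return cls(*parts)

    def __str__(self):
        # Mirrors the schema's stringjoin representation with ":".
        return ":".join((self.module_name, self.release_name, self.item_label))

    def module_path(self):
        return f"{self.module_name}/_module.json"

    def release_path(self):
        return f"{self.module_name}/_releases/{self.release_name}.json"


# Hypothetical catalog data; releases are listed newest first.
catalog = {
    "foobar.org/frob": {
        "name": "foobar.org/frob",
        "releases": {
            "v1.2.3": {
                "releaseName": "v1.2.3",
                "items": {"linux-amd64-zapp": "placeholder-ware-id-2"},
                "metadata": {},
            },
            "v1.2.2": {
                "releaseName": "v1.2.2",
                "items": {"linux-amd64-zapp": "placeholder-ware-id-1"},
                "metadata": {},
            },
        },
        "metadata": {},
    },
}


def lookup(catalog, ref):
    """Resolve a CatalogRef to a WareID; raises KeyError if any level is missing."""
    release = catalog[ref.module_name]["releases"][ref.release_name]
    return release["items"][ref.item_label]


def newest_release(catalog, module_name):
    """Return the first release name, relying on the map being order-preserving."""
    return next(iter(catalog[module_name]["releases"]))


ref = CatalogRef.parse("catalog:foobar.org/frob:v1.2.3:linux-amd64-zapp")
print(ref.release_path())
print(lookup(catalog, ref))
```

The `newest_release` helper leans on the "newer content at the top" convention, which the schema describes as something tooling may choose to rely on rather than a guarantee.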

“Metadata”

What's in the metadata maps?

It's a mixed bag: it's open-ended, and meant for extensions. The metadata maps are meant to freely contain data that we didn’t anticipate or standardize when developing Warpforge.

Mostly we expect metadata to be fairly advisory and fairly freetext (e.g. "author").

Sometimes metadata can be used to contain a load-bearing extension.

Metadata can sometimes be a CID that points to another document! (And we’ll use this ourselves, for some features that Warpforge does understand well.)

When we do have well-known extensions, they generally appear simply as conventional, well-known key names in the metadata map. See the “extensions” section coming up next.
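
As a rough illustration of that convention, tooling might separate the keys it understands from the free-form remainder and leave everything else untouched. The Python sketch below is an assumption-laden example: `WELL_KNOWN_KEYS`, `split_metadata`, and the sample values (including the placeholder CID string) are hypothetical, and "replay" is the only well-known key used because it is the only one named in the schema comments above.

```python
# Keys this hypothetical tool knows how to act on; everything else is advisory.
WELL_KNOWN_KEYS = {"replay"}


def split_metadata(metadata):
    """Split a {String:String} metadata map into (well_known, advisory) dicts."""
    well_known = {k: v for k, v in metadata.items() if k in WELL_KNOWN_KEYS}
    advisory = {k: v for k, v in metadata.items() if k not in WELL_KNOWN_KEYS}
    return well_known, advisory


release_metadata = {
    "author": "Jane Example",
    "replay": "placeholder-cid-of-a-plot-document",
    "x-custom-note": "anything an extension wants to store",
}

well_known, advisory = split_metadata(release_metadata)
print(well_known)
print(sorted(advisory))
```

Unknown keys are passed through rather than rejected, which matches the open-ended intent of the metadata maps.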

Extensions

We don’t have a ton of well-known extensions at the moment, but here’s at least one (and a pretty important one: it’s how we store the data for rebuilding and for the “recursive explain” feature!) which shows how it can look: