Let's say I want to do (small-scale) distro-like work:

How big is this?

This is a very big, very end-to-end story.

It talks about what we want from L3+ tools.

It implies a lot of ingester tooling (though we're trying to just make an appropriately sized box for that).

It will take us a while to get to a reasonably complete and ready story for this.

It's not super clear that anyone has ever done this particularly well, especially within the constraints and goals we have: clear checkpoints for snapshotting, and an explicit transition to the point where things become reproducible.

Walking through it

Phase 1

  1. I have some sort of script that scrapes some part of the internet for projects, repos, packages, or what have you.
  2. This emits a list of information about projects. Probably mostly git repo URLs? Maybe some other basic info? Nothing very detailed, though.
  3. Checkpoint. This is either a ware or, more likely, we unroll it into files and put them in a VCS. (A chance for human intervention and review is probably desirable here.)

Phase 2

  1. We probably want to clone each of the slurped projects into a container. We've got all this good infra for this, so let's use it. Stamping out the upstream repo into a formula input is actually legitimately the easiest way to proceed!
  2. There are probably also some relatively complex tools we put into this container. For example: something that looks for a go.mod file (or better yet, a "lock" equivalent, but that's more detail than we need for this user story overall) and then translates that into warpforge-style WareIDs... or catalog names, or something.
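
As a rough sketch of the kind of tool step 2 describes, here's a small Go example that pulls require directives out of a go.mod file and maps each one to a name. The Requirement type, parseRequires, and especially catalogName are hypothetical: the "module:version" shape is a placeholder, not warpforge's actual WareID or catalog naming scheme. The parser is intentionally minimal (it ignores replace, exclude, and retract directives); a real tool would more likely lean on the golang.org/x/mod/modfile package.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Requirement is one module dependency found in a go.mod file.
type Requirement struct {
	Module  string
	Version string
}

// parseRequires extracts "require" entries from go.mod text, handling both
// the single-line form and the parenthesized block form.
func parseRequires(gomod string) []Requirement {
	var reqs []Requirement
	inBlock := false
	scanner := bufio.NewScanner(strings.NewReader(gomod))
	for scanner.Scan() {
		line := scanner.Text()
		// Drop trailing comments such as "// indirect".
		if i := strings.Index(line, "//"); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		switch {
		case len(fields) == 0:
			continue
		case inBlock && fields[0] == ")":
			inBlock = false
		case inBlock && len(fields) >= 2:
			reqs = append(reqs, Requirement{Module: fields[0], Version: fields[1]})
		case fields[0] == "require" && len(fields) == 2 && fields[1] == "(":
			inBlock = true
		case fields[0] == "require" && len(fields) >= 3:
			reqs = append(reqs, Requirement{Module: fields[1], Version: fields[2]})
		}
	}
	return reqs
}

// catalogName maps a requirement to a HYPOTHETICAL catalog reference.
// The "module:version" format is an illustrative assumption only.
func catalogName(r Requirement) string {
	return r.Module + ":" + r.Version
}

func main() {
	gomod := `module example.com/app

go 1.21

require github.com/pkg/errors v0.9.1

require (
	golang.org/x/sys v0.15.0 // indirect
	example.com/lib v1.2.3
)
`
	for _, r := range parseRequires(gomod) {
		fmt.Println(catalogName(r))
	}
	// prints:
	// github.com/pkg/errors:v0.9.1
	// golang.org/x/sys:v0.15.0
	// example.com/lib:v1.2.3
}
```

Keeping the "find dependencies" step separate from the "name them" step means the naming scheme can change (WareIDs vs. catalog names) without touching the parser.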