If you want to skip all the ruminations and sketches: at the moment, "Sketch 3" looks like the most fruitful direction.
But it's not fully settled.
There's been a tug-o-war between simplicity and utility going on here.
If inputs can be strings, it... seems like a noticeable simplification to the whole system.
(And if inputs can be just strings, it's also a significant readability increase.)
But some inputs get complex enough that going for a struct with map serialization seems desirable.
Maybe we should consider a moderate third option: a keyed union. One member is a string, used for most things (with a stringprefix union inside it). The other is a struct with map representation, which is just ComplexInput; it still carries most of its content as a string, but adds an options/details map.
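To make that third option a bit more concrete, here's a rough sketch in Python. Everything in it is an assumption for illustration: the `tar:` and `git:` prefixes, the `basis` and `options` field names, and the `parse_input` helper are all made up, and it tells the two union members apart by whether the value is a string or a map, which may not be how the union would really be represented.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# Hypothetical prefixes standing in for the stringprefix union members.
KNOWN_PREFIXES = ("tar:", "git:")


@dataclass
class ComplexInput:
    """Still mostly a string, plus a free-form options map."""
    basis: str  # hypothetical field name for the string part
    options: Dict[str, str] = field(default_factory=dict)


# An input is either a plain string or a ComplexInput.
Input = Union[str, ComplexInput]


def check_prefix(s: str) -> None:
    """Reject strings that do not start with a known prefix."""
    if not s.startswith(KNOWN_PREFIXES):
        raise ValueError(f"unknown input prefix in {s!r}")


def parse_input(raw) -> Input:
    """Accept the simple string form or the map form of an input."""
    if isinstance(raw, str):
        check_prefix(raw)
        return raw
    if isinstance(raw, dict):
        basis = raw["basis"]
        check_prefix(basis)
        return ComplexInput(basis=basis, options=dict(raw.get("options", {})))
    raise TypeError(f"input must be a string or a map, got {type(raw).__name__}")
```

The simple case stays a bare, readable string (`parse_input("tar:abc")` hands it straight back), and only inputs that need extra detail pay for the map form.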
Something worth keeping in mind with these is extensibility. Going super rich with unions (e.g. all the way down to TarInput vs GitInput, and a big stringprefix union for those) could be nice in one respect, but also doesn't seem to make extensibility feel much better. (It's not worse, either. This is a vibes thing.)
Contributors to the simplicity direction in the tug-o-war:
One major contributor to the tug-o-war in the complex direction: