fix(providers): inject shim version at build time

PR #338 on GitHub · branch remove-hardcoded-provider-version → main · @surajssd · 25 files, +367 / -54 · 1 commit · OPEN

Executive Summary

Each of the five provider shims (dynamo, kaito, kuberay, llmd, vllm) hard-coded its reported version as a Go const in config.go — a string like "dynamo-provider:v0.2.0" written verbatim into InferenceProviderConfig.status.version (surfaced by kubectl, the Web UI, and the Headlamp plugin). Because the const was never bumped at release time, the reported version lied.

This PR replaces that const with build-time injection: a package-level var shimVersion = "dev" is overwritten via -ldflags "-X <module>.shimVersion=<tag>", and ProviderVersion is derived from it. The release workflows pass SHIM_VERSION=${{ inputs.version }}, the same value that tags the image, so status.version equals the image tag by construction. Non-release builds report a git stamp (dev-<sha>, with -dirty appended when the working tree is not clean). The injection target is resolved with go list -m so the -X path cannot drift from a hand-typed module string, and each Dockerfile gains a test -n "${SHIM_VERSION}" guard so a bare docker build fails loudly instead of silently shipping :dev.

What a reviewer should scrutinize most: the correctness of the -X injection mechanism, specifically that the linker target is the unexported shimVersion, whose initializer is a constant string, and never the composite ProviderVersion, whose initializer refers to another variable and which -X would therefore silently leave unchanged. The CI read-back step in test.yml is the guard that proves the wiring holds; verify that it exercises both the default and the override path for all five providers.

Change at a glance

| Surface | Files | What changed |
| --- | --- | --- |
| config.go | 5 | Drop ProviderVersion const; add var shimVersion = "dev" + derived var ProviderVersion. |
| config_test.go | 5 | Exact-literal version assertion → strings.HasPrefix shape check (llmd/vllm gain a new TestProviderConstants). |
| Makefile | 5 | Resolve MODULE via go list -m; git-stamp SHIM_VERSION default; pass --build-arg. |
| Dockerfile | 5 | Add ARG SHIM_VERSION (no default) + test -n guard; resolve MODULE in build RUN. |
| release workflows | 5 | Pass SHIM_VERSION=${{ inputs.version }}; new release-vllm-provider.yml. |
| test.yml | 1 | CI step asserts git-stamp default + explicit override both land in the binary. |

Glossary & Concept Primer

Vocabulary an unfamiliar reviewer needs before the rest of the dashboard makes sense.

provider shim
A small standalone controller under providers/<name>/ that renders a ModelDeployment into an upstream provider's resource (KAITO, Dynamo, KubeRay, llm-d, vLLM). Each ships as its own container image.
InferenceProviderConfig.status.version
A status field on the cluster-scoped CRD that reports which version of a provider shim is running. Read by kubectl, the Web UI, and the Headlamp plugin. The bug: this value was a hard-coded const that never tracked the actual release.
ProviderVersion
The Go symbol whose value populates status.version. Was a const (e.g. "kaito-provider:v0.1.0"); now a var derived as ProviderConfigName + "-provider:" + shimVersion.
shimVersion
New unexported package var, initialized to "dev". The actual -ldflags -X injection target. Unexported visibility doesn't matter to the linker — -X resolves the symbol regardless.
-ldflags "-X importpath.name=value"
Go linker flag that sets a string variable's value at link time. Critical constraint: it only takes effect when the variable is declared uninitialized or initialized to a constant string expression; otherwise it silently does nothing (which is why shimVersion, not the composite ProviderVersion, is the target).
SHIM_VERSION
The build-arg / Make variable carrying the version into the build. Release workflows set it to ${{ inputs.version }}; local/CI builds default it to a git stamp.
go list -m
Prints the current module path from go.mod. Used so the -X target ($(MODULE).shimVersion) is computed, never hand-typed — eliminating path-drift bugs.
git stamp / dev-<sha>
The default SHIM_VERSION for non-release builds: dev-$(git rev-parse --short HEAD), with -dirty appended when the working tree has uncommitted/untracked content.
versions.env
Repo-root file holding pinned upstream runtime versions (e.g. DYNAMO_VERSION, VLLM_VERSION). Distinct from the shim version — the PR deliberately does not add the shim version here (the release tag is its source of truth).

Where the version comes from: before vs. after

The same value (what status.version reports), sourced differently.

After (new in this PR):
release workflow inputs.version → --build-arg SHIM_VERSION → -ldflags -X → shimVersion var → derives ProviderVersion → status.version

Same value that tags the image → version equals tag by construction.

Before (removed):
hand-typed const "x-provider:v0.2.0" → is ProviderVersion → status.version

Const never bumped at release → reported version lied.
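To make "equals the image tag by construction" concrete, the sketch below compares the tag portion of a status.version string with the tag of an image reference. The helper names (tagOf, versionMatchesImage), the example registry path, and the v0.3.0 tag are hypothetical and do not come from the PR; digest references (@sha256:…) are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// tagOf returns the text after the last ':' in s, as long as that ':'
// comes after the last '/' (so a registry port is not mistaken for a tag).
// Hypothetical helper for illustration only.
func tagOf(s string) (string, bool) {
	i := strings.LastIndex(s, ":")
	if i < 0 || i < strings.LastIndex(s, "/") || i == len(s)-1 {
		return "", false
	}
	return s[i+1:], true
}

// versionMatchesImage reports whether the tag reported in status.version
// equals the tag of the image reference.
func versionMatchesImage(statusVersion, imageRef string) bool {
	v, okV := tagOf(statusVersion)
	t, okT := tagOf(imageRef)
	return okV && okT && v == t
}

func main() {
	// Assumed example values; the real registry path is not shown in the PR.
	fmt.Println(versionMatchesImage("dynamo-provider:v0.3.0", "ghcr.io/example/dynamo-provider:v0.3.0"))
	fmt.Println(versionMatchesImage("dynamo-provider:v0.2.0", "ghcr.io/example/dynamo-provider:v0.3.0"))
}
```

A check like this would only be meaningful for release builds; a git-stamped dev build has no matching image tag by design.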

Injection chain: which build path sets the version

Three build paths determine shimVersion. The first two pass -X; the Go literal is the last-resort fallback only.

Release (CI): inputs.version
Local / CI build: dev-<sha> git stamp
  both → -ldflags "-X $(MODULE).shimVersion=…" (MODULE = go list -m) → binary reports <name>-provider:<tag>
go run / go test: "dev" literal (no -X)

Only the first two paths inject a value. The third (go run/go test) intentionally leaves "dev", which is why the tests use a prefix check, not an equality check.

Dockerfile guard: fail loud, never ship :dev

What happens to a docker build depending on whether SHIM_VERSION was passed.

docker build → ARG SHIM_VERSION → test -n "${SHIM_VERSION}"
  passed → build proceeds, injects real tag
  empty → exit 1, build fails loud

The Makefile and release workflow always pass it; the guard only trips on a hand-run bare docker build.

config.go schema diff (representative: dynamo)

The same shape applies to all five providers. The old declaration is shown first, then the new one.

Before

const (
  ProviderConfigName = "dynamo"
  // ProviderVersion is the version…
  ProviderVersion = "dynamo-provider:v0.2.0"
  ProviderDocumentation = "…"
)

After

const (
  ProviderConfigName = "dynamo"
  ProviderDocumentation = "…"
)
// injected via -ldflags -X …shimVersion
var shimVersion = "dev"
var ProviderVersion =
  ProviderConfigName + "-provider:" + shimVersion

ProviderConfigName stays a const; only the version moves to a var so the linker can rewrite it.
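The pattern can be tried in isolation with a reduced program. This is a minimal sketch, not the provider code: the package is main and the provider name "demo" is a placeholder, so the -X target here is main.shimVersion rather than the module path the PR computes with go list -m.

```go
package main

import "fmt"

const providerConfigName = "demo" // placeholder provider name

// shimVersion has a constant string initializer, so it is a valid -X target.
var shimVersion = "dev"

// providerVersion refers to another variable, so -X cannot set it directly.
// It picks up whatever shimVersion holds when package initialization runs.
var providerVersion = providerConfigName + "-provider:" + shimVersion

func main() {
	fmt.Println(providerVersion)
}
```

Run with plain go run, this prints demo-provider:dev. Built with go build -ldflags "-X main.shimVersion=v1.2.3", it should print demo-provider:v1.2.3, whereas pointing -X at main.providerVersion would be expected to have no effect.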

Before / After capabilities

The dimensions a reviewer cares about, side by side.

| Dimension | Before | After |
| --- | --- | --- |
| Version source | Hand-typed Go const | Build-time -ldflags -X from release tag |
| Accuracy of status.version | Stale / wrong (never bumped) | Equals image tag by construction |
| Non-release build reports | The stale literal | dev-<sha> (-dirty if applicable) |
| -X target path | Hand-typed module string (dynamo/vllm) | Computed via go list -m |
| Bare docker build w/o arg | Ships the stale literal silently | Fails loud (test -n guard) |
| Version-assert test | Exact literal equality | Prefix shape check (no lockstep edits) |
| Regression guard | None | CI read-back of binary in test.yml |

Annotated Diff

5 of the most consequential hunks shown. The remaining ~20 hunks are the same four-part pattern (config.go / Makefile / Dockerfile / config_test.go) repeated across the other four providers, plus the four one-line release-workflow build-arg additions. Omitted hunks listed at the end.

providers/dynamo/config.go lines 38–70
const (
  ProviderConfigName = "dynamo"
- // ProviderVersion is the version of the AIRunway Dynamo provider controller.
- ProviderVersion = "dynamo-provider:v0.2.0"
  ProviderDocumentation = "https://…/dynamo.md"
)
+ // shimVersion is injected at build time via -ldflags -X …shimVersion
+ var shimVersion = "dev"
+ var ProviderVersion = ProviderConfigName + "-provider:" + shimVersion
Why: The heart of the change. ProviderVersion moves from const to a var derived from the injectable shimVersion. Injecting shimVersion (a constant-string initializer) rather than the composite ProviderVersion is deliberate — -X silently no-ops on a non-constant initializer.
providers/dynamo/Makefile lines 9–34
+ MODULE := $(shell go list -m)
+ GIT_SHA      := $(shell git rev-parse --short HEAD 2>/dev/null || echo unknown)
+ GIT_DIRTY    := $(shell test -n "$$(git status --porcelain 2>/dev/null)" && echo '-dirty')
+ SHIM_VERSION ?= dev-$(GIT_SHA)$(GIT_DIRTY)
- LDFLAGS := -X github.com/kaito-project/airunway/providers/dynamo.DynamoVersion=$(DYNAMO_VERSION)
+ LDFLAGS := -X $(MODULE).DynamoVersion=$(DYNAMO_VERSION)
+ LDFLAGS += -X $(MODULE).shimVersion=$(SHIM_VERSION)
Why: Resolves the module path with go list -m so the existing DynamoVersion flag and the new shimVersion flag both target a computed path, not a hand-typed one. SHIM_VERSION defaults to a git stamp; the release workflow overrides it.
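For readers less used to Make's $(shell …) idiom, the git-stamp default can be restated in Go. This is a sketch of the same logic under the assumption that git is on PATH; the function names stamp and gitOutput are illustrative and do not exist in the repository.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// stamp mirrors the Makefile default: dev-<sha>, plus "-dirty" when the
// working tree has uncommitted or untracked content.
func stamp(sha string, dirty bool) string {
	v := "dev-" + sha
	if dirty {
		v += "-dirty"
	}
	return v
}

// gitOutput runs git with the given arguments and returns trimmed stdout.
func gitOutput(args ...string) (string, error) {
	out, err := exec.Command("git", args...).Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	sha, err := gitOutput("rev-parse", "--short", "HEAD")
	if err != nil || sha == "" {
		sha = "unknown" // same fallback as the Makefile's `|| echo unknown`
	}
	status, err := gitOutput("status", "--porcelain")
	dirty := err == nil && status != "" // any porcelain output means a dirty tree
	fmt.Println(stamp(sha, dirty))
}
```

Because SHIM_VERSION is assigned with ?=, a value supplied on the make command line or in the environment replaces this default entirely, which is how the release workflow's tag wins.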
providers/kaito/Dockerfile lines 2–33
+ ARG SHIM_VERSION

- RUN cd providers/kaito && CGO_ENABLED=0 … go build -a -o provider cmd/main.go
+ RUN test -n "${SHIM_VERSION}" || (echo "ERROR: SHIM_VERSION build arg is required…" >&2; exit 1)
+ RUN cd providers/kaito && MODULE=$(go list -m) && \
+   CGO_ENABLED=0 GOOS=${TARGETOS:-linux} GOARCH=${TARGETARCH} \
+   go build -a -ldflags="-X ${MODULE}.shimVersion=${SHIM_VERSION}" -o provider cmd/main.go
Why: kaito/kuberay had no prior -ldflags at all, so this adds the whole injection. The test -n guard makes a hand-run docker build without the arg fail loudly rather than ship status.version=kaito-provider:dev under a real tag.
providers/dynamo/config_test.go lines 140–148
- if ProviderVersion != "dynamo-provider:v0.2.0" {
-   t.Errorf("expected provider version 'dynamo-provider:v0.2.0', got %s", ProviderVersion)
+ if !strings.HasPrefix(ProviderVersion, "dynamo-provider:") {
+   t.Errorf("expected provider version to start with 'dynamo-provider:', got %s", ProviderVersion)
Why: Under go test the binary isn't linked with -X, so ProviderVersion is "dynamo-provider:dev". An exact-literal assertion would break; the prefix check validates shape without requiring lockstep edits on every release.
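The shape check can be exercised against each value a build might produce. The sketch below is an illustration, not the PR's test: hasProviderShape is a hypothetical helper, it additionally rejects an empty tag (the PR's assertion checks the prefix only), and the v0.3.0 tag is an invented example.

```go
package main

import (
	"fmt"
	"strings"
)

// hasProviderShape reports whether version looks like "<name>-provider:<tag>"
// with a non-empty tag. The non-empty check is an addition for illustration.
func hasProviderShape(name, version string) bool {
	prefix := name + "-provider:"
	return strings.HasPrefix(version, prefix) && len(version) > len(prefix)
}

func main() {
	candidates := []string{
		"dynamo-provider:dev",               // go run / go test
		"dynamo-provider:dev-abc1234-dirty", // git-stamped local build
		"dynamo-provider:v0.3.0",            // release build (example tag)
		"dynamo-provider:",                  // empty tag
		"kaito-provider:v0.1.0",             // wrong provider
	}
	for _, v := range candidates {
		fmt.Printf("%-36s %v\n", v, hasProviderShape("dynamo", v))
	}
}
```

All three legitimate build paths pass the same assertion, which is what lets the test survive a release without edits.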
.github/workflows/test.yml lines 146–164 (new CI step)
+ - name: Assert shimVersion injection + git-stamp default
+   run: |
+     mod=$(cd "providers/$p" && go list -m)
+     go version -m "providers/$p/bin/provider" \
+       | grep -Eq -- "-X ${mod}.shimVersion=dev-[0-9a-f]+(-dirty)?" || exit 1
+     make -C "providers/$p" build SHIM_VERSION=v9.9.9-citest
+     go version -m "providers/$p/bin/provider" \
+       | grep -F -- "-X ${mod}.shimVersion=v9.9.9-citest" || exit 1
Why: The anti-regression guard. It reads the linker flags back out of the built binary with go version -m to prove (a) the un-overridden build used the git-stamp default and (b) an explicit override actually injects. This is what catches future Makefile/symbol drift.
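The same read-back can be done from Go with the standard debug/buildinfo package, which exposes the build settings that go version -m prints. The sketch below is an assumption-laden illustration rather than part of the PR: injectedValue is a hypothetical helper, it splits the recorded -ldflags on whitespace (so values containing spaces are not handled), and the module path in the sample string is made up.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"strings"
)

// injectedValue scans a recorded -ldflags string for "-X <symbol>=<value>"
// (or "-X=<symbol>=<value>") and returns the value.
func injectedValue(ldflags, symbol string) (string, bool) {
	fields := strings.Fields(ldflags)
	for i := range fields {
		fields[i] = strings.Trim(fields[i], "\"'") // tolerate simple quoting
	}
	for i, f := range fields {
		var kv string
		switch {
		case f == "-X" && i+1 < len(fields):
			kv = fields[i+1]
		case strings.HasPrefix(f, "-X="):
			kv = strings.TrimPrefix(f, "-X=")
		default:
			continue
		}
		if strings.HasPrefix(kv, symbol+"=") {
			return strings.TrimPrefix(kv, symbol+"="), true
		}
	}
	return "", false
}

func main() {
	// With no arguments, demonstrate the parser on a sample string.
	if len(os.Args) != 3 {
		v, ok := injectedValue("-X example.com/demo.shimVersion=v9.9.9-citest", "example.com/demo.shimVersion")
		fmt.Println(v, ok)
		return
	}
	// Usage: <program> <binary> <importpath.symbol>
	info, err := buildinfo.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "read build info:", err)
		os.Exit(1)
	}
	for _, s := range info.Settings {
		if s.Key != "-ldflags" {
			continue
		}
		if v, ok := injectedValue(s.Value, os.Args[2]); ok {
			fmt.Println(v)
			return
		}
	}
	fmt.Fprintln(os.Stderr, "symbol was not injected")
	os.Exit(1)
}
```

Unlike the grep in the CI step, this matches the symbol literally, so dots in the module path are not treated as regex wildcards. Note that the -ldflags setting is not recorded when a binary is built with -trimpath, in which case neither approach would find it.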

Omitted hunks (same patterns)

  • providers/{kaito,kuberay,llmd,vllm}/config.go — same const→var conversion as dynamo.
  • providers/{kaito,kuberay,llmd,vllm}/Makefile: same MODULE + git-stamp + LDFLAGS pattern (llmd/vllm retrofit existing LLMDSchedulerImage/VLLMVersion flags onto $(MODULE)).
  • providers/{dynamo,kuberay,llmd,vllm}/Dockerfile — same ARG + test -n guard + computed MODULE.
  • providers/{kaito,kuberay,llmd,vllm}/config_test.go — prefix assertion (llmd/vllm add a new TestProviderConstants).
  • .github/workflows/release-{dynamo,kaito,kuberay,llmd}-provider.yml — one-line SHIM_VERSION=${{ inputs.version }} build-arg.
  • .github/workflows/release-vllm-provider.yml — new file, a full workflow mirroring the dynamo one.

Risk Assessment

Severity: CRIT WARN INFO OK

| Axis | Severity | Finding |
| --- | --- | --- |
| Correctness (injection) | OK | Injecting the constant-string shimVersion rather than the composite ProviderVersion is correct: -X needs a constant initializer. Init order (shimVersion then ProviderVersion) resolves correctly since Go orders package var init by dependency. |
| Release wiring (vllm) | WARN | The hand-authored release-vllm-provider.yml is workflow_dispatch-only and is never linted in PR CI. A missing build-arg would only be caught by the Dockerfile test -n guard at release time. Author flags an actionlint follow-up. |
| API / compatibility | INFO | ProviderVersion changes from const to var. Any external code that uses it in a constant expression (for example, to initialize another const) would no longer compile, but it's an internal symbol; unlikely to matter. Worth a quick grep for external referencers. |
| Test coverage | INFO | Tests assert shape (prefix), not the injected value, by necessity under go test. The actual injection is covered by the new CI read-back step in test.yml, which is the right place for it. Verify the matrix covers all 5 providers. |
| Build reproducibility | INFO | Git-stamp default embeds the short SHA + dirty flag into the binary. Intentional and documented; means non-release binaries are not byte-reproducible across commits, which is expected for a dev stamp. |
| Docs drift | INFO | Author deliberately left illustrative v0.2.0 examples in docs/api.md / crd-reference.md (shape examples, untested). Reasonable, but a reader could mistake them for current. |
| Concurrency | INFO | Not applicable: package-init var assignment, no runtime mutation, no shared state. |
| Security / secrets | INFO | Not applicable: no secrets, auth, or input handling touched. Build args carry a version string only. |
| Failure modes | OK | The test -n Dockerfile guard converts a silent-wrong outcome (ship :dev) into a loud build failure, a net improvement in failure behavior. |

Assumptions & Unknowns

Everything I'm guessing, couldn't verify, or would need to confirm with the author. All hedging lives here.

  • Go var-init ordering. I assume ProviderVersion = ProviderConfigName + "-provider:" + shimVersion resolves correctly because Go initializes package-level vars in dependency order regardless of source position. This is guaranteed by the spec — confidence high — but I did not run the binary to confirm the injected value reaches status.version end-to-end.
  • CI matrix coverage. The test.yml step uses ${{ matrix.provider }}; I assume the matrix enumerates all five providers. I did not read the full provider-go-checks matrix definition to confirm vllm is included.
  • External referencers of ProviderVersion. I did not grep the controller/backend for code that imports ProviderVersion and relies on it being a const. The const→var change is compile-compatible for normal use (comparisons, switch cases, string concatenation), but a const-only context (e.g. the initializer of another const, or len(ProviderVersion) used as an array length) would break.
  • The go list -m in Dockerfile RUN. I assume go list -m resolves correctly inside the build container's working directory for each provider. Plausible given the cd providers/<name>, but unverified against an actual image build.
  • vLLM workflow parity. I assume the new release-vllm-provider.yml faithfully mirrors the dynamo workflow's semantics. I read it in the diff and it looks complete, but it has never run in CI (workflow_dispatch only), so there's no execution evidence.
  • Did not verify the panel's findings here. This dashboard is my structural read of the diff. The five-model review panel ran independently; their findings are collated in FINAL-REPORT.md, not folded into this dashboard.
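Two of the assumptions above (dependency-ordered initialization, and which uses of a former const stop compiling) can be checked with a small program. This is a sketch using placeholder names in package main, not the provider source; the variables are deliberately declared in the "wrong" source order.

```go
package main

import "fmt"

// Declared before its dependency on purpose: Go still initializes
// shimVersion first because providerVersion's initializer refers to it.
var providerVersion = providerConfigName + "-provider:" + shimVersion

var shimVersion = "dev"

const providerConfigName = "demo" // placeholder provider name

// The following would NOT compile now that providerVersion is a var:
//   const pinned = providerVersion       // const initializer must be constant
//   var buf [len(providerVersion)]byte   // array length must be constant

// describe shows that an expression switch accepts a non-constant case.
func describe(v string) string {
	switch v {
	case providerVersion:
		return "current"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(providerVersion)
	fmt.Println(describe("demo-provider:dev"))
}
```

A grep for ProviderVersion inside const blocks or array types would therefore cover the realistic breakage cases; ordinary comparisons and switch statements are unaffected.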