Prompts as a first-class dependency in the build graph

Skelf-Research · May 22, 2026

architecture · build · prompt-ops

Every modern application has, somewhere in it, a list of things it depends on to behave correctly. The obvious entries are source libraries: openai, anthropic, flask, pydantic, whatever your stack uses. Slightly less obvious but treated the same way are runtime resources: environment variables, database schemas, configuration files, container images. Some teams go further and treat compiled assets, model weights, and even fine-tuning datasets as graph nodes.

If you accept that framing, an LLM prompt is the same kind of object. It is an artifact your program cannot run correctly without, it has a name, it has a content identity, and it has the property that swapping one in for another silently changes behaviour. The natural place to put it is the dependency graph.

This post is about what it looks like to actually do that — and what blogus is doing inside that picture.

A prompt has a dependency surface

Pick a single prompt: summarize. It is a template that interpolates one variable, text. It is targeted at gpt-4o at temperature: 0.3. It returns 2–3 sentences. The application code calls it via load_prompt("summarize", text=...).

That single prompt exposes four things that downstream code depends on, and that the build graph should therefore care about:

The template body itself. Words and structure. Hashable.
The model identifier. gpt-4o is not gpt-4o-mini. Swapping changes behaviour.
The model parameters. Temperature, top_p, response_format. These are part of the contract.
The variable signature. Required vs default. A prompt that needs language but is called without it is a runtime error.
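To make the variable-signature point concrete, here is a minimal sketch of a loader that enforces required and defaulted variables before rendering. Everything in it is an assumption for illustration: the `TRANSLATE` spec, its dictionary layout, and the `render_prompt` helper are hypothetical and are not the blogus file format or API.

```python
# Hypothetical parsed prompt spec; the real .prompt layout may differ.
# A variable with no "default" key is treated as required.
TRANSLATE = {
    "variables": {
        "text": {},
        "language": {},
        "tone": {"default": "neutral"},
    },
    "template": (
        "Translate the text below into {language}, "
        "keeping a {tone} tone.\n\n{text}"
    ),
}


def render_prompt(spec: dict, **values: str) -> str:
    """Fill the template, enforcing the declared variable signature."""
    declared = spec["variables"]

    # Reject variables the prompt never declared.
    unknown = set(values) - set(declared)
    if unknown:
        raise TypeError(f"unexpected variables: {sorted(unknown)}")

    resolved = {}
    for name, rule in declared.items():
        if name in values:
            resolved[name] = values[name]
        elif "default" in rule:
            resolved[name] = rule["default"]
        else:
            # Required variable missing: fail loudly at the call site.
            raise TypeError(f"missing required variable: {name!r}")

    return spec["template"].format(**resolved)
```

Calling `render_prompt(TRANSLATE, text="Good morning", language="French")` fills in the default `tone`, while omitting `language` raises a `TypeError` instead of sending a half-filled template to the model.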

A .prompt file carries all four in one place. The lock file pins the content of the file by hash. Everything that downstream code depends on — the words, the model id, the temperature, the variables — is captured by that one hash. If any of them changes, the hash changes; if the hash changes, the lock is stale; if the lock is stale, the build fails.
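The chain "content changes, hash changes, lock is stale" can be sketched in a few lines. This is an illustration under stated assumptions, not how blogus is implemented: the lock is modelled as a plain name-to-digest dictionary, and `content_hash` and `stale_prompts` are hypothetical helper names.

```python
import hashlib


def content_hash(prompt_bytes: bytes) -> str:
    """SHA-256 digest of the raw bytes of a prompt file."""
    return hashlib.sha256(prompt_bytes).hexdigest()


def stale_prompts(prompts: dict[str, bytes], lock: dict[str, str]) -> list[str]:
    """Return the names of prompts whose current hash is not the locked one.

    A prompt missing from the lock counts as stale too.
    """
    return sorted(
        name
        for name, body in prompts.items()
        if lock.get(name) != content_hash(body)
    )
```

Because the digest covers the whole file, changing only the temperature in the frontmatter is enough to make `stale_prompts` report the prompt, even though the template words are untouched. A verify step would exit non-zero whenever the returned list is non-empty.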

This is the same shape as a Pipfile.lock or a package-lock.json. The dependency is named, its concrete content is resolved, the resolved hash is recorded, and verification is a fast non-network operation.

Where it slots into the pipeline

Most teams have three stages where this kind of object naturally lives:

Pre-commit. A pre-commit hook runs blogus verify. If a developer edits prompts/summarize.prompt but forgets to re-run blogus lock, the commit is blocked. The error message tells them exactly what to do: re-lock, then commit. This is the cheapest, fastest defence — it stops drift from ever entering the repo’s history.

CI. The CI build runs blogus verify as one of its first steps. If a PR is opened with a .prompt change but no corresponding prompts.lock change (someone skipped the hook, or pushed from a clone where it was never installed), CI fails fast. The CI step is harder to bypass than a local hook, so it is the real gate.

Release. Some teams want to know not just “the lock matches the prompts” but “this build’s prompts.lock is the same one that was reviewed and merged.” That is a one-line check at deploy time: the SHA of prompts.lock is captured in the release metadata and matched against the expected SHA. blogus is happy to live underneath that check; it just makes sure the lock is internally consistent.
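A deploy-time check of this kind might look like the sketch below. It assumes that "the SHA of prompts.lock" means a SHA-256 digest of the file's bytes and that the expected value arrives from release metadata as a hex string; both are assumptions, and `lockfile_sha` and `check_release` are hypothetical names rather than part of blogus.

```python
import hashlib
from pathlib import Path


def lockfile_sha(path: Path) -> str:
    """SHA-256 digest of the lock file exactly as it sits on disk."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_release(lock_path: Path, expected_sha: str) -> None:
    """Abort the deploy if the lock file is not the one that was reviewed."""
    actual = lockfile_sha(lock_path)
    if actual != expected_sha:
        raise SystemExit(
            f"prompts.lock mismatch: expected {expected_sha[:12]}, "
            f"got {actual[:12]}"
        )
```

This check is deliberately independent of the verify step: verification confirms the lock matches the prompts, while this confirms the lock itself is the reviewed one.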

Three layers, three different costs, three different blast radii. The pattern is identical to how teams treat source dependencies: lockfile in the repo, pre-commit hook that re-runs the resolver, CI that re-verifies, deploy step that records the resolved hash.

Hash collisions are not a problem here

A small note worth making: the lockfile hashes are SHA-256 digests of the .prompt file content. No practical collision against full SHA-256 is publicly known, and collision resistance is in any case beside the point for this use case: there is no adversary trying to construct a malicious prompt that hashes the same as a benign one. The hash is not a security artifact. It is a content identity, used here the way the checksums in Cargo.lock are: to detect change, not to detect malice.

What it looks like from the prompt-engineer’s seat

The temptation, when teams hear “prompts are dependencies,” is to assume that this means prompt engineers now need to learn build systems. They do not. The .prompt file is a YAML document with a template body; it can be edited in any editor. The workflow looks like:

Edit prompts/summarize.prompt.
Run blogus lock.
Commit both files together.

That is the same shape as editing pyproject.toml and running uv lock. A reviewer opens the PR and sees two changes side by side: the human-readable prompt edit and the machine-readable hash update. They review the prompt. The hash update is automatic and uncontroversial.

If the change is bigger — say, the prompt now needs a new variable — the diff shows the frontmatter change and the body change in one place. If the application code needs to start passing the new variable, that goes in the same PR; or, if the variable has a default, the call sites can adopt it gradually. Either way, the surface that needs review is in front of the reviewer.

What this does not replace

A prompt in the build graph is a hygiene mechanism, not a quality mechanism. It guarantees that, when the eval and the deploy are built from the same prompts.lock, the prompt that ran in the last eval is bit-for-bit the same as the prompt that will run on the next deploy. It does not say whether the prompt is good.

Eval suites still need to exist. A/B tests still need to exist. Tracing and observation tools — Langfuse, Helicone, the homegrown thing your platform team built — still have a role. blogus is not in competition with any of those. It is one layer down: it makes sure those higher-level systems are reasoning about a stable artifact, not a string that quietly changed between when it was evaluated and when it shipped.

In a stack diagram, blogus sits at the same layer as your dependency resolver. The eval suite sits at the same layer as your test runner. Observability tools sit at the same layer as your production monitoring. None of these replace each other; they each guarantee one different thing.

A note on languages

blogus’s scan covers Python and JavaScript LLM calls today — the two languages where the majority of LLM applications are being written. That is a deliberate scope choice, not a limit on the file format. A .prompt is just YAML; nothing in it is language-specific. Teams writing in Go or Rust can author and lock prompts with blogus and load them in whatever shape their runtime prefers. The scanner is the part that needs language awareness; the artifact is portable.

The smallest commit you can make

If you want to try this on a real codebase without committing to anything, the smallest action is two commands:

uvx blogus scan
uvx blogus check

The first lists every LLM call in the repo. The second flags the ones that are not yet versioned. Neither requires installing anything permanent; neither writes to your code. You will see, in under a minute, how many prompts in your codebase are currently floating outside of any kind of dependency tracking.

That number is almost always larger than the team expected. Putting the prompts into the build graph is the work of bringing that number down to zero — and then keeping it there.