The Artifact Your Regulator Will Sign Off
Sign-off is not about the AI. It's about what you can put in front of the head of compliance - and what they can put in front of the regulator.

A senior modernization executive at a Tier-1 European bank put the entire AI-on-legacy debate in one sentence:
"It gives you something, it looks great, it probably even works, but you don't really know why and how because it's a black box in there that becomes very difficult for me to sign off."
He wasn't worried about whether the AI was capable. He'd already seen capable. He was worried about what he could put in front of his head of compliance the morning a change had to be approved - and what his head of compliance could put in front of a regulator if the change later failed. The AI's quality wasn't the gating question. The artifact was.
This is the part of regulated AI that vendor pitches keep skipping. Sign-off in a Tier-1 bank is not a one-person decision; it is a chain. The executive signing off has personal fiduciary accountability. The head of compliance counter-signs. The audit function reviews after the fact. And the regulator - ECB, the local supervisor, DORA-aware authorities - can ask for the file at any point. Every link in that chain wants the same thing: an artifact that proves the change was understood, the risk was bounded, and the decision was defensible.
If you cannot produce that artifact, the AI doesn't matter. This post is about what the artifact actually contains, where each field comes from, and why the only credible AI-assisted change tooling in regulated environments is the kind that produces the artifact as a first-class output - not as a screenshot you assemble afterwards.
What the sign-off chain actually needs
Strip away the vendor language and ask the question from the executive's seat. To sign a production change to a critical banking system, what does the chain need to see?
Five things, in this order:
1. A clear statement of what is changing. Not "we modernized the loan processing module." A specific list of programs, fields, transactions, and data flows affected - with line-level references back to the source code.
2. A scoped impact analysis - the blast radius. Every downstream system, batch job, regulatory report, and customer-facing surface that could be affected by the change. With the dependency provenance: how do we know it could be affected? Through what call chain, what data access, what message queue?
3. The business rules that touch the change, in human-readable form. Not a code diff. The conditional logic, expressed in plain language, with provenance back to the lines of code that enforce it. The compliance officer needs to know that the bank's premium savings rate logic, or its KYC threshold logic, or its anti-money-laundering check, is not being subtly altered.
4. Evidence that the change was investigated, not just generated. The artifact has to show that the people and systems involved actually looked at the system before changing it - not that an AI took a guess and the team hoped. This is where the regulator's audit pivots.
5. A documented human approval, with the role of the approver, the date, and the rollback plan if the change fails. Sign-off is not just one signature; it is the named chain of approvers, the time-ordered events, and the contingency.
Every one of these has to be produced from the change, for the change, traceable to the source. Screenshots won't survive an audit. PDFs assembled by hand won't survive an audit. The artifact has to be a first-class output of the change-design process itself.
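One way to picture the five elements above as a single machine-readable object is sketched below. It is a minimal illustration, not a regulatory schema: every class and field name (ChangePacket, SourceRef, missing_elements, and so on) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRef:
    """Line-level pointer back to the code behind a claim (hypothetical shape)."""
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Assertion:
    """One claim in the packet, e.g. an impacted program or a business rule."""
    statement: str
    sources: tuple[SourceRef, ...]


@dataclass(frozen=True)
class Approval:
    approver: str
    role: str
    approved_at: datetime
    system_version: str


@dataclass
class ChangePacket:
    changed_items: list[Assertion] = field(default_factory=list)   # 1. what is changing
    impact_set: list[Assertion] = field(default_factory=list)      # 2. blast radius
    business_rules: list[Assertion] = field(default_factory=list)  # 3. rules touched
    investigation_log: list[str] = field(default_factory=list)     # 4. evidence of investigation
    approvals: list[Approval] = field(default_factory=list)        # 5. named approvals
    rollback_plan: str = ""                                        # 5. contingency

    def missing_elements(self) -> list[str]:
        """Return the names of required elements that are absent or unanchored."""
        problems = []
        sections = {
            "changed_items": self.changed_items,
            "impact_set": self.impact_set,
            "business_rules": self.business_rules,
        }
        for name, assertions in sections.items():
            if not assertions:
                problems.append(name)
            elif any(not a.sources for a in assertions):
                problems.append(f"{name}: assertion without source reference")
        if not self.investigation_log:
            problems.append("investigation_log")
        if not self.approvals:
            problems.append("approvals")
        if not self.rollback_plan.strip():
            problems.append("rollback_plan")
        return problems
```

The design point is that a claim without a source reference is treated as a defect in the packet, in the same way as a missing section would be.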
Why most AI tooling fails this bar
The default AI-coding-assistant pattern is: prompt → diff → review → commit. The artifact at the end of that pipeline is the diff. Maybe a PR description. Maybe a test result. None of these survive a regulator's question.
The regulator doesn't ask "did the AI generate the right code?" That's the wrong question - it can't be answered structurally even if the answer is yes. The regulator asks: "How did you know the impact was bounded? Show me the dependency analysis. Show me the business rules touched. Show me who reviewed each one. Show me the lineage from the analysis to the decision."
If the answer to any of those is "the AI told us," the chain has a hole and the sign-off fails. If the answer is "we looked at it ourselves," the next question is: with what evidence, in what depth, across how much of the estate? On a 10-million-line legacy core, the "we looked at it ourselves" answer is performative - and senior compliance officers know it. We have heard this concern raised verbatim more than once:
"We sent some AI in there, they never came back."
The "never came back" failure mode wasn't that the AI produced bad answers. It was that the AI produced unanchored answers - sentences that looked plausible but couldn't be traced to the lines of code that justified them. Nobody could sign off on that, because nobody could defend it.
What the artifact actually looks like
A regulator-grade change packet for a Tier-1 banking system has, fundamentally, three layers. Each layer answers a specific class of question from the sign-off chain.
Layer one: the structural assertion
Every claim in the artifact - "this change affects these 14 programs, this 3-field copybook, these 2 batch jobs" - needs to be a structural claim, not a narrative one. Structural means: it sits on a queryable model of the system, and every assertion can be traced back to the lines of code that produced it.
The senior executive who prompted this post articulated the test the regulator effectively applies:
"You decompose the problem in order to reduce the scope of hallucination, i.e. make the work more deterministic. And then once you've decomposed it into better understood pieces, you can then run the models on top of those better understood pieces."
That's the right architecture for the regulator-facing artifact. The structural claims come from a deterministic layer that can be independently verified. The narrative parts - explanations, rationales, summaries - sit on top of the structural claims and inherit their provenance. If the regulator asks "show me the call chain that justifies including program X in the impact set," the artifact can produce it. Not because someone took a screenshot - because the structural layer can answer the question by traversal.
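To make "answer the question by traversal" concrete, here is a minimal sketch. It assumes the structural layer can be reduced to a small in-memory graph in which every edge records the call site or data access that justifies it. The graph contents, program names, and the impact_path function are invented for illustration; a real estate model would be far richer.

```python
from collections import deque

# Hypothetical impact graph: an edge A -> B means a change in A can reach B.
# Each edge carries the evidence ("file:line") that justifies it.
IMPACT_EDGES = {
    "LOANCALC": [("RATELKUP", "LOANCALC.cbl:212"), ("POSTLEDG", "LOANCALC.cbl:340")],
    "RATELKUP": [("RATETBL", "RATELKUP.cbl:88")],
    "POSTLEDG": [("REGRPT01", "POSTLEDG.cbl:415")],
}


def impact_path(graph, changed, target):
    """Breadth-first search from `changed` to `target`.

    Returns the shortest list of (source, affected, evidence) edges, or None
    if `target` is not reachable, i.e. there is no structural justification
    for including it in the impact set.
    """
    queue = deque([changed])
    came_from = {changed: None}  # node -> (previous node, evidence)
    while queue:
        node = queue.popleft()
        if node == target:
            chain = []
            while came_from[node] is not None:
                prev, evidence = came_from[node]
                chain.append((prev, node, evidence))
                node = prev
            return list(reversed(chain))
        for affected, evidence in graph.get(node, []):
            if affected not in came_from:
                came_from[affected] = (node, evidence)
                queue.append(affected)
    return None


# Why is the regulatory report in the impact set of a LOANCALC change?
for source, affected, evidence in impact_path(IMPACT_EDGES, "LOANCALC", "REGRPT01"):
    print(f"{source} -> {affected}  (evidence: {evidence})")
```

Because the path is recomputed from the graph rather than copied from a report, anyone with access to the same model can re-derive it independently.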
Layer two: the spot-check governance surface
A Tier-1 bank cannot manually review every assertion in a change packet - there are thousands. What it can do is spot-check, on a risk-weighted basis. Pick 12 assertions across the affected modules. Verify each one independently against the code. If those 12 hold, the bank has a defensible, risk-based basis for relying on the rest, even though it has not verified every one.
For spot-checking to work as a governance mechanism, every assertion in the artifact has to be individually inspectable. Click a claim about a business rule - see the source line. Click a dependency edge - see the call site. Click a process flow - see the entry-point definition. The same senior executive landed exactly on this framing during the conversation:
"Spot checks become much more possible because you can take a number of those decomposed problems and risk-based-basis check 12 of them. All right. You haven't checked every single one of them, but looks pretty good kind of thing."
That's not a feature; it's the operating model the artifact has to support. If the artifact is a 200-page PDF without source provenance on each claim, spot-check governance doesn't work - every claim becomes an act of faith. If the artifact is a structured set of evidence-anchored assertions, spot-check governance scales.
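A risk-weighted spot check can be sketched as weighted sampling without replacement. The version below uses the Efraimidis-Spirakis keying scheme, which is our choice for this illustration rather than anything the quoted executive prescribed. The assertion IDs, the risk weights, and the pick_spot_checks function are all hypothetical.

```python
import random


def pick_spot_checks(assertions, k=12, seed=0):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys).

    `assertions` is a list of (assertion_id, risk_weight) pairs, weight > 0.
    Higher-risk assertions are more likely to be picked. A fixed seed makes
    the selection reproducible, so a reviewer can re-derive the same sample.
    """
    rng = random.Random(seed)
    keyed = []
    for assertion_id, weight in assertions:
        if weight <= 0:
            raise ValueError(f"risk weight must be positive: {assertion_id}")
        # key = u ** (1 / weight); the k largest keys form the sample
        keyed.append((rng.random() ** (1.0 / weight), assertion_id))
    keyed.sort(reverse=True)
    return [assertion_id for _, assertion_id in keyed[:k]]


# Hypothetical packet: 100 assertions with risk weights between 1 and 5.
packet = [(f"ASSERT-{i:03d}", 1.0 + (i % 5)) for i in range(100)]
for assertion_id in pick_spot_checks(packet, k=12, seed=2024):
    print(assertion_id)
```

Recording the seed alongside the packet version is one way to let audit confirm later that the checked sample was not hand-picked.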
Layer three: the chain of human approval
The third layer is non-technical and gets skipped in vendor decks. Every assertion the AI made, every analysis the platform produced, every change the engineer accepted - has to be tied to a named human who took responsibility for it, at a specific time, against a specific version of the system.
This is what the audit function and the regulator are actually looking for when they review post-incident. Not whether the AI was right. Whether the bank exercised the appropriate human review at the appropriate authority level. A platform that produces beautiful analysis but no audit trail of who reviewed what, when, on which version, leaves the bank exposed at exactly the moment it most needs the trail.
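A minimal sketch of such a trail follows: each entry names the approver, their role, the assertion reviewed, the time, and the system version. Hash-chaining the entries is a common way to make a log tamper-evident; it is an assumption added here, not something the post's sources prescribe, and the function and field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def append_approval(trail, approver, role, assertion_id, system_version, when=None):
    """Append an approval entry whose hash also covers the previous entry's hash."""
    when = when or datetime.now(timezone.utc)
    entry = {
        "approver": approver,
        "role": role,
        "assertion_id": assertion_id,
        "system_version": system_version,  # e.g. a commit or release identifier
        "approved_at": when.isoformat(),
        "prev_hash": trail[-1]["hash"] if trail else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    trail.append(entry)
    return entry


def verify_trail(trail):
    """Recompute every hash; False if any entry was altered, reordered, or removed mid-chain."""
    prev_hash = ""
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

This sketch does not detect entries dropped from the end of the trail; storing the latest hash somewhere outside the platform would be one way to close that gap.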
Sign-off is not about whether the AI was correct. It is about whether the bank can demonstrate, with traceable evidence, that the change was understood before it was made. Three layers - structural provenance, spot-check governance, named human approval - make that demonstration possible. Without all three, you cannot defend the decision when asked.
What DORA actually demands (and why it matters here)
The Digital Operational Resilience Act and its European-supervisor implementations don't speak directly about AI-assisted change. They speak about ICT risk management, change controls, and the operational effectiveness of those controls. The distinction matters.
A bank can prove a control exists - there's a policy, a procedure, a workflow. That's the lower bar. DORA-aware regulators are increasingly applying the higher one: is the control effective? When you say "we perform impact analysis before production changes," can you prove the analysis covered everything material? When you say "we approve changes through a defined committee," can you produce the evidence the committee considered?
For AI-assisted change, those questions cut especially hard. If the AI generated the change, the controls have to demonstrate that the impact analysis was complete, the business rules touched were reviewed, the dependency graph was current, and the human approvers had defensible evidence in front of them. A platform that produces evidence as a first-class output makes those controls demonstrably effective. A platform that produces "AI-generated code with a confidence score" does not.
The same logic applies to the requirements of national banking regulators (audit-log specifications, for example) and to ECB-supervised institutions. The format differs by jurisdiction; the underlying ask is the same - the bank has to be able to show the file.
A regulator can ask, post-incident, for the evidence package supporting any production change to a critical system. That package needs to include: (1) the impact analysis with source provenance, (2) the business rules touched with line-level references, (3) the human approver chain with timestamps and version references, (4) the rollback plan and its trigger conditions. If any of these is missing or inferred after the fact, the control is documented but not effective.
What changes when the artifact is a first-class output
When the artifact is built during the change-design process rather than assembled afterwards, three downstream effects compound.
Sign-off cycles compress. The executive who used to need a week to feel confident enough to approve a critical change can approve in days - because the evidence is in front of her, structured, with spot-check inspectability. The compliance officer countersigns from the same artifact.
Audit reviews become routine, not heroic. When internal audit asks for the change file, the file exists. It wasn't built when audit asked; it was built when the change was designed. Routine production becomes routine audit.
Regulator conversations stop being adversarial. When the regulator asks the inevitable "show me the file" question - and they will, especially under DORA scrutiny - the bank produces a file the regulator can actually read. Not a defense; an artifact.
These are not marginal improvements. They are the difference between AI being a productivity story for engineering and AI being something the COO can defend to the board.
What to ask your AI-on-legacy vendor before signature
If you are evaluating a platform that promises AI-assisted change in a regulated environment, the procurement-grade question is not "what does the AI do?" It is: show me the artifact the AI produces for a production change.
Specifically:
- For any change proposed by the platform, can you produce an impact analysis with source provenance - line-level references back to the affected programs and fields? If the answer is "screenshots from the UI," it's not an artifact.
- Can you produce the business rules touched by the change in human-readable form, with line-level provenance? If the rules are only visible inside the engineer's prompt, the compliance officer has nothing to review.
- Can you produce a named approver chain - who reviewed which assertion, when, against which version? If the audit trail is a Git commit log, it's not sufficient.
- Can you spot-check the artifact? Pick 12 random assertions. Can each one be independently traced back to its source? If not, the spot-check governance model doesn't work, and the artifact is performative.
If any of these answers is "we don't produce that today," the platform is not regulator-ready. It may be a fine productivity tool for unregulated code. It is not the platform that survives a Tier-1 bank's sign-off chain.
AI-on-legacy in a regulated bank stands or falls on whether the platform produces the artifact the sign-off chain needs. The AI can be brilliant; if it doesn't produce the evidence package, it doesn't ship. The platforms that win in regulated environments treat the artifact as the deliverable. The platforms that lose treat it as an afterthought.
Frequently Asked Questions
Isn't this just a documentation problem? Can't we generate the artifact after the fact?
No, for two reasons. First, regulator-grade evidence requires provenance - the artifact has to be traceable to the analysis that informed the change at the time of decision, not reconstructed from memory. Second, retrofitted documentation rarely survives spot-checking. When an auditor or regulator picks a random assertion and traces it back, after-the-fact documentation tends to have gaps the original analysis would not have had.
Does this only matter for changes that go to production?
In a Tier-1 bank, the change-control regime usually applies to changes promoted to the higher environments - UAT and above - because those environments inform production-readiness decisions. Pre-UAT changes have lighter controls. The artifact requirement scales with the environment: more scrutiny for production, less for sandbox. But the architectural requirement - that the platform can produce the artifact when needed - is invariant.
Who signs off on AI-assisted changes inside the bank?
The chain varies by institution and by change criticality. For changes to critical banking systems - payments, posting engines, regulatory reporting - the chain typically includes the application owner, the operational risk function, the head of compliance, and a senior operational executive with personal fiduciary accountability. The exact roles differ; the principle is constant: AI cannot sign for itself, and the humans in the chain need defensible evidence to sign.
Does DORA explicitly require this kind of artifact?
DORA does not specify the artifact's format. It requires that ICT risk management controls be in place and that their operational effectiveness be demonstrable. The artifact is the practical implementation of "operationally effective" for change controls on critical ICT systems. Banks that produce it can demonstrate operational effectiveness; banks that have the policy but not the evidence cannot.
How does this differ from a standard change-approval workflow?
Standard change-approval workflows in banks already exist - ServiceNow, internal ticketing systems, manual review committees. The artifact described here is what feeds that workflow when AI is involved in the change. Without it, the workflow approves changes based on engineer narrative and test results; with it, the workflow approves changes based on traceable evidence with source provenance. The workflow doesn't change. The quality of the inputs does.
Tweezr produces the change packet as a first-class output of every analysis - impact set with source provenance, business rules with line-level references, approver chain with timestamps and version pinning. If your bank is preparing for DORA scrutiny or running AI-assisted change at scale, see how the change packet works or book a session to walk through a real packet on real code.