Discovery Is the Migration

The industry treats discovery as Phase 1. That is the mistake that kills 70% of modernization programs. Discovery is not the prelude to the work. It is the work.

Ohad Kotler | June 12, 2026 | 11 min

The industry has a standard story about how modernization works. There is a Phase 1, called discovery: you analyze the legacy system, you produce documentation, you map dependencies. Then there is a Phase 2, called transition: you actually do the work, in chunks or all at once depending on your appetite. Then there is a Phase 3, called cutover: the new system goes live, and the old one is retired.

This story is wrong. The 70% of large transformation programs that fail to meet their objectives fail because of it.

Here is the correction:

Discovery is not a phase. Discovery is the migration.

Everything else - the conversion, the testing, the cutover - is downstream of one question: do you understand the system completely enough to change it safely? If the answer is yes, the rest is mechanical. If the answer is no, no conversion tool, no agentic playbook, no Big-4 methodology can rescue you. You are not running a migration. You are running an expensive guess.

This post is about why discovery has been miscategorized, what happens when teams treat it as a phase that ends, and what the alternative looks like in practice.

The phase-1 framing is what causes the wall

Every team we talk to who has been inside a stalled migration describes the same sequence.

Phase 1 - Easy wins. The first 30 to 40 percent of programs convert cleanly. They are stateless, well-bounded utilities. The project reports green across the board. The team announces a velocity. Stakeholders relax.

Phase 2 - The middle. Programs with moderate complexity start revealing integration dependencies. A converted Java class calls a method that expects a COMMAREA layout from a program that hasn't been converted yet. The team builds adapters. The adapter count grows. Test failures multiply. The discovery document produced before conversion began is now nine months old and no longer matches what the team is actually finding in the code.

Phase 3 - The wall. The remaining 20 to 30 percent of programs are the load-bearing walls. They participate in multiple business processes, embed decades of regulatory rules in nested IF statements, and depend on call chains that span six programs and a JCL step nobody on the current team has read. Converting them line-by-line produces syntactically valid output that is semantically wrong, because the conversion tool didn't know that the program's behavior depends on which batch step called it.

This pattern is structural, not incidental. It is what happens when an organization commits to the body of the migration before it knows whether the discovery is complete. And because most teams never confront the limits of their discovery until they are deep inside the wall, the wall is where the budget dies - not where it was forecast to peak.

A senior engineer at a Tier-1 European bank described it from inside the experience. Two years on a payments-modernization program, interviewing the front-end team, then the backend, then the processing team. "And then I come and analyze, okay, which parts of this flow do we want to replace. And then the hope is that I captured all the requirements." Two months before going live, the wall: "We realize that, okay, there is another call to another system that the payment does way before it comes to us. And no one mentioned it before and it was not documented anywhere. But now we notice that because testing is failing."

This is not a story about insufficient interviews. It is a story about treating discovery as a deliverable rather than as the work itself.

What "discovery is the migration" actually means

When we say discovery is the migration, we mean four things - each of them an inversion of standard practice.

1. The migration's primary output is understanding, not converted code

A successful migration produces a system the bank understands at least as well as the one it replaced. Not a system that runs in a different language. A system the team can change with confidence, reason about, audit, and operate without depending on the tribal knowledge of the engineers who happened to be in the room when the cutover happened.

If the team comes out of the migration with a Java codebase nobody understands instead of a COBOL codebase nobody understands, the migration has not succeeded. It has changed the failure mode without changing the underlying problem.

This is why the right primary metric for a migration is not "lines of code converted." It is "percentage of business processes for which the team can answer, with verifiable evidence, what the system does and what will break if we change it."
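As a rough illustration, the metric could be computed along these lines. The `ProcessRecord` shape, the program names, and the rule that a process counts only when both questions have evidence attached are assumptions made for this sketch, not a definition taken from any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """Hypothetical record of what the team can prove about one business process."""
    name: str
    # Source references backing "what does the system do?"
    behavior_evidence: list[str] = field(default_factory=list)
    # Source references backing "what will break if we change it?"
    impact_evidence: list[str] = field(default_factory=list)

    def is_understood(self) -> bool:
        # Assumption: a process counts only if BOTH questions have evidence.
        return bool(self.behavior_evidence) and bool(self.impact_evidence)


def understanding_coverage(processes: list[ProcessRecord]) -> float:
    """Percentage of business processes with verifiable answers to both questions."""
    if not processes:
        return 0.0
    understood = sum(1 for p in processes if p.is_understood())
    return 100.0 * understood / len(processes)


# Illustrative data only; the process and file names are invented.
processes = [
    ProcessRecord("wire-transfer", ["PAYM010.cbl:120-188"], ["PAYM010.cbl:40-55"]),
    ProcessRecord("interest-accrual", ["INTR200.cbl:10-90"], []),
    ProcessRecord("statement-batch"),
    ProcessRecord("fee-assessment", ["FEES030.cbl:5-60"], ["FEES030.cbl:61-80"]),
]
print(f"{understanding_coverage(processes):.1f}% of processes understood")
```

The point of the sketch is the denominator: it counts business processes, not lines of code, so a program that converts cleanly but has no evidence behind it does not move the number.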

2. Discovery runs throughout, not at the start

A piece of discovery that was true on the day it was produced - and is stale six months later - is a liability, not an asset. It still looks like understanding after it has stopped being understanding. Teams that operate from stale discovery documents make decisions on the assumption that the artifact is still authoritative, and the assumption is wrong.

The right alternative is continuous: the model of the system updates as the system changes. Every conversion, every refactor, every patch - re-ingested. The blueprint stays aligned with the code. The team makes its next decision against a current model, not a snapshot.

A Tweezr product lead in a customer demo described this as "a live monitor over the migration process. You can constantly see that things are working - rather than find out, or validate that it works, at the end." That distinction - between continuous validation and end-state verification - is the difference between catching drift while it is still cheap to fix and discovering it on the day before go-live.

3. The right chunk size is daily, not quarterly

Risk doesn't compound at a uniform rate. It compounds with chunk size. Big-bang migrations fail because risks compound within a single, monolithic scope: by the time anyone discovers a load-bearing dependency, the team is already committed to the chunk that depends on it. Traditional phased migration fails for the same reason at a smaller scale - risks still compound, just within each multi-month chunk rather than across the whole program.

The correct chunking is daily, or as close to it as the system permits. Each chunk small enough to test, observe, and learn from before the next one. The right cadence is set by the question: how small can I make this chunk and still have it carry meaningful migration value? The answer is almost always smaller than the team is currently assuming.

In big-bang migrations, errors compound - every undiscovered dependency raises the cost of the next one. In daily-chunk migrations, learning compounds - every small mistake reduces the probability of the same class of mistake in the next chunk. The last piece is easy because the team has done it a hundred times. The first piece is hard because the team has done it once.

This is why "phased migration," as it is usually practiced - three to five large chunks - is still much closer to big-bang than to daily-small. Reducing six-month chunks to three-month chunks is not the change that matters. The change that matters is going from quarterly to daily.

4. The 70% failure rate is a discovery-skip rate, not a complexity rate

The standard explanation for the modernization failure rate - "these systems are extraordinarily complex" - is true but unhelpful. The complexity of legacy banking systems hasn't changed in twenty years, and neither has the failure rate. What also hasn't changed in that time is the industry's commitment to treating discovery as a Phase 1 that ends.

Banks that fail at migration are not failing at execution. They are failing at the prerequisite. They are starting Phase 2 before Phase 1 has done its job - and Phase 1 cannot do its job if it ends. The 70% failure rate is the cost of that miscategorization, paid out across the industry every year, dressed up as "this system was unusually complex" or "we didn't anticipate the integration challenges."

The systems are not unusually complex. The integration challenges are not unanticipatable. The discovery was incomplete on the day Phase 2 started, and nobody updated it.

Why this is hard to adopt - and the partnership tension it surfaces

The "discovery is the migration" reframe sits awkwardly inside the dominant industry vocabulary. Three forces push back on it.

The first is procurement. Banks buy migrations as fixed-scope, fixed-budget programs. Fixed-scope requires a defined Phase 1 deliverable. The procurement contract incentivizes treating discovery as a one-time output that gets signed off and locked. Adopting "discovery is the migration" means restructuring the procurement model - which is not something the engineering team can do alone.

The second is the consulting model. Big-4 modernization practices are built around long, expensive Phase 1 engagements. The economics depend on a discovery phase that ends - and bills. A reframe in which discovery never ends, runs continuously, and is operated by the bank's own team is a direct threat to the billable hours that fund those practices. Expect the framing to be resisted by the people who have the most credibility to recommend modernization approaches.

The third is the partnership tension with phased-migration advocates. Even the right voices in the modernization conversation - including partners we deeply respect - still publish "phased migration" as the preferred methodology, where phased typically means three to five large chunks. We agree directionally. We disagree on the granularity. A head of products at a strategic partner platform told us frankly: "We can do big-bang, but we definitely advise clients not to do it. Most clients are still choosing to do big-bang, which I don't fundamentally understand. But they are, and we don't stop them." Same instinct, same conclusion that the industry default is wrong. The remaining question is how small the chunks need to be - and our answer is much smaller than the industry assumes.

This is a constructive disagreement, not a competitive one. The advocacy for phased over big-bang is the right direction. The work that remains is convincing the industry that phased, done at the granularity the industry currently practices, is still much closer to big-bang than to the daily-small cadence that actually works.

What the alternative looks like in operation

A migration built on "discovery is the migration" doesn't look exotic in practice. It is recognizable engineering, just sequenced and instrumented differently.

A complete, structural model of the legacy system is built first - automated, deterministic, exhaustive. Not a sample. Not a summary. The full graph of programs, calls, data flows, business rules, process flows, and architectural views. Every claim back-anchored to the lines of code it came from. This is the artifact every later step references.
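A minimal sketch of what "every claim back-anchored to the lines of code it came from" could look like as a data structure. The class names, fields, and the rule that a claim without a usable anchor is rejected are all assumptions for illustration; this is not a description of any particular product's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceAnchor:
    """Where in the legacy source a claim comes from (hypothetical shape)."""
    file: str
    first_line: int
    last_line: int


@dataclass(frozen=True)
class Claim:
    """One fact in the model, e.g. 'program A calls program B'."""
    kind: str      # e.g. "calls", "reads", "writes", "rule"
    subject: str   # the program or process the claim is about
    target: str    # what it calls, touches, or enforces
    anchor: SourceAnchor


class SystemModel:
    """Stores claims and refuses any claim that lacks provenance."""

    def __init__(self) -> None:
        self._claims: list[Claim] = []

    def add(self, claim: Claim) -> None:
        a = claim.anchor
        if not a.file or a.first_line < 1 or a.last_line < a.first_line:
            raise ValueError(f"claim has no usable source anchor: {claim}")
        self._claims.append(claim)

    def claims_about(self, subject: str) -> list[Claim]:
        return [c for c in self._claims if c.subject == subject]


# Illustrative data only; program names and line numbers are invented.
model = SystemModel()
model.add(Claim("calls", "PAYM010", "VALID020", SourceAnchor("PAYM010.cbl", 212, 212)))
model.add(Claim("rule", "PAYM010", "reject if amount exceeds limit",
                SourceAnchor("PAYM010.cbl", 230, 247)))

for c in model.claims_about("PAYM010"):
    print(c.kind, "->", c.target, f"({c.anchor.file}:{c.anchor.first_line}-{c.anchor.last_line})")
```

Making the anchor a required field rather than an optional note is the design choice that matters here: a statement about the system that cannot point at source lines never enters the model.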

Against that model, the team identifies the smallest meaningful unit of migration - usually a business process or sub-process - and migrates it. Daily. With verification at every step against the system model.

As each chunk lands, the model re-ingests and updates. The team's next chunk is planned against the current state, not a six-month-old snapshot. Drift between the model and the system is detected as it happens, not at the end. The "wall" never appears - because the team never lets the gap between the model and the system grow large enough to create one.
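One simple way to detect drift as it happens, sketched under the assumption that the model records a content fingerprint for each program at ingestion time. The function names and the use of SHA-256 over raw source text are illustrative choices, not a documented mechanism of any tool.

```python
import hashlib


def fingerprint(source_text: str) -> str:
    """Content hash recorded when a program is ingested into the model."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def detect_drift(model_fingerprints: dict[str, str],
                 current_sources: dict[str, str]) -> dict[str, list[str]]:
    """Compare what the model last saw against what is in the codebase now."""
    changed = sorted(
        name for name, text in current_sources.items()
        if name in model_fingerprints and fingerprint(text) != model_fingerprints[name]
    )
    added = sorted(set(current_sources) - set(model_fingerprints))
    removed = sorted(set(model_fingerprints) - set(current_sources))
    return {"changed": changed, "added": added, "removed": removed}


# Illustrative data only; program names and source text are invented.
snapshot = {
    "PAYM010": fingerprint("MOVE A TO B."),
    "FEES030": fingerprint("COMPUTE FEE = X * 2."),
    "OLD999": fingerprint("STOP RUN."),
}
current = {
    "PAYM010": "MOVE A TO B.",
    "FEES030": "COMPUTE FEE = X * 3.",   # patched since the last ingest
    "NEW001": "DISPLAY 'HI'.",           # never ingested
}

report = detect_drift(snapshot, current)
print(report)
```

Anything listed in the report is a program whose part of the model can no longer be trusted until it is re-ingested. Run after every chunk, the list stays short; run once at the end, it is the wall.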

When the migration is complete, the bank has two artifacts: the new system, and a living model of how it works. The model is the answer to "what happens if we change this?" - and it is the bank's, not the consultancy's.
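The "what happens if we change this?" question can be sketched as a reverse traversal over the call graph the model holds. The graph shape (caller mapped to its callees) and the program names are assumptions for the example, and a real model would also follow data flows and job steps, not just calls.

```python
from collections import deque


def impacted_by(calls: dict[str, set[str]], changed: str) -> set[str]:
    """Return everything that directly or transitively calls `changed`."""
    # Invert the caller -> callees edges so we can walk upward from the change.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            # Skip nodes already visited; this also keeps cycles from looping.
            if caller not in seen and caller != changed:
                seen.add(caller)
                queue.append(caller)
    return seen


# Illustrative call graph only; all names are invented.
calls = {
    "JOB_NIGHTLY": {"PAYM010"},
    "ONLINE_TX": {"VALID020"},
    "PAYM010": {"VALID020", "FEES030"},
    "FEES030": {"RATES040"},
}

print(sorted(impacted_by(calls, "RATES040")))
print(sorted(impacted_by(calls, "VALID020")))
```

In this toy graph, a change to the rates program reaches the nightly job through two intermediate programs, which is the kind of dependency that interviews tend to miss and a complete graph does not.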

The question to put to any migration vendor is not "what's your conversion technology?" or "what's your industry experience?" It is what happens to the discovery model once the system changes. If the answer is "you'll need to redo it" - or "we hand it over and your team maintains it" - the program is not going to survive contact with the live system. The model has to be a live artifact the bank owns, not a static deliverable the consultancy archives.

The framing matters because it changes what banks buy

If discovery is a phase, the right thing to buy is a Big-4 discovery engagement followed by a conversion vendor. The two purchases are separable. The output of the first becomes the input of the second.

If discovery is the migration, those are not separable purchases. The team that builds the model has to be the team that operates against the model - because the model is not a deliverable, it is a continuously running artifact, and the people who own it have to be the people doing the work.

The implication for the buyer is concrete. A migration program built on the wrong framing buys two things that don't fit together. A migration program built on the right framing buys one capability - a deterministic, continuously updated model of the system - and applies it to whichever migration path makes sense for that bank: evolve, replace, or partial. The choice between paths is downstream of the model, not upstream.

This is why the most consequential decision in any migration is the one that gets made before the migration is named: do we treat understanding as a phase, or as the work?

The 70% failure rate has answered that question on the industry's behalf. The teams that survive their migrations are the ones who answer it again, more carefully, for themselves.


Frequently Asked Questions

Doesn't every migration vendor say discovery matters?

Yes - and most of them mean Phase 1 discovery. The reframe in this post is not "discovery is important." It is "discovery never ends." The test is whether the discovery artifact is alive at the end of the migration or archived. If it is archived, the framing is still Phase 1, however much rhetoric is attached to it.

How does this work with conversion tools like AWS Transform or IBM watsonx Code Assistant for Z?

Conversion tools are downstream of the model. Once you have a living, complete, deterministic model of the legacy system, conversion tools are useful for what they are good at: syntactic translation of the chunks for which the semantic understanding is now solid. The mistake is to use them on chunks where the model hasn't been built yet. Conversion-first vendors solve the cheap part of the problem and leave the expensive part - comprehension - to the customer.

Is daily-chunk migration realistic for a Tier-1 bank?

Yes, but it requires the model to be live. You cannot make daily decisions against a six-month-old discovery document. You can make daily decisions against a continuously updated system model. The infrastructure that makes daily-small possible is exactly the infrastructure that makes "discovery is the migration" possible. They are the same investment.

What about regulated environments where every change requires sign-off?

The regulator's underlying ask is "produce evidence that you understand what the change does and what could break." A continuously updated model with full provenance - every claim back-anchored to source - is the most defensible regulatory artifact a bank can produce. The compliance posture is strengthened by this approach, not weakened. The regulator's nightmare is the bank that converted 10,000 programs and can't tell them what any one of them does.


Tweezr is built around the principle that discovery is the migration. The platform produces a complete, continuously updated, deterministic model of the legacy system - and that model is the spine of every migration decision that follows. If your modernization program treats discovery as Phase 1, see how a model-first migration is structured or book a conversation.
