Whitepaper · June 2026
The Legacy Discovery Playbook.
Mapping unknown unknowns across COBOL, RPG, AS/400, Natural, and Fujitsu estates in 48–72 hours — the SOTERIA methodology.
Every failed modernization began with incomplete discovery.
The single most reliable predictor of a failed legacy modernization program is a shallow discovery phase. Programs that begin with a six-month consulting-led "current state assessment" produce inventories. Programs that begin with a scan that reveals dependencies nobody knew existed produce strategies. SOTERIA is engineered to produce the latter: in 48 to 72 hours, against a multi-million-line estate, with zero source code leaving the customer's environment.
This whitepaper describes the methodology. We explain why traditional discovery approaches systematically underestimate risk, what the five dimensions of legacy risk look like when properly measured, and how SOTERIA condenses what historically required a calendar quarter of consulting effort into a working week. The output is a single artifact, the SOTERIA Risk Report, built around the SOTERIA Risk Score and its accompanying dependency map, which has been used as the planning baseline for 50+ enterprise modernization programs.
"You cannot modernize what you cannot see. And in a 40-year-old estate, most of what matters is invisible — to humans, to documentation, and to every tool that came before SOTERIA."
The map is not the territory. And the territory is older than the map.
Legacy estates are not architectures. They are sedimentary deposits — forty years of patches, mergers, regulatory amendments, vendor migrations, and emergency fixes layered on top of one another. The original architecture diagrams, if they ever existed, describe a system that has not existed for two decades. The current architecture exists only in the runtime behavior of the code itself.
This creates a discovery paradox. The team that knows the system best — its engineers — has the least incentive to admit what they do not know. The team that admits ignorance most freely — new consultants — has the least context to discover anything. The result, in every program that begins with manual discovery, is the same: a polished current-state document that captures what the team agreed was important, and misses what the system actually does.
The cost of incomplete discovery
- Modernization programs underscoped at the start. Programs that look like 18 months at kickoff become 36 months because dependencies surface mid-program.
- Architectures that cannot fit the data. Target microservice boundaries chosen at design time turn out to violate transactional constraints discovered during implementation.
- Production incidents at cutover. Subsystems that were not in the inventory turn out to be calling the system that was just turned off.
- Vendor lock-in revealed too late. Embedded calls to retired SDKs, deprecated databases, and end-of-life hardware are discovered when the modernization team tries to replicate them.
What "discovery" needs to mean in 2026
A modern discovery process must do three things that traditional consulting-led discovery cannot:
- Read the code, not the documentation. The code is the only authoritative source.
- Quantify risk in dimensions that drive program decisions. Not just "complex" — but how complex, in what way, and for what reason.
- Run against the live estate without exfiltrating it. Source code must stay where it is. Discovery output is what leaves the perimeter.
SOTERIA is engineered to do all three.
"Complex" is not an actionable word.
SOTERIA replaces that single label with five orthogonal risk dimensions, each measured against the actual source, each scored on a normalized 0–100 scale, and each tied to a distinct set of program decisions.
Dim 01
Composition
What languages, runtimes, and copybooks the estate is made of. The first map.
Dim 02
Complexity
How tangled the code is: cyclomatic complexity, fan-in, fan-out, dead-code ratio, monolith index.
Dim 03
Dependency
What the system reaches into and what reaches into it. The actual blast radius of any change.
Dim 04
Vulnerability
What is exploitable today — CVEs in embedded libraries, weak crypto, unencrypted persistence.
Dim 05
Portability
How much the code assumes about its runtime — implicit hardware, OS, vendor SDK couplings.
Composition: the table of contents
The first dimension is inventory. SOTERIA produces a complete file-level catalog of the estate: every COBOL program, every JCL job, every copybook, every embedded SQL statement, every external call signature, every data file referenced. For polyglot estates — which most modern legacy environments are — the catalog crosses language boundaries: a COBOL batch process calling a Java web service writing to a DB2 z/OS table is one workflow, even though it spans three language ecosystems.
Complexity: the shape of the code
Complexity is measured in five sub-metrics: cyclomatic complexity per program, fan-in (how many callers), fan-out (how many callees), dead-code ratio (the share of lines that no execution path reaches), and a derived "monolith index" that quantifies how decomposable the estate is. The combination matters more than any single number: high cyclomatic complexity with high fan-in means a hot, tangled core that must be broken open carefully. Low complexity with a high dead-code ratio points to an estate where a large share of the code has been silently superseded and is no longer executed.
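As a rough illustration of how three of these sub-metrics could be derived from a call graph, the sketch below computes fan-in, fan-out, and a dead-code ratio. It is not SOTERIA's implementation. The function names, the program names, and the input shape (a caller-to-callees mapping plus line counts per program) are assumptions made for this example, and dead code is approximated at whole-program granularity rather than per line.

```python
from collections import defaultdict


def fan_metrics(calls):
    """Return (fan_in, fan_out) for a {caller: [callees]} mapping."""
    fan_in = defaultdict(int)
    fan_out = {}
    for caller, callees in calls.items():
        unique = set(callees)           # count each callee once per caller
        fan_out[caller] = len(unique)
        fan_in.setdefault(caller, 0)    # make sure uncalled programs appear
        for callee in unique:
            fan_in[callee] += 1
            fan_out.setdefault(callee, 0)
    return dict(fan_in), fan_out


def dead_code_ratio(calls, loc, entry_points):
    """Share of lines living in programs unreachable from any entry point."""
    reachable = set()
    stack = list(entry_points)
    while stack:
        program = stack.pop()
        if program in reachable:
            continue
        reachable.add(program)
        stack.extend(calls.get(program, ()))
    total = sum(loc.values())
    dead = sum(n for program, n in loc.items() if program not in reachable)
    return dead / total if total else 0.0


# Hypothetical four-program estate; OLDRPT is never invoked by anything.
calls = {
    "NIGHTLY": ["POST", "VALIDATE"],
    "POST": ["VALIDATE"],
    "OLDRPT": ["VALIDATE"],
}
loc = {"NIGHTLY": 400, "POST": 900, "VALIDATE": 300, "OLDRPT": 400}

fan_in, fan_out = fan_metrics(calls)
ratio = dead_code_ratio(calls, loc, entry_points=["NIGHTLY"])
print(fan_in["VALIDATE"], fan_out["NIGHTLY"], ratio)  # 3 2 0.2
```

In this toy estate, VALIDATE has the highest fan-in, so it is the program a change would touch the most callers through, and the 400 lines of OLDRPT account for the 20% dead-code ratio.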
Dependency: the blast radius
Dependency is the most undervalued risk dimension. SOTERIA traces every external call signature — JCL submissions, MQ topics, REST endpoints, embedded SQL — and builds a directed graph of the estate's actual integration points. The graph routinely surfaces dependencies the customer's team did not know existed: subsystems calling subsystems calling third-party SDKs no one remembers integrating.
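One way to make "blast radius" concrete is reverse reachability over the directed graph: starting from the component being changed, collect everything that depends on it directly or transitively. The sketch below assumes edges are written as (caller, callee) pairs. The component names and the `blast_radius` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict, deque


def blast_radius(edges, changed):
    """Return every node that directly or transitively depends on `changed`."""
    dependents = defaultdict(set)
    for caller, callee in edges:
        dependents[callee].add(caller)   # reverse the edge direction

    affected = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent != changed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical integration points, written as (caller, callee).
edges = [
    ("BILLING", "LEDGER"),
    ("REPORTS", "BILLING"),
    ("LEDGER", "VENDOR_SDK"),
    ("PORTAL", "REPORTS"),
    ("HR", "PAYROLL"),
]

print(sorted(blast_radius(edges, "VENDOR_SDK")))
# ['BILLING', 'LEDGER', 'PORTAL', 'REPORTS']
```

Retiring the third-party SDK in this example affects four components, three of which never call it directly. That indirect reach is the kind of dependency a manual inventory tends to miss.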
Vulnerability: the security posture
Legacy systems are routinely behind on CVE remediation by years. SOTERIA scans embedded libraries against current CVE feeds, identifies cryptographic algorithms that are deprecated or broken (DES, MD5, SHA-1 in security contexts, weak TLS configurations), and flags persistence patterns that fail modern compliance (unencrypted credentials, plain-text PII in logs, undocumented privileged account paths).
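A deliberately simplified sketch of the weak-cryptography check follows. It flags source lines that mention DES, MD5, or SHA-1 by name. The patterns, the `find_weak_crypto` helper, and the COBOL fragment (including the `CRYPTLIB` routine) are assumptions for illustration. A name match only produces a candidate for review, since it cannot tell whether the algorithm is used in a security context.

```python
import re

# Hypothetical name patterns; word boundaries avoid hits inside longer
# identifiers such as DESCRIPTION.
WEAK_CRYPTO = {
    "DES": re.compile(r"\bDES\b", re.IGNORECASE),
    "MD5": re.compile(r"\bMD5\b", re.IGNORECASE),
    "SHA-1": re.compile(r"\bSHA-?1\b", re.IGNORECASE),
}


def find_weak_crypto(source):
    """Return (line_number, algorithm) pairs for each candidate finding."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for name, pattern in WEAK_CRYPTO.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


# Invented COBOL fragment used only to exercise the scanner.
sample = """\
       CALL 'CRYPTLIB' USING 'DES' WS-KEY WS-BLOCK.
       MOVE 'SHA1' TO WS-HASH-ALG.
       MOVE 'AES256' TO WS-CIPHER-DESCRIPTION.
"""

print(find_weak_crypto(sample))  # [(1, 'DES'), (2, 'SHA-1')]
```

The third line is not flagged: AES256 is not on the list, and the word boundary keeps `WS-CIPHER-DESCRIPTION` from matching the DES pattern.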
Portability: the runtime tax
Portability quantifies how much the code assumes about its environment. Direct CICS calls, MVS-specific JCL constructs, Fujitsu-specific SDK invocations, and hardware-specific timing assumptions are each a portability friction point that must be addressed during transformation. The portability score is the leading indicator of how much retargeting work the transformation will require, independent of language conversion.
"A Complexity score of 73 and a Portability score of 41 tell a completely different story than the reverse. Same total. Different modernization strategy."
What happens between kickoff and delivery.
SOTERIA is engineered to be operationally trivial for the customer. The full delivery of a complete legacy risk assessment, across a multi-million-line estate, takes between 48 and 72 hours from access provisioning to report handoff. The internal workflow runs through six phases.
Access provisioning
The customer provisions a read-only environment containing the source estate. SOTERIA runs inside the customer's network perimeter; source never leaves. Provisioning typically takes 2–4 hours of customer time for estates under 500K LOC, and longer for larger estates.
Inventory ingestion
SOTERIA crawls the estate and produces a complete composition catalog. For a 5M-line estate, this is roughly 6 hours of compute. Output is the composition map.
Semantic parse
Every program is parsed by Ionate's proprietary legacy-language models. Cyclomatic complexity, fan-in/out, dead-code ratio, and the monolith index are computed program-by-program. This is the most compute-intensive phase.
Dependency tracing
External call signatures are reconciled into a directed graph. The graph identifies the cluster boundaries that any decomposition strategy must respect.
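The whitepaper does not say how cluster boundaries are derived. One plausible reading, assumed here, is that programs caught in a dependency cycle form a cluster that has to move together, which corresponds to the strongly connected components of the directed graph. The sketch below uses Kosaraju's two-pass algorithm. The function name and the example edges are hypothetical, and the recursive traversal is only suitable for small graphs.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Group nodes that can all reach one another (Kosaraju's algorithm)."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for caller, callee in edges:
        graph[caller].append(callee)
        reverse[callee].append(caller)
        nodes.update((caller, callee))

    # Pass 1: record nodes in order of depth-first completion.
    visited, order = set(), []

    def visit(node):
        visited.add(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visit(nxt)
        order.append(node)

    for node in sorted(nodes):
        if node not in visited:
            visit(node)

    # Pass 2: walk the reversed graph in reverse completion order.
    assigned, components = set(), []

    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for prev in reverse[node]:
            if prev not in assigned:
                collect(prev, component)

    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components


# Hypothetical call edges: ORDERS, INVENTORY, and PRICING form a cycle.
edges = [
    ("ORDERS", "INVENTORY"),
    ("INVENTORY", "PRICING"),
    ("PRICING", "ORDERS"),
    ("ORDERS", "AUDIT"),
    ("REPORTS", "ORDERS"),
]
clusters = strongly_connected_components(edges)
print(sorted(sorted(c) for c in clusters))
```

Under this reading, the three mutually dependent programs would be treated as one unit, while AUDIT and REPORTS could each be scheduled on their own.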
Risk overlay
Vulnerability and portability scores are computed and overlaid on the composition map. The result is a five-dimension risk surface across the estate.
Report synthesis
The SOTERIA Risk Score, the dependency graph, the modernization roadmap, and the executive summary are generated. Output is reviewed by an Ionate principal engineer before handoff.
Throughput characteristics
SOTERIA is built to scale linearly with estate size. The bottleneck is rarely raw compute — it is the customer's ability to provision a clean read-only environment. For mainframe estates, this typically involves spinning up a non-production LPAR or providing a snapshot of source libraries. For distributed estates, a tarball of source repositories is usually sufficient.
| Estate Size | Typical Duration | Compute Profile | Customer Effort |
|---|---|---|---|
| < 500K LOC | 24–36 hours | Single SOTERIA worker | 2–4 hours provisioning |
| 500K – 5M LOC | 48–60 hours | 2–4 parallel workers | 4–8 hours provisioning |
| 5M – 25M LOC | 60–72 hours | 4–8 parallel workers | 8–16 hours provisioning |
| > 25M LOC | 3–5 days | Sharded execution | Custom; planned in advance |
One artifact, four decision-grade outputs.
The output of a SOTERIA engagement is a single document — the SOTERIA Risk Report — structured around four sections, each designed to drive a specific category of program decision.
The SOTERIA Risk Score
A single normalized score (0–100) summarizing the overall modernization risk of the estate. The score is decomposed into the five dimensions, so a reader can immediately see whether the risk is concentrated in dependency complexity, in vulnerability density, in portability friction, or distributed evenly. The score is calibrated against Ionate's reference corpus of 50+ prior engagements, so a 67 in one engagement is comparable to a 67 in another.
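The aggregation formula behind the Risk Score is not published in this paper, so the sketch below assumes a plain unweighted mean of the five dimension scores purely to show why the decomposition matters. The `RiskProfile` class, its method names, and the example values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RiskProfile:
    """Five dimension scores, each on the 0-100 scale."""
    composition: float
    complexity: float
    dependency: float
    vulnerability: float
    portability: float

    def __post_init__(self):
        for name, value in asdict(self).items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be within 0-100, got {value}")

    def overall(self):
        # Assumption: equal weights. The real calibration is not described.
        scores = asdict(self)
        return sum(scores.values()) / len(scores)

    def dominant(self):
        """Name of the dimension contributing the most risk."""
        scores = asdict(self)
        return max(scores, key=scores.get)


# Two invented estates with the same overall score.
tangled = RiskProfile(composition=50, complexity=73, dependency=60,
                      vulnerability=55, portability=41)
coupled = RiskProfile(composition=50, complexity=41, dependency=60,
                      vulnerability=55, portability=73)

print(tangled.overall(), tangled.dominant())  # 55.8 complexity
print(coupled.overall(), coupled.dominant())  # 55.8 portability
```

Both profiles average 55.8, yet the first points toward careful decomposition of a tangled core and the second toward retargeting work, which is why the score is reported alongside its five components.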
The dependency map
A directed graph of every cross-boundary call in the estate, annotated with frequency, criticality, and the data classification of the flowing payload. The dependency map is the planning artifact for any decomposition strategy: it dictates which subsystems can be modernized independently and which must move together.
The modernization roadmap
A phased plan that maps the estate into modernization waves, ordered by risk and dependency. The first wave is always the lowest-risk, highest-leverage scope — designed to deliver visible value early and validate the approach. Subsequent waves are sized for sustained throughput.
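A minimal sketch of ordering by dependency and risk follows, assuming that a unit is scheduled only after everything it depends on, and that units within a wave are listed from lowest to highest risk. This is a simplification: it does not model leverage or wave sizing, and the `plan_waves` helper, unit names, and scores are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def plan_waves(depends_on, risk):
    """Layer units into waves; each wave only depends on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()                      # raises CycleError on cycles
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda unit: risk[unit])
        waves.append(ready)
        sorter.done(*ready)               # unlock units that were waiting
    return waves


# Hypothetical units: each key maps to the units it depends on.
depends_on = {
    "LEDGER": set(),
    "ARCHIVE": set(),
    "BILLING": {"LEDGER"},
    "REPORTS": {"BILLING", "LEDGER"},
}
risk = {"LEDGER": 30, "ARCHIVE": 20, "BILLING": 48, "REPORTS": 65}

waves = plan_waves(depends_on, risk)
print(waves)  # [['ARCHIVE', 'LEDGER'], ['BILLING'], ['REPORTS']]
```

A cyclic dependency makes `prepare()` raise `graphlib.CycleError`, which is a useful signal that the units involved need to be grouped into a single wave before planning.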
The decision register
A list of specific, named decisions the customer's team must make to execute the roadmap: target architecture style, language choices, decomposition boundaries, data ownership, cutover sequencing. Each decision is presented with options, tradeoffs, and Ionate's recommendation.
If a discovery document does not change a single decision your steering committee was about to make, it is a polished artifact, not a discovery. The Risk Report's purpose is to change decisions. Each section maps to one.
The output is not the report. The output is the decisions it makes possible.
Discovery artifacts that do not drive decisions are organizational waste. SOTERIA is engineered to produce decision-grade outputs — each section of the Risk Report maps directly to a specific class of program decision a steering committee must make.
Decisions enabled by the Risk Score
- Go / no-go on a modernization program at the proposed scope.
- Phasing strategy: single wave, multi-wave, or strangler-pattern decomposition.
- Vendor selection: where SI partner capability gaps exist for the dimensions where risk is concentrated.
Decisions enabled by the dependency map
- Service decomposition boundaries — what becomes one microservice, what becomes several.
- Cutover sequence — which dependencies must be untangled first.
- Co-existence architecture — how the legacy and modernized estates communicate during the transition.
Decisions enabled by the modernization roadmap
- Budget and resource allocation by phase.
- Stakeholder communication and milestone planning.
- Risk-weighted SLA negotiation with internal sponsors.
Decisions enabled by the decision register
- Target architecture: event-driven vs. request-response, mesh vs. monorepo, CQRS vs. unified state.
- Language and framework standardization across the modernized estate.
- Data ownership and master-record sequencing for shared entities.
What SOTERIA found that no one knew.
The cases below are representative composites drawn from real engagements. Identifying details are withheld for engagement confidentiality; outcomes and figures are accurate to the engagement profile.
1.2M COBOL · 47 years of accretion
SOTERIA identified 19 active dependencies on an in-house security library that had been deprecated in 2014 and superseded by a documented enterprise-wide successor. The successor, however, was never integrated into the payment estate. Every wire transfer flowing through the core continued to validate against the deprecated library, including its known weak RSA implementation.
Modernization scope adjusted to include a security-library migration as Wave 0, ahead of any structural transformation.
3.8M Natural · Adabas backend
The Risk Report's portability dimension returned a score of 38, driven almost entirely by Adabas-specific data type assumptions distributed across 240 programs. The customer's IT leadership had assumed Adabas decoupling would be a single phase. SOTERIA's analysis showed it threaded through every business line.
Program re-baselined: data-tier decoupling parallelized with application transformation rather than sequenced before it. Saved an estimated 11 months on the program timeline.
720K AS/400 RPG
The customer expected a high-risk estate. SOTERIA returned a moderate score (49) but flagged a dead-code ratio of 31%. Investigation revealed that approximately one-third of the codebase had been silently replaced over the previous decade and was no longer in any execution path.
Modernization scope reduced by ~30%, eliminating ~$2.1M in projected transformation effort. Live estate modernized in 8 months.
From kickoff to risk report in three days.
Day -1
Kickoff Call
- Scope and access alignment
- Environment provisioning plan
- Success criteria sign-off
Day 0–2
Scan Execution
- Air-gapped SOTERIA run
- Six-phase pipeline executes
- Engineer-supervised through delivery
Day 3
Report Walkthrough
- SOTERIA Risk Report delivered
- Live walkthrough with steering committee
- Decision register reviewed line-by-line
SOTERIA scans are available as a standalone engagement and as the discovery phase of every Ionate modernization program. There is no commitment to proceed with transformation after a scan — many customers use SOTERIA as a vendor-evaluation tool. We are confident in the strength of the output.
Ready to map your estate?
Point SOTERIA at your hardest legacy system. In 48 to 72 hours we will return a risk score, a dependency map, and a decision register — air-gapped, no source code egress, no commitment beyond the scan.