Whitepaper · June 2026

The Legacy Discovery Playbook.

Mapping unknown unknowns across COBOL, RPG, AS/400, Natural, and Fujitsu estates in 48–72 hours — the SOTERIA methodology.

Prepared by Ionate, Inc. · IONATE SOTERIA · Public — June 2026

01 Executive Summary
02 Why Discovery is the Hardest Step
03 The Five Dimensions of Legacy Risk
04 Inside the 48–72 Hour Scan
05 What SOTERIA Returns
06 Decisions SOTERIA Enables
07 Case Studies
08 Getting Started

01 — Executive Summary

Every failed modernization began with incomplete discovery.

The single most reliable predictor of a failed legacy modernization program is a shallow discovery phase. Programs that begin with a six-month, consulting-led "current state assessment" produce inventories. Programs that begin with a scan that reveals dependencies nobody knew existed produce strategies. SOTERIA is engineered to produce the latter: in 48 to 72 hours, against a multi-million-line estate, with zero source code leaving the customer's environment.

This whitepaper describes the methodology. We explain why traditional discovery approaches systematically underestimate risk, what the five dimensions of legacy risk look like when properly measured, and how SOTERIA condenses what historically took months of consulting effort into a working week. The output is a single artifact, the SOTERIA Risk Report, anchored by the SOTERIA Risk Score and its accompanying dependency map. It has served as the planning baseline for 50+ enterprise modernization programs.

"You cannot modernize what you cannot see. And in a 40-year-old estate, most of what matters is invisible — to humans, to documentation, and to every tool that came before SOTERIA."

48–72h · Time to Scan

5 · Risk Dimensions

100% · Air-Gapped

50+ · Estates Mapped

Zero · Source Code Egress

02 — Why Discovery is the Hardest Step

The map is not the territory. And the territory is older than the map.

Legacy estates are not architectures. They are sedimentary deposits — forty years of patches, mergers, regulatory amendments, vendor migrations, and emergency fixes layered on top of one another. The original architecture diagrams, if they ever existed, describe a system that has not existed for two decades. The current architecture exists only in the runtime behavior of the code itself.

This creates a discovery paradox. The team that knows the system best — its engineers — has the least incentive to admit what they do not know. The team that admits ignorance most freely — new consultants — has the least context to discover anything. The result, in every program that begins with manual discovery, is the same: a polished current-state document that captures what the team agreed was important, and misses what the system actually does.

The cost of incomplete discovery

Programs underscoped at the start. A program that looks like 18 months at kickoff becomes 36 months because dependencies surface mid-program.
Architectures that cannot fit the data. Target microservice boundaries chosen at design time turn out to violate transactional constraints discovered during implementation.
Production incidents at cutover. Subsystems that were not in the inventory turn out to be calling the system that was just turned off.
Vendor lock-in revealed too late. Embedded calls to retired SDKs, deprecated databases, and end-of-life hardware are discovered when the modernization team tries to replicate them.

What "discovery" needs to mean in 2026

A modern discovery process must do three things that traditional consulting-led discovery cannot:

Read the code, not the documentation. The code is the only authoritative source.
Quantify risk in dimensions that drive program decisions. Not just "complex" — but how complex, in what way, and for what reason.
Run against the live estate without exfiltrating it. Source code must stay where it is. Discovery output is what leaves the perimeter.

SOTERIA is engineered to do all three.

03 — The Five Dimensions of Legacy Risk

"Complex" is not an actionable word.

SOTERIA decomposes legacy risk into five separate dimensions. Each is measured against the actual source, scored on a normalized 0–100 scale, and tied to a distinct set of program decisions.

Dim 01 · Composition. What languages, runtimes, and copybooks the estate is made of. The first map.

Dim 02 · Complexity. How tangled the code is: cyclomatic complexity, fan-in, fan-out, dead-code ratio, monolith index.

Dim 03 · Dependency. What the system reaches into and what reaches into it. The actual blast radius of any change.

Dim 04 · Vulnerability. What is exploitable today: CVEs in embedded libraries, weak crypto, unencrypted persistence.

Dim 05 · Portability. How much the code assumes about its runtime: implicit hardware, OS, and vendor SDK couplings.

Composition: the table of contents

The first dimension is inventory. SOTERIA produces a complete file-level catalog of the estate: every COBOL program, every JCL job, every copybook, every embedded SQL statement, every external call signature, and every data file referenced. Most legacy environments today are polyglot, and for these estates the catalog crosses language boundaries. A COBOL batch process that calls a Java web service, which in turn writes to a DB2 z/OS table, is one workflow even though it spans three language ecosystems.
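The Python below is a minimal sketch of what a file-level composition catalog involves. It groups source files by language and counts files and non-blank lines per group. It is not SOTERIA's crawler. The extension-to-language table (`LANGUAGE_BY_SUFFIX`) and the function name `build_catalog` are hypothetical. Real mainframe libraries also often hold extensionless members that would need content-based classification.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension-to-language map for illustration only. Members of
# mainframe PDS libraries often have no extension and would land in
# "unclassified" here; a real scanner would also inspect file contents.
LANGUAGE_BY_SUFFIX = {
    ".cbl": "COBOL program",
    ".cob": "COBOL program",
    ".cpy": "COBOL copybook",
    ".jcl": "JCL job",
    ".rpgle": "RPG program",
    ".nsp": "Natural program",
    ".java": "Java",
    ".sql": "SQL",
}


def build_catalog(files: dict[str, str]) -> dict[str, dict[str, int]]:
    """Group files by language; count files and non-blank lines per group.

    `files` maps a file path to that file's source text.
    """
    file_counts: Counter = Counter()
    line_counts: Counter = Counter()
    for path, text in files.items():
        suffix = PurePosixPath(path).suffix.lower()
        language = LANGUAGE_BY_SUFFIX.get(suffix, "unclassified")
        file_counts[language] += 1
        line_counts[language] += sum(1 for line in text.splitlines() if line.strip())
    return {
        lang: {"files": file_counts[lang], "lines": line_counts[lang]}
        for lang in file_counts
    }


estate = {
    "batch/PAYRUN.cbl": "       IDENTIFICATION DIVISION.\n       PROGRAM-ID. PAYRUN.\n",
    "copy/CUSTREC.cpy": "       01 CUST-REC.\n\n       05 CUST-ID PIC X(10).\n",
    "jobs/NIGHTLY.jcl": "//NIGHTLY JOB\n",
    "svc/Rates.java": "class Rates {}\n",
}
print(build_catalog(estate))
```

A catalog like this only answers "what is here". Stitching the COBOL-to-Java-to-DB2 workflow together requires the call-signature tracing described under Dependency.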

Complexity: the shape of the code

Complexity is measured in five sub-metrics: cyclomatic complexity per program, fan-in (how many callers), fan-out (how many callees), dead-code ratio (the share of lines that no execution path reaches), and a derived "monolith index" that quantifies how decomposable the estate is. The scores tell a story. High cyclomatic complexity with high fan-in means a hot, tangled core that must be broken open carefully. Low complexity with a high dead-code ratio means an estate where much of the code has been silently replaced and is no longer used.
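Three of these sub-metrics have standard definitions, sketched below in Python under stated assumptions:

- Fan-in and fan-out are counted over a hypothetical program-level call graph that maps each caller to the set of programs it calls.
- The dead-code measure is a coarser program-level proxy (programs unreachable from any entry point), not the line-level ratio described above.
- Cyclomatic complexity uses McCabe's formula M = E - N + 2P over a control-flow graph.

The monolith index is Ionate's own derived metric and is not reproduced. All function names are illustrative.

```python
from collections import deque


def _all_programs(call_graph: dict[str, set[str]]) -> set[str]:
    """Every program that appears as a caller or a callee."""
    return set(call_graph) | {c for callees in call_graph.values() for c in callees}


def fan_in_out(call_graph: dict[str, set[str]]) -> dict[str, tuple[int, int]]:
    """Return (fan_in, fan_out) per program from a caller -> callees mapping."""
    fan_in = {p: 0 for p in _all_programs(call_graph)}
    for callees in call_graph.values():
        for callee in callees:
            fan_in[callee] += 1  # each distinct caller counts once
    return {p: (fan_in[p], len(call_graph.get(p, ()))) for p in fan_in}


def dead_program_ratio(call_graph: dict[str, set[str]], entry_points: list[str]) -> float:
    """Share of programs not reachable from any entry point (breadth-first search)."""
    programs = _all_programs(call_graph)
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        for callee in call_graph.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return len(programs - seen) / len(programs) if programs else 0.0


def cyclomatic(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's M = E - N + 2P for a control-flow graph."""
    return edges - nodes + 2 * components


graph = {
    "JOB1": {"PAYRUN"},
    "PAYRUN": {"CALCTAX", "LOGGER"},
    "CALCTAX": {"LOGGER"},
    "OLDRATE": {"LOGGER"},  # nothing reaches OLDRATE from JOB1
}
print(fan_in_out(graph)["LOGGER"], dead_program_ratio(graph, ["JOB1"]))
```

In this toy graph, `LOGGER` has a fan-in of 3, and `OLDRATE` is the one unreachable program out of five. A single if/else (4 nodes, 4 edges) has a cyclomatic complexity of 2.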

Dependency: the blast radius

Dependency is the most undervalued risk dimension. SOTERIA traces every external call signature — JCL submissions, MQ topics, REST endpoints, embedded SQL — and builds a directed graph of the estate's actual integration points. The graph routinely surfaces dependencies the customer's team did not know existed: subsystems calling subsystems calling third-party SDKs no one remembers integrating.
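Blast radius can be read off a directed dependency graph by walking its edges backwards. Everything that directly or transitively calls, reads from, or submits work to a component is exposed when that component changes. The minimal sketch below assumes edges have already been extracted as `(source, target)` pairs. Node names such as `SECLIB` and the function name `blast_radius` are hypothetical.

```python
from collections import defaultdict, deque


def blast_radius(edges: list[tuple[str, str]], changed: str) -> set[str]:
    """Return every node that directly or transitively depends on `changed`.

    Each edge (a, b) means "a calls, reads from, or submits work to b".
    """
    callers: defaultdict = defaultdict(set)  # reverse adjacency: b -> {a, ...}
    for src, dst in edges:
        callers[dst].add(src)
    affected: set[str] = set()
    queue = deque([changed])
    while queue:
        for caller in callers[queue.popleft()]:
            # Skip the changed node itself in case it sits on a cycle.
            if caller not in affected and caller != changed:
                affected.add(caller)
                queue.append(caller)
    return affected


edges = [
    ("NIGHTLY", "PAYRUN"),
    ("PAYRUN", "SECLIB"),
    ("BILLING", "SECLIB"),
    ("WEBAPI", "BILLING"),
    ("REPORTS", "LEDGER"),
]
print(sorted(blast_radius(edges, "SECLIB")))
```

The traversal is breadth-first, so it also handles cyclic call chains without recursion.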

Vulnerability: the security posture

Legacy systems are routinely behind on CVE remediation by years. SOTERIA scans embedded libraries against current CVE feeds, identifies cryptographic algorithms that are deprecated or broken (DES, MD5, SHA-1 in security contexts, weak TLS configurations), and flags persistence patterns that fail modern compliance (unencrypted credentials, plain-text PII in logs, undocumented privileged account paths).
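At its simplest, flagging deprecated cryptography in source is a pattern search. The sketch below is deliberately naive and hypothetical. The rule table `WEAK_CRYPTO_RULES` covers only the DES (including Triple DES), MD5, and SHA-1 names mentioned above, and a name match is a finding for review, not proof of insecure use. Telling "SHA-1 in security contexts" apart from, say, a non-security checksum needs semantic analysis that this sketch does not attempt.

```python
import re

# Hypothetical, minimal rule set. Word boundaries keep "DES" from matching
# inside identifiers such as DESCRIPTION, but a hit is still only a lead.
WEAK_CRYPTO_RULES = {
    "DES/3DES": re.compile(r"\b(?:3|TRIPLE-?)?DES\b", re.IGNORECASE),
    "MD5": re.compile(r"\bMD5\b", re.IGNORECASE),
    "SHA-1": re.compile(r"\bSHA-?1\b", re.IGNORECASE),
}


def scan_weak_crypto(programs: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (program, 1-based line number, rule label) for each rule hit."""
    findings = []
    for name, text in programs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for label, pattern in WEAK_CRYPTO_RULES.items():
                if pattern.search(line):
                    findings.append((name, line_no, label))
    return findings


source = {
    "PAYAUTH": "CALL 'CRYPTLIB' USING DES-KEY\n"
               "MOVE DESCRIPTION TO OUT\n"
               "CALL 'HASH-MD5' USING REC",
}
print(scan_weak_crypto(source))
```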

Portability: the runtime tax

Portability quantifies how much the code assumes about its environment. Direct CICS calls, MVS-specific JCL constructs, Fujitsu-specific SDK invocations, hardware-specific timing assumptions — each is a portability friction point that must be addressed during transformation. The portability score is the leading indicator of how much retargeting work the transformation will require, independent of language conversion.
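A rough portability signal can be approximated by counting runtime-specific constructs per program. The marker set below is a hypothetical two-rule example: direct `EXEC CICS` commands, and MVS dataset references via `DSN=` in JCL. The per-program count and the share of programs with at least one friction point are illustrative measures, not the SOTERIA portability score.

```python
import re

# Hypothetical markers of runtime coupling; a real rule set would be far larger
# (vendor SDK calls, hardware timing assumptions, and so on).
PORTABILITY_MARKERS = {
    "CICS command": re.compile(r"\bEXEC\s+CICS\b", re.IGNORECASE),
    "MVS dataset reference": re.compile(r"\bDSN="),
}


def portability_friction(programs: dict[str, str]) -> tuple[dict[str, int], float]:
    """Count friction points per program and the share of programs with any."""
    per_program = {
        name: sum(len(p.findall(text)) for p in PORTABILITY_MARKERS.values())
        for name, text in programs.items()
    }
    coupled = sum(1 for count in per_program.values() if count > 0)
    share = coupled / len(per_program) if per_program else 0.0
    return per_program, share


programs = {
    "INQ01": "EXEC CICS RECEIVE MAP('M1') END-EXEC.\nEXEC CICS SEND MAP('M1') END-EXEC.",
    "CALC02": "COMPUTE TOTAL = PRICE * QTY.",
    "RUN03": "//LEDGER DD DSN=PROD.LEDGER,DISP=SHR",
}
print(portability_friction(programs))
```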

"A 73% Complexity score and a 41% Portability score tell a completely different story than the reverse. Same total. Different modernization strategy."

04 — Inside the 48–72 Hour Scan

What happens between kickoff and delivery.

SOTERIA is engineered to be operationally trivial for the customer. The full delivery of a complete legacy risk assessment, across a multi-million-line estate, takes between 48 and 72 hours from access provisioning to report handoff. The internal workflow runs through six phases.

Access provisioning

The customer provisions a read-only environment containing the source estate. SOTERIA runs inside the customer's network perimeter; source never leaves. Provisioning typically takes 2–4 hours of customer time for smaller estates and up to 16 hours for estates of 5M–25M lines.

Inventory ingestion

SOTERIA crawls the estate and produces a complete composition catalog. For a 5M-line estate, this is roughly 6 hours of compute. Output is the composition map.

Semantic parse

Every program is parsed by Ionate's proprietary legacy-language models. Cyclomatic complexity, fan-in/out, dead-code ratio, and the monolith index are computed program-by-program. This is the most compute-intensive phase.

Dependency tracing

External call signatures are reconciled into a directed graph. The graph identifies the cluster boundaries that any decomposition strategy must respect.
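One standard way to find boundaries that a decomposition must respect is to compute the strongly connected components of the dependency graph. Programs that can reach each other through calls form a cycle, and they cannot be split across services without first breaking that cycle. The sketch below uses Kosaraju's algorithm with iterative depth-first search to avoid recursion limits on large graphs. It illustrates the technique; it is not necessarily how SOTERIA derives its cluster boundaries. The program names are hypothetical.

```python
def strongly_connected_components(graph: dict[str, set[str]]) -> list[set[str]]:
    """Kosaraju's algorithm over a caller -> callees mapping."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}

    # Pass 1: iterative DFS, recording nodes in order of completion.
    order, visited = [], set()
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(sorted(graph.get(start, ()))))]
        while stack:
            node, children = stack[-1]
            nxt = next(children, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(sorted(graph.get(nxt, ())))))

    # Pass 2: DFS on the reversed graph in reverse completion order.
    reverse = {n: set() for n in nodes}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].add(src)
    components, assigned = [], set()
    for node in reversed(order):
        if node in assigned:
            continue
        component, stack = set(), [node]
        assigned.add(node)
        while stack:
            current = stack.pop()
            component.add(current)
            for pred in reverse[current]:
                if pred not in assigned:
                    assigned.add(pred)
                    stack.append(pred)
        components.append(component)
    return components


graph = {"BILLING": {"LEDGER"}, "LEDGER": {"BILLING", "AUDIT"}, "AUDIT": set(), "REPORTS": {"LEDGER"}}
print(sorted(sorted(c) for c in strongly_connected_components(graph)))
```

Here `BILLING` and `LEDGER` call each other, so they form one component that would have to move together. `AUDIT` and `REPORTS` can each be scheduled on their own.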

Risk overlay

Vulnerability and portability scores are computed and overlaid on the composition map. The result is a five-dimension risk surface across the estate.

Report synthesis

The SOTERIA Risk Score, the dependency graph, the modernization roadmap, and the executive summary are generated. Output is reviewed by an Ionate principal engineer before handoff.

Throughput characteristics

SOTERIA's compute is built to scale linearly with estate size; wall-clock duration grows more slowly because larger estates are split across more parallel workers. The bottleneck is rarely raw compute. It is the customer's ability to provision a clean read-only environment. For mainframe estates, this typically involves spinning up a non-production LPAR or providing a snapshot of source libraries. For distributed estates, a tarball of source repositories is usually sufficient.

Estate Size	Typical Duration	Compute Profile	Customer Effort
< 500K LOC	24–36 hours	Single SOTERIA worker	2–4 hours provisioning
500K – 5M LOC	48–60 hours	2–4 parallel workers	4–8 hours provisioning
5M – 25M LOC	60–72 hours	4–8 parallel workers	8–16 hours provisioning
> 25M LOC	3–5 days	Sharded execution	Custom; planned in advance

05 — What SOTERIA Returns

One artifact, four decision-grade outputs.

The output of a SOTERIA engagement is a single document — the SOTERIA Risk Report — structured around four sections, each designed to drive a specific category of program decision.

The SOTERIA Risk Score

A single normalized score (0–100) summarizing the overall modernization risk of the estate. The score is decomposed into the five dimensions, so a reader can immediately see whether the risk is concentrated in dependency complexity, in vulnerability density, in portability friction, or distributed evenly. The score is calibrated against Ionate's reference corpus of 50+ prior engagements, so a 67 in one engagement is comparable to a 67 in another.
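To make "decomposed into the five dimensions" concrete, the sketch below aggregates five 0–100 dimension scores into one 0–100 score with a weighted mean and reports which dimension contributes most. The equal default weights and the function name `risk_score` are assumptions for illustration. SOTERIA's calibration against its reference corpus is not described here and is not reproduced. The usage example shows how two estates can share a total yet differ in where the risk sits.

```python
from typing import Optional

DIMENSIONS = ("composition", "complexity", "dependency", "vulnerability", "portability")


def risk_score(
    scores: dict[str, float], weights: Optional[dict[str, float]] = None
) -> tuple[float, str]:
    """Weighted mean of five 0-100 dimension scores, plus the dominant dimension."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}  # hypothetical equal weighting
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    overall = sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight
    dominant = max(DIMENSIONS, key=lambda d: weights[d] * scores[d])
    return overall, dominant


a = {"composition": 50, "complexity": 73, "dependency": 50, "vulnerability": 50, "portability": 41}
b = dict(a, complexity=41, portability=73)
print(risk_score(a), risk_score(b))
```

Both profiles average to 52.8, but the dominant dimension is complexity in one and portability in the other. That difference is why the per-dimension breakdown matters more than the headline number.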

The dependency map

A directed graph of every cross-boundary call in the estate, annotated with frequency, criticality, and the data classification of the flowing payload. The dependency map is the planning artifact for any decomposition strategy: it dictates which subsystems can be modernized independently and which must move together.

The modernization roadmap

A phased plan that maps the estate into modernization waves, ordered by risk and dependency. The first wave is always the lowest-risk, highest-leverage scope — designed to deliver visible value early and validate the approach. Subsequent waves are sized for sustained throughput.
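One plausible reading of "ordered by risk and dependency" is a layered topological sort. Under this reading, a subsystem enters a wave only once everything it depends on has moved, and subsystems within a wave are ordered from lowest to highest risk. The sketch below assumes that reading, which is not confirmed by the text above. The names `plan_waves`, `depends_on`, and the subsystem labels are hypothetical. Cycles are rejected, so mutually dependent subsystems would first be collapsed into a single unit (for example via strongly connected components).

```python
def plan_waves(depends_on: dict[str, set[str]], risk: dict[str, float]) -> list[list[str]]:
    """Group subsystems into waves; each wave's dependencies are already done."""
    remaining = {n: set(deps) for n, deps in depends_on.items()}
    for deps in depends_on.values():
        for dep in deps:
            remaining.setdefault(dep, set())  # include leaf-only subsystems
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        ready = [n for n, deps in remaining.items() if deps <= done]
        if not ready:
            raise ValueError("cycle detected; collapse strongly connected components first")
        ready.sort(key=lambda n: (risk.get(n, 0.0), n))  # lowest risk first
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


deps = {"PRICING": {"PRODUCT"}, "ORDERS": {"PRICING", "CUSTOMER"}, "PRODUCT": set(), "CUSTOMER": set()}
risk = {"PRODUCT": 30, "CUSTOMER": 20, "PRICING": 55, "ORDERS": 70}
print(plan_waves(deps, risk))
```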

The decision register

A list of specific, named decisions the customer's team must make to execute the roadmap: target architecture style, language choices, decomposition boundaries, data ownership, cutover sequencing. Each decision is presented with options, tradeoffs, and Ionate's recommendation.

If a discovery document does not change a single decision your steering committee was about to make, it is a polished artifact, not a discovery. The Risk Report's purpose is to change decisions. Each section maps to one.

06 — Decisions SOTERIA Enables

The output is not the report. The output is the decisions it makes possible.

Discovery artifacts that do not drive decisions are organizational waste. SOTERIA is engineered to produce decision-grade outputs — each section of the Risk Report maps directly to a specific class of program decision a steering committee must make.

Decisions enabled by the Risk Score

Go / no-go on a modernization program at the proposed scope.
Phasing strategy: single wave, multi-wave, or strangler-pattern decomposition.
Vendor selection: identifying SI partner capability gaps in the dimensions where risk is concentrated.

Decisions enabled by the dependency map

Service decomposition boundaries — what becomes one microservice, what becomes several.
Cutover sequence — which dependencies must be untangled first.
Co-existence architecture — how the legacy and modernized estates communicate during the transition.

Decisions enabled by the modernization roadmap

Budget and resource allocation by phase.
Stakeholder communication and milestone planning.
Risk-weighted SLA negotiation with internal sponsors.

Decisions enabled by the decision register

Target architecture: event-driven vs. request-response, mesh vs. monorepo, CQRS vs. unified state.
Language and framework standardization across the modernized estate.
Data ownership and master-record sequencing for shared entities.

07 — Case Studies

What SOTERIA found that no one knew.

Cases below are representative composites drawn from real engagements. Identifying details are withheld per engagement confidentiality; specific outcomes and figures are accurate to the engagement profile.

European Tier-1 Bank · Core Payments

1.2M lines of COBOL · 47 years of accretion

58h scan · Score 71

SOTERIA identified 19 active dependencies on an in-house security library that had been deprecated in 2014 in favor of a documented enterprise-wide replacement. That replacement, however, was never integrated into the payment estate. Every wire transfer flowing through the core continued to validate against the deprecated library, including its known-weak RSA implementation.

Modernization scope adjusted to include a security-library migration as Wave 0, ahead of any structural transformation.

Latin-American Insurance · Policy Admin

3.8M lines of Natural · Adabas backend

71h scan · Score 83

The Risk Report's portability dimension flagged a score of 38, driven almost entirely by Adabas-specific data type assumptions distributed across 240 programs. The customer's IT leadership had assumed Adabas decoupling would be a single phase; SOTERIA's analysis showed it threaded through every business line.

Program re-baselined: data-tier decoupling parallelized with application transformation rather than sequenced before it. Saved an estimated 11 months on the program timeline.

North-American Retailer · Pricing Engine

720K lines of AS/400 RPG

42h scan · Score 49

The customer expected a high-risk estate. SOTERIA returned a moderate score (49) but flagged a dead-code ratio of 31%. Investigation revealed that approximately one-third of the codebase had been silently replaced over the previous decade and was no longer on any execution path.

Modernization scope reduced by ~30%, eliminating ~$2.1M in projected transformation effort. Live estate modernized in 8 months.

08 — Getting Started

From kickoff to risk report in three days.

Day -1

Kickoff Call

Scope and access alignment
Environment provisioning plan
Success criteria sign-off

Day 0–2

Scan Execution

Air-gapped SOTERIA run
Six-phase pipeline executes
Engineer-supervised through delivery

Day 3

Report Walkthrough

SOTERIA Risk Report delivered
Live walkthrough with steering committee
Decision register reviewed line-by-line

SOTERIA scans are available as a standalone engagement and as the discovery phase of every Ionate modernization program. There is no commitment to proceed with transformation after a scan — many customers use SOTERIA as a vendor-evaluation tool. We are confident in the strength of the output.

Ready to map your estate?

Point SOTERIA at your hardest legacy system. In 48 to 72 hours we will return a risk score, a dependency map, and a decision register — air-gapped, no source code egress, no commitment beyond the scan.


About Ionate. Founded 2016. Across 50+ enterprise deployments globally, the IONATE platform has transformed an aggregate volume of source in excess of 500 million lines across COBOL, RPG, AS/400, Adabas/Natural, Fortran, and Fujitsu environments. Engineering and operations are SOC 2 Type II audited. Specific engagement figures, customer references, and methodology details are available under NDA on request.
