Methodology
Open, versioned, updated monthly. Capability is one input of nine.
The nine axes
For every (role × task) cell in our matrix, the following nine axes are scored. The composite economic substitution exposure is a weighted product, not a sum — capability gates, reliability multiplies, error cost divides, human advantage dampens.
Status: matrix v1 (live) is scored by a single calibrated evaluator against a transparent rubric. The three-model evaluator panel — which compares scores across Claude, GPT-4-class, and Gemini-class models and stores the per-model breakdown in ai_capability_matrix.eval_panel — ships with v1.5 in Q3 2026. We do not retroactively backfill v1 cells; the matrix version stamp on every Wagecard records which calibration produced its read.
Capability cluster
- Output quality — Does AI produce work that's usable as-is?
- Oversight need — Minutes of human review per unit of output.
- Latency — Fast enough for the task's real-world cadence?
Reliability and risk cluster
- Reliability — Escalation and retry rate in production-like conditions.
- Hallucination risk — Frequency of confidently-wrong outputs.
- Error cost — Financial, regulatory, and reputational severity when AI is wrong.
Operational economics cluster
- Integration overhead — Context-prep, monitoring, and prompt-iteration cost, amortized.
- Unit economics fit — Does the per-task AI cost hold up at the role's wage benchmark?
Human advantage cluster
- Human advantage dampener — Composite of the task's irreducible human value across five axes: trust, ambiguity, accountability, persuasion, context.
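The gating/multiplying/dividing structure of the composite can be sketched in code. This is an illustrative reading of the formula's shape, not the production weights; the function name, the capability floor, and the input scaling are our assumptions.

```python
def substitution_exposure(capability, reliability, error_cost, human_advantage,
                          capability_floor=0.4):
    """Illustrative composite: capability gates, reliability multiplies,
    error cost divides, human advantage dampens.

    capability, reliability, human_advantage are in [0, 1];
    error_cost is a severity multiplier >= 1. The floor value 0.4 is a
    placeholder, not Wagecore's published threshold.
    """
    if capability < capability_floor:   # capability gates: below the floor,
        return 0.0                      # no economics can rescue the task
    score = capability * reliability    # reliability multiplies
    score /= error_cost                 # error cost divides
    score *= (1.0 - human_advantage)    # human advantage dampens
    return max(0.0, min(1.0, score))
```

Note the gate is a hard cutoff: a cheap but incapable model scores zero, which is why a product (not a sum) is the right shape here.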
Substitution classes
Every task in a Wagecard lands in one of four classes, with confidence.
- Replaceable — AI runs the task end-to-end with minimal human oversight. Substitution viability high.
- AI-augmented — AI does most of the work; the human owns decisions and context.
- Human-led, AI-assisted — Human leads the task; AI accelerates tooling (drafting, search, summarization).
- Human-critical — AI delivers no net value (or negative value) due to trust, regulation, accountability, or relational complexity.
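A minimal classifier sketch for the four classes. The threshold values below are illustrative placeholders of ours, not Wagecore's published substitution-class thresholds, and the confidence value is simply passed through.

```python
def substitution_class(exposure, confidence):
    """Map a composite exposure score in [0, 1] to one of the four classes.

    Thresholds (0.75 / 0.5 / 0.25) are illustrative, not the published ones.
    Returns (class label, confidence) since every classification carries
    a confidence band.
    """
    if exposure >= 0.75:
        label = "Replaceable"
    elif exposure >= 0.5:
        label = "AI-augmented"
    elif exposure >= 0.25:
        label = "Human-led, AI-assisted"
    else:
        label = "Human-critical"
    return label, confidence
```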
What's open versus what's the moat
A reasonable question for any open-methodology product: if anyone can read the formula, what stops them from skipping the subscription? The honest answer is that the formula is open because trustworthy economics need to be auditable; the moat is the underlying dataset and its continuous refresh.
Open (trust)
- Formula and the 9 axes
- Substitution-class thresholds
- Confidence-band rules
- Quality and aggregation gates
- Capability-matrix version on every Wagecard
Licensed (moat)
- Capability-matrix scores per task × model × date
- Salary benchmark integrations (paid data sources)
- Cross-industry adoption distribution
- Monthly refresh pipeline
- Industry-standard score (the network effect of comparability)
Reproducing the dataset internally is feasible but costly: roughly $300–500k/year in eval compute and senior-engineering time, plus $50–150k/year in salary-data licensing — and the cross-industry adoption signal isn't reproducible from inside a single company at all. The methodology is open so you can audit our reasoning; the data is what you're paying for.
How we keep the dataset honest
The aggregated metrics Wagecore publishes are only as good as the inputs they're computed from. Four gates run before any Wagecard contributes to a public insight:
- Hard validation. Salary must be in [$10k, $2M]; hours per task in [0.5, 80]; total hours ≤ 80/week. Out-of-band inputs are rejected before the engine runs.
- Quality score per Wagecard. Each card is scored 0–100 on soft signals: whether the user entered a salary, how many tasks they selected, whether total hours are plausible, and whether they contributed opt-in adoption signals. Only cards scoring ≥ 50 count toward public insights.
- Variance check on adoption signals. A submission where every task gets the same bucket is flagged as low-variance and excluded from public adoption distributions. The row is retained for audit, never used for publication.
- Cell density floor. A role × geo × experience cell isn't published as an insight until it crosses 100 qualifying Wagecards. Until then the answer is “n/a” with an explicit explanation, not a fabricated number.
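The four gates compose into a simple pipeline. The bounds below come straight from the rules above; the function names and the Wagecard field names (`salary`, `task_hours`, `quality_score`, `adoption_buckets`) are illustrative assumptions, not the production schema.

```python
def passes_hard_validation(salary, task_hours):
    """Gate 1: salary in [$10k, $2M], each task 0.5-80h, total <= 80h/week."""
    return (10_000 <= salary <= 2_000_000
            and all(0.5 <= h <= 80 for h in task_hours)
            and sum(task_hours) <= 80)

def has_adoption_variance(adoption_buckets):
    """Gate 3: a submission where every task gets the same bucket is excluded
    from public distributions (but retained for audit upstream of this check)."""
    return len(set(adoption_buckets)) > 1

def qualifies(card):
    """Gates 1-3 for a single Wagecard, modeled here as a dict."""
    return (passes_hard_validation(card["salary"], card["task_hours"])
            and card["quality_score"] >= 50                 # Gate 2
            and has_adoption_variance(card["adoption_buckets"]))

def publishable_cell(cards):
    """Gate 4: a role x geo x experience cell needs >= 100 qualifying cards."""
    return sum(1 for c in cards if qualifies(c)) >= 100
```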
What we publish
The capability matrix version on each Wagecard. The exact weights of the composite formula. The model names and dates of the most recent evaluation panel. The data sources for salary benchmarks. The sample size and confidence for every output.
MVP runs on a v1 formula (simpler reliability and error-cost buckets). v1.5 ships the full multi-axis economics by months 2–3, with public methodology updates.
How a CFO will read this — NPV, IRR, and payback
The substitution-exposure score is the read; the financial-projection layer is the decision. Enterprise AI deployment decisions ultimately go through three standard finance gates: NPV (net present value of the five-year cash flows, discounted at the firm's cost of capital), IRR (internal rate of return, the discount rate at which NPV is zero), and payback period (how many years until cumulative savings cover the initial transition cost).
We compute all three for any Wagecard with a salary input. The model treats AI substitution like any other capital project: the Year 0 transition cost is the cash outflow (training, integration, oversight tooling, error remediation reserves); the annual savings are the delta between the loaded human cost of the displaced task hours and the operational AI cost to produce the same output. Discount rate defaults to 10% (typical mid-market WACC) and is adjustable inside the Investment view.
NPV (5-year)
Sum of discounted annual savings minus the Year 0 transition cost. A positive NPV means the deployment creates value at the given discount rate; negative means the firm is better off keeping the human work in place.
IRR
The annual return the project earns on its capital. Compared against the firm's hurdle rate (WACC). A 35% IRR with cheap capital means “do it now”; a 6% IRR means “reconsider.”
Payback period
Years before cumulative savings recover the transition cost. A sanity check against NPV/IRR. Positive NPV with a 6-year payback may still be rejected if the firm needs faster cash recovery.
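The three gates above are standard capital-budgeting math and can be sketched directly. The functions below are a minimal reference implementation, not the Investment view's code; the bisection bounds and tolerance are our choices. Note how `irr` returns `None` when there is no Year 0 outflow, which is the computability issue the explainer covers.

```python
def npv(rate, year0_cost, annual_savings):
    """NPV: discounted annual savings minus the Year 0 transition cost."""
    return -year0_cost + sum(s / (1 + rate) ** t
                             for t, s in enumerate(annual_savings, start=1))

def irr(year0_cost, annual_savings, lo=-0.99, hi=10.0, tol=1e-6):
    """The discount rate at which NPV is zero, found by bisection.
    Returns None when there is no Year 0 outflow (IRR undefined without
    a sign change in the cash flows) or no root in [lo, hi]."""
    if year0_cost <= 0:
        return None
    f = lambda r: npv(r, year0_cost, annual_savings)
    if f(lo) * f(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def payback_years(year0_cost, annual_savings):
    """First year in which cumulative (undiscounted) savings cover the cost;
    None if they never do within the horizon."""
    cumulative = 0.0
    for year, s in enumerate(annual_savings, start=1):
        cumulative += s
        if cumulative >= year0_cost:
            return year
    return None
```

For example, a $50k transition cost against $30k/year of savings over five years is NPV-positive at the default 10% rate and pays back in year two, yet a firm with a high hurdle rate could still decline it.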
Full worked example with the math — including how the Year 0 transition cost flips IRR computability — lives in our NPV / IRR / Payback explainer. Pro accounts also get the Investment view inside each Wagecard with editable discount rate, transition-cost band, and sensitivity scan.
What we deliberately do NOT model: option value (the right to defer deployment until capability or cost improves), strategic redeployment value (freed human hours redirected to higher-leverage work), or terminal value beyond Year 5. These matter at scale; our model is calibrated to be conservative for the individual and mid-market read, not to oversell upside.
External validation of the operational-cost thesis
Wagecore's central claim — that capability does not equal economic viability, and that the full cost of AI substitution includes oversight, retries, error cost, and integration overhead — is not unique to us. As of April 2026:
- Bryan Catanzaro, VP of Applied Deep Learning at Nvidia, told Axios on April 26, 2026: “For my team, the cost of compute is far beyond the costs of the employees.”
- MIT CSAIL (2024) found AI automation economically viable in only 23% of vision-primary roles at current cost structures; in 77%, human labor remains cheaper.
- BCG (2025) — only 5% of companies are capturing AI value at scale; ~60% report no material value despite investment.
- Klarna (2025–2026) publicly reversed its 700-role customer-service replacement after repeat-contact rates rose 25% and CSAT dropped on complex tickets — and resumed hiring humans.
- Uber (April 2026) burned its full 2026 AI coding budget in four months under token-based pricing — the CTO publicly noted being “back to the drawing board.”
This is the gap Wagecore prices. Capability is rising. Economic viability is not — yet, and not uniformly. Our four-class taxonomy is calibrated to where AI is operationally cheaper today, not where it could be in 2030.