The Other Jagged Frontier

The floor

Roughly 1 in 3 US adults — about 68 million people — score at Level 1 or below on the OECD's most recent assessment of how adults solve problems in changing, multi-source environments.¹ In the previous cycle, when the OECD measured the same population specifically on digital problem-solving tasks, ~50% either scored at Level 1 or could not complete the digital assessment at all.² Whichever instrument you use, the floor is the same: a third to a half of US adults cannot reliably complete a task that requires holding multiple criteria, switching between information sources, and adjusting when conditions change.

The OECD defines Level 1 as solving "problems that do not change and thereby do not require adaptivity." Directing an AI is nothing but adaptivity — the loop of prompt, evaluate, refine. AI's productivity gains are real and growing, but they are accruing only to people already above this line.

Thesis

Without a deliberate intervention, the 68 million below the line won't just miss the gains — the gap between them and everyone else will widen.³ Whether AI's dividend is broadly shared or concentrates at the top depends on whether that gap closes.

Benefiting from AI is a stack of three capacities, in order:

Execute a multi-step task that spans applications. Hold the goal, move between tools, recover from errors. Below this line, no AI tool helps.
Structure a problem clearly enough to direct an AI. Decompose the task, name what you want, supply the right context.
Evaluate AI output against the real task. Notice when the answer is wrong, refine, integrate.

Most adults today fail at (1), never reach (2), and cannot leverage (3). This is a transfer problem, not a content problem — the skills that produce digital fluency in one context have to generalize to unfamiliar ones, or they don't count. See pedagogy.md for the full argument.

We propose to build the platform that produces transferable schemas for digital problem-solving — not procedural memorization — at population scale, using AI as the cost-collapse mechanism that makes structured 1:1 coaching deployable at $4 per learner.

The "jagged frontier" framing in the title is borrowed from Dell'Acqua et al. (2026), whose BCG field experiment showed that AI dramatically augments knowledge work inside its capability frontier and degrades it outside.⁴ Their frontier is which tasks AI can handle. This pitch names the other one: which adults can operate the cognitive scaffold that any AI task requires.

Why us

I run Happy Robots, an AI consulting firm that trains Fortune 500 teams to adopt and direct AI systems. Our 15-week enablement programs cover LLM fundamentals through task-level evaluation. One client, Une Femme Wines, went from zero to 100% daily AI adoption in six weeks. I have spent the last two years watching the exact failure mode this project targets: employees who can operate individual tools but cannot compose cross-application workflows, cannot structure a problem clearly enough to direct an AI, and cannot evaluate whether the AI's output is correct. The curriculum in this proposal is not theoretical. It is what I already teach, restructured around transfer-of-learning research and delivered through software instead of consulting.

I can build this myself. My background is Gettysburg College (CS, economics, statistics), a decade in brand and product at L'Oreal, AB InBev, and Drinkworks, then independent consulting and product development since 2021. In the last year I have shipped 16 projects spanning full-stack development, RAG systems, reinforcement learning, OCR pipelines, agentic frameworks, and browser-based tools — all built with Claude Code in tight iteration cycles. The simulated desktop, the telemetry layer, and the co-pilot integration described in technical-approach.md are within my solo build capacity for v1.

What I see that incumbents miss: Khan, Google, and GCFGlobal teach tools. The field needs to teach transfer. No existing product targets the specific population — adults who plateau between basic app use and cross-application fluency — with pedagogy designed to produce schemas that survive contact with unfamiliar environments. The evidence base for this population is nearly empty (no RCTs, no mental-model-construction studies; see pedagogy.md §7), which means whoever builds and measures first defines the standard.

The bridge nobody has built

The adult digital-skills field already knows where it fails. The Urban Institute's 2019 synthesis of provider interviews names it directly: training programs can teach an isolated digital task in context, but moving learners past basic familiarity into genuine fluency — the ability to compose patterns across unfamiliar tools, to recover from errors, to direct an AI inside a real workflow — is the part nobody has solved. Their words:⁵

"Multiple respondents suggested it is not clear how to train people to move from this initial level to more fluency."

This is the gap. Public libraries deliver intro-to-computer and intro-to-email instruction at scale, free, with in-person human support — that work is well-served and we don't propose to redo it. But the bridge from "I can use Gmail" to "I can complete a Medicare appeal that requires reading the denial letter, locating supporting documents, drafting a response with AI assistance, and tracking the case across email and a benefits portal" — that bridge does not exist as a product, in any provider, anywhere. The training infrastructure ends where the most important skills begin.

The pitch's claim is that this bridge is buildable now, for the first time, because frontier AI models can do the structured 1:1 coaching that previously required a human tutor — and because transfer-of-learning research provides a specific, falsifiable design for what that coaching has to do.

What "fluency" means here

Adult digital competence is measured by the OECD's PIAAC assessment on a five-band scale. The scale runs from "no digital skills" through Level 4. The middle bands are where the gap lives:

Level	What an adult at this level can do	Where it's served today
Level 2	Multi-step tasks within a single tool, with inferential reasoning. Example: sort a spreadsheet to count entries matching criteria from another app.	Some library systems; partial.
Level 3	Higher-order tasks across multiple sources, evaluating relevance and reliability. Example: schedule a meeting using a new web app under multiple constraints — booked rooms, participant schedules.	Almost nowhere.
Level 4	Complex problem-solving across unfamiliar tools, integrating evidence to support a decision. Example: research a major purchase across vendors, evaluate source credibility, reconcile contradictions, produce a justified recommendation.	Nowhere.

Roughly 32% of US adults — 68 million people — score at Level 1 or below.⁶ Only 10–15% of OECD adults reach Level 3 or 4.⁶ Adult literacy and numeracy in most OECD countries stagnated or declined between 2012 and 2023, despite rising educational attainment.⁷ Most modern jobs and government services require Level 2 minimum; AI-augmented work pushes the bar toward Level 3+. That mismatch is the workforce adaptation problem named in plain numbers.

Why now

Three conditions converged in the last 18 months that did not exist before.

AI shifted where the bottleneck sits. When complex tasks become possible for anyone who can direct an AI, directing becomes the gating skill. The OECD's 2025 Bridging the AI Skills Gap report quantifies this: roughly 1 in 3 job vacancies have high AI exposure, but only ~1% require specialized AI skills.⁸ The other 99% require general digital fluency.

Untargeted AI access is widening the divide, not closing it. Microsoft's 2025 diffusion data shows cross-country AI adoption gaps grew from 2–16% (2021) to 4–28% (2024).³ Without structured intervention, AI-powered tools accrue only to the already-skilled.

Frontier AI tutoring works at scale, with measured effect, when it's structured correctly. Bastani et al. (2026) — preregistered RCT, 770 students across 10 Taipei schools, 5 months — produced +0.15 SD on an unassisted final exam, equivalent to 6–9 months of additional schooling. Effects concentrated in lower-tier schools (0.17 SD) and prior novices (0.22 SD). The intervention closed the gap rather than widening it. Engagement-mediated, not problem-volume mediated.⁹ See pedagogy.md for what "structured correctly" specifically means and how our design implements it.

Solution

A simulated digital workspace — browser, email, document editor, forms, file system — running in the user's browser, instrumented at the keystroke and event level, with an embedded AI co-pilot that observes the user's work and intervenes pedagogically.

Users complete real tasks across real-feeling apps. The co-pilot watches without interrupting, names the patterns the user just used, and prompts metacognitive reflection at task end. Curriculum content is structured around five evidence-supported design moves for transfer (cross-domain task families, explicit pattern naming, metacognitive debrief, contrasting cases, far-transfer assessment) — see pedagogy.md and curriculum.md.

The design concentrates content at PIAAC Levels 3+ — where the field has named the gap, where libraries don't reach, where AI-augmented work pushes the bar.

What that looks like

A learner four months into the platform. She arrived a smartphone-native who had never used a desktop email client. She is now mid-task: choosing a Medicare Advantage plan for her father.

She has three vendor sites open across two browser tabs. She has a benefit-comparison document she is drafting in a third tab. She asks the AI co-pilot to summarize the prescription drug coverage differences across the three plans. The AI produces a comparison; she notices one claim contradicts what she just read on Vendor B's actual page; she asks the AI to verify against the source; the AI corrects itself; she pastes the corrected comparison into her document. She finishes by writing a one-paragraph recommendation for her father with three supporting bullets.

She cannot recite every keystroke. But asked what she did, she names the patterns: "I broke it into pieces. I had the AI draft, but I checked its claims against the actual sites. I made a recommendation I can defend."

That is what far-transfer success looks like. The Medicare comparison is not a task we trained on. The patterns are.

Evidence base

Three claims, three pieces of evidence:

1. Adaptive AI tutoring works, and the design pattern matters. Bastani et al. (2026) — full citation above. The strongest single piece of upstream evidence we have. Engagement-mediated gains, equity-positive distribution, mechanism-isolated experimental design.⁹

2. The transfer mechanisms our design uses have decades of evidence in adjacent domains. Self-explanation prompts in Intelligent Tutoring Systems produce d ≈ 0.33–0.55 on transfer (VanLehn 2011 meta-analysis of 50+ studies). Contrasting-cases pedagogy (Schwartz & Bransford 1998) produces measurable far-transfer gains. The mindful-abstraction mechanism (Salomon & Perkins 1989) is the foundation of high-road transfer. Full citations and product implications in pedagogy.md.

3. The specific evidence base for our population is thin — and that is the contribution opportunity. No RCTs exist on adult computational-thinking transfer. No empirical studies exist on how adults build the mental models digital fluency requires. The K-12 CT-transfer meta-analyses explicitly note adults "received relatively little attention." A platform that deploys a structured intervention to thousands of low-fluency adults and reports honestly on what works will produce the evidence the field currently lacks. See pedagogy.md §7 for the full audit of what's known and what isn't.

Why a simulated environment, not an AI agent on real apps

A skeptic's first question: why not deploy a Claude or Operator coach on top of real Gmail, real Google Docs, real government forms? Three reasons.

Latency. Real-time coaching needs sub-1-second response. Frontier vision-based computer-use agents take 2–7 seconds per step (screenshot → inference → action). For productive-struggle pedagogy ("intervene when the user gets stuck, before they give up"), that is 4–8x too slow. An instrumented sandbox responds in <10ms.

Reliability. OSWorld benchmarks show ~80% top-line, but on real production websites (Online-Mind2Web) frontier agents drop to ~30%. Production sites defend against agents — CAPTCHAs, bot detection, dynamic DOM. The benchmark gap is structural, not temporary.

Telemetry. Vision agents see pixels. An instrumented sandbox sees keystroke timing, dwell, hover-without-click, partial input, undo events, paste origin — the cognitive signals that mediate learning gains in Bastani's study and that are invisible to vision.

Full architecture, build-vs-buy analysis, and cost model in technical-approach.md.

Outcomes (measurable)

The headline metric is far-transfer rate: % of users who successfully complete a task they have never seen before, in a context they have never trained in, using a pattern from earlier in the curriculum.

Secondary metrics:

Pattern-naming recall (do users name the patterns the co-pilot named for them, weeks later, without prompting?)
Sustained engagement (Bastani's mediator, measured as time-on-task and persistence)
Procedural completion rate, time-to-completion, error frequency
Self-efficacy on novel digital tasks

A platform without far-transfer measurement is procedural training in disguise. pedagogy.md §6 commits, in advance, to three concrete failure signals that would tell us the design is not working.

Scale economics

Per active user-hour: ~$0.05–0.15 in LLM inference (Anthropic Sonnet + Haiku, with prompt caching). Detailed cost model in technical-approach.md §6.

Per 5-month learning relationship at 2 hours/week (40 hours): **$4 per learner.**
Roughly an order of magnitude lower than human-tutored alternatives; 5–10x lower than vision-agent-based coaching alternatives.
LLM pricing trajectory has been deflationary and is expected to remain so.

The product is price-taking on a fast-deflating curve, not zero-marginal-cost. Plan, budget, and pilot scope assume current prices and improve from there.

What we have already done

Reviewed and synthesized the foundational transfer-of-learning literature — Schwartz & Bransford 1998 (contrasting cases), Salomon & Perkins 1989 (low-road / high-road transfer), How People Learn II Ch 5, Wing 2006 (computational thinking), Bastani et al. 2026 (adaptive AI tutoring RCT). Notes and PDFs bundled in the Research library.
Validated the simulated-vs-vision-agent architectural choice against current benchmark and production-reliability data (OSWorld, Online-Mind2Web, Mind2Web 2). See technical-approach.md §1.
Mapped the open-source build stack — ~50–60% of the simulated environment is assembled from MIT-licensed projects (daedalOS, TipTap, ZenFS, React Hook Form). See technical-approach.md §2.
Drafted full pedagogy, curriculum, technical-approach, spec, and field-research documents. Each is a separate, citation-grounded artifact.
Defined a research program for field interviews with 6–8 adult digital-skills providers (libraries, ABE programs, senior programs) — see fieldwork.md. Phase 1 begins as soon as funding is secured.

What we propose to do

Build the v1 MVP per the technical-approach doc:

v0 proof of concept (4–6 weeks): one app pair, one task, full telemetry, AI co-pilot loop validated.
v1 MVP (4–6 months): five apps (browser, email, docs, forms, files), curriculum scoped to Levels 2–3 or 3–4 (decision contingent on field-research findings — see product-spec.md MVP scope), far-transfer assessment instrument, deployable to a partner-cohort pilot.
Field research in parallel: visit 6–8 digital-skills providers, reshape co-pilot intervention rules and curriculum content against observed practice.
Pilot deployment: 100–500 learners in collaboration with a library system, ABE provider, or workforce development partner.

Detailed scope, build-vs-buy analysis, and cost model in technical-approach.md. Curriculum content in curriculum.md. Assessment design in product-spec.md.

Why Emergent Ventures

Fits the pattern: individual-driven, high leverage, fast execution, scalable impact, AI-forward. The deeper fit:

Targets a population EV cares about. Workforce adaptation, the AI haves vs. have-nots gap, the failure mode where intelligence augmentation accrues only to the already-able.
Uses AI as the cost-collapse mechanism. What was previously possible only with 1:1 instruction becomes deployable at $4/learner.
Generates research evidence as a byproduct. The deployment dataset will be the largest empirical record of adult digital-fluency transfer ever assembled.
Falsifiable, not just aspirational. pedagogy.md §6 commits, in advance, to what would constitute failure.

Ask

TODO (Matt): Specific dollar amount, timeline, what the funding buys.

Suggested framing: "$X for a Y-month Z that produces (a) v1 MVP deployable to 500–1,000 pilot learners, (b) the first dataset on adult digital-fluency transfer at scale, (c) an open-source release of the simulated-environment platform layer."

Reference: typical EV grants are $10k–$100k. Pick a credible number with a credible budget breakdown (engineering / inference cost / pilot partnership / research write-up).

Distribution channel

TODO (Matt): Which path to first cohort?

Options to consider:

Library partnership. Public libraries have decades of experience with this exact population. Distribution is solved; pilot recruitment is plausible. Their existing Level 1–2 work complements our Level 3+ contribution rather than overlapping. Trade-off: library tech-procurement cycles are slow.

Workforce development partnership. Goodwill, local workforce boards, AEFLA-funded ABE providers. Outcome incentives align (employment outcomes are funder-tracked).

Direct-to-consumer. Highest leverage if it works, but the target population is precisely the one least likely to find a D2C learning product on their own.

Government / public service. Slow but potentially large; possibly via state workforce agencies.

The field-research program (fieldwork.md) is designed to produce a defensible answer here. Defer the commitment until Phase 2 of that program is done.

First-cohort plan

TODO (Matt): How we get from v1 MVP to 500–1,000 real learners.

Should be specific: which partner organization, which specific learner population, what the recruitment looks like, how long it takes, what the success criterion for the pilot is.

The field-research program produces 2–3 named candidate partner organizations as one of its deliverables — fill this in after Phase 2 of that program.

Long-term vision

The standard instrument for measuring digital + AI fluency in adult populations.
The largest empirical dataset on real-world human + AI performance in cross-application work.
An open-source platform layer (daedalOS-derived shell + telemetry layer + co-pilot integration) that other adult-education organizations can adopt and extend.
The foundation layer for workforce adaptation to AI systems — not as a metaphor, but as the specific software deployed in libraries, ABE programs, and workforce development centers.

NCES, PIAAC 2023 National Results — Adaptive Problem Solving, Dec 2024. 32% of US adults at Level 1 or below; OECD average 29%. https://www.nces.ed.gov/surveys/piaac/2023/national_results.asp · APS measures the capacity to achieve goals in dynamic situations where information changes mid-task, across digital, physical, and social information environments. It replaces the prior cycle's Problem Solving in Technology-Rich Environments (PSTRE) measure. The OECD's stated reason for the switch: PSTRE "conflated problem solving and information and communication technologies (ICT) skills, as only test-takers with some (basic) ICT skills could participate" and excluded between 8% and 57% of the target population per country who could not pass the ICT screener (Survey of Adult Skills 2023 — Reader's Companion, OECD 2024, p. 37 — see our notes and the PDF). The two measures are not comparable; NCES is explicit that "the digital problem-solving and adaptive problem-solving domains cannot be compared due to differences in their assessment frameworks." The APS Level 1 descriptor includes the phrase "solve problems that do not change and thereby do not require adaptivity" — a plain-English match for the cognitive operation an AI workflow requires. ↩
NCES, PIAAC PSTRE Proficiency Level Results (Cycle 1, US data through 2017). https://nces.ed.gov/surveys/piaac/pstreproficiencylevel.asp · 31% of US adults scored at Level 1 or below on PSTRE; an additional ~19% could not complete the digital assessment at all (no computer experience, failed ICT screener, or opted out of computer-based assessment). PSTRE specifically measured the ability to use digital tools — email, web, spreadsheets, simulated apps — to solve information problems. It was retired after Cycle 1. ↩
Microsoft Research, AI Diffusion Report 2025 H2. https://www.microsoft.com/en-us/research/wp-content/uploads/2026/01/Microsoft-AI-Diffusion-Report-2025-H2.pdf ↩ ↩²
The "jagged frontier" framing is borrowed from Dell'Acqua, McFowland, Mollick et al., Navigating the Jagged Technological Frontier (Organization Science, 2026), whose 758-knowledge-worker BCG field experiment named the asymmetry between AI tasks inside the capability frontier (where AI yields +12% throughput, +25% speed, higher quality) and tasks outside it (where AI use makes correct answers 19% less likely). Their frontier is which tasks AI can handle. The "other" frontier this pitch names is which adults can operate the cognitive scaffold that any AI task requires — a population-level prerequisite that sits below the task-level question they study. Our notes · PDF · original source ↩
Hecker & Loprest (Urban Institute), Foundational Digital Skills for Career Progress, 2019. PDF · our notes · original source ↩
NCES, PIAAC 2023 National Results, Dec 2024. https://www.nces.ed.gov/surveys/piaac/2023/national_results.asp ↩ ↩²
OECD, Education at a Glance 2025. https://www.oecd.org/en/publications/education-at-a-glance-2025_1c0d9c79-en.html ↩
OECD, Bridging the AI Skills Gap, 2025. https://www.oecd.org/en/publications/bridging-the-ai-skills-gap_66d0702e-en.html ↩
Chung, Zhang, Kung, Bastani & Bastani (2026), Effective Personalized AI Tutors via LLM-Guided Reinforcement Learning. SSRN 6423358. PDF · original source ↩ ↩²