Four engineers, 28 days, 85% AI-written code, #1 on the Play Store. OpenAI just published a case study that validates everything synthesis coding is about — and adds patterns worth learning.
OpenAI released a case study this week describing how their team built the Sora Android app. The numbers caught my attention: four engineers, 28 days from start to #1 on the Play Store, 85% of code written by Codex, 99.9% crash-free rate, and over a million videos generated in the first 24 hours.
Those metrics matter, but what interests me more is how they worked. Reading their account felt like finding notes from a parallel universe where someone arrived at the same conclusions I’ve been documenting in this series — sometimes with different language, sometimes with the exact same framing.
They failed first
The team didn’t start with their successful approach. They tried the obvious thing: “Build the Sora Android app based on the iOS code. Go.”
It didn’t work. The output was technically functional but the product experience fell short. Single-shot code generation produced unreliable results. They had to course-correct.
This matches what I’ve seen repeatedly. Teams assume AI coding means describing what you want and getting what you need. That assumption fails on anything beyond trivial tasks. The interesting question is what they did next.
The senior engineer mental model
Their breakthrough came from a framing shift: treat the AI like a newly hired senior engineer who needs onboarding.
A new senior hire can write code. They understand programming languages and patterns. But they don’t know how your team works. They can’t infer your architecture decisions, your product strategy, the shortcuts that make sense in your context. They haven’t watched the app run, so they lack intuition about scroll feel or confusing flows. Their instinct is “get something working” rather than “build it the way we build things here.”
This mental model changed how the Sora team worked with Codex. Instead of treating it as an oracle that should know everything, they treated it as a capable colleague who needed context. They wrote AGENTS.md files throughout the codebase — essentially onboarding documents that explained how they wanted things done. When the AI saw one of these files, it understood the patterns to follow.
I’ve been calling these CLAUDE.md files in my own work. Same idea, different tool. The pattern generalizes: persistent context files that travel with your codebase and shape how AI agents work within it.
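To make the idea concrete, here is a sketch of what such a file might contain. This is my own hypothetical example, not content from the Sora repos; the architecture choices (Compose, Hilt), directory names, and commands are assumptions for illustration.

```markdown
# AGENTS.md (hypothetical example of a persistent context file)

## Architecture
- Single-activity app; each feature lives in its own Gradle module under feature/.
- UI is Jetpack Compose; every screen renders from one StateFlow<UiState> in its presenter.
- Dependency injection uses Hilt; bind new dependencies in the owning feature module.

## Conventions
- Model screen state as a sealed interface (Loading / Content / Error), not loose booleans.
- Network calls go through the repository layer; never call the API client from a presenter.
- Treat feature/feed/ as the reference implementation; read it before changing shared code.

## Workflow
- Propose a short plan before editing more than two files.
- Run ./gradlew lint test before declaring a task done.
```

A document like this is the difference between an agent producing "something that works" and producing something built the way the team builds things.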
Foundation-first, not AI-first
Here’s where the case study gets counterintuitive. In a project where 85% of code was AI-generated, the team spent their first week writing code by hand.
They built the architecture manually. Dependency injection, navigation, authentication, base networking — the foundational patterns that would shape everything else. They implemented a few representative features end-to-end. And critically, they documented patterns as they built them.
Their reasoning: “The idea was not to make ‘something that works’ as quickly as possible, rather to make ‘something that gets how we want things to work.’”
They weren’t trying to ship fast. They were trying to establish what “correct” looked like before letting AI scale the implementation. Once they had examples of well-built features, they could tell Codex: “Build this settings screen using the same architecture and patterns as this other screen you just saw.”
That instruction works. “Build the Sora app” doesn’t.
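To give that a shape, here is a minimal sketch of what one hand-built reference feature could look like. It is my illustration, not the Sora team's code; names like SettingsUiState and SettingsRepository are invented. The point is that once a pattern like this exists, "follow the same patterns" is an instruction with a concrete referent.

```kotlin
// Hypothetical reference feature, written by hand so later AI-generated
// screens have a concrete pattern to mirror. Plain Kotlin + coroutines only,
// to keep the sketch free of Android framework dependencies.

import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

// Every screen models its state the same way: a small sealed hierarchy.
sealed interface SettingsUiState {
    object Loading : SettingsUiState
    data class Content(val notificationsEnabled: Boolean) : SettingsUiState
    data class Error(val message: String) : SettingsUiState
}

// Networking hides behind a repository interface; implementations are injected.
interface SettingsRepository {
    suspend fun loadNotificationsEnabled(): Boolean
    suspend fun setNotificationsEnabled(enabled: Boolean)
}

// The presenter exposes one StateFlow and a few intent-style functions.
class SettingsPresenter(
    private val repository: SettingsRepository,
    private val scope: CoroutineScope,
) {
    private val _state = MutableStateFlow<SettingsUiState>(SettingsUiState.Loading)
    val state: StateFlow<SettingsUiState> = _state

    fun load() {
        scope.launch {
            _state.value = try {
                SettingsUiState.Content(repository.loadNotificationsEnabled())
            } catch (e: Exception) {
                SettingsUiState.Error(e.message ?: "Something went wrong")
            }
        }
    }

    fun setNotifications(enabled: Boolean) {
        scope.launch {
            repository.setNotificationsEnabled(enabled)
            _state.value = SettingsUiState.Content(enabled)
        }
    }
}
```

An AI-generated playback or search screen that copies this shape plugs into the same review and testing habits as the hand-written one.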
The planning loop
For non-trivial changes, they developed a consistent pattern. First, ask the AI to read relevant files and summarize how the feature currently works. Correct its understanding if needed. Then create an implementation plan together — essentially a mini design doc covering which files change, what new states get added, how the logic flows.
Only after that planning phase would they execute, step by step.
When tasks were large enough to span multiple sessions, they saved plans to files. The plans became shared context that kept work coherent across time and across different AI instances working in parallel.
I’ve been doing something similar with CONTEXT.md files that capture project state. The Sora team’s approach validates that pattern while adding the explicit planning step as a first-class part of the workflow.
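For illustration only, a saved plan in that spirit might look like the sketch below; the feature, file names, and states are invented, not taken from the case study.

```markdown
# Plan: add a "report video" action to the playback screen (hypothetical)

## Current behavior
- PlaybackScreen renders from PlaybackUiState; the overflow menu has Share and Save only.

## Changes
1. PlaybackUiState: add a reportInProgress flag and a one-shot ReportResult event.
2. PlaybackViewModel: add reportVideo(videoId) calling ReportRepository; surface success or failure.
3. PlaybackScreen: add "Report" to the overflow menu, confirm via dialog, then show a snackbar.
4. Tests: ViewModel success and failure paths; a UI test for the dialog flow.

## Open questions
- Should repeated reports of the same video be rate-limited client-side?
```

A file like this is cheap to write, survives across sessions, and gives a second agent (or a second human) the same picture of the work.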
Managing a distributed team
At their peak, the team ran multiple Codex sessions simultaneously — one working on playback, another on search, another on error handling, another on tests. Their description of this experience stands out: “It felt less like using a tool and more like managing a team.”
That framing shifts everything. You’re not writing code with AI assistance. You’re directing AI agents while maintaining system coherence. The bottleneck moves from typing to decision-making, from implementation to integration.
Brooks’s Law applies here. You can’t simply add more AI sessions and expect linear speedup, just as adding more programmers doesn’t linearly accelerate a late project. Someone has to make decisions, give feedback, integrate changes, and maintain the vision. The team described themselves as “the conductor of an orchestra versus simply faster solo players.”
Cross-platform through translation
One architectural insight deserves attention. The team built a native Android app, but they had the iOS codebase as a reference. Rather than using cross-platform frameworks like React Native or Flutter, they let the AI translate.
Logic is portable. Swift code expressing business logic can be translated to Kotlin while preserving semantics. UI code needs to be native anyway to feel right on each platform. So they gave Codex access to the iOS, backend, and Android repos simultaneously and documented in AGENTS.md where each repo lived and what it contained.
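As a toy example of what preserving semantics across that translation means (my illustration, not code from the Sora repos; the plan names and limits are invented):

```kotlin
// Hypothetical Swift original, shown as a comment for comparison:
//
//   func remainingGenerations(plan: Plan, usedToday: Int) -> Int {
//       let limit = plan == .pro ? 100 : 10
//       return max(0, limit - usedToday)
//   }
//
// A semantically equivalent Kotlin translation. The business rule carries over
// unchanged, while the UI that displays the number stays native on each platform.

enum class Plan { FREE, PRO }

fun remainingGenerations(plan: Plan, usedToday: Int): Int {
    val limit = if (plan == Plan.PRO) 100 else 10
    return maxOf(0, limit - usedToday)
}
```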
The result: Android-native code informed by an existing iOS implementation, without the compromises of cross-platform abstractions. Whether this scales to larger teams is an open question, but for a four-person sprint, it worked.
What they got right about human judgment
The case study’s conclusion resonates: “AI-assisted development does not reduce the need for rigor; it increases it.”
When AI can produce code at scale, the quality of human direction determines the quality of output. The team emphasized that the interesting parts of software engineering remain intensely human: building compelling products, designing scalable systems, writing complex algorithms, making judgment calls about user experience.
AI handles the mechanical parts faster than humans can type. That shifts engineer time toward higher-leverage activities — which is exactly the point.
What this means for synthesis coding
Reading this case study, I kept noticing patterns I’ve been documenting:
The senior engineer mental model — treat AI as a capable colleague needing context, not an oracle.
Foundation-first development — humans build the architecture, AI scales the implementation.
Persistent context files — AGENTS.md/CLAUDE.md documents that shape AI behavior across sessions.
Planning before coding — create shared understanding before generating code.
Conductor, not player — manage AI work rather than doing it yourself.
The Sora team didn’t call their approach synthesis coding. But they discovered the same patterns independently, under production pressure, with real stakes. That’s validation.
The uncomfortable implication
If four engineers can ship a #1 Play Store app in 28 days with this approach, what does that mean for teams that haven’t learned it?
The productivity gap between organizations that master human-AI collaboration and those that don’t will widen. Not because the tools are scarce — Codex and Claude are available to anyone. But because the practices that make them effective require learning, organizational investment, and genuine skill development.
The Sora case study shows what’s possible. The question for engineering leaders: how quickly can you get your teams there?
This article is part of the synthesis coding series. For the foundational framework, see The Synthesis Engineering Framework.
Rajiv Pant is President of Flatiron Software and Snapshot AI, where he leads organizational growth and AI innovation. He is former Chief Product & Technology Officer at The Wall Street Journal, The New York Times, and Hearst Magazines. Earlier in his career, he headed technology for Condé Nast’s brands including Reddit. Rajiv coined the terms “synthesis engineering” and “synthesis coding” to describe the systematic integration of human expertise with AI capabilities in professional software development. Connect with him on LinkedIn or read more at rajiv.com.