Originally published on Advisory Hour on Substack. This local copy preserves the text and inline images with a link back to the original.
T’was a busy weekend.
Feeling unreasonably confident in the new engine-world harness, I pointed Codex at John’s World and told it to run for eight hours just to see what fresh absurdities would crawl out of the machinery. Happily, this round earned that confidence. Over a much longer cycle, the world held together well enough for something pleasantly mundane to happen: a character goes shopping, comes home, and carries the groceries inside.
That may not sound glamorous, but in a generated world, boring is where the real trouble lives. Errands are full of continuity traps, object-state problems, camera transitions, and all the small connective tissue most systems try to fake their way past.
This run handled more of that than I expected. The character camera correctly shifts from third person to first person during the interaction, which is exactly the kind of detail that matters if you want the world to feel inhabited instead of merely illustrated.
So in this article, I’m walking through the latest run, the failure modes that still showed up, and the more interesting signs of progress. Also, yes, there are a lot of screenshots, because if we’re going to inspect the behavior of a synthetic little suburban universe, we might as well bring receipts.
Let’s Go Shopping
So once the character has the groceries, then what remains is taking them home…
Taking the Groceries Home
These are exported frames. What you’re not seeing are my administrative insight tools that allow me to make sense of the timeline and each image. For example, consider this image that’s as much a diagnostic tool as it is a slice of insight into a digital world that exists inside the model.
Grocery errands are done, now what?
What does a character do after done with the groceries?
I was rather curious why the character was sitting-it seems a little out of place. What was going on? Well, it turns out she was taking a break. I note the other characters that were “last seen” are still believed to be at the grocery store still at the entrance. It’s an interesting failure mode of the “if a tree falls, did it happen if no one was there” variety.
Going for a walk
I wondered why this character didn’t drive to the store. Well, it turns out she lives right next to the store so she just walks over. Convenient. However, it looks like we’re about to restart the loop of shopping at the store.
As the image sequence above shows, spatial path navigation is still uneven. Near the end, the character appears to change her mind and start walking home, which means the world logic is not fully holding yet. But rough edges like this… this is the fun part. You can see the shape of the thing trying to become real.
Characters are walking around. They are doing errands. The system is preserving enough logic to keep a trip legible across multiple beats. The camera can shift perspective when the interaction demands it. And all of this is happening at the speed of an image model, not a hand-authored scene graph or a traditional video pipeline grinding away in the background.
That matters.
A lot of people still talk about virtual worlds as if they must arrive fully assembled, blessed by some giant platform, polished into corporate submission, and presented like a theme park brochure.
What we are seeing in this build is messier, stranger, and more interesting. The world is not complete, but it is starting to exhibit the right kinds of behavior. It can maintain a character, preserve an errand, shift camera perspective on the fly, and keep enough continuity alive that the failures are no longer random image glitches. They are world-engine failures. That is a very different class of problem.
A few of the ideas in this harness still feel genuinely novel to me. One is live camera perspective switching without rebuilding the whole system around video. Another is refusing the obvious route and not using a video model as the foundation at all. Instead, this experiment leans into GPT-image-2 and pushes it into a job it was not politely designed to do. General purpose technology is profound, truly.
What’s around the corner
What this experiment shows, at least to me, is that genai virtual worlds are already possible.
They are just unevenly distributed, half-built, and currently held together with a mixture of stubbornness, harness logic, and the digital equivalent of duct tape. Having grown up in the rural parts of Indiana, I’m especially good at holding things together with wire, duct tape, and prayers. It’s our one of top ten cultural contributions.
This is also, to be clear, a side experiment. I mostly wanted to put some Codex tokens to work before the reset and see what kind of new failure modes would crawl out of the walls. Instead, the run produced something more encouraging. Not a finished system. Not a product. But a visible path-”did you know you could simulate your customers or anyone in any situation?” is a high value concept for a lot of people out there.
If this keeps improving, the gap between image generation and simulated worlds starts to look a lot smaller than people think. There’s an endless series of optimizations and possibilities, but it’s showing a glimpse of the nearby future.
Oh, speaking of.
If you want more on where this is heading, you can read the rest of my John’s World notes on the site, or subscribe below and just catch the next experiment when I inevitably teach the machine a new and slightly alarming trick.