Failure Mode

Grocery Shopping Is Hard

The store errand looks boring, which is exactly why it breaks things in useful ways.

Every image in this article is generated. John is an emergent fictional character from an image-model continuity experiment, not a real person.

The Strange Moment

Grocery shopping should be easy for a generated world. That was my first mistake.

It is visually common. It has obvious props. Parking lot, storefront, cart or basket, milk, maybe a bag of soil if the model has decided today is a home-improvement crossover episode. A person goes in, gets an item, pays, returns to the car, drives home. This is not high fantasy. This is the content equivalent of a Tuesday.

And that is why it is a good test.

A store errand forces the harness to manage object state across several locations. The basket can be empty, then contain an item, then be carried, loaded, set down, or returned. The phone can interrupt the task. The vehicle has to remain the same vehicle. The location has to move from house to store and back without silently compressing half the trip into a single image.

In the runs, John handles a black shopping basket, a yellow-green bag, a gray SUV-shaped problem, a parking lot, and messages that keep arriving at exactly the wrong time. It is the generated-world version of trying to buy one thing and being punished for existing in public.

Generated image of John carrying a black basket beside a gray SUV in a store parking lot.
Exploration 04, frame 0007. The run jumps into a Dollar General parking lot while the expected state still has garage residue.

What the System Was Trying to Do

The harness was trying to preserve a mundane errand as a chain of concrete actions. Pick up or carry the basket. Move toward checkout. Return to the same vehicle. Load or handle the purchased object. Avoid restarting completed message/photo work. Keep John moving through the world rather than letting him dissolve into a phone screen forever.

The store task is a nice stress test because there is no single hero frame. A store is mostly transitions. Walk across the lot. Enter. Pick up the thing. Pay. Leave. Put it in the car. Close the hatch. Drive. Unload at home. Most of the truth lives in the connective tissue.

That is exactly what image models are tempted to skip. They can produce a convincing parking lot, a convincing basket, a convincing garage, and a convincing person carrying something. The risk is that the run becomes a slideshow of plausible errand states without enough proof that one caused the next.

Run note: Exploration 04 ends with a billing-limit skip while John is visually checking the garage floor around the SUV. Even the infrastructure found a way to participate in the failure mode.

Generated image of John carrying a black shopping basket beside an SUV in a garage.
Exploration 04, frame 0039. John returns with the basket and the world has to reconcile store, vehicle, garage, and object state.

What Broke

The errand broke in layers.

First, location accounting got soft. The run would hold on to a garage state while the image clearly showed a store parking lot, or vice versa. That mismatch is more than a caption issue. If the system does not know where John is, then every object around him becomes suspect.

Second, object tracking became conditional. The basket appears, gains contents, changes role, and moves between hand, store, car, and garage. Each local step is plausible. The hard part is proving that the basket is the same basket, the contents are the same contents, and the task has not been silently completed off-screen because the model knows what errands usually imply.

Third, messages interrupt the physical task. John gets a phone prompt while moving through the errand. That is great for realism and terrible for accounting. Once the phone is active, the model tends to focus on the screen because screens are explicit action surfaces. Meanwhile the basket, vehicle, and location can drift in the background.

This is the grocery-store problem: the world is simple enough to look stable, but the number of small commitments is high.

Generated image of John carrying a basket near a store entrance after receiving a message.
Exploration 05, frame 0074. The errand keeps moving while a new message thread competes for the next action.

Why It Is Interesting

I keep coming back to errands because they punish vague intelligence. A model can sound smart about a plan. A model can produce a visually nice store. But errands require state discipline.

Did John buy the thing or merely approach the store? Is the basket empty, full, loaded, or abandoned? Did he return to the same vehicle? Did the phone interruption happen before or after checkout? Did the world move because John moved, or because the prompt implied the next normal scene?

These are not glamorous questions. They are the questions that make a virtual world behave like a world instead of a mood board with continuity aspirations.

Next Harness Change

The next harness needs explicit errand phases: depart, arrive, enter, acquire item, checkout, return to vehicle, load item, return home, unload. The model should not be allowed to claim a later phase unless the required object and location facts have been earned in earlier frames, or the skipped steps have been deliberately summarized and recorded as such.
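
Here is a minimal sketch of what that gate could look like, in Python. It is not the harness's actual code. The phase names come from the list above; everything else (the ErrandLog class, the claim and observe methods, and fact labels such as "item_paid") is a hypothetical name invented for illustration. A phase can be claimed only if its required facts have been observed, and a jump over intermediate phases is refused unless the caller explicitly asks for the gap to be recorded as summarized.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Phase(IntEnum):
    # Phase order taken from the errand list in the article.
    DEPART = 1
    ARRIVE = 2
    ENTER = 3
    ACQUIRE_ITEM = 4
    CHECKOUT = 5
    RETURN_TO_VEHICLE = 6
    LOAD_ITEM = 7
    RETURN_HOME = 8
    UNLOAD = 9


# Hypothetical fact labels: what must have been observed before a phase
# can be claimed. A real harness would define its own vocabulary.
REQUIRED_FACTS = {
    Phase.ARRIVE: {"at_store_lot"},
    Phase.ENTER: {"inside_store"},
    Phase.ACQUIRE_ITEM: {"item_in_basket"},
    Phase.CHECKOUT: {"item_paid"},
    Phase.RETURN_TO_VEHICLE: {"at_same_vehicle"},
    Phase.LOAD_ITEM: {"item_in_vehicle"},
    Phase.RETURN_HOME: {"at_home"},
    Phase.UNLOAD: {"item_out_of_vehicle"},
}


@dataclass
class ErrandLog:
    facts: set = field(default_factory=set)
    completed: list = field(default_factory=list)   # phases claimed directly
    summarized: list = field(default_factory=list)  # phases skipped on purpose

    def observe(self, fact: str) -> None:
        """Record a fact that a frame has actually established."""
        self.facts.add(fact)

    def claim(self, phase: Phase, summarize_gap: bool = False) -> bool:
        """Try to advance to `phase`; return False if it has not been earned."""
        last = self.completed[-1] if self.completed else 0
        if phase <= last:
            return False  # no going backwards or re-claiming a phase
        if REQUIRED_FACTS.get(phase, set()) - self.facts:
            return False  # required object/location facts are missing
        skipped = [Phase(p) for p in range(last + 1, phase)]
        if skipped and not summarize_gap:
            return False  # silent compression of the trip is refused
        self.summarized.extend(skipped)
        self.completed.append(phase)
        return True
```

The design choice worth noticing is that a summary is allowed but never free: skipped phases land in a separate list, so a later audit can tell "John checked out on screen" apart from "the run says checkout happened somewhere between frames."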

It also needs object receipts. A basket is not just visible or not visible. It has identity, contents, carrier, and current location. Until that is tracked as first-class state, grocery shopping will remain surprisingly hard, which is rude but not inaccurate.
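
As a rough illustration of an object receipt, the Python sketch below tracks the four fields named above (identity, contents, carrier, current location) and keeps a history of every change. The ObjectReceipt class, its update and mismatches methods, and all values in the usage example are assumptions made up for this sketch, not data from the runs or code from the harness.

```python
from dataclasses import dataclass, field
from typing import Optional

TRACKED_FIELDS = ("contents", "carrier", "location")


@dataclass
class ObjectReceipt:
    object_id: str                       # identity: which basket this is
    contents: tuple = ()                 # what is inside it right now
    carrier: Optional[str] = None        # who or what is holding it
    location: str = "unknown"            # where it currently is
    history: list = field(default_factory=list)

    def update(self, frame: int, **changes) -> None:
        """Apply changes and log each one as (frame, field, old, new)."""
        for key, value in changes.items():
            if key not in TRACKED_FIELDS:
                raise KeyError(f"untracked field: {key}")
            old = getattr(self, key)
            if old != value:
                self.history.append((frame, key, old, value))
                setattr(self, key, value)

    def mismatches(self, observed: dict) -> list:
        """Compare tracked state with what a frame appears to show.

        Returns (field, tracked, observed) for every disagreement.
        """
        return [
            (key, getattr(self, key), value)
            for key, value in observed.items()
            if key in TRACKED_FIELDS and getattr(self, key) != value
        ]


# Illustrative usage with made-up frame numbers and values.
basket = ObjectReceipt("black-basket", location="store")
basket.update(1, carrier="John", location="store_lot")
basket.update(2, contents=("item",), location="garage")
drift = basket.mismatches({"location": "store_lot", "carrier": "John"})
```

In this example drift contains one entry, for location: the receipt says garage while the frame shows a store lot. That is the kind of disagreement the harness would need to surface rather than quietly resolve in the image's favor.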