The experiment:
Use Codex Web (codex-1, “a version of OpenAI o3 optimized for software engineering”) and Codex CLI (GPT-5). https://github.com/openai/codex
I ran Codex CLI using WSL2 on Windows, in a bash terminal inside Visual Studio Code.
A GitHub repo was used (my personal website, built with Next.js 15.4.5 and Tailwind CSS, written in TypeScript) and deployed on Vercel. Codex Web watched the GitHub repo and made pull requests against it.
Generate 2 prompts using GPT-5, for both Web and CLI to use. Give the CLI 1 try and Web its 4 versions, picking the best one. 1 prompt is abstract (“write the best tech demo you can do in 1 prompt”), the other is a concrete idea.
Write 1 prompt by hand, with my own concrete idea.
Go maximum vibe coding, no thinking, no brain, just accept whatever it produces.
See the outcomes.
Prompt 1 - GPT-5’s tech demo
CLI
Attempt #1: I input the prompt, it started thinking for about 300 seconds (the CLI usually does ~30 seconds max in my experience) and then it started generating like mad. After about 500 seconds it seemed to crash, having made no changes. I will retry.
Attempt #2: It crashed, again.
Attempt #3: Restarted Codex with a fresh context and fixed path inconsistencies in the prompt (only 1 place had CLI Tech Demo #1; the other still said [Tech Demo]).
Again, it crashed after 800 seconds, this time with a message:
🖐 stream disconnected before completion: Transport error: error decoding response body
Attempt #4: Because the web app seemed to have difficulties with a path, I included this sentence and re-ran it with a fresh context.
But sadly enough, again, it crashed.
Attempt #5: I had GPT-5 rewrite the prompt to be simpler, to not specify any techniques or frameworks, just the essence of what the page should be.
And again, a crash. Experiment failed.
Web
Attempt #1: It ran for about 6 minutes and produced 4 versions. Only versions 1 and 4 actually generated a diff that a pull request could be created from.
Version 1: Deployment failed.
feat: add AI 3D audio reactive tech demo page
I submitted the error logs to Codex Web on version 1 to see if it could fix it.
Usually, with my simpler prompting, deployments don’t fail. This is maybe the 2nd or 3rd time (out of over 50 PRs) a deployment has failed.
Attempt #2: After submitting the error logs it created something that deployed and said the following:
- Moved the AI-generated tech demo page to a build-safe path and wired in the standard disclaimer component for creation-date display.
- Introduced a dedicated client component that sets up the 3D canvas, audio analyzer, and control panel using React Three Fiber, Tone.js, and Radix UI widgets.
- Implemented microphone-driven FFT processing and an audio-reactive particle scene rendered with a custom shader.
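For context on what that microphone-driven FFT processing boils down to, here is a minimal sketch using the plain Web Audio API (my own illustration, not Codex’s actual code; Codex wired it up through Tone.js and React Three Fiber). The smoothingTimeConstant is also the knob that decides whether the visuals jump abruptly or move fluidly, which becomes relevant below.

```ts
// Minimal sketch: microphone-driven FFT analysis with the Web Audio API.
async function startMicAnalysis(onFrame: (bins: Uint8Array) => void): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();

  const analyser = ctx.createAnalyser();
  analyser.fftSize = 256;               // 128 frequency bins
  analyser.smoothingTimeConstant = 0.8; // lower values give abrupt, jumpy reactions
  ctx.createMediaStreamSource(stream).connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const tick = () => {
    analyser.getByteFrequencyData(bins); // 0–255 energy per frequency bin
    onFrame(bins);                       // e.g. drive particle size/colour from these values
    requestAnimationFrame(tick);
  };
  tick();
}
```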
It created something!

It has scroll bars that are very inconvenient.
It wants access to my microphone; whenever a sound is detected, the particles move abruptly (immediately, no fluid movement at all).
The camera constantly spins around the 2 intersected shapes, which constantly change colours.
I can spin around it manually by holding LMB and moving my mouse, and zoom using my scroll wheel.
The side panel’s adjustments seem to work, but are very, very limited. There is also a snapshot option that downloads a .png, but sadly the image is blank.
Funny side note: when I disconnected my microphone, the center image turned black and it zoomed all the way out.
I committed this version.
I continued, telling it some things that aren’t right about it.
This currently isn't properly centered (I have horizontal and vertical scrollbars that I shouldn't need to have). It's not clear what this tech demo should do. The “control panel” on the right is not fully visible. The snapshot download gave me an empty blank image.
feat: add AI 3D audio reactive tech demo page
It worked! It improved. Funnily enough, it has now hidden some buttons behind my navbar; I’ve seen Codex do this before when I ask it to use the full screen. I almost always have to tell it again to keep my (sticky) navbar and footer in mind.
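For reference, the fix I usually end up prompting for looks something like this: size the demo to the viewport minus the navbar instead of the full viewport. A minimal Tailwind-style sketch (my own, assuming a hypothetical --nav-h CSS variable; not the code Codex wrote):

```tsx
import type { ReactNode } from "react";

// Hypothetical wrapper: reserve room for a sticky navbar of height --nav-h
// (e.g. 4rem) so full-screen demos don't hide their controls behind it.
export default function DemoShell({ children }: { children: ReactNode }) {
  return (
    <main
      className="relative w-full overflow-hidden"
      style={{ height: "calc(100vh - var(--nav-h, 4rem))" }}
    >
      {children}
    </main>
  );
}
```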

The snapshot function now works!
It pretty much fixed everything I specifically instructed it to. It didn’t properly show all buttons in the control panel, but my prompt for that was also ambiguous.
In my opinion, this quickly did most of what I asked it to. From here on you could start tweaking stuff to have it exactly your way.
Version 4: Deployment failed
feat: add audio reactive 3d tech demo
Prompt 1 Conclusion
The winner by default is of course the Web version. I have never had the CLI crash on me before in my plentiful experience (less than a week).
Sadly enough you can only iterate on 1 version when re-prompting on Web, so since no version worked right from the start, I could only continue with 1. I was impressed by what it generated though; I didn’t expect it to pull off something like this from just 1 prompt, as it sometimes already fails at easier tasks.
Prompt 2
With inspiration from GPT-5, I worked out the following:
Goal:
A simple interactive page that demonstrates real-time audio visualization.
Features:
- Canvas with Animated Visuals
  - Runs full screen.
  - Shows smooth animation reacting to audio.
- Audio Input
  - User can upload an audio file.
  - Optionally allow microphone input.
- Visual Effects
  - At least one particle-based effect (e.g. dots or bursts).
  - At least one frequency-based effect (bars, waveform, or radial spectrum).
  - Visuals should change with beat, volume, or frequency spectrum.
- Controls
  - Play / pause button.
  - Sensitivity slider to adjust reactivity.
  - Dropdown or toggle to switch between at least two visualization modes.
- Responsiveness
  - Layout works well on desktop and mobile.
  - Controls adapt to small screens.
- Fallbacks
  - Show a clear message if audio cannot be accessed.
Location & Tech
Create this page in /extra/ai generated/codex/[CLI/WEB] Tech Demo #2
Use any technologies you deem required for this page, please create a fully functioning page with above requirements in 1 go.
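To make the Audio Input and Sensitivity requirements a bit more concrete: the idea is that an uploaded file and the microphone both feed one analyser, and a sensitivity multiplier scales whatever the visuals read from it. This is my own illustrative sketch of that wiring, not what Codex generated:

```ts
// One shared AnalyserNode; the visuals only ever read from this.
const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 512;

// Audio input option 1: an uploaded file (e.g. from an <input type="file">).
function playUploadedFile(file: File): void {
  const audio = new Audio(URL.createObjectURL(file));
  ctx.createMediaElementSource(audio).connect(analyser);
  analyser.connect(ctx.destination); // route to the speakers so the track is audible
  void audio.play();
}

// Audio input option 2: the microphone. Note: if the analyser is already
// connected to the speakers (as above), the mic gets echoed back too.
async function useMicrophone(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  ctx.createMediaStreamSource(stream).connect(analyser);
}

// One frame of frequency data, scaled to 0..1 by the sensitivity slider value.
function readLevels(sensitivity: number): number[] {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins);
  return Array.from(bins, (v) => Math.min(1, (v / 255) * sensitivity));
}
```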
CLI
Attempt #1: It worked, and quickly. It did it in under 200 seconds; I didn’t pay close enough attention to be precise.
Funnily enough, it created it in ai generated instead of ai-generated (the already existing folder).
So it followed my instructions quite literally.
For some reason I can’t access it through my navbar or the link it gave me (/extra/ai-generated/codex/CLI%20Tech%20Demo%20%232). From what I can tell, it might have something to do with my code that converts a file path to a readable URL.
After telling it “I can’t reach the page using the link you gave me (insert their link) or using the navbar (http://localhost:3000/extra/ai-generated/codex/CLI Tech Demo /#2)” it renamed the path, and it worked.
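The underlying issue: a # can never be part of a URL path (the browser treats everything after it as a fragment), and spaces only survive as %20, so a folder named CLI Tech Demo #2 has to be renamed or slugified before it can become a route. A hypothetical helper to show the idea (not my actual path-to-URL code):

```ts
// Hypothetical helper: turn a folder name like "CLI Tech Demo #2"
// into a URL-safe segment like "cli-tech-demo-2".
export function toSlug(folderName: string): string {
  return folderName
    .toLowerCase()
    .replace(/#/g, "")           // "#" would be read as a fragment, so drop it
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces and punctuation into dashes
    .replace(/^-+|-+$/g, "");    // trim leading/trailing dashes
}

// toSlug("CLI Tech Demo #2") === "cli-tech-demo-2"
```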


It fulfills all the requirements (technically; there are 2 bugs though):
- Switching visualiser style only works when I enable/disable my microphone; otherwise it doesn’t do anything.
- When an audio track finishes playing, the play/pause button keeps displaying “pause”. Its functionality is correct though (it starts playing the track again).
Update: it all worked fine on localhost, but deploying live gave some errors. I fed the logs into Codex CLI; I had to do this 2 times, after which it fixed everything and the page went live flawlessly.
Web
It again created 4 versions:
- V1: 5 minutes
  - This one failed to build because of the # in the path.
- V2: 14 minutes
  - Works (okay).
  - Has audio replay for mic input (not a requirement).
  - Bugs:
    - Need to select an effect before it starts working.
    - Particles are not full screen.
    - Scrollbars again.


- V3: 5 minutes
  - Uglier, works about the same.
  - Particles keep moving even when there is no sound playing.


- V4: 19 minutes
  - Works, has audio playback for mic as well.
  - Bars are left-aligned for some reason; particles are full screen.


So a quick conclusion, based on n=1:
It seems to improve its output with each version.
The longer it takes, the better the output is.
Prompt 2 conclusion
This round’s winner is definitely the CLI. Even though I had to re-prompt it using the error logs, it was faster and in my opinion produced better results.
What is interesting as well is that they all seem to create very similar output. GPT-5 and the o3 model are definitely different, but funnily enough they built everything in much the same style, with about the same functionality.
An extra sidenote is that the CLI page (GPT-5) looks very similar in styling to the contact page I had it generate in earlier experiments.

Prompt 3
For my third prompt I wanted it to make use of an API. I chose Cat as a service (CATAAS) for this. CATAAS is a REST API to spread peace and love (or not) thanks to cats.
I wanted to keep this third prompt simpler, more human-like. I purposely left a lot of room for interpretation, with a few strict requirements.
CLI
An immediate success. It created a 5-column-wide page and chose a slightly different colour for the button. It also loads gifs. Looking at the code, it chose the https://cataas.com/api/cats?limit=15&skip=${skip}&_=${cacheBust} request, and even put a limiter on it. This way it only makes 1 API call instead of 15 (though it still seems to make 15 individual requests, at least when looking at the network inspector).
Besides that, it’s also quite mobile friendly.
Looking at the code (and its response), it made a really decent attempt at giving each cat a logical name based on the tags that could be returned.
It even added a random skip so that the results the API returns are more random.
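Reconstructed from what I saw (not its literal code, and the response field names are my assumptions), the approach looks roughly like this: one JSON list call with a random skip and a cache-buster, tag-based names with a small fallback list, and then the browser still fetches each image individually, which is why the network inspector shows 15 image requests.

```ts
interface CataasCat {
  id: string;      // assumption — some versions of the API expose _id instead
  tags: string[];
}

const FALLBACK_NAMES = ["Whiskers", "Mochi", "Pixel"]; // illustrative, not the generated list

function nameFor(cat: CataasCat): string {
  const tag = cat.tags[0];
  if (tag) return tag.charAt(0).toUpperCase() + tag.slice(1); // e.g. "sleepy" -> "Sleepy"
  return FALLBACK_NAMES[Math.floor(Math.random() * FALLBACK_NAMES.length)];
}

async function loadCats(): Promise<{ name: string; imgUrl: string }[]> {
  const skip = Math.floor(Math.random() * 500); // random offset so each load differs (range is a guess)
  const cacheBust = Date.now();                 // defeat caching between refreshes
  const res = await fetch(`https://cataas.com/api/cats?limit=15&skip=${skip}&_=${cacheBust}`);
  const cats: CataasCat[] = await res.json();
  return cats.map((cat) => ({
    name: nameFor(cat),
    imgUrl: `https://cataas.com/cat/${cat.id}`, // each image is still its own request
  }));
}
```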

p.s. Afterward I prompted it twice more to expand the naming list, and the naming logic. That code is not in this screenshot.

Web
All versions took ~5 minutes.
V1: It loads a grid similar to the CLI’s. The refresh button only changes the names of the cats, and only partially (most of them). But you are able to enlarge the pictures.
The code is far more basic than the CLI’s. There is no logic to the naming, only a short list of random cat names (shorter than the CLI’s).

It seems to make 15 different calls, all initiated by Next.js’s lazy image loading.

V2: Exactly the same result visually. There is even less code (fewer cat names and a few fewer lines).
V3: This one seems to have produced something new. The images have rounded corners, there is a heading, and the refresh button is centered. Besides that, the names of the cats seem to be based on tags as well?
Looking at the code, if there is a tag in the API response, it uses that as a name. If not, it uses a naming list as a fallback. And here it also makes the request more random using a random function, similar to the CLI.
And last but not least, here the refresh button actually works properly.


V4: Whomp whomp, this one is a bust.
It’s funny though to see that this one has good fallback/accessibility. Somehow it messed up the API call badly.

You can see that it did 15 requests, all asking for JSON. A response of this type looks like this:

And even later it does a completely broken request:

Prompt 3 conclusion
So, all in all, the CLI was better. It implemented more logic, better styling and more names, in just 1 request. The Web version pretty much did what I asked it to, but it took 3 versions to get there. 3 out of 4 are essentially a bust, since 1 & 2 couldn’t refresh and 4 didn’t even load.
Here it wasn’t really visible that, for Web, every version improves on the previous one. But maybe with more iterations of this prompt it would show. Or I just had bad luck with what it generated.
A final conclusion
Based on this very scientific research, I can say that I was impressed with Web’s version of prompt 1 (and mostly disappointed with the CLI there), and overall happier with the CLI for prompts 2 and 3. Those are also more real-life scenarios to me, and closer to what I would personally prompt: smaller prompts, instead of 1 huge prompt that could almost be described as a full functional design.
I am biased: I’ve been using the CLI for a week now, after having used the Web version on and off for ~3 weeks, and it feels much better. It’s faster, I check the code more, and I run it in VS Code instead of some web interface that has me switching between programs and branches all the time.
Web is useful for quickly generating a skeleton, or for getting some different versions if you want multiple visual interpretations (haven’t tested that, just a hypothesis).
CLI is definitely better for the workflow and with the new GPT-5 model.
A tiny sequel (4 Sept 2025)
Since Codex has been updated from 0.23.0 to 0.27.0 (it’s already at 0.29.0, sure, but I’m running 0.27 in this experiment), I wanted to retry prompt #1 in the CLI. I’m also running the CLI in a Linux-based Docker container with full YOLO approvals, meaning it should have more context and doesn’t wait for approvals.
Funny results. It seems to write the code in the console itself, instead of implementing it.
Just to see the results, I prompted it to implement this in the specified path, trying to keep the instructions to a bare minimum, to see if it could understand from context.
It didn’t understand the pathing, so it didn’t end up properly in my navbar. It doesn’t even seem to work, but it definitely created a nice screen with more options. It seems like it tried harder, but also failed harder. And my entire laptop is extremely laggy while I have this open.

So, it turns out, it doesn’t just feel laggy, it actually makes the laptop laggy. And it isn’t even failing; it was just zoomed in too far.


With the particle count and size turned all the way down, it’s finally visible what it created.

The snapshot function even works.
It’s also apparent that it adjusted the styling to my current website layout. I’m impressed.
I’m adding this to my website, I’ll just give it some instructions to move it to the correct path, and we’ll call this a win.
Another update (1 Oct 2025)
New model, new chances. https://openai.com/index/introducing-upgrades-to-codex/
Today I’m trying GPT-5-Codex, a new model that should be better at programming.
I can’t be bothered to try experiment #1 again (okay, maybe I did, and it failed. It’s too limiting to put it all in 1 file, and since the prompt is so big, it takes things very literally. I can’t be bothered to fix it or mess with the experiment too much; it’s impressive that the o3 model did manage it.)
#2 is boring to me, soooooo I went straight to #3.
The results are promising!
1 out of 4 failed to compile. The other 3 are shown here:



So, the styling looks better in my opinion. I like #3 and #4 the most. #2 somehow chose a 4-wide grid. Also, all of the pages used the tags to name the cats directly; sadly, the naming logic wasn’t implemented in any of these. That’s what I liked most about the CLI attempt with the normal GPT-5 model.
If this is any indication of output quality (again, n=1 sucks as proper research), then it suggests that the code and visual output are great, but some creativity is gone with this model. So, be and stay creative with your prompts. I’ll try, at least.