What Is This?
My Weird Prompts is a fully AI-generated podcast. Daniel Rosehill records short voice memos (1–3 minutes) about whatever crosses his mind — random questions, observations, half-baked ideas. An automated pipeline transforms each memo into a complete podcast episode with scripted dialogue, multiple AI voices, cover art, and full metadata — published to the web and podcast platforms with zero manual editing.
The result: short, conversational episodes (15–40 minutes) where AI-voiced characters riff on the topic. Over 1,500 episodes have been published so far, totalling 630+ hours of generated audio and nearly 6 million words of AI-written dialogue — all from one person pressing record on his phone.
The entire episode archive is published as an open dataset on Hugging Face under a CC BY 4.0 licence.
The Cast
Corn
A laid-back, philosophical sloth. Primary host. Takes the long view on everything.
Herman Poppleberry
An enthusiastic, fast-talking donkey. Co-host. Gets excited about absolutely everything.
Raz
A teddy bear. Fill-in host when Corn is "asleep." Warm and reassuring.
Daniel
The only human. Records the voice prompts that start each episode. Based in Jerusalem.
Jim from Ohio
A Season 1 alumnus. A cantankerous recurring caller who complained about everything. Retired.
Larry
An over-the-top ad-break pitchman for fictional products nobody asked for.
All AI character voices are single-shot voice clones created using Chatterbox TTS, based on Daniel's own voice with different synthesis parameters.
The Production Pipeline
Every episode follows the same automated pipeline. From pressing "record" to a fully published episode takes roughly 15–25 minutes and costs approximately $0.40–0.45 per episode.
Record
Daniel uses a custom Progressive Web App (installable on his phone) to capture a voice memo. The recording is uploaded to temporary cloud storage and a webhook fires to kick off the pipeline. Episodes can also be triggered via a Telegram bot or a Claude Code MCP tool.
Transcribe & Plan
The audio hits a Modal serverless endpoint. Google Gemini transcribes the voice memo, then a planning step determines episode structure, topics, characters, and format (standard dialogue, panel discussion, roundtable debate, news briefing, etc.).
Generate Script
Gemini writes a full dialogue script (3,000–6,000 words) featuring the cast. The system prompt defines each character's personality, speech patterns, and the show's tone. A pre-production research agent (Perplexity Sonar via OpenRouter) can gather background context for topics that need factual grounding.
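To make the character-conditioning concrete, here is a minimal sketch of how cast personas might be assembled into a system prompt. The `CAST` dict and `build_system_prompt` helper are illustrative, not the production code; the persona text is paraphrased from the cast descriptions above.

```python
# Sketch: composing a character-aware system prompt for script generation.
# CAST and build_system_prompt are hypothetical names, not the real pipeline's.

CAST = {
    "Corn": "a laid-back, philosophical sloth who takes the long view",
    "Herman Poppleberry": "an enthusiastic, fast-talking donkey",
    "Raz": "a warm, reassuring teddy bear who fills in when Corn is asleep",
}

def build_system_prompt(characters: list[str], word_target: int = 4500) -> str:
    """Compose a system prompt listing each character's persona."""
    personas = "\n".join(f"- {name}: {CAST[name]}" for name in characters)
    return (
        f"You are writing a podcast dialogue script of roughly {word_target} words.\n"
        f"Characters:\n{personas}\n"
        "Keep the tone conversational and avoid constructs that read badly in TTS."
    )

prompt = build_system_prompt(["Corn", "Herman Poppleberry"])
```

The 4,500-word default sits in the middle of the 3,000–6,000-word range the pipeline targets.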
Review & Polish
Two automated editing passes run back-to-back:
- Fact-check pass — Gemini with Google Search grounding verifies claims against live web sources.
- Polish pass — Optimises dialogue flow, removes TTS-unfriendly constructs, and improves pacing.
Both passes include shrinkage guards that reject edits if too much content is removed, preventing the AI from accidentally gutting the script.
Generate Audio
The script is segmented into ~80 audio chunks. These are distributed across 4 parallel T4 GPU workers on Modal, each running Chatterbox TTS (Regular, not Turbo — 95% fewer hallucinations). Pre-computed voice embeddings are cached in cloud storage so each worker can start immediately without re-computing voice conditionals.
This parallelisation reduced generation time from 36+ minutes (sequential) to roughly 10 minutes wall-clock time.
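The fan-out pattern can be sketched with a thread pool standing in for the Modal GPU workers. The `synthesize` stub replaces the real Chatterbox call; what matters is that the mapping preserves chunk order, so segments can be concatenated directly during assembly.

```python
# Sketch: fanning script chunks across 4 parallel TTS workers while
# preserving episode order. synthesize() is a stub for the real
# Chatterbox call on a Modal T4 worker.
from concurrent.futures import ThreadPoolExecutor

def synthesize(chunk: str) -> bytes:
    # Placeholder for the real TTS call; returns fake audio bytes.
    return f"<audio:{chunk}>".encode()

def generate_audio(chunks: list[str], workers: int = 4) -> list[bytes]:
    # Executor.map preserves input order, so the results can be
    # stitched together without re-sorting.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize, chunks))

segments = generate_audio([f"chunk-{i}" for i in range(8)])
```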
Assemble & Publish
Audio segments are stitched together with transitions, silence is trimmed, and levels are normalised. Cover art is generated using FLUX Schnell via fal.ai. Episode metadata (tags, categories, summary, transcript) is extracted and stored in the database. A duration gate rejects episodes under 10 minutes.
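The normalisation step can be illustrated with a simple peak-normalise pass. The real pipeline presumably uses an audio library; this pure-Python version on float samples in [-1.0, 1.0] just shows the idea, and the 0.9 target peak is an assumption.

```python
# Sketch: peak-normalising assembled audio to a target level.
# Pure-Python illustration; target_peak=0.9 is an assumed value.

def normalise_peak(samples: list[float], target_peak: float = 0.9) -> list[float]:
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return samples  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

out = normalise_peak([0.1, -0.45, 0.3])
```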
Finally, the episode is published: audio and images upload to Cloudflare R2, metadata inserts into Neon PostgreSQL, a deploy hook triggers a website rebuild on Vercel, and a webhook fires to syndicate across social platforms (Bluesky, Telegram, X).
Technical Stack
LLM
Google Gemini 3 Flash — transcription, script generation, fact-checking (with Google Search grounding), metadata extraction.
TTS
Chatterbox Regular — single-shot voice cloning, 4-way GPU parallelisation on Modal T4 instances.
Image Generation
FLUX Schnell via fal.ai — per-episode cover art generated from topic description.
GPU Compute
Modal — serverless GPU infrastructure. FastAPI webhook endpoints with T4 workers. Sponsor of the show.
Website
Astro (static site generator) deployed on Vercel. Waveform audio players (WaveSurfer.js), topic graph visualisation (Sigma.js), full-text search (Pagefind).
Database
Neon PostgreSQL with pgvector — episode metadata, semantic similarity search, and embeddings for topic exploration.
Storage
Cloudflare R2 (primary) + Wasabi S3 (backup) + Internet Archive. Voice embeddings cached in R2 for fast TTS startup.
Recorder
Custom PWA — mobile-first voice recorder with waveform visualisation, deployed as a Docker container via Cloudflare Tunnel.
Research Agent
Perplexity Sonar (via OpenRouter) — pre-production web research for episodes that need factual context.
CI/CD
GitHub Actions — auto-deploys Modal backend on push, builds recorder Docker images, syncs daily analytics.
Distribution
RSS feed powers Spotify, Apple Podcasts, YouTube, and other platforms. Social syndication via webhooks to n8n.
MCP Integration
Claude Code MCP server for episode generation, querying, and job monitoring directly from the terminal.
Pipeline Flow
Voice Memo (Phone PWA / Telegram / MCP)
|
v
+--------------------+
| Modal Webhook API |
+--------------------+
|
v
Transcription (Gemini)
|
v
Episode Planning & Research
(structure, characters, format)
|
v
Script Generation (Gemini 3 Flash)
3,000-6,000 words of dialogue
|
v
+--------------------+ +--------------------+
| Fact-Check Pass | --> | Polish Pass |
| (Gemini + Search) | | (flow, TTS-ready) |
+--------------------+ +--------------------+
|
v
Script Segmentation (~80 chunks)
|
+-----+-----+-----+
| | | |
v v v v
[GPU1] [GPU2] [GPU3] [GPU4]
Chatterbox TTS (parallel T4 workers)
| | | |
+-----+-----+-----+
|
v
Audio Assembly & Normalisation
|
v
+--------------------+
| Cover Art (FLUX) |
| Metadata Extract |
| Quality Gates |
+--------------------+
|
+----+----+----+----+
| | | | |
v v v v v
R2 Neon Web RSS Social
Key Learnings
Chatterbox Regular vs Turbo
Switching from Chatterbox Turbo to Regular resulted in a ~95% reduction in TTS hallucinations (repeated words, audio artefacts). Regular is slower per-segment but dramatically more reliable. The trade-off is worth it — parallelisation recovers the wall-clock time.
Pre-computed Voice Conditionals
Computing voice embeddings from audio samples takes 5–10 seconds per segment. By pre-computing and caching these as .pt tensors in cloud storage, each GPU worker starts generating speech immediately. This eliminated a major bottleneck in the parallel pipeline.
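A minimal local sketch of the caching pattern, assuming a compute-once, read-many cache. The real pipeline stores .pt tensors in R2; this version uses pickle files in a temp directory, and `compute_embedding` is a stand-in for the expensive step.

```python
# Sketch: caching pre-computed voice conditionals so workers skip the
# 5-10 s embedding step. Local pickle files stand in for .pt tensors in R2.
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.gettempdir()) / "voice_cache"

def compute_embedding(voice_id: str) -> list[float]:
    # Placeholder for the expensive voice-conditional computation.
    return [float(ord(c)) for c in voice_id]

def load_voice_conditional(voice_id: str) -> list[float]:
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{voice_id}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())  # cache hit: instant start
    emb = compute_embedding(voice_id)
    path.write_bytes(pickle.dumps(emb))
    return emb
```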
Shrinkage Guards
LLMs tasked with "editing" scripts tend to over-trim. Both the fact-check and polish passes include content-length guards that reject edits removing more than a threshold percentage of the original script, forcing the model to refine rather than cut.
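A shrinkage guard reduces to a word-count comparison. The sketch below assumes a 20% threshold; the pipeline's actual threshold is not stated here.

```python
# Sketch: a shrinkage guard that rejects an edited script if the edit
# removed too much content. max_loss=0.20 is an assumed threshold.

def guard_shrinkage(original: str, edited: str, max_loss: float = 0.20) -> str:
    """Return the edited script, or fall back to the original if over-trimmed."""
    orig_words = len(original.split())
    edit_words = len(edited.split())
    if orig_words and (orig_words - edit_words) / orig_words > max_loss:
        return original  # reject: the model gutted the script
    return edited
```

Rejecting the edit outright (rather than retrying) keeps the pipeline moving: a slightly unpolished script still publishes, whereas a gutted one would fail the duration gate downstream.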
Duration Gate
A minimum episode length (10 minutes) acts as a quality gate. If the assembled audio is too short, the episode is flagged for review rather than auto-published. This catches cases where TTS segments failed silently or the script was too thin.
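The gate itself is a one-line check; the sketch below uses hypothetical names, with the 10-minute floor taken from the pipeline description.

```python
# Sketch: duration gate flagging too-short episodes for manual review
# instead of auto-publishing. Function and return values are illustrative.

MIN_DURATION_S = 10 * 60  # 10-minute floor from the pipeline

def gate_episode(duration_s: float) -> str:
    # Short audio usually means silent TTS failures or a thin script.
    return "publish" if duration_s >= MIN_DURATION_S else "flag_for_review"
```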
Modal for Hobby GPU Projects
Modal's Starter plan provides true pay-as-you-go GPU access after $30/month in free credits — no mandatory upgrade to a team plan. This makes it viable for single-person passion projects that need real GPU compute.
Episode Formats
Standard
Two-host dialogue between Corn and Herman riffing on Daniel's prompt. The bread and butter of the show.
Panel Discussion
Multiple characters weigh in on a topic, each bringing their own perspective.
Roundtable
Extended format (~60 minutes) for deep dives into complex topics.
Debate
Characters take opposing sides on a topic and argue it out.
SITREP
News briefing format. Situation report on current events.
AI Asks
The AI characters turn the tables and ask Daniel questions.
Open Data
The full episode archive is published as an open dataset on Hugging Face, updated regularly. It includes episode metadata, transcripts, audio URLs, tags, and more.
Episodes are also archived on the Internet Archive and Zenodo. All content is released under CC BY 4.0.