My Weird Prompts

A sloth and a donkey discuss whatever's on Daniel's mind — every episode generated by AI from a single voice prompt.

What Is This?

My Weird Prompts is a fully AI-generated podcast. Daniel Rosehill records short voice memos (1–3 minutes) about whatever crosses his mind — random questions, observations, half-baked ideas. An automated pipeline transforms each memo into a complete podcast episode with scripted dialogue, multiple AI voices, cover art, and full metadata — published to the web and podcast platforms with zero manual editing.

The result: short, conversational episodes (15–40 minutes) where AI-voiced characters riff on the topic. Over 1,500 episodes have been published so far, totalling 630+ hours of generated audio and nearly 6 million words of AI-written dialogue — all from one person pressing record on his phone.

The entire episode archive is published as an open dataset on Hugging Face under a CC BY 4.0 licence.

The Cast

🦥

Corn

A laid-back, philosophical sloth. Primary host. Takes the long view on everything.

🧄

Herman Poppleberry

An enthusiastic, fast-talking donkey. Co-host. Gets excited about absolutely everything.

🧷

Raz

A teddy bear. Fill-in host when Corn is "asleep." Warm and reassuring.

👤

Daniel

The only human. Records the voice prompts that start each episode. Based in Jerusalem.

😤

Jim from Ohio

Season 1 alumnus. A cantankerous recurring caller who complained about everything. Retired.

📢

Larry

An over-the-top ad-break pitchman for fictional products nobody asked for.

All AI character voices are single-shot voice clones created using Chatterbox TTS, based on Daniel's own voice with different synthesis parameters.

The Production Pipeline

Every episode follows the same automated pipeline. Going from pressing "record" to a fully published episode takes roughly 15–25 minutes and costs approximately $0.40–0.45 per episode.

1

Record

Daniel uses a custom Progressive Web App (installable on his phone) to capture a voice memo. The recording is uploaded to temporary cloud storage and a webhook fires to kick off the pipeline. Episodes can also be triggered via a Telegram bot or a Claude Code MCP tool.

2

Transcribe & Plan

The audio hits a Modal serverless endpoint. Google Gemini transcribes the voice memo, then a planning step determines episode structure, topics, characters, and format (standard dialogue, panel discussion, roundtable debate, news briefing, etc.).

3

Generate Script

Gemini writes a full dialogue script (3,000–6,000 words) featuring the cast. The system prompt defines each character's personality, speech patterns, and the show's tone. A pre-production research agent (Perplexity Sonar via OpenRouter) can gather background context for topics that need factual grounding.

4

Review & Polish

Two automated editing passes run back-to-back:

  • Fact-check pass — Gemini with Google Search grounding verifies claims against live web sources.
  • Polish pass — Optimises dialogue flow, removes TTS-unfriendly constructs, and improves pacing.

Both passes include shrinkage guards that reject edits if too much content is removed, preventing the AI from accidentally gutting the script.
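The shrinkage guard can be sketched as a small wrapper around each editing pass. This is a minimal illustration, not the project's actual code: the function name, the word-count heuristic, and the 15% threshold are all assumptions.

```python
def apply_edit_with_guard(original: str, edited: str, max_shrinkage: float = 0.15) -> str:
    """Accept an LLM editing pass only if it did not over-trim the script.

    Hypothetical sketch: if the edited script is more than `max_shrinkage`
    shorter than the original (by word count), keep the original instead.
    """
    original_words = len(original.split())
    edited_words = len(edited.split())
    if original_words == 0:
        return edited
    shrinkage = 1 - edited_words / original_words
    if shrinkage > max_shrinkage:
        # The model gutted the script; fall back to the unedited version.
        return original
    return edited
```

The same guard wraps both the fact-check and polish passes, so a bad edit degrades to a no-op rather than a shorter episode.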

5

Generate Audio

The script is segmented into ~80 audio chunks. These are distributed across 4 parallel T4 GPU workers on Modal, each running Chatterbox TTS (Regular, not Turbo — 95% fewer hallucinations). Pre-computed voice embeddings are cached in cloud storage so each worker can start immediately without re-computing voice conditionals.

This parallelisation reduced generation time from 36+ minutes (sequential) to roughly 10 minutes of wall-clock time.
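The segment-and-fan-out step can be sketched in plain Python. The real pipeline dispatches chunks to Modal GPU workers running Chatterbox; here a `ThreadPoolExecutor` stands in for that fan-out, and `synthesize_chunk` is a placeholder for the TTS call. All names and the chunking heuristic are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def segment_script(lines: list[str], target_chunks: int = 80) -> list[list[str]]:
    """Split dialogue lines into at most `target_chunks` contiguous chunks."""
    size = max(1, math.ceil(len(lines) / target_chunks))
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def synthesize_chunk(chunk: list[str]) -> str:
    # Placeholder for a Chatterbox TTS call on one GPU worker; returns a
    # dummy "waveform" string so the fan-out logic is runnable as-is.
    return f"<audio:{len(chunk)} lines>"

def generate_audio(lines: list[str], workers: int = 4) -> list[str]:
    chunks = segment_script(lines)
    # Executor.map preserves input order, so the audio segments come back
    # in script order regardless of which worker finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize_chunk, chunks))
```

The order-preserving property of `map` is what makes the later stitching step trivial: segments can be concatenated as they arrive.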

6

Assemble & Publish

Audio segments are stitched together with transitions, silence is trimmed, and levels are normalised. Cover art is generated using FLUX Schnell via fal.ai. Episode metadata (tags, categories, summary, transcript) is extracted and stored in the database. A duration gate rejects episodes under 10 minutes.

Finally, the episode is published: audio and images are uploaded to Cloudflare R2, metadata is inserted into Neon PostgreSQL, a deploy hook triggers a website rebuild on Vercel, and a webhook fires to syndicate across social platforms (Bluesky, Telegram, X).
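The trim-and-normalise step can be illustrated on raw sample values. The actual pipeline works on full audio files; this sketch operates on a mono list of floats, and the function names and thresholds are assumptions.

```python
def normalise_peak(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale a mono waveform so its loudest sample hits `target_peak`.

    Minimal sketch of level normalisation (peak-based, not loudness-based).
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Drop leading and trailing samples below an amplitude threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

A production version would normalise perceived loudness (e.g. to an LUFS target) rather than raw peak, but the shape of the step is the same.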

Technical Stack

LLM

Google Gemini 3 Flash — transcription, script generation, fact-checking (with Google Search grounding), metadata extraction.

TTS

Chatterbox Regular — single-shot voice cloning, 4-way GPU parallelisation on Modal T4 instances.

Image Generation

FLUX Schnell via fal.ai — per-episode cover art generated from topic description.

GPU Compute

Modal — serverless GPU infrastructure. FastAPI webhook endpoints with T4 workers. Sponsor of the show.

Website

Astro (static site generator) deployed on Vercel. Waveform audio players (WaveSurfer.js), topic graph visualisation (Sigma.js), full-text search (Pagefind).

Database

Neon PostgreSQL with pgvector — episode metadata, semantic similarity search, and embeddings for topic exploration.

Storage

Cloudflare R2 (primary) + Wasabi S3 (backup) + Internet Archive. Voice embeddings cached in R2 for fast TTS startup.

Recorder

Custom PWA — mobile-first voice recorder with waveform visualisation, deployed as a Docker container via Cloudflare Tunnel.

Research Agent

Perplexity Sonar (via OpenRouter) — pre-production web research for episodes that need factual context.

CI/CD

GitHub Actions — auto-deploys Modal backend on push, builds recorder Docker images, syncs daily analytics.

Distribution

RSS feed powers Spotify, Apple Podcasts, YouTube, and other platforms. Social syndication via webhooks to n8n.

MCP Integration

Claude Code MCP server for episode generation, querying, and job monitoring directly from the terminal.

Pipeline Flow

  Voice Memo (Phone PWA / Telegram / MCP)
          |
          v
  +---------------------+
  |  Modal Webhook API  |
  +---------------------+
          |
          v
  Transcription (Gemini)
          |
          v
  Episode Planning & Research
  (structure, characters, format)
          |
          v
  Script Generation (Gemini 3 Flash)
  3,000-6,000 words of dialogue
          |
          v
  +--------------------+     +--------------------+
  |  Fact-Check Pass   | --> |  Polish Pass       |
  |  (Gemini + Search) |     |  (flow, TTS-ready) |
  +--------------------+     +--------------------+
          |
          v
  Script Segmentation (~80 chunks)
          |
    +-----+-----+-----+
    |     |     |     |
    v     v     v     v
  [GPU1] [GPU2] [GPU3] [GPU4]
  Chatterbox TTS (parallel T4 workers)
    |     |     |     |
    +-----+-----+-----+
          |
          v
  Audio Assembly & Normalisation
          |
          v
  +--------------------+
  |  Cover Art (FLUX)  |
  |  Metadata Extract  |
  |  Quality Gates     |
  +--------------------+
          |
     +----+----+----+----+
     |    |    |    |    |
     v    v    v    v    v
    R2  Neon  Web  RSS  Social

By the Numbers

1,500+
Episodes Published
630+
Hours of Audio
~6M
Words Generated
~$0.42
Cost per Episode
~15 min
End-to-End Time
0
Manual Editing

Key Learnings

Chatterbox Regular vs Turbo

Switching from Chatterbox Turbo to Regular resulted in a ~95% reduction in TTS hallucinations (repeated words, audio artefacts). Regular is slower per segment but dramatically more reliable; the trade-off is worth it, since parallelisation recovers the wall-clock time.

Pre-computed Voice Conditionals

Computing voice embeddings from audio samples takes 5–10 seconds per segment. By pre-computing and caching these as .pt tensors in cloud storage, each GPU worker starts generating speech immediately. This eliminated a major bottleneck in the parallel pipeline.
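The caching pattern looks roughly like this. The real pipeline stores torch `.pt` tensors in R2; to keep the sketch dependency-free, it uses local pickle files, and `compute_fn` stands in for the expensive 5–10 second embedding computation. Everything here is illustrative, not the project's actual code.

```python
import pickle
from pathlib import Path

def get_voice_conditionals(voice_id: str, cache_dir: str, compute_fn):
    """Return voice conditionals for `voice_id`, computing only on a cache miss.

    `compute_fn(voice_id)` is a hypothetical stand-in for deriving voice
    embeddings from audio samples; results are cached to disk so every
    later call (e.g. each GPU worker startup) is a cheap file read.
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    cache_file = cache_path / f"{voice_id}.pkl"
    if cache_file.exists():
        return pickle.loads(cache_file.read_bytes())
    conditionals = compute_fn(voice_id)  # the expensive step, paid once
    cache_file.write_bytes(pickle.dumps(conditionals))
    return conditionals
```

With object storage in place of the local directory, warm workers skip the compute branch entirely, which is what lets all four GPUs start synthesising immediately.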

Shrinkage Guards

LLMs tasked with "editing" scripts tend to over-trim. Both the fact-check and polish passes include content-length guards that reject edits removing more than a threshold percentage of the original script, forcing the model to refine rather than cut.

Duration Gate

A minimum episode length (10 minutes) acts as a quality gate. If the assembled audio is too short, the episode is flagged for review rather than auto-published. This catches cases where TTS segments failed silently or the script was too thin.
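As a gate, this is little more than a threshold check; the sketch below makes the two outcomes explicit. The function name and return values are hypothetical.

```python
MIN_DURATION_SECONDS = 10 * 60  # episodes under 10 minutes are not auto-published

def duration_gate(duration_seconds: float) -> str:
    """Publishing decision for an assembled episode.

    Hypothetical helper: 'publish' at or above the minimum length,
    'flag_for_review' below it (catching silent TTS failures or a
    script that was too thin).
    """
    if duration_seconds >= MIN_DURATION_SECONDS:
        return "publish"
    return "flag_for_review"
```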

Modal for Hobby GPU Projects

Modal's Starter plan includes $30/month in free credits, with true pay-as-you-go GPU pricing beyond that and no mandatory upgrade to a team plan. This makes it viable for single-person passion projects that need real GPU compute.

Episode Formats

Standard

Two-host dialogue between Corn and Herman riffing on Daniel's prompt. The bread and butter of the show.

Panel Discussion

Multiple characters weigh in on a topic, each bringing their own perspective.

Roundtable

Extended format (~60 minutes) for deep dives into complex topics.

Debate

Characters take opposing sides on a topic and argue it out.

SITREP

News briefing format. Situation report on current events.

AI Asks

The AI characters turn the tables and ask Daniel questions.

Open Data

The full episode archive is published as an open dataset on Hugging Face, updated regularly. It includes episode metadata, transcripts, audio URLs, tags, and more.

Episodes are also archived on the Internet Archive and Zenodo. All content is released under CC BY 4.0.