
How I built a 'Bot-Free' AI Super App using Electron, BullMQ, Qdrant & MCP

Tags: webdev, architecture, node, ai, engineering

If you work in tech, you’ve probably noticed the sudden explosion of "AI notetakers". Every time you join a Zoom or Google Meet call, two or three headless browser bots join as participants.

Honestly, I got tired of it. It feels intrusive, it disrupts the flow of the meeting, and corporate IT departments hate the privacy implications of sending raw board-meeting audio to random third-party clouds. On top of that, despite having a transcript, I still found myself spending 30 minutes manually translating the discussion into perfectly scoped Jira or Linear tickets.

So, I decided to build Plan AI.

Instead of building another bot, I built a privacy-first, local system audio recorder that orchestrates a complex asynchronous AI pipeline. It automatically generates engineering tickets, Markdown documents, and Mermaid.js architecture diagrams.

If you love system design, distributed queues, Vector DBs, and bleeding-edge AI tooling (like Model Context Protocol), grab a coffee. Here is a massive technical teardown of how this monorepo actually works under the hood.


1. Defeating the "Bot" (System Audio via Electron)

The biggest product hurdle was getting rid of the virtual meeting participant. The solution was to capture the OS system audio directly.

Here is the fundamental difference between the standard industry approach and the Plan AI architecture:

**The Old Way (Intrusive Bots)**

[Diagram: a headless bot joins the call as an extra participant and streams the raw audio to a third-party cloud for processing]

**The Plan AI Way (Native Capture)**

[Diagram: the desktop app captures system audio and the microphone locally, mixes the streams, and uploads the recording to your own backend]

I used Electron for the desktop app. By utilizing native desktop APIs (desktopCapturer and getUserMedia), the app listens to the system's output (what you hear from the speakers) and the microphone input simultaneously. The streams are mixed locally.

The challenge here is memory bloat. You can't just keep a 2-hour uncompressed WAV file in RAM. The Electron app writes the audio stream to the local disk in chunks using a MediaRecorder, producing an optimized WebM/Opus file. Once the meeting ends (or periodically for live transcription), the file is securely uploaded to the backend via a multipart stream to prevent blocking the Node server.
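
To make that concrete, here's a rough sketch of the renderer-side recording loop. The stream acquisition and the `window.planai` IPC bridge to disk are illustrative stand-ins, not the exact Plan AI code:

```typescript
// Hypothetical preload bridge that appends each chunk to a file on disk.
declare global {
  interface Window {
    planai: { appendChunk(chunk: Uint8Array): Promise<void> };
  }
}

export async function recordMeeting(systemStream: MediaStream, micStream: MediaStream) {
  const ctx = new AudioContext();
  const mixed = ctx.createMediaStreamDestination();

  // Mix speaker output and mic input into a single stream.
  ctx.createMediaStreamSource(systemStream).connect(mixed);
  ctx.createMediaStreamSource(micStream).connect(mixed);

  const recorder = new MediaRecorder(mixed.stream, {
    mimeType: 'audio/webm;codecs=opus',
  });

  // Flush a compressed chunk every 10 seconds so RAM usage stays flat.
  recorder.ondataavailable = async (event) => {
    if (event.data.size === 0) return;
    const buffer = await event.data.arrayBuffer();
    await window.planai.appendChunk(new Uint8Array(buffer));
  };

  recorder.start(10_000);
  return recorder;
}
```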

To complete the ecosystem, I also built a React Native (Expo) mobile app for in-person meetings, and a React web platform where you can chat with a "Live Meeting Assistant" (polling the transcript every 15 seconds) while the meeting is still happening.


2. The Asynchronous Orchestrator (BullMQ + Redis)

Processing large audio files and hitting multiple LLMs is incredibly CPU and network-intensive. If you try to process a 100MB audio file synchronously in an Express route, the Node.js event loop will block, and your server will immediately drop connections.

I designed a distributed pipeline using BullMQ and Redis. When the Electron app finishes uploading the audio, the Express controller immediately returns a 200 OK and pushes a job to Redis.

```typescript
import { Queue } from 'bullmq';

// Queue name and Redis connection details are illustrative.
const meetingQueue = new Queue('meetings', {
  connection: { host: 'localhost', port: 6379 },
});

// Pushing heavy tasks to the background
export async function queueAudioProcessing(meetingId: string, filePath: string) {
  await meetingQueue.add(
    'process-audio',
    { meetingId, audioPath: filePath },
    {
      attempts: 5,
      backoff: { type: 'exponential', delay: 5000 },
      removeOnComplete: true,
    }
  );
}
```

From there, the background worker picks up the job and orchestrates a multi-step pipeline (a worker sketch follows the list):

  1. Transcription (Deepgram): The audio is sent to Deepgram's API. I chose Deepgram because it’s blazingly fast and handles speaker diarization (Speaker 1, Speaker 2) out of the box.
  2. Voice Biometrics (SpeechBrain): Relying solely on Deepgram for speaker identification across multiple meetings doesn't work. So, I spun up a local Python microservice using FastAPI and SpeechBrain. The Node worker sends audio slices to Python, which extracts voice embeddings and verifies if "Speaker 1" is actually the CTO.
  3. Resilience & Rate Limiting: Because we are hitting external APIs (Deepgram, Jira, Linear), BullMQ allows me to implement strict rate-limiting and Dead Letter Queues (DLQ). If Jira's API goes down, the job pauses and retries exponentially without losing the audio payload.
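
A stripped-down version of that worker looks roughly like this. The three service helpers and the Redis connection details are placeholders for the real Deepgram, SpeechBrain, and extraction calls described above:

```typescript
import { Worker } from 'bullmq';
// Placeholder imports for the services described above.
import { transcribeWithDeepgram } from './services/deepgram';
import { identifySpeakers } from './services/speechbrain';
import { extractArtifacts } from './services/extraction';

const meetingWorker = new Worker(
  'meetings',
  async (job) => {
    const { meetingId, audioPath } = job.data as { meetingId: string; audioPath: string };

    const transcript = await transcribeWithDeepgram(audioPath);     // 1. transcription + diarization
    const speakers = await identifySpeakers(audioPath, transcript); // 2. voice biometrics
    await extractArtifacts(meetingId, transcript, speakers);        // 3. tasks, docs, diagrams
  },
  {
    connection: { host: 'localhost', port: 6379 },
    concurrency: 2,
    // Throttle outbound API usage: at most 10 jobs per minute.
    limiter: { max: 10, duration: 60_000 },
  }
);

// Jobs that exhaust their retries land in the failed set, which acts as the DLQ.
meetingWorker.on('failed', (job, err) => {
  console.error(`Meeting job ${job?.id} failed:`, err.message);
});
```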

3. OpenRouter & BYOK (Bring Your Own Key)

When building an AI app, hardcoding gpt-4o or claude-3.5-sonnet is a recipe for disaster. APIs go down, rates change, and some tasks require different levels of intelligence.

Instead of locking into OpenAI or Anthropic directly, Plan AI uses OpenRouter as a unified LLM gateway. In the backend, we dynamically route tasks based on their specific cognitive requirements:

  • Agentic Investigation: Routed to openai/gpt-4o-mini (incredibly fast and cost-effective for rapid context gathering and semantic searches before task creation).

  • Final Task Extraction: Routed to anthropic/claude-opus-4.7 (expensive, but unmatched at deeply understanding project context and structuring perfectly scoped tickets).

  • Architectural Diagrams: Routed to anthropic/claude-sonnet-4.6 (the absolute best model for coding and generating complex Mermaid.js diagrams).

  • Image Features: Routed to black-forest-labs/flux.2-klein-4b (a high-throughput image model that replaced DALL-E for our visual features).

  • Fallback Logic: If a primary model or provider goes down, we use OpenRouter's providerOptions to automatically fall back to alternative models, ensuring the background queues never stall.

Furthermore, Plan AI uses a BYOK (Bring Your Own Key) architecture. Instead of the platform paying for the API usage and charging a markup, users input their own OpenRouter keys per Workspace. This drastically lowers the SaaS operational costs and guarantees data privacy.
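
Put together, the routing layer is conceptually just a lookup table plus an OpenRouter call made with the workspace's own key. The model mapping and the routeTask helper below are an illustrative sketch, not the exact production code:

```typescript
type TaskKind = 'investigation' | 'task-extraction' | 'diagram';

// Illustrative mapping of cognitive task to model slug.
const MODEL_FOR_TASK: Record<TaskKind, string> = {
  investigation: 'openai/gpt-4o-mini',
  'task-extraction': 'anthropic/claude-opus-4.7',
  diagram: 'anthropic/claude-sonnet-4.6',
};

export async function routeTask(kind: TaskKind, prompt: string, workspaceApiKey: string) {
  // BYOK: the request is billed to the workspace's own OpenRouter key.
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${workspaceApiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: MODEL_FOR_TASK[kind],
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`OpenRouter request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content as string;
}
```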


4. Semantic Memory (Qdrant Vector DB)

Having a raw transcript isn't enough to generate good Jira tickets. If a developer says "I'll fix the auth bug", the LLM needs to know what the "auth bug" actually is.

To solve this, I integrated Qdrant, an open-source Vector Database. Every time a meeting finishes, the transcript is chunked, vectorized using an embedding model, and stored in Qdrant.

When the background worker extracts tasks, it performs a semantic search first. It pulls past architectural decisions and project context from previous meetings, injecting them into the LLM prompt. This transforms generic output into highly specific engineering tickets with exact acceptance criteria.
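
Conceptually, the indexing and recall steps look something like this. The embed() function stands in for whatever embedding model is configured, and the collection name and payload shape are illustrative:

```typescript
import { randomUUID } from 'node:crypto';
import { QdrantClient } from '@qdrant/js-client-rest';

const qdrant = new QdrantClient({ url: 'http://localhost:6333' });

// Stand-in for the embedding model call.
declare function embed(text: string): Promise<number[]>;

// Chunk -> vectorize -> store, once a meeting finishes.
export async function indexTranscript(meetingId: string, chunks: string[]) {
  const points = await Promise.all(
    chunks.map(async (text) => ({
      id: randomUUID(),
      vector: await embed(text),
      payload: { meetingId, text },
    }))
  );
  await qdrant.upsert('meeting_transcripts', { points });
}

// Before extracting tasks, pull the most relevant context from past meetings.
export async function recallContext(query: string, limit = 5) {
  const hits = await qdrant.search('meeting_transcripts', {
    vector: await embed(query),
    limit,
  });
  return hits.map((hit) => (hit.payload as { text: string }).text);
}
```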


5. The Mobile Companion (Expo 55 & React Native)

While the Desktop app captures virtual calls, in-person meetings require a different approach. I built the mobile companion app using the latest Expo SDK 55.

If you haven't touched React Native in a few years, the ecosystem has completely transformed. Using Expo Router, file-based routing makes navigating the mobile app feel exactly like Next.js. The entire codebase is strictly typed with TypeScript, sharing the exact same generated types (api.d.ts) as the React Web app and the backend.

This means that fetching a transcript on mobile, rendering the Live Assistant chat, and displaying the generated tasks share 90% of the mental model with the web frontend. Clean architecture, zero manual type casting, and incredibly easy to maintain.
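
For a feel of what that looks like in practice, here's a minimal Expo Router screen. The route, endpoint, and Task shape are illustrative; in the real app the type comes from the generated api.d.ts:

```tsx
// app/meetings/[id].tsx: an Expo Router screen, where the file path defines the route.
import { useLocalSearchParams } from 'expo-router';
import { useEffect, useState } from 'react';
import { FlatList, Text } from 'react-native';

// Illustrative stand-in; in the real app this comes from the shared, generated api.d.ts.
type Task = { id: string; title: string };

export default function MeetingTasksScreen() {
  const { id } = useLocalSearchParams<{ id: string }>();
  const [tasks, setTasks] = useState<Task[]>([]);

  useEffect(() => {
    // Hypothetical endpoint; the generated types keep this in sync with the backend.
    fetch(`https://api.example.com/api/meetings/${id}/tasks`)
      .then((res) => res.json() as Promise<Task[]>)
      .then(setTasks);
  }, [id]);

  return (
    <FlatList
      data={tasks}
      keyExtractor={(task) => task.id}
      renderItem={({ item }) => <Text>{item.title}</Text>}
    />
  );
}
```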


6. Bleeding-Edge DevEx: GitNexus (MCP)

This is a large monorepo, and building it solo required heavy use of AI coding assistants (like Cursor and Cline). But LLMs hallucinate when they don't understand the full codebase architecture.

To solve this, Plan AI ships with a Model Context Protocol (MCP) server via GitNexus. GitNexus indexes the entire monorepo into a local graph database.

When I ask my AI agent to "Modify the slide generation service", the agent doesn't just guess. It automatically calls gitnexus_impact() to see the blast radius of the change, and gitnexus_query() to understand the exact execution flow from the Express Controller down to the BullMQ worker. It makes AI pair-programming incredibly safe and deterministic.


7. Bridging Tech and Non-Tech (Repomix & GitHub)

The ultimate goal of Plan AI isn't just to write tickets; it's to completely eliminate the friction between non-technical stakeholders (Product Managers, Designers) and technical execution (Developers and AI Agents).

When a Product Manager finishes a meeting, Plan AI automatically generates the engineering tickets and pushes them directly to GitHub Issues (or Linear/Jira). But to make those tickets immediately actionable for an AI coding assistant (like Cursor or Copilot), we integrated Repomix.

We use a simple script (yarn repomix) that packs the entire monorepo into a single, highly-optimized Markdown file (ignoring tests, builds, and node_modules).

The workflow is magical:

  1. The PM speaks in the meeting. Plan AI creates a highly technical GitHub Issue with exact acceptance criteria.
  2. The developer assigns the issue to their AI coding assistant.
  3. The AI assistant reads the issue, consumes the repomix.md file to instantly understand the entire monorepo context, and writes the Pull Request.

Here is the flow of how Plan AI acts as the translation layer between human conversation and AI-generated code:

[Diagram: meeting conversation → Plan AI → GitHub Issue with acceptance criteria → AI coding assistant + repomix.md context → Pull Request]

8. Extreme Type Safety (Prisma to Swagger)

When dealing with a backend, a React web app, a React Native mobile app, and an Electron desktop app, maintaining API interfaces manually is a nightmare. If the backend changes userName to username, 3 different frontends break.

I implemented an automated type-safety pipeline:

  1. Prisma is the single source of truth for the database schema.
  2. Prisma feeds into TSOA (TypeScript OpenAPI) controllers on the backend.
  3. TSOA automatically generates a swagger.json specification based on decorators.
```typescript
// Backend controller: TSOA decorators drive the generated swagger.json.
// Import paths for the service and response type are illustrative.
import { Controller, Get, Path, Route } from 'tsoa';
import { taskService } from '../services/taskService';
import { TaskResponse } from '../models/task';

@Route("api/tasks")
export class TaskController extends Controller {
  @Get("{id}")
  public async getTask(@Path() id: string): Promise<TaskResponse> {
    return taskService.findById(id);
  }
}
```
  4. I wrote a single script (yarn update) that migrates the DB, regenerates the swagger file, and runs openapi-typescript to sync an api.d.ts file across all three frontends simultaneously.
[Diagram: Prisma schema → TSOA controllers → swagger.json → openapi-typescript → shared api.d.ts across web, mobile, and desktop]

This completely eliminated the dreaded TypeError: undefined is not an object across the entire stack.
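
On the consuming side, one way to use the generated api.d.ts is through a thin typed client. The openapi-fetch helper and the import path here are my assumptions; any client that accepts the generated paths type works the same way:

```typescript
import createClient from 'openapi-fetch';
// `paths` is the type generated by openapi-typescript from swagger.json.
import type { paths } from './types/api';

const client = createClient<paths>({ baseUrl: 'https://api.example.com' });

export async function getTask(id: string) {
  // The path and params are checked against the OpenAPI spec at compile time.
  const { data, error } = await client.GET('/api/tasks/{id}', {
    params: { path: { id } },
  });
  if (error) throw new Error('Failed to load task');
  return data; // Typed as TaskResponse, straight from the spec.
}
```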


9. Going Open Core (BUSL-1.1)

Because this app asks for permission to record system audio, trust is paramount. I couldn't just release a closed-source binary and expect engineers to install it.

I decided to open-source the core of the project under the Business Source License (BUSL-1.1), which converts to AGPLv3 after 4 years. This allows anyone to audit the Electron source code, verify that it is truly privacy-first (no spyware), and even self-host the entire infrastructure using Docker.

If you love system architecture, RAG orchestration, or just want a tool that writes your Jira tickets for you locally, you can check out the source code here:

GitHub Repository: Plan AI
Website & Demo

I'd love to hear how you guys handle async audio processing or type safety in your own massive monorepos! Let me know what you think in the comments.