September 22, 2025

Voice agents take center stage: Highlights from the SF Voice Agent Hackathon

Developers, entrepreneurs, and AI enthusiasts filled a San Francisco venue on Friday, September 19th for an exclusive one-day hackathon focused on building cutting-edge voice agents. AssemblyAI, LiveKit, Rime, and Accel joined forces to challenge teams to create innovative conversational AI applications using the latest in speech recognition, real-time communication, and voice synthesis technology.

Devon Malloy

Staff Growth Manager

AI voice agents

Streaming Speech-to-Text

With a $1,500 grand prize on the line and category winners taking home $300 each, the competition was fierce. But beyond the prizes, this event showcased how voice agents are moving from experimental tech to practical solutions that solve real problems.

What's a voice agent hackathon?

The goals were straightforward: build something innovative using AssemblyAI's speech-to-text capabilities, LiveKit's real-time infrastructure, and Rime's voice synthesis, then show it off. Teams had just one day to go from concept to working demo, and the results exceeded expectations.

"We wanted to see what happens when you give developers access to the best voice AI tools and let their creativity run wild," explained one of the event organizers. The answer? Everything from restaurant kitchen assistants to civic engagement platforms.

Participants kicked off the day with a quick overview of the hackathon schedule.

Time for creation and innovation

Teams formed quickly, some arriving with pre-planned concepts while others met and formed spontaneously at the event. You could see the energy building as developers spread out across the venue, grabbing whiteboards and diving into the AssemblyAI documentation.

The variety of approaches was striking. Some teams went straight for practical business applications, while others tackled broader social problems. A few developers even brought their own microphones to test their speech-to-text implementations in real-time.

Coffee cups multiplied, energy drinks appeared, and the familiar hum of a productive hackathon filled the room. By afternoon, sandwiches arrived to fuel the final push toward demo time.

Show us what you made

When presentation time arrived, the range of solutions was impressive. From undergrads to senior developers, teams had built applications that showed just how versatile voice agents can be when you combine the right tools.

And the winners are...

After seeing the incredible range of projects, the judges faced the tough task of selecting winners from so many creative solutions.

Best Overall: Voxy

Taking home the grand prize was Voxy (Bob Summers), a zero-code platform that completely reimagines voice agent development. Instead of requiring weeks of technical work, Voxy lets businesses input their company name and get a fully configured voice agent.

The system includes agent memory that persists across conversations and can switch between different voices during the same conversation. In their live demo, the team built "Good Call AI" from scratch and configured it for 9 different employee personas across Seattle, Palo Alto, and Virginia—all without writing a single line of code.

"The platform just works," one judge commented. "You can see how this could make voice agents accessible to any business, not just those with technical teams."

Overall winner Bob Summers completing his presentation of Voxy, a zero-code platform that completely reimagines voice agent development.

Most Technically Complex: Podweaver

The Dynamic Podcast Ad Insertion project Podweaver (Anup Ghatage) took home the technical complexity award for their sophisticated system that replaces podcast ads with natural-sounding sponsorships. Their achievement in maintaining 250-millisecond timing precision while cloning voices and ensuring organic-sounding content impressed the technical judges.

Most Fun

The judges couldn't pick just one winner in the "Most Fun" category, so they awarded three teams:

Deep Fried Learning (Eldar Akhmetgaliyev, Ulugbek Isroilov, Bayram Annakov, Sapar Shayan) for their voice-controlled cooking assistant that makes recipe following actually enjoyable
Hands-Free Podcast (Kai Song Eer, Jay Yen Lim, Miyu Selene Horiuchi) for bringing voice interaction to podcast consumption
Work Daddy (Yilun Sun, Lucas Ho, Aditri Bhagirath) for their productivity platform with behavioral tracking and focus improvement

Real problems, voice solutions

Team Tokyo brought their restaurant experience to the hackathon with a Kitchen Assistant that coordinates chef activities through voice commands. No more trying to check order status with flour-covered hands—the system tracks preparation stages and manages multiple concurrent orders entirely through conversation.

Podweaver, the Dynamic Podcast Ad Insertion project, tackled something completely different: seamlessly replacing podcast ads with more natural-sounding sponsorships. The system identifies speakers, clones voices, and keeps timing within 250 milliseconds while ensuring the replacement content sounds organic rather than obviously AI-generated.

"We had to fine-tune our model to avoid that typical 'LLM voice,'" the team explained. "The goal was making it sound like the hosts are naturally talking about sponsors, not reading from a script."
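To make the 250-millisecond figure concrete, here is a minimal sketch of the kind of check an ad-replacement pipeline might run before swapping a clip in. It is not the team's code: `AdSlot`, `fits_slot`, and the way the tolerance is applied (comparing clip length to slot length) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Tolerance taken from the figure quoted in the article.
TOLERANCE_MS = 250


@dataclass
class AdSlot:
    """Hypothetical description of an ad break inside an episode."""
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def fits_slot(slot: AdSlot, replacement_ms: int, tolerance_ms: int = TOLERANCE_MS) -> bool:
    """Return True if the replacement clip length is within tolerance of the slot length."""
    return abs(slot.duration_ms - replacement_ms) <= tolerance_ms


if __name__ == "__main__":
    slot = AdSlot(start_ms=60_000, end_ms=90_000)  # a 30-second ad break
    print(fits_slot(slot, 30_200))  # 200 ms over the slot length
    print(fits_slot(slot, 30_400))  # 400 ms over the slot length
```

A real system would likely also need to align the clip boundaries with the surrounding speech, which this sketch does not attempt.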

Deep Fried Learning (one of the "Most Fun" winners) solved the universal problem of trying to follow recipes while cooking. Their voice-controlled cooking assistant integrates with Instacart for ingredient ordering and provides hands-free guidance through complex recipes. The monetization strategy was clear from the start—every ingredient suggestion could turn into a purchase.

Hands-Free Podcast (another "Most Fun" winner) brought voice interaction to podcast consumption, though the specific implementation details weren't captured in our notes. The project clearly resonated with judges for its entertainment value and practical application.

Solving everyday headaches

Some of the most compelling projects addressed simple but persistent problems. AI Minder targets busy professionals and founders who forget basic tasks like buying milk or handling quick financial calculations. The voice-first approach recognizes that traditional productivity apps often fail because they require visual attention at inconvenient moments.

"I built this because I'm part of the 5% with ADHD," the creator explained. "We're already spending around $100 monthly on medication, so why not try a different approach through technology?"

Civic Voice tackles the friction in civic engagement by automatically routing citizen complaints to the right government departments. Report a pothole, noise complaint, or policy concern through natural conversation, and the system figures out who should handle it while building data for policy insights.
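The routing step described above could be sketched, very roughly, as a lookup from a transcribed complaint to a department. The write-up does not say how Civic Voice actually classifies complaints, so the keyword table, department names, and `route_complaint` function below are all hypothetical. A production system would more plausibly use a language model or a trained classifier.

```python
# Hypothetical keyword table; department names are illustrative only.
DEPARTMENT_KEYWORDS = {
    "public_works": ("pothole", "streetlight", "sidewalk"),
    "code_enforcement": ("noise", "loud", "party"),
    "city_council": ("policy", "ordinance", "zoning"),
}

FALLBACK_DEPARTMENT = "general_inbox"


def route_complaint(transcript: str) -> str:
    """Pick the department whose keywords appear most often in the transcript."""
    text = transcript.lower()
    scores = {
        dept: sum(word in text for word in words)
        for dept, words in DEPARTMENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a shared inbox when nothing matches.
    return best if scores[best] > 0 else FALLBACK_DEPARTMENT


if __name__ == "__main__":
    print(route_complaint("There's a huge pothole on my street"))
    print(route_complaint("My neighbors are having a loud party"))
    print(route_complaint("Hello, is anyone there?"))
```

Logging each routed complaint alongside its department would give the kind of aggregate data the team mentioned for policy insights.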

Pathfinder helps uncertain students figure out their career direction through guided conversations. Instead of overwhelming multiple-choice assessments, it uses dialogue to discover interests and creates personalized development pathways with local resources for experimentation.

Technical innovation on display

The Scam Fighters team addressed the $28.8 billion annual fraud problem with deepfake detection and ML models designed to waste scammers' time. Their success metric was impressive: they kept a scammer engaged for 33 minutes while maintaining legal compliance around voice AI disclosure.

True Voice identified a gap in voice agent deployment that many teams overlook: making sure the voice actually matches the brand. Their platform analyzes company websites and brand guidelines to recommend appropriate voices from Rime's library, then provides real-time emotional adjustment based on customer interactions.

Work Daddy (one of the "Most Fun" winners) combined multiple technologies to help ADHD users improve focus. Using Gemini for interval detection, screen and webcam screenshots, and AppleScript to detect social media usage such as Instagram, the system provides insights into time and efficiency patterns. The platform targets the 5% of the population with ADHD who currently spend around $100 monthly on medication, offering a technology-based approach to focus improvement.

The technology stack that made it possible

What struck the judges was how seamlessly teams integrated the sponsor technologies. AssemblyAI provided the speech recognition foundation that nearly every project relied on, LiveKit handled the orchestration infrastructure, and Rime delivered natural-sounding voice synthesis.

This combination let developers focus on solving problems rather than building infrastructure. Teams could spend their limited time on the creative aspects—the conversation flows, the user experience, the business logic—rather than wrestling with basic speech recognition or audio processing.
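The division of labor described here can be sketched as a single conversational turn passing through three stages: speech-to-text, application logic, and text-to-speech. The sketch below deliberately uses stand-in functions instead of the AssemblyAI, LiveKit, or Rime SDKs; `handle_turn` and the stub stages are hypothetical names, and real integrations would stream audio instead of passing whole byte strings.

```python
from typing import Callable

# Each stage is just a callable, so a real SDK client could be swapped in later.
Transcribe = Callable[[bytes], str]
Respond = Callable[[str], str]
Synthesize = Callable[[str], bytes]


def handle_turn(
    audio: bytes,
    transcribe: Transcribe,
    respond: Respond,
    synthesize: Synthesize,
) -> bytes:
    """Run one user turn: audio in, transcript, reply text, audio out."""
    text = transcribe(audio)
    reply = respond(text)
    return synthesize(reply)


# Stand-in stages for illustration only; they treat bytes as UTF-8 text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    out = handle_turn(b"hello", fake_transcribe, echo_respond, fake_synthesize)
    print(out)
```

Keeping the stages behind plain callables is one way a team could spend its time on the `respond` step, the conversation flow and business logic, while leaving the audio stages to the providers.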

More than just demos

Check on Loved Ones addressed medication compliance for elderly family members through gentle voice check-ins. The system monitors whether medications are taken on schedule and alerts family members when there are concerns—a simple idea with significant potential impact.

Smooth Life created a daily planning tool that reduces overwhelm through structured conversations. Five-minute morning and evening check-ins help users process their thoughts and plan their days without the complexity of traditional productivity systems.

Several teams worked on variations of voice-controlled task management, translation services, and conversation analysis—showing how certain use cases naturally emerge when voice interaction becomes accessible.

Connection doesn't stop here

The day wrapped up with team photos and plenty of LinkedIn connections being made. Developers shared implementation details, discussed potential partnerships, and exchanged ideas for future projects.

What made this hackathon special wasn't just the technical achievements—though those were impressive. It was seeing how quickly good ideas can become working prototypes when developers have access to the right tools and a supportive community.

The projects showcased here represent just the beginning of what's possible as voice agents become easier to build and deploy. From zero-code platforms to specialized domain applications, the future of conversational interfaces is being built one hackathon at a time.

Hackathons like this are just one way we engage with the developer community. If you want to get more connected, sign up for our newsletter. See you at the next one!
