December 2, 2025

How to remove or reduce background noise from audio for speech-to-text (STT) transcription

Remove background noise from audio with simple online tools or software for clear recordings, podcasts, and videos. Improve sound quality in minutes.

Kelsey Foster
Growth

This tutorial shows you how to remove background noise with online tools, desktop software, and Python libraries, and how to build a complete noise reduction and transcription pipeline in Python. You'll learn when preprocessing helps and when it hurts your results.

We'll build an automated system that combines the noisereduce Python library with AssemblyAI's speech-to-text API to clean audio files and transcribe them in batch. The tutorial covers three approaches: AI-powered online tools for quick fixes, professional audio editing software for precise control, and Python scripts for processing hundreds of files automatically. You'll also learn to identify when noise removal improves transcription accuracy and when clean audio performs better without preprocessing.

Methods to remove background noise from audio

You can remove background noise from audio using three main approaches: online AI-powered tools, desktop audio editing software, or Python scripts. Each method works differently and serves different needs depending on how much audio you process and your technical comfort level.

AI-powered online tools for quick noise removal

Online noise removal tools are web-based services that automatically clean your audio files. This means you upload a noisy recording, the AI analyzes it, and you download a cleaner version—all without installing anything on your computer.

Popular tools like Kapwing, Media.io, and VEED.IO use AI models trained to recognize common background sounds. You simply drag your audio file into the browser, wait 30 seconds to 2 minutes for processing, then download the result.

Here's what makes online tools appealing:

  • Zero setup required: Just open your browser and start uploading
  • Automatic processing: The AI handles everything without manual adjustments
  • Multiple formats supported: Works with MP3, WAV, M4A, and video files
  • Free tiers available: Most services let you process small files for free

The downside? These tools are not built for processing hundreds of files at once, and free tiers typically limit you to short recordings of roughly 5-10 minutes.

Professional audio editing software

Desktop software like Audacity and Adobe Audition gives you precise control over noise removal. Their noise reduction effects work from a noise profile: you select a sample of pure background noise, and the software attenuates matching frequency content across your entire recording.

Here's how it works: You find a quiet section containing only background noise, highlight it to create a "noise profile," then apply that profile to remove similar sounds throughout the file. Audacity does this for free, while Adobe Audition offers more advanced options for a monthly subscription.

The manual process takes about 3-5 minutes per file even if you're experienced. You need to carefully select noise samples and adjust settings to avoid damaging the speech quality.
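
The noise-profile workflow described above can be sketched in a few lines of NumPy. This is a simplified spectral gate written for illustration, not the actual algorithm used by Audacity or Adobe Audition. The function name spectral_gate and its parameters (n_fft, hop, n_std, reduction) are assumptions made for this sketch.

```python
import numpy as np


def _stft(x, n_fft, hop, window):
    """Short-time Fourier transform: one row of frequency bins per frame."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def _istft(spec, n_fft, hop, window, length):
    """Weighted overlap-add inverse of _stft."""
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(np.fft.irfft(spec, n=n_fft, axis=1)):
        start = i * hop
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def spectral_gate(audio, noise_clip, n_fft=512, hop=128, n_std=1.5, reduction=0.8):
    """Attenuate time-frequency cells that look like the sampled noise."""
    audio = np.asarray(audio, dtype=np.float64)
    noise_clip = np.asarray(noise_clip, dtype=np.float64)
    if len(noise_clip) < n_fft:
        raise ValueError("noise_clip must contain at least n_fft samples")
    window = np.hanning(n_fft)

    # 1. Noise profile: per-frequency magnitude statistics of the noise-only clip.
    noise_mag = np.abs(_stft(noise_clip, n_fft, hop, window))
    threshold = noise_mag.mean(axis=0) + n_std * noise_mag.std(axis=0)

    # 2. Pad one window of silence on each side so every real sample is
    #    covered by fully overlapping frames, then align to the hop size.
    total = len(audio) + 2 * n_fft
    total += (-(total - n_fft)) % hop
    padded = np.zeros(total)
    padded[n_fft:n_fft + len(audio)] = audio

    # 3. Gate: keep cells louder than the profile, turn the rest down.
    spec = _stft(padded, n_fft, hop, window)
    gain = np.where(np.abs(spec) > threshold, 1.0, 1.0 - reduction)

    # 4. Return to the time domain and crop the padding.
    cleaned = _istft(spec * gain, n_fft, hop, window, total)
    return cleaned[n_fft:n_fft + len(audio)]
```

Here noise_clip plays the role of the highlighted quiet section, and reduction controls how strongly cells below the profile are turned down. Real tools add smoothing across time and frequency to avoid the "musical" artifacts a hard gate like this can produce.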

Python libraries for programmatic noise reduction

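Python libraries let you remove noise automatically from multiple files using code. The noisereduce library is a popular choice. It uses spectral gating: it computes a spectrogram of the audio, estimates a noise threshold for each frequency band, and attenuates whatever falls below that threshold while leaving louder speech intact.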

This approach works best if you're comfortable with basic Python programming and need to process many files regularly. You write a script once, then run it on dozens or hundreds of audio files without manual work.

Here are the key advantages:

  • Batch processing: Handle thousands of files with one command
  • API integration: Connect directly to transcription services
  • Consistent results: Same settings produce identical output every time
  • Full customization: Adjust noise reduction intensity for different audio types
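
As a rough illustration of the batch-processing advantage, the loop below walks a folder and hands each file to a processing function. The name batch_process, the "_cleaned" suffix, and the process_file(src, dst) callback are hypothetical choices for this sketch. In practice the callback would wrap whatever noise reduction step you use.

```python
from pathlib import Path


def batch_process(input_dir, output_dir, process_file, pattern="*.wav"):
    """Run process_file(src, dst) on every matching file in input_dir.

    Returns (succeeded, failed) so one bad file does not stop the batch.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    succeeded, failed = [], []
    for src in sorted(input_dir.glob(pattern)):
        dst = output_dir / f"{src.stem}_cleaned{src.suffix}"
        try:
            process_file(src, dst)
            succeeded.append(dst)
        except Exception as exc:  # record the failure and keep going
            failed.append((src, str(exc)))
    return succeeded, failed
```

Collecting failures instead of raising keeps a single corrupt recording from aborting a run over hundreds of files. You can inspect the failed list afterwards and retry only those.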

| Method | Best For | Setup Time | Monthly Cost | Batch Processing |
|---|---|---|---|---|
| Online Tools | Occasional cleaning | None | Free to $20 | No |
| Desktop Software | Detailed manual control | 15 minutes | Free to $30 | Limited |
| Python Libraries | Automated workflows | 30 minutes | Free | Yes |

How to remove background noise with Python for transcription

Let's build a complete system that removes noise from your audio files and transcribes them automatically. You'll use Python to clean the audio, then send it to AssemblyAI for speech-to-text conversion. Preprocessing of this kind has reportedly been applied to large-scale datasets, including one of over 22,000 behavioral health sessions prepared for automated transcription.

Install required libraries and setup

First, create a new folder for your project and install the necessary Python packages. Open your terminal or command prompt and run these commands:

mkdir noise_reduction_project
cd noise_reduction_project
pip install noisereduce scipy numpy assemblyai

The noisereduce library handles noise removal, scipy reads and writes WAV files, numpy handles the array math, and assemblyai connects to the transcription service. Create a new file called clean_and_transcribe.py and add this setup code:

import noisereduce as nr
from scipy.io import wavfile
import numpy as np
import assemblyai as aai
import os

# Set your AssemblyAI API key
aai.settings.api_key = "your-api-key-here"

You'll need to replace "your-api-key-here" with your actual API key from AssemblyAI. Sign up for a free account to get started.
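
If you would rather not hard-code the key in a script that might end up in version control, one common alternative is to read it from an environment variable. The sketch below assumes a variable named ASSEMBLYAI_API_KEY. That name and the get_api_key helper are choices made for this example, not something the tutorial's code requires.

```python
import os


def get_api_key(var_name="ASSEMBLYAI_API_KEY"):
    """Read the API key from the environment and fail early if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running")
    return key
```

With this helper in place, the setup line would become aai.settings.api_key = get_api_key().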

Load and preprocess your audio file

Audio files come in different formats and configurations, so you need to standardize them before noise reduction. This function loads a PCM WAV file and prepares it for processing:

def load_audio(file_path):
   """Load and prepare audio file for noise reduction"""
   # Read the audio file
   sample_rate, audio_data = wavfile.read(file_path)
   
   # Convert stereo to mono by averaging channels
   if len(audio_data.shape) > 1:
       audio_data = np.mean(audio_data, axis=1)
   
   # Normalize audio to prevent distortion
   audio_data = audio_data.astype(np.float32)
   if np.max(np.abs(audio_data)) > 0:
       audio_data = audio_data / np.max(np.abs(audio_data))
   
   print(f"Loaded {len(audio_data)/sample_rate:.1f} seconds of audio at {sample_rate}Hz")
   return sample_rate, audio_data

This function converts stereo recordings to mono by averaging the left and right channels. It also normalizes the audio levels to prevent clipping during processing.
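
Before loading a file, it can help to check what you are dealing with. The sketch below uses Python's standard wave module to read the header of a PCM WAV file. The describe_wav name and the returned dictionary keys are assumptions for this example.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, bit depth, and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "duration_s": frames / rate if rate else 0.0,
        }
```

A quick check like this tells you whether the stereo-to-mono step will apply and how long the recording is before you spend time on noise reduction.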

Apply noise reduction and transcribe with AssemblyAI

Now create the main function that removes noise and handles transcription:

def clean_and_transcribe(input_file, output_file="cleaned_audio.wav"):
   """Remove noise from audio and get transcription"""
   
   # Load the original audio
   sample_rate, audio_data = load_audio(input_file)
   
   # Apply noise reduction
   reduced_noise = nr.reduce_noise(
       y=audio_data,
       sr=sample_rate,
       prop_decrease=0.8,    # Remove 80% of detected noise
       stationary=True       # Assume consistent background noise
   )
   
   # Convert back to 16-bit integers, clipping first so out-of-range samples cannot wrap around
   reduced_noise = np.int16(np.clip(reduced_noise, -1.0, 1.0) * 32767)
   
   # Save the cleaned audio
   wavfile.write(output_file, sample_rate, reduced_noise)
   print(f"Cleaned audio saved as {output_file}")
   
   # Transcribe both versions for comparison
   transcriber = aai.Transcriber()
   
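   # Transcribe original audio
   original_transcript = transcriber.transcribe(input_file)
   print("Original transcription finished")
   
   # Transcribe cleaned audio
   cleaned_transcript = transcriber.transcribe(output_file)
   print("Cleaned transcription finished")
   
   # Fail loudly if either job errored, since its text would be missing
   for transcript in (original_transcript, cleaned_transcript):
       if transcript.status == aai.TranscriptStatus.error:
           raise RuntimeError(f"Transcription failed: {transcript.error}")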
   
   return {
       "original_text": original_transcript.text,
       "cleaned_text": cleaned_transcript.text,
       "confidence": cleaned_transcript.confidence,
       "duration": cleaned_transcript.audio_duration
   }

def main():
   # Replace with your audio file path
   audio_file = "noisy_recording.wav"
   
   if not os.path.exists(audio_file):
       print(f"File {audio_file} not found!")
       return
   
   print("Processing audio file...")
   results = clean_and_transcribe(audio_file)
   
   # Display results
   print("\n" + "="*60)
   print("TRANSCRIPTION RESULTS")
   print("="*60)
   
   print("\nOriginal Audio:")
   print(results["original_text"])
   
   print("\nCleaned Audio:")
   print(results["cleaned_text"])
   
   print(f"\nConfidence: {results['confidence']:.0%}")
   print(f"Duration: {results['duration']} seconds")

if __name__ == "__main__":
   main()

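The prop_decrease=0.8 parameter tells the algorithm to reduce detected background noise by 80%, and stationary=True assumes the noise stays roughly constant throughout the recording, as a fan or electrical hum does. Treat 0.8 as a starting point: it suits many recordings without damaging speech quality, but lower it if the cleaned audio sounds muffled or the transcript gets worse.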
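
Comparing the two transcripts by eye only goes so far. If you have a hand-checked reference transcript for a few files, a word error rate (WER) calculation gives you a number to compare. The function below is a minimal sketch using word-level edit distance. The name word_error_rate is an assumption, and it does no punctuation or number normalization, which a more careful evaluation would need.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Running it once against results["original_text"] and once against results["cleaned_text"] shows whether the noise reduction settings helped or hurt for that file. Lower is better.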

When background noise removal improves transcription accuracy

Background noise removal doesn't always help transcription accuracy. You need to understand when it helps and when it might make things worse.

Signal-to-noise ratio (SNR) is the key factor. SNR measures how loud the speech is compared to background noise, measured in decibels (dB). Clean recordings with high SNR already transcribe well, while noisy recordings with low SNR benefit significantly from cleaning.
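
For a rough sense of where a recording falls, SNR in decibels can be estimated as 10 * log10(signal power / noise power). The sketch below assumes you can pick out one segment containing speech and one containing only background noise. The function name estimate_snr_db and the subtraction of noise power from the speech segment are choices made for this example.

```python
import numpy as np


def estimate_snr_db(speech_segment, noise_segment):
    """Estimate SNR in dB from a speech-plus-noise segment and a noise-only segment."""
    speech_segment = np.asarray(speech_segment, dtype=np.float64)
    noise_segment = np.asarray(noise_segment, dtype=np.float64)

    noise_power = max(float(np.mean(noise_segment ** 2)), 1e-12)
    total_power = float(np.mean(speech_segment ** 2))

    # The speech segment also contains noise, so subtract the noise power
    # to approximate the power of the speech alone.
    signal_power = max(total_power - noise_power, 1e-12)
    return 10.0 * np.log10(signal_power / noise_power)
```

Both segments should come from the same file at the same scale, for example slices of the audio_data array returned by load_audio.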

Here's when noise removal makes the biggest difference:

  • Phone and video calls: Background chatter, echo, and compression artifacts
  • Field recordings: Wind, traffic, construction, and crowd noise
  • Home recordings: Air conditioning, computer fans, and room echo
  • Older audio files: Tape hiss, electrical hum, and analog artifacts

Skip preprocessing for these scenarios:

  • Professional studio recordings: Already optimized with minimal noise
  • Modern smartphone audio: Built-in noise cancellation already applied
  • Previously processed files: Additional cleaning can introduce artifacts

Aggressive noise reduction can damage speech by removing consonant sounds, especially quiet, high-frequency ones such as "s" and "f" that resemble broadband noise. You might lose the "s" from plural words or turn "fifteen" into "thirteen." These artifacts hurt transcription accuracy more than moderate background noise.

Modern speech-to-text models handle typical background noise well without preprocessing, though speech recognition accuracy still depends heavily on clear audio with minimal background noise, distortion, and echoes. AssemblyAI's Universal models are trained on millions of hours of real-world audio, learning to distinguish speech from common environmental sounds. But severely degraded audio, such as windy outdoor interviews or construction site recordings, still benefits from cleaning.

Final words

You now have three methods to remove background noise from audio: online tools for quick fixes, desktop software for detailed control, and Python scripts for automated processing. If your recordings include multiple voices, speaker diarization can further clarify who said what. The Python approach we built combines noise reduction with transcription, creating a complete workflow that scales from single files to thousands of recordings.

AssemblyAI's Voice AI models handle moderate background noise effectively without preprocessing, having trained on diverse acoustic environments from quiet studios to busy coffee shops. The platform's Universal models achieve high accuracy even on challenging audio, but combining smart preprocessing with robust speech-to-text models gives you the best results for severely noisy recordings.


Frequently asked questions

Which noise removal method works best for podcast audio with consistent background hum?

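Desktop software like Audacity works well for podcast audio with a consistent background hum because you can sample a hum-only section and remove that profile precisely. Most online tools apply generic models that may not target your specific hum as effectively. In Python, noisereduce estimates the noise from the whole recording by default, but its stationary mode also accepts a noise-only clip through the y_noise parameter, which gives it a comparable profile to work from.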

Does preprocessing noisy audio always improve AssemblyAI transcription results?

No, preprocessing only helps when background noise significantly interferes with speech recognition—typically when the noise level approaches or exceeds the speech volume. Clean audio with minor background noise may actually get worse results after aggressive noise reduction due to speech artifacts introduced during processing.

Can I remove background music from audio files containing both speech and music?

The noise reduction methods covered here cannot effectively remove background music without damaging speech quality, because music and voice occupy overlapping frequency ranges. Spectral editing can reduce some instrumental music, but music with vocals confuses noise reduction algorithms and tends to produce unusable results.

What file formats work best for noise reduction before transcription?

WAV files work best for noise reduction because they typically contain uncompressed audio data, preserving all frequency information needed for effective noise analysis. MP3 and other lossy formats discard audio detail during compression, and noise reduction algorithms need that detail to distinguish between speech and background noise.
