Best Practices for Building Medical Scribes
Introduction
Building a robust medical scribe requires careful consideration of accuracy, latency, speaker identification, and real-time capabilities while maintaining HIPAA compliance and clinical documentation standards. This guide addresses common questions and provides practical solutions for both post-visit and live encounter transcription scenarios.
Why AssemblyAI for Medical Scribes?
AssemblyAI stands out as the premier choice for medical scribes with several key advantages:
Industry-Leading Accuracy with Pre-recorded Audio
- Slam-1 model delivers exceptional accuracy for medical terminology and clinical documentation
- 2.9% speaker diarization error rate for precise attribution between provider and patient
- Comprehensive LLM Gateway integration for intelligent post-processing into structured clinical notes
Real-Time Streaming Advantages
As medical scribes evolve toward real-time documentation, AssemblyAI’s Universal-Streaming model offers significant benefits:
- Ultra-low latency (~300ms) enables live transcription during patient encounters
- Format turns feature provides structured, speaker-aware output in real-time
- Keyterms prompt allows providing medical context and patient history to improve accuracy
End-to-End Voice AI Platform
Unlike fragmented solutions, AssemblyAI provides a unified API for:
- Transcription with speaker diarization (provider vs. patient)
- Medical terminology recognition and contextual understanding
- HIPAA-compliant PII redaction on both text and audio
- Post-processing workflows with LLM Gateway - from SOAP notes to completely custom clinical documentation
- Real-time and batch processing in a single platform
- Compliance and Security built for medical workloads (BAA, HIPAA, DPA, etc.)
When Should I Use Pre-recorded vs Streaming for Medical Scribes?
Understanding when to use async (pre-recorded) versus streaming is critical for clinical workflows.
Use Pre-recorded (Slam-1) when:
Post-visit documentation - Encounter already happened, need highest accuracy
- Maximum accuracy required - Slam-1 has highest medical terminology accuracy
- Complex medical terminology - Rare medications, genetic conditions, specialized procedures
- HIPAA compliance critical - Full PII redaction with audio de-identification
- Structured note generation - SOAP notes, H&P, discharge summaries via LLM Gateway
- Quality assurance - Review and editing workflow needed
- Specialty documentation - Oncology, cardiology, neurology with complex terminology
- Speaker diarization needed - Automatic provider vs. patient separation
Best for: Post-visit SOAP notes, specialist consultations, hospital discharge summaries, quality review
Use Streaming (Universal-Streaming) When:
Live encounter documentation - Real-time transcription during patient visit
- Immediate documentation - No delay between encounter and note
- Telemedicine visits - Document while seeing patient virtually
- Emergency department - Fast-paced, immediate documentation needed
- Primary care visits - Standard encounters with common terminology
- Real-time review - Provider can review and correct during visit
- Ambient documentation - Microphone running throughout encounter
Best for: Telemedicine, primary care visits, ED encounters, real-time clinical decision support
Hybrid Approach (Recommended)
Many medical scribes use both:
- Streaming during visit - Real-time documentation, immediate review by provider
- Slam-1 post-processing - Run audio through Slam-1 after visit for:
- Highest accuracy verification
- Complex terminology correction
- Complete HIPAA compliance workflow
- Final structured note generation
- Speaker diarization (provider vs. patient)
Example workflow:
- Provider sees patient → Streaming captures real-time notes
- Visit ends → Audio sent to Slam-1 for final high-accuracy transcription
- LLM Gateway generates structured SOAP note from high-accuracy transcript
- Provider reviews and signs final note
This gives real-time utility during visits while ensuring maximum accuracy for official documentation.
What Languages and Features for a Medical Scribe?
Pre-Recorded doctor patient visits (Slam-1)
Languages: For post-visit documentation, Slam-1 supports English for the highest accuracy transcription. If you want to use other languages, Universal is a suitable alternative.
Core Features:
- Speaker diarization (provider-patient separation)
- Automatic formatting, punctuation, and capitalization
- Keyterms Prompting for medical specialties and conditions
- Ability to prompt on related medical terms and improve the accuracy of others (for example,
ibuprofen
improvingnaproxen
)
Speech Understanding Models:
- Entity detection for medications, conditions, and procedures
- Sentiment analysis for patient experience insights
- Speaker identification for separating doctor and patient in a visit
Guardrails:
- PII redaction on text and audio for HIPAA compliance
Real-Time Streaming (Universal-Streaming)
Languages: For live encounter transcription, Universal-Streaming supports:
- English model optimized for medical contexts
- Multilingual model for visits in other languages or with code switching (English, Spanish, German, French, Portuguese, Italian)
- Post-processing LLM Gateway tight integration for increasing medical accuracy
Streaming-Specific Features:
- Partial and final transcripts for responsive documentation
- Format turns for structured provider-patient dialogue
- Keyterms Prompt for patient history and current medications
- End-of-utterance detection for natural clinical conversation flow
- Post-processing LLM Gateway integration for increasing medical accuracy
Recommended approach: Use streaming for real-time documentation, then run through Slam-1 post-visit for accurate speaker-labeled final notes.
Coming Soon
- Medical model which packages up the best ways to contextually influence transcript output
- Multilingual Slam-1, especially important for multilingual medical conversations to improve accuracy
How Can I Get Started Building a Post-Visit Medical Scribe?
Here’s a complete example implementing async transcription with Slam-1:
How Can I Get Started Building a Real-Time Medical Scribe?
Here’s a complete example for real-time streaming transcription with LLM post-processing:
How Do I Handle HIPAA Compliance?
HIPAA compliance is mandatory for all medical transcription workflows. Here’s how to ensure your medical scribe meets requirements:
Required HIPAA Guardrails
1. Business Associate Agreement (BAA)
- AssemblyAI provides a BAA for healthcare customers
- Required before processing any PHI
- Contact us to execute BAA
2. PII Redaction (Required)
3. Secure Audio Storage
4. Access Controls
5. Audit Logging
For complete HIPAA guidance, see our Healthcare Compliance Guide.
What Workflows Can I Build for My AI Medical Scribe?
Use these flags to transform raw medical conversations into structured clinical documentation. Below is plain-English behavior, output shape, and clinical use cases for each option.
Entity Detection (Medical)
entity_detection: true
What it does: Extracts medical entities (medications, conditions, procedures, anatomy).
Output: Array of { entity_type, text, start, end, confidence }
.
Great for: Medication reconciliation, problem list updates, procedure coding.
Notes: Recognizes brand/generic drug names, medical conditions, surgical procedures.
Redact PII Text (HIPAA Compliance)
redact_pii: true
What it does: Scans transcript for Protected Health Information and replaces per HIPAA requirements.
Output: text
with PHI replaced; original timing preserved.
Great for: De-identification, research datasets, training data.
Notes: Covers all 18 HIPAA identifiers when properly configured.
redact_pii_policies: [person_name, date_of_birth, medical_record_number, phone_number, email_address]
Restricts redaction scope to key HIPAA identifiers:
person_name
– patient and provider namesdate_of_birth
– full or partial DOBmedical_record_number
– MRN, account numbersphone_number
– contact numbersemail_address
– electronic addresses
Why this set: Ensures HIPAA compliance while preserving clinical content for documentation.
redact_pii_sub: hash
What it does: Replaces each PHI span with a stable hash token.
Example:
"Patient John Doe, DOB 1/15/1980, MRN 12345"
⟶
"Patient #2af4…, DOB #7b91…, MRN #e13c…"
Benefits:
- Maintains referential integrity across document
- Preserves sentence structure for NLP/LLM processing
- Prevents reconstruction of original PHI
Redact PII Audio (HIPAA Compliance)
redact_pii_audio: true
What it does: Produces HIPAA-compliant audio with PHI portions silenced.
Output: redacted_audio_url
in the transcript payload.
Great for: Quality assurance, training, research.
Notes: Original audio preserved separately; ensure proper access controls.
Sentiment Analysis (Patient Experience)
sentiment_analysis: true
What it does: Analyzes emotional tone of patient responses.
Output: Array of { text, sentiment, confidence, start, end }
.
Great for: Patient satisfaction, pain assessment, mental health screening.
Notes: Helpful for identifying distressed or dissatisfied patients.
End-to-End Clinical Documentation Effect
Clinical Documentation Example
Original Encounter:
“Hi, I’m Dr. Smith. John Doe, born 1/15/1980, is here for follow-up. He’s taking metformin 1000mg twice daily for his diabetes.”
With medical scribe settings:
- Text: “Hi, I’m #2af4…. #7b91…, born #e13c…, is here for follow-up. He’s taking metformin 1000mg twice daily for his diabetes.”
- Entities:
[ { type: "medication", text: "metformin 1000mg" }, { type: "condition", text: "diabetes" } ]
- Clinical note: Structured SOAP format via LLM Gateway
- Redacted audio: PHI portions silenced for compliance
LLM Gateway for Clinical Notes
Our LLM Gateway enables transformation of raw transcripts into structured clinical documentation using the same API.
Here’s a complete example of generating structured SOAP notes from medical encounter transcripts:
Advanced SOAP Note Features
How Do I Improve the Accuracy of My Medical Scribe?
Medical Keyterms Strategy
The most effective approach for medical keyterms:
1. Patient-Specific Context
2. Specialty-Specific Terms
3. Visit-Specific Context
Using Keyterms Prompt for Streaming with LLM Gateway Enhancement
Common Medical Terminology - Top 1000 Terms
Even if you don’t know the context of a specific medical conversation, you can boost the accuracy of transcription by providing the top 1000 medical words in your field.
How Can I Improve the Latency of My Medical Scribe?
Async Chunking for Long Encounters
For lengthy patient visits, implement chunking to get progressive documentation. This is especially useful for:
- Hospital rounds (in-person microphone running ambient)
- Comprehensive physicals
- Specialty consultations
When to Use Streaming Instead
For optimal clinical workflow integration, streaming is ideal when:
-
Real-time documentation needed:
- Emergency department encounters
- Telemedicine visits
- Procedure documentation
-
Immediate clinical decision support:
- Medication interaction checking
- Diagnosis suggestion
- Protocol reminders
-
Live quality assurance:
- Compliance monitoring
- Training supervision
- Documentation coaching
Streaming provides:
- ~300ms latency for immediate documentation
- Real-time partial results for provider review
- No delay between encounter end and note availability
- Live clinical decision support integration
How Can I Use Speaker Identification for Doctor and Patient Recognition?
Speaker Identification can automatically distinguish between doctors and patients in medical encounters, replacing generic “Speaker A” and “Speaker B” labels with meaningful role-based identifiers.
Why Use Speaker Identification in Medical Scribes?
Clinical Benefits:
- Clear attribution - Know exactly who said what in clinical documentation
- SOAP note structure - Automatically separate subjective (patient) from objective (provider) statements
- Compliance documentation - Proper attribution for regulatory requirements
- Quality assurance - Review provider-patient communication patterns
- Training analysis - Analyze communication styles for medical education
Medical Speaker Identification Setup
Method 1: Role-Based Identification (Recommended)
Method 2: Name-Based Identification
For scenarios where you know the specific doctor’s name: