Newsletter

PII Redaction and Entity Detection In 13 New Languages 🇫🇷🇩🇪🇮🇳

We've expanded our PII Redaction and Entity Detection to include 13 new languages: Spanish, Finnish, French, German, Hindi, Italian, Korean, Polish, Portuguese, Russian, Turkish, Ukrainian, and Vietnamese.  

Hey 👋, this weekly update contains the latest info on our new product features, tutorials, and our community.

PII Redaction automatically finds and hides personal details like names and email addresses in transcripts to protect privacy. Entity Detection spots and categorizes important data in your audio, like names, organizations, addresses, and more.

You can now redact sensitive information and identify entities in audio in any of these 13 languages.
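Each language is selected with the `language_code` setting on the transcription config. As a quick reference, the sketch below maps the 13 new languages to their ISO 639-1 codes. It assumes the API accepts these two-letter codes, as it does for `fr` in the French example in this post, so check the docs before relying on them. `NEW_LANGUAGE_CODES` and `language_code_for` are hypothetical helper names, not part of the SDK.

```python
# ISO 639-1 codes for the 13 newly supported languages.
# Assumption: these match the `language_code` values the API expects
# (the French example in this post uses "fr"); verify against the docs.
NEW_LANGUAGE_CODES = {
    "Spanish": "es",
    "Finnish": "fi",
    "French": "fr",
    "German": "de",
    "Hindi": "hi",
    "Italian": "it",
    "Korean": "ko",
    "Polish": "pl",
    "Portuguese": "pt",
    "Russian": "ru",
    "Turkish": "tr",
    "Ukrainian": "uk",
    "Vietnamese": "vi",
}


def language_code_for(name: str) -> str:
    """Return the two-letter code for a language name (case-insensitive)."""
    wanted = name.strip().lower()
    for language, code in NEW_LANGUAGE_CODES.items():
        if language.lower() == wanted:
            return code
    raise ValueError(f"{name!r} is not one of the 13 newly supported languages")
```

For example, `language_code_for("German")` returns `"de"`, which could be passed as `language_code` in place of `"fr"`.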

Here's a code example showing how to use PII Redaction and Entity Detection on French audio in Python:

import assemblyai as aai

# Replace with your own API key
aai.settings.api_key = "YOUR_API_KEY"

# Sample French audio file
audio_url = "https://storage.googleapis.com/aai-web-samples/french1.mp3"

# Transcribe in French, enable Entity Detection, and choose which PII types to redact
config = aai.TranscriptionConfig(language_code="fr", entity_detection=True).set_redact_pii(
    policies=[
        aai.PIIRedactionPolicy.person_name,
        aai.PIIRedactionPolicy.organization,
        aai.PIIRedactionPolicy.occupation,
    ],
    # Replace redacted text with "#" characters
    substitution=aai.PIISubstitutionPolicy.hash,
)

transcript = aai.Transcriber().transcribe(audio_url, config)

print(f"Redacted transcript: \n{transcript.text}\n")

# Each detected entity has its text, its type, and start/end timestamps in milliseconds
print("Entities:")
for entity in transcript.entities:
    print(entity.text)
    print(entity.entity_type)
    print(f"Timestamp: {entity.start} - {entity.end}\n")

Example output: 

Redacted transcript: 
Bonjour, je m'appelle #####, je travaille pour ### ############# en tant qu'agent d'assurance. Mon rôle consiste à aider nos clients à trouver les meilleures solutions d'assurance pour répondre à leurs besoins spécifiques. Que ce soit pour l'assurance habitation, automobile ou vie, je m'efforce de fournir un service client exceptionnel et des conseils d'experts. Travailler chez ### ############# me permet de contribuer positivement à la vie des gens en leur offrant tranquillité d'esprit et sécurité financière. N'hésitez pas à me contacter pour toute question ou besoin d'information sur nos services.

Entities:
Marie
EntityType.person_name
Timestamp: 1529 - 2149

ASP International
EntityType.organization
Timestamp: 2910 - 4070

agent
EntityType.occupation
Timestamp: 4351 - 4671

ASP International
EntityType.organization
Timestamp: 22012 - 23134
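
Judging by the output above, the `hash` substitution policy appears to replace each character of a detected entity with `#` while keeping spaces (`Marie` becomes `#####`). The sketch below mimics that behavior on plain strings to make the idea concrete. It is an illustration only, not AssemblyAI's implementation: `mask_entities` is a hypothetical helper, and the unredacted sentence is an assumed reconstruction based on the detected entities.

```python
def mask_entities(text: str, entities: list[str]) -> str:
    """Replace every non-space character of each entity with '#'."""
    for entity in entities:
        # Keep whitespace so word boundaries stay visible after masking
        masked = "".join(ch if ch.isspace() else "#" for ch in entity)
        text = text.replace(entity, masked)
    return text


# Assumed original sentence, reconstructed from the entity list above
original = "Bonjour, je m'appelle Marie, je travaille pour ASP International."
redacted = mask_entities(original, ["Marie", "ASP International"])
print(redacted)
```

Because the mask keeps the entity's length and spacing, a reader can still tell roughly how long the hidden text was, which is worth keeping in mind when choosing a substitution policy.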

Check out our PII Redaction and Entity Detection docs for more information.

Fresh From Our Blog

Transcribe phone calls in real-time in Go with Twilio and AssemblyAI: Develop a server application in Go that transcribes an incoming Twilio phone call into text in real time.

RLHF vs RLAIF for language model alignment: RLHF is the key method used to train AI assistants like ChatGPT, but it has significant limitations and can produce harmful outputs. RLAIF improves upon RLHF by using AI feedback. This guide explains the differences between the two methods and what they mean in practice.

Transcribe a phone call in real-time using Python with AssemblyAI and Twilio: Learn to build a Flask application that transcribes phone calls in real time, using Python, AssemblyAI, ngrok, and Twilio.

How Graph Neural Networks Are Transforming Industries: This video covers impactful applications of GNNs in science and industry, along with recent research highlights, focusing on applied use cases.

How to Index Podcasts with Keywords like on Huberman's Website: Learn to build an application that indexes podcast episodes based on keywords using Speech-to-Text. 

The Physics of Generative AI - How AI models use physics to generate novel data: Modern Generative AI is capable of generating entire stories and photorealistic images, but how do these models actually work?