Marco Ramponi

Developer Educator

AI trends in 2024: Graph Neural Networks

From fundamental research to production-ready models, let’s discover how Graph Neural Networks are powering real-world applications and may be shaping the future of AI.

AI for Universal Audio Understanding: Qwen-Audio Explained

Recently, researchers have made progress towards universal audio understanding, marking an advancement towards foundational audio models. The approach is based on joint audio-language pre-training that enhances performance without task-specific fine-tuning.

Introducing Our New Punctuation Restoration and Truecasing Models

We’ve trained new Punctuation and Truecasing models on 13 billion words to achieve a 39% F1 score improvement for mixed-case words. A novel application of a hybrid architecture to character-level classification reduces inference time and improves the scalability of our Speech AI systems.

Combining Speech Recognition and Diarization in one model

A new approach to multi-speaker speech processing integrates Speaker Diarization and Automatic Speech Recognition in a unified framework. We discuss the key insights from this exciting recent development in Speech AI research.

What AI Music Generators Can Do (And How They Do It)

Text-to-Music Models are advancing rapidly with the recent release of new platforms for AI-generated music. This guide focuses on MusicLM, MusicGen, and Stable Audio, exploring the technical breakthroughs and challenges in creating music with AI.

What is Residual Vector Quantization?

Neural Audio Compression methods based on Residual Vector Quantization are reshaping the landscape of modern audio codecs. In this guide, learn the basic ideas behind RVQ and how it enhances Neural Compression.

Why Language Models Became Large Language Models And The Hurdles In Developing LLM-based Applications

What’s the difference between Language Models and Large Language Models? Let’s understand AI development trends and the difficulties of integrating LLMs into real-world applications.

How RLHF Preference Model Tuning Works (And How Things May Go Wrong)

Large Language Models like ChatGPT are trained with Reinforcement Learning From Human Feedback (RLHF) to learn human preferences. Let’s uncover how RLHF works and survey its most significant current limitations.

Recent developments in Generative AI for Audio

The spotlight has been on language and images for Generative AI, but there's been a lot of recent progress in the audio domain. Learn everything you need to know about generative audio models in this article.

Large Language Models for Product Managers: 5 Things to Know

A Product Manager's guide to understanding Large Language Models and the building blocks of Conversational AI.

The Full Story of Large Language Models and RLHF

Large Language Models have been in the limelight since the release of ChatGPT, with new models being announced seemingly every week. This guide walks through the essential ideas of how these models came to be.

Everything you need to know about Generative AI

Generative AI has taken the world by storm in the last several months, but what actually is Generative AI, and how does it work? Learn everything you need to know about Generative AI in this easy-to-follow series.

Conformer-1: A robust speech recognition model trained on 650K hours of data

We’ve trained a Conformer speech recognition model on 650K hours of audio data. The new model, Conformer-1, approaches human-level performance for speech recognition, reaching a new state-of-the-art on real-world audio data.

How ChatGPT actually works

Since its release, the public has been playing with ChatGPT and seeing what it can do, but how does ChatGPT actually work? While the details of its inner workings have not been published, we can piece together its functioning principles from recent research.

DeepMind's AlphaTensor Explained

AlphaTensor is a novel AI solution to discover mathematical algorithms with Reinforcement Learning. Learn everything you need to know about AlphaTensor in this comprehensive introduction.