Blog

For humans, speech feels natural and effortless. Yet it is one of the most complex signals we know. Every utterance carries far more than words: it encodes intent, identity, and emotion, and even hints at a person’s health. This richness is what drives my research. I’m passionate about understanding and quantifying speech in all its dimensions. My work sits at the intersection of machine learning and computational speech and language processing, focusing on areas like automatic speech recognition (ASR or “speech-to-text”), speech synthesis (“text-to-speech”), speaker recognition, laboratory phonology automation, and the analysis of pathological speech.

In this blog, I’ll be presenting highlights from my research. Some of the posts were written by my colleagues.

Joseph Keshet

Whisper-Medusa: Using multiple Decoding Heads to Achieve 1.5X Speedup

August 5, 2024

Self-supervised Speaker Diarization

January 3, 2023

Speeding up and slowing-down speech with exceptional quality

March 19, 2023

DeepFry: Deep Neural Network Algorithms for Identifying Vocal Fry

September 2, 2022

AI for Speech Therapy and Language Acquisition

May 6, 2022

The Ubiquity of Machine Learning and its Challenges to Intellectual Property

August 9, 2018