
The Current State of Speech Recognition and Automatic Transcription

Tom Zaragoza

Nov 14, 2017 • 1 min read

It's easy to get caught up in all the hype around artificial intelligence. Everyone in the media is talking about how AI will replace us humans, but the truth is, we are still a very long way from such a thing happening.

Take transcribing audio as an example. Even if the audio file is a bit noisy or the speaker is talking a little too quickly, we can still figure out what is being said. Humans adapt to different recording environments on the fly, something that machines aren't as good at.

But that is slowly changing. The more we help these systems learn what a correct transcription looks like, the more accurate their results become. The current state of the art in machine learning and artificial intelligence is doing pretty well, but there is still a lot of room for improvement.

Even recently, the modern-day godfather of machine learning, Geoffrey Hinton, said we need to rethink the way we do deep learning. Hinton has spent the past few decades thinking about new ways of doing things and has recently shared his new ideas with the world. If you're the type that likes reading academic papers, check it out here.

With the proliferation of devices like Google Home and Amazon Alexa, we're in the midst of speech recognition technology becoming the norm. Just as keyboards are the main way we communicate our thoughts to a computer today, speech will become a similarly common one. To be a reliable way to communicate with a computer, speech recognition technology must improve its accuracy, no matter the environment.

Maybe one day machines will be capable of replacing humans in transcribing audio. Can you imagine that? You just upload your audio file and bam, a perfect transcription! But, like I said, we are still a long way from that. For now, we can lean on the state of the art in speech recognition to help us cut down on the time it takes to transcribe our audio recordings.
