From Voice to Text, Every Word Counts

Tuesday, January 29, 2019
Michael Scheiner

Erupting all over the Disrupt scene

Film buffs descend upon Sundance; sports fanatics flock to the Olympics; Wagner enthusiasts feel the eternal pull towards Bayreuth.

And for those for whom technology is the thrilling apex of cultural output, the place to be is TechCrunch Disrupt.

So if you attended Disrupt’s 2018 bonanza in its flagship town of San Francisco, chances are you saw a live demonstration of the AI-powered transcription software Otter Voice Notes.

As titans of tech and plucky upstarts orated and fielded Q&As on the main stages during keynotes and panels, the Otter Voice Notes engine was proudly chugging away to cast up real-time transcriptions on huge screens.

For those not fortunate enough to possess all-access passes, live transcriptions were on display at locations across the event grounds.

And what good is a live transcription if you can’t revisit that content at a later time? Otter, together with TechCrunch, arranged for Disrupt attendees to get a free Otter account and browse through the transcriptions of the events for later digesting.

All this to say that if an industry kingmaker like TechCrunch trusted Otter to handle all its recording, sharing and transcription needs, Otter must be a royally fantastic piece of tech.

An all-star team of know-how and capital

Otter is the brainchild of AISense, based in Los Altos, California. CEO and cofounder Sam Liang assembled a crack team of tech-industry vets under a shared drive to develop AI tools that foster better human-to-computer synergy. His teammates bring experience from software supernovas like Google, Microsoft, and Yahoo.

It seems they had the “Valley-cred” and the skills to wow heavyweight investors, attracting cash injections from some of the same deep pockets that bankrolled Tesla, Twitter, and Slack. They even earned a moneyed vote of confidence from Stanford professor and billionaire VC David Cheriton, one of Google’s very first backers.

With the right team and some serious confidence by way of capital, AISense launched the free version of Otter in February 2018. Three months later they rolled out the premium version. Disrupt 2018 took place just four months after that in early September.

Now that is lightspeed progress.

Ever since then the accolades have been streaming in. Otter Voice Notes was one of Google Play’s top apps of 2018. Mashable ranked it in the top 12 of that same year.

Otter’s aural capacity in action

Otter Voice Notes is not limited to the business world—although when it comes to brainstorm sessions, sales pitches, conference call recording, the need to share meeting notes, and even power-lunch convos, this tool is no doubt a powerful assistant.

It’s also useful for professors and students in the classroom, for journalists who need to transcribe interviews, for writers recording their ideas, and even for doctors, lawyers and other professionals who want to record and transcribe consultations.

Getting started is easy. Download the app on Android or iOS, or sign up on the website.

There’s a free plan with 10 hours of transcription per month, and a premium plan with 100 hours at only about 10 bucks a month, or just a fiver for students. The premium plan has some pretty cool extra features, like the ability to skip over silences.

When you first open the app, you record a sample of your own voice to teach Otter how to distinguish you from other speakers.


You can record using the in-app microphone or your phone’s Bluetooth, or import recordings from your phone, desktop or the web in any of these audio formats: MP3, AAC, WAV, M4A, WMA; or in any of these video formats: MP4, AVI, MOV, WMV, MPG.
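For anyone batch-preparing recordings for import, a quick check against the format list above can save a failed upload. The following is a minimal sketch, not anything Otter provides: the function name and the two sets are hypothetical, and the extensions are simply the ones listed in this article.

```python
from pathlib import Path

# Formats exactly as listed in the article; these set names are hypothetical.
AUDIO_FORMATS = {"mp3", "aac", "wav", "m4a", "wma"}
VIDEO_FORMATS = {"mp4", "avi", "mov", "wmv", "mpg"}


def classify_import(filename: str) -> str:
    """Return 'audio', 'video', or 'unsupported' based on the file extension."""
    # Compare case-insensitively and drop the leading dot from the suffix.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in AUDIO_FORMATS:
        return "audio"
    if ext in VIDEO_FORMATS:
        return "video"
    return "unsupported"
```

The check looks only at the file extension, so it says nothing about whether the file's contents are actually valid.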

As you speak, you’ll see the words instantly flow across the screen, often adjusting their sentence stops and paragraph breaks with amazing accuracy.

Otter also recognizes different speakers and clearly separates them along a color-coded timeline. You can add, search, and rematch speakers to get a clear, complete record of who said what. 

Maybe you’re listening to a presentation and want to grab a photo of the speaker’s whiteboard. With Otter, you can snap while recording, and the photo will be time-stamped within the recording and attached at the right place in the text.

Transcribe & playback

As Otter does its audio transcription, it learns to pick up on keywords, which it then displays above the text.

Click on a keyword and Otter highlights it in the text, letting you step through every instance where it appears. There’s also a similar search function for words that did not make Otter’s “key” cut.

As mentioned above, Otter differentiates between all the different people talking. Once you tag one of them with a name, Otter fills in the rest so you have a real script with properly named speakers.

Tapping on a word in the text will begin playing the audio section to which that text is anchored. Conversely, during playback, the text scrolls along and shows exactly which part you are hearing. Of course, playback speeds are also adjustable.
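To picture how text and audio can stay anchored to each other, here is a minimal sketch that assumes each transcribed word carries the time at which it starts. This is not Otter's actual implementation. The Word class and both function names are hypothetical and serve only to illustrate the two-way lookup described above.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording


def seek_time(words: list[Word], index: int) -> float:
    """Tapping a word: return the audio position to start playing from."""
    return words[index].start


def word_at(words: list[Word], position: float) -> int:
    """During playback: return the index of the word being spoken."""
    starts = [w.start for w in words]
    # Find the last word whose start time is at or before the position.
    i = bisect_right(starts, position) - 1
    return max(i, 0)
```

The binary search assumes the words are stored in the order they were spoken, so the lookup stays fast even for a long recording.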

Edit & share

When the recording is done, Otter indexes it and compresses the audio. The audio track is exported as an MP3, which keeps file sizes light and easy to send by text or email, while the text can be exported in many formats.

You can share the entire selection of text and audio, or slice it up into relevant sections and edit the text. Based on your contact list, you can set up public or private collaboration groups to share the texts with individuals or whole teams who can then lend a hand with editing.

There are also conversation snapshots. This is a quick way to pick out a key part of a text and turn it into a bright text graphic geared for slides. As mentioned above, photos snapped during recording are in line with the text, but you can also attach other images during editing.

Finally, your Otter files are not locked to the one device you recorded them on. You can sync across all your devices, however many you have.

Intelligence is in the air

There are other apps that overlap with Otter to varying degrees. Some focus on transcribing sales calls for training purposes, while others perform AI analysis of sales calls.

But when it comes to all-purpose speech-to-text transcription for any occasion (interviews, lectures, calls and more), Otter’s speed, accuracy, and robust indexing features are making it first in its class as the anytime/anywhere voice note tool.

This might be partly due to the tech that AISense has put into the heart of Otter. They call it ‘Ambient Voice Intelligence’ because it’s always working in the background.

As with any AI worth its weight in antimatter, being always on and ever attentive means it never stops learning.

And there are always plenty of new things to learn. In the words of Liang: “Your brain can only remember 10-20% ... So we thought we can help people capture that information and then search for it really fast.”

Capturing every bit of valuable knowledge out of fast-flying spoken information can be dizzying labor. Good thing there’s Otter to take some of that heavy recall lifting off the shoulders of human memory.