We all know Google’s speech transcription technology is really, really, really good. Not only is it the best in the industry, it’s doing it without a data connection: Pixels have been transcribing audio on-device for some time now, and that’s owed to Google’s extremely impressive transcription algorithms, which use the machine learning hardware on its smartphones. But accuracy isn’t everything when it comes to transcription, even if it is the single most important feature. Speed matters too.

A video posted by James Cham on Twitter pits a Pixel 3 against an iPhone 11 (which has a much more powerful processor, I might add), using both to transcribe his voice in real time. To be clear, the iPhone is using iOS’s built-in transcription, not Gboard’s. The difference becomes apparent within seconds: the Pixel 3 displays the words within a moment of Cham saying them, while the iPhone stutters, struggles to get the words right, then fixes them, and often pauses before spitting out a huge string of words after a long delay. By the end of the video, the iPhone is a full six seconds behind the Pixel 3 in the transcription. By my count, and not including the text at the beginning that was erroneously added by Cham, the iPhone’s transcription also contains at least five very significant errors that the Pixel’s does not.

"I don’t think that people appreciate how different the voice to text experience on a Pixel is from an iPhone. So here is a little head to head example. The Pixel is so responsive it feels like it is reading my mind! pic.twitter.com/zmxTKxL3LB"
James Cham ✍🏻 (@jamescham), May 27, 2020

But Cham’s point isn’t about accuracy, as important as that still is. It’s about how the way we talk, and the speed at which we speak, has a big impact on our experience with computers.
If a computer can easily keep up with your speech in real time, it becomes much easier to spot errors or change your mind about what you’d like to say as you monitor its progress, which makes for a much more natural interaction. It’s a bit like asking a stenographer to take notes versus writing them yourself: with the former, you always have to ask for things to be read back, and that takes time. With the latter, you have total control. In the transcription example above, you feel more freedom on the Pixel to go back and restructure a sentence or choose another word, whereas the iPhone is so far behind that, as you wait for it to catch up, you may well lose your train of thought (or just keep on going for fear thereof). As one reply puts it: speed is a feature. There are other use cases that real-time voice transcription will likely enable down the road, too; it’s just not as easy to articulate them yet. But I’ve long held the belief that the children growing up right now will be the
This viral video shows how the Pixel’s live voice transcription absolutely destroys the iPhone’s (and why it matters)