Do you see the irony? We hear so much lately about artificial intelligence and how it can potentially affect radiology. But, for all this talk about the application of artificial intelligence, I have heard barely a squeak on anything tangible about applying artificial intelligence to real-world voice recognition technology. Why do I find this so strange? Startup companies espouse artificial intelligence for so many applications, some with questionable benefit. Yet, sitting right in front of everyone’s face is the most obvious work efficiency improvement: the application of artificial intelligence to enhance voice recognition. It is an area that desperately needs attention!
To me, it makes no sense that companies do not pursue this avenue. Unlike other health applications, applying artificial intelligence to voice recognition technology is unlikely to result in lawsuits or untoward health effects (unless the AI switches rights with lefts or unwittingly adds a lot of nos to our dictations!). And voice recognition is exactly the type of problem that developers build artificial intelligence to solve. Everyone’s voice is different, and we all choose different words to express ourselves. A technology like artificial intelligence that learns the subtleties of each of our voices and vocabularies should really make a difference in daily work life. So, why don’t we hear about breakthroughs on the voice recognition front? Let’s take a look at what’s out there already…
My Internet Literature Search
Since so much potential exists at the intersection of AI and voice recognition, I started a simple internet search on this topic. And, guess what? This is the first article I found: Microsoft announced a milestone. The company’s most accurate artificial intelligence enhanced software reached an error rate of 5.1% for transcription of conversational speech. (1)
Next, I found another article from Inc. that talks about the world’s most accurate voice recognition technologies. The top three are as follows: Baidu, Hound, and Siri. For those of you who do not know these enterprises well, I will briefly discuss each of them.
First of all, Baidu… Baidu is a Chinese company similar to Google but made for China. Why is voice recognition needed there the most? Well, think about typing in Mandarin and how long it takes. In Mandarin, it is much faster to speak than to write. So, that makes sense. Second, Hound… Honestly, I had never heard of this enterprise prior to writing this article. Apparently, it was an early entrant in the voice recognition personal assistant realm and is a fairly accurate digital assistant. And lastly, of course, is Siri by Apple… From my experience, to say the least, if this technology is considered to be one of the world’s most accurate, artificial intelligence voice recognition does not even come close to where it should be. I can’t tell you how many times Siri interprets my language incorrectly! (2)
What’s In Store For Radiology Voice Recognition?
Now, call me crazy… But none of these technologies sounds so great to me. If a speech recognition system gets approximately 1 out of every 20 words wrong, as in each of these technologies, that could be a recipe for disaster in the world of radiology reporting. And this is the best that artificial intelligence offers for voice recognition?
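To put a number on that worry, here is a minimal Python sketch of how a word error rate is commonly computed: the word-level edit distance between what was said and what was transcribed, divided by the number of words spoken. The function names, the sample dictation, and the 300-word report length are my own illustrative assumptions. They do not come from the cited articles or from any vendor's software.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration only; real benchmarks may
    normalize punctuation and casing differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word dropped
            insertion = curr[j - 1] + 1  # extra word transcribed
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def expected_word_errors(word_count: int, error_rate: float) -> float:
    """Average number of wrong words in a report of the given length."""
    return word_count * error_rate


if __name__ == "__main__":
    # Made-up dictation: one swapped word out of seven.
    said = "no acute fracture of the left wrist"
    heard = "no acute fracture of the right wrist"
    print(f"WER: {word_error_rate(said, heard):.3f}")

    # Assumed 300-word report at the 5.1% error rate quoted above.
    print(f"Expected errors: {expected_word_errors(300, 0.051):.1f}")
```

At a 5.1% error rate, an assumed 300-word report would carry roughly 15 wrong words on average, and as the left/right swap shows, a single one of them can reverse the meaning of a finding.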
In addition to these “seminal” articles, I did find an interesting collaboration between the ACR and Nuance Communications to improve radiology reporting. (3) But nothing tangible has yet been created to significantly improve voice recognition technology. It’s all in the initial phase. This leads me to believe there is a long way to go.
Sorry to break the news but… I don’t see any significant improvement in the quality of our radiology dictation software technology for a long time. So, until artificial intelligence software developers take voice recognition technology seriously and apply their talents to this area, change will not be around the corner. Therefore, continue to check your work many times over and dictate cautiously!