The Selective Dragon

At this point, I can say without reservation that speech recognition is my friend. I've been experimenting with the Dragon NaturallySpeaking 8 software since the end of December, and during that time it's gone from curiosity to indispensable communication tool. If, as has been suggested, version 9 has achieved significant improvements in recognition (and also requires little or no training), anyone using the current version is undoubtedly happier still. It seems the days of the keyboard as input device are indeed numbered. If the built-in speech-recognition capabilities of Vista live up to expectations, I think speech input will become more the rule than the exception within the next couple of years or so. Not that users of the various non-Windows platforms will be left out in the cold; I don't have a Mac or run Linux or Unix, but those platforms are likely to offer equal, if not better, capabilities. And as speech recognition increasingly finds its way into handheld devices, all the old problems associated with using fingers for computer input will simply go away. At least that's the theory.

Although I wasn't expecting the 99% accuracy reported by those using a high-quality headset under optimal conditions, I have, in fact, been able to achieve that level of performance using my handheld digital voice recorder setup. I can't say this level of accuracy is the rule, but it isn't the exception, either. Since I so often dictate while driving, frequently with a partially open window and the heater fan set on high, it isn't reasonable to expect stellar results every time. However, a quiet room and some care with enunciation almost always bring happiness. In other words, I don't believe using this voice recorder in place of a headset is necessarily the inferior solution some reviewers have indicated. While I'm sure it's possible to obtain unacceptable results using certain voice recorders, I'm equally certain that similarly disappointing results can be obtained with certain headsets, sound cards that are simply too noisy to be useful for speech-recognition applications, inadequate training, or a roomful of racket.

One thing I've noticed is that the Dragon ignores certain kinds of background noise; just because there's something loud and objectionable in the audio doesn't necessarily mean it will foul up the transcription. For example, an ambulance siren that came through my open window while I was driving recently was completely ignored; it had no effect on the transcription. While a quiet environment dramatically improves recognition accuracy, steady "masking" sounds (things like fans, background music, and other voices) are a bigger problem than the occasional intruding audio event. An emergency siren, I suppose, has so little in common with the human voice that the speech-recognition system makes no attempt to extract language from it. Coughing, on the other hand, almost always results in a comical transcription.

I've noticed, too, that it isn't unusual to see the same word misinterpreted at the beginning of a paragraph, then correctly identified later in the same paragraph. This may be the result of variations in context (some sentences are easier to parse for intended meaning), changes in my enunciation, or possibly a combination of the two. A recent Dragon transcription initially rendered the word "rum" as "ROM," but then identified the word as I had intended it less than a dozen lines later, on the same page. In this situation, it's a mistake to correct the initial error; the fact that it correctly transcribed the word on subsequent attempts points to a different kind of problem. The Dragon also seems to favor technical, business, and generally professional terminology when it's struggling to understand a word or phrase; this is probably a reflection of its lineage as a popular professional tool.

I'm beginning to get the hang of speaking as an alternative to typing, although I think I'll always have a preference for the latter. It just isn't the same mental exercise, no matter how much I try to convince myself otherwise. Still, it's hard to argue with the utility of, in effect, writing while driving, or when it's otherwise impossible—or just inconvenient—to type. It's an extension, and I'm not sure it will ever really be a substitute, except through dire necessity, maybe. Verbally commanding a computer to perform a task is one thing, but verbal writing is something else altogether. To my particular brain, anyway.

