Using a Digital Voice Recorder for Speech Recognition

Photo: The MX20 with its little buddy the AAA

I'm not ready to render a final verdict on the NaturallySpeaking software I've been trying out since just after Christmas; I want to make sure I've held up my end of the bargain (training, for example, and trying not to fluff my lines) before I do that. But at this point I think I can say something about the digital recorder I'm using as my audio input device. When I was researching the speech-recognition possibilities a while back, I found there was a lack of information on this particular way of doing things; a headset is the preferred method, and is therefore discussed in greater depth than a handheld recorder. Maybe this will help someone out there who's negotiating a similar research path right now.

I want to stress, again, that this voice recorder arrangement I'm using is a trade-off; it isn't the ideal situation where speech-recognition accuracy is the primary requirement. Although this method has exceeded my expectations, I don't believe speech recognition accuracy will ever equal what's possible using a high-quality headset. For me, the trade-off is worthwhile because I'm not tethered to a computer by a headset cord, or limited by the range of a wireless headset. The latter is better than a wire but still means I can't travel too far beyond the confines of my living quarters. This way, I can dictate while driving, or when I'm otherwise unable to type on a keyboard. I still have to do some editing, but the time I save by virtue of most of the words already being there is, for me, a compromise I can easily afford. And when I factor in the decreased stress on my arthritic joints . . . well, that's another story, but when it comes to writing, it's this way or no way.

Anyway, the recorder I've been using is a Sony ICD-MX20, which is a pocket-size—about 4 by 1.5 inches—unit weighing in at just under 3.5 ounces. It's available from the usual merchants, including some of the brick and mortar mega-retailers. Prices for this item vary in the extreme, so be careful.

If, like me, you're interested in the Dragon NaturallySpeaking voice-recognition software but your computer doesn't meet the minimum requirements for running the current version, this voice recorder is available as a bundle with version 8.0 of the NaturallySpeaking software. The model number for this bundle differs slightly: you would want the ICD-MX20VTP version, instead. For some reason, the existence of this package deal isn't widely advertised, if it's advertised at all. In any event, it's possible to save a few bucks this way, assuming a headset isn't on your list of priorities; this bundle doesn't include one.

The MX20 has four available recording modes, which offer increased recording time at the expense of audio quality. Using its internal 32 MB of memory, the highest-quality mode gives you less than an hour and a half of recording time, while the next step down doubles that time, but cuts the frequency response essentially in half. While these two modes permit stereo recording, the two below that record in mono, and by the time you arrive at the longest possible recording time—nearly 12 hours—the sound quality is abysmal. This may not be catastrophic for brief voice memos, but is unacceptable for any application beyond that. Here's how frequency response is affected by each of the four recording modes, from the user guide.

Mode    Frequency response    Channels
ST      60 Hz to 13500 Hz     stereo
STLP    60 Hz to  7000 Hz     stereo
SP      60 Hz to  7000 Hz     mono
LP      60 Hz to  3500 Hz     mono

Sampling rates for the four modes are 4800, 2400, 1600, and 8000 bits per second, respectively.
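As a sanity check on those numbers, here is a small sketch of the Nyquist rule of thumb: to capture audio up to a given frequency, a recorder has to sample at least twice that fast. The function and variable names are my own, and the only inputs are the upper frequency limits from the table above. The first three sampling figures quoted above come out lower than these minimums, so they are worth checking against the user guide.

```python
# Upper frequency-response limit per recording mode, in Hz (from the table above).
UPPER_LIMIT_HZ = {"ST": 13500, "STLP": 7000, "SP": 7000, "LP": 3500}


def min_sampling_rate_hz(upper_limit_hz):
    """Nyquist criterion: sample at least twice the highest frequency captured."""
    return 2 * upper_limit_hz


def nyquist_table(limits=UPPER_LIMIT_HZ):
    """Return the minimum sampling rate, in samples per second, for each mode."""
    return {mode: min_sampling_rate_hz(hz) for mode, hz in limits.items()}


if __name__ == "__main__":
    for mode, rate in nyquist_table().items():
        print(f"{mode:<5} needs at least {rate} samples per second")
```

These are theoretical floors only; a real recorder samples somewhat faster than the minimum to leave room for filtering.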

Speech recognition requires the highest possible fidelity, so you're stuck with that high-quality, 1:26 total-recording-time mode unless you spring for more memory, which is, of course, the Sony Memory Stick. Fortunately, memory prices continue to fall, so you can load the recorder with a gigabyte for under $50, thereby doing away with the anxiety altogether. Here's what you can expect from various Memory Stick capacities at the highest recording quality, according to the MX20 user guide.

Capacity    Recording time (highest quality)
256 MB      11 hours 10 minutes
512 MB      22 hours    minutes
1 GB        44 hours 55 minutes
2 GB        92 hours 10 minutes

As you can see, there's little need for worry, even with half a gigabyte installed. This recorder uses the Memory Stick Duo, including the Pro version, although the MX20 won't benefit from the Sony MagicGate technology supported by the Pro-designated sticks. It will work just fine; it just won't make use of the Pro version's copyright-protection features.
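If you're curious what those capacity figures imply, the arithmetic can be sketched in a few lines. This is a rough estimate, not anything from the user guide: it assumes a megabyte is 1,000,000 bytes, that "1 GB" means 1000 MB, and that nothing is lost to file-system overhead. The 32 MB entry uses the 1:26 figure for the internal memory, and the 512 MB row is left out because its minutes figure is missing.

```python
# Recording times in the highest-quality (ST) mode: capacity in MB -> (hours, minutes).
ST_TIMES = {
    32: (1, 26),      # internal memory
    256: (11, 10),
    1000: (44, 55),   # "1 GB", assumed here to mean 1000 MB
    2000: (92, 10),   # "2 GB", assumed here to mean 2000 MB
}


def implied_kbit_per_s(capacity_mb, hours, minutes):
    """Average data rate implied by filling capacity_mb in the given time."""
    seconds = hours * 3600 + minutes * 60
    bits = capacity_mb * 1_000_000 * 8  # assumes decimal megabytes, no overhead
    return bits / seconds / 1000


def implied_rates(times=ST_TIMES):
    """Return the implied data rate in kbit/s for every capacity in the table."""
    return {mb: implied_kbit_per_s(mb, h, m) for mb, (h, m) in times.items()}


if __name__ == "__main__":
    for mb, rate in implied_rates().items():
        print(f"{mb:>5} MB -> about {rate:.1f} kbit/s")
```

Under those assumptions every row lands at roughly the same rate, which suggests recording time scales more or less linearly with capacity.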

Battery life is rated at eight hours, using, I suppose, the two AAA alkaline batteries included with the recorder. I didn't keep track of daily usage during the three weeks of life I got from the included batteries, but I'm guessing a couple of hours per day on average. I'm trying rechargeable nickel-metal hydride (NiMH) batteries now, but I don't yet know what sort of life to expect from those; they're still running on the initial charge. I disabled all beeping advisories, and also the status LED. I left the LCD's backlight on its default setting, which means the light comes on briefly when a key is pressed, or when recording starts or stops. The backlight remains on during any kind of menu operation; this is probably the largest battery drain, aside from normal recording and playback. It's possible to disable it entirely, which would further extend battery life.

There are only two microphone sensitivity levels available on this recorder: low and high. I use the low setting for dictation, since the high setting is so sensitive that even quiet conversations in the next room are recorded. There's also a directional setting, selected by a slide switch on the front of the recorder. Switching it on produces a mono recording, which is appropriate for dictation, and also activates the unit's noise-canceling mode. This is an interesting feature, at least in the way it's implemented on this particular recorder.

In the old days, a noise-canceling microphone was a purely mechanical device. It worked because sound waves arriving at both sides of the same diaphragm, at the same time, had a canceling effect on the mechanical movement of that diaphragm. The MX20, on the other hand, uses a scheme similar to that of modern noise-canceling headphones: microphones detect unwanted sound, and remove it electronically before it has the opportunity to get into your ears. When the directional switch is used, the recording mode is still stereo, but the left and right channels are used to detect off-axis, out-of-phase sound, as opposed to dictated speech from close proximity. This, it seems, would require a third microphone element, and in fact the packaging boasts a "built-in triple microphone configuration." In effect, this amounts to a three-channel input, which is then merged into a single channel on the recording. High fidelity—as high as is practical at any rate—is an absolute necessity for speech-recognition applications, but the lack of extraneous noise is equally important.
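To make the cancellation idea concrete, here is a toy sketch. It is not Sony's algorithm, which I know nothing about beyond what's on the box. It assumes an idealized case where the dictated speech reaches two channels in phase and the unwanted sound reaches them in exactly opposite phase, so a simple average keeps one and removes the other. The function names and the test signals are made up for illustration.

```python
import math


def merge_to_mono(left, right):
    """Average two channels sample by sample.

    Sound that is in phase on both channels survives the average;
    sound that is in opposite phase on the two channels cancels.
    """
    return [(l + r) / 2 for l, r in zip(left, right)]


def demo(n=64):
    """Build an idealized stereo signal and return (speech, merged mono)."""
    # Speech: identical on both channels (close, on-axis source).
    speech = [math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
    # Noise: opposite sign on each channel (assumed out-of-phase, off-axis source).
    noise = [0.5 * math.sin(2 * math.pi * 7 * i / n) for i in range(n)]
    left = [s + x for s, x in zip(speech, noise)]
    right = [s - x for s, x in zip(speech, noise)]
    return speech, merge_to_mono(left, right)


if __name__ == "__main__":
    speech, mono = demo()
    worst = max(abs(a - b) for a, b in zip(speech, mono))
    print(f"largest difference between speech and merged output: {worst:.2e}")
```

Real rooms are never this tidy; off-axis sound is only partly out of phase, so a real recorder can reduce it but not erase it.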

I mentioned, previously, that I'm using a voice-operated record mode during dictation. I know at least one or two of you out there are experiencing some doubt about my sanity, because historically, voice-operated recording schemes have always resulted in the first syllable of a word—or some part of it anyway—being chopped off due to the delay. But in this case there's a buffer that prevents this from happening; nothing is lost because the circuitry is awake and listening, even when the display indicates that recording has momentarily ceased.
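For the curious, the buffering trick can be sketched like this. It's a minimal illustration of a voice-operated recorder with a "pre-roll" buffer, not the MX20's actual circuitry. The threshold, the buffer length, and the hang time are all values I picked for the example.

```python
from collections import deque


def vox_record(samples, threshold=0.1, pre_roll=2, hang=2):
    """Voice-operated recording with a pre-roll buffer.

    samples   : iterable of amplitude values
    threshold : level at or above which a sample counts as speech (assumed)
    pre_roll  : number of quiet samples kept from just before speech starts
    hang      : quiet samples tolerated before recording pauses again
    """
    buffer = deque(maxlen=pre_roll)  # keeps listening even while "paused"
    recorded = []
    recording = False
    quiet_run = 0
    for s in samples:
        loud = abs(s) >= threshold
        if recording:
            recorded.append(s)
            quiet_run = 0 if loud else quiet_run + 1
            if quiet_run > hang:
                recording = False  # pause, but the buffer below keeps filling
        elif loud:
            recorded.extend(buffer)  # flush what was heard just before the trigger
            buffer.clear()
            recorded.append(s)
            recording = True
            quiet_run = 0
        else:
            buffer.append(s)
    return recorded


if __name__ == "__main__":
    signal = [0.0, 0.01, 0.02, 0.05, 0.5, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0]
    print("with pre-roll:   ", vox_record(signal))
    print("without pre-roll:", vox_record(signal, pre_roll=0))
```

With the buffer in place, the soft onset of the word (the 0.02 and 0.05 samples) makes it into the recording; with `pre_roll=0` the recording starts at the first loud sample, which is the chopped-syllable problem of older voice-operated recorders.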

If less is more, I'm already on the wrong side of the equation. This will be continued tomorrow.

