Study on the Integration of Speech Recognition with an Operating System:

Phase 1 Report

By Daniel Wilson

Phase 1 of this project deals with the speech recognition (SR) itself: the process of turning sound waves into ASCII strings.

Because integrating a system with a non-open-source OS is difficult, I have chosen to develop this project under Linux. This decision has slowed my progress in phase 1, because the SR software developed for Linux is still experimental. The best choice for SR development under Linux appears to be IBM’s ViaVoice SDK for Linux. The beta release of the ViaVoice SDK is currently available over the Internet and appears to be fairly powerful.

ViaVoice has a few quirks, however, that made its learning curve rather steep. The user administration functions were hard to understand, and they kept me from running the demos for most of two weeks. I now have the demos working and expect that ViaVoice technology will serve well in this project.

I will, however, be moving on to phase 2, the natural language processing phase, without actually receiving input by voice. This will not seriously hinder the project, because the main challenge is processing simple English once it is in text form. At a later date, I hope to have time to go back and get the SR unit working as well, but that is not a priority.
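
One way to keep phase 2 independent of the recognizer is to have the language-processing code accept plain text from any source, so that typed input can stand in for voice now and an SR unit can be plugged in later. The following is only a minimal sketch of that idea in Python, not the project's actual design: the names UtteranceSource, typed_source, tokenize, and run are hypothetical, and the tokenizer is a placeholder for the real phase 2 processing. No ViaVoice calls are shown.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical type: any callable that yields utterances as text strings.
# A future recognizer wrapper would only need to match this shape.
UtteranceSource = Callable[[], Iterable[str]]


def typed_source(lines: Iterable[str]) -> UtteranceSource:
    """Stand-in for the SR unit: yields typed text, skipping blank lines."""
    def source() -> Iterator[str]:
        for line in lines:
            text = line.strip()
            if text:
                yield text
    return source


def tokenize(utterance: str) -> List[str]:
    """Placeholder for phase 2 processing: lowercase, split on whitespace."""
    return utterance.lower().split()


def run(source: UtteranceSource) -> List[List[str]]:
    """Feed every utterance from the source through the text processing."""
    return [tokenize(utterance) for utterance in source()]
```

Under these assumptions, calling run(typed_source(...)) with a list of typed lines, or with an open text file, exercises the text processing without any audio, and the recognizer can later be supplied as another source of the same shape.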