Making a voice-controlled character: Part 1 - implementing the speech recogniser


The keyword recogniser was initially implemented with Unity's built-in KeywordRecognizer class (from the UnityEngine.Windows.Speech namespace). However, the inability to select a microphone and a general lack of documentation warranted a switch to a third-party service, and the Azure Cognitive Speech Service was chosen for this purpose.

Setting up the service itself only involved creating an Azure account and starting a free trial. Implementation within Unity proved more difficult, due to Unity's incompatibility with NuGet and the need for the Microsoft.CognitiveServices namespace. The code that used this namespace therefore had to live in a separate, non-Unity project, which was then built and its DLLs copied into the Unity project so that Unity could access them. This multi-project setup required the git repository to be restructured, with the Unity directory moved into a sub-folder of the root directory.
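
The "build, then copy the DLLs into Unity" step is easy to forget when done by hand, so it may be worth scripting. The following is only a minimal sketch in Python, not the script used in this project; the function name copy_dlls and the folder layout in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_dlls(build_dir: Path, plugins_dir: Path) -> list[str]:
    """Copy every .dll in build_dir into plugins_dir and return the copied file names."""
    # Create the destination folder (and any missing parents) if needed.
    plugins_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(build_dir.glob("*.dll")):
        # copy2 also preserves timestamps; an existing file of the same name is overwritten.
        shutil.copy2(dll, plugins_dir / dll.name)
        copied.append(dll.name)
    return copied
```

Assuming a layout such as SpeechService/bin/Release for the build output and UnityProject/Assets/Plugins for the destination (both hypothetical), it would be called as copy_dlls(Path("SpeechService/bin/Release"), Path("UnityProject/Assets/Plugins")) after each build of the non-Unity project.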

Although these changes provided a functioning speech-recognition service, the initial goal of letting users change their microphone input remained elusive, because native audio device IDs could not be accessed from within Unity (Unity simply returned a device name, and not necessarily the name used natively). Multiple NuGet packages that would supposedly provide native audio device access were installed and unsuccessfully paired with the existing speech service code. Ultimately, the solution was to delete the working speech service code and replace it with another project built on the NuGet package CSCore, which provided both native audio access and replacement speech code that overlapped heavily with the original Microsoft Cognitive Services package.
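
To illustrate why a device name alone is a fragile key, here is a small Python sketch of the kind of name-to-ID lookup that would be needed to bridge the two. It is not the approach this project ended up using (that was the switch to CSCore), and the function name, the shape of the device table, and the IDs in the usage note are all made up for the example.

```python
from typing import Optional


def find_native_device_id(unity_name: str, native_devices: dict[str, str]) -> Optional[str]:
    """Map a device name reported by the engine to a native device ID.

    native_devices maps native device ID -> native friendly name.
    Returns None when there is no match or the match is ambiguous.
    """
    wanted = unity_name.strip().lower()
    if not wanted:
        return None

    # Drop devices with empty names, and normalise the rest once.
    names = {dev_id: name.strip().lower()
             for dev_id, name in native_devices.items() if name.strip()}

    # 1. Prefer an exact (case-insensitive) name match.
    for dev_id, name in names.items():
        if name == wanted:
            return dev_id

    # 2. Otherwise accept a substring match, but only if exactly one device fits.
    partial = [dev_id for dev_id, name in names.items()
               if wanted in name or name in wanted]
    return partial[0] if len(partial) == 1 else None
```

With a hypothetical table such as {"id-a": "Microphone (USB Audio Device)", "id-b": "Headset Microphone (Realtek Audio)"}, the name "USB Audio Device" resolves to "id-a", but the name "Microphone" fits both entries and the lookup gives up. That ambiguity is why direct access to native device IDs was preferable to guessing from names.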

One questionable practice in this project was committing the compiled DLLs to the git repository. This was done because half the team was composed of artists, who weren't expected to build the DLLs locally before they could run the Unity project.

Overall, the new system made it possible to change microphones and came with more documentation, albeit at the cost of relying on a premium service, requiring an internet connection, and adding latency due to server-side processing.
