Making a voice-controlled character: Part 1 - implementing the speech recogniser

3 years ago by Unsighted

The keyword recogniser was initially built on Unity's built-in KeywordRecognizer class (in the UnityEngine.Windows.Speech namespace). However, the inability to select a microphone and a general lack of documentation warranted a move to a third-party service, and the Azure Cognitive Services Speech service was chosen for this purpose.
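For context, the built-in approach looks roughly like the sketch below. It is a minimal illustration, not the project's actual code: the class name `KeywordListener` and the keyword list are hypothetical. Note that nothing in it takes a device argument; the recogniser simply listens on the Windows default microphone.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical component: listens for a fixed set of keywords on the
// system default microphone (Windows only).
public class KeywordListener : MonoBehaviour
{
    // Illustrative keywords, not the game's real command set.
    [SerializeField] private string[] keywords = { "left", "right", "stop" };

    private KeywordRecognizer recognizer;

    private void OnEnable()
    {
        recognizer = new KeywordRecognizer(keywords);
        recognizer.OnPhraseRecognized += HandlePhrase;
        recognizer.Start();
    }

    private void HandlePhrase(PhraseRecognizedEventArgs args)
    {
        // args.text is the matched keyword; args.confidence is a coarse level.
        Debug.Log($"Heard '{args.text}' (confidence: {args.confidence})");
    }

    private void OnDisable()
    {
        if (recognizer == null) return;

        recognizer.OnPhraseRecognized -= HandlePhrase;
        if (recognizer.IsRunning) recognizer.Stop();
        recognizer.Dispose();
        recognizer = null;
    }
}
```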

Setup involved creating an Azure account and opening a free trial to set up the service. Implementation within Unity proved more difficult, however, because Unity is incompatible with NuGet and the service required the Microsoft.CognitiveServices namespace. The code that used this namespace therefore had to live in a separate, non-Unity project, which was built and had its DLLs copied into Unity so that the game scripts could reference them. This multi-project setup required the git repository to be restructured, with the Unity directory moved into a sub-folder of the root directory.

Although these changes provided a functioning speech-recognition service, the initial goal of letting users change their microphone input remained elusive, because native audio device IDs could not be accessed from within Unity (Unity simply returned a device name, and not necessarily the name used natively). Multiple NuGet packages that would supposedly provide native audio device access were installed and unsuccessfully paired with the existing speech service code. Ultimately, the solution was to delete the working speech service code and replace it with another project built on the NuGet package CSCore, which provided both native audio access and replacement speech code that overlapped heavily with the original Microsoft Cognitive Services package.
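To illustrate the device ID problem, here is a hedged sketch of how native capture devices can be listed with CSCore and how a chosen ID can be handed to the Azure Speech SDK. This is an assumption about how the pieces could fit together in the non-Unity project, not the code that shipped: the class `MicrophoneSpeech` and its method names are hypothetical, and the key, region and device ID are placeholders supplied by the caller.

```csharp
using System;
using System.Threading.Tasks;
using CSCore.CoreAudioAPI;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper living in the separate (non-Unity) project.
public static class MicrophoneSpeech
{
    // Prints each active capture device's friendly name next to its native
    // endpoint ID, which is the value Unity does not expose.
    public static void PrintCaptureDevices()
    {
        using (var enumerator = new MMDeviceEnumerator())
        {
            var devices = enumerator.EnumAudioEndpoints(DataFlow.Capture, DeviceState.Active);
            foreach (var device in devices)
            {
                Console.WriteLine($"{device.FriendlyName} -> {device.DeviceID}");
            }
        }
    }

    // Runs a single recognition pass on the microphone identified by deviceId.
    // Returns the recognised text, or null if no speech was recognised.
    public static async Task<string> RecognizeOnceAsync(string key, string region, string deviceId)
    {
        var speechConfig = SpeechConfig.FromSubscription(key, region);

        using (var audioConfig = AudioConfig.FromMicrophoneInput(deviceId))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            var result = await recognizer.RecognizeOnceAsync();
            return result.Reason == ResultReason.RecognizedSpeech ? result.Text : null;
        }
    }
}
```

The friendly name is what a settings menu would show the player, while the endpoint ID is what gets passed to the recogniser.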

One questionable practice in this project was committing the compiled DLLs to the git repository. This was done because half the team was composed of artists, who weren't expected to build the DLLs locally before they could run the Unity project.

Ultimately, the new system provided the ability to change microphones while offering more documentation, albeit at the cost of relying on a premium service, requiring an internet connection, and adding latency due to server-side processing.

Unsighted Agent

A voice-controlled heist game.

Status	In development
Author	Unsighted
Tags	Heist, voice-controlled
Languages	English
