Millisecond Forums

Voice Recognition and voice recording on the Stroop Task

By jgranadossamayoa - 8/3/2015

Is there a way for Inquisit to recognize verbal responses and simultaneously save an audio file containing those responses? I know there are scripts posted in the library that do one or the other, but I would like to combine both in one script. Any input would be appreciated.
By Dave - 10/23/2017

luis - Saturday, October 21, 2017
Dave - Friday, October 20, 2017
luis - Friday, October 20, 2017
seandr - Monday, October 10, 2016
Just a quick follow-up: Inquisit 5 introduces a new soundcapture command that allows you to record audio for the entire duration of a trial, block, or experiment.
For example, you can record a participant speaking for a fixed duration of time, or until the participant clicks a continue button. Or, you could record audio for the entire length of an experiment. 
This feature is currently supported in Inquisit 5 Lab only. Support for uploading these and other media data with Inquisit Web will be added in a future release.

Why do you need to record responses? I am running Stroop tasks using speech recognition, and I've found that the engine is perfectly able to recognize and encode the responses, compute RTs, and give appropriate feedback. Incidentally, I am curious whether the precision of these voice-relayed RTs is comparable to that of keyboard responses... Does anyone know?



Computers are generally imperfect recognizers of human speech. Voice recognition works very well under good, controlled conditions, but for anything else having recordings may be preferable, which is why the option to record exists. A few examples:
- Tasks with only a limited number of response options (like a typical, simple Stroop paradigm) are more amenable to voice recognition than tasks involving a larger number of response options, particularly similar-sounding ones ("rat" and "red" are easier for a computer to confuse than for a human listener).
- Voice recognition engines are not available for all languages, and recognition capabilities for some languages are more mature than for others.
- Environmental and speaker factors can adversely affect recognition accuracy and latency determination (noisy surroundings, bad microphone, bad microphone placement, speaking in a low or hushed voice, coughing).
- Voice recognition engines adapt to a given speaker over time: the more the engine has been trained on an individual speaker, the better the results will be; the less it has been trained, the likelier mistakes in automatic recognition are.

In short, under some circumstances it may be preferable to record the audio, then determine responses and (re-)measure latencies after the fact from the recordings.
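To illustrate the idea of (re-)measuring latency from a recording: a rough approach is to scan the waveform for the first window whose energy rises above a noise threshold and take that window's start time as the voice onset. The sketch below is a minimal Python illustration (not Inquisit code); the function name, the 10 ms window, and the RMS threshold are all illustrative assumptions, and real pipelines typically use more robust energy- or model-based onset detection.

```python
def voice_onset_ms(samples, sample_rate, window_ms=10, rms_threshold=500):
    """Estimate voice onset time in ms from raw signed PCM samples.

    Scans the signal in fixed non-overlapping windows and returns the
    start time of the first window whose RMS amplitude exceeds
    `rms_threshold`, or None if the signal never gets that loud.
    Windowed RMS (rather than a single-sample threshold) avoids
    triggering on isolated clicks or zero-crossings.
    """
    win = max(1, int(sample_rate * window_ms / 1000))
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        rms = (sum(s * s for s in chunk) / win) ** 0.5
        if rms >= rms_threshold:
            return start * 1000.0 / sample_rate
    return None

if __name__ == "__main__":
    import math
    rate = 8000
    # Synthetic recording: 200 ms of silence, then a loud 440 Hz tone.
    silence = [0] * int(0.2 * rate)
    tone = [int(20000 * math.sin(2 * math.pi * 440 * t / rate))
            for t in range(int(0.3 * rate))]
    print(voice_onset_ms(silence + tone, rate))  # -> 200.0
```

In practice one would read the samples from the saved audio file (e.g. with Python's wave module for WAV data) and calibrate the threshold against the recording environment's noise floor.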

Hope this clarifies.

Thank you Dave,
I just wanted to use this thread to ask whether you have information about the timing accuracy of the voice-relayed RTs. I have seen that, in some result files, RTs start out fine over the very first trials, but then, after a few trials, they adopt what seems to me like a suspicious regularity (for instance, all RTs ending in 0, like 540, 630, 440, 490, ...). I guess this must reflect a problem in how the system detects latency, but I wonder if you might have any idea of why this occurs, why it only occurs "sometimes", and whether it could be remedied.
Many thanks

The engine will adapt to the current speaker over the course of the experiment and generally get better at recognizing her/his responses; that may account for greater variability over the first few trials. This, however, should not lead to a suspicious regularity pattern in later trials -- I'm afraid I have no immediate idea where that comes from.