Computers are generally imperfect recognizers of human speech. Voice recognition can work very well under good, controlled conditions, but under less favorable conditions recordings may be preferable, which is why the option to record exists. A few examples:
- Tasks with only a small number of response options (such as a typical, simple Stroop paradigm) are more amenable to voice recognition than tasks with a larger number of response options, particularly similar-sounding ones ("rat" and "red" are easier for a computer to confuse than for a human listener).
- Voice recognition engines are not available for all languages, and recognition capabilities for some languages are more mature than for others.
- Environmental and speaker factors can adversely affect recognition accuracy and latency determination (e.g. noisy surroundings, a poor microphone or poor microphone placement, speaking in a low or hushed voice, coughing).
- Voice recognition engines adapt to a given speaker over time: the more an engine has been trained on an individual speaker, the better the results. The less it has been trained, the more likely automatic recognition errors become.
In short, under some circumstances it may be preferable to record the audio and then determine responses and (re-)measure latencies after the fact from the recordings.
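If you go the recording route, latencies can be re-measured offline. Below is a minimal sketch in Python of one common approach, a simple energy-threshold voice onset detector. It is not any particular tool's method. It assumes 16-bit mono WAV files whose recording starts at stimulus onset, so the detected onset time equals the response latency. The names `voice_onset_ms` and `read_mono_wav` and the default `window_ms`/`threshold_ratio` values are hypothetical choices for illustration.

```python
import array
import math
import sys
import wave


def voice_onset_ms(samples, sample_rate, window_ms=10, threshold_ratio=0.1):
    """Return the start time (ms) of the first window whose RMS exceeds a threshold.

    Hypothetical energy-threshold detector: the threshold is a fixed fraction
    of the loudest window's RMS. Returns None if the signal is silent.
    Assumes sample 0 coincides with stimulus onset.
    """
    window = max(1, int(sample_rate * window_ms / 1000))
    rms_values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms_values.append(math.sqrt(sum(s * s for s in chunk) / window))
    if not rms_values or max(rms_values) == 0:
        return None
    threshold = threshold_ratio * max(rms_values)
    for i, rms in enumerate(rms_values):
        if rms > threshold:
            return i * window * 1000 / sample_rate
    return None


def read_mono_wav(path):
    """Read a 16-bit mono WAV file into a list of ints plus its sample rate."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expects a 16-bit mono WAV file")
        data = array.array("h", wf.readframes(wf.getnframes()))
        if sys.byteorder == "big":
            data.byteswap()  # WAV samples are stored little-endian
        return list(data), wf.getframerate()
```

Usage would be along the lines of `samples, rate = read_mono_wav("trial_01.wav")` followed by `voice_onset_ms(samples, rate)` (the file name is hypothetical). A fixed relative threshold is crude. Coughs or background noise will trigger it just as speech would, so it's wise to check such automatically derived latencies against the recordings by ear or by inspecting the waveform.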
Hope this clarifies.