Retrieved Audio Quality
Judges generally rated the audio quality of the retrieved sound effects highly, with 38\% of them receiving the top rating of five out of five on the audio quality scale (Fig. \ref{318185}). Judges usually rated audio quality as high even when the generated sound effect did not match the given conditioning text. This suggests that our generator can still produce plausible sound effects from the unconditional distribution, even when the generator's conditional distribution has not converged to the true conditional distribution.