Hi @Jose Ruiz,
I fail to see how dictionary management would help in any meaningful way with the stated objective of this thread, which is accurately collecting an email address from a caller. At present, I cannot do this reliably with any of the Genesys STT engines or with our own Azure STT integration. I have tried both regular bot flows and an exorbitantly expensive virtual agent flow with a fully described AI slot, as noted in my original post.
I feel like you either skimmed my post or just blindly replied to my "STT isn't working well" sentiment, without looking at the specific context. This has literally nothing to do with Virtual Supervisor/Supervisor Copilot.
As an aside, dictionary management, at present, is a painfully manual, slow process, so we're not really leveraging it. Feel free to prioritize this idea if you want to help get adoption of it. I opened it in March, and it has 32 upvotes at this point. https://genesyscloud.ideas.aha.io/ideas/DARSTA-I-340
------------------------------
Paul McGurn
Senior Manager, Telecom & DevOps
Persistent Systems
------------------------------
Original Message:
Sent: 05-29-2025 20:21
From: Jose Ruiz
Subject: Voice virtual agent to collect email address - impossible?
Hello @Paul McGurn
Thank you for the thoughtful and detailed feedback - you've highlighted a crucial aspect of delivering high-performing AI-driven features like Virtual Supervisor and Supervisor Copilot: high-quality transcription. We completely agree that accurate speech-to-text output is foundational to ensuring reliable interaction scoring, sentiment analysis, real-time translation, and AI-generated summaries.
To address the specific challenges you've raised, I'd like to share more about the Custom Dictionary feature, which is designed to improve transcription accuracy by tuning the model to recognize key business-specific terms - including organizational names, product terminology, and industry jargon.
With Custom Dictionary, you can:
Add custom phrases (e.g., "Thank you for calling Presbyterian") to help the model understand and prioritize relevant business language.
Improve recognition of difficult or misheard terms by adding "sounds like" alternatives. This is particularly helpful when commonly misrecognized phrases yield similar-sounding but incorrect words, as in your example.
We recommend starting with boost-only entries for important terms and then iteratively enhancing them with "sounds like" variants for any recurring misrecognitions - as long as they don't conflict with valid words in the language.
This feature effectively gives you a way to embed business context directly into the transcription model, improving accuracy over time, especially in areas most important to your organization.
I'd be happy to connect directly to review your current transcription challenges and provide tailored recommendations on Custom Dictionary usage and other tuning options. Together, we can ensure that transcription-dependent features are delivering maximum value and reliability for your team.
Please don't hesitate to reach out!
------------------------------
Jose Ruiz
Genesys - Employees
Product Manager
jose.ruiz@genesys.com
Original Message:
Sent: 05-19-2025 12:54
From: Paul McGurn
Subject: Voice virtual agent to collect email address - impossible?
I'm trying to use a virtual agent as a proof of concept for collecting an email address in a voice flow. I'm finding that the STT part is garbling this so badly that it's literally never accurate. Speaking naturally = no joy. Speaking characters phonetically is slightly better, but still fails more than it succeeds. I'm using this for the Description (assuming this is pushed to the RAG prompt):
The email address of the caller. It should be a well-formed email address. Callers may speak it naturally, or phonetically. You should detect phonetic speech and convert that to individual characters.
dot means to insert a period (.) into the text
dash means to insert a dash (-) into the text
hyphen means to insert a dash (-) into the text
underscore means to insert an underscore (_) into the text
Do not insert a period (.) when the caller pauses. Only use one if they explicitly speak one.
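For illustration, the same rules could also be applied deterministically to the raw transcript, outside the slot prompt. This is only a rough Python sketch, not anything Genesys provides: the function name, the NATO-style phonetic table, the "at" to "@" mapping, and the simplified email pattern are all assumptions added for the example, and it assumes the STT output arrives as plain text.

```python
import re
from typing import Optional

# Spoken symbol words from the slot description.
# "at" -> "@" is an added assumption, not part of the original description.
SPOKEN_SYMBOLS = {"dot": ".", "dash": "-", "hyphen": "-", "underscore": "_", "at": "@"}

# Assumption: callers who spell phonetically use NATO-style words.
PHONETIC = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliet": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x", "yankee": "y",
    "zulu": "z",
}

# Deliberately simplified well-formedness check, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[a-z0-9._-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$")


def normalize_spoken_email(transcript: str) -> Optional[str]:
    """Turn a spoken-email transcript into an address, or None if not well-formed."""
    text = transcript.lower().replace("x-ray", "xray")
    parts = []
    # Only letters and digits are kept as tokens, so punctuation that the STT
    # engine inserted for pauses never becomes a period in the result.
    for token in re.findall(r"[a-z0-9]+", text):
        if token in SPOKEN_SYMBOLS:
            parts.append(SPOKEN_SYMBOLS[token])
        elif token in PHONETIC:
            parts.append(PHONETIC[token])
        else:
            parts.append(token)
    candidate = "".join(parts)
    return candidate if EMAIL_RE.match(candidate) else None
```

A known limitation of this sketch: a real word that happens to be in the phonetic table (for example "delta" in a company name) is collapsed to a single letter, so it does nothing to fix a transcript that is already garbled.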
I'm assuming that most of my issues are due to STT being inadequate for this use case, so it's a "garbage in, garbage out" situation. My question is, is anyone actually doing this successfully without resorting to off-platform bots?
Even if the detection/transcription is accurate, the continued lack of SSML support means that read-back is either way too fast, or one....letter...at...a...time...very...slowly.....
#Architect
------------------------------
Paul McGurn
Senior Manager, Telecom & DevOps
Persistent Systems
------------------------------