Voice virtual agent to collect email address - impossible? | Genesys Cloud

1. Voice virtual agent to collect email address - impossible?

Paul McGurn

100 Posts
Posted 05-19-2025 12:55

I'm trying to use a virtual agent as a proof of concept for collecting an email address in a voice flow. I'm finding that the STT part garbles it so badly that it's literally never accurate. Speaking naturally = no joy. Speaking characters phonetically is slightly better, but still fails more often than it succeeds. I'm using this for the Description (assuming this is pushed to the RAG prompt):

The email address of the caller. It should be a well-formed email address. Callers may speak it naturally, or phonetically. You should detect phonetic speech and convert that to individual characters.
dot means to insert a period (.) into the text
dash means to insert a dash (-) into the text
hyphen means to insert a dash (-) into the text
underscore means to insert an underscore (_) into the text
Do not insert a period (.) when the caller pauses. Only use one if they explicitly speak one.
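For comparison, the rules in that description can be written down as a deterministic post-processing step over the raw transcript. This is only a rough Python sketch under several assumptions that are not part of the original post: the function names are hypothetical, "at" is mapped to @, and "phonetic" is taken to mean the NATO spelling alphabet. It does nothing about the STT errors themselves.

```python
import re

# Symbol words from the slot description, plus "at" (an assumption: the
# description does not mention it, but an address needs one).
SYMBOLS = {"dot": ".", "dash": "-", "hyphen": "-", "underscore": "_", "at": "@"}

# NATO spelling alphabet, assumed here to be the "phonetic" style callers use.
NATO = {
    "alpha": "a", "alfa": "a", "bravo": "b", "charlie": "c", "delta": "d",
    "echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i",
    "juliet": "j", "juliett": "j", "kilo": "k", "lima": "l", "mike": "m",
    "november": "n", "oscar": "o", "papa": "p", "quebec": "q", "romeo": "r",
    "sierra": "s", "tango": "t", "uniform": "u", "victor": "v",
    "whiskey": "w", "xray": "x", "yankee": "y", "zulu": "z",
}

DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Deliberately simple shape check, not a full address validator.
EMAIL_RE = re.compile(r"^[a-z0-9._-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$")


def normalize_spoken_email(transcript: str) -> str:
    """Turn a spoken-style transcript into a candidate email address."""
    text = transcript.lower().replace("x-ray", "xray")
    # Keep only alphanumeric words. This drops punctuation the STT engine
    # added by itself (for example a period after a pause), so a "." only
    # appears when the caller actually said "dot".
    words = re.findall(r"[a-z0-9]+", text)
    parts = []
    for word in words:
        parts.append(SYMBOLS.get(word) or NATO.get(word) or DIGITS.get(word) or word)
    return "".join(parts)


def is_well_formed(email: str) -> bool:
    """Return True if the candidate has a plausible local@domain.tld shape."""
    return EMAIL_RE.match(email) is not None


candidate = normalize_spoken_email("Papa mike. Underscore 1 at example dot com")
print(candidate, is_well_formed(candidate))
```

One obvious weakness of this sketch: words such as "mike" or "golf" are always treated as spelled letters, so a caller who really is mike@example.com would come out as m@example.com. That ambiguity is part of why this is hard to get right.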

I'm assuming that most of my issues are due to STT being inadequate for this use case, so it's a "garbage in, garbage out" situation. My question is, is anyone actually doing this successfully without resorting to off-platform bots?

Even if the detection/transcription is accurate, the continued lack of SSML support means that read-back is either way too fast, or one....letter...at...a...time...very...slowly.....

#Architect

------------------------------
Paul McGurn
Senior Manager, Telecom & DevOps
Persistent Systems
------------------------------
2. RE: Voice virtual agent to collect email address - impossible?

Simon Brown

Partner
Posted 05-21-2025 12:16

In relation to SSML, there is some support. It may be TTS engine specific, though.

I have used it to say a slot slowly (this works better than the built-in per-character data type).

MakeCommunication(
  "You said ",
  ToCommunicationSsml(
    Append(
      "<prosody rate=\"x-slow\">",
      Slot.AccountNumber,
      "</prosody>")
  ),
  "Is that correct?"
)

Another example, this time per character:

MakeCommunication(
  ToCommunicationSsml(
    Append(
      "<prosody rate=\"x-slow\">",
      "<say-as interpret-as=\"characters\">",
      Slot.PostCodeSlot,
      "</say-as>",
      "</prosody>")
  ),
  "Is that correct?"
)

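For the email case in this thread, the same markup could be assembled per segment, spelling out the part before the @ and reading the domain as ordinary words. The following is only an illustrative Python sketch of the string being built, for example to preview or unit test the markup off-platform. The function name `email_readback_ssml` is hypothetical, and whether a given TTS engine honours `say-as` with `interpret-as="characters"` is an assumption to verify.

```python
from xml.sax.saxutils import escape


def email_readback_ssml(email: str, rate: str = "x-slow") -> str:
    """Build an SSML fragment that spells the local part and slows the whole read-back."""
    local, sep, domain = email.partition("@")
    if not sep:
        raise ValueError("expected an address containing '@'")
    # Escape the slot values so characters such as & or < cannot break the markup.
    spelled = f'<say-as interpret-as="characters">{escape(local)}</say-as>'
    return f'<prosody rate="{rate}">{spelled} at {escape(domain)}</prosody>'


print(email_readback_ssml("pm_1@example.com"))
```

In Architect itself, the equivalent would be an Append over the same literal tags and slot values, as in the expressions above.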
As for the bot, I have just started using VA myself and, at least at present, it seems more talk and hype than actually good from an STT standpoint.

------------------------------
Simon Brown
Maintel Europe Limited
Senior Applications Consultant
------------------------------
3. RE: Voice virtual agent to collect email address - impossible?

Jose Ruiz

Genesys
Posted 05-29-2025 20:22

Hello @Paul McGurn

Thank you for the thoughtful and detailed feedback - you've highlighted a crucial aspect of delivering high-performing AI-driven features like Virtual Supervisor and Supervisor Copilot: high-quality transcription. We completely agree that accurate speech-to-text output is foundational to ensuring reliable interaction scoring, sentiment analysis, real-time translation, and AI-generated summaries.

To address the specific challenges you've raised, I'd like to share more about the Custom Dictionary feature, which is designed to improve transcription accuracy by tuning the model to recognize key business-specific terms - including organizational names, product terminology, and industry jargon.

With Custom Dictionary, you can:

Add custom phrases (e.g., "Thank you for calling Presbyterian") to help the model understand and prioritize relevant business language.

Improve recognition of difficult or misheard terms by adding "sounds like" alternatives. This is particularly helpful when commonly misrecognized phrases yield similar-sounding but incorrect words, as in your example.

We recommend starting with boost-only entries for important terms and then iteratively enhancing them with "sounds like" variants for any recurring misrecognitions - as long as they don't conflict with valid words in the language.

This feature effectively gives you a way to embed business context directly into the transcription model, improving accuracy over time, especially in areas most important to your organization.

I'd be happy to connect directly to review your current transcription challenges and provide tailored recommendations on Custom Dictionary usage and other tuning options. Together, we can ensure that transcription-dependent features are delivering maximum value and reliability for your team.

Please don't hesitate to reach out!

------------------------------
Jose Ruiz
Genesys - Employees
Product Manager
jose.ruiz@genesys.com
------------------------------
4. RE: Voice virtual agent to collect email address - impossible?

Paul McGurn

100 Posts
Posted 05-30-2025 11:07

Hi Simon, thank you for the insight on this. The SSML approach would definitely improve the read-back, and I had not thought of that. It doesn't cover the STT collection accuracy, though, so I think I'm still stuck here.

------------------------------
Paul McGurn
Senior Manager, Telecom & DevOps
Persistent Systems
------------------------------

5. RE: Voice virtual agent to collect email address - impossible?

Paul McGurn

100 Posts
Posted 05-30-2025 11:08

Hi @Jose Ruiz,

I fail to see how dictionary management would help in any meaningful way with regard to the stated objective of this thread. That objective is accurately collecting an email address from a caller. At present, I cannot do this in a reliable way with any of the Genesys STT engines or our own integration with Azure STT. I have tried both regular bot flows and an exorbitantly expensive virtual agent flow with a fully-described AI slot, as noted in my original post.

I feel like you either skimmed my post or just blindly replied to my "STT isn't working well" sentiment, without looking at the specific context. This has literally nothing to do with Virtual Supervisor/Supervisor Copilot.

As an aside, dictionary management, at present, is a painfully manual, slow process, so we're not really leveraging it. Feel free to prioritize this idea if you want to help get adoption of it. I opened it in March, and it has 32 upvotes at this point. https://genesyscloud.ideas.aha.io/ideas/DARSTA-I-340

------------------------------
Paul McGurn
Senior Manager, Telecom & DevOps
Persistent Systems
------------------------------
