Genesys Cloud - Developer Community!


  • 1.  Voice virtual agent to collect email address - impossible?

    Posted 05-19-2025 12:55

    I'm trying to use a virtual agent as a proof of concept for collecting an email address in a voice flow.  I'm finding that the STT part is garbling this so badly that it's literally never accurate.  Speaking naturally = no joy.  Speaking characters phonetically is slightly better, but still fails more often than it succeeds.  I'm using this for the Description (assuming this is pushed to the RAG prompt):

    The email address of the caller.  It should be a well-formed email address.  Callers may speak it naturally, or phonetically.  You should detect phonetic speech and convert that to individual characters.
    
    dot means to insert a period (.) into the text
    dash means to insert a dash (-) into the text
    hyphen means to insert a dash (-) into the text
    underscore means to insert an underscore (_) into the text
    
    Do not insert a period (.) when the caller pauses.  Only use one if they explicitly speak one.

    I'm assuming that most of my issues are due to STT being inadequate for this use case, so it's a "garbage in, garbage out" situation.  My question is, is anyone actually doing this successfully without resorting to off-platform bots?

    Even if the detection/transcription is accurate, the continued lack of SSML support means that read-back is either way too fast, or one....letter...at...a...time...very...slowly.....
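
The symbol rules in the Description above could also be applied deterministically, as a post-processing step on the raw transcript, rather than relying on the slot prompt alone. The following is only a minimal sketch under several assumptions: the transcript arrives as plain words, "at" stands for @ (not stated in the original description), phonetic speech means the NATO alphabet, and the function name `normalize_spoken_email` is hypothetical. It does not fix STT errors; it only removes one source of variation.

```python
import re
from typing import Optional

# Spoken words that stand for symbols. "dot", "dash", "hyphen" and
# "underscore" come from the slot description; "at" is an assumption.
SYMBOLS = {"dot": ".", "dash": "-", "hyphen": "-", "underscore": "_", "at": "@"}

# NATO phonetic alphabet (assumed to be what callers use when spelling).
NATO = {
    "alfa": "a", "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
    "echo": "e", "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i",
    "juliett": "j", "juliet": "j", "kilo": "k", "lima": "l", "mike": "m",
    "november": "n", "oscar": "o", "papa": "p", "quebec": "q", "romeo": "r",
    "sierra": "s", "tango": "t", "uniform": "u", "victor": "v",
    "whiskey": "w", "xray": "x", "x-ray": "x", "yankee": "y", "zulu": "z",
}

# Deliberately simple shape check, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[a-z0-9._-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+")


def normalize_spoken_email(transcript: str) -> Optional[str]:
    """Turn a spoken-email transcript into an address, or None if malformed."""
    # Keep only word-like tokens; punctuation the STT engine inserted for
    # pauses (".", ",") is dropped, so only a spoken "dot" yields a period.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", transcript.lower())
    chars = [SYMBOLS.get(t) or NATO.get(t) or t for t in tokens]
    candidate = "".join(chars)
    return candidate if EMAIL_RE.fullmatch(candidate) else None


print(normalize_spoken_email("papa alfa uniform lima dot mike at example dot com"))
```

One known limitation of this sketch: a word such as "mike" or "victor" is always treated as a phonetic letter, so a caller who says it as a name would be shortened to a single character.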


    #Architect

    ------------------------------
    Paul McGurn
    Senior Manager, Telecom & DevOps
    Persistent Systems
    ------------------------------


  • 2.  RE: Voice virtual agent to collect email address - impossible?

    Posted 05-21-2025 12:16

    In relation to SSML, there is some support. It may be TTS engine specific, though.

    I have used it to say a slot slowly (better than the built-in per-character data type).

    MakeCommunication(
      "You said ",
      ToCommunicationSsml(
        Append(
          "<prosody rate=\"x-slow\">",
          Slot.AccountNumber,
          "</prosody>")),
      "Is that correct?")

    Another example, per character:

    MakeCommunication(
      ToCommunicationSsml(
        Append(
          "<prosody rate=\"x-slow\">",
          "<say-as interpret-as=\"characters\">",
          Slot.PostCodeSlot,
          "</say-as>",
          "</prosody>")),
      "Is that correct?")
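
For an email address specifically, the same prosody and say-as tags could be combined with short pauses and spoken names for the symbols. The sketch below builds such a string off-platform in Python, for example to prototype the markup before porting it to an Architect expression. The function name `email_to_ssml`, the pause length, and the symbol wording are all assumptions, and whether a given TTS engine honours each tag is engine specific, as noted above.

```python
from xml.sax.saxutils import escape

# How each symbol should be spoken on read-back (wording is an assumption).
SPOKEN = {"@": "at", ".": "dot", "-": "dash", "_": "underscore"}


def email_to_ssml(email: str, rate: str = "slow", pause_ms: int = 250) -> str:
    """Build an SSML fragment that reads an email address character by character."""
    pieces = []
    for ch in email:
        if ch in SPOKEN:
            # Symbols are read as words so the caller hears "at" and "dot".
            pieces.append(SPOKEN[ch])
        else:
            # Letters and digits are spelled out; escape() guards against
            # characters that would break the XML, such as "&" or "<".
            pieces.append(
                '<say-as interpret-as="characters">' + escape(ch) + "</say-as>"
            )
    pause = '<break time="%dms"/>' % pause_ms
    return '<prosody rate="%s">%s</prosody>' % (rate, pause.join(pieces))


print(email_to_ssml("a.b@c.io"))
```

Inserting a fixed break between characters, instead of relying only on `x-slow`, is one way to keep the overall rate closer to normal speech while still separating the characters.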

    As for the bot, I've just started using VA myself and, at least at present, it seems more talk and hype than actually good from an STT standpoint.



    ------------------------------
    Simon Brown
    Maintel Europe Limited
    Senior Applications Consultant
    ------------------------------



  • 3.  RE: Voice virtual agent to collect email address - impossible?

    Posted 05-29-2025 20:22

    Hello @Paul McGurn

    Thank you for the thoughtful and detailed feedback - you've highlighted a crucial aspect of delivering high-performing AI-driven features like Virtual Supervisor and Supervisor Copilot: high-quality transcription. We completely agree that accurate speech-to-text output is foundational to ensuring reliable interaction scoring, sentiment analysis, real-time translation, and AI-generated summaries.

    To address the specific challenges you've raised, I'd like to share more about the Custom Dictionary feature, which is designed to improve transcription accuracy by tuning the model to recognize key business-specific terms - including organizational names, product terminology, and industry jargon.

    With Custom Dictionary, you can:

    • Add custom phrases (e.g., "Thank you for calling Presbyterian") to help the model understand and prioritize relevant business language.

    • Improve recognition of difficult or misheard terms by adding "sounds like" alternatives. This is particularly helpful when commonly misrecognized phrases yield similar-sounding but incorrect words, as in your example.

    We recommend starting with boost-only entries for important terms and then iteratively enhancing them with "sounds like" variants for any recurring misrecognitions - as long as they don't conflict with valid words in the language.

    This feature effectively gives you a way to embed business context directly into the transcription model, improving accuracy over time, especially in areas most important to your organization.

    I'd be happy to connect directly to review your current transcription challenges and provide tailored recommendations on Custom Dictionary usage and other tuning options. Together, we can ensure that transcription-dependent features are delivering maximum value and reliability for your team.

    Please don't hesitate to reach out!



    ------------------------------
    Jose Ruiz
    Genesys - Employees
    Product Manager
    jose.ruiz@genesys.com
    ------------------------------



  • 4.  RE: Voice virtual agent to collect email address - impossible?

    Posted 05-30-2025 11:07

    Hi Simon, thank you for the insight on this.  The SSML approach would definitely improve the read-back, and I had not thought of that.  It doesn't cover the STT collection accuracy, though, so I think I'm still stuck here.



    ------------------------------
    Paul McGurn
    Senior Manager, Telecom & DevOps
    Persistent Systems
    ------------------------------



  • 5.  RE: Voice virtual agent to collect email address - impossible?

    Posted 05-30-2025 11:08

    Hi @Jose Ruiz,

    I fail to see how dictionary management would help in any meaningful way with regard to the stated objective of this thread.  That objective is accurately collecting an email address from a caller.  At present, I cannot do this in a reliable way with any of the Genesys STT engines or our own integration with Azure STT.  I have tried both regular bot flows and an exorbitantly expensive virtual agent flow with a fully described AI slot, as noted in my original post.

    I feel like you either skimmed my post or just blindly replied to my "STT isn't working well" sentiment, without looking at the specific context.  This has literally nothing to do with Virtual Supervisor/Supervisor Copilot.

    As an aside, dictionary management, at present, is a painfully manual, slow process, so we're not really leveraging it.  Feel free to prioritize this idea if you want to help get adoption of it.  I opened it in March, and it has 32 upvotes at this point.  https://genesyscloud.ideas.aha.io/ideas/DARSTA-I-340 



    ------------------------------
    Paul McGurn
    Senior Manager, Telecom & DevOps
    Persistent Systems
    ------------------------------