I thought I'd share an update. I have finished the first iteration of the video/audio AI part of this solution. The 'customer' can now interact with the live audio/video AI, which recognises a book from the ISBN in its barcode.
Video and explanation in my LinkedIn post: https://www.linkedin.com/feed/update/urn:li:activity:7383263424652259328/
In more detail, it works by:
- Using the browser's native ability to scan barcodes against the live webcam stream. The browser sees the stream at a higher quality and a higher frame rate than what my server receives to send to the AI
- Sending each barcode detected above to my server immediately
- Using the ISBN in the barcode to pre-emptively retrieve the book details server-side, which reduces latency in the conversation
- Giving the Gemini Live API a 'read_barcode' tool, so the conversational agent sounds like it is reading the barcode itself, when the tool actually returns the result of the pre-emptive search
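To make the server-side steps above more concrete, here is a minimal TypeScript sketch of the pre-emptive lookup pattern. It is an illustration rather than my actual code: `Book`, `Lookup`, `BookPrefetcher`, `onBarcodeDetected` and `readBarcode` are hypothetical names, the book lookup is injected as a plain function, and the wiring to the browser and to the Gemini Live API is left out. It also assumes the barcodes are ISBN-13 (EAN-13) values, which is how ISBNs are normally printed on books.

```typescript
// Hypothetical types for illustration only.
type Book = { isbn: string; title: string };
type Lookup = (isbn: string) => Promise<Book | null>;

// ISBN-13 check: 13 digits, 978/979 prefix, weights alternate 1 and 3,
// and the weighted sum must be a multiple of 10.
function isValidIsbn13(code: string): boolean {
  if (!/^97[89]\d{10}$/.test(code)) return false;
  let sum = 0;
  for (let i = 0; i < code.length; i++) {
    sum += Number(code[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return sum % 10 === 0;
}

class BookPrefetcher {
  private readonly lookup: Lookup;
  private readonly cache = new Map<string, Promise<Book | null>>();
  private latest: Promise<Book | null> | null = null;

  constructor(lookup: Lookup) {
    this.lookup = lookup;
  }

  // Called when the browser reports a barcode. Starts the lookup straight
  // away, so the result is usually ready before the agent asks for it.
  onBarcodeDetected(rawValue: string): boolean {
    if (!isValidIsbn13(rawValue)) return false;
    let pending = this.cache.get(rawValue);
    if (!pending) {
      // A failed lookup is treated as "no book found" in this sketch.
      pending = this.lookup(rawValue).catch(() => null);
      this.cache.set(rawValue, pending);
    }
    this.latest = pending;
    return true;
  }

  // Handler for the 'read_barcode' tool call: it does not scan anything,
  // it only returns the result of the most recent pre-emptive lookup.
  async readBarcode(): Promise<{ found: boolean; book?: Book }> {
    const book = this.latest ? await this.latest : null;
    return book ? { found: true, book } : { found: false };
  }
}
```

The cache stores the pending promise rather than the resolved book, so repeated detections of the same barcode across frames trigger only one lookup, and a tool call that arrives mid-lookup simply waits for it to finish.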
What's next:
Next I want to focus on having a Digital Bot in Genesys Cloud trigger the audio/video experience for the customer, and then receive the scanned books so it can use them in the chatbot's conversation.
------------------------------
Lucas Woodward
Winner of Orchestrator of the Year, Developer (2025)
LinkedIn - https://www.linkedin.com/in/lucas-woodward-the-dev
Newsletter - https://makingchatbots.com
------------------------------