Parsing Chat transcripts giving latin-1 codec encoding error

    Posted 06-05-2025 19:20

    KathirAXA | 2019-03-25 02:11:38 UTC | #1

    Hi,

    We are extracting chat transcripts from the getconversationrecordings API. While parsing the transcript body, we often encounter errors like "'latin-1' codec can't encode character '\U00100083' in position 4850: Body ('\U00100083') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8." The character "\U00100083" is just one example; parsing fails for various other characters as well.

    We tried to handle this with Python's replace to strip out some of these characters, but it still fails frequently on characters we had not anticipated.

    What is the encoding used to store the transcripts? Is there a best practice to handle this error?

    Thanks, Kathir


    tim.smith | 2019-03-25 13:46:54 UTC | #2

    The recording files should be UTF-8 encoded.
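
    As a minimal sketch of what that implies on the Python side: the wording of the error quoted above looks like the one Python's HTTP client raises when a str body containing non-Latin-1 characters is sent, so it may be coming from a later request rather than from the download itself. The helper names below (to_utf8_bytes, decode_transcript) and the sample string are hypothetical and are not part of any SDK; the only assumption taken from this thread is that the transcript is UTF-8.

```python
# Hypothetical helpers, not part of any SDK: handle transcript text as UTF-8
# explicitly instead of letting a library fall back to Latin-1.

def decode_transcript(raw: bytes) -> str:
    """Decode downloaded transcript bytes, assuming they are UTF-8."""
    return raw.decode("utf-8")


def to_utf8_bytes(body: str) -> bytes:
    """Encode text to UTF-8 bytes before passing it on as a request body."""
    return body.encode("utf-8")


# Sample text containing the character from the error message.
sample = "hello \U00100083 world"

# Latin-1 cannot represent this character, which reproduces the failure.
latin1_failed = False
try:
    sample.encode("latin-1")
except UnicodeEncodeError:
    latin1_failed = True

# UTF-8 can represent every Unicode code point, so this round trip succeeds.
encoded = to_utf8_bytes(sample)
round_tripped = decode_transcript(encoded)
```

    Encoding once at the boundary avoids chasing individual characters with replace, since UTF-8 covers all of them rather than a fixed list.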


    system | 2019-04-25 13:48:32 UTC | #3

    This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.


    This post was migrated from the old Developer Forum.

    ref: 4859