Parsing Chat transcripts giving latin-1 codec encoding error

Developer Community
Posted 06-05-2025 19:20
KathirAXA | 2019-03-25 02:11:38 UTC | #1

Hi,

We are extracting chat transcripts from the getconversationrecordings API. While parsing the transcript body, we often encounter errors like "'latin-1' codec can't encode character '\U00100083' in position 4850: Body ('\U00100083') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8." The character "\U00100083" is just one example; parsing fails for various characters.

We tried to handle it with Python's str.replace method to substitute some of these characters, but failures keep occurring for other characters.

What is the encoding used to store the transcripts? Is there a best practice to handle this error?

Thanks, Kathir

tim.smith | 2019-03-25 13:46:54 UTC | #2

The recording files should be in utf-8 format.
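
The wording of the error in the question ("Use body.encode('utf-8') if you want to send it encoded in UTF-8") suggests it is raised when a Python str is handed to an HTTP client as a request body, since the standard library's client falls back to Latin-1 for str bodies. A minimal sketch of the usual workaround, assuming that is the situation here: encode the text to UTF-8 bytes yourself before sending it. The sample text and the helper names `can_encode_latin1` and `to_utf8_bytes` are hypothetical and are not part of any API mentioned in this thread.

```python
# Hypothetical transcript text containing the character from the error
# message; U+100083 lies outside the Latin-1 range.
transcript_body = "Agent: hello \U00100083 customer"


def can_encode_latin1(text: str) -> bool:
    """Return True if every character in text fits in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 so it can be passed as a bytes request body."""
    return text.encode("utf-8")


# The str itself cannot be encoded as Latin-1, which is what triggers
# the error; the UTF-8 bytes round-trip without loss.
payload = to_utf8_bytes(transcript_body)
round_tripped = payload.decode("utf-8")
```

If the payload is sent on to another service, it is usually also worth declaring the charset, for example with a `Content-Type: text/plain; charset=utf-8` header, so the receiver decodes it the same way. Replacing individual characters is not needed with this approach, because UTF-8 can represent every Unicode code point.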

system | 2019-04-25 13:48:32 UTC | #3

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.

This post was migrated from the old Developer Forum.
ref: 4859
