Adding binary data to DOM tree in Python (charset encoding problems)
While reading data from a PostgreSQL database and adding it to a DOM tree in Python, I ran into a strange encoding problem related to UTF-8.
In the end, the solution was to set the client encoding to UTF-8 in Postgres:
ALTER USER <myuser> SET client_encoding TO 'utf-8';
In the process I also discovered that if you add binary data (byte strings, not unicode) to a DOM tree, you should decode it first:
<utf8-variable>.decode("utf-8")
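A minimal sketch of the decode step, assuming Python 3 and the standard library's `xml.dom.minidom`. The helper `add_text_element` and the sample value are hypothetical. In Python 3, `createTextNode()` rejects `bytes` with a `TypeError`, so the decode is not optional there.

```python
from xml.dom.minidom import Document


def add_text_element(doc, parent, tag, raw, encoding="utf-8"):
    """Append <tag>text</tag> to parent (hypothetical helper).

    raw is assumed to be bytes as fetched from the database; it is decoded
    so that the DOM only ever holds text (str), never raw bytes.
    """
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    elem = doc.createElement(tag)
    elem.appendChild(doc.createTextNode(text))
    parent.appendChild(elem)
    return elem


doc = Document()
root = doc.createElement("rows")
doc.appendChild(root)

# Hypothetical row value: "Grüße" as UTF-8 encoded bytes.
add_text_element(doc, root, "name", "Grüße".encode("utf-8"))
```

Passing the undecoded bytes straight to `doc.createTextNode()` would fail instead of silently storing mis-encoded text.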
And encode it back to UTF-8 when writing the XML:
doc.toprettyxml(indent=" ", encoding="utf-8")
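A sketch of the write step, again assuming Python 3's `xml.dom.minidom`; the helper name `to_utf8_xml` is hypothetical. When `encoding` is passed, `toprettyxml()` returns `bytes` and writes a matching `encoding="utf-8"` into the XML declaration, so the result should be written to a file opened in binary mode.

```python
from xml.dom.minidom import Document


def to_utf8_xml(doc):
    # With an encoding argument, toprettyxml() returns bytes (not str)
    # and includes encoding="utf-8" in the XML declaration.
    return doc.toprettyxml(indent="  ", encoding="utf-8")


doc = Document()
root = doc.createElement("rows")
doc.appendChild(root)
name = doc.createElement("name")
name.appendChild(doc.createTextNode("Grüße"))  # already-decoded text
root.appendChild(name)

xml_bytes = to_utf8_xml(doc)
```

To save it, something like `open("rows.xml", "wb").write(xml_bytes)` avoids a second, accidental encoding step.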
Obviously, if the variable contains UTF-8 to begin with, there's not much point in encoding it back and forth. However, if other encodings are involved, this is how it should be done.
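For the case where other encodings are involved, here is a sketch assuming the source bytes are Latin-1 (a hypothetical example; substitute whatever encoding the data actually uses). The bytes are decoded with their real encoding on the way in, and the whole document is encoded as UTF-8 on the way out.

```python
from xml.dom.minidom import Document


def build_doc(values, source_encoding):
    """Build <rows><value>...</value></rows> from raw byte strings."""
    doc = Document()
    root = doc.createElement("rows")
    doc.appendChild(root)
    for raw in values:
        elem = doc.createElement("value")
        # Decode with the encoding the bytes were actually stored in.
        elem.appendChild(doc.createTextNode(raw.decode(source_encoding)))
        root.appendChild(elem)
    return doc


# Hypothetical Latin-1 encoded rows.
latin1_rows = ["café".encode("latin-1"), "naïve".encode("latin-1")]
doc = build_doc(latin1_rows, "latin-1")

# Serialize as UTF-8 regardless of the source encoding.
out = doc.toprettyxml(indent="  ", encoding="utf-8")
```

The output contains UTF-8 byte sequences (e.g. `b"caf\xc3\xa9"`) rather than the original Latin-1 bytes (`b"caf\xe9"`).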