2 days ago
A collection of articles, ideas, and rambling from a guy who wrote some software that one time.
Are people claiming that there should be no default encoding?That's what I would say, yes. The default encoding is a process-global variable that sets you up for a lot of confusion, since encoding is always context and data-type dependent. Occasionally I get lazy and use the default encoding, since I know that regardless of what it is it probably has ASCII as a subset (and I know that my data is something like an email address or a URL which functionally must be ASCII), but this is not generally good behavior.
As long as we have non-Unicode strings, I find the argument less than convincing, and I think it reflects the perspective of people who take Unicode very seriously, as compared to programmers who aren't quite so concerned but just want their applications to not be broken; and the current status quo is very deeply broken.I believe that in the context of this discussion, the term "string" is meaningless. There is text, and there is byte-oriented data (which may very well represent text, but is not yet converted to it). In Python types, Text is unicode. Data is str. The idea of "non-Unicode text" is just a programming error waiting to happen.