PEP 263 – Defining Python Source Code Encodings
PEP: 263
Title: Defining Python Source Code Encodings
Author: mal at lemburg.com (Marc-André Lemburg), martin at v.loewis.de (Martin von Löwis)
Status: Final
Type: Standards Track
Created: 06-Jun-2001
Python-Version: 2.3
Post-History:
Contents
Abstract
Problem
Proposed Solution
Defining the Encoding
Examples
Concepts
Implementation
Phases
Scope
References
History
Copyright
Abstract
This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably, this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in a Unicode-aware editor.
Problem
In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding “unicode-escape”. This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales, such as many Asian countries. Programmers can write their 8-bit strings using their preferred encoding, but are bound to the “unicode-escape” encoding for Unicode literals.
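As a rough illustration of what being bound to “unicode-escape” means, the sketch below uses the codec of the same name from a modern Python 3 standard library. This is an assumption made for demonstration only, not the Python 2.1 parser itself. It shows that characters outside Latin-1 have to be spelled as escape sequences, while raw bytes are read as Latin-1.

```python
# Illustration only: the "unicode-escape" codec in Python 3 stands in for the
# way Python 2.1 interpreted the body of a Unicode literal.

# ASCII-only escape sequences for three CJK characters.
escaped = b"\\u65e5\\u672c\\u8a9e"

# A raw byte 0xE9, which the codec interprets as Latin-1.
latin1_bytes = b"caf\xe9"

decoded_escaped = escaped.decode("unicode-escape")
decoded_latin1 = latin1_bytes.decode("unicode-escape")

print(decoded_escaped)
print(decoded_latin1)
```

The first value can only be written through escapes, which is the inconvenience described above for authors working outside Latin-1.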
Proposed Solution
I propose to make the Python source code encoding both visible and changeable on a per-source-file basis by using a special comment at the top of the file to declare the encoding.
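A minimal sketch of such a declaration follows. It reads the declaration back with `tokenize.detect_encoding` from the Python 3 standard library, which is used here as an assumption for illustration and is not the implementation this PEP describes. The helper name `declared_encoding`, the sample source bytes, and the Emacs-style spelling of the comment are likewise illustrative choices.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding named by the special comment at the top of the source.

    Hypothetical helper: it wraps the stdlib function tokenize.detect_encoding,
    which inspects at most the first two lines of the file.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A source file whose first line declares Latin-1 and whose body contains
# a raw 0xE9 byte (e with acute accent in Latin-1).
latin1_source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

enc = declared_encoding(latin1_source)
text = latin1_source.decode(enc)

print(enc)
print(text)
```

Note that `tokenize.detect_encoding` reports a normalized codec name, so the value it returns may differ in spelling from what the comment says.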
To make Python aware of this encoding declaration a number of concept changes are necessary with respect t