str¶
The str object in Python 3 is quite similar but not identical to the
Python 2 unicode object.
The major difference is the stricter type-checking of Py3’s str that
enforces a distinction between unicode strings and byte-strings, such as when
comparing, concatenating, joining, or replacing parts of strings.
There are also other differences, such as the repr of unicode strings in
Py2 having a u'...' prefix, versus simply '...', and the removal of
the str.decode() method in Py3.
future contains a newstr type that is a backport of the
str object from Python 3. This inherits from the Python 2
unicode class but has customizations to improve compatibility with
Python 3’s str object. You can use it as follows:
>>> from __future__ import unicode_literals
>>> from builtins import str
On Py2, this gives us:
>>> str
future.types.newstr.newstr
(On Py3, it is simply the usual builtin str object.)
Then, for example, the following code has the same effect on Py2 as on Py3:
>>> s = str(u'ABCD')
>>> assert s != b'ABCD'
>>> assert isinstance(s.encode('utf-8'), bytes)
>>> assert isinstance(b.decode('utf-8'), str)
These raise TypeErrors:
>>> bytes(b'B') in s
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>
>>> s.find(bytes(b'A'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>
Various other operations that mix strings and bytes or other types are
permitted on Py2 with the newstr class even though they
are illegal with Python 3. For example:
>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr
This is allowed for compatibility with parts of the Python 2 standard
library and various third-party libraries that mix byte-strings and unicode
strings loosely. One example is os.path.join on Python 2, which
attempts to add the byte-string b'/' to its arguments, whether or not
they are unicode. (See posixpath.py.) Another example is the
escape() function in Django 1.4’s django.utils.html.
In most other ways, these builtins.str objects on Py2 have the
same behaviours as Python 3’s str:
>>> s = str('ABCD')
>>> assert repr(s) == 'ABCD' # consistent repr with Py3 (no u prefix)
>>> assert list(s) == ['A', 'B', 'C', 'D']
>>> assert s.split('B') == ['A', 'CD']
The str type from builtins also provides support for the
surrogateescape error handler on Python 2.x. Here is an example that works
identically on Python 2.x and 3.x:
>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'
This feature is in alpha. Please leave feedback here about whether this works for you.