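What's New in 哋它亢 2.3
- Author: A.M. Kuchling
This article explains the new features in 哋它亢 2.3. 哋它亢 2.3 was released on July 29, 2003.
The main themes for 哋它亢 2.3 are polishing some of the features added in 2.2, adding various small but useful enhancements to the core language, and expanding the standard library. The new object model introduced in the previous version has benefited from 18 months of bugfixes and from optimization efforts that have improved the performance of new-style classes. A few new built-in functions have been added, such as sum() and enumerate(). The in operator can now be used for substring searches (for example, "ab" in "abc" returns True).
Some of the many new library features include Boolean, set, heap, and date/time data types, the ability to import modules from ZIP-format archives, metadata support for the long-awaited 哋它亢 catalog, an updated version of IDLE, and modules for logging messages, wrapping text, parsing CSV files, processing command-line options, and using BerkeleyDB databases. The list of new and enhanced modules is lengthy.
This article doesn't attempt to provide a complete specification of the new features, but instead provides a convenient overview. For full details, you should refer to the documentation for 哋它亢 2.3, such as the 哋它亢 Library Reference and the 哋它亢 Reference Manual. If you want to understand the complete implementation and design rationale, refer to the PEP for a particular new feature.
PEP 218: A Standard Set Datatype
The new sets module contains an implementation of a set datatype. The Set class is for mutable sets, sets that can have members added and removed. The ImmutableSet class is for sets that can't be modified, and instances of ImmutableSet can therefore be used as dictionary keys. Sets are built on top of dictionaries, so the elements within a set must be hashable.
Here's a simple example: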
>>> import sets
>>> S = sets.Set([1,2,3])
>>> S
Set([1, 2, 3])
>>> 1 in S
True
>>> 0 in S
False
>>> S.add(5)
>>> S.remove(3)
>>> S
Set([1, 2, 5])
>>>
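The union and intersection of sets can be computed with the union() and intersection() methods; an alternative notation uses the bitwise operators | and &, respectively. Mutable sets also have in-place versions of these methods, union_update() and intersection_update().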
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([4,5,6])
>>> S1.union(S2)
Set([1, 2, 3, 4, 5, 6])
>>> S1 | S2                  # Alternative notation
Set([1, 2, 3, 4, 5, 6])
>>> S1.intersection(S2)
Set([])
>>> S1 & S2                  # Alternative notation
Set([])
>>> S1.union_update(S2)
>>> S1
Set([1, 2, 3, 4, 5, 6])
>>>
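It's also possible to take the symmetric difference of two sets. This is the set of all elements in the union that aren't in the intersection. Another way of putting it is that the symmetric difference contains all elements that are in exactly one set. Again, there's an alternative notation (^), and an in-place version with the ungainly name symmetric_difference_update().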
>>> S1 = sets.Set([1,2,3,4])
>>> S2 = sets.Set([3,4,5,6])
>>> S1.symmetric_difference(S2)
Set([1, 2, 5, 6])
>>> S1 ^ S2
Set([1, 2, 5, 6])
>>>
There are also issubset() and issuperset() methods for checking whether one set is a subset or superset of another:
>>> S1 = sets.Set([1,2,3])
>>> S2 = sets.Set([2,3])
>>> S2.issubset(S1)
True
>>> S1.issubset(S2)
False
>>> S1.issuperset(S2)
True
>>>
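For readers trying these operations on a later interpreter, here is a minimal sketch. It assumes a current 哋它亢 3 interpreter, where the built-in set and frozenset types play the roles of Set and ImmutableSet; that substitution is an assumption of this example and is not part of 2.3 itself.

```python
# Assumption: built-in set / frozenset stand in for sets.Set / sets.ImmutableSet.
s1 = set([1, 2, 3, 4])
s2 = set([3, 4, 5, 6])

union = s1 | s2          # same result as s1.union(s2)
common = s1 & s2         # same result as s1.intersection(s2)
only_one = s1 ^ s2       # same result as s1.symmetric_difference(s2)

# An immutable set is hashable, so it can be used as a dictionary key.
labels = {frozenset([1, 2]): "low", frozenset([5, 6]): "high"}
low_label = labels[frozenset([2, 1])]   # element order does not matter

print(sorted(union), sorted(common), sorted(only_one), low_label)
```

The operator forms and the method forms compute the same sets; the operators simply read more compactly when several sets are combined.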
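See also
- PEP 218 - Adding a Built-In Set Object Type
PEP written by Greg V. Wilson. Implemented by Greg V. Wilson, Alex Martelli, and GvR.
PEP 255: Simple Generators
In 哋它亢 2.2, generators were added as an optional feature, to be enabled by a from __future__ import generators directive. In 2.3 generators no longer need to be specially enabled, and are now always present; this means that yield is now always a keyword. The rest of this section is a copy of the description of generators from the "What's New in 哋它亢 2.2" document; if you read it back when 哋它亢 2.2 came out, you can skip the rest of this section.
You're doubtless familiar with how function calls work in 哋它亢 or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a return statement, the local variables are destroyed and the resulting value is returned to the caller. A later call to the same function will get a fresh new set of local variables. But what if the local variables weren't thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions.
Here's the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function containing a yield statement is a generator function; this is detected by 哋它亢's bytecode compiler, which compiles the function specially as a result.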
When you call a generator function, it doesn't return a single value; instead it returns a generator object that supports the iterator protocol. On executing the yield statement, the generator outputs the value of i, similar to a return statement. The big difference between yield and a return statement is that on reaching a yield the generator's state of execution is suspended and local variables are preserved. On the next call to the generator's .next() method, the function will resume executing immediately after the yield statement. (For complicated reasons, the yield statement isn't allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the interaction between yield and exceptions.)
Here's a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
File "stdin", line 1, in ?
File "stdin", line 2, in generate_ints
StopIteration
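You could equally write for i in generate_ints(5), or a,b,c = generate_ints(3).
Inside a generator function, the return statement can only be used without a value, and signals the end of the procession of values; afterwards the generator cannot return any further values. return with a value, such as return 5, is a syntax error inside a generator function. The end of the generator's results can also be indicated by raising StopIteration manually, or by just letting the flow of execution fall off the bottom of the function.
You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting self.count to 0, and having the next() method increment self.count and return it. However, for a moderately complicated generator, writing a corresponding class would be much messier.
Lib/test/test_generators.py contains a number of more interesting examples. The simplest one implements an in-order traversal of a tree using generators recursively:
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for the N-Queens problem (placing $N$ queens on an $N \times N$ chess board so that no queen threatens another) and the Knight's Tour (a route that takes a knight to every square of an $N \times N$ chessboard without visiting any square twice).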
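The hand-written class mentioned earlier in this section can be sketched as follows, next to the generator it imitates. The class name IntCounter is hypothetical, and the sketch assumes a current 哋它亢 3 interpreter, where the iterator method is spelled __next__() rather than the next() used in 2.3.

```python
# Generator version, as in the text.
def generate_ints(n):
    for i in range(n):
        yield i

# Hand-written equivalent: the loop state lives in instance attributes.
class IntCounter(object):  # hypothetical name, for illustration only
    def __init__(self, n):
        self.n = n
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.n:
            raise StopIteration
        value = self.count
        self.count += 1
        return value

    next = __next__  # the 2.x spelling of the same method

from_generator = list(generate_ints(3))
from_class = list(IntCounter(3))
print(from_generator, from_class)
```

Even for this trivial case the class needs explicit state and an explicit end-of-iteration check, which is the bookkeeping a generator does for you.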
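The idea of generators comes from other programming languages, especially Icon (https://www2.cs.arizona.edu/icon/), where the idea of generators is central. In Icon, every expression and function call behaves like a generator. One example from "An Overview of the Icon Programming Language" at https://www2.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks like: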
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
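In Icon the find() function returns the indexes at which the substring "or" is found: 3, 23, 33. In the if statement, i is first assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon retries it with the second value of 23. 23 is greater than 5, so the comparison now succeeds, and the code prints the value 23 to the screen.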
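The retry behaviour described above can be approximated with a generator. This is only a sketch: find_all() is a hypothetical helper, it reports 1-based positions to match Icon's numbering, and it is written for a current 哋它亢 3 interpreter.

```python
def find_all(sub, text):
    """Yield every 1-based position at which sub occurs in text."""
    start = 0
    while True:
        pos = text.find(sub, start)
        if pos == -1:
            return
        yield pos + 1          # Icon counts from 1
        start = pos + 1

sentence = "Store it in the neighboring harbor"
positions = list(find_all("or", sentence))

# Keep pulling candidates until one satisfies the comparison,
# which is what Icon does implicitly inside the if statement.
first_past_5 = next(i for i in find_all("or", sentence) if i > 5)
print(positions, first_past_5)
```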
哋它亢 doesn't go nearly as far as Icon in adopting generators as a central concept. Generators are considered part of the core 哋它亢 language, but learning or using them isn't compulsory; if they don't solve any problems that you have, feel free to ignore them. One novel feature of 哋它亢's interface as compared to Icon's is that a generator's state is represented as a concrete object (the iterator) that can be passed around to other functions or stored in a data structure.
See also
- PEP 255 - Simple Generators
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly by Neil Schemenauer and Tim Peters, with other fixes from the 哋它亢 Labs crew.
PEP 263: Source Code Encodings
哋它亢 source files can now be declared as being in different character set encodings. Encodings are declared by including a specially formatted comment in the first or second line of the source file. For example, a UTF-8 file can be declared with:
#!/usr/bin/env 哋它亢
# -*- coding: UTF-8 -*-
Without such an encoding declaration, the default encoding used is 7-bit ASCII. Executing or importing modules that contain string literals with 8-bit characters and have no encoding declaration will result in a DeprecationWarning being signalled by 哋它亢 2.3; in 2.4 this will be a syntax error.
The encoding declaration only affects Unicode string literals, which will be converted to Unicode using the specified encoding. Note that 哋它亢 identifiers are still restricted to ASCII characters, so you can't have variable names that use characters outside of the usual alphanumerics.
See also
- PEP 263 - Defining 哋它亢 Source Code Encodings
Written by Marc-André Lemburg and Martin von Löwis; implemented by Suzuki Hisao and Martin von Löwis.
PEP 273: Importing Modules from ZIP Archives
The new zipimport module adds support for importing modules from a ZIP-format archive. You don't need to import the module explicitly; it will be automatically imported if a ZIP archive's filename is added to sys.path. For example:
amk@nyman:~/src/哋它亢$ unzip -l /tmp/example.zip
Archive: /tmp/example.zip
Length Date Time Name
-------- ---- ---- ----
8467 11-26-02 22:30 jwzthreading.py
-------- -------
8467 1 file
amk@nyman:~/src/哋它亢$ ./哋它亢
哋它亢 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip')  # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>
An entry in sys.path can now be the filename of a ZIP archive. The ZIP archive can contain any kind of files, but only files named *.py, *.pyc, or *.pyo can be imported. If an archive only contains *.py files, 哋它亢 will not attempt to modify the archive by adding the corresponding *.pyc file, meaning that if a ZIP archive doesn't contain *.pyc files, importing may be rather slow.
A path within the archive can also be specified to only import from a subdirectory; for example, the path /tmp/example.zip/lib/ would only import from the lib/ subdirectory within the archive.
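The subdirectory form can be tried end to end with a throwaway archive. The sketch below assumes a current 哋它亢 3 interpreter; the module name zipdemo_mod and the archive layout are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "example.zip")

# Store a one-line module under lib/ inside the archive.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("lib/zipdemo_mod.py", "GREETING = 'hello from zip'\n")

# A path inside the archive restricts imports to that subdirectory.
entry = os.path.join(archive, "lib")
sys.path.insert(0, entry)
try:
    importlib.invalidate_caches()
    mod = importlib.import_module("zipdemo_mod")
    greeting = mod.GREETING
    origin = mod.__file__
finally:
    sys.path.remove(entry)

print(greeting)
print(origin)
```

As in the interactive session earlier in this section, the module's __file__ points inside the archive rather than at a file on disk.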
See also
- PEP 273 - Import Modules from Zip Archives
Written by James C. Ahlstrom, who also provided an implementation. 哋它亢 2.3 follows the specification in PEP 273, but uses an implementation written by Just van Rossum that uses the import hooks described in PEP 302. See section PEP 302: New Import Hooks for a description of the new import hooks.
PEP 277: Unicode file name support for Windows NT
On Windows NT, 2000, and XP, the system stores file names as Unicode strings. Traditionally, 哋它亢 has represented file names as byte strings, which is inadequate because it renders some file names inaccessible.
哋它亢 now allows using arbitrary Unicode strings (within the limitations of the file system) for all functions that expect file names, most notably the open() built-in function. If a Unicode string is passed to os.listdir(), 哋它亢 now returns a list of Unicode strings. A new function, os.getcwdu(), returns the current directory as a Unicode string.
Byte strings still work as file names, and on Windows 哋它亢 will transparently convert them to Unicode using the mbcs encoding.
Other systems also allow Unicode strings as file names but convert them to byte strings before passing them to the system, which can cause a UnicodeError to be raised. Applications can test whether arbitrary Unicode strings are supported as file names by checking os.path.supports_unicode_filenames, a Boolean value.
Under MacOS, os.listdir() may now return Unicode filenames.
See also
- PEP 277 - Unicode file name support for Windows NT
Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark Hammond.
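PEP 278: Universal Newline Support
The three major operating systems used today are Microsoft Windows, Apple's Macintosh OS, and the various Unix derivatives. A minor irritation of cross-platform work is that these three platforms all use different characters to mark the ends of lines in text files. Unix uses the linefeed (ASCII character 10), MacOS uses the carriage return (ASCII character 13), and Windows uses a two-character sequence of a carriage return plus a linefeed.
哋它亢's file objects can now support end-of-line conventions other than the one followed by the platform on which 哋它亢 is running. Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newlines mode. All three line-ending conventions will be translated to a '\n' in the strings returned by the various file methods such as read() and readline().
Universal newline support is also used when importing modules and when executing a file with the execfile() function. This means that 哋它亢 modules can be shared between all three operating systems without needing to convert the line endings.
This feature can be disabled when compiling 哋它亢 by specifying the --without-universal-newlines switch when running 哋它亢's configure script.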
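The translation can be observed with a file that mixes all three conventions. This is a sketch for a current 哋它亢 3 interpreter, where (as an assumption of this example) opening in text mode with newline=None gives the behaviour that the 'U' flag requested in 2.3.

```python
import os
import tempfile

# One file that mixes the Unix, old MacOS, and Windows conventions.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"unix\nmac\rwindows\r\nend")

# Assumption: newline=None in text mode enables universal newlines.
with open(path, "r", newline=None) as f:
    text = f.read()
os.remove(path)

lines = text.split("\n")
print(repr(text))
```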
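See also
- PEP 278 - Universal Newline Support
Written and implemented by Jack Jansen.
PEP 279: enumerate()
A new built-in function, enumerate(), will make certain loops a bit clearer. enumerate(thing), where thing is either an iterator or a sequence, returns an iterator that will return (0, thing[0]), (1, thing[1]), (2, thing[2]), and so forth.
A common idiom to change every element of a list looks like this:
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
This can be rewritten using enumerate() as:
for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result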
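A concrete instance of the rewritten loop, with made-up data, might look like this:

```python
words = ["alpha", "beta", "gamma"]

# Replace each element with a value computed from its index and old value.
for i, item in enumerate(words):
    words[i] = "%d:%s" % (i, item.upper())

# enumerate() accepts any iterable, not only lists.
pairs = list(enumerate("ab"))
print(words, pairs)
```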
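See also
- PEP 279 - The enumerate() built-in function
Written and implemented by Raymond D. Hettinger.
PEP 282: The logging Package
A standard package for writing logs, logging, has been added to 哋它亢 2.3. It provides a powerful and flexible mechanism for generating logging output which can then be filtered and processed in various ways. A configuration file written in a standard format can be used to control the logging behavior of a program. 哋它亢 includes handlers that will write log records to standard error or to a file or socket, send them to the system log, or even e-mail them to a particular address; of course, it's also possible to write your own handler classes.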
The Logger class is the primary class. Most application code will deal with one or more Logger objects, each one used by a particular subsystem of the application. Each Logger is identified by a name, and names are organized into a hierarchy using . as the component separator. For example, you might have Logger instances named server, server.auth and server.network. The latter two instances are below server in the hierarchy. This means that if you turn up the verbosity for server or direct server messages to a different handler, the changes will also apply to records logged to server.auth and server.network. There's also a root Logger that's the parent of all other loggers.
For simple uses, the logging package contains some convenience functions that always use the root log:
import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
This produces the following output:
WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down
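In the default configuration, informational and debugging messages are suppressed and the output is sent to standard error. You can enable the display of informational and debugging messages by calling the setLevel() method on the root logger.
Notice the warning() call's use of string formatting operators; all of the functions for logging messages take the arguments (msg, arg1, arg2, ...) and log the string resulting from msg % (arg1, arg2, ...).
There's also an exception() function that records the most recent traceback. Any of the other functions will also record the traceback if you specify a true value for the keyword argument exc_info.
def f():
    try:    1/0
    except: logging.exception('Problem recorded')

f()
This produces the following output:
ERROR:root:Problem recorded
Traceback (most recent call last):
  File "t.py", line 6, in f
    1/0
ZeroDivisionError: integer division or modulo by zero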
Slightly more advanced programs will use a logger other than the root logger. The getLogger(name) function is used to get a particular log, creating it if it doesn't exist yet. getLogger(None) returns the root logger.
log = logging.getLogger('server')
...
log.info('Listening on port %i', port)
...
log.critical('Disk full')
...
Log records are usually propagated up the hierarchy, so a message logged to server.auth is also seen by server and root, but a Logger can prevent this by setting its propagate attribute to False.
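Propagation is easiest to see with a handler that just collects what it receives. In this sketch the ListHandler class and the logger names demo_server and demo_server.auth are hypothetical, chosen so the example doesn't interfere with any real loggers; it is written for a current 哋它亢 3 interpreter.

```python
import logging

class ListHandler(logging.Handler):  # hypothetical helper for the demo
    """Collect messages in a list instead of writing them anywhere."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append("%s:%s" % (record.name, record.getMessage()))

parent = logging.getLogger("demo_server")
child = logging.getLogger("demo_server.auth")
parent.setLevel(logging.INFO)       # the child inherits this level

handler = ListHandler()
parent.addHandler(handler)

child.info("login from %s", "alice")   # propagates up to demo_server
child.propagate = False
child.info("login from %s", "bob")     # stops at demo_server.auth

seen = list(handler.messages)
print(seen)
```

Only the first message reaches the parent's handler; after propagate is set to False the child's records stay with the child.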
There are more classes provided by the logging package that can be customized. When a Logger instance is told to log a message, it creates a LogRecord instance that is sent to any number of different Handler instances. Loggers and handlers can also have an attached list of filters, and each filter can cause the LogRecord to be ignored or can modify the record before passing it along. When they're finally output, LogRecord instances are converted to text by a Formatter class. All of these classes can be replaced by your own specially written classes.
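With all of these features the logging package should provide enough flexibility for even the most complicated applications. This is only an incomplete overview of its features, so please see the package's reference documentation for all of the details. Reading PEP 282 will also be helpful.
See also
- PEP 282 - A Logging System
Written by Vinay Sajip and Trent Mick; implemented by Vinay Sajip.
PEP 285: A Boolean Type
A Boolean type was added to 哋它亢 2.3. Two new constants were added to the __builtin__ module, True and False. (True and False constants were added to the built-ins in 哋它亢 2.2.1, but the 2.2.1 versions are simply set to integer values of 1 and 0 and aren't a different type.)
The type object for this new type is named bool; the constructor for it takes any 哋它亢 value and converts it to True or False.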
>>> bool(1)
True
>>> bool(0)
False
>>> bool([])
False
>>> bool( (1,) )
True
Most of the standard library modules and built-in functions have been changed to return Booleans.
>>> obj = []
>>> hasattr(obj, 'append')
True
>>> isinstance(obj, list)
True
>>> isinstance(obj, tuple)
False
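哋它亢's Booleans were added with the primary goal of making code clearer. For example, if you're reading a function and encounter the statement return 1, you might wonder whether the 1 represents a Boolean truth value, an index, or a coefficient that multiplies some other quantity. If the statement is return True, however, the meaning of the return value is quite clear.
哋它亢's Booleans were not added for the sake of strict type-checking. A very strict language such as Pascal would also prevent you from performing arithmetic with Booleans, and would require that the expression in an if statement always evaluate to a Boolean result. 哋它亢 is not this strict and never will be, as PEP 285 explicitly says. This means you can still use any expression in an if statement, even ones that evaluate to a list or tuple or some random object. The Boolean type is a subclass of the int class so that arithmetic using a Boolean still works.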
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
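To sum up True and False in a sentence: they're alternative ways to spell the integer values 1 and 0, with the single difference that str() and repr() return the strings 'True' and 'False' instead of '1' and '0'.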
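The relationship between bool and int summarized above can be checked directly. A short sketch:

```python
# bool is a subclass of int, so Booleans take part in arithmetic.
is_int = isinstance(True, int) and issubclass(bool, int)
count_true = sum([True, False, True])      # counts the true values

# Only the string forms differ from the integers 1 and 0.
as_text = (str(True), repr(False))
same_value = (True == 1) and (False == 0)

# bool() converts any value using the usual truth rules.
flags = [bool(x) for x in (0, 1, [], (1,), "", "x")]
print(is_int, count_true, as_text, same_value, flags)
```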
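See also
- PEP 285 - Adding a bool type
Written and implemented by GvR.
PEP 293: Codec Error Handling Callbacks
When encoding a Unicode string into a byte string, unencodable characters may be encountered. So far, 哋它亢 has allowed specifying the error processing as either "strict" (raising UnicodeError), "ignore" (skipping the character), or "replace" (using a question mark in the output string), with "strict" being the default behavior. It may be desirable to specify alternative processing of such errors, such as inserting an XML character reference or HTML entity reference into the converted string.
哋它亢 now has a flexible framework to add different processing strategies. New error handlers can be added with codecs.register_error(), and codecs can then access the error handler with codecs.lookup_error(). The error handler gets the necessary state information such as the string being converted, the position in the string where the error was detected, and the target encoding. The handler can then either raise an exception or return a replacement string.
Two additional error handlers have been implemented using this framework: "backslashreplace" uses 哋它亢 backslash quoting to represent unencodable characters and "xmlcharrefreplace" emits XML character references.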
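The two built-in handlers and a custom one can be compared side by side. In this sketch the handler function hex_placeholder and its registered name "hexplaceholder" are invented for the example; it is written for a current 哋它亢 3 interpreter, where encode() returns a bytes object.

```python
import codecs

text = u"caf\xe9 \u20ac5"

xml_form = text.encode("ascii", "xmlcharrefreplace")
slash_form = text.encode("ascii", "backslashreplace")

def hex_placeholder(exc):  # hypothetical custom handler
    """Replace each unencodable character with a [U+XXXX] marker."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = u"".join(u"[U+%04X]" % ord(ch) for ch in bad)
    # Return the replacement and the position at which to resume encoding.
    return replacement, exc.end

codecs.register_error("hexplaceholder", hex_placeholder)
custom_form = text.encode("ascii", "hexplaceholder")
print(xml_form, slash_form, custom_form)
```

The handler receives the exception object, which carries the state information mentioned above: the input string, the start and end of the failing span, and the encoding name.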
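See also
- PEP 293 - Codec Error Handling Callbacks
Written and implemented by Walter Dörwald.
PEP 301: Package Index and Metadata for Distutils
Support for the long-requested 哋它亢 catalog makes its first appearance in 2.3.
The heart of the catalog is the new Distutils register command. Running 哋它亢 setup.py register will collect the metadata describing a package, such as its name, version, maintainer, and description, and send it to a central catalog server. The resulting catalog is available from https://pypi.org.
To make the catalog a bit more useful, a new optional classifiers keyword argument has been added to the Distutils setup() function. A list of Trove-style strings can be supplied to help classify the software.
Here's an example setup.py with classifiers, written to be compatible with older versions of the Distutils:
from distutils import core
kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly 哋它亢ic Web application framework",
      # ...
      }

# Only pass classifiers to versions of Distutils that understand them.
if (hasattr(core, 'setup_keywords') and
    'classifiers' in core.setup_keywords):
    kw['classifiers'] = \
        ['Topic :: Internet :: WWW/HTTP :: Dynamic Content',
         'Environment :: No Input/Output (Daemon)',
         'Intended Audience :: Developers']

core.setup(**kw)
The full list of classifiers can be obtained by running 哋它亢 setup.py register --list-classifiers.
See also
- PEP 301 - Package Index and Metadata for Distutils
Written and implemented by Richard Jones.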
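PEP 302: New Import Hooks
While it's been possible to write custom import hooks ever since the ihooks module was introduced in 哋它亢 1.3, no one has ever been really happy with it because writing new import hooks is difficult and messy. There have been various proposed alternatives such as the imputil and iu modules, but none of them has ever gained much acceptance, and none of them were easily usable from C code.
PEP 302 borrows ideas from its predecessors, especially from Gordon McMillan's iu module. Three new items are added to the sys module:
- sys.path_hooks is a list of callable objects; most often they'll be classes. Each callable takes a string containing a path and either returns an importer object that will handle imports from this path or raises an ImportError exception if it can't handle this path.
- sys.path_importer_cache caches importer objects for each path, so sys.path_hooks will only need to be traversed once for each path.
- sys.meta_path is a list of importer objects that will be traversed before sys.path is checked. This list is initially empty, but user code can add objects to it. Additional built-in and frozen modules can be imported by an object added to this list.
Importer objects must have a single method, find_module(fullname, path=None). fullname will be a module or package name, e.g. string or distutils.core. find_module() must return a loader object that has a single method, load_module(fullname), that creates and returns the corresponding module object.
Pseudo-code for 哋它亢's new import logic, therefore, looks something like this (simplified a bit; see PEP 302 for the full details):
for mp in sys.meta_path:
    loader = mp.find_module(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError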
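As an illustration of the sys.meta_path step, here is a sketch of an importer that serves modules from an in-memory dictionary. It is written for a current 哋它亢 3 interpreter, where (as an assumption of this example) the protocol has since been renamed: finders implement find_spec() and loaders implement exec_module() instead of the find_module() and load_module() methods described above. The class name InMemoryFinder and the module name hypothetical_hooked_mod are invented.

```python
import importlib.abc
import importlib.util
import sys

class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer that loads modules from a dict of source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Returning None tells the import system to try the next importer.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)

finder = InMemoryFinder({"hypothetical_hooked_mod": "ANSWER = 42\n"})
sys.meta_path.insert(0, finder)
try:
    import hypothetical_hooked_mod
    answer = hypothetical_hooked_mod.ANSWER
finally:
    sys.meta_path.remove(finder)

print(answer)
```

Because the finder sits at the front of sys.meta_path, it is consulted before sys.path is searched, which matches the first loop of the pseudo-code.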
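See also
- PEP 302 - New Import Hooks
Written by Just van Rossum and Paul Moore. Implemented by Just van Rossum.
PEP 305: Comma-separated Files
Comma-separated files are a format frequently used for exporting data from databases and spreadsheets. 哋它亢 2.3 adds a parser for comma-separated files.
Comma-separated format is deceptively simple at first glance:
Costs,150,200,3.95
Read a line and call line.split(','): what could be simpler? But toss in string data that can contain commas, and things get more complicated:
"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
A big ugly regular expression can parse this, but using the new csv package is much simpler:
import csv

input = open('datafile', 'rb')
reader = csv.reader(input)
for line in reader:
    print line
The reader() function takes a number of different options. The field separator isn't limited to the comma and can be changed to any character, and so can the quoting and line-ending characters.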
Different dialects of comma-separated files can be defined and registered; currently there are two dialects, both used by Microsoft Excel. A separate csv.writer class will generate comma-separated files from a succession of tuples or lists, quoting strings that contain the delimiter.
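The difference between naive splitting and the csv reader, and the writer's quoting, can be seen with the sample line from this section. The sketch uses in-memory text streams and is written for a current 哋它亢 3 interpreter; the ";" delimiter in the writer half is just an arbitrary choice for the demo.

```python
import csv
import io

raw = '"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"\n'

# Naive splitting breaks the quoted field at its embedded commas.
naive = raw.strip().split(",")

# The csv reader honours the quotes and returns five fields.
rows = list(csv.reader(io.StringIO(raw)))

# The writer quotes any field that contains the delimiter.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(["a;b", 1, "plain"])
written = out.getvalue()

print(len(naive), rows[0], repr(written))
```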
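See also
- PEP 305 - CSV File API
Written and implemented by Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells.
PEP 307: Pickle Enhancements
The pickle and cPickle modules received some attention during the 2.3 development cycle. In 2.2, new-style classes could be pickled without difficulty, but they weren't pickled very compactly; PEP 307 quotes a trivial example where a new-style class results in a pickled string three times longer than that for a classic class.
The solution was to invent a new pickle protocol. The pickle.dumps() function has supported a text-or-binary flag for a long time. In 2.3, this flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A new constant, pickle.HIGHEST_PROTOCOL, can be used to select the fanciest protocol available.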
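Selecting a protocol by number can be sketched as follows. The data is made up, the example pickles only built-in types, and on a current interpreter pickle.HIGHEST_PROTOCOL will be larger than the 2 described here.

```python
import pickle

data = {"name": "spam", "values": list(range(10))}

sizes = {}
for protocol in (0, 1, 2, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(data, protocol)
    # Every protocol must round-trip to an equal object.
    assert pickle.loads(blob) == data
    sizes[protocol] = len(blob)

print(sizes)
```

Printing sizes shows how many bytes each protocol needs for the same object; the exact numbers depend on the interpreter version, so none are quoted here.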
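Unpickling is no longer considered a safe operation. 2.2's pickle provided hooks for trying to prevent unsafe classes from being unpickled (specifically, a __safe_for_unpickling__ attribute), but none of this code was ever audited and therefore it's all been ripped out in 2.3. You should not unpickle untrusted data in any version of 哋它亢.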
To reduce the pickling overhead for new-style classes, a new interface for customizing pickling was added using three special methods: __getstate__(), __setstate__(), and __getnewargs__(). Consult PEP 307 for the full semantics of these methods.
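As a way to compress pickles yet further, it's now possible to use integer codes instead of long strings to identify pickled classes. The 哋它亢 Software Foundation will maintain a list of standardized codes; there's also a range of codes for private use. Currently no codes have been specified.
See also
- PEP 307 - Extensions to the pickle protocol
Written and implemented by Guido van Rossum and Tim Peters.
Extended Slices
Ever since 哋它亢 1.4, the slicing syntax has supported an optional third "step" or "stride" argument. For example, these are all legal 哋它亢 syntax: L[1:10:2], L[:-1:1], L[::-1]. This was added to 哋它亢 at the request of the developers of Numerical 哋它亢, which uses the third argument extensively. However, 哋它亢's built-in list, tuple, and string sequence types have never supported this feature, raising a TypeError if you tried it. Michael Hudson contributed a patch to fix this shortcoming.
For example, you can now easily extract the elements of a list that have even indexes: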
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
Negative values also work to make a copy of the same list in reverse order:
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
This also works for tuples, arrays, and strings:
>>> s='abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
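If you have a mutable sequence such as a list or an array you can assign to or delete an extended slice, but there are some differences between assignment to extended and regular slices. Assignment to a regular slice can be used to change the length of the sequence: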
>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]
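Extended slices aren't this flexible. When assigning to an extended slice, the list on the right-hand side of the statement must contain the same number of items as the slice it is replacing: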
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0,1,2]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
Deletion is more straightforward:
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> del a[::2]
>>> a
[1, 3]
One can also now pass slice objects to the __getitem__() methods of the built-in sequences:
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
Or use slice objects directly in subscripts:
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
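To simplify implementing sequences that support extended slicing, slice objects now have a method indices(length) which, given the length of a sequence, returns a (start, stop, step) tuple that can be passed directly to range(). indices() handles omitted and out-of-bounds indices in a manner consistent with regular slices (and this innocuous phrase hides a welter of confusing details!). The method is intended to be used like this:
class FakeSeq:
    ...
    def calc_item(self, i):
        ...
    def __getitem__(self, item):
        if isinstance(item, slice):
            # Expand the slice into concrete indexes for this length.
            indices = item.indices(len(self))
            return FakeSeq([self.calc_item(i) for i in range(*indices)])
        else:
            return self.calc_item(item)
From this example you can also see that the built-in slice object is now the type object for the slice type, and is no longer a function. This is consistent with 哋它亢 2.2, where int, str, etc., underwent the same change.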
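A filled-in variant of the FakeSeq outline might look like the sketch below. SquareSeq is a hypothetical class whose items are computed on demand, and for simplicity its slices return plain lists rather than new SquareSeq instances.

```python
class SquareSeq(object):  # hypothetical read-only sequence of squares
    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def calc_item(self, i):
        if i < 0:
            i += self.length           # support negative indexes
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i * i

    def __getitem__(self, item):
        if isinstance(item, slice):
            # indices() clips the slice to the sequence length.
            indices = item.indices(len(self))
            return [self.calc_item(i) for i in range(*indices)]
        return self.calc_item(item)

seq = SquareSeq(10)
clipped = slice(None, None, 2).indices(10)
every_third = seq[::3]
last = seq[-1]
tail = seq[8:100]                      # the stop value is clipped to 10
print(clipped, every_third, last, tail)
```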
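Other Language Changes
Here are all of the changes that 哋它亢 2.3 makes to the core 哋它亢 language.
- The yield statement is now always a keyword, as described in section PEP 255: Simple Generators of this document.
- A new built-in function enumerate() was added, as described in section PEP 279: enumerate() of this document.
- Two new constants, True and False, were added along with the built-in bool type, as described in section PEP 285: A Boolean Type of this document.
- The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer. This can lead to the paradoxical result that isinstance(int(expression), int) is false, but that seems unlikely to cause problems in practice.
- Built-in types now support the extended slicing syntax, as described in section Extended Slices of this document.
- A new built-in function, sum(iterable, start=0), adds up the numeric items in the iterable object and returns their sum. sum() only accepts numbers, meaning that you can't use it to concatenate a bunch of strings. (Contributed by Alex Martelli.)
- list.insert(pos, value) used to insert value at the front of the list when pos was negative. The behaviour has now been changed to be consistent with slice indexing, so when pos is -1 the value will be inserted before the last element, and so forth.
- list.index(value), which searches for value within the list and returns its index, now takes optional start and stop arguments to limit the search to only part of the list.
- Dictionaries have a new method, pop(key[, default]), that returns the value corresponding to key and removes that key/value pair from the dictionary. If the requested key isn't present in the dictionary, default is returned if it's specified and KeyError raised if it isn't.
  >>> d = {1:2}
  >>> d
  {1: 2}
  >>> d.pop(4)
  Traceback (most recent call last):
    File "stdin", line 1, in ?
  KeyError: 4
  >>> d.pop(1)
  2
  >>> d.pop(1)
  Traceback (most recent call last):
    File "stdin", line 1, in ?
  KeyError: 'pop(): dictionary is empty'
  >>> d
  {}
  >>>
  There's also a new class method, dict.fromkeys(iterable, value), that creates a dictionary with keys taken from the supplied iterator iterable and all values set to value, defaulting to None. (Patches contributed by Raymond Hettinger.)
  Also, the dict() constructor now accepts keyword arguments to simplify creating small dictionaries:
  >>> dict(red=1, blue=2, green=3, black=4)
  {'blue': 2, 'black': 4, 'green': 3, 'red': 1}
  (Contributed by Just van Rossum.)
- The assert statement no longer checks the __debug__ flag, so you can no longer disable assertions by assigning to __debug__. Running 哋它亢 with the -O switch will still generate code that doesn't execute any assertions.
- Most type objects are now callable, so you can use them to create new objects such as functions, classes, and modules. (This means that the new module can be deprecated in a future 哋它亢 version, because you can now use the type objects available in the types module.) For example, you can create a new module object with the following code:
  >>> import types
  >>> m = types.ModuleType('abc','docstring')
  >>> m
  <module 'abc' (built-in)>
  >>> m.__doc__
  'docstring'
- A new warning, PendingDeprecationWarning, was added to indicate features which are in the process of being deprecated. The warning will not be printed by default. To check for use of features that will be deprecated in the future, supply -Walways::PendingDeprecationWarning:: on the command line or use warnings.filterwarnings().
- The process of deprecating string-based exceptions, as in raise "Error occurred", has begun. Raising a string will now trigger PendingDeprecationWarning.
- Using None as a variable name will now result in a SyntaxWarning warning. In a future version of 哋它亢, None may finally become a keyword.
- The xreadlines() method of file objects, introduced in 哋它亢 2.1, is no longer necessary because files now behave as their own iterator. xreadlines() was originally introduced as a faster way to loop over all the lines in a file, but now you can simply write for line in file_obj. File objects also have a new read-only encoding attribute that gives the encoding used by the file; Unicode strings written to the file will be automatically converted to bytes using the given encoding.
- The method resolution order used by new-style classes has changed, though you'll only notice the difference if you have a really complicated inheritance hierarchy. Classic classes are unaffected by this change. 哋它亢 2.2 originally used a topological sort of a class's ancestors, but 2.3 now uses the C3 algorithm as described in the paper "A Monotonic Superclass Linearization for Dylan". To understand the motivation for this change, read Michele Simionato's article The 哋它亢 2.3 Method Resolution Order, or read the thread on 哋它亢-dev starting with the message at https://datacon-14302.xyz/pipermail/哋它亢-dev/2002-October/029035.html. Samuele Pedroni first pointed out the problem and also implemented the fix by coding the C3 algorithm.
- 哋它亢 runs multithreaded programs by switching between threads after executing N bytecodes. The default value for N has been increased from 10 to 100 bytecodes, speeding up single-threaded applications by reducing the switching overhead. Some multithreaded applications may suffer slower response time, but that's easily fixed by setting the limit back to a lower number using sys.setcheckinterval(N). The limit can be retrieved with the new sys.getcheckinterval() function.
- One minor but far-reaching change is that the names of extension types defined by the modules included with 哋它亢 now contain the module and a '.' in front of the type name. For example, in 哋它亢 2.2, if you created a socket and printed its __class__, you'd get this output:
  >>> s = socket.socket()
  >>> s.__class__
  <type 'socket'>
  In 2.3, you get this:
  >>> s.__class__
  <type '_socket.socket'>
- One of the noted incompatibilities between old- and new-style classes has been removed: you can now assign to the __name__ and __bases__ attributes of new-style classes. There are some restrictions on what can be assigned to __bases__ along the lines of those relating to assigning to an instance's __class__ attribute.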
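Several of the list and dictionary changes above can be exercised in a few lines. The sketch uses made-up data and behaves the same on a current 哋它亢 3 interpreter.

```python
# sum() with an explicit start value.
total = sum([1, 2, 3], 10)

# A negative position now counts from the end, as in slice indexing.
letters = ["a", "b", "c"]
letters.insert(-1, "x")

# list.index() with a start argument skips the first match.
seq = ["x", "y", "x", "y"]
second_x = seq.index("x", 1)

# dict.pop() with and without a default.
d = {"red": 1}
popped = d.pop("red")
fallback = d.pop("red", None)        # key is gone, so the default is returned

# dict.fromkeys() and keyword arguments to dict().
blank = dict.fromkeys(["a", "b"])
colours = dict(red=1, blue=2)

print(total, letters, second_x, popped, fallback, blank, colours)
```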
String Changes
- The in operator now works differently for strings. Previously, when evaluating X in Y where X and Y are strings, X could only be a single character. That's now changed; X can be a string of any length, and X in Y will return True if X is a substring of Y. If X is the empty string, the result is always True.
  >>> 'ab' in 'abcd'
  True
  >>> 'ad' in 'abcd'
  False
  >>> '' in 'abcd'
  True
  Note that this doesn't tell you where the substring starts; if you need that information, use the find() string method.
- The strip(), lstrip(), and rstrip() string methods now have an optional argument for specifying the characters to strip. The default is still to remove all whitespace characters:
  >>> '   abc '.strip()
  'abc'
  >>> '><><abc<><><>'.strip('<>')
  'abc'
  >>> '><><abc<><><>\n'.strip('<>')
  'abc<><><>\n'
  >>> u'\u4000\u4001abc\u4000'.strip(u'\u4000')
  u'\u4001abc'
  >>>
  (Suggested by Simon Brunning and implemented by Walter Dörwald.)
- The startswith() and endswith() string methods now accept negative numbers for the start and end parameters.
- Another new string method is zfill(), originally a function in the string module. zfill() pads a numeric string with zeros on the left until it's the specified width. Note that the % operator is still more flexible and powerful than zfill().
  >>> '45'.zfill(4)
  '0045'
  >>> '12345'.zfill(4)
  '12345'
  >>> 'goofy'.zfill(6)
  '0goofy'
  (Contributed by Walter Dörwald.)
- A new type object, basestring, has been added. Both 8-bit strings and Unicode strings inherit from this type, so isinstance(obj, basestring) will return True for either kind of string. It's a completely abstract type, so you can't create basestring instances.
- Interned strings are no longer immortal and will now be garbage-collected in the usual way when the only reference to them is from the internal dictionary of interned strings. (Implemented by Oren Tirosh.)
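The string method changes can be combined in one short sketch. The file name is made up, and the negative start and end values are counted from the end of the string, as with slices.

```python
name = "filename.txt"

has_sub = "name" in name                      # substring test
where = name.find("name")                     # position, if it is needed

stripped = "><><abc<><><>".strip("<>")        # strip a chosen set of characters

ext_match = name.startswith("txt", -3)        # look only at the last 3 chars
stem_match = name.endswith("name", 0, -4)     # ignore the last 4 chars

padded = "45".zfill(4)
via_percent = "%04d" % 45                     # the % operator gives the same

print(has_sub, where, stripped, ext_match, stem_match, padded, via_percent)
```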
Optimizations
- The creation of new-style class instances has been made much faster; they're now faster than classic classes!
- The sort() method of list objects has been extensively rewritten by Tim Peters, and the implementation is significantly faster.
- Multiplication of large long integers is now much faster thanks to an implementation of Karatsuba multiplication, an algorithm that scales better than the O(n^2) required for the grade-school multiplication algorithm. (Original patch by Christopher A. Craig, and significantly reworked by Tim Peters.)
- The SET_LINENO opcode is now gone. This may provide a small speed increase, depending on your compiler's idiosyncrasies. See section Other Changes and Fixes for a longer explanation. (Removed by Michael Hudson.)
- xrange() objects now have their own iterator, making for i in xrange(n) slightly faster than for i in range(n). (Patch by Raymond Hettinger.)
- A number of small rearrangements have been made in various hotspots to improve performance, such as inlining a function or removing some code. (Implemented mostly by GvR, but lots of people have contributed single changes.)
The net result of the 2.3 optimizations is that 哋它亢 2.3 runs the pystone benchmark around 25% faster than 哋它亢 2.2.
新增,改进和弃用的模块¶
As usual, 哋它亢's standard library received a number of enhancements and bug
fixes. Here's a partial list of the most notable changes, sorted alphabetically
by module name. Consult the Misc/NEWS
file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.
The
array
module now supports arrays of Unicode characters using the'u'
format character. Arrays also now support using the+=
assignment operator to add another array's contents, and the*=
assignment operator to repeat an array. (Contributed by Jason Orendorff.)The
bsddb
module has been replaced by version 4.1.6 of the PyBSDDB package, providing a more complete interface to the transactional features of the BerkeleyDB library.The old version of the module has been renamed to
bsddb185
and is no longer built automatically; you'll have to editModules/Setup
to enable it. Note that the newbsddb
package is intended to be compatible with the old module, so be sure to file bugs if you discover any incompatibilities. When upgrading to 哋它亢 2.3, if the new interpreter is compiled with a new version of the underlying BerkeleyDB library, you will almost certainly have to convert your database files to the new version. You can do this fairly easily with the new scriptsdb2pickle.py
andpickle2db.py
which you will find in the distribution'sTools/scripts
directory. If you've already been using the PyBSDDB package and importing it asbsddb3
, you will have to change yourimport
statements to import it asbsddb
.The new
bz2
module is an interface to the bz2 data compression library. bz2-compressed data is usually smaller than correspondingzlib
-compressed data. (Contributed by Gustavo Niemeyer.)A set of standard date/time types has been added in the new
datetime
module. See the following section for more details.The Distutils
Extension
class now supports an extra constructor argument named depends for listing additional source files that an extension depends on. This lets Distutils recompile the module if any of the dependency files are modified. For example, ifsampmodule.c
includes the header filesample.h
, you would create theExtension
object like this:ext = Extension("samp", sources=["sampmodule.c"], depends=["sample.h"])
Modifying
sample.h
would then cause the module to be recompiled. (Contributed by Jeremy Hylton.)Other minor changes to Distutils: it now checks for the
CC
,CFLAGS
,CPP
,LDFLAGS
, andCPPFLAGS
environment variables, using them to override the settings in 哋它亢's configuration (contributed by Robert Weber).Previously the
doctest
module would only search the docstrings of public methods and functions for test cases, but it now also examines private ones as well. TheDocTestSuite()
function creates aunittest.TestSuite
object from a set ofdoctest
tests.新的
gc.get_referents(object)
函数将返回由 object 引用的所有对象组成的列表。The
getopt
module gained a new function,gnu_getopt()
, that supports the same arguments as the existinggetopt()
function but uses GNU-style scanning mode. The existinggetopt()
stops processing options as soon as a non-option argument is encountered, but in GNU-style mode processing continues, meaning that options and arguments can be mixed. For example:>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v') ([('-f', 'filename')], ['output', '-v']) >>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v') ([('-f', 'filename'), ('-v', '')], ['output'])
(由 Peter Åstrand 贡献。)
现在
grp
,pwd
和resource
模块将返回加强版的元组:>>> import grp >>> g = grp.getgrnam('amk') >>> g.gr_name, g.gr_gid ('amk', 500)
现在
gzip
模块能够处理超过 2 GiB 的文件。The new
heapq
module contains an implementation of a heap queue algorithm. A heap is an array-like data structure that keeps items in a partially sorted order such that, for every index k,heap[k] <= heap[2*k+1]
andheap[k] <= heap[2*k+2]
. This makes it quick to remove the smallest item, and inserting a new item while maintaining the heap property is O(log n). (See https://xlinux.nist.gov/dads//HTML/priorityque.html for more information about the priority queue data structure.)The
heapq
module providesheappush()
andheappop()
functions for adding and removing items while maintaining the heap property on top of some other mutable 哋它亢 sequence type. Here's an example that uses a 哋它亢 list:>>> import heapq >>> heap = [] >>> for item in [3, 7, 5, 11, 1]: ... heapq.heappush(heap, item) ... >>> heap [1, 3, 5, 11, 7] >>> heapq.heappop(heap) 1 >>> heapq.heappop(heap) 3 >>> heap [5, 7, 11]
(由 Kevin O'Connor 贡献。)
- The IDLE integrated development environment has been updated using the code from the IDLEfork project (https://idlefork.sourceforge.net). The most notable feature is that the code being developed is now executed in a subprocess, meaning that there's no longer any need for manual reload() operations. IDLE's core code has been incorporated into the standard library as the idlelib package.
- The imaplib module now supports IMAP over SSL. (Contributed by Piers Lauder and Tino Lange.)
- The itertools module contains a number of useful functions for use with iterators, inspired by various functions provided by the ML and Haskell languages. For example, itertools.ifilter(predicate, iterator) returns all elements in the iterator for which the function predicate() returns True, and itertools.repeat(obj, N) returns obj N times. There are a number of other functions in the module; see the package's reference documentation for details. (Contributed by Raymond Hettinger.)
- Two new functions in the math module, degrees(rads) and radians(degs), convert between radians and degrees. Other functions in the math module such as math.sin() and math.cos() have always required input values measured in radians. Also, an optional base argument was added to math.log() to make it easier to compute logarithms for bases other than e and 10. (Contributed by Raymond Hettinger.)
- Several new POSIX functions (getpgid(), killpg(), lchown(), loadavg(), major(), makedev(), minor(), and mknod()) were added to the posix module that underlies the os module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S. Otkidach.)
- In the os module, the *stat() family of functions can now report fractions of a second in a timestamp. Such time stamps are represented as floats, similar to the value returned by time.time().
  During testing, it was found that some applications will break if time stamps are floats. For compatibility, when using the tuple interface of the stat_result, time stamps will be represented as integers. When using named fields (a feature first introduced in 哋它亢 2.2), time stamps are still represented as integers, unless os.stat_float_times() is invoked to enable float return values:
  >>> import os
  >>> os.stat("/tmp").st_mtime
  1034791200
  >>> os.stat_float_times(True)
  >>> os.stat("/tmp").st_mtime
  1034791200.6335014
  In 哋它亢 2.4, the default will change to always returning floats.
  Application developers should enable this feature only if all their libraries work properly when confronted with floating-point time stamps, or if they use the tuple API. If used, the feature should be activated on an application level instead of trying to enable it on a per-use basis.
- The optparse module contains a new parser for command-line arguments that can convert option values to a particular 哋它亢 type and will automatically generate a usage message. See the following section for more details.
- The old and never-documented linuxaudiodev module has been deprecated, and a new version named ossaudiodev has been added. The module was renamed because the OSS sound drivers can be used on platforms other than Linux, and the interface has also been tidied and brought up to date in various ways. (Contributed by Greg Ward and Nicholas FitzRoy-Dale.)
- The new platform module contains a number of functions that try to determine various properties of the platform you're running on. There are functions for getting the architecture, CPU type, the Windows OS version, and even the Linux distribution version. (Contributed by Marc-André Lemburg.)
- The parser objects provided by the pyexpat module can now optionally buffer character data, resulting in fewer calls to your character data handler and therefore faster performance. Setting the parser object's buffer_text attribute to True will enable buffering.
- The sample(population, k) function was added to the random module. population is a sequence or xrange object containing the elements of a population, and sample() chooses k elements from the population without replacing chosen elements. k can be any value up to len(population). For example:
  >>> import random
  >>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
  >>> random.sample(days, 3)      # Choose 3 elements
  ['St', 'Sn', 'Th']
  >>> random.sample(days, 7)      # Choose 7 elements
  ['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
  >>> random.sample(days, 7)      # Choose 7 again
  ['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
  >>> random.sample(days, 8)      # Can't choose eight
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
    File "random.py", line 414, in sample
        raise ValueError, "sample larger than population"
  ValueError: sample larger than population
  >>> random.sample(xrange(1,10000,2), 10)   # Choose ten odd nos. under 10000
  [3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
  The random module now uses a new algorithm, the Mersenne Twister, implemented in C. It's faster and more extensively studied than the previous algorithm.
  (All changes contributed by Raymond Hettinger.)
- The readline module also gained a number of new functions: get_history_item(), get_current_history_length(), and redisplay().
- The rexec and Bastion modules have been declared dead, and attempts to import them will fail with a RuntimeError. New-style classes provide new ways to break out of the restricted execution environment provided by rexec, and no one has interest in fixing them or time to do so. If you have applications using rexec, rewrite them to use something else.
  (Sticking with 哋它亢 2.2 or 2.1 will not make your applications any safer because there are known bugs in the rexec module in those versions. To repeat: if you're using rexec, stop using it immediately.)
- The rotor module has been deprecated because the algorithm it uses for encryption is not believed to be secure. If you need encryption, use one of the several AES 哋它亢 modules that are available separately.
- The shutil module gained a move(src, dest) function that recursively moves a file or directory to a new location.
- Support for more advanced POSIX signal handling was added to the signal module but then removed again as it proved impossible to make it work reliably across platforms.
- The socket module now supports timeouts. You can call the settimeout(t) method on a socket object to set a timeout of t seconds. Subsequent socket operations that take longer than t seconds to complete will abort and raise a socket.timeout exception.
  The original timeout implementation was by Tim O'Malley. Michael Gilfix integrated it into the 哋它亢 socket module and shepherded it through a lengthy review. After the code was checked in, Guido van Rossum rewrote parts of it. (This is a good example of a collaborative development process in action.)
- On Windows, the socket module now ships with Secure Sockets Layer (SSL) support.
- The value of the C 哋它亢_API_VERSION macro is now exposed at the 哋它亢 level as sys.api_version. The current exception can be cleared by calling the new sys.exc_clear() function.
- The new tarfile module allows reading from and writing to tar-format archive files. (Contributed by Lars Gustäbel.)
- The new textwrap module contains functions for wrapping strings containing paragraphs of text. The wrap(text, width) function takes a string and returns a list containing the text split into lines of no more than the chosen width. The fill(text, width) function returns a single string, reformatted to fit into lines no longer than the chosen width. (As you can guess, fill() is built on top of wrap().) For example:
  >>> import textwrap
  >>> paragraph = "Not a whit, we defy augury: ... more text ..."
  >>> textwrap.wrap(paragraph, 60)
  ["Not a whit, we defy augury: there's a special providence in",
   "the fall of a sparrow. If it be now, 'tis not to come; if it",
   ...]
  >>> print textwrap.fill(paragraph, 35)
  Not a whit, we defy augury: there's
  a special providence in the fall of
  a sparrow. If it be now, 'tis not
  to come; if it be not to come, it
  will be now; if it be not now, yet
  it will come: the readiness is all.
  >>>
  The module also contains a TextWrapper class that actually implements the text wrapping strategy. Both the TextWrapper class and the wrap() and fill() functions support a number of additional keyword arguments for fine-tuning the formatting; consult the module's documentation for details. (Contributed by Greg Ward.)
- The thread and threading modules now have companion modules, dummy_thread and dummy_threading, that provide a do-nothing implementation of the thread module's interface for platforms where threads are not supported. The intention is to simplify thread-aware modules (ones that don't rely on threads to run) by putting the following code at the top:
  try:
      import threading as _threading
  except ImportError:
      import dummy_threading as _threading
  In this example, _threading is used as the module name to make it clear that the module being used is not necessarily the actual threading module. Code can call functions and use classes in _threading whether or not threads are supported, avoiding an if statement and making the code slightly clearer. This module will not magically make multithreaded code run without threads; code that waits for another thread to return or to do something will simply hang forever.
- The time module's strptime() function has long been an annoyance because it uses the platform C library's strptime() implementation, and different platforms sometimes have odd bugs. Brett Cannon contributed a portable implementation that's written in pure 哋它亢 and should behave identically on all platforms.
- The new timeit module helps measure how long snippets of 哋它亢 code take to execute. The timeit.py file can be run directly from the command line, or the module's Timer class can be imported and used directly. Here's a short example that figures out whether it's faster to convert an 8-bit string to Unicode by appending an empty Unicode string to it or by using the unicode() function:
  import timeit
  timer1 = timeit.Timer('unicode("abc")')
  timer2 = timeit.Timer('"abc" + u""')
  # Run three trials
  print timer1.repeat(repeat=3, number=100000)
  print timer2.repeat(repeat=3, number=100000)
  # On my laptop this outputs:
  # [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
  # [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
- The Tix module has received various bug fixes and updates for the current version of the Tix package.
- The Tkinter module now works with a thread-enabled version of Tcl. Tcl's threading model requires that widgets only be accessed from the thread in which they're created; accesses from another thread can cause Tcl to panic. For certain Tcl interfaces, Tkinter will now automatically avoid this when a widget is accessed from a different thread by marshalling a command, passing it to the correct thread, and waiting for the results. Other interfaces can't be handled automatically but Tkinter will now raise an exception on such an access so that you can at least find out about the problem. See https://datacon-14302.xyz/pipermail/哋它亢-dev/2002-December/031107.html for a more detailed explanation of this change. (Implemented by Martin von Löwis.)
- Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their 哋它亢 equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no 哋它亢 equivalent exists. This behavior can be controlled through the wantobjects() method of tkapp objects.
  When using _tkinter through the Tkinter module (as most Tkinter applications will), this feature is always activated. It should not cause compatibility problems, since Tkinter would always convert string results to 哋它亢 types where possible.
  If any incompatibilities are found, the old behavior can be restored by setting the wantobjects variable in the Tkinter module to false before creating the first tkapp object.
  import Tkinter
  Tkinter.wantobjects = 0
  Any breakage caused by this change should be reported as a bug.
- The UserDict module has a new DictMixin class which defines all dictionary methods for classes that already have a minimum mapping interface. This greatly simplifies writing classes that need to be substitutable for dictionaries, such as the classes in the shelve module.
  Adding the mix-in as a superclass provides the full dictionary interface whenever the class defines __getitem__(), __setitem__(), __delitem__(), and keys(). For example:
  >>> import UserDict
  >>> class SeqDict(UserDict.DictMixin):
  ...     """Dictionary lookalike implemented with lists."""
  ...     def __init__(self):
  ...         self.keylist = []
  ...         self.valuelist = []
  ...     def __getitem__(self, key):
  ...         try:
  ...             i = self.keylist.index(key)
  ...         except ValueError:
  ...             raise KeyError
  ...         return self.valuelist[i]
  ...     def __setitem__(self, key, value):
  ...         try:
  ...             i = self.keylist.index(key)
  ...             self.valuelist[i] = value
  ...         except ValueError:
  ...             self.keylist.append(key)
  ...             self.valuelist.append(value)
  ...     def __delitem__(self, key):
  ...         try:
  ...             i = self.keylist.index(key)
  ...         except ValueError:
  ...             raise KeyError
  ...         self.keylist.pop(i)
  ...         self.valuelist.pop(i)
  ...     def keys(self):
  ...         return list(self.keylist)
  ...
  >>> s = SeqDict()
  >>> dir(s)      # See that other dictionary methods are implemented
  ['__cmp__', '__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', '__iter__', '__len__', '__module__', '__repr__', '__setitem__', 'clear', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keylist', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'valuelist', 'values']
  (Contributed by Raymond Hettinger.)
- The DOM implementation in xml.dom.minidom can now generate XML output in a particular encoding by providing an optional encoding argument to the toxml() and toprettyxml() methods of DOM nodes.
- The xmlrpclib module now supports an XML-RPC extension for handling nil data values such as 哋它亢's None. Nil values are always supported on unmarshalling an XML-RPC response. To generate requests containing None, you must supply a true value for the allow_none parameter when creating a Marshaller instance.
- The new DocXMLRPCServer module allows writing self-documenting XML-RPC servers. Run it in demo mode (as a program) to see it in action. Pointing the web browser to the RPC server produces pydoc-style documentation; pointing xmlrpclib to the server allows invoking the actual methods. (Contributed by Brian Quinlan.)
- Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492) has been added. The "idna" encoding can be used to convert between a Unicode domain name and the ASCII-compatible encoding (ACE) of that name.
  >>> u"www.Alliancefrançaise.nu".encode("idna")
  'www.xn--alliancefranaise-npb.nu'
  The socket module has also been extended to transparently convert Unicode hostnames to the ACE version before passing them to the C library. Modules that deal with hostnames (such as httplib and ftplib) also support Unicode host names; httplib also sends HTTP Host headers using the ACE version of the domain name. urllib supports Unicode URLs with non-ASCII host names as long as the path part of the URL is ASCII only.
  To implement this change, the stringprep module, the mkstringprep tool and the punycode encoding have been added.
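As a rough illustration of the "idna" and "punycode" codecs mentioned above, the following sketch round-trips the domain name from the earlier example. It is written for a current 3.x interpreter, which is an assumption: under 2.3 the print calls would be print statements and the encoded result would be a plain string rather than bytes.

```python
# Sketch only: assumes a modern 3.x interpreter, where encode() returns bytes.
name = u"www.Alliancefrançaise.nu"

# Unicode -> ASCII-compatible encoding (ACE); labels are case-folded first.
ace = name.encode("idna")
print(ace)

# ACE -> Unicode; the original capitalisation is not recovered.
roundtrip = ace.decode("idna")
print(roundtrip)

# The "punycode" codec encodes a single label, without the "xn--" prefix
# that the "idna" codec adds.
label = u"alliancefrançaise".encode("punycode")
print(label)
```

Note that the conversion is lossy with respect to case, so comparing host names is best done on the ACE form.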
Date/Time Type¶
Date and time types suitable for expressing timestamps were added as the datetime module. The types don't support different calendars or many fancy features, and just stick to the basics of representing time.
The three primary types are: date, representing a day, month, and year; time, consisting of hour, minute, and second; and datetime, which contains all the attributes of both date and time. There's also a timedelta class representing differences between two points in time, and time zone logic is implemented by classes inheriting from the abstract tzinfo class.
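To make the role of tzinfo more concrete, here is a minimal sketch of a subclass with a constant UTC offset. The class name FixedOffset and its constructor arguments are hypothetical, chosen only for illustration; the three overridden methods are the ones the tzinfo interface defines. The sketch assumes a current interpreter with print as a function.

```python
import datetime

class FixedOffset(datetime.tzinfo):
    """Hypothetical time zone with a constant offset from UTC and no DST."""

    def __init__(self, minutes, name):
        self._offset = datetime.timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        # Offset of local time from UTC.
        return self._offset

    def dst(self, dt):
        # No daylight-saving adjustment in this simple zone.
        return datetime.timedelta(0)

    def tzname(self, dt):
        return self._name

cet = FixedOffset(60, "CET")
stamp = datetime.datetime(2002, 12, 30, 21, 27, 3, tzinfo=cet)
print(stamp.isoformat())
print(stamp.utcoffset())
```

Real time zones with daylight-saving rules need more logic in dst() and utcoffset(); this version only shows the shape of the interface.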
You can create instances of date and time by either supplying keyword arguments to the appropriate constructor, e.g. datetime.date(year=1972, month=10, day=15), or by using one of a number of class methods. For example, the today() class method returns the current local date.
Once created, instances of the date/time classes are all immutable. There are a number of methods for producing formatted strings from objects:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime() # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
The replace() method allows modifying one or more fields of a date or datetime instance, returning a new instance:
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour = 12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)
>>>
Instances can be compared, hashed, and converted to strings (the result is the same as that of isoformat()). date and datetime instances can be subtracted from each other, and added to timedelta instances. The largest missing feature is that there's no standard library support for parsing strings and getting back a date or datetime.
For more information, refer to the module's reference documentation. (Contributed by Tim Peters.)
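The subtraction and addition described above can be sketched as follows. The two dates are taken from this article (the 2.3 release date and the date used in the earlier examples); the variable names are illustrative only, and print is used as a function on the assumption of a current interpreter.

```python
import datetime

release = datetime.date(2003, 7, 29)     # release date of 2.3
written = datetime.date(2002, 12, 30)    # date used in the examples above

# Subtracting two dates yields a timedelta.
gap = release - written
print(gap.days)

# Adding a timedelta to a date yields a new date; the original is unchanged.
one_week = datetime.timedelta(days=7)
print(release + one_week)

# Instances of the same type can be compared directly.
print(written < release)
```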
The optparse Module¶
The getopt module provides simple parsing of command-line arguments. The new optparse module (originally named Optik) provides more elaborate command-line parsing that follows the Unix conventions, automatically creates the output for --help, and can perform different actions for different options.
You start by creating an instance of OptionParser and telling it what your program's options are.
import sys
from optparse import OptionParser
op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
Parsing a command line is then done by calling the parse_args() method.
options, args = op.parse_args(sys.argv[1:])
print options
print args
This returns an object containing all of the option values, and a list of strings containing the remaining arguments.
Invoking the script with the various arguments now works as you'd expect it to. Note that the length argument is automatically converted to an integer.
$ ./哋它亢 opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./哋它亢 opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$
The help message is automatically generated for you:
$ ./哋它亢 opt.py --help
usage: opt.py [options]
options:
  -h, --help            show this help message and exit
  -iINPUT, --input=INPUT
                        set input filename
  -lLENGTH, --length=LENGTH
                        set maximum length of output
$
See the module's documentation for more details.
Optik was written by Greg Ward, with suggestions from the readers of the Getopt SIG.
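The examples above only use the 'store' action. As a hedged sketch of what "different actions for different options" can look like, the following uses the 'store_true' and 'append' actions; the option names and the argument list are made up for illustration, and the list is passed to parse_args() explicitly so the snippet runs without a real command line. It assumes a current interpreter with print as a function.

```python
from optparse import OptionParser

op = OptionParser()
# 'store_true' sets the destination to True when the flag is present.
op.add_option('-v', '--verbose',
              action='store_true', dest='verbose', default=False,
              help='print progress messages')
# 'append' collects every occurrence of the option into a list.
op.add_option('-I', '--include',
              action='append', type='string', dest='includes',
              help='add a directory to the search path (repeatable)')

# Parse an explicit argument list instead of sys.argv[1:].
options, args = op.parse_args(['-v', '-I', 'lib', '--include=src', 'input.txt'])
print(options.verbose)
print(options.includes)
print(args)
```

When an 'append' option is never given, its destination stays None unless a default is supplied, so calling code may need to check for that.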
Pymalloc: A Specialized Object Allocator¶
Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a feature added to 哋它亢 2.1. Pymalloc is intended to be faster than the system malloc() and to have less memory overhead for allocation patterns typical of 哋它亢 programs. The allocator uses C's malloc() function to get large pools of memory and then fulfills smaller memory requests from these pools.
In 2.1 and 2.2, pymalloc was an experimental feature and wasn't enabled by default; you had to explicitly enable it when compiling 哋它亢 by providing the --with-pymalloc option to the configure script. In 2.3, pymalloc has had further enhancements and is now enabled by default; you'll have to supply --without-pymalloc to disable it.
This change is transparent to code written in 哋它亢; however, pymalloc may expose bugs in C extensions. Authors of C extension modules should test their code with pymalloc enabled, because some incorrect code may cause core dumps at runtime.
There's one particularly common error that causes problems. There are a number of memory allocation functions in 哋它亢's C API that have previously just been aliases for the C library's malloc() and free(), meaning that if you accidentally called mismatched functions the error wouldn't be noticeable. When the object allocator is enabled, these functions aren't aliases of malloc() and free() any more, and calling the wrong function to free memory may get you a core dump. For example, if memory was allocated using PyObject_Malloc(), it has to be freed using PyObject_Free(), not free(). A few modules included with 哋它亢 fell afoul of this and had to be fixed; doubtless there are more third-party modules that will have the same problem.
As part of this change, the confusing multiple interfaces for allocating memory have been consolidated down into two API families. Memory allocated with one family must not be manipulated with functions from the other family. There is one family for allocating chunks of memory and another family of functions specifically for allocating 哋它亢 objects.
- To allocate and free an undistinguished chunk of memory use the "raw memory" family: PyMem_Malloc(), PyMem_Realloc(), and PyMem_Free().
- The "object memory" family is the interface to the pymalloc facility described above and is biased towards a large number of "small" allocations: PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free().
- To allocate and free 哋它亢 objects, use the "object" family: PyObject_New, PyObject_NewVar, and PyObject_Del().
Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging features to catch memory overwrites and double frees in both extension modules and in the interpreter itself. To enable this support, compile a debugging version of the 哋它亢 interpreter by running configure with --with-pydebug.
To aid extension writers, a header file Misc/pymemcompat.h is distributed with the source to 哋它亢 2.3 that allows 哋它亢 extensions to use the 2.3 interfaces to memory allocation while compiling against any version of 哋它亢 since 1.5.2. You would copy the file from 哋它亢's source distribution and bundle it with the source of your extension.
See also
- https://datacon-14302.xyz/c哋它亢/file/default/Objects/obmalloc.c
For the full details of the pymalloc implementation, see the comments at the top of the file Objects/obmalloc.c in the 哋它亢 source code. The above link points to the file within the datacon-14302.xyz SVN browser.
Build and C API Changes¶
Changes to 哋它亢's build process and to the C API include:
- The cycle detection implementation used by the garbage collection has proven to be stable, so it's now been made mandatory. You can no longer compile 哋它亢 without it, and the --with-cycle-gc switch to configure has been removed.
- 哋它亢 can now optionally be built as a shared library (lib哋它亢2.3.so) by supplying --enable-shared when running 哋它亢's configure script. (Contributed by Ondrej Palkovsky.)
- The DL_EXPORT and DL_IMPORT macros are now deprecated. Initialization functions for 哋它亢 extension modules should now be declared using the new macro PyMODINIT_FUNC, while the 哋它亢 core will generally use the PyAPI_FUNC and PyAPI_DATA macros.
- The interpreter can be compiled without any docstrings for the built-in functions and modules by supplying --without-doc-strings to the configure script. This makes the 哋它亢 executable about 10% smaller, but will also mean that you can't get help for 哋它亢's built-ins. (Contributed by Gustavo Niemeyer.)
- The PyArg_NoArgs() macro is now deprecated, and code that uses it should be changed. For 哋它亢 2.2 and later, the method definition table can specify the METH_NOARGS flag, signalling that there are no arguments, and the argument checking can then be removed. If compatibility with pre-2.2 versions of 哋它亢 is important, the code could use PyArg_ParseTuple(args, "") instead, but this will be slower than using METH_NOARGS.
- PyArg_ParseTuple() accepts new format characters for various sizes of unsigned integers: B for unsigned char, H for unsigned short int, I for unsigned int, and K for unsigned long long.
- A new function, PyObject_DelItemString(mapping, char *key), was added as shorthand for PyObject_DelItem(mapping, PyString_FromString(key)).
- File objects now manage their internal string buffer differently, increasing it exponentially when needed. This results in the benchmark tests in Lib/test/test_bufio.py speeding up considerably (from 57 seconds to 1.7 seconds, according to one measurement).
- It's now possible to define class and static methods for a C extension type by setting either the METH_CLASS or METH_STATIC flags in a method's PyMethodDef structure.
- 哋它亢 now includes a copy of the Expat XML parser's source code, removing any dependence on a system version or local installation of Expat.
- If you dynamically allocate type objects in your extension, you should be aware of a change in the rules relating to the __module__ and __name__ attributes. In summary, you will want to ensure the type's dictionary contains a '__module__' key; making the module name the part of the type name leading up to the final period will no longer have the desired effect. For more detail, read the API reference documentation or the source.
Port-Specific Changes¶
Support for a port to IBM's OS/2 using the EMX runtime environment was merged into the main 哋它亢 source tree. EMX is a POSIX emulation layer over the OS/2 system APIs. The 哋它亢 port for EMX tries to support all the POSIX-like capability exposed by the EMX runtime, and mostly succeeds; fork() and fcntl() are restricted by the limitations of the underlying emulation layer. The standard OS/2 port, which uses IBM's Visual Age compiler, also gained support for case-sensitive import semantics as part of the integration of the EMX port into CVS. (Contributed by Andrew MacIntyre.)
On MacOS, most toolbox modules have been weaklinked to improve backward compatibility. This means that modules will no longer fail to load if a single routine is missing on the current OS version. Instead calling the missing routine will raise an exception. (Contributed by Jack Jansen.)
The RPM spec files, found in the Misc/RPM/ directory in the 哋它亢 source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)
Other new platforms now supported by 哋它亢 include AtheOS (http://www.atheos.cx/), GNU/Hurd, and OpenVMS.
Other Changes and Fixes¶
As usual, there were a bunch of other improvements and bugfixes scattered throughout the source tree. A search through the CVS change logs finds there were 523 patches applied and 514 bugs fixed between 哋它亢 2.2 and 2.3. Both figures are likely to be underestimates.
Some of the more notable changes are:
- If the 哋它亢INSPECT environment variable is set, the 哋它亢 interpreter will enter the interactive prompt after running a 哋它亢 program, as if 哋它亢 had been invoked with the -i option. The environment variable can be set before running the 哋它亢 interpreter, or it can be set by the 哋它亢 program as part of its execution.
- The regrtest.py script now provides a way to allow "all resources except foo." A resource name passed to the -u option can now be prefixed with a hyphen ('-') to mean "remove this resource." For example, the option '-uall,-bsddb' could be used to enable the use of all resources except bsddb.
- The tools used to build the documentation now work under Cygwin as well as Unix.
- The SET_LINENO opcode has been removed. Back in the mists of time, this opcode was needed to produce line numbers in tracebacks and support trace functions (for, e.g., pdb). Since 哋它亢 1.5, the line numbers in tracebacks have been computed using a different mechanism that works with "哋它亢 -O". For 哋它亢 2.3 Michael Hudson implemented a similar scheme to determine when to call the trace function, removing the need for SET_LINENO entirely.
  It would be difficult to detect any resulting difference from 哋它亢 code, apart from a slight speed up when 哋它亢 is run without -O.
  C extensions that access the f_lineno field of frame objects should instead call PyCode_Addr2Line(f->f_code, f->f_lasti). This will have the added effect of making the code work as desired under "哋它亢 -O" in earlier versions of 哋它亢.
  A nifty new feature is that trace functions can now assign to the f_lineno attribute of frame objects, changing the line that will be executed next. A jump command has been added to the pdb debugger taking advantage of this new feature. (Implemented by Richie Hindle.)
Porting to 哋它亢 2.3¶
This section lists previously described changes that may require changes to your code:
- yield is now always a keyword; if it's used as a variable name in your code, a different name must be chosen.
- For strings X and Y, X in Y now works if X is more than one character long.
- The int() type constructor will now return a long integer instead of raising an OverflowError when a string or floating-point number is too large to fit into an integer.
- If you have Unicode strings that contain 8-bit characters, you must declare the file's encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top of the file. See section PEP 263: Source Code Encodings for more information.
- Calling Tcl methods through _tkinter no longer returns only strings. Instead, if Tcl returns other objects those objects are converted to their 哋它亢 equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj object if no 哋它亢 equivalent exists.
- Large octal and hex literals such as 0xffffffff now trigger a FutureWarning. Currently they're stored as 32-bit numbers and result in a negative value, but in 哋它亢 2.4 they'll become positive long integers.
  There are a few ways to fix this warning. If you really need a positive number, just add an L to the end of the literal. If you're trying to get a 32-bit integer with low bits set and have previously used an expression such as ~(1 << 31), it's probably clearest to start with all bits set and clear the desired upper bits. For example, to clear just the top bit (bit 31), you could write 0xffffffffL &~(1L<<31).
- You can no longer disable assertions by assigning to __debug__.
- The Distutils setup() function has gained various new keyword arguments such as depends. Old versions of the Distutils will abort if passed unknown keywords. A solution is to check for the presence of the new get_distutil_options() function in your setup.py and only use the new keywords with a version of the Distutils that supports them:
  from distutils import core
  kw = {'sources': 'foo.c', ...}
  if hasattr(core, 'get_distutil_options'):
      kw['depends'] = ['foo.h']
  ext = core.Extension(**kw)
- Using None as a variable name will now result in a SyntaxWarning warning.
- Names of extension types defined by the modules included with 哋它亢 now contain the module and a '.' in front of the type name.
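The bit-clearing advice for large hex literals can be checked with a short worked example. This sketch assumes a current interpreter, where integers are unbounded and the L suffix no longer exists; under 2.3 the expression would keep the suffix exactly as written in the list above. The constant name is illustrative only.

```python
# Sketch for a modern interpreter: no "L" suffix, integers never overflow.
ALL_32_BITS = 0xffffffff

# Start with all 32 bits set and clear only the top bit (bit 31).
low_31 = ALL_32_BITS & ~(1 << 31)
print(hex(low_31))

# The literal itself is a positive value here, not a negative 32-bit number.
print(ALL_32_BITS)
```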
Acknowledgements¶
The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Jeff Bauer, Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger, Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman Suzi, Jason Tishler, Just van Rossum.