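What is the preferred way to concatenate strings in Python?

Since Python strings are immutable, I would like to know how to concatenate strings more efficiently.

I can write: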

s += stringfromelsewhere

or like this:

s = []

s.append(somestring)
    
# later
    
s = ''.join(s)

While writing this question, I found a good article on the topic.

http://www.skymind.com/~ocrow/python_string/

But it covers Python 2.x, so the question is: has anything changed in Python 3?

Current answer

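If you are concatenating a lot of values, then neither. Appending to a list is expensive. You can use StringIO for that, especially if you are building the string up over a lot of operations.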

try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

buf = StringIO()

buf.write('foo')
buf.write('foo')
buf.write('foo')

buf.getvalue()
# 'foofoofoo'

If you already have a complete list returned to you from some other operation, then just use ''.join(aList)
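As a minimal sketch of that last point, assuming a hypothetical list `a_list` that some earlier step has already filled (the variable names are illustrative and not from the answer): `str.join` takes any iterable of strings, so values that are not strings have to be converted first.

```python
# Hypothetical data: pieces that an earlier step already collected.
a_list = ['foo', 'bar', 'baz']

# Join all pieces in a single call.
joined = ''.join(a_list)

# join() only accepts str items, so convert mixed values first.
mixed = ['id', 42, 3.5]
joined_mixed = '-'.join(str(item) for item in mixed)

print(joined)        # foobarbaz
print(joined_mixed)  # id-42-3.5
```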

From the Python FAQ: What is the most efficient way to concatenate many strings together?

str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.

To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)

(another reasonably efficient idiom is to use io.StringIO)

To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):

result = bytearray()
for b in my_bytes_objects:
    result += b

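Edit: I was silly and pasted the results backwards, making it look like appending to a list was faster than cStringIO. I have also added tests for bytearray/str concat, as well as a second round of tests using a larger list with larger strings. (python 2.7.3)

IPython test example for large lists of strings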

try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO

source = ['foo']*1000

%%timeit buf = StringIO()
for i in source:
    buf.write(i)
final = buf.getvalue()
# 1000 loops, best of 3: 1.27 ms per loop

%%timeit out = []
for i in source:
    out.append(i)
final = ''.join(out)
# 1000 loops, best of 3: 9.89 ms per loop

%%timeit out = bytearray()
for i in source:
    out += i  # Python 2 only; on Python 3 a bytearray needs bytes, e.g. i.encode()
# 10000 loops, best of 3: 98.5 µs per loop

%%timeit out = ""
for i in source:
    out += i
# 10000 loops, best of 3: 161 µs per loop

## Repeat the tests with a larger list, containing
## strings that are bigger than the small string caching
## done by Python
source = ['foo']*1000

# cStringIO
# 10 loops, best of 3: 19.2 ms per loop

# list append and join
# 100 loops, best of 3: 144 ms per loop

# bytearray() +=
# 100 loops, best of 3: 3.8 ms per loop

# str() +=
# 100 loops, best of 3: 5.11 ms per loop
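
The timings above come from Python 2.7.3 and IPython magics. As a rough sketch for repeating the comparison on Python 3 with the standard `timeit` module: the helper names below are hypothetical, the bytearray variant encodes each piece because Python 3 does not allow adding a str to a bytearray, and no timings are claimed here because they depend on the interpreter and machine.

```python
import io
import timeit

source = ['foo'] * 1000  # same small input as the first round above


def with_stringio(parts):
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()


def with_join(parts):
    out = []
    for part in parts:
        out.append(part)
    return ''.join(out)


def with_bytearray(parts):
    # On Python 3 a bytearray only accepts bytes, so encode each piece.
    out = bytearray()
    for part in parts:
        out += part.encode('utf-8')
    return out.decode('utf-8')


def with_plus_equals(parts):
    out = ''
    for part in parts:
        out += part
    return out


candidates = [with_stringio, with_join, with_bytearray, with_plus_equals]
expected = ''.join(source)

for func in candidates:
    assert func(source) == expected  # all four build the same string
    seconds = timeit.timeit(lambda: func(source), number=200)
    print(f'{func.__name__:18s} {seconds / 200 * 1e6:8.1f} µs per call')
```

Checking that every variant returns the same string before timing it guards against comparing functions that do different work.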

2012-08-29 01:48:41

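Other answers

While somewhat dated, Code Like a Pythonista: Idiomatic Python recommends join() over +. So does PythonSpeedPerformanceTips in its section on string concatenation, with the following disclaimer:

The accuracy of this section is disputed with respect to later versions of Python. In CPython 2.5, string concatenation is fairly fast, although this may not apply likewise to other Python implementations. See ConcatenationTestCode for a discussion.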

2012-08-29 01:57:35

Write this function:

def str_join(*args):
    return ''.join(map(str, args))

Then you can simply call it wherever you need it:

str_join('Pine')  # Returns : Pine
str_join('Pine', 'apple')  # Returns : Pineapple
str_join('Pine', 'apple', 3)  # Returns : Pineapple3

2017-07-15 08:20:06

In Python >= 3.6, the new f-string is an efficient way to concatenate strings.

>>> name = 'some_name'
>>> number = 123
>>>
>>> f'Name is {name} and the number is {number}.'
'Name is some_name and the number is 123.'
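
An f-string fits when the number of pieces is fixed and known in advance. For a variable number of pieces it can be combined with join, as in this small sketch (the `rows` data and variable names are made up for illustration):

```python
name = 'some_name'
number = 123

# Fixed number of pieces: one f-string, here with a format spec for padding.
line = f'{name}:{number:05d}'

# Variable number of pieces: format each one, then join once.
rows = [('a', 1), ('b', 22), ('c', 333)]
report = '\n'.join(f'{key}={value}' for key, value in rows)

print(line)  # some_name:00123
print(report)
```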

2018-05-22 18:45:08

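As @jdi mentions, the Python documentation suggests using str.join or io.StringIO for string concatenation. It also says that a developer should expect quadratic time from += in a loop, even though there has been an optimisation since Python 2.4. As this answer says: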

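If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it is an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.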
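
The effect described in the quote can be probed with a small sketch. It assumes CPython, the function names are hypothetical, and the timings are printed rather than asserted because this is an implementation detail that may differ between versions and platforms. The first function keeps the string in a local name only; the second keeps it in a list slot, so the list still holds a reference while += runs.

```python
import timeit


def concat_local(n):
    # The only reference to the string is the local name `s`,
    # so CPython may be able to grow it in place.
    s = ''
    for _ in range(n):
        s += 'x'
    return s


def concat_list_slot(n):
    # The list keeps its own reference to the string while += runs,
    # so the in-place shortcut is not expected to apply here.
    result = ['']
    for _ in range(n):
        result[-1] += 'x'
    return result[-1]


n = 20000
assert concat_local(n) == concat_list_slot(n) == 'x' * n

for func in (concat_local, concat_list_slot):
    seconds = timeit.timeit(lambda: func(n), number=5) / 5
    print(f'{func.__name__:18s} {seconds * 1e3:8.2f} ms per call')
```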

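I will show an example of real-world code that naively relied on this += optimisation, but it did not apply. The code below converts an iterable of short strings into bigger chunks to be used in a bulk API.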

def test_concat_chunk(seq, split_by):
    result = ['']
    for item in seq:
        if len(result[-1]) + len(item) > split_by: 
            result.append('')
        result[-1] += item
    return result

This code can run for hours because of its quadratic time complexity. Below are alternatives that use the suggested data structures:

import io

def test_stringio_chunk(seq, split_by):
    def chunk():
        buf = io.StringIO()
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                size += buf.write(item)
            else:
                yield buf.getvalue()
                buf = io.StringIO()
                size = buf.write(item)
        if size:
            yield buf.getvalue()

    return list(chunk())

def test_join_chunk(seq, split_by):
    def chunk():
        buf = []
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                buf.append(item)
                size += len(item)
            else:
                yield ''.join(buf)                
                buf.clear()
                buf.append(item)
                size = len(item)
        if size:
            yield ''.join(buf)

    return list(chunk())

And a micro-benchmark:

import timeit
import random
import string
import matplotlib.pyplot as plt

line = ''.join(random.choices(
    string.ascii_uppercase + string.digits, k=512)) + '\n'
x = []
y_concat = []
y_stringio = []
y_join = []
n = 5
for i in range(1, 11):
    x.append(i)
    seq = [line] * (20 * 2 ** 20 // len(line))
    chunk_size = i * 2 ** 20
    y_concat.append(
        timeit.timeit(lambda: test_concat_chunk(seq, chunk_size), number=n) / n)
    y_stringio.append(
        timeit.timeit(lambda: test_stringio_chunk(seq, chunk_size), number=n) / n)
    y_join.append(
        timeit.timeit(lambda: test_join_chunk(seq, chunk_size), number=n) / n)
plt.plot(x, y_concat)
plt.plot(x, y_stringio)
plt.plot(x, y_join)
plt.legend(['concat', 'stringio', 'join'], loc='upper left')
plt.show()

2018-09-28 18:37:07
