我正在努力理解Python中的线程。我看过文档和示例,但坦率地说,许多示例过于复杂,我很难理解它们。

如何清楚地显示为多线程划分的任务?


当前回答

这里有一个简单的示例:您需要尝试一些替代URL,并返回第一个URL的内容以进行响应。

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

在这种情况下,线程被用作一种简单的优化:每个子线程都在等待URL解析和响应,以将其内容放入队列;每个线程都是一个守护进程(如果主线程结束,则不会保持进程运行——这比不结束更常见);主线程启动所有子线程,在队列中执行get以等待其中一个线程完成put,然后发出结果并终止(这将删除所有可能仍在运行的子线程,因为它们是守护进程线程)。

Python中线程的正确使用总是与I/O操作相关(因为CPython无论如何都不使用多个内核来运行CPU绑定的任务,线程的唯一原因是在等待一些I/O时不会阻塞进程)。顺便说一句,队列几乎总是将工作分配给线程和/或收集工作结果的最佳方式,而且它们本质上是线程安全的,因此它们使您不用担心锁、条件、事件、信号量和其他线程间协调/通信概念。

其他回答

这里是使用线程导入CSV的一个非常简单的示例。(图书馆的收录可能因不同的目的而有所不同。)

助手函数:

from threading import Thread
from project import app
import csv


def import_handler(csv_file_name):
    thr = Thread(target=dump_async_csv_data, args=[csv_file_name])
    thr.start()

def dump_async_csv_data(csv_file_name):
    with app.app_context():
        with open(csv_file_name) as File:
            reader = csv.DictReader(File)
            for row in reader:
                # DB operation/query

驾驶员功能:

import_handler(csv_file_name)

给定函数f,如下所示:

import threading
threading.Thread(target=f).start()

向f传递参数

threading.Thread(target=f, args=(a,b,c)).start()

这很容易理解。这里有两种简单的线程处理方法。

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading

def a(a=1, b=2):
    print(a)
    time.sleep(5)
    print(b)
    return a+b

def b(**kwargs):
    if "a" in kwargs:
        print("am b")
    else:
        print("nothing")
        
to_do=[]
executor = ThreadPoolExecutor(max_workers=4)
ex1=executor.submit(a)
to_do.append(ex1)
ex2=executor.submit(b, **{"a":1})
to_do.append(ex2)

for future in as_completed(to_do):
    print("Future {} and Future Return is {}\n".format(future, future.result()))

print("threading")

to_do=[]
to_do.append(threading.Thread(target=a))
to_do.append(threading.Thread(target=b, kwargs={"a":1}))

for threads in to_do:
    threads.start()
    
for threads in to_do:
    threads.join()

只需注意:线程不需要队列。

这是我可以想象的最简单的例子,它显示了10个并发运行的进程。

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

Alex Martelli的回答对我有所帮助。不过,这里有一个我认为更有用的修改版本(至少对我来说)。

更新:可在Python 2和Python 3中使用

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()