我在学习python线程时遇到了join()。

作者告诉,如果线程在守护进程模式,那么我需要使用join(),以便线程可以在主线程终止之前完成自己。

但我也见过他使用t.join(),即使t不是daemon

示例代码如下所示

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()

我不知道t.join()的用途是什么,因为它不是守护进程,即使我删除它,我也看不到任何变化


当前回答

When making join(t) function for both non-daemon thread and daemon thread, the main thread (or main process) should wait t seconds, then can go further to work on its own process. During the t seconds waiting time, both of the children threads should do what they can do, such as printing out some text. After the t seconds, if non-daemon thread still didn't finish its job, and it still can finish it after the main process finishes its job, but for daemon thread, it just missed its opportunity window. However, it will eventually die after the python program exits. Please correct me if there is something wrong.

其他回答

join()方法

阻塞调用线程,直到调用join()方法的线程终止。

来源:http://docs.python.org/2/library/threading.html

直接从医生那里

加入((超时)) 等待线程终止。这将阻塞调用线程,直到调用join()方法的线程终止—正常终止或通过未处理的异常终止—或者直到可选超时发生。

这意味着生成t和d的主线程等待t完成,直到它完成。

根据程序使用的逻辑,您可能希望等到主线程结束后再继续执行。

文档中还提到:

一个线程可以被标记为“守护线程”。这个标志的意义在于,当只剩下守护线程时,整个Python程序将退出。

举个简单的例子:

def non_daemon():
    time.sleep(5)
    print 'Test non-daemon'

t = threading.Thread(name='non-daemon', target=non_daemon)

t.start()

最后是:

print 'Test one'
t.join()
print 'Test two'

这将输出:

Test one
Test non-daemon
Test two

在这里,主线程显式地等待t线程完成,直到它第二次调用print。

另一种情况是:

print 'Test one'
print 'Test two'
t.join()

我们将得到这样的输出:

Test one
Test two
Test non-daemon

在这里,我们在主线程中完成工作,然后等待t线程完成。在这种情况下,我们甚至可以删除显式连接t.join(),程序将隐式地等待t完成。

这里似乎误解了同步和异步处理之间的区别。

A thread is meant to execute a sub-procedure, most of the times on a "parallel" or "concurrent" fashion (depends on whether the device has multi-processors or not). But, what's the point on concurrency? For the most part, it's about improving performance of a process, by applying the idea of "divide and conquer". Have several threads (sub-processes) executing a "portion" of the whole process simultaneously, and then have a "final" step where all sub-processes results are combined (joined; hence the "join" method).

Of course, in order to achieve such gain on efficiency, the portions that are divided into threads, must be "mutually exclusive" (i.e., they don't share values to be updated... -- known in parallel computing as "critical section" -- ). If there is at least one value that is updated by two or more threads, then one has to wait for the other to "finish" its update, otherwise obtaining inconsistent results (i.e., two persons owning a bank account intend to withdraw certain amount of money in an ATM... if there won't be a proper mechanism that "locks" or "protects" the variable "balance" in both of the ATM devices, withdraws will completely screw-up the final value of the balance, causing obvious serious financial problem to the account owners).

那么,回到并行计算中线程的目的:让所有线程完成各自的部分,并使用“join”使它们“回到”主进程,这样每个单独的结果就会被“合并”成一个全局结果。

例子吗?有很多,但让我们列举几个解释清楚的:

Matrix multiplication: have each thread multiplying a vector of matrix A by the whole second matrix B, to obtain a vector of matrix C. At the end, have all resulting vestors put together to "display" (show) result: matrix C. In this example, although matrix B is used by all threads, no value of it is ever updated or modified (read-only). Summation, product of an array of massive numbers (an array of thousand of values, whether integer or float). Make threads to execute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then with "join" make them return to the main process and sum individual results of all 5 threads. Theoretically, the process will do 2000 + 5 steps (2000 simultaneously in 5 threads, plus summation of final 5 sub-totals in the main process). In practice, though, how long do the 5 threads take to do its own 2000 numbers summation is completely variable as different factors get involved here (processor speed, electrical flow, or if it is a web service, network latency, and so on). However, the amount ot time invested would be in the "worst case", the amount of time the "slowest" thread takes, plus the final summation of 5 results step. Also, in practice, a thread that is meant to do 20% of the whole job, unlikely will take much longer than a single sequential process that would do 100% of the job (of course, it also depends on the size of the sample to be processed... the advantage won't be the same on a summation of 10K values, than summation of just 10 values with the same 5 threads... it's non-practicall, not worth it). Quick sort: We all know in general how quick sort works. However, there's a chance to improve it, if, say, we execute it in TWO threads: one that does the odd numbers and one that does the even ones. Then executes recursively and at some point it joins results of both threads and does a final quick sort in a fashion that will not require so many repetitions as numbers will be sufficiently ordered after the two threads did its initial job. That's a serios gain on performance with a quite big and unordered number of items. Chances are three threads can be used by doing some arrangement to the logic behind it, but its gain is really minimum and not worth to be programmed. However, two threads have a decent performance (time) gain.

因此,在python中使用“join”(或在其他“并发性”语言中使用“join”)具有重要的意义;但这在很大程度上取决于编程理解她/他想要“并行化”什么,以及她/他在将算法分割成需要并行化的正确步骤和需要在主进程中保留哪些步骤方面的熟练程度。这更像是一个“逻辑”思考的问题,而不是编程的“反模式”问题。

Join()同时等待非守护进程和守护进程线程完成。 如果没有join(),将运行非守护进程线程,并与主线程并发完成。 如果没有join(),守护线程将与主线程并发运行,当主线程完成时,如果守护线程仍在运行,守护线程将在未完成的情况下退出。

因此,下面的join()和daemon=False(守护线程)(daemon默认为False):

import time
from threading import Thread

def test1():
    for _ in range(3):
        print("Test1 is running...")
        time.sleep(1)
    print("Test1 is completed")
    
def test2():
    for _ in range(3):
        print("Test2 is running...")
        time.sleep(1)
    print("Test2 is completed")
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
thread1.start()
thread2.start()
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

或者,使用join()和daemon=True(非守护线程):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

join()等待Test1和Test2非守护进程或守护进程线程完成。因此,Main is completed在Test1和Test2线程完成后打印,如下所示:

Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
Main is completed

并且,如果不使用join(),如果daemon=False(非守护线程):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

Test1和Test2非守护线程正在与主线程并发地运行和完成。因此,Main is completed在Test1和Test2线程完成之前打印,如下所示:

Test1 is running...
Test2 is running...
Main is completed
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed

并且,如果没有使用join(),如果daemon=True(守护线程):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

Test1和Test2守护线程与主线程并发运行。因此,Main is completed在Test1和Test2守护线程完成之前打印,当主线程完成时,Test1和Test2守护线程将在未完成的情况下退出,如下所示:

Test1 is running...
Test2 is running...
Main is completed

主线程(或任何其他线程)加入其他线程有几个原因

线程可能已经创建或持有(锁定)一些资源。调用连接的线程可以代表它清除资源 Join()是一个自然的阻塞调用,用于调用连接的线程在被调用的线程终止后继续执行。

如果一个python程序没有加入其他线程,python解释器仍然会代表它加入非守护线程。