我在学习python线程时遇到了join()。

作者告诉,如果线程在守护进程模式,那么我需要使用join(),以便线程可以在主线程终止之前完成自己。

但我也见过他使用t.join(),即使t不是daemon

示例代码如下所示

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.setDaemon(True)

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()

我不知道t.join()的用途是什么,因为它不是守护进程,即使我删除它,我也看不到任何变化


当前回答

这里似乎误解了同步和异步处理之间的区别。

A thread is meant to execute a sub-procedure, most of the times on a "parallel" or "concurrent" fashion (depends on whether the device has multi-processors or not). But, what's the point on concurrency? For the most part, it's about improving performance of a process, by applying the idea of "divide and conquer". Have several threads (sub-processes) executing a "portion" of the whole process simultaneously, and then have a "final" step where all sub-processes results are combined (joined; hence the "join" method).

Of course, in order to achieve such gain on efficiency, the portions that are divided into threads, must be "mutually exclusive" (i.e., they don't share values to be updated... -- known in parallel computing as "critical section" -- ). If there is at least one value that is updated by two or more threads, then one has to wait for the other to "finish" its update, otherwise obtaining inconsistent results (i.e., two persons owning a bank account intend to withdraw certain amount of money in an ATM... if there won't be a proper mechanism that "locks" or "protects" the variable "balance" in both of the ATM devices, withdraws will completely screw-up the final value of the balance, causing obvious serious financial problem to the account owners).

那么,回到并行计算中线程的目的:让所有线程完成各自的部分,并使用“join”使它们“回到”主进程,这样每个单独的结果就会被“合并”成一个全局结果。

例子吗?有很多,但让我们列举几个解释清楚的:

Matrix multiplication: have each thread multiplying a vector of matrix A by the whole second matrix B, to obtain a vector of matrix C. At the end, have all resulting vestors put together to "display" (show) result: matrix C. In this example, although matrix B is used by all threads, no value of it is ever updated or modified (read-only). Summation, product of an array of massive numbers (an array of thousand of values, whether integer or float). Make threads to execute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then with "join" make them return to the main process and sum individual results of all 5 threads. Theoretically, the process will do 2000 + 5 steps (2000 simultaneously in 5 threads, plus summation of final 5 sub-totals in the main process). In practice, though, how long do the 5 threads take to do its own 2000 numbers summation is completely variable as different factors get involved here (processor speed, electrical flow, or if it is a web service, network latency, and so on). However, the amount ot time invested would be in the "worst case", the amount of time the "slowest" thread takes, plus the final summation of 5 results step. Also, in practice, a thread that is meant to do 20% of the whole job, unlikely will take much longer than a single sequential process that would do 100% of the job (of course, it also depends on the size of the sample to be processed... the advantage won't be the same on a summation of 10K values, than summation of just 10 values with the same 5 threads... it's non-practicall, not worth it). Quick sort: We all know in general how quick sort works. However, there's a chance to improve it, if, say, we execute it in TWO threads: one that does the odd numbers and one that does the even ones. Then executes recursively and at some point it joins results of both threads and does a final quick sort in a fashion that will not require so many repetitions as numbers will be sufficiently ordered after the two threads did its initial job. That's a serios gain on performance with a quite big and unordered number of items. Chances are three threads can be used by doing some arrangement to the logic behind it, but its gain is really minimum and not worth to be programmed. However, two threads have a decent performance (time) gain.

因此,在python中使用“join”(或在其他“并发性”语言中使用“join”)具有重要的意义;但这在很大程度上取决于编程理解她/他想要“并行化”什么,以及她/他在将算法分割成需要并行化的正确步骤和需要在主进程中保留哪些步骤方面的熟练程度。这更像是一个“逻辑”思考的问题,而不是编程的“反模式”问题。

其他回答

您可能会说:“使用join()有什么用?”实际上,这和“关闭文件有什么用,因为python和操作系统会在程序退出时为我关闭文件?”的答案是一样的。

这只是一个好的编程问题。您应该在代码中线程不应该再运行的地方join()您的线程,因为您必须确保线程的运行不会干扰您自己的代码,或者您希望在更大的系统中正确地运行。

您可能会说“我不希望我的代码延迟给出答案”,只是因为join()可能需要额外的时间。在某些情况下,这可能是完全有效的,但现在你需要考虑到你的代码“留下了粗糙的东西,让python和操作系统来清理”。如果您这样做是出于性能原因,我强烈建议您记录该行为。如果您正在构建其他人希望使用的库/包,则尤其如此。

除了性能原因之外,没有理由不使用join(),而且我认为您的代码不需要执行得那么好。

When making join(t) function for both non-daemon thread and daemon thread, the main thread (or main process) should wait t seconds, then can go further to work on its own process. During the t seconds waiting time, both of the children threads should do what they can do, such as printing out some text. After the t seconds, if non-daemon thread still didn't finish its job, and it still can finish it after the main process finishes its job, but for daemon thread, it just missed its opportunity window. However, it will eventually die after the python program exits. Please correct me if there is something wrong.

join()方法

阻塞调用线程,直到调用join()方法的线程终止。

来源:http://docs.python.org/2/library/threading.html

下面的例子演示了.join()操作:

import threading
import time

def threaded_worker():
    for r in range(10):
        print('Other: ', r)
        time.sleep(2)

thread_ = threading.Timer(1, threaded_worker)
thread_.daemon = True  # If the main thread is killed, this thread will be killed as well. 
thread_.start()

flag = True

for i in range(10):
    print('Main: ', i)
    time.sleep(2)
    if flag and i > 4:
        print(
            '''
            Threaded_worker() joined to the main thread. 
            Now we have a sequential behavior instead of concurrency.
            ''')
        thread_.join()
        flag = False

Out:

Main:  0
Other:  0
Main:  1
Other:  1
Main:  2
Other:  2
Main:  3
Other:  3
Main:  4
Other:  4
Main:  5
Other:  5

            Threaded_worker() joined to the main thread. 
            Now we have a sequential behavior instead of concurrency.
            
Other:  6
Other:  7
Other:  8
Other:  9
Main:  6
Main:  7
Main:  8
Main:  9

直接从医生那里

加入((超时)) 等待线程终止。这将阻塞调用线程,直到调用join()方法的线程终止—正常终止或通过未处理的异常终止—或者直到可选超时发生。

这意味着生成t和d的主线程等待t完成,直到它完成。

根据程序使用的逻辑,您可能希望等到主线程结束后再继续执行。

文档中还提到:

一个线程可以被标记为“守护线程”。这个标志的意义在于,当只剩下守护线程时,整个Python程序将退出。

举个简单的例子:

def non_daemon():
    time.sleep(5)
    print 'Test non-daemon'

t = threading.Thread(name='non-daemon', target=non_daemon)

t.start()

最后是:

print 'Test one'
t.join()
print 'Test two'

这将输出:

Test one
Test non-daemon
Test two

在这里,主线程显式地等待t线程完成,直到它第二次调用print。

另一种情况是:

print 'Test one'
print 'Test two'
t.join()

我们将得到这样的输出:

Test one
Test two
Test non-daemon

在这里,我们在主线程中完成工作,然后等待t线程完成。在这种情况下,我们甚至可以删除显式连接t.join(),程序将隐式地等待t完成。