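I was studying Python threading and came across join().

The author said that if a thread is in daemon mode, I need to use join() so that the thread can finish before the main thread terminates.

But I have also seen him use t.join() even though t was not a daemon.

The example code is as follows: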

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.daemon = True  # setDaemon() is deprecated since Python 3.10

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()

I don't know what t.join() is for, since t is not a daemon thread, and I see no change even when I remove it.


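Current answer

A somewhat clumsy piece of ASCII art to demonstrate the mechanism. The join() is presumably called by the main thread. It could also be called by another thread, but that would needlessly complicate the diagram.

The join call should be placed in the track of the main thread, but to express the thread relationship and keep it as simple as possible, I chose to place it in the child thread instead.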

without join:
+---+---+------------------                     main-thread
    |   |
    |   +...........                            child-thread(short)
    +..................................         child-thread(long)

with join
+---+---+------------------***********+###      main-thread
    |   |                             |
    |   +...........join()            |         child-thread(short)
    +......................join()......         child-thread(long)

with join and daemon thread
+-+--+---+------------------***********+###     parent-thread
  |  |   |                             |
  |  |   +...........join()            |        child-thread(short)
  |  +......................join()......        child-thread(long)
  +,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,     child-thread(long + daemonized)

'-' main-thread/parent-thread/main-program execution
'.' child-thread execution
'#' optional parent-thread execution after join()-blocked parent-thread could 
    continue
'*' main-thread 'sleeping' in join-method, waiting for child-thread to finish
',' daemonized thread - 'ignores' lifetime of other threads;
    terminates when main-program exits; is normally meant for 
    join-independent tasks

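So the reason you don't see any change is that your main thread does nothing after your join. You could say join is (only) relevant to the execution flow of the main thread.

If, for example, you want to download a bunch of pages concurrently and concatenate them into a single large page, you can start the downloads in threads, but you need to wait until the last page/thread has finished before you start assembling the single page out of the many. That is when you use join().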
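The download scenario above can be sketched roughly as follows. This is only an illustration: the page names and the fetch_page helper are hypothetical, and the "download" is simulated with time.sleep() rather than real network access.

```python
import threading
import time

# Hypothetical page names; fetch_page only simulates a download with sleep().
PAGES = ["page-a", "page-b", "page-c"]
results = {}

def fetch_page(name, delay):
    """Pretend to download a page and store its body under its name."""
    time.sleep(delay)
    results[name] = "<content of %s>" % name

threads = []
for i, name in enumerate(PAGES):
    t = threading.Thread(target=fetch_page, args=(name, 0.1 * (i + 1)))
    t.start()
    threads.append(t)

# Block until every download thread has finished.
for t in threads:
    t.join()

# Only now is it safe to assemble the single large page.
big_page = "\n".join(results[name] for name in PAGES)
print(big_page)
```

Without the join() loop, the assembly step could run while results is still incomplete and fail with a KeyError.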

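Other answers

join() waits for both non-daemon and daemon threads to complete. Without join(), non-daemon threads run and complete concurrently with the main thread. Without join(), daemon threads also run concurrently with the main thread, but any daemon thread still running when the main thread finishes exits without completing.

So, with join() and daemon=False (non-daemon threads) as below (daemon is False by default):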

import time
from threading import Thread

def test1():
    for _ in range(3):
        print("Test1 is running...")
        time.sleep(1)
    print("Test1 is completed")
    
def test2():
    for _ in range(3):
        print("Test2 is running...")
        time.sleep(1)
    print("Test2 is completed")
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
thread1.start()
thread2.start()
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

Or, with join() and daemon=True (daemon threads):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

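join() waits for the Test1 and Test2 threads to complete, whether they are daemon or non-daemon threads. So "Main is completed" is printed after the Test1 and Test2 threads complete, as shown below (the exact interleaving of the Test1 and Test2 lines may vary between runs):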

Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
Main is completed

And, without join() and with daemon=False (non-daemon threads):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

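The Test1 and Test2 non-daemon threads run and complete concurrently with the main thread. So "Main is completed" is printed before the Test1 and Test2 threads complete, as shown below: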

Test1 is running...
Test2 is running...
Main is completed
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed

And, without join() and with daemon=True (daemon threads):

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

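The Test1 and Test2 daemon threads run concurrently with the main thread. So "Main is completed" is printed before the Test1 and Test2 daemon threads complete, and when the main thread finishes, the Test1 and Test2 daemon threads exit without completing, as shown below: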

Test1 is running...
Test2 is running...
Main is completed

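Straight from the docs:

join([timeout]) Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates, either normally or through an unhandled exception, or until the optional timeout occurs.

This means that the main thread, which spawns t and d, waits at t.join() until t has finished.

Depending on the logic your program uses, you may want to wait until a thread finishes before your main thread continues.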
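The optional timeout mentioned in the quoted documentation can be sketched as below. The slow_worker function and the durations are made up for illustration. join() always returns None, so the only way to tell whether the timeout expired is to check is_alive() afterwards.

```python
import threading
import time

def slow_worker():
    # Stand-in for a task that takes about one second.
    time.sleep(1.0)

t = threading.Thread(target=slow_worker)
t.start()

# Wait at most 0.1 s, then check whether the thread is still running.
t.join(timeout=0.1)
timed_out = t.is_alive()
print("still running after timeout:", timed_out)

# Join again without a timeout: this blocks until the thread has finished.
t.join()
print("still running after full join:", t.is_alive())
```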

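The documentation also says:

A thread can be flagged as a "daemon thread". The significance of this flag is that the entire Python program exits when only daemon threads are left.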
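To observe this exit behavior without ending the current interpreter, one option is to run a small child program in a separate Python process and capture its output. The sketch below assumes Python 3.7+ (for capture_output). The child script and its messages are invented for the demonstration.

```python
import subprocess
import sys
import textwrap

# Child program: a daemon thread that would print only after 2 seconds.
child_source = textwrap.dedent("""
    import threading, time

    def background():
        time.sleep(2)
        print("daemon finished")   # not reached: the program exits first

    threading.Thread(target=background, daemon=True).start()
    print("main finished")
""")

# Run the child in a fresh interpreter and collect what it printed.
result = subprocess.run(
    [sys.executable, "-c", child_source],
    capture_output=True, text=True,
)
output_lines = result.stdout.splitlines()
print(output_lines)
```

The child exits as soon as its main thread ends, so only the main thread's message appears. Adding a join() on the daemon thread in the child would make both messages appear.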

A simple example. Say we have this:

import threading
import time

def non_daemon():
    time.sleep(5)
    print('Test non-daemon')

t = threading.Thread(name='non-daemon', target=non_daemon)

t.start()

Which finishes with:

print('Test one')
t.join()
print('Test two')

This will output:

Test one
Test non-daemon
Test two

Here the main thread explicitly waits for the t thread to finish before it calls print the second time.

Alternatively, if we had this:

print('Test one')
print('Test two')
t.join()

We'll get this output:

Test one
Test two
Test non-daemon

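Here we do our work in the main thread and then wait for the t thread to finish. In this case we could even remove the explicit t.join() call, and the program would implicitly wait for t to finish.

The join() method

blocks the calling thread until the thread whose join() method is called is terminated.

Source: http://docs.python.org/2/library/threading.html

There seems to be a misunderstanding here about the difference between synchronous and asynchronous processing.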

A thread is meant to execute a sub-procedure, most of the time in a "parallel" or "concurrent" fashion (depending on whether the device has multiple processors or not). But what is the point of concurrency? For the most part, it is about improving the performance of a process by applying the idea of "divide and conquer": have several threads (sub-processes) execute a "portion" of the whole process simultaneously, and then have a "final" step where the results of all sub-processes are combined (joined; hence the "join" method).

Of course, to achieve such a gain in efficiency, the portions that are divided among threads must be independent of each other (i.e., they must not share values that get updated; code that updates a shared value is known in parallel computing as a "critical section"). If at least one value is updated by two or more threads, then each has to wait for the other to "finish" its update, otherwise the results become inconsistent (e.g., two people sharing a bank account try to withdraw money at two ATMs at the same time; without a proper mechanism that "locks" or "protects" the variable "balance" at both ATMs, the withdrawals can corrupt the final value of the balance, causing obvious and serious financial problems for the account owners).
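The bank-account scenario can be sketched with threading.Lock as follows. The names (balance, balance_lock, withdraw) and the amounts are illustrative assumptions, not part of any real banking API.

```python
import threading

balance = 1000                    # shared value updated by several threads
balance_lock = threading.Lock()   # protects every update of balance

def withdraw(amount, times):
    global balance
    for _ in range(times):
        # The read-modify-write below is the critical section.
        with balance_lock:
            balance = balance - amount

# Two "ATMs" withdrawing from the same account at the same time.
atms = [threading.Thread(target=withdraw, args=(1, 300)) for _ in range(2)]
for atm in atms:
    atm.start()
for atm in atms:
    atm.join()   # wait for both ATMs before reading the final balance

print(balance)  # 1000 - 2 * 300 = 400
```

The lock makes each thread wait while the other is inside the critical section, and the join() calls ensure the final balance is read only after both threads are done.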

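So, back to the purpose of threads in parallel computing: have all threads do their individual parts, and use "join" to make them "come back" to the main process, so that each individual result gets "consolidated" into a global one.

Examples? There are many, but let's go through a few that are easy to explain: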

Matrix multiplication: have each thread multiply one vector of matrix A by the whole second matrix B to obtain one vector of matrix C. At the end, put all resulting vectors together to "display" (show) the result: matrix C. In this example, although matrix B is used by all threads, none of its values is ever updated or modified (it is read-only).

Summation or product of a massive array of numbers (an array of thousands of values, whether integer or float): have threads compute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then with "join" make them return to the main process and sum the individual results of all 5 threads. Theoretically, the process takes 2000 + 5 steps (2000 performed simultaneously in 5 threads, plus the summation of the final 5 sub-totals in the main process). In practice, though, how long each of the 5 threads takes to sum its own 2000 numbers is completely variable, as different factors get involved (processor speed, electrical flow or, if it is a web service, network latency, and so on). However, the time invested would be, in the "worst case", the time the "slowest" thread takes plus the final step of summing the 5 results. Also, in practice, a thread that is meant to do 20% of the whole job is unlikely to take longer than a single sequential process doing 100% of the job (of course, it also depends on the size of the sample to be processed; the advantage of summing 10K values with 5 threads does not carry over to summing just 10 values with the same 5 threads, which is impractical and not worth it).

Quick sort: we all know in general how quick sort works. However, there is a chance to improve it if, say, we execute it in TWO threads: one that handles the odd numbers and one that handles the even ones. It then executes recursively, and at some point it joins the results of both threads and does a final quick sort that will not require as many repetitions, since the numbers will be sufficiently ordered after the two threads have done their initial job. That is a serious performance gain with a large, unordered set of items. Three threads could probably be used by rearranging the logic behind it, but the gain is minimal and not worth programming. Two threads, however, give a decent performance (time) gain.
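The summation example can be sketched like this: 10K values are split across 5 threads, joined, and then combined in the main thread. The names (values, partial_sums, sum_chunk) are illustrative. Note that in standard CPython builds the global interpreter lock generally prevents CPU-bound threads like these from actually running in parallel, so treat this as a demonstration of the split/join/combine structure rather than of a speedup.

```python
import threading

values = list(range(10000))          # the 10K values to sum
NUM_THREADS = 5
chunk = len(values) // NUM_THREADS   # 2K values per thread
partial_sums = [0] * NUM_THREADS     # one slot per thread, no sharing

def sum_chunk(index):
    """Sum this thread's slice and store the sub-total in its own slot."""
    start = index * chunk
    partial_sums[index] = sum(values[start:start + chunk])

threads = [threading.Thread(target=sum_chunk, args=(i,))
           for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait until every sub-total has been written

# Final step in the main thread: combine the 5 sub-totals.
total = sum(partial_sums)
print(total)
```

Each thread writes only to its own slot of partial_sums, so no lock is needed; the join() calls guarantee that all sub-totals exist before they are combined.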

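So, using "join" in Python (or its equivalent in other "concurrent" languages) is significant, but it depends greatly on the programmer understanding what she/he wants to "parallelize", and on how skilled she/he is at splitting the algorithm into the steps that need to be parallelized and the steps that need to stay in the main process. It is more a matter of "logical" thinking than of a programming "anti-pattern".

In Python 3.x, join() is used to join a thread with the main thread: when join() is called on a particular thread, the calling thread (here, the main thread) stops executing until the joined thread has finished.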

#1 - Without Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target = loiter)
t1.start()
print('Hey, I do not want to loiter!')
'''
Output without join()--> 
You are loitering!
Hey, I do not want to loiter!
You are not loitering anymore! #After 5 seconds --> This statement will be printed

'''
#2 - With Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target = loiter)
t1.start()
t1.join()
print('Hey, I do not want to loiter!')

'''
Output with join() -->
You are loitering!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
Hey, I do not want to loiter! 

'''