How should I log while using multiprocessing in Python?

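Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level, multiprocessing-aware log, log = multiprocessing.get_logger(). According to the docs, this logger (EDIT) does not have process-shared locks, so nothing stops output on sys.stderr (or any other file handle) from being garbled when several processes write to it at the same time.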

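The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make every dependency of this central module use multiprocessing-aware logging. That is annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?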

Current answer

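Below is my simple hack/workaround. It is not the most comprehensive, but it is easy to modify, and I think it is simpler to read and understand than any of the other answers I found before writing this: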

import logging
import multiprocessing

class FakeLogger(object):
    def __init__(self, q):
        self.q = q
    def info(self, item):
        self.q.put('INFO - {}'.format(item))
    def debug(self, item):
        self.q.put('DEBUG - {}'.format(item))
    def critical(self, item):
        self.q.put('CRITICAL - {}'.format(item))
    def warning(self, item):
        self.q.put('WARNING - {}'.format(item))

def some_other_func_that_gets_logger_and_logs(num):
    # note that the logger name gets discarded
    # of course you can easily add this to your FakeLogger class
    local_logger = logging.getLogger('local')
    local_logger.info('Hey I am logging this: {} and working on it to make this {}!'.format(num, num*2))
    local_logger.debug('hmm, something may need debugging here')
    return num*2

def func_to_parallelize(data_chunk):
    # unpack our args
    the_num, logger_q = data_chunk
    # since we're now in a new process, let's monkeypatch the logging module
    logging.getLogger = lambda name=None: FakeLogger(logger_q)
    # now do the actual work that happens to log stuff too
    new_num = some_other_func_that_gets_logger_and_logs(the_num)
    return (the_num, new_num)

if __name__ == '__main__':
    multiprocessing.freeze_support()
    m = multiprocessing.Manager()
    logger_q = m.Queue()
    # we have to pass our data to be parallel-processed
    # we also need to pass the Queue object so we can retrieve the logs
    parallelable_data = [(1, logger_q), (2, logger_q)]
    # set up a pool of processes so we can take advantage of multiple CPU cores
    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=4)
    worker_output = pool.map(func_to_parallelize, parallelable_data)
    pool.close() # no more tasks
    pool.join()  # wrap up current tasks
    # get the contents of our FakeLogger object
    while not logger_q.empty():
        print(logger_q.get())
    print('worker output contained: {}'.format(worker_output))

2016-09-13 16:55:19

Other answers

The only way to deal with this non-intrusively is to:

1. Spawn each worker process such that its log goes to a different file descriptor (to disk or to a pipe). Ideally, all log entries should be timestamped.
2. Your controller process can then do one of the following:
   - If using disk files: coalesce the log files at the end of the run, sorted by timestamp.
   - If using pipes (recommended): coalesce log entries on the fly from all pipes into a central log file. (E.g., periodically select from the pipes' file descriptors, perform a merge sort on the available log entries, and flush to the centralized log. Repeat.)
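
The coalescing step can be sketched as follows. This is only an illustration, assuming every log line begins with a sortable timestamp (so plain string comparison orders lines by time) and that each worker's log is already in order; merge_logs and the sample lines are hypothetical.

```python
import heapq


def merge_logs(*streams):
    """Lazily merge already-sorted iterables of log lines into one sorted stream.

    Assumes each line starts with a timestamp such as
    '2009-03-13 04:39:42,100', so string order equals time order.
    """
    return heapq.merge(*streams)


# Hypothetical per-worker logs, each already in timestamp order.
worker_a = [
    "2009-03-13 04:39:42,100 worker-a start",
    "2009-03-13 04:39:42,300 worker-a done",
]
worker_b = [
    "2009-03-13 04:39:42,200 worker-b start",
]

merged = list(merge_logs(worker_a, worker_b))
for line in merged:
    print(line)
```

Because heapq.merge accepts any sorted iterables, open file objects for the per-worker log files could be passed in place of the lists.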

2009-03-13 04:39:42

Just publish your logger instance somewhere. That way, the other modules and clients can use your API to get the logger without having to import multiprocessing.
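
A minimal sketch of what this could look like, assuming the framework exposes a single accessor function. The name get_framework_logger is hypothetical; only multiprocessing.get_logger() comes from the question.

```python
import logging
import multiprocessing


def get_framework_logger():
    """Hypothetical framework accessor.

    Callers get the multiprocessing-aware logger from here and never
    need to import multiprocessing themselves.
    """
    return multiprocessing.get_logger()


# What a client module would write instead of logging.getLogger(...):
log = get_framework_logger()
log.setLevel(logging.INFO)
```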

2009-03-13 04:40:00

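One alternative is to write the multiprocessing log to a known file and register an atexit handler that joins the processes and reads the file back onto stderr; however, you won't get a real-time flow of output messages on stderr that way.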
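
A rough sketch of that idea on Python 3, under stated assumptions: LOG_PATH, workers, and replay_log are hypothetical names, and the parent is expected to append each multiprocessing.Process it starts to workers.

```python
import atexit
import sys

LOG_PATH = "workers.log"  # hypothetical "known file" the workers log to
workers = []              # Process objects the parent has started


def replay_log(path=LOG_PATH, stream=sys.stderr):
    """Join all workers, then copy the shared log file to the stream."""
    for proc in workers:
        proc.join()
    try:
        with open(path) as f:
            for line in f:
                stream.write(line)
    except OSError:
        # The file may not exist if nothing was ever logged.
        pass


# Runs once when the parent interpreter exits normally.
atexit.register(replay_log)
```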

2009-03-13 04:40:17

Yet another alternative might be the various non-file-based logging handlers in the logging package:

SocketHandler
DatagramHandler
SysLogHandler

(and others)

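This way, you could easily have a logging daemon somewhere that you can write to safely and that handles the results correctly. (E.g., a simple socket server that just unpickles the message and emits it to its own rotating file handler.)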

The SysLogHandler would take care of this for you, too. Of course, you could use your own instance of syslog rather than the system one.
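
The socket-server variant could look roughly like the sketch below, written for Python 3 and modeled on the wire format that logging.handlers.SocketHandler emits (a 4-byte big-endian length followed by a pickled record dict). The names decode_record, LogRecordStreamHandler, and serve, the file name central.log, and the rotation sizes are assumptions for illustration. Since it unpickles whatever arrives, it should only listen on a trusted interface.

```python
import logging
import logging.handlers
import pickle
import socketserver
import struct


def decode_record(payload):
    """Turn one pickled record dict from SocketHandler into a LogRecord."""
    return logging.makeLogRecord(pickle.loads(payload))


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Reads length-prefixed pickled records from one client connection."""

    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break  # client closed the connection
            (length,) = struct.unpack(">L", header)
            payload = self.rfile.read(length)
            if len(payload) < length:
                break  # truncated message
            record = decode_record(payload)
            # Hand the record to this process's own logging configuration.
            logging.getLogger(record.name).handle(record)


def serve(logfile="central.log", host="localhost",
          port=logging.handlers.DEFAULT_TCP_LOGGING_PORT):
    """Run the daemon: every received record goes to one rotating file."""
    handler = logging.handlers.RotatingFileHandler(
        logfile, maxBytes=1000000, backupCount=3)
    logging.getLogger().addHandler(handler)
    server = socketserver.ThreadingTCPServer(
        (host, port), LogRecordStreamHandler)
    server.serve_forever()
```

Each worker process would then attach logging.handlers.SocketHandler("localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT) to its root logger, and the daemon would be started separately by calling serve().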

2009-03-13 11:19:29

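I just wrote a log handler of my own that feeds everything to the parent process via a pipe. I have only been testing it for ten minutes, but it seems to work pretty well.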

(Note: This is hardcoded to RotatingFileHandler, which is my own use case.)

Update: @javier now maintains this approach as a package available on PyPI. See multiprocessing-logging on PyPI, and on GitHub at https://github.com/jruere/multiprocessing-logging

Update: Implementation!

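This now uses a queue to handle concurrency correctly, and it also recovers from errors correctly. I have been using this in production for several months, and the current version below works without issue.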

from logging.handlers import RotatingFileHandler
import multiprocessing, threading, logging, sys, traceback

class MultiProcessingLog(logging.Handler):
    def __init__(self, name, mode, maxsize, rotate):
        logging.Handler.__init__(self)

        self._handler = RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)

        t = threading.Thread(target=self.receive)
        t.daemon = True
        t.start()

    def setFormatter(self, fmt):
        logging.Handler.setFormatter(self, fmt)
        self._handler.setFormatter(fmt)

    def receive(self):
        while True:
            try:
                record = self.queue.get()
                self._handler.emit(record)
            except (KeyboardInterrupt, SystemExit):
                raise
            except EOFError:
                break
            except:
                traceback.print_exc(file=sys.stderr)

    def send(self, s):
        self.queue.put_nowait(s)

    def _format_record(self, record):
        # ensure that exc_info and args
        # have been stringified.  Removes any chance of
        # unpickleable things inside and possibly reduces
        # message size sent over the pipe
        if record.args:
            record.msg = record.msg % record.args
            record.args = None
        if record.exc_info:
            dummy = self.format(record)
            record.exc_info = None

        return record

    def emit(self, record):
        try:
            s = self._format_record(record)
            self.send(s)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def close(self):
        self._handler.close()
        logging.Handler.close(self)

2009-05-21 18:10:33
