我有一个日志文件正在写的另一个进程,我想观察变化。每次发生更改时,我都希望将新数据读入并对其进行一些处理。

最好的方法是什么?我希望在PyWin32库中有某种钩子。我找到了win32文件。函数FindNextChangeNotification,但不知道如何要求它监视特定的文件。

如果有人做过类似的事情,我真的很感激能听到…

[编辑]我应该提到我追求的是一种不需要轮询的解决方案。

[编辑]诅咒!这似乎不能在映射的网络驱动器上工作。我猜windows不会像在本地磁盘上那样“听到”任何对文件的更新。


当前回答

如果轮询对您来说足够好,我只观察“修改的时间”文件统计是否发生变化。阅读方法:

os.stat(filename).st_mtime

(还要注意,Windows本机更改事件解决方案并不在所有情况下都有效,例如在网络驱动器上。)

import os

class Monkey(object):
    def __init__(self):
        self._cached_stamp = 0
        self.filename = '/path/to/file'

    def ook(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...

其他回答

这是对Tim Goldan脚本的另一种修改,它运行在unix类型上,并通过使用dict (file=>time)为文件修改添加了一个简单的监控器。

用法:whatvername .py path_to_dir_to_watch

#!/usr/bin/env python

import os, sys, time

def files_to_timestamp(path):
    files = [os.path.join(path, f) for f in os.listdir(path)]
    return dict ([(f, os.path.getmtime(f)) for f in files])

if __name__ == "__main__":

    path_to_watch = sys.argv[1]
    print('Watching {}..'.format(path_to_watch))

    before = files_to_timestamp(path_to_watch)

    while 1:
        time.sleep (2)
        after = files_to_timestamp(path_to_watch)

        added = [f for f in after.keys() if not f in before.keys()]
        removed = [f for f in before.keys() if not f in after.keys()]
        modified = []

        for f in before.keys():
            if not f in removed:
                if os.path.getmtime(f) != before.get(f):
                    modified.append(f)

        if added: print('Added: {}'.format(', '.join(added)))
        if removed: print('Removed: {}'.format(', '.join(removed)))
        if modified: print('Modified: {}'.format(', '.join(modified)))

        before = after

看看pyinotify。

Inotify在新的linux中取代了dnotify(从以前的答案),并允许文件级而不是目录级监控。

为了用轮询和最小依赖来观察单个文件,下面是一个完整的例子,基于Deestan的回答(上面):

import os
import sys 
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_file, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self.filename = watch_file
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
            print('File changed')
            if self.call_func_on_change is not None:
                self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop        
    def watch(self):
        while self.running: 
            try: 
                # Look for changes
                time.sleep(self.refresh_delay_secs) 
                self.look() 
            except KeyboardInterrupt: 
                print('\nDone') 
                break 
            except FileNotFoundError:
                # Action on file not found
                pass
            except: 
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_file = 'my_file.txt'

# watcher = Watcher(watch_file)  # simple
watcher = Watcher(watch_file, custom_action, text='yes, changed')  # also call custom action function
watcher.watch()  # start the watch going
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001

class myThread (threading.Thread):
    def __init__(self, threadID, fileName, directory, origin):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.fileName = fileName
        self.daemon = True
        self.dir = directory
        self.originalFile = origin
    def run(self):
        startMonitor(self.fileName, self.dir, self.originalFile)

def startMonitor(fileMonitoring,dirPath,originalFile):
    hDir = win32file.CreateFile (
        dirPath,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    # Wait for new data and call ProcessNewData for each new chunk that's
    # written
    while 1:
        # Wait for a change to occur
        results = win32file.ReadDirectoryChangesW (
            hDir,
            1024,
            False,
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
            None,
            None
        )
        # For each change, check to see if it's updating the file we're
        # interested in
        for action, file_M in results:
            full_filename = os.path.join (dirPath, file_M)
            #print file, ACTIONS.get (action, "Unknown")
            if len(full_filename) == len(fileMonitoring) and action == 3:
                #copy to main file
                ...

因为没有人提到它,所以我要把它放在那里:在标准库中有一个名为filecmp的Python模块,它有这个cmp()函数来比较两个文件。

只是要确保你没有使用from filecmp import cmp来掩盖Python 2.x中内置的cmp()函数。这在Python 3中是可以的。X,因为不再有这样的内置cmp()函数了。

不管怎样,它的用法是这样的:

import filecmp
filecmp.cmp(path_to_file_1, path_to_file_2, shallow=True)

参数shallow默认为True。如果参数的值为True,则只比较文件的元数据;但是,如果参数的值为False,则比较文件的内容。

也许这个信息对某人有用。