将字符串转换为有效的文件名?

我有一个字符串，我想用它作为文件名，所以我想用Python删除文件名中不允许的所有字符。

我宁愿严格一点，所以假设我想只保留字母、数字和一小组其他字符，如“_-.()”。”。最优雅的解决方案是什么?

文件名需要在多个操作系统(Windows, Linux和Mac OS)上有效——它是我库中的一个MP3文件，以歌曲标题为文件名，并在3台机器之间共享和备份。

当前回答

请记住，在Unix系统上实际上没有文件名限制

它可能不包含\0 它可能不包含/

其他一切都是公平的。

$ touch "
> even multiline
> haha
> ^[[31m red ^[[0m
> evil"
$ ls -la 
-rw-r--r--       0 Nov 17 23:39 ?even multiline?haha??[31m red ?[0m?evil
$ ls -lab
-rw-r--r--       0 Nov 17 23:39 \neven\ multiline\nhaha\n\033[31m\ red\ \033[0m\nevil
$ perl -e 'for my $i ( glob(q{./*even*}) ){ print $i; } '
./
even multiline
haha
 red 
evil

是的，我只是将ANSI颜色代码存储在一个文件名中，并使它们生效。

为了娱乐，在目录名中放入一个BEL字符，并观看当您CD到其中时所产生的乐趣;)

2008-11-17 10:45:54

其他回答

你可以使用re.sub()方法替换任何非“类文件”的东西。但实际上，每个字符都可以是有效的;所以没有预先构建的函数(我相信)来完成它。

import re

str = "File!name?.txt"
f = open(os.path.join("/tmp", re.sub('[^-a-zA-Z0-9_.() ]+', '', str))

将导致/tmp/filename.txt的文件句柄。

2008-11-17 09:10:53

为python 3.6修改的答案

import string
import unicodedata

validFilenameChars = "-_.() %s%s" % (string.ascii_letters, string.digits)
def removeDisallowedFilenameChars(filename):
    cleanedFilename = unicodedata.normalize('NFKD', filename).encode('ASCII', 'ignore')
    return ''.join(chr(c) for c in cleanedFilename if chr(c) in validFilenameChars)

2019-04-22 04:48:57

我知道有很多答案，但它们大多依赖于正则表达式或外部模块，所以我想抛出我自己的答案。一个纯python函数，不需要外部模块，不使用正则表达式。我的方法不是清除无效字符，而是只允许有效字符。

def normalizefilename(fn):
    validchars = "-_.() "
    out = ""
    for c in fn:
      if str.isalpha(c) or str.isdigit(c) or (c in validchars):
        out += c
      else:
        out += "_"
    return out

如果您愿意，您可以在开头向validchars变量添加您自己的有效字符，例如您的国家字母在英语字母表中不存在。这是您可能想要也可能不想要的:一些不运行UTF-8的文件系统在使用非ascii字符时可能仍然存在问题。

此函数用于测试单个文件名的有效性，因此它将路径分隔符替换为_，认为它们是无效字符。如果你想添加它，修改If以包含os路径分隔符是很简单的。

2019-03-11 12:21:02

为什么不直接用try/except来包装“osopen”，让底层操作系统来判断文件是否有效?

这看起来工作量少得多，而且无论您使用哪种操作系统都是有效的。

2008-11-17 11:24:49

This whitelist approach (ie, allowing only the chars present in valid_chars) will work if there aren't limits on the formatting of the files or combination of valid chars that are illegal (like ".."), for example, what you say would allow a filename named " . txt" which I think is not valid on Windows. As this is the most simple approach I'd try to remove whitespace from the valid_chars and prepend a known valid string in case of error, any other approach will have to know about what is allowed where to cope with Windows file naming limitations and thus be a lot more complex.

>>> import string
>>> valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
>>> valid_chars
'-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>>> filename = "This Is a (valid) - filename%$&$ .txt"
>>> ''.join(c for c in filename if c in valid_chars)
'This Is a (valid) - filename .txt'

2008-11-17 09:10:49

将字符串转换为有效的文件名?

推荐文章

最新文章

标签