将字符串转换为有效的文件名?

我有一个字符串，我想用它作为文件名，所以我想用Python删除文件名中不允许的所有字符。

我宁愿严格一点，所以假设我想只保留字母、数字和一小组其他字符，如“_-.()”。”。最优雅的解决方案是什么?

文件名需要在多个操作系统(Windows, Linux和Mac OS)上有效——它是我库中的一个MP3文件，以歌曲标题为文件名，并在3台机器之间共享和备份。

当前回答

你可以使用re.sub()方法替换任何非“类文件”的东西。但实际上，每个字符都可以是有效的;所以没有预先构建的函数(我相信)来完成它。

import re

str = "File!name?.txt"
f = open(os.path.join("/tmp", re.sub('[^-a-zA-Z0-9_.() ]+', '', str))

将导致/tmp/filename.txt的文件句柄。

2008-11-17 09:10:53

其他回答

This whitelist approach (ie, allowing only the chars present in valid_chars) will work if there aren't limits on the formatting of the files or combination of valid chars that are illegal (like ".."), for example, what you say would allow a filename named " . txt" which I think is not valid on Windows. As this is the most simple approach I'd try to remove whitespace from the valid_chars and prepend a known valid string in case of error, any other approach will have to know about what is allowed where to cope with Windows file naming limitations and thus be a lot more complex.

>>> import string
>>> valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
>>> valid_chars
'-_.() abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>>> filename = "This Is a (valid) - filename%$&$ .txt"
>>> ''.join(c for c in filename if c in valid_chars)
'This Is a (valid) - filename .txt'

2008-11-17 09:10:49

我知道有很多答案，但它们大多依赖于正则表达式或外部模块，所以我想抛出我自己的答案。一个纯python函数，不需要外部模块，不使用正则表达式。我的方法不是清除无效字符，而是只允许有效字符。

def normalizefilename(fn):
    validchars = "-_.() "
    out = ""
    for c in fn:
      if str.isalpha(c) or str.isdigit(c) or (c in validchars):
        out += c
      else:
        out += "_"
    return out

如果您愿意，您可以在开头向validchars变量添加您自己的有效字符，例如您的国家字母在英语字母表中不存在。这是您可能想要也可能不想要的:一些不运行UTF-8的文件系统在使用非ascii字符时可能仍然存在问题。

此函数用于测试单个文件名的有效性，因此它将路径分隔符替换为_，认为它们是无效字符。如果你想添加它，修改If以包含os路径分隔符是很简单的。

2019-03-11 12:21:02

你可以使用re.sub()方法替换任何非“类文件”的东西。但实际上，每个字符都可以是有效的;所以没有预先构建的函数(我相信)来完成它。

import re

str = "File!name?.txt"
f = open(os.path.join("/tmp", re.sub('[^-a-zA-Z0-9_.() ]+', '', str))

将导致/tmp/filename.txt的文件句柄。

2008-11-17 09:10:53

使用字符串作为文件名的原因是什么?如果人类的可读性不是一个因素，我会使用base64模块，它可以产生文件系统安全字符串。它是不可读的，但你不需要处理碰撞，而且它是可逆的。

import base64
file_name_string = base64.urlsafe_b64encode(your_string)

更新:根据Matthew的评论修改。

2008-11-17 09:12:02

您可以将列表推导式与字符串方法一起使用。

>>> s
'foo-bar#baz?qux@127/\\9]'
>>> "".join(x for x in s if x.isalnum())
'foobarbazqux1279'

2008-11-17 09:12:49

将字符串转换为有效的文件名?

推荐文章

最新文章

标签