我需要一个强大的和简单的方法来删除非法的路径和文件字符从一个简单的字符串。我已经使用了下面的代码,但它似乎没有做任何事情,我错过了什么?

using System;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";

            illegal = illegal.Trim(Path.GetInvalidFileNameChars());
            illegal = illegal.Trim(Path.GetInvalidPathChars());

            Console.WriteLine(illegal);
            Console.ReadLine();
        }
    }
}

当前回答

我绝对更喜欢杰夫·耶茨的想法。它将完美地工作,如果你稍微修改它:

string regex = String.Format("[{0}]", Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);

改进仅仅是转义自动生成的正则表达式。

其他回答

下面的代码片段应该对。net 3及更高版本有所帮助。

using System.IO;
using System.Text.RegularExpressions;

public static class PathValidation
{
    private static string pathValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex pathValidator = new Regex(pathValidatorExpression, RegexOptions.Compiled);

    private static string fileNameValidatorExpression = "^[^" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]+$";
    private static Regex fileNameValidator = new Regex(fileNameValidatorExpression, RegexOptions.Compiled);

    private static string pathCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidPathChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex pathCleaner = new Regex(pathCleanerExpression, RegexOptions.Compiled);

    private static string fileNameCleanerExpression = "[" + string.Join("", Array.ConvertAll(Path.GetInvalidFileNameChars(), x => Regex.Escape(x.ToString()))) + "]";
    private static Regex fileNameCleaner = new Regex(fileNameCleanerExpression, RegexOptions.Compiled);

    public static bool ValidatePath(string path)
    {
        return pathValidator.IsMatch(path);
    }

    public static bool ValidateFileName(string fileName)
    {
        return fileNameValidator.IsMatch(fileName);
    }

    public static string CleanPath(string path)
    {
        return pathCleaner.Replace(path, "");
    }

    public static string CleanFileName(string fileName)
    {
        return fileNameCleaner.Replace(fileName, "");
    }
}

如果您必须在项目中的许多地方使用该方法,您还可以创建一个扩展方法,并在项目中的任何地方调用它来获取字符串。

 public static class StringExtension
    {
        public static string RemoveInvalidChars(this string originalString)
        {            
            string finalString=string.Empty;
            if (!string.IsNullOrEmpty(originalString))
            {
                return string.Concat(originalString.Split(Path.GetInvalidFileNameChars()));
            }
            return finalString;            
        }
    }

你可以这样调用上面的扩展方法:

string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string afterIllegalChars = illegal.RemoveInvalidChars();

浏览这里的答案,它们似乎都涉及使用无效文件名字符的char数组。

当然,这可能是一种微观优化——但对于那些希望检查大量值是否为有效文件名的人来说,值得注意的是,构建一个由无效字符组成的哈希集将带来明显更好的性能。

过去,我对哈希集(或字典)比遍历列表的速度快得多感到非常惊讶。对于字符串,它是一个低得可笑的数字(从记忆中大约5-7项)。对于大多数其他简单数据(对象引用,数字等),神奇的交叉似乎是大约20个项目。

路径中有40个无效字符。InvalidFileNameChars“列表”。今天搜索了一下,在StackOverflow上有一个很好的基准测试,显示hashset将花费40个数组/列表一半多一点的时间:https://stackoverflow.com/a/10762995/949129

下面是我用于清理路径的helper类。我现在忘记了为什么我在里面有漂亮的替换选项,但这是一个可爱的奖励。

额外的奖励方法“IsValidLocalPath”:)

(**那些不使用正则表达式的)

public static class PathExtensions
{
    private static HashSet<char> _invalidFilenameChars;
    private static HashSet<char> InvalidFilenameChars
    {
        get { return _invalidFilenameChars ?? (_invalidFilenameChars = new HashSet<char>(Path.GetInvalidFileNameChars())); }
    }


    /// <summary>Replaces characters in <c>text</c> that are not allowed in file names with the 
    /// specified replacement character.</summary>
    /// <param name="text">Text to make into a valid filename. The same string is returned if 
    /// it is valid already.</param>
    /// <param name="replacement">Replacement character, or NULL to remove bad characters.</param>
    /// <param name="fancyReplacements">TRUE to replace quotes and slashes with the non-ASCII characters ” and ⁄.</param>
    /// <returns>A string that can be used as a filename. If the output string would otherwise be empty, "_" is returned.</returns>
    public static string ToValidFilename(this string text, char? replacement = '_', bool fancyReplacements = false)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        HashSet<char> invalids = InvalidFilenameChars;
        bool changed = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (invalids.Contains(c))
            {
                changed = true;
                char repl = replacement ?? '\0';
                if (fancyReplacements)
                {
                    if (c == '"') repl = '”'; // U+201D right double quotation mark
                    else if (c == '\'') repl = '’'; // U+2019 right single quotation mark
                    else if (c == '/') repl = '⁄'; // U+2044 fraction slash
                }
                if (repl != '\0')
                    sb.Append(repl);
            }
            else
                sb.Append(c);
        }

        if (sb.Length == 0)
            return "_";

        return changed ? sb.ToString() : text;
    }


    /// <summary>
    /// Returns TRUE if the specified path is a valid, local filesystem path.
    /// </summary>
    /// <param name="pathString"></param>
    /// <returns></returns>
    public static bool IsValidLocalPath(this string pathString)
    {
        // From solution at https://stackoverflow.com/a/11636052/949129
        Uri pathUri;
        Boolean isValidUri = Uri.TryCreate(pathString, UriKind.Absolute, out pathUri);
        return isValidUri && pathUri != null && pathUri.IsLoopback;
    }
}

我创建了一个扩展方法,它结合了几个建议:

在哈希集中保存非法字符 过滤ascii 127以下的字符。因为路径。getinvalidfilenamecars不包括ascii码从0到255的所有无效字符。看这里和MSDN 定义替换字符的可能性

来源:

public static class FileNameCorrector
{
    private static HashSet<char> invalid = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string ToValidFileName(this string name, char replacement = '\0')
    {
        var builder = new StringBuilder();
        foreach (var cur in name)
        {
            if (cur > 31 && cur < 128 && !invalid.Contains(cur))
            {
                builder.Append(cur);
            }
            else if (replacement != '\0')
            {
                builder.Append(replacement);
            }
        }

        return builder.ToString();
    }
}

这看起来是O(n),并且不会在字符串上花费太多内存:

    private static readonly HashSet<char> invalidFileNameChars = new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string RemoveInvalidFileNameChars(string name)
    {
        if (!name.Any(c => invalidFileNameChars.Contains(c))) {
            return name;
        }

        return new string(name.Where(c => !invalidFileNameChars.Contains(c)).ToArray());
    }