Suppose I have two strings:

String s1 = "AbBaCca";
String s2 = "bac";

I want to perform a check that returns whether s2 is contained within s1. I can do this with:

return s1.contains(s2);

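I'm pretty sure that contains() is case sensitive, but I can't confirm this from reading the documentation. If it is, then I suppose my best approach would be something like: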

return s1.toLowerCase().contains(s2.toLowerCase());

All of this aside, is there another (possibly better) way to accomplish this without caring about case sensitivity?


Current answer

import java.text.Normalizer;

import org.apache.commons.lang3.StringUtils;

public class ContainsIgnoreCase {

    public static void main(String[] args) {

        String in = "   Annulée ";
        String key = "annulee";

        // Plain Java: decompose to NFD, strip combining diacritical marks, lower-case, then search
        if (Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", "").toLowerCase().contains(key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

        // Same normalization, with Apache Commons Lang doing the case-insensitive search
        if (StringUtils.containsIgnoreCase(Normalizer.normalize(in, Normalizer.Form.NFD).replaceAll("[\\p{InCombiningDiacriticalMarks}]", ""), key)) {
            System.out.println("OK");
        } else {
            System.out.println("KO");
        }

    }

}

Other answers

You can simply do something like this:

String s1 = "AbBaCca";
String s2 = "bac";
String toLower = s1.toLowerCase();
return toLower.contains(s2.toLowerCase()); // lower-case s2 as well, so any casing of s2 works

Yes, this is achievable:

String s1 = "abBaCca";
String s2 = "bac";

String s1Lower = s1;

// s1Lower starts out as the same string as s1; convert it to lower case and leave s1 intact in case it needs to be printed

s1Lower = s1Lower.toLowerCase();

String trueStatement = "FALSE!";
if (s1Lower.contains(s2)) {

    //THIS statement will be TRUE
    trueStatement = "TRUE!";
}

return trueStatement;

This code will return the String "TRUE!" because it found that your characters were contained.
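A locale-independent variant of the lower-casing approach, as a rough sketch. toLowerCase() with no argument uses the JVM's default locale, and under a Turkish locale an upper-case "I" becomes a dotless "ı", so the comparison can fail unexpectedly. The class name LowerCaseContains and the choice of Locale.ROOT are assumptions made for illustration, not something the answers above prescribe.

```java
import java.util.Locale;

class LowerCaseContains {

    // Lower-cases both strings with a fixed locale before searching,
    // so the result does not depend on the machine's default locale.
    static boolean containsIgnoreCase(String src, String what) {
        return src.toLowerCase(Locale.ROOT).contains(what.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("AbBaCca", "bac")); // true
        System.out.println(containsIgnoreCase("AbBaCca", "BAC")); // true

        // For comparison: Turkish lower-casing turns "I" into dotless "ı".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("TITLE".toLowerCase(turkish).contains("title")); // false
    }
}
```

This still allocates two temporary strings per call, so it is a correctness tweak rather than a performance one.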

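A faster implementation: using String.regionMatches()

Using a regexp can be relatively slow. That does not matter if you only want to check a single case. But if you have an array or a collection of thousands of strings, things can get pretty slow.

The solution presented below uses neither regular expressions nor toLowerCase() (which is also slow, because it creates new strings and throws them away right after the check).

The solution builds on the String.regionMatches() method, which seems to be little known. It checks whether two String regions match, and, importantly, it also has an overload with a handy ignoreCase parameter.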

public static boolean containsIgnoreCase(String src, String what) {
    final int length = what.length();
    if (length == 0)
        return true; // Empty string is contained
        
    final char firstLo = Character.toLowerCase(what.charAt(0));
    final char firstUp = Character.toUpperCase(what.charAt(0));
    
    for (int i = src.length() - length; i >= 0; i--) {
        // Quick check before calling the more expensive regionMatches() method:
        final char ch = src.charAt(i);
        if (ch != firstLo && ch != firstUp)
            continue;
        
        if (src.regionMatches(true, i, what, 0, length))
            return true;
    }
    
    return false;
}
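To try the regionMatches() overload on its own, here is a minimal sketch. The class and helper names (RegionMatchesDemo, matchesAt) are made up for illustration and are not part of the answer above.

```java
class RegionMatchesDemo {

    // Returns true if `what` occurs in `src` starting exactly at `offset`, ignoring case.
    static boolean matchesAt(String src, int offset, String what) {
        // regionMatches(ignoreCase, toffset, other, ooffset, len) returns false
        // (it does not throw) when the region runs past the end of either string.
        return src.regionMatches(true, offset, what, 0, what.length());
    }

    public static void main(String[] args) {
        System.out.println(matchesAt("AbBaCca", 2, "bac")); // compares "BaC" with "bac": true
        System.out.println(matchesAt("AbBaCca", 0, "bac")); // compares "AbB" with "bac": false
        System.out.println(matchesAt("AbBaCca", 6, "bac")); // region runs past the end: false
    }
}
```

Because out-of-range regions simply yield false, the loop in containsIgnoreCase() can call the method at every candidate offset without extra bounds checks.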

Speed analysis

This speed analysis is not meant to be rocket science, just a rough picture of how fast the different methods are.

I compared 5 methods:

1. Our containsIgnoreCase() method.
2. Converting both strings to lower case and calling String.contains().
3. Converting the source string to lower case and calling String.contains() with the pre-cached, lower-cased substring. This solution is already less flexible because it tests a predefined substring.
4. Using a regular expression (the accepted answer: Pattern.compile().matcher().find()...).
5. Using a regular expression, but with a pre-created and cached Pattern. This solution is already less flexible because it tests a predefined substring.

Results (from calling each method 10 million times):

1. Our method: 670 ms
2. 2x toLowerCase() and contains(): 2829 ms
3. 1x toLowerCase() and contains() with cached substring: 2446 ms
4. Regexp: 7180 ms
5. Regexp with cached Pattern: 1845 ms

Results in a table:

                                            RELATIVE SPEED   1/RELATIVE SPEED
 METHOD                          EXEC TIME    TO SLOWEST      TO FASTEST (#1)
------------------------------------------------------------------------------
 1. Using regionMatches()          670 ms       10.7x            1.0x
 2. 2x lowercase+contains         2829 ms        2.5x            4.2x
 3. 1x lowercase+contains cache   2446 ms        2.9x            3.7x
 4. Regexp                        7180 ms        1.0x           10.7x
 5. Regexp+cached pattern         1845 ms        3.9x            2.8x

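Our method is 4x faster than lower-casing and using contains(), 10x faster than using regular expressions, and still 3x faster even when the Pattern is pre-cached (which loses the flexibility of checking for an arbitrary substring).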


Analysis test code

If you're interested in how the analysis was performed, here is the complete runnable application:

import java.util.regex.Pattern;

public class ContainsAnalysis {
    
    // Case 1 utilizing String.regionMatches()
    public static boolean containsIgnoreCase(String src, String what) {
        final int length = what.length();
        if (length == 0)
            return true; // Empty string is contained
            
        final char firstLo = Character.toLowerCase(what.charAt(0));
        final char firstUp = Character.toUpperCase(what.charAt(0));
        
        for (int i = src.length() - length; i >= 0; i--) {
            // Quick check before calling the more expensive regionMatches()
            // method:
            final char ch = src.charAt(i);
            if (ch != firstLo && ch != firstUp)
                continue;
            
            if (src.regionMatches(true, i, what, 0, length))
                return true;
        }
        
        return false;
    }
    
    // Case 2 with 2x toLowerCase() and contains()
    public static boolean containsConverting(String src, String what) {
        return src.toLowerCase().contains(what.toLowerCase());
    }
    
    // The cached substring for case 3
    private static final String S = "i am".toLowerCase();
    
    // Case 3 with pre-cached substring and 1x toLowerCase() and contains()
    public static boolean containsConverting(String src) {
        return src.toLowerCase().contains(S);
    }
    
    // Case 4 with regexp
    public static boolean containsIgnoreCaseRegexp(String src, String what) {
        return Pattern.compile(Pattern.quote(what), Pattern.CASE_INSENSITIVE)
                    .matcher(src).find();
    }
    
    // The cached pattern for case 5
    private static final Pattern P = Pattern.compile(
            Pattern.quote("i am"), Pattern.CASE_INSENSITIVE);
    
    // Case 5 with pre-cached Pattern
    public static boolean containsIgnoreCaseRegexp(String src) {
        return P.matcher(src).find();
    }
    
    // Main method: performs speed analysis on different contains methods
    // (case ignored)
    public static void main(String[] args) throws Exception {
        final String src = "Hi, I am Adam";
        final String what = "i am";
        
        long start, end;
        final int N = 10_000_000;
        
        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCase(src, what);
        end = System.nanoTime();
        System.out.println("Case 1 took " + ((end - start) / 1000000) + "ms");
        
        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src, what);
        end = System.nanoTime();
        System.out.println("Case 2 took " + ((end - start) / 1000000) + "ms");
        
        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsConverting(src);
        end = System.nanoTime();
        System.out.println("Case 3 took " + ((end - start) / 1000000) + "ms");
        
        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src, what);
        end = System.nanoTime();
        System.out.println("Case 4 took " + ((end - start) / 1000000) + "ms");
        
        start = System.nanoTime();
        for (int i = 0; i < N; i++)
            containsIgnoreCaseRegexp(src);
        end = System.nanoTime();
        System.out.println("Case 5 took " + ((end - start) / 1000000) + "ms");
    }
    
}

Yes, contains is case sensitive. You can use java.util.regex.Pattern with the CASE_INSENSITIVE flag for case-insensitive matching:

Pattern.compile(Pattern.quote(wantedStr), Pattern.CASE_INSENSITIVE).matcher(source).find();

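EDIT: If s2 contains regex special characters (of which there are many), it's important to quote it first. I've corrected my answer since it is the first one people will see, but vote up Matt Quail's answer since he pointed this out.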
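A small sketch of why the quoting matters. The class and method names (QuoteDemo, containsQuoted, containsUnquoted) are hypothetical; the unquoted variant is included only for comparison.

```java
import java.util.regex.Pattern;

class QuoteDemo {

    // Case-insensitive literal search: the needle is quoted,
    // so regex metacharacters in it are treated as plain text.
    static boolean containsQuoted(String source, String wanted) {
        return Pattern.compile(Pattern.quote(wanted), Pattern.CASE_INSENSITIVE)
                      .matcher(source).find();
    }

    // The same search without quoting, shown only to illustrate the problem.
    static boolean containsUnquoted(String source, String wanted) {
        return Pattern.compile(wanted, Pattern.CASE_INSENSITIVE)
                      .matcher(source).find();
    }

    public static void main(String[] args) {
        // "." is a regex wildcard, so the unquoted pattern "a.c" also matches "ABC".
        System.out.println(containsUnquoted("xABCx", "a.c")); // true (a false positive)
        System.out.println(containsQuoted("xABCx", "a.c"));   // false
        System.out.println(containsQuoted("xA.Cx", "a.c"));   // true
    }
}
```

An unquoted needle can also fail outright: a search string such as "(" is not a valid regular expression, so Pattern.compile() throws a PatternSyntaxException for it.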

I did a test on finding a case-insensitive match of a string. I have a Vector of 150,000 objects, each with a String as one field, and wanted to find the subset that matched a string. I tried three methods:

1. Convert all to lower case

    for (SongInformation song: songs) {
        if (song.artist.toLowerCase().indexOf(pattern.toLowerCase()) > -1) {
            ...
        }
    }

2. Use the String matches() method

    for (SongInformation song: songs) {
        if (song.artist.matches("(?i).*" + pattern + ".*")) {
            ...
        }
    }

3. Use regular expressions

    Pattern p = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher("");
    for (SongInformation song: songs) {
        m.reset(song.artist);
        if (m.find()) {
            ...
        }
    }

Timing results are:

- No attempted match: 20 ms
- To lower match: 182 ms
- String matches: 278 ms
- Regular expression: 65 ms

The regular expression looks to be the fastest for this use case.
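The third method, reusing one Matcher across the whole collection, can be sketched as a self-contained program. This is an illustration under assumptions: a plain list of strings stands in for the SongInformation objects, the sample data is invented, and the needle is wrapped in Pattern.quote() so that it is treated as literal text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReusedMatcherDemo {

    // Returns the items that contain `needle`, ignoring case.
    static List<String> filter(List<String> items, String needle) {
        // Compile once, outside the loop.
        Pattern p = Pattern.compile(Pattern.quote(needle), Pattern.CASE_INSENSITIVE);
        // One Matcher object, re-pointed at each item via reset().
        Matcher m = p.matcher("");
        List<String> hits = new ArrayList<>();
        for (String item : items) {
            if (m.reset(item).find()) {
                hits.add(item);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<String> artists = Arrays.asList("The Beatles", "BEASTIE BOYS", "Bjork");
        System.out.println(filter(artists, "bea")); // [The Beatles, BEASTIE BOYS]
    }
}
```

Compiling the Pattern once and resetting a single Matcher avoids per-item allocation, which is a plausible reason this variant did well in the timings above.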