我有一个多行字符串,由一组不同的分隔符分隔:

(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)

我可以使用string将这个字符串分割成各个部分。分裂,但似乎我无法获得与分隔符正则表达式匹配的实际字符串。

换句话说,这就是我得到的结果:

Text1 Text2 Text3 Text4

这就是我想要的

Text1 DelimiterA Text2 DelimiterC Text3 DelimiterB Text4

JDK中是否有任何方法可以使用分隔符正则表达式分割字符串,但同时保留分隔符?


当前回答

    String expression = "((A+B)*C-D)*E";
    expression = expression.replaceAll("\\+", "~+~");
    expression = expression.replaceAll("\\*", "~*~");
    expression = expression.replaceAll("-", "~-~");
    expression = expression.replaceAll("/+", "~/~");
    expression = expression.replaceAll("\\(", "~(~"); //also you can use [(] instead of \\(
    expression = expression.replaceAll("\\)", "~)~"); //also you can use [)] instead of \\)
    expression = expression.replaceAll("~~", "~");
    if(expression.startsWith("~")) {
        expression = expression.substring(1);
    }

    String[] expressionArray = expression.split("~");
    System.out.println(Arrays.toString(expressionArray));

其他回答

我看了一下上面的答案,老实说,没有一个让我满意。您要做的实际上是模拟Perl分割功能。为什么Java不允许这样做,并且在某个地方有一个join()方法,这超出了我的范围,但我离题了。你甚至不需要专门的课程。它只是一个函数。运行这个示例程序:

一些早期的答案有过多的空检查,我最近写了一个回答这里的问题:

https://stackoverflow.com/users/18393/cletus

不管怎样,代码:

public class Split {
    public static List<String> split(String s, String pattern) {
        assert s != null;
        assert pattern != null;
        return split(s, Pattern.compile(pattern));
    }

    public static List<String> split(String s, Pattern pattern) {
        assert s != null;
        assert pattern != null;
        Matcher m = pattern.matcher(s);
        List<String> ret = new ArrayList<String>();
        int start = 0;
        while (m.find()) {
            ret.add(s.substring(start, m.start()));
            ret.add(m.group());
            start = m.end();
        }
        ret.add(start >= s.length() ? "" : s.substring(start));
        return ret;
    }

    private static void testSplit(String s, String pattern) {
        System.out.printf("Splitting '%s' with pattern '%s'%n", s, pattern);
        List<String> tokens = split(s, pattern);
        System.out.printf("Found %d matches%n", tokens.size());
        int i = 0;
        for (String token : tokens) {
            System.out.printf("  %d/%d: '%s'%n", ++i, tokens.size(), token);
        }
        System.out.println();
    }

    public static void main(String args[]) {
        testSplit("abcdefghij", "z"); // "abcdefghij"
        testSplit("abcdefghij", "f"); // "abcde", "f", "ghi"
        testSplit("abcdefghij", "j"); // "abcdefghi", "j", ""
        testSplit("abcdefghij", "a"); // "", "a", "bcdefghij"
        testSplit("abcdefghij", "[bdfh]"); // "a", "b", "c", "d", "e", "f", "g", "h", "ij"
    }
}

调整pattern .split()以将匹配的模式包含到列表中

添加

// add match to the list
        matchList.add(input.subSequence(start, end).toString());

完整的源

public static String[] inclusiveSplit(String input, String re, int limit) {
    int index = 0;
    boolean matchLimited = limit > 0;
    ArrayList<String> matchList = new ArrayList<String>();

    Pattern pattern = Pattern.compile(re);
    Matcher m = pattern.matcher(input);

    // Add segments before each match found
    while (m.find()) {
        int end = m.end();
        if (!matchLimited || matchList.size() < limit - 1) {
            int start = m.start();
            String match = input.subSequence(index, start).toString();
            matchList.add(match);
            // add match to the list
            matchList.add(input.subSequence(start, end).toString());
            index = end;
        } else if (matchList.size() == limit - 1) { // last one
            String match = input.subSequence(index, input.length())
                    .toString();
            matchList.add(match);
            index = end;
        }
    }

    // If no match was found, return this
    if (index == 0)
        return new String[] { input.toString() };

    // Add remaining segment
    if (!matchLimited || matchList.size() < limit)
        matchList.add(input.subSequence(index, input.length()).toString());

    // Construct result
    int resultSize = matchList.size();
    if (limit == 0)
        while (resultSize > 0 && matchList.get(resultSize - 1).equals(""))
            resultSize--;
    String[] result = new String[resultSize];
    return matchList.subList(0, resultSize).toArray(result);
}
    String expression = "((A+B)*C-D)*E";
    expression = expression.replaceAll("\\+", "~+~");
    expression = expression.replaceAll("\\*", "~*~");
    expression = expression.replaceAll("-", "~-~");
    expression = expression.replaceAll("/+", "~/~");
    expression = expression.replaceAll("\\(", "~(~"); //also you can use [(] instead of \\(
    expression = expression.replaceAll("\\)", "~)~"); //also you can use [)] instead of \\)
    expression = expression.replaceAll("~~", "~");
    if(expression.startsWith("~")) {
        expression = expression.substring(1);
    }

    String[] expressionArray = expression.split("~");
    System.out.println(Arrays.toString(expressionArray));

我来晚了,但回到最初的问题,为什么不使用搜索呢?

Pattern p = Pattern.compile("(?<=\\w)(?=\\W)|(?<=\\W)(?=\\w)");
System.out.println(Arrays.toString(p.split("'ab','cd','eg'")));
System.out.println(Arrays.toString(p.split("boo:and:foo")));

输出:

[', ab, ',', cd, ',', eg, ']
[boo, :, and, :, foo]

编辑:您在上面看到的是我运行该代码时命令行上出现的内容,但我现在看到它有点令人困惑。很难跟踪哪些逗号是结果的一部分,哪些是由Arrays.toString()添加的。SO的语法高亮显示也没有帮助。为了让突出显示与我一起工作而不是反对我,下面是我在源代码中声明这些数组的样子:

{ "'", "ab", "','", "cd", "','", "eg", "'" }
{ "boo", ":", "and", ":", "foo" }

我希望这更容易理解。谢谢你的提醒,@finnw。

import java.util.regex.*;
import java.util.LinkedList;

public class Splitter {
    private static final Pattern DEFAULT_PATTERN = Pattern.compile("\\s+");

    private Pattern pattern;
    private boolean keep_delimiters;

    public Splitter(Pattern pattern, boolean keep_delimiters) {
        this.pattern = pattern;
        this.keep_delimiters = keep_delimiters;
    }
    public Splitter(String pattern, boolean keep_delimiters) {
        this(Pattern.compile(pattern==null?"":pattern), keep_delimiters);
    }
    public Splitter(Pattern pattern) { this(pattern, true); }
    public Splitter(String pattern) { this(pattern, true); }
    public Splitter(boolean keep_delimiters) { this(DEFAULT_PATTERN, keep_delimiters); }
    public Splitter() { this(DEFAULT_PATTERN); }

    public String[] split(String text) {
        if (text == null) {
            text = "";
        }

        int last_match = 0;
        LinkedList<String> splitted = new LinkedList<String>();

        Matcher m = this.pattern.matcher(text);

        while (m.find()) {

            splitted.add(text.substring(last_match,m.start()));

            if (this.keep_delimiters) {
                splitted.add(m.group());
            }

            last_match = m.end();
        }

        splitted.add(text.substring(last_match));

        return splitted.toArray(new String[splitted.size()]);
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Syntax: java Splitter <pattern> <text>");
            return;
        }

        Pattern pattern = null;
        try {
            pattern = Pattern.compile(argv[0]);
        }
        catch (PatternSyntaxException e) {
            System.err.println(e);
            return;
        }

        Splitter splitter = new Splitter(pattern);

        String text = argv[1];
        int counter = 1;
        for (String part : splitter.split(text)) {
            System.out.printf("Part %d: \"%s\"\n", counter++, part);
        }
    }
}

/*
    Example:
    > java Splitter "\W+" "Hello World!"
    Part 1: "Hello"
    Part 2: " "
    Part 3: "World"
    Part 4: "!"
    Part 5: ""
*/

我不太喜欢另一种方式,前后都有一个空元素。分隔符通常不在字符串的开头或结尾,因此通常会浪费两个良好的数组插槽。

编辑:固定的限制情况。带有测试用例的注释源代码可以在这里找到:http://snippets.dzone.com/posts/show/6453