我有一个多行字符串,由一组不同的分隔符分隔:
(Text1)(DelimiterA)(Text2)(DelimiterC)(Text3)(DelimiterB)(Text4)
我可以使用string将这个字符串分割成各个部分。分裂,但似乎我无法获得与分隔符正则表达式匹配的实际字符串。
换句话说,这就是我得到的结果:
Text1
Text2
Text3
Text4
这就是我想要的
Text1
DelimiterA
Text2
DelimiterC
Text3
DelimiterB
Text4
JDK中是否有任何方法可以使用分隔符正则表达式分割字符串,但同时保留分隔符?
另一个使用正则表达式的候选解决方案。保留令牌顺序,正确匹配一行中相同类型的多个令牌。缺点是正则表达式有点讨厌。
package javaapplication2;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaApplication2 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
String num = "58.5+variable-+98*78/96+a/78.7-3443*12-3";
// Terrifying regex:
// (a)|(b)|(c) match a or b or c
// where
// (a) is one or more digits optionally followed by a decimal point
// followed by one or more digits: (\d+(\.\d+)?)
// (b) is one of the set + * / - occurring once: ([+*/-])
// (c) is a sequence of one or more lowercase latin letter: ([a-z]+)
Pattern tokenPattern = Pattern.compile("(\\d+(\\.\\d+)?)|([+*/-])|([a-z]+)");
Matcher tokenMatcher = tokenPattern.matcher(num);
List<String> tokens = new ArrayList<>();
while (!tokenMatcher.hitEnd()) {
if (tokenMatcher.find()) {
tokens.add(tokenMatcher.group());
} else {
// report error
break;
}
}
System.out.println(tokens);
}
}
样例输出:
[58.5, +, variable, -, +, 98, *, 78, /, 96, +, a, /, 78.7, -, 3443, *, 12, -, 3]
我知道这是一个非常非常古老的问题,答案也被接受了。但我仍然想对最初的问题提出一个非常简单的答案。考虑下面的代码:
String str = "Hello-World:How\nAre You&doing";
inputs = str.split("(?!^)\\b");
for (int i=0; i<inputs.length; i++) {
System.out.println("a[" + i + "] = \"" + inputs[i] + '"');
}
输出:
a[0] = "Hello"
a[1] = "-"
a[2] = "World"
a[3] = ":"
a[4] = "How"
a[5] = "
"
a[6] = "Are"
a[7] = " "
a[8] = "You"
a[9] = "&"
a[10] = "doing"
我只是使用单词边界\b来分隔单词,除非它是文本的开始。
另一个使用正则表达式的候选解决方案。保留令牌顺序,正确匹配一行中相同类型的多个令牌。缺点是正则表达式有点讨厌。
package javaapplication2;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaApplication2 {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
String num = "58.5+variable-+98*78/96+a/78.7-3443*12-3";
// Terrifying regex:
// (a)|(b)|(c) match a or b or c
// where
// (a) is one or more digits optionally followed by a decimal point
// followed by one or more digits: (\d+(\.\d+)?)
// (b) is one of the set + * / - occurring once: ([+*/-])
// (c) is a sequence of one or more lowercase latin letter: ([a-z]+)
Pattern tokenPattern = Pattern.compile("(\\d+(\\.\\d+)?)|([+*/-])|([a-z]+)");
Matcher tokenMatcher = tokenPattern.matcher(num);
List<String> tokens = new ArrayList<>();
while (!tokenMatcher.hitEnd()) {
if (tokenMatcher.find()) {
tokens.add(tokenMatcher.group());
} else {
// report error
break;
}
}
System.out.println(tokens);
}
}
样例输出:
[58.5, +, variable, -, +, 98, *, 78, /, 96, +, a, /, 78.7, -, 3443, *, 12, -, 3]
String expression = "((A+B)*C-D)*E";
expression = expression.replaceAll("\\+", "~+~");
expression = expression.replaceAll("\\*", "~*~");
expression = expression.replaceAll("-", "~-~");
expression = expression.replaceAll("/+", "~/~");
expression = expression.replaceAll("\\(", "~(~"); //also you can use [(] instead of \\(
expression = expression.replaceAll("\\)", "~)~"); //also you can use [)] instead of \\)
expression = expression.replaceAll("~~", "~");
if(expression.startsWith("~")) {
expression = expression.substring(1);
}
String[] expressionArray = expression.split("~");
System.out.println(Arrays.toString(expressionArray));