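I want to split the string "004-034556" into two strings at the delimiter "-":
part1 = "004";
part2 = "034556";
That is, the first string should contain the characters before the "-", and the second string the characters after it.
I also want to check whether the string contains a "-" at all.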
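Current answer
I looked through all the answers and noticed that every one of them either relies on a third-party library or is regex-based.
Here is a good, dumb implementation that I use: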
// Requires: import java.util.Arrays;

/**
 * Separates a string into pieces using
 * case-sensitive-non-regex-char-separators.
 * <p>
 * <code>separate("12-34", '-') = "12", "34"</code><br>
 * <code>separate("a-b-", '-') = "a", "b", ""</code>
 * <p>
 * When the separator is the first character in the string, the first result is
 * an empty string. When the separator is the last character in the string the
 * last element will be an empty string. One separator directly after another
 * in the string will create an empty string.
 * <p>
 * If no separators are set the source is returned.
 * <p>
 * This method is very fast, but it does not focus on memory-efficiency. The memory
 * consumption is approximately double the size of the string. This method is
 * thread-safe but not synchronized.
 *
 * @param source The string to split, never <code>null</code>.
 * @param separator The character(s) to split on.
 * @return The mutable array of pieces.
 * @throws NullPointerException When the source or separators are <code>null</code>.
 */
public final static String[] separate(String source, char... separator) throws NullPointerException {
    String[] resultArray = {};
    if (separator.length == 0) {
        return new String[] { source };
    }
    boolean multiSeparators = separator.length > 1;
    // Arrays.binarySearch only works on sorted input, so search a sorted copy
    // and leave the caller's array untouched.
    char[] sortedSeparators = separator.clone();
    Arrays.sort(sortedSeparators);
    int charIndex = source.length();
    int lastSeparator = source.length();
    // Walk backwards; charIndex == -1 flushes the piece before the first separator.
    while (charIndex-- > -1) {
        if (charIndex < 0 || (multiSeparators
                ? Arrays.binarySearch(sortedSeparators, source.charAt(charIndex)) >= 0
                : source.charAt(charIndex) == separator[0])) {
            String piece = source.substring(charIndex + 1, lastSeparator);
            lastSeparator = charIndex;
            // Prepend the piece, since the string is scanned from the end.
            String[] tmp = new String[resultArray.length + 1];
            System.arraycopy(resultArray, 0, tmp, 1, resultArray.length);
            tmp[0] = piece;
            resultArray = tmp;
        }
    }
    return resultArray;
}
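The splitting rules described in the Javadoc (empty pieces kept at the start, at the end, and between adjacent separators) can also be sketched with a forward scan over a single separator character. This is only an illustrative variant, not the answer's own method: the class name `SeparateSketch` and the helper `separateForward` are made up for this example, and it handles one separator only.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparateSketch {
    // Hypothetical helper (not part of the answer above): splits on ONE char
    // by scanning forward with indexOf, keeping every empty piece.
    public static String[] separateForward(String source, char separator) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = source.indexOf(separator, start)) >= 0) {
            pieces.add(source.substring(start, hit)); // text before this separator
            start = hit + 1;                          // continue after it
        }
        pieces.add(source.substring(start));          // remainder after the last separator
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", separateForward("12-34", '-'))); // 12|34
        System.out.println(String.join("|", separateForward("a-b-", '-')));  // a|b|
        System.out.println(String.join("|", separateForward("-333", '-')));  // |333
    }
}
```

Collecting into a list avoids copying the result array once per piece, at the cost of one final `toArray` call.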
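Other answers
Using Java 8:
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

List<String> stringList = Pattern.compile("-")
        .splitAsStream("004-034556")
        .collect(Collectors.toList());
stringList.forEach(s -> System.out.println(s));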
Splitting a string on several delimiter characters with a regex
public class StringSplitTest {
    public static void main(String[] args) {
        String s = " ;String; String; String; String, String; String;;String;String; String; String; ;String;String;String;String";
        //String[] strs = s.split("[,\\s\\;]");
        String[] strs = s.split("[,\\;]");
        System.out.println("Substrings length:" + strs.length);
        for (int i = 0; i < strs.length; i++) {
            System.out.println("Str[" + i + "]:" + strs[i]);
        }
    }
}
Output:
Substrings length:17
Str[0]:
Str[1]:String
Str[2]: String
Str[3]: String
Str[4]: String
Str[5]: String
Str[6]: String
Str[7]:
Str[8]:String
Str[9]:String
Str[10]: String
Str[11]: String
Str[12]:
Str[13]:String
Str[14]:String
Str[15]:String
Str[16]:String
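But do not expect the same output on every JDK version. I have seen a bug in some JDK versions where the first empty string is ignored. The bug is not present in the latest JDK version, but it does exist in some versions between late JDK 1.7 and early 1.8.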
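Separately from the version differences mentioned above, the `limit` argument of `String.split` also decides whether empty strings appear in the result: without a limit, trailing empty strings are dropped, while a negative limit keeps them. A minimal sketch of that documented behaviour follows; the class name `SplitLimitSketch` and the helper `splitKeepAll` are made up for illustration.

```java
import java.util.Arrays;

public class SplitLimitSketch {
    // Hypothetical helper: a negative limit keeps trailing empty pieces.
    public static String[] splitKeepAll(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("5555-".split("-")));         // [5555]
        System.out.println(Arrays.toString(splitKeepAll("5555-", "-"))); // [5555, ]
        System.out.println(Arrays.toString(splitKeepAll("-333", "-")));  // [, 333]
    }
}
```

If the number of pieces matters to the caller, passing the limit explicitly makes the intent visible instead of relying on the default.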
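The requirements leave room for interpretation. I recommend writing a method,
public final static String[] mySplit(final String s)
which encapsulates this function. Of course you can use String.split(..) for the implementation, as mentioned in the other answers.
You should write some unit tests for the input strings and for the results and behaviour you expect.
Good test candidates should include: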
- "0022-3333"
- "-"
- "5555-"
- "-333"
- "3344-"
- "--"
- ""
- "553535"
- "333-333-33"
- "222--222"
- "222--"
- "--4555"
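By defining the corresponding test results, you specify the behaviour.
For example, should "-333" return [,333], or is it an error? Can "333-333-33" be separated into [333,333-33] or [333-333,33], or is that an error? And so on.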
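To make the idea concrete, here is one possible sketch of such a method. The behaviour is an assumption chosen for illustration, not something the question specifies: split at the first "-", allow empty parts, and treat input without "-" as an error. The class name `MySplitSketch` is likewise made up.

```java
public class MySplitSketch {
    // Assumed behaviour (one of several the tests could pin down):
    // split at the FIRST '-', allow empty parts, reject input without '-'.
    public static String[] mySplit(final String s) {
        int idx = s.indexOf('-');
        if (idx < 0) {
            throw new IllegalArgumentException("no '-' in: " + s);
        }
        return new String[] { s.substring(0, idx), s.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] inputs = { "0022-3333", "-", "5555-", "-333", "--", "333-333-33", "222--222" };
        for (String in : inputs) {
            String[] p = mySplit(in);
            System.out.println("\"" + in + "\" -> [" + p[0] + "," + p[1] + "]");
        }
    }
}
```

Under these assumptions "333-333-33" yields [333,333-33] and "553535" throws. A different choice, such as splitting at the last "-" or rejecting empty parts, would only change the method body and the expected values in the tests.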
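An alternative to processing the string directly is to use a regular expression with capturing groups. This has the advantage that more sophisticated constraints on the input are easy to express. For example, the following splits the string into two parts and ensures that both consist only of digits: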
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class SplitExample
{
    private static final Pattern twopart = Pattern.compile("(\\d+)-(\\d+)");

    public static void checkString(String s)
    {
        Matcher m = twopart.matcher(s);
        if (m.matches()) {
            System.out.println(s + " matches; first part is " + m.group(1) +
                               ", second part is " + m.group(2) + ".");
        } else {
            System.out.println(s + " does not match.");
        }
    }

    public static void main(String[] args) {
        checkString("123-4567");
        checkString("foo-bar");
        checkString("123-");
        checkString("-4567");
        checkString("123-4567-890");
    }
}
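Since the pattern is fixed in this instance, it can be compiled in advance and stored as a static member (initialised at class load time in the example). The regular expression is:
(\d+)-(\d+)
The parentheses denote the capturing groups; the string that matched that part of the regex can be accessed with the Matcher.group() method, as shown. \d matches a single decimal digit, and + means "match one or more of the previous expression". The - has no special meaning outside a character class, so it just matches that character in the input. Note that each backslash has to be doubled when the pattern is written as a Java string literal (\\d). Some other examples: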
([A-Z]+)-([A-Z]+) // Each part consists of only capital letters
([^-]+)-([^-]+) // Each part consists of characters other than -
([A-Z]{2})-(\d+) // The first part is exactly two capital letters,
// the second consists of digits
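As a rough illustration of how one of these variants could be used, the sketch below applies the last pattern and returns the two captured parts. The class name `PatternVariantsSketch`, the method `parse`, and the choice to return null for non-matching input are all assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternVariantsSketch {
    // Exactly two capital letters, a '-', then one or more digits.
    private static final Pattern CODE = Pattern.compile("([A-Z]{2})-(\\d+)");

    // Hypothetical helper: returns the two captured parts,
    // or null when the whole input does not match the pattern.
    public static String[] parse(String s) {
        Matcher m = CODE.matcher(s);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] ok = parse("AB-123");
        System.out.println(ok[0] + " / " + ok[1]);    // AB / 123
        System.out.println(parse("ABC-123") == null); // true
        System.out.println(parse("ab-123") == null);  // true
    }
}
```

Because `matches()` requires the entire input to fit the pattern, the check for a missing or misplaced "-" comes for free.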