What are "connecting characters" in Java identifiers?

I am reading for the SCJP and I have a question about this line:

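Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore (_). Identifiers cannot start with a number!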

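It states that a valid identifier name can start with a connecting character such as the underscore. I thought the underscore was the only valid option? Are there any other connecting characters?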

Iterate through all 65k chars and ask Character.isJavaIdentifierStart(c). One answer is: "undertie", decimal 8255.

2012-08-02 08:57:36

Here is a list of connecting characters. These are characters used to connect words.

http://www.fileformat.info/info/unicode/category/Pc/list.htm

U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE

This compiles on Java 7.

int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;

An example. In this case tp is both the name of a column and the value for a given row.

Column<Double> ︴tp︴ = table.getColumn("tp", double.class);

double tp = row.getDouble(︴tp︴);

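The following

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    // Keep characters that may start an identifier but are not letters.
    if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i))
        // Character.toChars also handles code points above U+FFFF, which a (char) cast would truncate.
        System.out.print(new String(Character.toChars(i)) + " ");
}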

prints

$ _ ¢ £ ¤ ¥ ؋ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱ ₲ ₳ ₴ ₵ ₶ ₷ ₸ ₹ ꠸ ﷼ ︳ ︴ ﹍ ﹎ ﹏ ＄ ＿ ￠ ￡ ￥ ￦

2012-08-02 08:59:03

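Here is the list of connector characters in Unicode. You will not find them on your keyboard.

U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE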

2012-08-02 08:59:37

The definitive specification of a legal Java identifier can be found in the Java Language Specification.
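As a rough illustration of that rule, the sketch below checks whether a string is lexically shaped like an identifier: the first code point must satisfy Character.isJavaIdentifierStart and every later one Character.isJavaIdentifierPart. The class and method names (IdentifierCheck, looksLikeIdentifier) are made up for this example, and the check deliberately ignores keywords and reserved literals, so it is only an approximation of what a compiler accepts.

```java
class IdentifierCheck {
    /**
     * Returns true if s starts with a "Java letter" and continues with
     * "Java letters-or-digits". Keywords such as "int" are NOT rejected here;
     * that part of the rule is left out of this sketch.
     */
    public static boolean looksLikeIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters outside the BMP are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"_tmp", "$price", "\u203Fx", "1abc", "a-b"};
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeIdentifier(s));
        }
    }
}
```

If keyword handling matters, javax.lang.model.SourceVersion.isName may be a better fit than a hand-written loop.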

2012-08-02 08:59:54

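A connecting character is used to connect two characters.

In Java, a connecting character is one for which Character.getType(int codePoint) / Character.getType(char ch) returns a value equal to Character.CONNECTOR_PUNCTUATION.

Note that in Java, character information is based on the Unicode standard, which identifies connecting characters by assigning them the general category Pc, an alias for Connector_Punctuation.

The following code snippet,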

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    if (Character.getType(i) == Character.CONNECTOR_PUNCTUATION
            && Character.isJavaIdentifierStart(i)) {
        System.out.println("character: " + String.valueOf(Character.toChars(i))
                + ", codepoint: " + i + ", hexcode: " + Integer.toHexString(i));
    }
}

prints the connecting characters that can be used to start an identifier on jdk1.6.0_45

character: _, codepoint: 95, hexcode: 5f
character: ‿, codepoint: 8255, hexcode: 203f
character: ⁀, codepoint: 8256, hexcode: 2040
character: ⁔, codepoint: 8276, hexcode: 2054
character: ・, codepoint: 12539, hexcode: 30fb
character: ︳, codepoint: 65075, hexcode: fe33
character: ︴, codepoint: 65076, hexcode: fe34
character: ﹍, codepoint: 65101, hexcode: fe4d
character: ﹎, codepoint: 65102, hexcode: fe4e
character: ﹏, codepoint: 65103, hexcode: fe4f
character: ＿, codepoint: 65343, hexcode: ff3f
character: ･, codepoint: 65381, hexcode: ff65

The following compiles on jdk1.6.0_45,

int _, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･ = 0;

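Apparently, the declaration above fails to compile on jdk1.7.0_80 and jdk1.8.0_51 because of the following two connecting characters (backward compatibility... oops!!!)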

character: ・, codepoint: 12539, hexcode: 30fb
character: ･, codepoint: 65381, hexcode: ff65
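One plausible reason, which this answer does not confirm, is that the Unicode data bundled with the JDK changed between releases, so that these two code points are no longer classified as connector punctuation. The sketch below simply asks the running JDK how it classifies them; the class and method names are made up for illustration, and the result depends on the JDK version used.

```java
class MiddleDotCategory {
    /** Names the two general categories relevant here; anything else is shown as a raw number. */
    public static String categoryName(int codePoint) {
        int type = Character.getType(codePoint);
        if (type == Character.CONNECTOR_PUNCTUATION) {
            return "CONNECTOR_PUNCTUATION";
        }
        if (type == Character.OTHER_PUNCTUATION) {
            return "OTHER_PUNCTUATION";
        }
        return "type " + type;
    }

    public static void main(String[] args) {
        // U+30FB and U+FF65 are the two characters the answer reports as rejected on newer JDKs.
        int[] codePoints = {0x30FB, 0xFF65, '_'};
        for (int cp : codePoints) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + ": " + categoryName(cp)
                    + ", identifier start: " + Character.isJavaIdentifierStart(cp));
        }
    }
}
```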

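Anyway, details aside, the exam focuses only on the Basic Latin character set.

Also, the rules for legal identifiers in Java are given in the specification. Use the Character class APIs to get more details.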

2015-08-18 07:10:02

The list of characters you can use inside your identifiers (rather than just at the start) is more fun:

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++)
    if (Character.isJavaIdentifierPart(i) && !Character.isAlphabetic(i))
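        // Character.toChars also handles code points above U+FFFF, which a (char) cast would truncate.
        System.out.print(new String(Character.toChars(i)) + " ");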

The list is:

I wanted to post the output, but it's forbidden by the SO spam filter. That's how fun it is!

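It includes most of the control characters! I mean bells and the like! You can make your source code ring the bell! Or use characters that are only displayed sometimes, like the soft hyphen.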
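To make that claim checkable, here is a small sketch that asks the Character class about two of the characters mentioned: BEL (U+0007) and the soft hyphen (U+00AD). The class and method names are invented for this example. Both characters count as identifier parts because Character.isIdentifierIgnorable returns true for them, which is one of the conditions listed in the documentation of Character.isJavaIdentifierPart.

```java
class InvisibleIdentifierParts {
    /** Summarizes how the Character class classifies a code point for identifier purposes. */
    public static String describe(int codePoint) {
        return String.format("U+%04X part=%b start=%b ignorable=%b",
                codePoint,
                Character.isJavaIdentifierPart(codePoint),
                Character.isJavaIdentifierStart(codePoint),
                Character.isIdentifierIgnorable(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0007)); // BEL, the terminal bell
        System.out.println(describe(0x00AD)); // SOFT HYPHEN
    }
}
```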

2016-06-02 19:45:48

One of the most interesting characters allowed in Java identifiers (but not at the start) is the Unicode character named "Zero Width Non Joiner" (&zwnj;, U+200C, https://en.wikipedia.org/wiki/Zero-width_non-joiner).

I had this once in a piece of XML inside an attribute value holding a reference to another piece of that XML. Since the ZWNJ is "zero width" it cannot be seen (except when walking along with the cursor, it is displayed right on the character before). It also couldn't be seen in the logfile and/or console output. But it was there all the time: copy & paste into search fields got it and thus did not find the referred position. Typing the (visible part of the) string into the search field however found the referred position. Took me a while to figure this out.

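Typing a zero-width non-joiner is actually quite easy (too easy) with the European keyboard layout, at least in its German variant, e.g. "Europatastatur 2.02": it is reachable with AltGr + ".", two keys that unfortunately sit right next to each other on most keyboards and are easily hit by accident.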

Back to Java: I thought, well, you could write some code like this:

void foo() {
    int i = 1;
    int i‌ = 2;
}

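where the second i has a zero-width non-joiner appended (I could not do that in the code snippet above in Stack Overflow's editor), but that did not work. IntelliJ (16.3.3) did not complain, but javac (Java 8) did complain about an already defined identifier. It seems javac does allow the ZWNJ character as part of an identifier, but when I used reflection to see what it does, the ZWNJ character had been stripped from the identifier, which does not happen to characters like ‿.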
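The observation above is consistent with ZWNJ being an "ignorable" identifier character as reported by Character.isIdentifierIgnorable, while ‿ is not. The sketch below is not how javac is implemented; it is a hypothetical helper (stripIgnorable, a name invented here) that drops ignorable code points, to show why "i" followed by U+200C would collide with plain "i" while "i" followed by U+203F would not.

```java
class IgnorableDemo {
    /** Removes every code point that Character.isIdentifierIgnorable reports as ignorable. */
    public static String stripIgnorable(String name) {
        StringBuilder sb = new StringBuilder();
        name.codePoints()
            .filter(cp -> !Character.isIdentifierIgnorable(cp))
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String withZwnj = "i\u200C";     // "i" followed by ZERO WIDTH NON-JOINER
        String withUndertie = "i\u203F"; // "i" followed by UNDERTIE

        System.out.println("ZWNJ name collapses to \"i\": "
                + stripIgnorable(withZwnj).equals("i"));
        System.out.println("undertie name collapses to \"i\": "
                + stripIgnorable(withUndertie).equals("i"));
    }
}
```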

2017-02-09 08:37:35
