我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:

匹配

CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT

不匹配

aWC2H 7LT WC2H 7LTa WC2H

我怎么解决这个问题?


当前回答

我建议你看看英国政府的邮政编码数据标准[链接现在死了;XML的存档,参见维基百科的讨论]。这里有关于数据的简要描述,附带的xml模式提供了一个正则表达式。这可能不是你想要的,但会是一个很好的起点。RegEx与XML略有不同,因为给定的定义允许在格式A9A 9AA中第三个位置的P字符。

英国政府提供的正则表达式为:

([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})

正如维基百科讨论中指出的那样,这将允许一些非真实的邮政编码(例如以AA, ZY开头的邮政编码),并且它们确实提供了一个更严格的测试,您可以尝试一下。

其他回答

看看本页的python代码:

http://www.brunningonline.net/simon/blog/archives/001292.html

I've got some postcode parsing to do. The requirement is pretty simple; I have to parse a postcode into an outcode and (optional) incode. The good new is that I don't have to perform any validation - I just have to chop up what I've been provided with in a vaguely intelligent manner. I can't assume much about my import in terms of formatting, i.e. case and embedded spaces. But this isn't the bad news; the bad news is that I have to do it all in RPG. :-( Nevertheless, I threw a little Python function together to clarify my thinking.

我用它来处理邮政编码。

基本规则:

^[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][ABD-HJLNP-UW-Z]{2}$

英国的邮政编码(或称为邮政编码)由5到7个字母数字字符组成,中间用空格隔开。规定哪些角色可以出现在特定位置的规则相当复杂,而且充满了例外。因此,刚才显示的正则表达式遵循基本规则。

完整的规则:

如果你需要一个以牺牲可读性为代价的正则表达式来满足所有的邮政编码规则,这里你可以:

^(?:(?:[A-PR-UWYZ][0-9]{1,2}|[A-PR-UWYZ][A-HK-Y][0-9]{1,2}|[A-PR-UWYZ][0-9][A-HJKSTUW]|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRV-Y]) [0-9][ABD-HJLNP-UW-Z]{2}|GIR 0AA)$

来源:https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch04s16.html

在我们的客户数据库中进行了测试,似乎非常准确。

我需要一个可以在SAS中使用PRXMATCH和相关函数的版本,所以我想到了这个:

^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$

测试用例和注意事项:

/* 
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/

/*
    Bits and pieces
    1st position (any):         [A-PR-UWYZ]         
    2nd position (if letter):   [A-HK-Y]
    3rd position (A1A format):  [A-HJKPSTUW]
    4th position (AA1A format): [ABEHMNPRV-Y]
    Last 2 positions:           [ABD-HJLNP-UW-Z]    
*/


data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1  EC1A 1BB  Special case 1
1  W1A 0AX   Special case 2
1  M1 1AE    Standard format
1  B33 8TH   Standard format
1  CR2 6XH   Standard format
1  DN55 1PT  Standard format
0  QN55 1PT  Bad letter in 1st position
0  DI55 1PT  Bad letter in 2nd position
0  W1Z 0AX   Bad letter in 3rd position
0  EC1Z 1BB  Bad letter in 4th position
0  DN55 1CT  Bad letter in 2nd group
0  A11A 1AA  Invalid digits in 1st group
0  AA11A 1AA  1st group too long
0  AA11 1AAA  2nd group too long
0  AA11 1AAA  2nd group too long
0  AAA 1AA   No digit in 1st group
0  AA 1AA    No digit in 1st group
0  A 1AA     No digit in 1st group
0  1A 1AA    Missing letter in 1st group
0  1 1AA     Missing letter in 1st group
0  11 1AA    Missing letter in 1st group
0  AA1 1A    Missing letter in 2nd group
0  AA1 1     Missing letter in 2nd group
;
run;

根据维基百科的表格

这种模式适用于所有情况

(?:[A-Za-z]\d ?\d[A-Za-z]{2})|(?:[A-Za-z][A-Za-z\d]\d ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d{2} ?\d[A-Za-z]{2})|(?:[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d[A-Za-z] ?\d[A-Za-z]{2})

当在Android / Java上使用它时,使用\\d

前半段邮政编码有效格式

[a - z] [a - z][0 - 9]的[a -ž] [a - z] [a - z] [0 - 9] [0 - 9] [a - z] [0 - 9] [0 - 9] [a - z] [a - z] [0 - 9] [a - z] [a - z]的[a -ž] [a - z][0 - 9]的[a -ž] [a - z] [0 - 9]

异常 位置1 - QVX未使用 位置2 -除GIR 0AA外,IJZ不使用 位置3 - AEHMNPRTVXY只使用 位置4 - ABEHMNPRVWXY

邮政编码的后半部分

[0 - 9] [a - z]的[a -ž]

异常 位置2+3 - CIKMOV未使用

记住,不是所有可能的代码都被使用了,所以这个列表是有效代码的必要条件,而不是充分条件。只是匹配所有有效代码的列表可能会更容易?