我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:
匹配
CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT
不匹配
aWC2H 7LT WC2H 7LTa WC2H
我怎么解决这个问题?
我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:
匹配
CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT
不匹配
aWC2H 7LT WC2H 7LTa WC2H
我怎么解决这个问题?
当前回答
一个老帖子,但在谷歌的结果仍然很高,所以我认为我应该更新。10月14日的文档将英国邮政编码正则表达式定义为:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([**AZ**a-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
来自:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/359448/4__Bulk_Data_Transfer_-_additional_validation_valid.pdf
该文档还解释了它背后的逻辑。然而,它有一个错误(粗体),也允许小写,虽然合法的是不常见的,所以修改版本:
^(GIR 0AA)|((([A-Z][0-9]{1,2})|(([A-Z][A-HJ-Y][0-9]{1,2})|(([A-Z][0-9][A-Z])|([A-Z][A-HJ-Y][0-9]?[A-Z])))) [0-9][A-Z]{2})$
这适用于新的伦敦邮政编码(例如W1D 5LH),以前的版本没有。
其他回答
根据皇家邮政的程序员指南,检查邮政编码是否为有效格式:
|----------------------------outward code------------------------------| |------inward code-----|
#special↓ α1 α2 AAN AANA AANN AN ANN ANA (α3) N AA
^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) [0-9][ABD-HJLNP-UW-Z]{2})$
uk上的所有邮编都匹配,除了那些不再使用的邮编。
增加一个?空格后,使用不区分大小写的匹配来回答这个问题:
'se50eg'.match(/^(GIR 0AA|[A-PR-UWYZ]([A-HK-Y]([0-9][A-Z]?|[1-9][0-9])|[1-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})$/ig);
Array [ "se50eg" ]
这里的大多数答案都不能适用于我数据库中的所有邮政编码。我终于找到了一个验证与所有,使用政府提供的新正则表达式:
https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/413338/Bulk_Data_Transfer_-_additional_validation_valid_from_March_2015.pdf
在之前的答案中都没有,所以我把它贴在这里,以防他们把链接拿下来:
^([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) [0-9][A-Za-z]{2})$
更新:更新的正则表达式由杰米公牛指出。不确定这是我的错误复制或它是一个错误在政府的正则表达式,链接是现在…
更新:正如ctwheels发现的那样,这个正则表达式与javascript的正则表达式兼容。请参阅他的评论,了解一个适用于pcre (php)风格的评论。
我需要一个可以在SAS中使用PRXMATCH和相关函数的版本,所以我想到了这个:
^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$
测试用例和注意事项:
/*
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/
/*
Bits and pieces
1st position (any): [A-PR-UWYZ]
2nd position (if letter): [A-HK-Y]
3rd position (A1A format): [A-HJKPSTUW]
4th position (AA1A format): [ABEHMNPRV-Y]
Last 2 positions: [ABD-HJLNP-UW-Z]
*/
data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1 EC1A 1BB Special case 1
1 W1A 0AX Special case 2
1 M1 1AE Standard format
1 B33 8TH Standard format
1 CR2 6XH Standard format
1 DN55 1PT Standard format
0 QN55 1PT Bad letter in 1st position
0 DI55 1PT Bad letter in 2nd position
0 W1Z 0AX Bad letter in 3rd position
0 EC1Z 1BB Bad letter in 4th position
0 DN55 1CT Bad letter in 2nd group
0 A11A 1AA Invalid digits in 1st group
0 AA11A 1AA 1st group too long
0 AA11 1AAA 2nd group too long
0 AA11 1AAA 2nd group too long
0 AAA 1AA No digit in 1st group
0 AA 1AA No digit in 1st group
0 A 1AA No digit in 1st group
0 1A 1AA Missing letter in 1st group
0 1 1AA Missing letter in 1st group
0 11 1AA Missing letter in 1st group
0 AA1 1A Missing letter in 2nd group
0 AA1 1 Missing letter in 2nd group
;
run;
这是谷歌在i18napis.appspot.com域名上的正则表达式:
GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}
虽然这里有很多答案,但我对其中任何一个都不满意。他们中的大多数只是简单地坏了,太复杂或只是坏了。
我看了@ctwheels的答案,我发现它非常具有解释性和正确性;我们必须为此感谢他。然而,对我来说,如此简单的事情又有太多的“数据”了。
幸运的是,我设法获得了一个数据库,其中仅包含英国的100多万个活动邮政编码,并编写了一个小型PowerShell脚本来测试和基准测试结果。
英国邮政编码规格:有效的邮政编码格式。
这是“我的”正则表达式:
^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$
简短,简单,甜蜜。即使是最没有经验的人也能明白发生了什么。
解释:
^ asserts position at start of a line
1st Capturing Group ([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})
Match a single character present in the list below [a-zA-Z]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [a-zA-Z\d]
{1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
2nd Capturing Group (\d[a-zA-Z]{2})
\d matches a digit (equivalent to [0-9])
Match a single character present in the list below [a-zA-Z]
{2} matches the previous token exactly 2 times
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line
结果(已核对邮编):
TOTAL OK: 1469193
TOTAL FAILED: 0
-------------------------------------------------------------------------
Days : 0
Hours : 0
Minutes : 5
Seconds : 22
Milliseconds : 718
Ticks : 3227185939
TotalDays : 0.00373516891087963
TotalHours : 0.0896440538611111
TotalMinutes : 5.37864323166667
TotalSeconds : 322.7185939
TotalMilliseconds : 322718.5939