我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:
匹配
CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT
不匹配
aWC2H 7LT WC2H 7LTa WC2H
我怎么解决这个问题?
我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:
匹配
CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT
不匹配
aWC2H 7LT WC2H 7LTa WC2H
我怎么解决这个问题?
当前回答
我需要一个可以在SAS中使用PRXMATCH和相关函数的版本,所以我想到了这个:
^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$
测试用例和注意事项:
/*
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/
/*
Bits and pieces
1st position (any): [A-PR-UWYZ]
2nd position (if letter): [A-HK-Y]
3rd position (A1A format): [A-HJKPSTUW]
4th position (AA1A format): [ABEHMNPRV-Y]
Last 2 positions: [ABD-HJLNP-UW-Z]
*/
data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1 EC1A 1BB Special case 1
1 W1A 0AX Special case 2
1 M1 1AE Standard format
1 B33 8TH Standard format
1 CR2 6XH Standard format
1 DN55 1PT Standard format
0 QN55 1PT Bad letter in 1st position
0 DI55 1PT Bad letter in 2nd position
0 W1Z 0AX Bad letter in 3rd position
0 EC1Z 1BB Bad letter in 4th position
0 DN55 1CT Bad letter in 2nd group
0 A11A 1AA Invalid digits in 1st group
0 AA11A 1AA 1st group too long
0 AA11 1AAA 2nd group too long
0 AA11 1AAA 2nd group too long
0 AAA 1AA No digit in 1st group
0 AA 1AA No digit in 1st group
0 A 1AA No digit in 1st group
0 1A 1AA Missing letter in 1st group
0 1 1AA Missing letter in 1st group
0 11 1AA Missing letter in 1st group
0 AA1 1A Missing letter in 2nd group
0 AA1 1 Missing letter in 2nd group
;
run;
其他回答
我看了一下上面的一些答案,我不建议使用@Dan的答案(12月15日至10日)中的模式,因为它错误地将近0.4%的有效邮政编码标记为无效,而其他的则没有。
军械测量所提供的代号开放服务包括:
包含英国所有当前邮政编码单位的列表
我使用grep从这些数据中针对完整的邮政编码列表(7月6日至13日)运行了上面的每个正则表达式:
cat CSV/*.csv |
# Strip leading quotes
sed -e 's/^"//g' |
# Strip trailing quote and everything after it
sed -e 's/".*//g' |
# Strip any spaces
sed -E -e 's/ +//g' |
# Find any lines that do not match the expression
grep --invert-match --perl-regexp "$pattern"
邮政编码共有1,686,202个。
以下是与每个$模式不匹配的有效邮政编码的数量:
'^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]?[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$'
# => 6016 (0.36%)
'^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$'
# => 0
'^GIR[ ]?0AA|((AB|AL|B|BA|BB|BD|BH|BL|BN|BR|BS|BT|BX|CA|CB|CF|CH|CM|CO|CR|CT|CV|CW|DA|DD|DE|DG|DH|DL|DN|DT|DY|E|EC|EH|EN|EX|FK|FY|G|GL|GY|GU|HA|HD|HG|HP|HR|HS|HU|HX|IG|IM|IP|IV|JE|KA|KT|KW|KY|L|LA|LD|LE|LL|LN|LS|LU|M|ME|MK|ML|N|NE|NG|NN|NP|NR|NW|OL|OX|PA|PE|PH|PL|PO|PR|RG|RH|RM|S|SA|SE|SG|SK|SL|SM|SN|SO|SP|SR|SS|ST|SW|SY|TA|TD|TF|TN|TQ|TR|TS|TW|UB|W|WA|WC|WD|WF|WN|WR|WS|WV|YO|ZE)(\d[\dA-Z]?[ ]?\d[ABD-HJLN-UW-Z]{2}))|BFPO[ ]?\d{1,4}$'
# => 0
当然,这些结果只处理被错误地标记为无效的有效邮政编码。所以:
'^.*$'
# => 0
在过滤无效邮编方面,我并没有说哪种模式是最好的。
在这个列表中添加一个更实用的正则表达式,允许用户输入一个空字符串:
^$|^(([gG][iI][rR] {0,}0[aA]{2})|((([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y]?[0-9][0-9]?)|(([a-pr-uwyzA-PR-UWYZ][0-9][a-hjkstuwA-HJKSTUW])|([a-pr-uwyzA-PR-UWYZ][a-hk-yA-HK-Y][0-9][abehmnprv-yABEHMNPRV-Y]))) {0,1}[0-9][abd-hjlnp-uw-zABD-HJLNP-UW-Z]{2}))$
这个正则表达式允许大写字母和小写字母,中间有可选的空格
从软件开发人员的角度来看,这个正则表达式对于地址可能是可选的软件很有用。例如,如果用户不想提供他们的地址详细信息
我今天做了英国邮政编码验证的正则表达式,据我所知,它适用于所有的英国邮政编码,如果你放一个空格或如果你不放。
^((([a-zA-Z][0-9])|([a-zA-Z][0-9]{2})|([a-zA-Z]{2}[0-9])|([a-zA-Z]{2}[0-9]{2})|([A-Za-z][0-9][a-zA-Z])|([a-zA-Z]{2}[0-9][a-zA-Z]))(\s*[0-9][a-zA-Z]{2})$)
如果有什么格式没有涵盖,请告诉我
^([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {1,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)$
Regular expression to match valid UK postcodes. In the UK postal system not all letters are used in all positions (the same with vehicle registration plates) and there are various rules to govern this. This regex takes into account those rules. Details of the rules: First half of postcode Valid formats [A-Z][A-Z][0-9][A-Z] [A-Z][A-Z][0-9][0-9] [A-Z][0-9][0-9] [A-Z][A-Z][0-9] [A-Z][A-Z][A-Z] [A-Z][0-9][A-Z] [A-Z][0-9] Exceptions Position - First. Contraint - QVX not used Position - Second. Contraint - IJZ not used except in GIR 0AA Position - Third. Constraint - AEHMNPRTVXY only used Position - Forth. Contraint - ABEHMNPRVWXY Second half of postcode Valid formats [0-9][A-Z][A-Z] Exceptions Position - Second and Third. Contraint - CIKMOV not used
http://regexlib.com/REDetails.aspx?regexp_id=260
我使用下面的正则表达式,我已经测试了所有有效的英国邮政编码。它基于推荐的规则,但尽可能地精简,并且没有使用任何特殊语言特定的正则表达式规则。
([A-PR-UWYZ]([A-HK-Y][0-9]([0-9]|[ABEHMNPRV-Y])?|[0-9]([0-9]|[A-HJKPSTUW])?) ?[0-9][ABD-HJLNP-UW-Z]{2})
它假定邮政编码已转换为大写,并且没有前导字符或尾随字符,但在出码和入码之间接受可选空格。
特殊的“GIR0 0AA”邮政编码被排除在外,并且不会生效,因为它不在官方邮局的邮政编码列表中,据我所知,它不会被用作注册地址。如果需要,作为特殊情况添加它应该是微不足道的。