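I'm after a regex that will validate a full, complex UK postcode only within an input string. All of the uncommon postcode forms must be covered as well as the usual ones. For instance:
Matches
CW3 9SS, SE5 0EG, SE50EG, Se5 0eg, WC2H 7LT
No match
aWC2H 7LT, WC2H 7LTa, WC2H
How do I solve this problem?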
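Current answer
The accepted answer reflects the rules given by Royal Mail, although there is a typo in the regex. This typo seems to be present on the gov.uk site as well (as it is in the XML archive page).
In the format A9A 9AA the rules allow a P character in the third position, whereas the regex disallows this. The correct regex would be: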
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKPSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
Shortening this gives the following regex (which uses Perl/Ruby syntax):
(GIR 0AA)|([A-PR-UWYZ](([0-9]([0-9A-HJKPSTUW])?)|([A-HK-Y][0-9]([0-9ABEHMNPRVWXY])?))\s?[0-9][ABD-HJLNP-UW-Z]{2})
It also allows an optional space between the first and second block.
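As a minimal sketch of how the shortened pattern could be checked against the examples from the question, the Python snippet below applies it with re.fullmatch and re.IGNORECASE. Both choices are assumptions of this sketch, not part of the answer: fullmatch supplies the anchoring that rejects aWC2H 7LT and WC2H 7LTa, and IGNORECASE lets Se5 0eg through. The names SHORT_POSTCODE and is_postcode are made up for illustration.

```python
import re

# Shortened pattern from the answer, copied unchanged.
# IGNORECASE is an assumption added here so that lower-case input is accepted.
SHORT_POSTCODE = re.compile(
    r"(GIR 0AA)|([A-PR-UWYZ](([0-9]([0-9A-HJKPSTUW])?)|([A-HK-Y][0-9]([0-9ABEHMNPRVWXY])?))\s?[0-9][ABD-HJLNP-UW-Z]{2})",
    re.IGNORECASE,
)


def is_postcode(text: str) -> bool:
    """Return True only if the whole string is a postcode (no extra characters)."""
    return SHORT_POSTCODE.fullmatch(text) is not None


if __name__ == "__main__":
    should_match = ["CW3 9SS", "SE5 0EG", "SE50EG", "Se5 0eg", "WC2H 7LT"]
    should_not_match = ["aWC2H 7LT", "WC2H 7LTa", "WC2H"]
    for candidate in should_match + should_not_match:
        print(candidate, is_postcode(candidate))
```

In engines without a full-match call, wrapping the whole alternation in ^(?: ... )$ should have the same effect; a bare ^A|B$ would anchor each branch on one side only.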
Other answers
Here is how we dealt with the UK postcode issue:
^([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?[ ]?)([0-9]{1}[A-Za-z]{2})$
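Explanation:
Expect 1 or 2 a-z characters, upper or lower case is fine
Expect 1 or 2 digits
Expect 0 or 1 a-z character, upper or lower case is fine
Optional space allowed
Expect 1 digit
Expect 2 a-z characters, upper or lower case is fine
This catches most formats. We then use the database to check whether the postcode is actually real; this data is driven by openpoint https://www.ordnancesurvey.co.uk/opendatadownload/products.html
Hope this helps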
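The two-step approach described above (a loose shape check, then a lookup against real data) might be sketched in Python as follows. The set KNOWN_POSTCODES is a hypothetical stand-in for the postcode database, and the function names and the "OUTWARD INWARD" normalisation are assumptions made for illustration; only the regex comes from the answer.

```python
import re

# Loose shape check from the answer, copied unchanged.
LOOSE_POSTCODE = re.compile(
    r"^([A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?[ ]?)([0-9]{1}[A-Za-z]{2})$"
)

# Hypothetical stand-in for the real postcode database.
KNOWN_POSTCODES = {"SE5 0EG", "WC2H 7LT"}


def normalise(text):
    """Return the postcode as 'OUTWARD INWARD' in upper case, or None if the shape is wrong."""
    m = LOOSE_POSTCODE.match(text)
    if m is None:
        return None
    outward = m.group(1).strip().upper()  # group 1 may include the optional space
    inward = m.group(2).upper()
    return f"{outward} {inward}"


def is_real_postcode(text, known=KNOWN_POSTCODES):
    """Shape check first, then membership in the set of known postcodes."""
    normalised = normalise(text)
    return normalised is not None and normalised in known


if __name__ == "__main__":
    for candidate in ["se50eg", "wc2h 7lt", "AB1 2CD", "WC2H"]:
        print(candidate, normalise(candidate), is_real_postcode(candidate))
```

Normalising before the lookup keeps the stored data in a single format, so SE50EG and se5 0eg resolve to the same entry.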
Have a look at the Python code on this page:
http://www.brunningonline.net/simon/blog/archives/001292.html
I've got some postcode parsing to do. The requirement is pretty simple; I have to parse a postcode into an outcode and (optional) incode. The good news is that I don't have to perform any validation - I just have to chop up what I've been provided with in a vaguely intelligent manner. I can't assume much about my import in terms of formatting, i.e. case and embedded spaces. But this isn't the bad news; the bad news is that I have to do it all in RPG. :-( Nevertheless, I threw a little Python function together to clarify my thinking.
I used it to process postcodes.
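The linked function is not reproduced here. As a rough sketch of the idea in the quote (split into an outcode and an optional incode, with no validation), something like the following could work. It assumes an incode is always one digit followed by two letters at the end of the string; split_postcode is a hypothetical name and this is not the blog author's code.

```python
import re

# Assumption: an incode is exactly one digit followed by two letters.
_INCODE = re.compile(r"[0-9][A-Z]{2}")


def split_postcode(raw):
    """Split raw input into (outcode, incode); incode is None when absent.

    No validation is attempted: case and embedded whitespace are
    normalised and the string is simply chopped up.
    """
    compact = "".join(raw.split()).upper()
    tail = compact[-3:]
    # Only treat the last three characters as an incode if something is left
    # over for the outcode.
    if len(compact) > 3 and _INCODE.fullmatch(tail):
        return compact[:-3], tail
    return compact, None


if __name__ == "__main__":
    for raw in ["se5 0eg", "wc2h7lt", "WC2H", " m1  1ae "]:
        print(repr(raw), split_postcode(raw))
```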
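Although there are many answers here, I'm not happy with any of them. Most of them are plain wrong, too complex, or simply broken.
I looked at @ctwheels' answer and found it very explanatory and correct; we must thank him for that. However, once again, it is too much "data" for me for something so simple.
Fortunately, I managed to get hold of a database containing over 1 million active UK postcodes only, and wrote a small PowerShell script to test and benchmark the results.
UK postcode specification: Valid Postcode Format.
This is "my" regex: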
^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$
Short, simple and sweet. Even the most inexperienced reader can understand what is going on.
Explanation:
^ asserts position at start of a line
1st Capturing Group ([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})
    Match a single character present in the list below [a-zA-Z]
        {1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
    Match a single character present in the list below [a-zA-Z\d]
        {1,2} matches the previous token between 1 and 2 times, as many times as possible, giving back as needed (greedy)
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
        \d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
2nd Capturing Group (\d[a-zA-Z]{2})
    \d matches a digit (equivalent to [0-9])
    Match a single character present in the list below [a-zA-Z]
        {2} matches the previous token exactly 2 times
        a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
        A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line
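To make the breakdown above concrete, here is a small Python sketch that runs this pattern over the examples from the question. The names SIMPLE_POSTCODE and split_simple are hypothetical. Note that the pattern requires exactly one whitespace character between the two groups, so the unspaced form SE50EG from the question is not accepted by it.

```python
import re

# The pattern from this answer, copied unchanged.
SIMPLE_POSTCODE = re.compile(r"^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$")


def split_simple(text):
    """Return (group 1, group 2) if the pattern matches, otherwise None."""
    m = SIMPLE_POSTCODE.match(text)
    return (m.group(1), m.group(2)) if m else None


if __name__ == "__main__":
    examples = [
        "CW3 9SS", "SE5 0EG", "SE50EG", "Se5 0eg", "WC2H 7LT",
        "aWC2H 7LT", "WC2H 7LTa", "WC2H",
    ]
    for candidate in examples:
        print(candidate, split_simple(candidate))
```

Because the pattern checks only the positions of letters and digits, it also accepts strings that have the right shape but are not real postcodes.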
Result (postcodes checked):
TOTAL OK: 1469193
TOTAL FAILED: 0
-------------------------------------------------------------------------
Days : 0
Hours : 0
Minutes : 5
Seconds : 22
Milliseconds : 718
Ticks : 3227185939
TotalDays : 0.00373516891087963
TotalHours : 0.0896440538611111
TotalMinutes : 5.37864323166667
TotalSeconds : 322.7185939
TotalMilliseconds : 322718.5939
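The PowerShell script behind these numbers is not shown. A comparable count-and-time loop could be sketched in Python as below; the benchmark function and the four-item sample list are assumptions for illustration, not the author's script or data.

```python
import re
import time

# The pattern from this answer, copied unchanged.
SIMPLE_POSTCODE = re.compile(r"^([a-zA-Z]{1,2}[a-zA-Z\d]{1,2})\s(\d[a-zA-Z]{2})$")


def benchmark(postcodes, pattern=SIMPLE_POSTCODE):
    """Count matching and non-matching entries and time the whole run."""
    ok = 0
    failed = 0
    start = time.perf_counter()
    for code in postcodes:
        if pattern.match(code):
            ok += 1
        else:
            failed += 1
    elapsed = time.perf_counter() - start
    return ok, failed, elapsed


# Illustrative sample only; a real run would iterate over the postcode database.
sample = ["CW3 9SS", "SE5 0EG", "WC2H 7LT", "SE50EG"]

if __name__ == "__main__":
    ok, failed, elapsed = benchmark(sample)
    print(f"TOTAL OK: {ok}")
    print(f"TOTAL FAILED: {failed}")
    print(f"Elapsed seconds: {elapsed:.6f}")
```

A zero failure count on real postcodes shows the pattern does not reject valid input; it says nothing about how many invalid strings it would also accept.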
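Verified through empirical testing and observation, as well as by checking against https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation, here is my Python version of a regex that parses and validates a UK postcode:
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'
This regex is simple and has capture groups. It does not validate against all legal UK postcodes; it only considers the positions of letters versus digits.
Here is how I use it in code: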
import re
from dataclasses import dataclass


@dataclass
class UKPostcode:
    postcode_area: str
    district: str
    sector: str  # regex groups are strings, and the unit tests compare against '1', '0', ...
    postcode: str

    # https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
    # Original author of this regex: @jontsai
    # NOTE TO FUTURE DEVELOPER:
    # Verified through empirical testing and observation, as well as confirming with the Wiki article
    # If this regex fails to capture all valid UK postcodes, then I apologize, for I am only human.
    # (No type annotation, so the dataclass treats this as a class constant, not a field.)
    UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'

    @classmethod
    def from_postcode(cls, postcode):
        """Parses a string into a UKPostcode

        Returns a UKPostcode or None
        """
        # Spaces are removed first, so 'EC1A 1BB' and 'EC1A1BB' parse identically
        m = re.match(cls.UK_POSTCODE_REGEX, postcode.replace(' ', ''))
        if m:
            uk_postcode = UKPostcode(
                postcode_area=m.group('postcode_area'),
                district=m.group('district'),
                sector=m.group('sector'),
                postcode=m.group('postcode')
            )
        else:
            uk_postcode = None
        return uk_postcode


def parse_uk_postcode(postcode):
    """Wrapper for UKPostcode.from_postcode"""
    uk_postcode = UKPostcode.from_postcode(postcode)
    return uk_postcode
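One thing to be aware of: re.match anchors only at the start of the string, so input with trailing characters such as WC2H 7LTa still matches the pattern above. A stricter variant, sketched here under the assumption that the whole input must be a postcode and that lower-case input should be accepted, uses re.fullmatch after upper-casing. The function name parse_strict and the dictionary return value are hypothetical.

```python
import re

# Same pattern as in the answer, copied unchanged.
UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'


def parse_strict(postcode):
    """Return the named groups as a dict, or None unless the whole input matches."""
    # Upper-casing is an assumption of this sketch; the original code does not do it.
    compact = postcode.replace(' ', '').upper()
    m = re.fullmatch(UK_POSTCODE_REGEX, compact)
    return m.groupdict() if m else None


if __name__ == "__main__":
    # re.match accepts the trailing 'a' because it does not anchor at the end.
    print(re.match(UK_POSTCODE_REGEX, 'WC2H7LTa') is not None)
    print(parse_strict('WC2H 7LTa'))
    print(parse_strict('wc2h 7lt'))
```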
Here are the unit tests:
import pytest

# UKPostcode and parse_uk_postcode are assumed to be importable from the module shown above.


@pytest.mark.parametrize(
    'postcode, expected', [
        # https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
        (
            'EC1A1BB',
            UKPostcode(
                postcode_area='EC',
                district='1A',
                sector='1',
                postcode='BB'
            ),
        ),
        (
            'W1A0AX',
            UKPostcode(
                postcode_area='W',
                district='1A',
                sector='0',
                postcode='AX'
            ),
        ),
        (
            'M11AE',
            UKPostcode(
                postcode_area='M',
                district='1',
                sector='1',
                postcode='AE'
            ),
        ),
        (
            'B338TH',
            UKPostcode(
                postcode_area='B',
                district='33',
                sector='8',
                postcode='TH'
            )
        ),
        (
            'CR26XH',
            UKPostcode(
                postcode_area='CR',
                district='2',
                sector='6',
                postcode='XH'
            )
        ),
        (
            'DN551PT',
            UKPostcode(
                postcode_area='DN',
                district='55',
                sector='1',
                postcode='PT'
            )
        )
    ]
)
def test_parse_uk_postcode(postcode, expected):
    uk_postcode = parse_uk_postcode(postcode)
    assert uk_postcode == expected
The function below checks the postcode and returns the complete information:
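const isValidUKPostcode = postcode => {
  try {
    // Strip all whitespace so "GU34 8RR" and "GU348RR" are handled alike
    postcode = postcode.replace(/\s/g, "");
    // Split into outward and inward parts. match() returns null for a bad shape,
    // so reading format[1] below throws and control lands in the catch block.
    const format = postcode
      .toUpperCase()
      .match(/^([A-Z]{1,2}\d{1,2}[A-Z]?)\s*(\d[A-Z]{2})$/);
    const finalValue = `${format[1]} ${format[2]}`;
    // Fixed: [AZa-z] -> [A-Za-z], and the alternation is wrapped in (?:...) so that
    // ^ and $ apply to both branches instead of one each.
    const regex = /^(?:([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z]))))[0-9][A-Za-z]{2}))$/i;
    return {
      isValid: regex.test(postcode),
      formattedPostcode: finalValue,
      error: false,
      message: 'It is a valid postcode'
    };
  } catch (error) {
    return { error: true, message: 'Invalid postcode' };
  }
};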
console.log(isValidUKPostcode('GU348RR'))
// {isValid: true, formattedPostcode: "GU34 8RR", error: false, message: "It is a valid postcode"}
console.log(isValidUKPostcode('sdasd4746asd'))
// {error: true, message: "Invalid postcode"}
console.log(isValidUKPostcode('787898523'))
// {error: true, message: "Invalid postcode"}