我在一个正则表达式后,将验证一个完整的复杂的英国邮政编码只在输入字符串。所有不常见的邮政编码形式必须包括以及通常。例如:

匹配

CW3 9不锈钢 SE5 0EG SE50EG Se5 0eg WC2H 7LT

不匹配

aWC2H 7LT WC2H 7LTa WC2H

我怎么解决这个问题?


当前回答

我想要一个简单的正则表达式,可以允许太多,但不能拒绝有效的邮政编码。我这样做(输入是一个剥离/修剪的字符串):

/^([a-z0-9]\s*){5,8}$/i

这允许最短的邮政编码,如“L1 8JQ”和最长的邮政编码,如“OL14 5ET”。

因为它最多允许8个字符,如果没有空格,它也将允许不正确的8个字符邮政编码:“OL145ETX”。但是,这是一个简单的正则表达式,当它足够好的时候。

其他回答

我们得到了一个说明:

UK postcodes must be in one of the following forms (with one exception, see below): 
    § A9 9AA 
    § A99 9AA
    § AA9 9AA
    § AA99 9AA
    § A9A 9AA
    § AA9A 9AA
where A represents an alphabetic character and 9 represents a numeric character.
Additional rules apply to alphabetic characters, as follows:
    § The character in position 1 may not be Q, V or X
    § The character in position 2 may not be I, J or Z
    § The character in position 3 may not be I, L, M, N, O, P, Q, R, V, X, Y or Z
    § The character in position 4 may not be C, D, F, G, I, J, K, L, O, Q, S, T, U or Z
    § The characters in the rightmost two positions may not be C, I, K, M, O or V
The one exception that does not follow these general rules is the postcode "GIR 0AA", which is a special valid postcode.

我们想出了这个:

/^([A-PR-UWYZ][A-HK-Y0-9](?:[A-HJKS-UW0-9][ABEHMNPRV-Y0-9]?)?\s*[0-9][ABD-HJLNP-UW-Z]{2}|GIR\s*0AA)$/i

但是注意-这允许组之间有任意数量的空格。

我需要一个可以在SAS中使用PRXMATCH和相关函数的版本,所以我想到了这个:

^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$

测试用例和注意事项:

/* 
Notes
The letters QVX are not used in the 1st position.
The letters IJZ are not used in the second position.
The only letters to appear in the third position are ABCDEFGHJKPSTUW when the structure starts with A9A.
The only letters to appear in the fourth position are ABEHMNPRVWXY when the structure starts with AA9A.
The final two letters do not use the letters CIKMOV, so as not to resemble digits or each other when hand-written.
*/

/*
    Bits and pieces
    1st position (any):         [A-PR-UWYZ]         
    2nd position (if letter):   [A-HK-Y]
    3rd position (A1A format):  [A-HJKPSTUW]
    4th position (AA1A format): [ABEHMNPRV-Y]
    Last 2 positions:           [ABD-HJLNP-UW-Z]    
*/


data example;
infile cards truncover;
input valid 1. postcode &$10. Notes &$100.;
flag = prxmatch('/^[A-PR-UWYZ](([A-HK-Y]?\d\d?)|(\d[A-HJKPSTUW])|([A-HK-Y]\d[ABEHMNPRV-Y]))\s?\d[ABD-HJLNP-UW-Z]{2}$/',strip(postcode));
cards;
1  EC1A 1BB  Special case 1
1  W1A 0AX   Special case 2
1  M1 1AE    Standard format
1  B33 8TH   Standard format
1  CR2 6XH   Standard format
1  DN55 1PT  Standard format
0  QN55 1PT  Bad letter in 1st position
0  DI55 1PT  Bad letter in 2nd position
0  W1Z 0AX   Bad letter in 3rd position
0  EC1Z 1BB  Bad letter in 4th position
0  DN55 1CT  Bad letter in 2nd group
0  A11A 1AA  Invalid digits in 1st group
0  AA11A 1AA  1st group too long
0  AA11 1AAA  2nd group too long
0  AA11 1AAA  2nd group too long
0  AAA 1AA   No digit in 1st group
0  AA 1AA    No digit in 1st group
0  A 1AA     No digit in 1st group
0  1A 1AA    Missing letter in 1st group
0  1 1AA     Missing letter in 1st group
0  11 1AA    Missing letter in 1st group
0  AA1 1A    Missing letter in 2nd group
0  AA1 1     Missing letter in 2nd group
;
run;

根据维基百科的表格

这种模式适用于所有情况

(?:[A-Za-z]\d ?\d[A-Za-z]{2})|(?:[A-Za-z][A-Za-z\d]\d ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d{2} ?\d[A-Za-z]{2})|(?:[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]{2})|(?:[A-Za-z]{2}\d[A-Za-z] ?\d[A-Za-z]{2})

当在Android / Java上使用它时,使用\\d

这个允许两边有空格和制表符,以防你不想验证失败,然后在另一边修剪它。

^\s*(([Gg][Ii][Rr] 0[Aa]{2})|((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9]?[A-Za-z])))) {0,1}[0-9][A-Za-z]{2})\s*$)

通过经验测试和观察,以及https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation的确认,以下是我的Python正则表达式版本,可以正确地解析和验证英国邮政编码:

UK_POSTCODE_REGEX = r ' (? P < postcode_area > [a - z] {1,2}) (? P <区> (?:[0 - 9]{1,2})| (?:[0 - 9][a - z])) (? P <部门> [0 - 9])(? P <邮编> [a - z]{2})”

这个正则表达式很简单,并且有捕获组。它不包括所有合法的英国邮政编码的验证,而只考虑字母与数字的位置。

下面是我在代码中如何使用它:

@dataclass
class UKPostcode:
    postcode_area: str
    district: str
    sector: int
    postcode: str

    # https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
    # Original author of this regex: @jontsai
    # NOTE TO FUTURE DEVELOPER:
    # Verified through empirical testing and observation, as well as confirming with the Wiki article
    # If this regex fails to capture all valid UK postcodes, then I apologize, for I am only human.
    UK_POSTCODE_REGEX = r'(?P<postcode_area>[A-Z]{1,2})(?P<district>(?:[0-9]{1,2})|(?:[0-9][A-Z]))(?P<sector>[0-9])(?P<postcode>[A-Z]{2})'

    @classmethod
    def from_postcode(cls, postcode):
        """Parses a string into a UKPostcode

        Returns a UKPostcode or None
        """
        m = re.match(cls.UK_POSTCODE_REGEX, postcode.replace(' ', ''))

        if m:
            uk_postcode = UKPostcode(
                postcode_area=m.group('postcode_area'),
                district=m.group('district'),
                sector=m.group('sector'),
                postcode=m.group('postcode')
            )
        else:
            uk_postcode = None

        return uk_postcode


def parse_uk_postcode(postcode):
    """Wrapper for UKPostcode.from_postcode
    """
    uk_postcode = UKPostcode.from_postcode(postcode)
    return uk_postcode

下面是单元测试:

@pytest.mark.parametrize(
    'postcode, expected', [
        # https://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom#Validation
        (
            'EC1A1BB',
            UKPostcode(
                postcode_area='EC',
                district='1A',
                sector='1',
                postcode='BB'
            ),
        ),
        (
            'W1A0AX',
            UKPostcode(
                postcode_area='W',
                district='1A',
                sector='0',
                postcode='AX'
            ),
        ),
        (
            'M11AE',
            UKPostcode(
                postcode_area='M',
                district='1',
                sector='1',
                postcode='AE'
            ),
        ),
        (
            'B338TH',
            UKPostcode(
                postcode_area='B',
                district='33',
                sector='8',
                postcode='TH'
            )
        ),
        (
            'CR26XH',
            UKPostcode(
                postcode_area='CR',
                district='2',
                sector='6',
                postcode='XH'
            )
        ),
        (
            'DN551PT',
            UKPostcode(
                postcode_area='DN',
                district='55',
                sector='1',
                postcode='PT'
            )
        )
    ]
)
def test_parse_uk_postcode(postcode, expected):
    uk_postcode = parse_uk_postcode(postcode)
    assert(uk_postcode == expected)