有没有办法告诉一个字符串是否代表一个整数(例如,'3','-17'但不是'3.14'或'asfasfas')而不使用try/except机制?

is_int('3.14') == False
is_int('-7')   == True

当前回答

检查后将值转换为字符串为整数,然后检查字符串第一个字符值为-或+,其余字符串为数字。最后检查isdigit。 Test = ['1', '12015', '1..]2 ', ' a2kk78”、“1.5”,2,1.24,“-8.5”,“+”、“1”、“88751.71 + 7)

检查

for k,v in enumerate(test): 
    print(k, v, 'test: ', True if isinstance(v, int) is not False else True if str(v)[0] in ['-', '+'] and str(v)[1:].isdigit() else str(v).isdigit())

结果

0 1 test:  True
1 12015 test:  True
2 1..2 test:  False
3 a2kk78 test:  False
4 1.5 test:  False
5 2 test:  True
6 1.24 test:  False
7 -8.5 test:  False
8 +88751.71 test:  False
9 -1 test:  True
10 +7 test:  True

其他回答

正确的RegEx解决方案应该结合Greg Hewgill和Nowell的思想,但不使用全局变量。可以通过将属性附加到方法来实现这一点。另外,我知道在方法中导入是不受欢迎的,但我想要的是像http://peak.telecommunity.com/DevCenter/Importing#lazy-imports这样的“惰性模块”效果

edit:到目前为止,我最喜欢的技术是使用String对象的独占方法。

#!/usr/bin/env python

# Uses exclusively methods of the String object
def isInteger(i):
    i = str(i)
    return i=='0' or (i if i.find('..') > -1 else i.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

# Uses re module for regex
def isIntegre(i):
    import re
    if not hasattr(isIntegre, '_re'):
        print("I compile only once. Remove this line when you are confident in that.")
        isIntegre._re = re.compile(r"[-+]?\d+(\.0*)?$")
    return isIntegre._re.match(str(i)) is not None

# When executed directly run Unit Tests
if __name__ == '__main__':
    for obj in [
                # integers
                0, 1, -1, 1.0, -1.0,
                '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0',
                # non-integers
                1.1, -1.1, '1.1', '-1.1', '+1.1',
                '1.1.1', '1.1.0', '1.0.1', '1.0.0',
                '1.0.', '1..0', '1..',
                '0.0.', '0..0', '0..',
                'one', object(), (1,2,3), [1,2,3], {'one':'two'}
            ]:
        # Notice the integre uses 're' (intended to be humorous)
        integer = ('an integer' if isInteger(obj) else 'NOT an integer')
        integre = ('an integre' if isIntegre(obj) else 'NOT an integre')
        # Make strings look like strings in the output
        if isinstance(obj, str):
            obj = ("'%s'" % (obj,))
        print("%30s is %14s is %14s" % (obj, integer, integre))

对于那些不太喜欢冒险的同学,输出如下:

I compile only once. Remove this line when you are confident in that.
                             0 is     an integer is     an integre
                             1 is     an integer is     an integre
                            -1 is     an integer is     an integre
                           1.0 is     an integer is     an integre
                          -1.0 is     an integer is     an integre
                           '0' is     an integer is     an integre
                          '0.' is     an integer is     an integre
                         '0.0' is     an integer is     an integre
                           '1' is     an integer is     an integre
                          '-1' is     an integer is     an integre
                          '+1' is     an integer is     an integre
                         '1.0' is     an integer is     an integre
                        '-1.0' is     an integer is     an integre
                        '+1.0' is     an integer is     an integre
                           1.1 is NOT an integer is NOT an integre
                          -1.1 is NOT an integer is NOT an integre
                         '1.1' is NOT an integer is NOT an integre
                        '-1.1' is NOT an integer is NOT an integre
                        '+1.1' is NOT an integer is NOT an integre
                       '1.1.1' is NOT an integer is NOT an integre
                       '1.1.0' is NOT an integer is NOT an integre
                       '1.0.1' is NOT an integer is NOT an integre
                       '1.0.0' is NOT an integer is NOT an integre
                        '1.0.' is NOT an integer is NOT an integre
                        '1..0' is NOT an integer is NOT an integre
                         '1..' is NOT an integer is NOT an integre
                        '0.0.' is NOT an integer is NOT an integre
                        '0..0' is NOT an integer is NOT an integre
                         '0..' is NOT an integer is NOT an integre
                         'one' is NOT an integer is NOT an integre
<object object at 0x103b7d0a0> is NOT an integer is NOT an integre
                     (1, 2, 3) is NOT an integer is NOT an integre
                     [1, 2, 3] is NOT an integer is NOT an integre
                {'one': 'two'} is NOT an integer is NOT an integre

检查后将值转换为字符串为整数,然后检查字符串第一个字符值为-或+,其余字符串为数字。最后检查isdigit。 Test = ['1', '12015', '1..]2 ', ' a2kk78”、“1.5”,2,1.24,“-8.5”,“+”、“1”、“88751.71 + 7)

检查

for k,v in enumerate(test): 
    print(k, v, 'test: ', True if isinstance(v, int) is not False else True if str(v)[0] in ['-', '+'] and str(v)[1:].isdigit() else str(v).isdigit())

结果

0 1 test:  True
1 12015 test:  True
2 1..2 test:  False
3 a2kk78 test:  False
4 1.5 test:  False
5 2 test:  True
6 1.24 test:  False
7 -8.5 test:  False
8 +88751.71 test:  False
9 -1 test:  True
10 +7 test:  True

我猜这个问题与速度有关,因为try/except有一个时间惩罚:

测试数据

首先,我创建了一个包含200个字符串、100个失败字符串和100个数字字符串的列表。

from random import shuffle
numbers = [u'+1'] * 100
nonumbers = [u'1abc'] * 100
testlist = numbers + nonumbers
shuffle(testlist)
testlist = np.array(testlist)

numpy解决方案(仅适用于数组和unicode)

Np.core.defchararray.isnumeric也可以用于unicode字符串,但它返回一个数组。所以,如果你必须做成千上万的转换,并且有丢失的数据或非数值数据,这是一个很好的解决方案。

import numpy as np
%timeit np.core.defchararray.isnumeric(testlist)
10000 loops, best of 3: 27.9 µs per loop # 200 numbers per loop

试/除了

def check_num(s):
  try:
    int(s)
    return True
  except:
    return False

def check_list(l):
  return [check_num(e) for e in l]

%timeit check_list(testlist)
1000 loops, best of 3: 217 µs per loop # 200 numbers per loop

numpy解决方案似乎更快。

先决条件:

我们谈论的是整数(不是小数/浮点数); 内置int()的行为是我们的标准(有时很奇怪:“-00”是它的正确输入)

简短的回答:

使用下面的代码。它简单,正确(虽然这个线程中的许多变体不是),并且几乎是try/except和regex变体的两倍。

def is_int_str(string):
    return (
        string.startswith(('-', '+')) and string[1:].isdigit()
    ) or string.isdigit()

TL;博士答:

我已经测试了3个主要变体(1)try/except, (2) re.match()和(3)字符串操作(见上文)。第三个变体比try/except和re.match()快两倍。顺便说一句:regex变体是最慢的!请参见下面的测试脚本。

import re
import time


def test(func, test_suite):
    for test_case in test_suite:
        actual_result = func(*test_case[0])
        expected_result = test_case[1]
        assert (
            actual_result == expected_result
        ), f'Expected: {expected_result} but actual: {actual_result}'


def perf(func, test_suite):
    start = time.time()

    for _ in range(0, 1_000_000):
        test(func, test_suite)

    return time.time() - start


def is_int_str_1(string):
    try:
        int(string)
        return True
    except ValueError:
        return False


def is_int_str_2(string):
    return re.match(r'^[\-+]?\d+$', string) is not None


def is_int_str_3(string):
    return (
        string.startswith(('-', '+')) and string[1:].isdigit()
    ) or string.isdigit()


# Behavior of built-in int() function is a standard for the following tests
test_suite = [
    [['1'], True],  # func('1') -> True
    [['-1'], True],
    [['+1'], True],
    [['--1'], False],
    [['++1'], False],
    [['001'], True],  # because int() can read it
    [['-00'], True],  # because of quite strange behavior of int()
    [['-'], False],
    [['abracadabra'], False],
    [['57938759283475928347592347598357098458405834957984755200000000'], True],
]

time_span_1 = perf(is_int_str_1, test_suite)
time_span_2 = perf(is_int_str_2, test_suite)
time_span_3 = perf(is_int_str_3, test_suite)

print(f'{is_int_str_1.__name__}: {time_span_1} seconds')
print(f'{is_int_str_2.__name__}: {time_span_2} seconds')
print(f'{is_int_str_3.__name__}: {time_span_3} seconds')

输出是:

is_int_str_1: 4.314162969589233 seconds
is_int_str_2: 5.7216269969940186 seconds
is_int_str_3: 2.5828163623809814 seconds

我真的很喜欢Shavais的帖子,但我又添加了一个测试用例(&内置的isdigit()函数):

def isInt_loop(v):
    v = str(v).strip()
    # swapping '0123456789' for '9876543210' makes nominal difference (might have because '1' is toward the beginning of the string)
    numbers = '0123456789'
    for i in v:
        if i not in numbers:
            return False
    return True

def isInt_Digit(v):
    v = str(v).strip()
    return v.isdigit()

而且它一直明显优于其他时代:

timings..
isInt_try:   0.4628
isInt_str:   0.3556
isInt_re:    0.4889
isInt_re2:   0.2726
isInt_loop:   0.1842
isInt_Digit:   0.1577

使用普通2.7 python:

$ python --version
Python 2.7.10

我添加的两个测试用例(isInt_loop和isInt_digit)都通过了完全相同的测试用例(它们都只接受无符号整数),但我认为人们可以更聪明地修改字符串实现(isInt_loop)而不是内置的isdigit()函数,所以我包括了它,尽管执行时间略有不同。(这两种方法都比其他方法好很多,但没有处理额外的东西:“。/ + / -)

此外,我发现有趣的是,regex (isInt_re2方法)在2012年(目前是2018年)由Shavais执行的相同测试中击败了字符串比较。也许正则表达式库已经改进了?