我需要将RFC 3339字符串(如“2008-09-03T20:56:55.450686Z”)解析为Python的datetime类型。

我在Python标准库中找到了strptime,但它不是很方便。

最好的方法是什么?


当前回答

def parseISO8601DateTime(datetimeStr):
    import time
    from datetime import datetime, timedelta

    def log_date_string(when):
        gmt = time.gmtime(when)
        if time.daylight and gmt[8]:
            tz = time.altzone
        else:
            tz = time.timezone
        if tz > 0:
            neg = 1
        else:
            neg = 0
            tz = -tz
        h, rem = divmod(tz, 3600)
        m, rem = divmod(rem, 60)
        if neg:
            offset = '-%02d%02d' % (h, m)
        else:
            offset = '+%02d%02d' % (h, m)

        return time.strftime('%d/%b/%Y:%H:%M:%S ', gmt) + offset

    dt = datetime.strptime(datetimeStr, '%Y-%m-%dT%H:%M:%S.%fZ')
    timestamp = dt.timestamp()
    return dt + timedelta(hours=dt.hour-time.gmtime(timestamp).tm_hour)

注意,如果字符串不以Z结尾,我们可以使用%Z进行解析。

其他回答

只需使用python dateutil模块:

>>> import dateutil.parser as dp
>>> t = '1984-06-02T19:05:00.000Z'
>>> parsed_t = dp.parse(t)
>>> print(parsed_t)
datetime.datetime(1984, 6, 2, 19, 5, tzinfo=tzutc())

文档

因为ISO 8601允许出现许多可选冒号和破折号的变体,基本上是CCYY MM DDThh:MM:ss[Z|(+|-)hh:MM]。如果你想使用strptime,你需要先去掉这些变体。目标是生成utc-datetime对象。如果您只需要一个适用于UTC的Z后缀的基本案例,如2016-06-29T19:36:29.3453Z:

datetime.datetime.strptime(timestamp.translate(None, ':-'), "%Y%m%dT%H%M%S.%fZ")

如果您想处理时区偏移,如2016-06-29T19:36:29.3453-0400或2008-09-03T20:56:55.450686+05:00,请使用以下命令。这些将把所有变体转换成没有变量分隔符的东西,如20080903T205635.450686+0500,使其更一致/更容易解析。

import re
# this regex removes all colons and all 
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
datetime.datetime.strptime(conformed_timestamp, "%Y%m%dT%H%M%S.%f%z" )

如果您的系统不支持%z strptime指令(您看到类似ValueError的内容:“z”是格式为“%Y%m%dT%H%m%S.%f%z”的错误指令),则需要手动从z(UTC)偏移时间。注意%z在python版本<3的系统上可能不起作用,因为它依赖于c库支持,而c库支持随系统/python构建类型(例如Jython、Cython等)而变化。

import re
import datetime

# this regex removes all colons and all 
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)

# split on the offset to remove it. use a capture group to keep the delimiter
split_timestamp = re.split(r"[+|-]",conformed_timestamp)
main_timestamp = split_timestamp[0]
if len(split_timestamp) == 3:
    sign = split_timestamp[1]
    offset = split_timestamp[2]
else:
    sign = None
    offset = None

# generate the datetime object without the offset at UTC time
output_datetime = datetime.datetime.strptime(main_timestamp +"Z", "%Y%m%dT%H%M%S.%fZ" )
if offset:
    # create timedelta based on offset
    offset_delta = datetime.timedelta(hours=int(sign+offset[:-2]), minutes=int(sign+offset[-2:]))
    # offset datetime with timedelta
    output_datetime = output_datetime + offset_delta

要获得与2.X标准库兼容的功能,请尝试:

calendar.timegm(time.strptime(date.split(".")[0]+"UTC", "%Y-%m-%dT%H:%M:%S%Z"))

calendar.timegm是time.mktime缺少的gm版本。

如今,Arrow还可以作为第三方解决方案:

>>> import arrow
>>> date = arrow.get("2008-09-03T20:56:35.450686Z")
>>> date.datetime
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=tzutc())

如果不想使用dateutil,可以尝试使用以下函数:

def from_utc(utcTime,fmt="%Y-%m-%dT%H:%M:%S.%fZ"):
    """
    Convert UTC time string to time.struct_time
    """
    # change datetime.datetime to time, return time.struct_time type
    return datetime.datetime.strptime(utcTime, fmt)

测试:

from_utc("2007-03-04T21:08:12.123Z")

结果:

datetime.datetime(2007, 3, 4, 21, 8, 12, 123000)