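I need to do a simple string split in Lua, but there doesn't seem to be a built-in function for it, and the manual approach I tested didn't seem to work. How would I do it?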


Current answer

Because there is more than one way to skin a cat, here is my approach:

Code:

#!/usr/bin/env lua

local content = [=[
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna 
aliqua. Ut enim ad minim veniam, quis nostrud exercitation 
ullamco laboris nisi ut aliquip ex ea commodo consequat.
]=]

local function split(str, sep)
   local result = {}
   local regex = ("([^%s]+)"):format(sep)
   for each in str:gmatch(regex) do
      table.insert(result, each)
   end
   return result
end

local lines = split(content, "\n")
for _,line in ipairs(lines) do
   print(line)
end

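Output:

Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.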

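Explanation:

The gmatch function works as an iterator: it returns, one at a time, every substring that matches regex. The pattern (a Lua pattern rather than a true regular expression) matches a run of characters up to the next separator.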
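
Two consequences of that pattern are easy to miss, so here is a small sketch that reuses the split function from above (the sample strings are made up for illustration). Because sep ends up inside a character class, consecutive separators collapse and empty fields are dropped, and a multi-character sep behaves as a set of single-character separators rather than as one delimiter.

```lua
local function split(str, sep)
   local result = {}
   local pattern = ("([^%s]+)"):format(sep)
   for each in str:gmatch(pattern) do
      table.insert(result, each)
   end
   return result
end

-- Empty fields are dropped: "a,,b" gives two pieces, not three.
local fields = split("a,,b", ",")
print(#fields, fields[1], fields[2])

-- ", " is read as the set {",", " "}, so the space inside "b c" also splits.
local words = split("a, b c", ", ")
print(#words, words[1], words[2], words[3])
```

If empty fields or multi-character delimiters matter for your input, one of the find-based approaches further down is probably a better fit.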

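Other answers

For those coming from exercise 10.1 of the book "Programming in Lua", it seems clear that we cannot use concepts explained later in the book (iterators), and that the function should accept separators longer than a single character.

split() is a trick: the pattern matches what is not wanted (the separator), and an empty string returns an empty table. What plainSplit() returns is closer to split in other languages.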

-- Every magic character of Lua patterns, including "-" (the lazy quantifier)
magic = "([%%%.%(%)%+%-%*%?%[%]%^%$])"

function split(str, sep, plain)
    if plain then sep = string.gsub(sep, magic, "%%%1") end
    
    local N = '\255'
    str = N..str..N
    str = string.gsub(str, sep, N..N)

    local result = {}
    for word in string.gmatch(str, N.."(.-)"..N) do
        if word ~= "" then
            table.insert(result, word)
        end
    end
    return result
end


function plainSplit(str, sep)
    sep = string.gsub(sep, magic, "%%%1")

    local result = {}
    local start = 0
    repeat
        start = start + 1

        local from, to = string.find(str, sep, start)
        from = from and from-1
        
        -- string.sub takes no "plain" flag; a nil `from` means "to the end"
        local word = string.sub(str, start, from)
        table.insert(result, word)

        start = to
    until start == nil

    return result
end


function tableToString(t)
    local ret = "{"
    for _, word in ipairs(t) do
        ret = ret .. '"' .. word .. '", '
    end
    ret = string.sub(ret, 1, -3)
    ret = ret .. "}"

    return #ret > 1 and ret or "{}"
end

function runSplit(func, title, str, sep, plain)
    print("\n" .. title)
    print("str: '"..str.."'")
    print("sep: '"..sep.."'")
    local t = func(str, sep, plain)
    print("-- t = " .. tableToString(t))
end



print("\n\n\n=== Pattern split ===")
runSplit(split, "Exercise 10.1", "a whole new world", " ")
runSplit(split, "With trailing separator", "  a  whole   new world  ", " ")
runSplit(split, "A word separator", "a whole new world", " whole ")
runSplit(split, "Pattern separator", "a1whole2new3world", "%d")
runSplit(split, "Magic characters as plain separator", "a$.%whole$.%new$.%world", "$.%", true)
runSplit(split, "Control separator", "a\0whole\1new\2world", "%c")
runSplit(split, "ISO Time", "2020-07-10T15:00:00.000", "[T:%-%.]")

runSplit(split, " === [Fails] with \\255 ===", "a\255whole\0new\0world", "\0", true)

runSplit(split, "How does your function handle empty string?", "", " ")



print("\n\n\n=== Plain split ===")
runSplit(plainSplit, "Exercise 10.1", "a whole new world", " ")
runSplit(plainSplit, "With trailing separator", "  a  whole   new world  ", " ")
runSplit(plainSplit, "A word separator", "a whole new world", " whole ")
runSplit(plainSplit, "Magic characters as plain separator", "a$.%whole$.%new$.%world", "$.%")

runSplit(plainSplit, "How does your function handle empty string?", "", " ")

Output

=== Pattern split ===

Exercise 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}

With trailing separator
str: '  a  whole   new world  '
sep: ' '
-- t = {"a", "whole", "new", "world"}

A word separator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}

Pattern separator
str: 'a1whole2new3world'
sep: '%d'
-- t = {"a", "whole", "new", "world"}

Magic characters as plain separator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}

Control separator
str: 'awholenewworld'
sep: '%c'
-- t = {"a", "whole", "new", "world"}

ISO Time
str: '2020-07-10T15:00:00.000'
sep: '[T:%-%.]'
-- t = {"2020", "07", "10", "15", "00", "00", "000"}

 === [Fails] with \255 ===
str: 'a�wholenewworld'
sep: ''
-- t = {"a"}

How does your function handle empty string?
str: ''
sep: ' '
-- t = {}



=== Plain split ===

Exercise 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}

With trailing separator
str: '  a  whole   new world  '
sep: ' '
-- t = {"", "", "a", "", "whole", "", "", "new", "world", "", ""}

A word separator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}

Magic characters as plain separator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}

How does your function handle empty string?
str: ''
sep: ' '
-- t = {""}

Depending on the use case, this could be useful. It cuts away all the text on either side of the two markers:

b = "This is a string used for testing"

--Removes unwanted text
c = (b:match("a([^/]+)used"))

print (c)

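Output (the capture keeps the space on each side of the word, because the markers here are "a" and "used" without their neighbouring spaces):

 string 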
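
If the markers are not fixed, or may contain pattern magic characters, the same idea can be written with string.find in plain mode. The sketch below is only an illustration: the helper name between and its nil-on-failure behaviour are my own assumptions, not part of the answer above. Unlike the greedy capture in the one-liner, it stops at the first occurrence of the right-hand marker.

```lua
-- Hypothetical helper: returns the text between the first occurrence of
-- `left` and the next occurrence of `right`, or nil if either is missing.
local function between(str, left, right)
   -- The fourth argument (true) turns off pattern matching for the search.
   local _, left_end = str:find(left, 1, true)
   if not left_end then return nil end
   local right_start = str:find(right, left_end + 1, true)
   if not right_start then return nil end
   return str:sub(left_end + 1, right_start - 1)
end

local b = "This is a string used for testing"
print(between(b, "a ", " used"))   --> string
```

Including the spaces in the markers is what removes them from the result.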

A way that the other answers don't show:

function str_split(str, sep)
    if sep == nil then
        sep = '%s'
    end 

    local res = {}
    local func = function(w)
        table.insert(res, w)
    end 

    string.gsub(str, '[^'..sep..']+', func)
    return res 
end
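
A short usage sketch for this approach (the sample inputs are made up, and the function is restated compactly so the block runs on its own). string.gsub calls the function once per match and passes it the matched text; since the callback returns nothing, no replacement happens and only the side effect of filling res matters. When sep is omitted, the default '%s' splits on any whitespace.

```lua
local function str_split(str, sep)
    sep = sep or '%s'
    local res = {}
    -- gsub is used purely as a "for each match" loop here.
    string.gsub(str, '[^' .. sep .. ']+', function(w)
        table.insert(res, w)
    end)
    return res
end

local words = str_split("one two\tthree\nfour")
print(table.concat(words, "|"))    --> one|two|three|four

-- sep is spliced into a character class, so "-" has to be escaped as "%-".
local fields = str_split("2020-07-10", "%-")
print(table.concat(fields, "|"))   --> 2020|07|10
```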

Here is a routine that works in Lua 4.0. It returns a table t of the substrings of inputstr that are delimited by sep:

function string_split(inputstr, sep)
    local inputstr = inputstr .. sep
    local idx, inc, t = 0, 1, {}
    local idx_prev, substr
    repeat 
        idx_prev = idx
        inputstr = strsub(inputstr, idx + 1, -1)    -- drop everything up to and including the last match (initially nothing)
        idx = strfind(inputstr, sep)                -- 1-based index of the first occurrence of the separator
        if idx == nil then break end                -- quit if nothing's found
        substr = strsub(inputstr, 0, idx)           -- the field before the separator, with the separator still attached
        substr = gsub(substr, "[%c" .. sep .. " ]", "") -- strip control characters, the separator and spaces
        t[inc] = substr             -- store the substring (i.e., data field)
        inc = inc + 1               -- iterate to next
    until idx == nil
    return t
end

This simple test

inputstr = "the brown lazy fox jumped over the fat grey hen ... or something."
sep = " " 
t = {}
t = string_split(inputstr,sep)
for i=1,15 do
    print(i, t[i])
end

yields:

--> t[1]=the
--> t[2]=brown
--> t[3]=lazy
--> t[4]=fox
--> t[5]=jumped
--> t[6]=over
--> t[7]=the
--> t[8]=fat
--> t[9]=grey
--> t[10]=hen
--> t[11]=...
--> t[12]=or
--> t[13]=something.
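
In current Lua 5.x releases these functions are reached through the string table (string.sub, string.find, string.gsub) rather than as globals. A minimal sketch for trying the routine there, assuming it is acceptable to define global aliases in a quick experiment; the function body is copied from above without its comments:

```lua
-- Assumption: global aliases are fine for a throwaway test script.
strsub, strfind, gsub = string.sub, string.find, string.gsub

function string_split(inputstr, sep)
    local inputstr = inputstr .. sep
    local idx, inc, t = 0, 1, {}
    local idx_prev, substr
    repeat
        idx_prev = idx
        inputstr = strsub(inputstr, idx + 1, -1)
        idx = strfind(inputstr, sep)
        if idx == nil then break end
        substr = strsub(inputstr, 0, idx)
        substr = gsub(substr, "[%c" .. sep .. " ]", "")
        t[inc] = substr
        inc = inc + 1
    until idx == nil
    return t
end

local t = string_split("the brown lazy fox jumped over the fat grey hen ... or something.", " ")
print(#t, t[1], t[#t])
```

Note that sep is used both as a pattern and inside a character class, so a separator containing magic characters would need escaping first.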

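I found that many of the other answers fail on edge cases (for example, when the given string contains #, { or } characters, or when the delimiter is a character such as % that needs escaping). Here is the implementation I use instead: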

local function newsplit(delimiter, str)
    assert(type(delimiter) == "string")
    assert(#delimiter > 0, "Must provide non empty delimiter")

    -- Add escape characters if delimiter requires it
    delimiter = delimiter:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", "%%%0")

    local start_index = 1
    local result = {}

    while true do
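       -- find returns both the start and the end of the match
       local delimiter_index, delimiter_end = str:find(delimiter, start_index)

       if delimiter_index == nil then
          table.insert(result, str:sub(start_index))
          break
       end

       table.insert(result, str:sub(start_index, delimiter_index - 1))

       -- Continue after the whole delimiter so multi-character delimiters work
       start_index = delimiter_end + 1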
    end

    return result
end
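
A possible variation, offered as a sketch rather than as the answer's own code: string.find accepts a fourth argument that turns off pattern matching, which makes the escaping step unnecessary. The name split_plain is hypothetical, and the argument order follows newsplit.

```lua
-- Hypothetical variant of newsplit that relies on plain-text search.
local function split_plain(delimiter, str)
    assert(type(delimiter) == "string" and #delimiter > 0,
           "Must provide non empty delimiter")
    local result = {}
    local start_index = 1
    while true do
        -- plain = true: no character in the delimiter is treated as magic.
        local first, last = str:find(delimiter, start_index, true)
        if not first then
            table.insert(result, str:sub(start_index))
            break
        end
        table.insert(result, str:sub(start_index, first - 1))
        start_index = last + 1   -- skip the whole delimiter, however long
    end
    return result
end

local parts = split_plain("%", "50%#{x}%")
print(table.concat(parts, "|"))                          --> 50|#{x}|
print(table.concat(split_plain(", ", "a, b, c"), "|"))   --> a|b|c
```

The trailing delimiter in the first call produces a final empty field, which matches how split behaves in many other languages.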