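I need to do a simple split of a string, but there doesn't seem to be a function for this, and the manual way I tested didn't seem to work. How would I do it?
If you are splitting a string in Lua, try the string.gmatch() or string.sub() functions. Use string.sub() if you know the index at which you want to split the string, or string.gmatch() if you need to parse the string to find where to split it.
Example using string.gmatch() from the Lua 5.1 Reference Manual: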
t = {}
s = "from=world, to=Lua"
for k, v in string.gmatch(s, "(%w+)=(%w+)") do
t[k] = v
end
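For the string.sub() half of that advice, here is a minimal sketch. It assumes the split position is already known; the sample string and the idx variable are made up for illustration and do not come from the manual.

```lua
-- Assumption: we already know that "=" sits at index 5 of this sample string.
local s = "from=world"
local idx = 5

local key = string.sub(s, 1, idx - 1)   -- everything before the separator
local value = string.sub(s, idx + 1)    -- everything after the separator

print(key, value)
```

When the index is not known in advance, string.find() can supply it, which is what most of the functions below do.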
Here is the function:
function split(pString, pPattern)
  local Table = {} -- NOTE: use {n = 0} in Lua-5.0
  local fpat = "(.-)" .. pPattern
  local last_end = 1
  local s, e, cap = pString:find(fpat, 1)
  while s do
    if s ~= 1 or cap ~= "" then
      table.insert(Table, cap)
    end
    last_end = e + 1
    s, e, cap = pString:find(fpat, last_end)
  end
  if last_end <= #pString then
    cap = pString:sub(last_end)
    table.insert(Table, cap)
  end
  return Table
end
Call it like:
list = split(string_to_split, pattern_to_match)
e.g.:
list = split("1:2:3:4", ":")
For more, see: http://lua-users.org/wiki/SplitJoin
Just as string.gmatch finds patterns in a string, this function finds the things between patterns:
function string:split(pat)
pat = pat or '%s+'
local st, g = 1, self:gmatch("()("..pat..")")
local function getter(segs, seps, sep, cap1, ...)
st = sep and seps + #sep
return self:sub(segs, (seps or 0) - 1), cap1 or sep, ...
end
return function() if st then return getter(st, g()) end end
end
By default it returns whatever is separated by whitespace.
If you just want to iterate over the tokens, this is pretty neat:
line = "one, two and 3!"
for token in string.gmatch(line, "[^%s]+") do
print(token)
end
Output:
one,
two
and
3!
Short explanation: the "[^%s]+" pattern matches every non-empty run of characters between whitespace characters.
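If you need the tokens in a table rather than printed, the same loop can collect them. This is a small sketch using the same sample line; %S is Lua's shorthand for the complement of %s, so "%S+" should behave like "[^%s]+" here.

```lua
local line = "one, two and 3!"
local tokens = {}

-- "%S+" matches each maximal run of non-whitespace characters.
for token in string.gmatch(line, "%S+") do
  tokens[#tokens + 1] = token
end

print(#tokens)
print(table.concat(tokens, "|"))
```

Note that punctuation stays attached to the tokens, since only whitespace counts as a separator.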
You can use this method:
function string:split(delimiter)
local result = { }
local from = 1
local delim_from, delim_to = string.find( self, delimiter, from )
while delim_from do
table.insert( result, string.sub( self, from , delim_from-1 ) )
from = delim_to + 1
delim_from, delim_to = string.find( self, delimiter, from )
end
table.insert( result, string.sub( self, from ) )
return result
end
delimiter = string.split(stringtodelimite,pattern)
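I have a very simple solution. Use the gmatch() function to capture strings that contain at least one character other than the desired separator. The separator defaults to any whitespace (%s in Lua):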
function mysplit (inputstr, sep)
if sep == nil then
sep = "%s"
end
local t={}
for str in string.gmatch(inputstr, "([^"..sep.."]+)") do
table.insert(t, str)
end
return t
end
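Two consequences of this approach are worth seeing in action: the separator is placed inside a character set, and empty fields are skipped. The sketch below repeats the function as a local so it runs on its own; the sample inputs are made up for illustration.

```lua
local function mysplit(inputstr, sep)
  if sep == nil then
    sep = "%s"
  end
  local t = {}
  for str in string.gmatch(inputstr, "([^" .. sep .. "]+)") do
    table.insert(t, str)
  end
  return t
end

-- Consecutive separators collapse, so the empty field between the commas is dropped.
local a = mysplit("a,,b", ",")
print(#a, a[1], a[2])

-- ",;" is read as "comma or semicolon", not as the two-character sequence ",;".
local b = mysplit("x,y;z", ",;")
print(#b, b[1], b[2], b[3])
```

If you need empty fields preserved or a multi-character separator, one of the string.find() based versions in this thread is probably a better fit.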
I like this short solution:
function split(s, delimiter)
  local result = {}  -- local, so repeated calls do not overwrite a global
  for match in (s .. delimiter):gmatch("(.-)" .. delimiter) do
    table.insert(result, match)
  end
  return result
end
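A quick sketch of how this short version behaves, with the function repeated as a local so the block runs on its own. It keeps empty fields, but because the delimiter is both used as a pattern and appended verbatim to the input, an escaped delimiter such as "%." leaves a stray "%" on the last field. The sample inputs are assumptions chosen to show both cases.

```lua
local function split(s, delimiter)
  local result = {}
  for match in (s .. delimiter):gmatch("(.-)" .. delimiter) do
    table.insert(result, match)
  end
  return result
end

-- Empty fields are preserved: "a", "", "b".
local fields = split("a,,b", ",")
print(#fields)

-- The input becomes "1.2.3%." before matching, so the last field is "3%".
local parts = split("1.2.3", "%.")
print(parts[1], parts[2], parts[3])
```

So this version seems safest with delimiters that contain no magic characters.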
Because there is more than one way to skin a cat, here is my approach:
Code:
#!/usr/bin/env lua
local content = [=[
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
]=]
local function split(str, sep)
local result = {}
local regex = ("([^%s]+)"):format(sep)
for each in str:gmatch(regex) do
table.insert(result, each)
end
return result
end
local lines = split(content, "\n")
for _,line in ipairs(lines) do
print(line)
end
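Output:
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Explanation:
The gmatch function works as an iterator: it fetches every string that matches regex. The regex takes all characters until it finds a separator.
I used the examples above to craft my own function. The missing piece for me was automatically escaping magic characters.
Here is my contribution: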
function split(text, delim)
  -- returns an array of fields based on text and delimiter (one character only)
  local result = {}
  local magic = "().%+-*?[]^$"
  if delim == nil then
    delim = "%s"
  elseif string.find(magic, delim, 1, true) then
    -- delim is one of the magic characters: escape it
    -- (plain search of magic for delim, not the other way around)
    delim = "%" .. delim
  end
  local pattern = "[^" .. delim .. "]+"
  for w in string.gmatch(text, pattern) do
    table.insert(result, w)
  end
  return result
end
Simply sitting on a delimiter:
local str = 'one,two'
local regxEverythingExceptComma = '([^,]+)'
for x in string.gmatch(str, regxEverythingExceptComma) do
print(x)
end
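A lot of these answers only accept single-character separators, or don't deal well with edge cases (e.g. empty separators), so I thought I would provide a more definitive solution.
Here are two functions, gsplit and split, adapted from the code in the Scribunto MediaWiki extension, which is used on wikis like Wikipedia. The code is licensed under the GPL v2. I have changed the variable names and added comments to make the code a bit easier to understand, and I have also changed the code to use regular Lua string patterns instead of Scribunto's patterns for Unicode strings. The original code has test cases.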
-- gsplit: iterate over substrings in a string separated by a pattern
--
-- Parameters:
-- text (string) - the string to iterate over
-- pattern (string) - the separator pattern
-- plain (boolean) - if true (or truthy), pattern is interpreted as a plain
-- string, not a Lua pattern
--
-- Returns: iterator
--
-- Usage:
-- for substr in gsplit(text, pattern, plain) do
-- doSomething(substr)
-- end
local function gsplit(text, pattern, plain)
  local splitStart, length = 1, #text
  return function ()
    if splitStart then
      local sepStart, sepEnd = string.find(text, pattern, splitStart, plain)
      local ret
      if not sepStart then
        ret = string.sub(text, splitStart)
        splitStart = nil
      elseif sepEnd < sepStart then
        -- Empty separator!
        ret = string.sub(text, splitStart, sepStart)
        if sepStart < length then
          splitStart = sepStart + 1
        else
          splitStart = nil
        end
      else
        ret = sepStart > splitStart and string.sub(text, splitStart, sepStart - 1) or ''
        splitStart = sepEnd + 1
      end
      return ret
    end
  end
end
-- split: split a string into substrings separated by a pattern.
--
-- Parameters:
-- text (string) - the string to iterate over
-- pattern (string) - the separator pattern
-- plain (boolean) - if true (or truthy), pattern is interpreted as a plain
-- string, not a Lua pattern
--
-- Returns: table (a sequence table containing the substrings)
local function split(text, pattern, plain)
local ret = {}
for match in gsplit(text, pattern, plain) do
table.insert(ret, match)
end
return ret
end
Some examples of the split function in use:
local function printSequence(t)
  -- unpack is a global in Lua 5.1 and lives at table.unpack from Lua 5.2 on
  print((table.unpack or unpack)(t))
end
printSequence(split('foo, bar,baz', ',%s*')) -- foo bar baz
printSequence(split('foo, bar,baz', ',%s*', true)) -- foo, bar,baz
printSequence(split('foo', '')) -- f o o
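The plain argument in these functions is passed straight through as the fourth parameter of string.find(). A minimal sketch of the difference it makes, using a made-up input:

```lua
-- Without plain, "." is a pattern that matches any character,
-- so the first match is the very first character.
local s1, e1 = string.find("a.b", ".", 1)

-- With plain = true, "." is matched literally.
local s2, e2 = string.find("a.b", ".", 1, true)

print(s1, e1)
print(s2, e2)
```

This is why the second example above, with plain set to true, finds no ',%s*' separator and returns the whole string as a single element.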
A way not seen in the other answers:
function str_split(str, sep)
if sep == nil then
sep = '%s'
end
local res = {}
local func = function(w)
table.insert(res, w)
end
string.gsub(str, '[^'..sep..']+', func)
return res
end
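You could use the Penlight library. It has a function for splitting a string on a delimiter, which returns a list.
It implements many of the functions that we need while programming and that are missing from Lua.
Here is a sample of using it.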
>
> stringx = require "pl.stringx"
>
> str = "welcome to the world of lua"
>
> arr = stringx.split(str, " ")
>
> arr
{welcome,to,the,world,of,lua}
>
Depending on the use case, this could be useful. It cuts away all text on either side of the markers:
b = "This is a string used for testing"
--Removes unwanted text
c = (b:match("a([^/]+)used"))
print (c)
Output:
string
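One thing to keep in mind with this approach is that captures such as "[^/]+" or ".+" are greedy, so if the closing marker occurs more than once the capture runs to the last occurrence. A lazy "(.-)" capture stops at the first one instead. The sketch below uses a made-up string with square brackets as markers.

```lua
local text = "id=[42] name=[lua]"

-- Greedy: runs from the first "[" to the last "]".
local greedy = text:match("%[(.+)%]")

-- Lazy: stops at the first "]" after the first "[".
local lazy = text:match("%[(.-)%]")

print(greedy)
print(lazy)
```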
Super late to this question, but in case anyone wants a version that also handles the number of splits you want to get:
-- Split a string into a table using a delimiter and a limit
string.split = function(str, pat, limit)
local t = {}
local fpat = "(.-)" .. pat
local last_end = 1
local s, e, cap = str:find(fpat, 1)
while s do
if s ~= 1 or cap ~= "" then
table.insert(t, cap)
end
last_end = e+1
s, e, cap = str:find(fpat, last_end)
if limit ~= nil and limit <= #t then
break
end
end
if last_end <= #str then
cap = str:sub(last_end)
table.insert(t, cap)
end
return t
end
Here is a routine that works in Lua 4.0, returning a table t of the substrings in inputstr delimited by sep:
function string_split(inputstr, sep)
local inputstr = inputstr .. sep
local idx, inc, t = 0, 1, {}
local idx_prev, substr
repeat
idx_prev = idx
inputstr = strsub(inputstr, idx + 1, -1) -- chop off the beginning of the string containing the match last found by strfind (or initially, nothing); keep the rest (or initially, all)
idx = strfind(inputstr, sep) -- find the 0-based r_index of the first occurrence of separator
if idx == nil then break end -- quit if nothing's found
substr = strsub(inputstr, 0, idx) -- extract the substring occurring before the separator (i.e., data field before the next delimiter)
substr = gsub(substr, "[%c" .. sep .. " ]", "") -- eliminate control characters, separator and spaces
t[inc] = substr -- store the substring (i.e., data field)
inc = inc + 1 -- iterate to next
until idx == nil
return t
end
This simple test
inputstr = "the brown lazy fox jumped over the fat grey hen ... or something."
sep = " "
t = {}
t = string_split(inputstr,sep)
for i=1,15 do
print(i, t[i])
end
yields:
--> t[1]=the
--> t[2]=brown
--> t[3]=lazy
--> t[4]=fox
--> t[5]=jumped
--> t[6]=over
--> t[7]=the
--> t[8]=fat
--> t[9]=grey
--> t[10]=hen
--> t[11]=...
--> t[12]=or
--> t[13]=something.
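I found that many other answers had edge cases that failed (e.g. when the given string contains #, { or } characters, or when given a delimiter character such as % that requires escaping). Here is the implementation I went with instead: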
local function newsplit(delimiter, str)
  assert(type(delimiter) == "string")
  assert(#delimiter > 0, "Must provide non empty delimiter")
  -- Add escape characters if delimiter requires it
  delimiter = delimiter:gsub("[%(%)%.%%%+%-%*%?%[%]%^%$]", "%%%0")
  local start_index = 1
  local result = {}
  while true do
    -- Track both ends of the match so a multi-character delimiter is skipped whole
    local delimiter_start, delimiter_end = str:find(delimiter, start_index)
    if delimiter_start == nil then
      table.insert(result, str:sub(start_index))
      break
    end
    table.insert(result, str:sub(start_index, delimiter_start - 1))
    start_index = delimiter_end + 1
  end
  return result
end
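The escaping step can be tried in isolation. This sketch applies the same gsub() call to a made-up delimiter consisting only of magic characters, then uses the result with string.find(); the variable names are illustrative.

```lua
local magic_set = "[%(%)%.%%%+%-%*%?%[%]%^%$]"

-- gsub also returns a substitution count; assigning to one variable keeps only the string.
-- "%%%0" means: a literal "%" followed by the matched character.
local escaped = ("$.%"):gsub(magic_set, "%%%0")
print(escaped)

-- The escaped form now matches the three characters "$.%" literally.
local first, last = string.find("a$.%b", escaped)
print(first, last)
```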
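For those coming from exercise 10.1 of the book "Programming in Lua", it seems clear that we cannot use notions explained later in the book (iterators), and that the function should accept separators longer than a single character.
split() is a trick that makes the pattern match what is not wanted (the separator) and returns an empty table for an empty string. The return value of plainSplit() is more like split in other languages.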
magic = "([%%%.%(%)%+%-%*%?%[%]%^%$])" -- includes "-", which is also a magic character
function split(str, sep, plain)
if plain then sep = string.gsub(sep, magic, "%%%1") end
local N = '\255'
str = N..str..N
str = string.gsub(str, sep, N..N)
local result = {}
for word in string.gmatch(str, N.."(.-)"..N) do
if word ~= "" then
table.insert(result, word)
end
end
return result
end
function plainSplit(str, sep)
sep = string.gsub(sep, magic, "%%%1")
local result = {}
local start = 0
repeat
start = start + 1
local from, to = string.find(str, sep, start)
from = from and from-1
local word = string.sub(str, start, from, true)
table.insert(result, word)
start = to
until start == nil
return result
end
function tableToString(t)
local ret = "{"
for _, word in ipairs(t) do
ret = ret .. '"' .. word .. '", '
end
ret = string.sub(ret, 1, -3)
ret = ret .. "}"
return #ret > 1 and ret or "{}"
end
function runSplit(func, title, str, sep, plain)
print("\n" .. title)
print("str: '"..str.."'")
print("sep: '"..sep.."'")
local t = func(str, sep, plain)
print("-- t = " .. tableToString(t))
end
print("\n\n\n=== Pattern split ===")
runSplit(split, "Exercice 10.1", "a whole new world", " ")
runSplit(split, "With trailing seperator", " a whole new world ", " ")
runSplit(split, "A word seperator", "a whole new world", " whole ")
runSplit(split, "Pattern seperator", "a1whole2new3world", "%d")
runSplit(split, "Magic characters as plain seperator", "a$.%whole$.%new$.%world", "$.%", true)
runSplit(split, "Control seperator", "a\0whole\1new\2world", "%c")
runSplit(split, "ISO Time", "2020-07-10T15:00:00.000", "[T:%-%.]")
runSplit(split, " === [Fails] with \\255 ===", "a\255whole\0new\0world", "\0", true)
runSplit(split, "How does your function handle empty string?", "", " ")
print("\n\n\n=== Plain split ===")
runSplit(plainSplit, "Exercice 10.1", "a whole new world", " ")
runSplit(plainSplit, "With trailing seperator", " a whole new world ", " ")
runSplit(plainSplit, "A word seperator", "a whole new world", " whole ")
runSplit(plainSplit, "Magic characters as plain seperator", "a$.%whole$.%new$.%world", "$.%")
runSplit(plainSplit, "How does your function handle empty string?", "", " ")
Output
=== Pattern split ===
Exercice 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}
With trailing seperator
str: ' a whole new world '
sep: ' '
-- t = {"a", "whole", "new", "world"}
A word seperator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}
Pattern seperator
str: 'a1whole2new3world'
sep: '%d'
-- t = {"a", "whole", "new", "world"}
Magic characters as plain seperator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}
Control seperator
str: 'awholenewworld'
sep: '%c'
-- t = {"a", "whole", "new", "world"}
ISO Time
str: '2020-07-10T15:00:00.000'
sep: '[T:%-%.]'
-- t = {"2020", "07", "10", "15", "00", "00", "000"}
=== [Fails] with \255 ===
str: 'a�wholenewworld'
sep: ''
-- t = {"a"}
How does your function handle empty string?
str: ''
sep: ' '
-- t = {}
=== Plain split ===
Exercice 10.1
str: 'a whole new world'
sep: ' '
-- t = {"a", "whole", "new", "world"}
With trailing seperator
str: ' a whole new world '
sep: ' '
-- t = {"", "", "a", "", "whole", "", "", "new", "world", "", ""}
A word seperator
str: 'a whole new world'
sep: ' whole '
-- t = {"a", "new world"}
Magic characters as plain seperator
str: 'a$.%whole$.%new$.%world'
sep: '$.%'
-- t = {"a", "whole", "new", "world"}
How does your function handle empty string?
str: ''
sep: ' '
-- t = {""}