我试图使用sed来清理url行来提取域。
所以从:
http://www.suepearson.co.uk/product/174/71/3816/
我想要:
http://www.suepearson.co.uk/
(不管后面有没有斜杠,都没有关系)
我试过:
sed 's|\(http:\/\/.*?\/\).*|\1|'
And(转义非贪婪量词)
sed 's|\(http:\/\/.*\?\/\).*|\1|'
但我似乎不能让非贪婪量词(?)工作,所以它总是匹配整个字符串。
使用sed,我通常通过搜索除分隔符以外的任何东西来实现非贪婪搜索,直到分隔符:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'
输出:
http://www.suon.co.uk
这是:
don't output -n
search, match pattern, replace and print s/<pattern>/<replace>/p
use ; search command separator instead of / to make it easier to type so s;<pattern>;<replace>;p
remember match between brackets \( ... \), later accessible with \1,\2...
match http://
followed by anything in brackets [], [ab/] would mean either a or b or /
first ^ in [] means not, so followed by anything but the thing in the []
so [^/] means anything except / character
* is to repeat previous group so [^/]* means characters except /.
so far sed -n 's;\(http://[^/]*\) means search and remember http://followed by any characters except / and remember what you've found
we want to search untill the end of domain so stop on the next / so add another / at the end: sed -n 's;\(http://[^/]*\)/' but we want to match the rest of the line after the domain so add .*
now the match remembered in group 1 (\1) is the domain so replace matched line with stuff saved in group \1 and print: sed -n 's;\(http://[^/]*\)/.*;\1;p'
如果你想在域名后面加上反斜杠,那么在组中再加一个反斜杠来记住:
echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'
输出:
http://www.suon.co.uk/
您还应该考虑没有匹配界限的情况。你是否想输出这一行。如果不匹配,我这里的示例不会输出任何内容。
你需要前缀到第三个/,所以选择两次字符串的任何长度不包含/和后面的/,然后字符串的任何长度不包含/,然后匹配/后面的任何字符串,然后打印选择。这个想法适用于任何单个的char delims。
echo http://www.suepearson.co.uk/product/174/71/3816/ | \
sed -nr 's,(([^/]*/){2}[^/]*)/.*,\1,p'
使用sed命令,您可以快速删除前缀或delim选择,如:
echo 'aaa @cee: { "foo":" @cee: " }' | \
sed -r 't x;s/ @cee: /\n/;D;:x'
这比一次吃焦肉快多了。
如果之前匹配成功,跳转到标签。在第一道线/之前加\n。移除到第一个\n。如果添加了\n,则跳转到结束并打印。
如果有开始和结束delim,很容易删除结束delim,直到你到达你想要的第n -2个元素,然后做D技巧,在结束delim后删除,如果不匹配跳转到删除,在开始delim和打印之前删除。这仅在开始/结束分隔成对出现时有效。
echo 'foobar start block #1 end barfoo start block #2 end bazfoo start block #3 end goo start block #4 end faa' | \
sed -r 't x;s/end//;s/end/\n/;D;:x;s/(end).*/\1/;T y;s/.*(start)/\1/;p;:y;d'