I'm trying to use sed to clean up lines of URLs so that only the domain is left.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I'd like:

http://www.suepearson.co.uk/

(Whether or not there's a trailing slash doesn't matter.)

I've tried:

 sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non-greedy quantifier):

sed 's|\(http:\/\/.*\?\/\).*|\1|'

but I can't get the non-greedy quantifier (?) to work, so it always matches the whole string.


Current answer

@Daniel H (about your comment on andcoz's answer, though that was a long time ago): to remove trailing zeros:

sed -E 's,([[:digit:]]\.[[:digit:]]*[1-9])[0]*$,\1,g'

It's all about defining the match condition precisely...
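A quick usage sketch of that trailing-zero expression. The unescaped parentheses are ERE syntax, so this assumes a sed that accepts `-E` (GNU and BSD sed both do); `trim_zeros` is a hypothetical helper name and the sample numbers are made up for illustration:

```shell
# Hypothetical helper: strip trailing zeros after the last non-zero decimal digit.
# -E enables extended regexes, so ( ) form a capture group without backslashes.
trim_zeros() {
  sed -E 's,([[:digit:]]\.[[:digit:]]*[1-9])[0]*$,\1,'
}

a=$(echo '3.14000' | trim_zeros)
b=$(echo '12.500' | trim_zeros)
c=$(echo '7.000' | trim_zeros)   # no non-zero decimal digit: left unchanged
echo "$a $b $c"
```

A value like `7.000` is left alone, because the pattern requires at least one non-zero digit after the decimal point.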

Other answers

echo "/home/one/two/three/myfile.txt" | sed 's|\(.*\)/.*|\1|'

Never mind, I found it on another forum :)
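That expression works for paths because `\(.*\)/` is greedy: the group extends to the *last* `/` on the line. The same greediness is why the patterns in the question swallow too much. A small sketch applying the identical expression to the URL from the question:

```shell
path='/home/one/two/three/myfile.txt'
url='http://www.suepearson.co.uk/product/174/71/3816/'

# Greedy \(.*\)/ keeps everything up to the LAST "/" on the line.
dir=$(echo "$path" | sed 's|\(.*\)/.*|\1|')
# On the URL the last "/" is the trailing one, so only that slash is removed.
not_domain=$(echo "$url" | sed 's|\(.*\)/.*|\1|')
echo "$dir"
echo "$not_domain"
```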

I know this is an old post, but someone may find it useful. Since a fully qualified domain name can be at most 253 characters long, replace .* with .\{1,255\}
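On its own, an interval such as `\{1,255\}` caps the length of the match but is still greedy. A sketch that uses the 253-character limit while also stopping at the end of the host (this swaps in a negated bracket expression, `[^/]`, which is not what the answer above wrote):

```shell
url='http://www.suepearson.co.uk/product/174/71/3816/'

# [^/]\{1,253\} matches 1 to 253 characters that are not "/",
# so the captured part cannot run past the end of the host name.
domain=$(echo "$url" | sed 's|\(http://[^/]\{1,253\}\)/.*|\1|')
echo "$domain"
```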

Here's a two-step approach you can take with awk (gawk, since gensub is a GNU extension):

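A=http://www.suepearson.co.uk/product/174/71/3816/
echo "$A" | gawk '
{
  # gensub (gawk only): replace just the 3rd "/" with a "||" marker
  var = gensub(/\//, "||", 3, $0)
  # delete everything from the "||" marker to the end
  sub(/\|\|.*/, "", var)
  print var
}'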

Output: http://www.suepearson.co.uk

Hope this helps!
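Because `gensub` is a gawk extension, the snippet above won't run in mawk or BSD awk. A sketch of a variant that should work in any POSIX awk uses a different technique, field splitting instead of substitution: split the line on `/`, and the host ends up in the third field.

```shell
A='http://www.suepearson.co.uk/product/174/71/3816/'

# With -F/ the fields are: $1="http:", $2="" (between the two slashes),
# $3="www.suepearson.co.uk", and so on.
domain=$(echo "$A" | awk -F/ '{ print $1 "//" $3 }')
echo "$domain"
```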

sed -E 's|(http://[^/]+/).*|\1|'
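The unescaped parentheses and `+` in that command are ERE syntax, so it needs `-E` (or `-r` in older GNU sed). A hedged sketch of the same idea that additionally assumes you also want `https` URLs, using `s?` to make the `s` optional; `extract_domain` is a hypothetical helper name:

```shell
# Hypothetical helper. -E: extended regex, so ( ), + and ? need no backslashes.
# ^https?:// matches http:// or https://; [^/]+ is the host name.
extract_domain() {
  sed -E 's|^(https?://[^/]+/).*|\1|'
}

d1=$(echo 'http://www.suepearson.co.uk/product/174/71/3816/' | extract_domain)
d2=$(echo 'https://example.com/a/b' | extract_domain)
echo "$d1"
echo "$d2"
```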

With sed, I usually get a non-greedy search by matching anything except the delimiter, up to the delimiter:

echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*\)/.*;\1;p'

Output:

http://www.suon.co.uk

Step by step:

- -n: don't print lines automatically
- s/<pattern>/<replace>/p: search, match the pattern, replace it, and print the result
- use ; as the s command separator instead of / to make it easier to type, so: s;<pattern>;<replace>;p
- \( ... \): remember the match between the brackets, accessible later as \1, \2, ...
- http://: match the literal text http://
- [ ]: a bracket expression; [ab/] means either a, b or /
- ^ as the first character inside [ ] means "not", so [^/] means any character except /
- *: repeat the previous item, so [^/]* means any run of characters other than /

So far, sed -n 's;\(http://[^/]*\) means: search for and remember http:// followed by any characters except /.

We want to search until the end of the domain, so stop at the next / by adding another / at the end: sed -n 's;\(http://[^/]*\)/

But we also want to match the rest of the line after the domain, so add .*

Now the match remembered in group 1 (\1) is the domain, so replace the matched line with what was saved in \1 and print it: sed -n 's;\(http://[^/]*\)/.*;\1;p'
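Because of `-n` combined with the `p` flag, lines where the substitution does not happen are not printed at all, which is handy when the input mixes URLs with other text. A small sketch with made-up input:

```shell
# Two URL lines and one non-URL line (illustrative input).
input='http://www.suon.co.uk/product/1/7/3/
not a url
http://example.org/index.html'

# Only lines where s;...;...;p actually substitutes are printed.
domains=$(printf '%s\n' "$input" | sed -n 's;\(http://[^/]*\)/.*;\1;p')
echo "$domains"
```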

If you want to keep the slash after the domain, move that slash inside the remembered group:

echo "http://www.suon.co.uk/product/1/7/3/" | sed -n 's;\(http://[^/]*/\).*;\1;p'

Output:

http://www.suon.co.uk/