我试图写一个bash脚本测试,需要一个参数,并通过curl发送到网站。我需要url编码的值,以确保特殊字符被正确处理。最好的方法是什么?
这是我到目前为止的基本脚本:
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@
我试图写一个bash脚本测试,需要一个参数,并通过curl发送到网站。我需要url编码的值,以确保特殊字符被正确处理。最好的方法是什么?
这是我到目前为止的基本脚本:
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@
当前回答
有什么能比javascript更好地解析url呢?
node -p "encodeURIComponent('$url')"
其他回答
下面是纯BASH的答案。
更新:由于已经讨论了许多变化,我已经把这个放在https://github.com/sfinktah/bash/blob/master/rawurlencode.inc.sh上,任何人都可以发布一个PR。
注意:这个解决方案并不打算编码unicode或多字节字符——这完全超出了BASH的简单本机功能。它仅用于编码那些在POST或GET请求中会破坏参数传递的符号,例如。'&', '='等等。
非常重要的提示:不要尝试用任何语言编写你自己的UNICODE转换函数。见答案结尾。
rawurlencode() {
local string="${1}"
local strlen=${#string}
local encoded=""
local pos c o
for (( pos=0 ; pos<strlen ; pos++ )); do
c=${string:$pos:1}
case "$c" in
[-_.~a-zA-Z0-9] ) o="${c}" ;;
* ) printf -v o '%%%02x' "'$c"
esac
encoded+="${o}"
done
echo "${encoded}" # You can either set a return variable (FASTER)
REPLY="${encoded}" #+or echo the result (EASIER)... or both... :p
}
你可以在两种情况下使用它:
easier: echo http://url/q?=$( rawurlencode "$args" )
faster: rawurlencode "$args"; echo http://url/q?${REPLY}
(编辑)
下面是匹配的rawurldecode()函数,它非常棒。
# Returns a string in which the sequences with percent (%) signs followed by
# two hex digits have been replaced with literal characters.
rawurldecode() {
# This is perhaps a risky gambit, but since all escape characters must be
# encoded, we can replace %NN with \xNN and pass the lot to printf -b, which
# will decode hex for us
printf -v REPLY '%b' "${1//%/\\x}" # You can either set a return variable (FASTER)
echo "${REPLY}" #+or echo the result (EASIER)... or both... :p
}
有了匹配集,我们现在可以执行一些简单的测试:
$ diff rawurlencode.inc.sh \
<( rawurldecode "$( rawurlencode "$( cat rawurlencode.inc.sh )" )" ) \
&& echo Matched
Output: Matched
如果你真的觉得你需要一个外部工具(好吧,它会快得多,可能会处理二进制文件之类的…)我在我的OpenWRT路由器上发现了这个…
replace_value=$(echo $replace_value | sed -f /usr/lib/ddns/url_escape.sed)
url_escape的地方。Sed是一个包含以下规则的文件:
# sed url escaping
s:%:%25:g
s: :%20:g
s:<:%3C:g
s:>:%3E:g
s:#:%23:g
s:{:%7B:g
s:}:%7D:g
s:|:%7C:g
s:\\:%5C:g
s:\^:%5E:g
s:~:%7E:g
s:\[:%5B:g
s:\]:%5D:g
s:`:%60:g
s:;:%3B:g
s:/:%2F:g
s:?:%3F:g
s^:^%3A^g
s:@:%40:g
s:=:%3D:g
s:&:%26:g
s:\$:%24:g
s:\!:%21:g
s:\*:%2A:g
虽然在BASH中编写这样一个能够处理UTF-8输入的脚本(可能使用xxd和一个非常长的规则集)并不是不可能的,但是有更快更可靠的方法。试图将UTF-8解码为UTF-32是一项非常重要的任务,要做到准确,尽管很容易做得不准确,以至于您认为它有效,直到有一天它不起作用。
甚至Unicode联盟在发现样本代码不再100%与实际标准兼容后,也删除了样本代码。
Unicode标准在不断发展,变得非常微妙。您可以拼凑的任何实现都不会具有适当的遵从性,如果您通过一些极端的努力来管理它,它也不会保持遵从性。
你可以在perl中模拟javascript的encodeURIComponent。下面是命令:
perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'
你可以在.bash_profile中设置它为bash别名:
alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'
现在你可以管道到encodeURIComponent:
$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!
url=$(echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/!/%21/g' -e 's/"/%22/g' -e 's/#/%23/g' -e 's/\$/%24/g' -e 's/\&/%26/g' -e 's/'\''/%27/g' -e 's/(/%28/g' -e 's/)/%29/g' -e 's/\*/%2a/g' -e 's/+/%2b/g' -e 's/,/%2c/g' -e 's/-/%2d/g' -e 's/\./%2e/g' -e 's/\//%2f/g' -e 's/:/%3a/g' -e 's/;/%3b/g' -e 's//%3e/g' -e 's/?/%3f/g' -e 's/@/%40/g' -e 's/\[/%5b/g' -e 's/\\/%5c/g' -e 's/\]/%5d/g' -e 's/\^/%5e/g' -e 's/_/%5f/g' -e 's/`/%60/g' -e 's/{/%7b/g' -e 's/|/%7c/g' -e 's/}/%7d/g' -e 's/~/%7e/g')
这将对$1中的字符串进行编码,并将其输出到$url中。尽管你不需要把它放在var中。顺便说一句,没有包括sed for选项卡,认为它会把它变成空格
Python 3基于@sandro在2010年的好答案:
echo "Test & /me" | python -c "import urllib.parse;print (urllib.parse.quote(input()))"
测试% 20% 26% 20 /我
awk版本的直接链接:http://www.shelldorado.com/scripts/cmds/urlencode 我用了很多年了,效果很好
:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: "Uniform Resource Locators (URL)" and
# RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
# This encoding is used i.e. for the MIME type
# "application/x-www-form-urlencoded"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP "POST" request). If the desired output should be one
# line, use the "-l" option.
#
# o The "-l" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate it's output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################
PN=`basename "$0"` # Program name
VER='1.4'
: ${AWK=awk}
Usage () {
echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
-l: encode line endings (result will be one line of output)
The default is to encode each input line on its own."
exit 1
}
Msg () {
for MsgLine
do echo "$PN: $MsgLine" >&2
done
}
Fatal () { Msg "$@"; exit 1; }
set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error
EncodeEOL=no
while [ $# -gt 0 ]
do
case "$1" in
-l) EncodeEOL=yes;;
--) shift; break;;
-h) Usage;;
-*) Usage;;
*) break;; # First file name
esac
shift
done
LANG=C export LANG
$AWK '
BEGIN {
# We assume an awk implementation that is just plain dumb.
# We will convert an character to its ASCII value with the
# table ord[], and produce two-digit hexadecimal output
# without the printf("%02X") feature.
EOL = "%0A" # "end of line" string (encoded)
split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
hextab [0] = 0
for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
}
{
encoded = ""
for ( i=1; i<=length ($0); ++i ) {
c = substr ($0, i, 1)
if ( c ~ /[a-zA-Z0-9.-]/ ) {
encoded = encoded c # safe character
} else if ( c == " " ) {
encoded = encoded "+" # special handling
} else {
# unsafe character, encode it as a two-digit hex-number
lo = ord [c] % 16
hi = int (ord [c] / 16);
encoded = encoded "%" hextab [hi] hextab [lo]
}
}
if ( EncodeEOL ) {
printf ("%s", encoded EOL)
} else {
print encoded
}
}
END {
#if ( EncodeEOL ) print ""
}
' "$@"