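I am trying to write a bash script for testing that takes a parameter and sends it to a website through curl. I need to URL-encode the value so that special characters are handled correctly. What is the best way to do this?
Here is my basic script so far: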
#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@
Current answer
You can emulate JavaScript's encodeURIComponent in perl. Here is the command:
perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'
You can set it up as a bash alias in .bash_profile:
alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'
Now you can pipe into encodeURIComponent:
$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!
Other answers
Direct link to the awk version: http://www.shelldorado.com/scripts/cmds/urlencode. I have used it for years and it works well.
:
##########################################################################
# Title : urlencode - encode URL data
# Author : Heiner Steven (heiner.steven@odn.de)
# Date : 2000-03-15
# Requires : awk
# Categories : File Conversion, WWW, CGI
# SCCS-Id. : @(#) urlencode 1.4 06/10/29
##########################################################################
# Description
# Encode data according to
# RFC 1738: "Uniform Resource Locators (URL)" and
# RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
# This encoding is used i.e. for the MIME type
# "application/x-www-form-urlencoded"
#
# Notes
# o The default behaviour is not to encode the line endings. This
# may not be what was intended, because the result will be
# multiple lines of output (which cannot be used in an URL or a
# HTTP "POST" request). If the desired output should be one
# line, use the "-l" option.
#
# o The "-l" option assumes, that the end-of-line is denoted by
# the character LF (ASCII 10). This is not true for Windows or
# Mac systems, where the end of a line is denoted by the two
# characters CR LF (ASCII 13 10).
# We use this for symmetry; data processed in the following way:
# cat | urlencode -l | urldecode -l
# should (and will) result in the original data
#
# o Large lines (or binary files) will break many AWK
# implementations. If you get the message
# awk: record `...' too long
# record number xxx
# consider using GNU AWK (gawk).
#
# o urlencode will always terminate it's output with an EOL
# character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
# urldecode
##########################################################################
PN=`basename "$0"` # Program name
VER='1.4'
: ${AWK=awk}
Usage () {
    echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
-l: encode line endings (result will be one line of output)
The default is to encode each input line on its own."
    exit 1
}
Msg () {
    for MsgLine
    do echo "$PN: $MsgLine" >&2
    done
}
Fatal () { Msg "$@"; exit 1; }
set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage # "getopt" detected an error
EncodeEOL=no
while [ $# -gt 0 ]
do
    case "$1" in
        -l) EncodeEOL=yes;;
        --) shift; break;;
        -h) Usage;;
        -*) Usage;;
        *) break;; # First file name
    esac
    shift
done
LANG=C export LANG
$AWK '
    BEGIN {
        # We assume an awk implementation that is just plain dumb.
        # We will convert an character to its ASCII value with the
        # table ord[], and produce two-digit hexadecimal output
        # without the printf("%02X") feature.
        EOL = "%0A" # "end of line" string (encoded)
        split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
        hextab [0] = 0
        for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
        if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
    }
    {
        encoded = ""
        for ( i=1; i<=length ($0); ++i ) {
            c = substr ($0, i, 1)
            if ( c ~ /[a-zA-Z0-9.-]/ ) {
                encoded = encoded c # safe character
            } else if ( c == " " ) {
                encoded = encoded "+" # special handling
            } else {
                # unsafe character, encode it as a two-digit hex-number
                lo = ord [c] % 16
                hi = int (ord [c] / 16);
                encoded = encoded "%" hextab [hi] hextab [lo]
            }
        }
        if ( EncodeEOL ) {
            printf ("%s", encoded EOL)
        } else {
            print encoded
        }
    }
    END {
        #if ( EncodeEOL ) print ""
    }
' "$@"
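The script above deliberately avoids printf("%02X") so that it runs on very old awk implementations. Where awk does support sprintf with %02X (an assumption about your system), the core loop can be condensed as below. This is an illustrative sketch, not a replacement for the original script: the function name urlencode_form is made up, it handles a single argument rather than files, and it has no -l option.

```shell
# Hypothetical condensed version of the awk loop above.
# Form-style encoding: a space becomes "+", everything outside [a-zA-Z0-9.-] becomes %XX.
urlencode_form() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN {
      # Build a lookup table from each single-byte character to its numeric value.
      for (i = 1; i <= 255; ++i) ord[sprintf("%c", i)] = i
    }
    {
      encoded = ""
      for (i = 1; i <= length($0); ++i) {
        c = substr($0, i, 1)
        if (c ~ /[a-zA-Z0-9.-]/) {
          encoded = encoded c                            # safe character
        } else if (c == " ") {
          encoded = encoded "+"                          # form-style space
        } else {
          encoded = encoded sprintf("%%%02X", ord[c])    # unsafe character
        }
      }
      print encoded
    }'
}
```

LC_ALL=C makes awk treat the input as single bytes, which mirrors the LANG=C line in the original script.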
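If you don't want to depend on Perl, you can also use sed. It is a bit messy, because every character has to be escaped individually. Create a file with the following contents and name it urlencode.sed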
s/%/%25/g
s/ /%20/g
# tab: \t is understood by GNU sed; with other seds use a literal tab character
s/\t/%09/g
s/!/%21/g
s/"/%22/g
s/#/%23/g
s/\$/%24/g
s/\&/%26/g
s/'/%27/g
s/(/%28/g
s/)/%29/g
s/\*/%2a/g
s/+/%2b/g
s/,/%2c/g
s/-/%2d/g
s/\./%2e/g
s/\//%2f/g
s/:/%3a/g
s/;/%3b/g
s/</%3c/g
s/>/%3e/g
s/?/%3f/g
s/@/%40/g
s/\[/%5b/g
s/\\/%5c/g
s/\]/%5d/g
s/\^/%5e/g
s/_/%5f/g
s/`/%60/g
s/{/%7b/g
s/|/%7c/g
s/}/%7d/g
s/~/%7e/g
To use it, do the following.
STR1=$(echo "https://www.example.com/change&$ ^this to?%checkthe@-functionality" | cut -d\? -f1)
STR2=$(echo "https://www.example.com/change&$ ^this to?%checkthe@-functionality" | cut -d\? -f2)
OUT2=$(echo "$STR2" | sed -f urlencode.sed)
echo "$STR1?$OUT2"
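This splits the string into the part that needs encoding and the part that is fine as it is, encodes the part that needs it, and then stitches the two back together.
For convenience you can put this in an sh script, perhaps have it take the string to encode as a parameter, put it on your path, and then simply call: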
urlencode https://www.exxample.com?isThisFun=HellNo
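A wrapper along those lines might look like the sketch below. It is written as a bash function rather than a standalone script, and the URLENCODE_SED variable (the location of urlencode.sed) is an assumption made for this example. Unlike the cut-based snippet above, it keeps everything after the first ? instead of only the text up to a second ?.

```shell
# Hypothetical wrapper: encode only the part of a URL after the first "?".
# URLENCODE_SED is an assumed variable naming the sed file; it defaults to ./urlencode.sed.
urlencode() {
  local url=$1
  local sed_file=${URLENCODE_SED:-urlencode.sed}
  case $url in
    *\?*)
      # ${url%%\?*} is the text before the first "?", ${url#*\?} the text after it.
      printf '%s?%s\n' "${url%%\?*}" \
        "$(printf '%s\n' "${url#*\?}" | sed -f "$sed_file")"
      ;;
    *)
      # No query part: print the URL unchanged.
      printf '%s\n' "$url"
      ;;
  esac
}
```

Splitting with parameter expansion avoids running echo and cut twice on the same string.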
uni2ascii is very handy:
$ echo -ne '你好世界' | uni2ascii -aJ
%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C
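I find it more readable in Python:
encoded_value=$(python3 -c "import urllib.parse; print(urllib.parse.quote('''$value'''))")
The triple ' ensures that single quotes inside value do no harm, although a value that ends in ', contains ''', or contains a backslash will still trip it up. urllib is part of the standard library. It works, for example, for this crazy (real-world) URL: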
"http://www.rai.it/dl/audio/" "1264165523944Ho servito il re d'Inghilterra - Puntata 7
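If the value may contain arbitrary quotes or backslashes, one way around the quoting question is to pass it to Python as an argument instead of pasting it into the program text. A minimal sketch, assuming python3 is on the PATH; the function name urlencode_py is made up for this example:

```shell
# Hypothetical helper: the value reaches Python through sys.argv,
# so it is never interpreted as Python source code.
urlencode_py() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$1"
}

# Usage sketch: encoded_value=$(urlencode_py "$value")
```

By default urllib.parse.quote leaves / unencoded, which suits paths; passing safe='' as a second argument to quote encodes it as well.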
For one of my cases, I found that the NodeJS url library had the simplest solution. Of course, YMMV
$ urlencode(){ node -e "console.log(require('url').parse(process.argv.slice(1).join('+')).href)" "$@"; }
$ urlencode "https://example.com?my_database_has=these 'nasty' query strings in it"
https://example.com/?my_database_has=these%20%27nasty%27%20query%20strings%20in%20it