我试图写一个bash脚本测试,需要一个参数,并通过curl发送到网站。我需要url编码的值,以确保特殊字符被正确处理。最好的方法是什么?

这是我到目前为止的基本脚本:

#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@

当前回答

awk版本的直接链接:http://www.shelldorado.com/scripts/cmds/urlencode 我用了很多年了,效果很好

:
##########################################################################
# Title      :  urlencode - encode URL data
# Author     :  Heiner Steven (heiner.steven@odn.de)
# Date       :  2000-03-15
# Requires   :  awk
# Categories :  File Conversion, WWW, CGI
# SCCS-Id.   :  @(#) urlencode  1.4 06/10/29
##########################################################################
# Description
#   Encode data according to
#       RFC 1738: "Uniform Resource Locators (URL)" and
#       RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
#   This encoding is used i.e. for the MIME type
#   "application/x-www-form-urlencoded"
#
# Notes
#    o  The default behaviour is not to encode the line endings. This
#   may not be what was intended, because the result will be
#   multiple lines of output (which cannot be used in an URL or a
#   HTTP "POST" request). If the desired output should be one
#   line, use the "-l" option.
#
#    o  The "-l" option assumes, that the end-of-line is denoted by
#   the character LF (ASCII 10). This is not true for Windows or
#   Mac systems, where the end of a line is denoted by the two
#   characters CR LF (ASCII 13 10).
#   We use this for symmetry; data processed in the following way:
#       cat | urlencode -l | urldecode -l
#   should (and will) result in the original data
#
#    o  Large lines (or binary files) will break many AWK
#       implementations. If you get the message
#       awk: record `...' too long
#        record number xxx
#   consider using GNU AWK (gawk).
#
#    o  urlencode will always terminate it's output with an EOL
#       character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
#   urldecode
##########################################################################

PN=`basename "$0"`          # Program name
VER='1.4'

: ${AWK=awk}

Usage () {
    echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
    -l:  encode line endings (result will be one line of output)

The default is to encode each input line on its own."
    exit 1
}

Msg () {
    for MsgLine
    do echo "$PN: $MsgLine" >&2
    done
}

Fatal () { Msg "$@"; exit 1; }

set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage           # "getopt" detected an error

EncodeEOL=no
while [ $# -gt 0 ]
do
    case "$1" in
        -l) EncodeEOL=yes;;
    --) shift; break;;
    -h) Usage;;
    -*) Usage;;
    *)  break;;         # First file name
    esac
    shift
done

LANG=C  export LANG
$AWK '
    BEGIN {
    # We assume an awk implementation that is just plain dumb.
    # We will convert an character to its ASCII value with the
    # table ord[], and produce two-digit hexadecimal output
    # without the printf("%02X") feature.

    EOL = "%0A"     # "end of line" string (encoded)
    split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
    hextab [0] = 0
    for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
    if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
    }
    {
    encoded = ""
    for ( i=1; i<=length ($0); ++i ) {
        c = substr ($0, i, 1)
        if ( c ~ /[a-zA-Z0-9.-]/ ) {
        encoded = encoded c     # safe character
        } else if ( c == " " ) {
        encoded = encoded "+"   # special handling
        } else {
        # unsafe character, encode it as a two-digit hex-number
        lo = ord [c] % 16
        hi = int (ord [c] / 16);
        encoded = encoded "%" hextab [hi] hextab [lo]
        }
    }
    if ( EncodeEOL ) {
        printf ("%s", encoded EOL)
    } else {
        print encoded
    }
    }
    END {
        #if ( EncodeEOL ) print ""
    }
' "$@"

其他回答

在本例中,我需要对主机名进行URL编码。不要问为什么。作为一个极简主义者和Perl爱好者,下面是我想到的。

url_encode()
  {
  echo -n "$1" | perl -pe 's/[^a-zA-Z0-9\/_.~-]/sprintf "%%%02x", ord($&)/ge'
  }

很适合我。

安装php后,我使用这种方式:

URL_ENCODED_DATA=`php -r "echo urlencode('$DATA');"`

你可以在perl中模拟javascript的encodeURIComponent。下面是命令:

perl -pe 's/([^a-zA-Z0-9_.!~*()'\''-])/sprintf("%%%02X", ord($1))/ge'

你可以在.bash_profile中设置它为bash别名:

alias encodeURIComponent='perl -pe '\''s/([^a-zA-Z0-9_.!~*()'\''\'\'''\''-])/sprintf("%%%02X",ord($1))/ge'\'

现在你可以管道到encodeURIComponent:

$ echo -n 'hèllo wôrld!' | encodeURIComponent
h%C3%A8llo%20w%C3%B4rld!

对于我的一个案例,我发现NodeJS url库有最简单的解决方案。当然是YMMV

$ urlencode(){ node -e "console.log(require('url').parse(process.argv.slice(1).join('+')).href)" "$@"; }

$ urlencode "https://example.com?my_database_has=these 'nasty' query strings in it"
https://example.com/?my_database_has=these%20%27nasty%27%20query%20strings%20in%20it

Orwellophile给出了一个很好的答案,它包含了一个纯bash选项(函数rawurlencode),我在我的网站上使用过(基于shell的CGI脚本,响应搜索请求的大量url)。唯一的缺点是高峰期间CPU过高。

我找到了一个改进的解决方案,利用bash的“全局替换”特性。有了这个解决方案,url编码的处理时间快了4倍。解决方案确定要转义的字符,并使用“全局替换”操作符(${var//source/replacement})来处理所有替换。这种速度的提高显然来自于使用bash内部循环,而不是显式循环。

性能:核心i3-8100 3.60Ghz。测试用例:来自堆栈溢出的1000个URL,类似于这个票据:“https://stackoverflow.com/questions/296536/how-to-urlencode-data-for-curl-command”。

现有解决方案:0.807秒 优化方案:0.162秒(5倍加速)

url_encode()
{
    local key="${1}" varname="${2:-_rval}" prefix="${3:-_ENCKEY_}"
    local unsafe=${key//[-_.~a-zA-Z0-9 ]/} 
    local -i key_len=${#unsafe}
    local ch ch1 ch0

    while [ "$unsafe" ] ;do
        ch=${unsafe:0:1}
        ch0="\\$ch"
        printf -v ch1 '%%%02x' "'$ch'" 
        key=${key//$ch0/"$ch1"}
        unsafe=${unsafe//"$ch0"}
    done
    key=${key// /+} 

    REPLY="$key"
    # printf "%s" "$REPLY"
    return 0
}

作为一个次要的额外字符,它使用'+'来编码空格。稍微紧凑的URL。

基准:

function t {
    local key
    for (( i=1 ; i<=$1 ; i++ )) do url_encode "$2" kkk2 ; done
    echo "K=$REPLY"
}

t 1000 "https://stackoverflow.com/questions/296536/how-to-urlencode-data-for-curl-command"