我试图写一个bash脚本测试,需要一个参数,并通过curl发送到网站。我需要url编码的值,以确保特殊字符被正确处理。最好的方法是什么?

这是我到目前为止的基本脚本:

#!/bin/bash
host=${1:?'bad host'}
value=$2
shift
shift
curl -v -d "param=${value}" http://${host}/somepath $@

当前回答

This is a simpler pure bash/ksh version without the substring logic. Stated differently the other pure shell solutions reparsed the string to get each character (using parameter substitution ${#str} for the lenght and ${str:$i:1} to discover each character). The below method does just one loop over the string to process each character. It is the difference between O(n^2) and O(n). In this answer: https://stackoverflow.com/a/40833433/1344599 Thunderbeef saw ~150x speed improvement on a large text file. This solution is also a shorter oneliner:

while IFS='' read -n 1 c ; do [[ "$c" =~ [A-Za-z0-9.~_-] ]] && printf "$c" || printf '%%%02X' "'$c" ; done

在函数中,你可以使用stdin或形参:

function urlen_stdin {
  while IFS='' read -n 1 c ; do [[ "$c" =~ [A-Za-z0-9.~_-] ]] && printf "$c" || printf '%%%02X' "'$c" ; done
}
function urlen_param {
  printf '%s' "$1" | while IFS='' read -n 1 c ; do [[ "$c" =~ [A-Za-z0-9.~_-] ]] && printf "$c" || printf '%%%02X' "'$c" ; done
}
function urlen_here {
  while IFS='' read -n 1 c ; do [[ "$c" =~ [A-Za-z0-9.~_-] ]] && printf "$c" || printf '%%%02X' "'$c" ; done <<< "$1"
}

#usage: 
echo -n 'hello !@#$%^&*()[]:;{}\/|-_=+.,? world' | urlen_stdin
urlen_param 'hello !@#$%^&*()[]:;{}\/|-_=+.,? world'
urlen_here 'hello !@#$%^&*()[]:;{}\/|-_=+.,? world'
# all methods render:
hello%20%21%40%23%24%25%5E%26%2A%28%29%5B%5D%3A%3B%7B%7D%2F%7C-_%3D%2B.%2C%3F%20world

解释:

IFS= "使空格像普通字符一样 Read -n 1一次读取1个字符 [[=~]]是一个正则表达式比较。如果字符匹配,则遵循&&路径,否则遵循||路径 printf '%%%02X'打印一个%和字符作为零填充长度为2的十六进制代码

其他回答

awk版本的直接链接:http://www.shelldorado.com/scripts/cmds/urlencode 我用了很多年了,效果很好

:
##########################################################################
# Title      :  urlencode - encode URL data
# Author     :  Heiner Steven (heiner.steven@odn.de)
# Date       :  2000-03-15
# Requires   :  awk
# Categories :  File Conversion, WWW, CGI
# SCCS-Id.   :  @(#) urlencode  1.4 06/10/29
##########################################################################
# Description
#   Encode data according to
#       RFC 1738: "Uniform Resource Locators (URL)" and
#       RFC 1866: "Hypertext Markup Language - 2.0" (HTML)
#
#   This encoding is used i.e. for the MIME type
#   "application/x-www-form-urlencoded"
#
# Notes
#    o  The default behaviour is not to encode the line endings. This
#   may not be what was intended, because the result will be
#   multiple lines of output (which cannot be used in an URL or a
#   HTTP "POST" request). If the desired output should be one
#   line, use the "-l" option.
#
#    o  The "-l" option assumes, that the end-of-line is denoted by
#   the character LF (ASCII 10). This is not true for Windows or
#   Mac systems, where the end of a line is denoted by the two
#   characters CR LF (ASCII 13 10).
#   We use this for symmetry; data processed in the following way:
#       cat | urlencode -l | urldecode -l
#   should (and will) result in the original data
#
#    o  Large lines (or binary files) will break many AWK
#       implementations. If you get the message
#       awk: record `...' too long
#        record number xxx
#   consider using GNU AWK (gawk).
#
#    o  urlencode will always terminate it's output with an EOL
#       character
#
# Thanks to Stefan Brozinski for pointing out a bug related to non-standard
# locales.
#
# See also
#   urldecode
##########################################################################

PN=`basename "$0"`          # Program name
VER='1.4'

: ${AWK=awk}

Usage () {
    echo >&2 "$PN - encode URL data, $VER
usage: $PN [-l] [file ...]
    -l:  encode line endings (result will be one line of output)

The default is to encode each input line on its own."
    exit 1
}

Msg () {
    for MsgLine
    do echo "$PN: $MsgLine" >&2
    done
}

Fatal () { Msg "$@"; exit 1; }

set -- `getopt hl "$@" 2>/dev/null` || Usage
[ $# -lt 1 ] && Usage           # "getopt" detected an error

EncodeEOL=no
while [ $# -gt 0 ]
do
    case "$1" in
        -l) EncodeEOL=yes;;
    --) shift; break;;
    -h) Usage;;
    -*) Usage;;
    *)  break;;         # First file name
    esac
    shift
done

LANG=C  export LANG
$AWK '
    BEGIN {
    # We assume an awk implementation that is just plain dumb.
    # We will convert an character to its ASCII value with the
    # table ord[], and produce two-digit hexadecimal output
    # without the printf("%02X") feature.

    EOL = "%0A"     # "end of line" string (encoded)
    split ("1 2 3 4 5 6 7 8 9 A B C D E F", hextab, " ")
    hextab [0] = 0
    for ( i=1; i<=255; ++i ) ord [ sprintf ("%c", i) "" ] = i + 0
    if ("'"$EncodeEOL"'" == "yes") EncodeEOL = 1; else EncodeEOL = 0
    }
    {
    encoded = ""
    for ( i=1; i<=length ($0); ++i ) {
        c = substr ($0, i, 1)
        if ( c ~ /[a-zA-Z0-9.-]/ ) {
        encoded = encoded c     # safe character
        } else if ( c == " " ) {
        encoded = encoded "+"   # special handling
        } else {
        # unsafe character, encode it as a two-digit hex-number
        lo = ord [c] % 16
        hi = int (ord [c] / 16);
        encoded = encoded "%" hextab [hi] hextab [lo]
        }
    }
    if ( EncodeEOL ) {
        printf ("%s", encoded EOL)
    } else {
        print encoded
    }
    }
    END {
        #if ( EncodeEOL ) print ""
    }
' "$@"

Python 3基于@sandro在2010年的好答案:

echo "Test & /me" | python -c "import urllib.parse;print (urllib.parse.quote(input()))"

测试% 20% 26% 20 /我

Ruby,为了完整性

value="$(ruby -r cgi -e 'puts CGI.escape(ARGV[0])' "$2")"

为了完整起见,许多使用sed或awk的解决方案只翻译一组特殊的字符,因此代码大小相当大,也不翻译其他应该编码的特殊字符。

urlencode的一个安全方法是对每个字节进行编码——即使是那些允许的字节。

echo -ne 'some random\nbytes' | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g'

XXD在这里小心地将输入处理为字节而不是字符。

编辑:

xxd附带了Debian中的vim-common包,我只是在一个没有安装它的系统上,我不想安装它。另一种选择是使用Debian中的bsdmainutils包中的hexdump。根据下图,bsdmainutils和vim-common应该有相同的可能性被安装:

http://qa.debian.org/popcon-png.php?packages=vim-common%2Cbsdmainutils&show_installed=1&want_legend=1&want_ticks=1

但是这里有一个使用hexdump代替XXD的版本,并且允许避免tr调用:

echo -ne 'some random\nbytes' | hexdump -v -e '/1 "%02x"' | sed 's/\(..\)/%\1/g'

安装php后,我使用这种方式:

URL_ENCODED_DATA=`php -r "echo urlencode('$DATA');"`