How do I zero-pad a sequence of integers in bash so that they all have the same width?

I need to loop over some values:

for i in $(seq $first $last)
do
    # do something here
done

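For $first and $last, I need the values to have a fixed width of 5. So if the input is 1, I need to prepend zeros so that it becomes 00001. The loop runs up to 99999, for example, but the width must always be 5.

For example: 00002, 00042, 00212, 12312, and so on.

Any idea how I can do that?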

Current answer

TL;DR

$ seq 1 10 | awk '{printf("%05d\n", $1)}'

Input (pattern 1, slow):

$ seq 1 10 | xargs -n 1 printf "%05d\n"

Input (pattern 2, fast):

$ seq 1 10 | awk '{printf("%05d\n", $1)}'

Output (identical in both cases):

00001
00002
00003
00004
00005
00006
00007
00008
00009
00010

Explanation

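I'd like to suggest the patterns above. They can be wrapped up as commands so that they are easy to reuse. The only thing you need to adjust in these commands is the width of the converted number (for example, change %05d to %09d). The same approach also works on the output of other commands, as in the following. This example depends heavily on my environment, so your output will probably differ, but I think it shows how useful this can be.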

$ wc -l * | awk '{printf("%05d\n", $1)}'
00007
00001
00001
00001
00013
00017
00001
00001
00001
00043

Or like this:

$ wc -l * | awk '{printf("%05d\n", $1)}' | sort | uniq
00001
00007
00013
00017
00043
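
The "reusable command" idea from the explanation could be sketched as a small shell function. This is only an illustration: the name zpad and its width argument are assumptions, not part of the answer itself. The format string is built by concatenation inside awk, so the width does not have to be hard-coded.

```shell
# zpad [WIDTH]: read lines from stdin and print the first field of each,
# zero-padded to WIDTH digits (default 5).
# zpad is a hypothetical helper name used for illustration.
zpad() {
    awk -v w="${1:-5}" '{ printf("%0" w "d\n", $1) }'
}

# Example: pad the numbers 1..3 to five digits.
seq 1 3 | zpad 5
```

With this in place, the wc example above could be written as wc -l * | zpad 5, and a nine-digit variant is just zpad 9.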

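In addition, when written this way, the commands in the pipeline can also run asynchronously. (I found a nice article: https://www.dataart.com/en/blog/linux-pipes-tips-tricks)

Disclaimer: I'm not sure about this; I'm not a *nix expert.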

Performance tests:

Super slow:

$ time seq 1 1000 | xargs -n 1 printf "%09d\n" > test
seq 1 1000  0.00s user 0.00s system 48% cpu 0.008 total
xargs -n 1 printf "%09d\n" > test  1.14s user 2.17s system 84% cpu 3.929 total

Relatively fast:

for i in {1..1000}
do
   printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test  0.01s user 0.01s system 74% cpu 0.021 total


for i in {1..1000000}
do
   printf "%09d\n" $i
done
$ time sh k.sh > test
sh k.sh > test  7.10s user 1.52s system 99% cpu 8.669 total

Fast:

$ time seq 1 1000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000  0.00s user 0.00s system 47% cpu 0.008 total
awk '{printf("%09d\n", $1)}' > test  0.00s user 0.00s system 52% cpu 0.009 total


$ time seq 1 1000000 | awk '{printf("%09d\n", $1)}' > test
seq 1 1000000  0.27s user 0.00s system 28% cpu 0.927 total
awk '{printf("%09d\n", $1)}' > test  0.92s user 0.01s system 99% cpu 0.937 total

If you need an even faster solution, you will probably have to use something other than a shell script.
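
One likely reason pattern 1 is so slow is the -n 1 option, which makes xargs start a separate printf process for every single number. The sketch below drops that option so that xargs hands many numbers to each printf call; printf then reuses its format string for every argument. It was not timed as part of the tests above, and pad9_batch is a hypothetical name.

```shell
# pad9_batch: zero-pad every integer read from stdin to 9 digits.
# Without -n 1, xargs batches as many arguments as fit into one
# printf invocation instead of spawning one process per number.
pad9_batch() {
    xargs printf '%09d\n'
}

# Example: pad the numbers 1..5.
seq 1 5 | pad9_batch
```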

2022-05-07 19:30:45

Other answers

You can do it more simply:

for i in {00001..99999}; do
  echo $i
done

2013-07-20 09:43:26

Use awk like this:

awk -v start=1 -v end=10 'BEGIN{for (i=start; i<=end; i++) printf("%05d\n", i)}'

Output:

00001
00002
00003
00004
00005
00006
00007
00008
00009
00010

Update:

As a pure bash alternative, you can do this to get the same output:

for i in {1..10}
do
   printf "%05d\n" $i
done

This way you avoid the external program seq, which is not available on every *nix flavor.
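
Note that brace expansion such as {1..10} only works with literal bounds in bash: it is performed before variables are expanded, so {$first..$last} does not produce a range. For the $first and $last of the question, a counting loop is one seq-free option. A minimal sketch, where pad_range and its arguments are made-up names:

```shell
# pad_range FIRST LAST [WIDTH]: print the integers FIRST..LAST, one per
# line, zero-padded to WIDTH digits (default 5). Only shell builtins are
# used, so no external seq is needed. pad_range is a hypothetical name.
pad_range() {
    local i="$1"
    while [ "$i" -le "$2" ]; do
        printf "%0${3:-5}d\n" "$i"
        i=$((i + 1))
    done
}

# Example with variable bounds, as in the question.
first=8
last=11
for n in $(pad_range "$first" "$last"); do
    echo "processing $n"
done
```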

2012-01-09 14:20:26

One way to avoid forking an external process is string manipulation. In the general case it looks like this:

#start value
CNT=1

for [whatever iterative loop, seq, cat, find...];do
   # number of 0s is at least the amount of decimals needed, simple concatenation
   TEMP="000000$CNT"
   # for example 6 digits zero padded, get the last 6 character of the string
   echo ${TEMP:(-6)}
   # increment, if the for loop doesn't provide the number directly
   CNT=$(( CNT + 1 ))
done

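This also works well on WSL, where forking is a very expensive operation. I had a list of 110,000 files: using printf "%06d" $NUM took over a minute, whereas the solution above ran in about 1 second.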
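
The ${TEMP:(-6)} substring syntax is a bash feature. If the script may also run under a plain POSIX sh, the same concatenate-then-cut idea can be approximated with prefix and suffix removal only. This is a swapped-in variant rather than the answer's own code, and pad6 is a hypothetical name:

```shell
# pad6 N: zero-pad N to 6 digits using only POSIX parameter expansion.
# Like the substring version, it keeps just the last 6 characters, so
# numbers longer than 6 digits are truncated.
pad6() {
    # Reuse the function's own $1 as scratch space: prepend six zeros.
    set -- "000000$1"
    # ${1%??????} is everything except the last 6 characters;
    # removing that prefix from $1 leaves exactly the last 6.
    printf '%s\n' "${1#"${1%??????}"}"
}

# Example: pad 42 to six digits.
pad6 42
```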

2019-10-04 21:03:05

Use printf with "%05d", for example:

printf "%05d" 1
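
One caveat when the input is already zero-padded: printf treats a number with a leading 0 as octal, so 010 is formatted as 00008 and 08 is rejected as invalid. A possible workaround, sketched here with a hypothetical helper named strip_zeros, is to remove the leading zeros before padding again:

```shell
# strip_zeros N: print N without leading zeros, keeping at least one
# digit, so that printf does not interpret the value as octal.
# strip_zeros is a hypothetical helper name.
strip_zeros() {
    local n="$1"
    # Drop one leading 0 at a time while more than one character remains.
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

# Example: re-pad an already padded value to five digits.
printf '%05d\n' "$(strip_zeros 00042)"
```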

2012-01-09 14:15:22

You don't need awk: seq or jot alone is enough:

% seq -f '%05.f' 6     # bsd-seq
00001
00002
00003
00004
00005
00006

% gseq -f '%05.f' 6    # gnu-seq
00001
00002
00003
00004
00005
00006

% jot -w '%05.f' 6
00001
00002
00003
00004
00005
00006

...unless you are venturing into bigint territory:

% gawk -Mbe '

  function __(_,___) {
      return +_<+___?___:_
  }
  BEGIN {
        _+=_^=_<_                 
      ____="%0*.f\n"   
  } {                      
       ___=__($--_, !+$++_)                
     _____=__(++_+--_, length(______=+$NF)) 
     do {                     
        printf(____,_____,___)
     }  while (___++<______) 
                                                       
  }' <<< '999999999999999999996 1000000000000000000003'

0999999999999999999996
0999999999999999999997
0999999999999999999998
0999999999999999999999
1000000000000000000000
1000000000000000000001
1000000000000000000002
1000000000000000000003
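
For the loop in the question, the seq -f form from the start of this answer can take its bounds from variables. A small sketch, assuming a seq that accepts the %05.f format as shown above; padded_seq is a hypothetical wrapper name:

```shell
# padded_seq FIRST LAST [WIDTH]: like seq FIRST LAST, but each number is
# zero-padded to WIDTH digits (default 5) via seq's -f option.
padded_seq() {
    seq -f "%0${3:-5}.f" "$1" "$2"
}

# Example with variable bounds, as in the question.
first=8
last=11
for i in $(padded_seq "$first" "$last"); do
    echo "$i"
done
```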

——————————————————————————————————————————————————

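If you need to print a huge range of numbers, this approach might be a bit faster:

It prints every integer from 1 to 1 million, left-zero-padded to 9 digits wide, in 0.049 seconds. *Caveat: I didn't have spare time to make it cover every input range; this is only a proof of concept that accepts increments in powers of 10.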

——————————————————————————————————————————————————

 ( time ( LC_ALL=C mawk2 '
 
   function jot(____,_______,_____,_,__,___,______) {
       if(____==(____^!____)) {
           return +____<+_______\
               ? sprintf("%0*.f",_______,____)\
               : +____ 
        }
        _______= (_______-=____=length(____)-\
                 (_=!(_<_)))<+_ \
                 ? "" \
                 : sprintf("%0*.f",_______,!_)
           __=_= (!(__=_+=_+_))(__=(-+--_)+(__+=_)^++_)\
                 (__+=_=(((_--^_--+_++)^++_-_^!_)/_))(__+_)
          _____= "."
     gsub(_____,"\\&&",__)
     ____--
     do { 
         gsub(_____,__,_)
        _____=_____"." 
     } while(--____)

     gsub(_____,(_______)"&\n",_)
     sub("^[^\n]+[\n]","",_)
     sub(".$",""~"",_______)
     
     return \
     (_)(_______)\
     sprintf("%0*.f",length(_____),__<__)

   } { print jot($1,$2) }' <<< '10000000 9'
  
 ) | pvE9 ) |xxh128sum |ggXy3 | lgp3

 sleep 2
 ( time ( LC_ALL=C jot 1000000 | 
          LC_ALL=C mawk2 '{ printf("%09.f\n", $1) }' 
 
 ) | pvE9 ) |xxh128sum |ggXy3 | lgp3


     out9: 9.54MiB 0:00:00 [ 275MiB/s] [ 275MiB/s] [<=> ]
( LC_ALL=C mawk2  <<< '1000000 9'; )

0.04s user 0.01s system 93% cpu 0.049 total

e0491043bdb4c8bc16769072f3b71f98  stdin


     out9: 9.54MiB 0:00:00 [36.5MiB/s] [36.5MiB/s] [  <=> ]
( LC_ALL=C jot 1000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )

0.43s user 0.01s system 158% cpu 0.275 total

e0491043bdb4c8bc16769072f3b71f98  stdin

At 10 million, the time difference becomes obvious:

 out9: 95.4MiB 0:00:00 [ 216MiB/s] [ 216MiB/s] [<=> ]
 ( LC_ALL=C mawk2  <<< '10000000 9'; )

 0.38s user 0.06s system 95% cpu 0.458 total

 be3ed6c8e9ee947e5ba4ce51af753663  stdin


 out9: 95.4MiB 0:00:02 [36.3MiB/s] [36.3MiB/s] [ <=> ]
 ( LC_ALL=C jot 10000000 | LC_ALL=C mawk2 '{printf("%09.f\n", $1)}'; )

 4.30s user 0.04s system 164% cpu 2.638 total
 
 be3ed6c8e9ee947e5ba4ce51af753663  stdin




 out9: 95.4MiB 0:00:02 [35.2MiB/s] [35.2MiB/s] [ <=> ]

 ( LC_ALL=C python3 -c '__=1; ___=10**7;

   [ print("{0:09d}".format(_)) for _ in range(__,___+__) ]' 

 ) | pvE9 ) | xxh128sum |ggXy3 | lgp3 ;  )

 2.68s user 0.04s system 99% cpu 2.725 total
 
 be3ed6c8e9ee947e5ba4ce51af753663  stdin

2022-05-09 07:00:50
