我有一个~23000行的SQL转储,其中包含几个数据库的数据价值。我需要提取这个文件的某个部分(即单个数据库的数据),并将其放在一个新文件中。我知道我想要的数据的开始行号和结束行号。

谁知道一个Unix命令(或一系列命令)可以从文件中提取16224到16482行之间的所有行,然后将它们重定向到一个新文件中?


当前回答

我编写了一个小型bash脚本,您可以从命令行运行它,只要您更新PATH以包含它的目录(或者您可以将它放在PATH中已经包含的目录中)。

用法:$ pinch filename起始行结束行

#!/bin/bash
# Display line number ranges of a file to the terminal.
# Usage: $ pinch filename start-line end-line
# By Evan J. Coon

FILENAME=$1
START=$2
END=$3

ERROR="[PINCH ERROR]"

# Check that the number of arguments is 3
if [ $# -lt 3 ]; then
    echo "$ERROR Need three arguments: Filename Start-line End-line"
    exit 1
fi

# Check that the file exists.
if [ ! -f "$FILENAME" ]; then
    echo -e "$ERROR File does not exist. \n\t$FILENAME"
    exit 1
fi

# Check that start-line is not greater than end-line
if [ "$START" -gt "$END" ]; then
    echo -e "$ERROR Start line is greater than End line."
    exit 1
fi

# Check that start-line is positive.
if [ "$START" -lt 0 ]; then
    echo -e "$ERROR Start line is less than 0."
    exit 1
fi

# Check that end-line is positive.
if [ "$END" -lt 0 ]; then
    echo -e "$ERROR End line is less than 0."
    exit 1
fi

NUMOFLINES=$(wc -l < "$FILENAME")

# Check that end-line is not greater than the number of lines in the file.
if [ "$END" -gt "$NUMOFLINES" ]; then
    echo -e "$ERROR End line is greater than number of lines in file."
    exit 1
fi

# The distance from the end of the file to end-line
ENDDIFF=$(( NUMOFLINES - END ))

# For larger files, this will run more quickly. If the distance from the
# end of the file to the end-line is less than the distance from the
# start of the file to the start-line, then start pinching from the
# bottom as opposed to the top.
if [ "$START" -lt "$ENDDIFF" ]; then
    < "$FILENAME" head -n $END | tail -n +$START
else
    < "$FILENAME" tail -n +$START | head -n $(( END-START+1 ))
fi

# Success
exit 0

其他回答

使用ruby:

ruby -ne 'puts "#{$.}: #{$_}" if $. >= 32613500 && $. <= 32614500' < GND.rdf > GND.extract.rdf

我们甚至可以在命令行检查:

cat filename|sed 'n1,n2!d' > abc.txt

例如:

cat foo.pl|sed '100,200!d' > abc.txt

Sed -n '16224,16482p' < dump.sql

我已经为sed、perl、head+tail和我自己的awk代码编译了一些最高评级的解决方案,并通过管道关注性能,同时使用LC_ALL=C确保所有候选程序以尽可能快的速度运行,并在两者之间分配2秒的睡眠间隔。

差距是显而易见的:

   abs time    awk/app speed ratio
 ----------------------------------
   0.0672 sec :   1.00x mawk-2
   0.0839 sec :   1.25x gnu-sed
   0.1289 sec :   1.92x perl
   0.2151 sec :   3.20x gnu-head+tail

还没有机会测试这些工具的python或BSD变体。

 (fg && fg && fg && fg) 2>/dev/null; 
 echo;
 ( time ( pvE0 < "${m3t}" 
        | LC_ALL=C  mawk2 '

           BEGIN {  
                     _=10420001-(\
                    __=10420256)^(FS="^$") 
           } _<NR { 
                   print

                   if(__==NR) { exit } 
     
     }' ) | pvE9) | tee >(xxh128sum >&2) | LC_ALL=C gwc -lcm | lgp3 ; 
    sleep 2;
    (fg && fg && fg && fg) 2>/dev/null
    echo; 
    ( time ( pvE0 < "${m3t}" 
           | LC_ALL=C gsed -n '10420001,10420256p;10420256q' 
    
     ) | pvE9 ) |  tee >(xxh128sum >&2) |  LC_ALL=C gwc -lcm  | lgp3 ;
     sleep  2; (fg && fg && fg && fg) 2>/dev/null
     echo
    ( time ( pvE0 < "${m3t}" 
           | LC_ALL=C perl -ne 'print if 10420001..10420256'
    
    ) | pvE9 ) |  tee >(xxh128sum >&2) |  LC_ALL=C gwc -lcm | lgp3 ;
    sleep  2; (fg && fg && fg && fg) 2>/dev/null
    echo
    ( time ( pvE0 < "${m3t}" 
           | LC_ALL=C ghead -n +10420256 
           | LC_ALL=C gtail -n +10420001 
    ) | pvE9 ) |  tee >(xxh128sum >&2) |  LC_ALL=C gwc -lcm  | lgp3 ; 


      in0: 1.51GiB 0:00:00 [2.31GiB/s] [2.31GiB/s] [============> ] 81%            
     out9: 42.5KiB 0:00:00 [64.9KiB/s] [64.9KiB/s] [ <=> ]
( pvE 0.1 in0 < "${m3t}" | LC_ALL=C mawk2 ; )
     
   0.43s user 0.36s system 117% cpu 0.672 total
    256   43487   43487

54313365c2e66a48dc1dc33595716cc8  stdin

     out9: 42.5KiB 0:00:00 [51.7KiB/s] [51.7KiB/s] [ <=> ]
      in0: 1.51GiB 0:00:00 [1.84GiB/s] [1.84GiB/s] [==========> ] 81%            

   ( pvE 0.1 in0 < "${m3t}" |LC_ALL=C gsed -n '10420001,10420256p;10420256q'; )  
  
   0.68s user 0.34s system 121% cpu 0.839 total
    256   43487   43487

54313365c2e66a48dc1dc33595716cc8  stdin


      in0: 1.85GiB 0:00:01 [1.46GiB/s] [1.46GiB/s] [=============>] 100%            
     out9: 42.5KiB 0:00:01 [33.5KiB/s] [33.5KiB/s] [  <=> ]

( pvE 0.1 in0 < "${m3t}" | LC_ALL=C perl -ne 'print if 10420001..10420256'; )
     
   1.10s user 0.44s system 119% cpu 1.289 total
    256   43487   43487

54313365c2e66a48dc1dc33595716cc8  stdin

      in0: 1.51GiB 0:00:02 [ 728MiB/s] [ 728MiB/s] [=============> ] 81%            
     out9: 42.5KiB 0:00:02 [19.9KiB/s] [19.9KiB/s] [ <=> ]

 ( pvE 0.1 in0 < "${m3t}" 
   | LC_ALL=C ghead -n +10420256 
   | LC_ALL=C gtail -n ; )  
  
 1.98s user 1.40s system 157% cpu 2.151 total
 256   43487   43487

54313365c2e66a48dc1dc33595716cc8  stdin

我想从一个使用变量的脚本中做同样的事情,并通过在$变量周围加上引号来分隔变量名和p来实现:

sed -n "$first","$count"p imagelist.txt >"$imageblock"

我想把一个列表分成不同的文件夹,找到最初的问题和答案,这是一个有用的步骤。(分裂命令不是旧操作系统上的选项,我必须将代码移植到)。