How do I remove the lines that appear in file B from another file A?

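I have a large file A (consisting of email addresses), one line per address. I also have another file B that contains another set of addresses.

Which command would I use to remove from file A all the addresses that appear in file B?

So, if file A contained: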

A
B
C

and file B contained:

B
D
E

then file A should be left with:

A
C

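I know this is a question that has probably been asked many times, but the only command I found online gave me an error about a bad delimiter.

Any help would be much appreciated! Somebody will surely come up with a clever one-liner, but I'm no shell expert.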

Current answer

You can use Python:

python -c '
# Collect the lines of file B, ignoring surrounding whitespace.
lines_to_remove = set()
with open("file B", "r") as f:
    for line in f:
        lines_to_remove.add(line.strip())

# Print every line of file A that is not in that set.
with open("file A", "r") as f:
    for line in f:
        line = line.strip()
        if line not in lines_to_remove:
            print(line)
'

2017-08-10 07:49:59

Other answers

Awk to the rescue!

This solution doesn't require sorted input. You have to provide fileB first.

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA

A
C

How does it work?

The NR==FNR{a[$0];next} idiom stores the first file in an associative array as keys for a later "contains" test.

NR==FNR checks whether we're scanning the first file, where the global line counter (NR) equals the current file's line counter (FNR).

a[$0] adds the current line to the associative array as a key. Note that this behaves like a set: there won't be any duplicate values (keys).

!($0 in a) applies once we're in the next file(s). in is a contains test; here it checks whether the current line is in the set we populated from the first file, and ! negates the condition.

What is missing here is the action, which by default is {print} and is usually not written explicitly.

Note that this can now be used to remove blacklisted words.

$ awk '...' badwords allwords > goodwords

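With a slight change it can clean multiple lists and create cleaned versions of them. The output file name is parenthesized because an unparenthesized concatenation after > is not handled the same way by every awk implementation.

$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME".clean")}' bad file1 file2 file3 ...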
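For readers less familiar with awk, the same two-pass idea can be sketched in Python. This is only an illustrative sketch: the function name clean_files and the UTF-8 encoding are assumptions, not part of the original answer.

```python
from pathlib import Path


def clean_files(bad_path, target_paths):
    """Write '<name>.clean' beside each target, omitting lines listed in bad_path."""
    # Pass 1: load the blacklist as a set of whole lines (the role of a[$0] in awk).
    with open(bad_path, encoding="utf-8") as f:
        bad = {line.rstrip("\n") for line in f}

    written = []
    for target in target_paths:
        out_path = Path(str(target) + ".clean")
        # Pass 2: copy only the lines that are not in the set (the role of !($0 in a)).
        with open(target, encoding="utf-8") as src, \
                open(out_path, "w", encoding="utf-8") as dst:
            for line in src:
                if line.rstrip("\n") not in bad:
                    dst.write(line)
        written.append(out_path)
    return written
```

As in the awk version, the blacklist is read once and every other file is streamed line by line, so memory use depends only on the size of the blacklist.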

2015-09-23 19:04:41

To remove the lines common to two files you can use the grep, comm or join command.

grep is only suitable for small files. Use -v together with -f.

grep -vf file2 file1

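This displays the lines of file1 that match no line of file2. Without -F and -x, each line of file2 is treated as a regular expression and may match anywhere in a line, so a pattern such as 0 also removes the line 01.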

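comm is a utility that operates on lexically sorted files. It takes two files as input and produces three text columns as output: lines only in the first file; lines only in the second file; and lines in both files. Printing of any column can be suppressed with the -1, -2 or -3 option, respectively.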

comm -1 -3 file2 file1

This displays the lines of file1 that do not match any line of file2. Both files must already be sorted.
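The reason comm needs sorted input is that it walks both files in a single pass. A rough Python sketch of that walk follows; the function name only_in_first is hypothetical, and plain Python string comparison is assumed, which corresponds to byte order (LC_ALL=C) rather than a locale-specific collation.

```python
def only_in_first(sorted_a, sorted_b):
    """Return the items of sorted_a that have no partner in sorted_b (one pass)."""
    j = 0
    result = []
    for item in sorted_a:
        # Skip entries of sorted_b that sort before the current item.
        while j < len(sorted_b) and sorted_b[j] < item:
            j += 1
        if j < len(sorted_b) and sorted_b[j] == item:
            j += 1      # common line: pair it off and drop it
            continue
        result.append(item)  # line found only in sorted_a
    return result


print(only_in_first(["A", "B", "C"], ["B", "D", "E"]))
```

Duplicates are paired one to one here, which is how comm treats them, so a line that occurs twice in the first list and once in the second is kept once.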

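Finally, there is join, a utility that performs an equality join on the specified files. Its -v option also allows removing the lines common to two files. Note that join expects both files to be sorted on the join field (the first field by default), and that giving both -v1 and -v2, as in the command below, prints the unpaired lines of both files; -v1 alone keeps only the lines unique to file1.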

join -v1 -v2 file1 file2
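The difference between one and two -v options can be illustrated with set operations. This is a simplified model that assumes single-field lines with no duplicates; the function names are hypothetical.

```python
def unpaired_in_first(file1_lines, file2_lines):
    """Model of `join -v1`: lines of file1 with no partner in file2."""
    return sorted(set(file1_lines) - set(file2_lines))


def unpaired_in_either(file1_lines, file2_lines):
    """Model of `join -v1 -v2`: lines that appear in exactly one of the files."""
    return sorted(set(file1_lines) ^ set(file2_lines))


file1 = ["A", "B", "C"]
file2 = ["B", "D", "E"]
print(unpaired_in_first(file1, file2))
print(unpaired_in_either(file1, file2))
```

For the question as asked (keep only what remains of file A), the first form is the one that applies.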

2020-04-27 07:40:18

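This is a one-liner that uses lynx to dump a website and grep to remove its navigation elements! You can replace the lynx command with cat FileA and unwanted-elements.txt with FileB.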

lynx -dump -accept_all_cookies -nolist -width 1000 https://stackoverflow.com/ | grep -Fxvf unwanted-elements.txt

2023-01-08 00:20:34

grep -Fvxf <lines-to-remove> <all-lines>

Works on unsorted files (unlike comm), maintains the order, and is POSIX.

Example:

cat <<EOF > A
b
1
a
0
01
b
1
EOF

cat <<EOF > B
0
1
EOF

grep -Fvxf B A

Output:

b
a
01
b

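Explanation:

-F: use literal strings instead of the default BRE
-x: only consider matches that match the whole line
-v: print the non-matching lines
-f file: take the patterns from the given file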
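To see why -F and -x matter, here is a small Python model of the two behaviours, run on the A and B data from the example above. The function names are hypothetical, and Python's re module stands in for grep's BRE, which is only an approximation (it makes no difference for the patterns 0 and 1).

```python
import re


def filter_by_pattern(patterns, lines):
    """Rough model of `grep -vf`: drop a line if any pattern matches anywhere in it."""
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(c.search(ln) for c in compiled)]


def filter_exact(patterns, lines):
    """Rough model of `grep -Fvxf`: drop a line only if it equals a pattern exactly."""
    unwanted = set(patterns)
    return [ln for ln in lines if ln not in unwanted]


A = ["b", "1", "a", "0", "01", "b", "1"]
B = ["0", "1"]
print(filter_exact(B, A))       # ['b', 'a', '01', 'b']
print(filter_by_pattern(B, A))  # ['b', 'a', 'b']
```

The exact filter keeps 01 and preserves both the order and the duplicate b, while the pattern filter also drops 01 because it contains 0.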

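This method is slower on pre-sorted files than other methods, since it is more general. If speed matters as well, see: Fast way of finding lines in one file that are not in another?

Here is a quick bash automation for doing the operation in place: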

remove-lines() (
  remove_lines="$1"
  all_lines="$2"
  tmp_file="$(mktemp)"
  grep -Fvxf "$remove_lines" "$all_lines" > "$tmp_file"
  mv "$tmp_file" "$all_lines"
)

GitHub upstream.

Usage:

remove-lines lines-to-remove remove-from-this-file
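The same write-to-a-temporary-file-then-rename approach might look like this in Python. This is a sketch under stated assumptions: the function name remove_lines_in_place and the UTF-8 encoding are not from the original answer, and like the mktemp version it does not preserve the permissions of the original file.

```python
import os
import tempfile


def remove_lines_in_place(remove_path, all_path):
    """Rewrite all_path without the lines that appear in remove_path."""
    with open(remove_path, encoding="utf-8") as f:
        unwanted = {line.rstrip("\n") for line in f}

    # Create the temp file next to the target so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(all_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp, \
                open(all_path, encoding="utf-8") as src:
            for line in src:
                if line.rstrip("\n") not in unwanted:
                    tmp.write(line)
        # Swap the filtered copy into place only after it is fully written.
        os.replace(tmp_path, all_path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file on failure
        raise
```

Writing to a separate file first means the original is never left half-written if the filtering step fails.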

See also: https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another

2015-08-28 09:37:52
