How do I remove the lines that appear in file B from another file A?

I have a large file A (consisting of email addresses), one address per line. I also have another file B that contains another set of addresses.

Which command would I use to remove from file A all the addresses that appear in file B?

So, if file A contains:

A
B
C

and file B contains:

B
D
E

then file A should be left with:

A
C

I know this is probably a frequently asked question, but the only command I found online gave me a "bad delimiter" error.

Any help would be greatly appreciated! Someone will surely come up with a clever one-liner, but I'm not a shell expert.

Current answer

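You can use diff. Lines that occur only in the first file are the ones prefixed with "< ", and the output must go to a new file, because redirecting straight to fileA truncates it before diff reads it:

diff fileA fileB | grep '^<' | cut -c3- > fileA.new && mv fileA.new fileA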

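This runs on unsorted files too, but diff matches lines in order, so a line that occurs in both files at different positions can still be reported as unique to fileA. Sorting both files first makes the result reliable.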
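A minimal sketch of the diff approach on the example data from the question, run in a scratch directory (the directory and the helper files unsortedA and unsortedB are illustrative only). The second half shows the ordering caveat: X and Y are in both files but in opposite orders, and diff still reports one of them as unique to the first file.

```sh
#!/bin/sh
# Sketch of the diff approach; file names other than fileA/fileB are illustrative.
set -eu
cd "$(mktemp -d)"              # work in a scratch directory

printf 'A\nB\nC\n' > fileA
printf 'B\nD\nE\n' > fileB

# diff prefixes lines that exist only in the FIRST file with "< ".
# Write to a new file: "> fileA" would truncate fileA before diff reads it.
diff fileA fileB | grep '^<' | cut -c3- > fileA.new
mv fileA.new fileA
cat fileA

# Caveat: diff aligns lines by position, so when shared lines are in a
# different order, one of them is still reported as unique to file 1.
printf 'X\nY\n' > unsortedA
printf 'Y\nX\n' > unsortedB
kept=$(diff unsortedA unsortedB | grep -c '^<' || true)
echo "lines kept although they are in unsortedB: $kept"
```

Which of X and Y survives depends on how diff aligns the two files, so only the count is meaningful there.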

2018-03-30 10:33:18

Other answers

Another way to do the same thing (also requires sorted input):

join -v 1 fileA fileB

In Bash, if the files are not sorted beforehand:

join -v 1 <(sort fileA) <(sort fileB)
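A POSIX sh sketch of the join -v 1 approach for shells without process substitution. It assumes each line is a single whitespace-free token such as an email address (join splits lines into fields on blanks), and it pins LC_ALL=C so that sort and join agree on the ordering; the intermediate *.sorted and result file names are illustrative.

```sh
#!/bin/sh
# Sketch: join -v 1 on sorted copies, without Bash process substitution.
# Assumes one whitespace-free token (e.g. an email address) per line.
set -eu
cd "$(mktemp -d)"

printf 'C\nA\nB\n' > fileA     # deliberately unsorted
printf 'E\nB\nD\n' > fileB

# join needs both inputs sorted with the same collation as join itself.
LC_ALL=C sort fileA > fileA.sorted
LC_ALL=C sort fileB > fileB.sorted

# -v 1: print only the lines of file 1 that have no match in file 2.
LC_ALL=C join -v 1 fileA.sorted fileB.sorted > result
cat result
```

The result comes out in sorted order, not in the original order of fileA.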

2010-12-06 16:37:44

If the files are already sorted (as in your example):

comm -23 file1 file2

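-23 suppresses the lines that appear in both files or only in file 2, leaving the lines unique to file 1. If the files are not sorted, pipe them through sort first…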
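A sketch of the comm route that also writes the result back to fileA, which is what the question asks for. The plain comm call shows the three columns that -2 and -3 suppress. LC_ALL=C and the temporary file names are assumptions, not part of the answer.

```sh
#!/bin/sh
# Sketch: comm -23 on sorted copies, then replace fileA with the result.
# The *.sorted and fileA.new names are illustrative.
set -eu
cd "$(mktemp -d)"

printf 'A\nB\nC\n' > fileA
printf 'B\nD\nE\n' > fileB

export LC_ALL=C                # same collation for sort and comm
sort fileA > a.sorted
sort fileB > b.sorted

# Without options comm prints three tab-indented columns:
# lines only in file 1, lines only in file 2, lines in both files.
comm a.sorted b.sorted

# -2 drops column 2 and -3 drops column 3, leaving only column 1.
comm -23 a.sorted b.sorted > fileA.new
mv fileA.new fileA
cat fileA
```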

2010-12-06 12:53:24

Awk to the rescue!

This solution does not require sorted input. You have to give fileB first.

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA

A
C

How does it work?

NR==FNR{a[$0];next} is an idiom for storing the first file in an associative array, as keys for a later "contains" test. NR==FNR checks whether we are scanning the first file, where the global line counter (NR) equals the current file's line counter (FNR).

a[$0] adds the current line to the associative array as a key. Note that this behaves like a set: there are no duplicate values (keys).

!($0 in a): we are now in the next file(s). in is a contains test; here it checks whether the current line is in the set populated from the first file in the first step, and ! negates the condition.

What is missing here is the action, which defaults to {print} and is usually not written explicitly.
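The one-liner can be spelled out with comments. This sketch is the same program with the array renamed from a to seen for readability (the rename and the scratch directory are illustrative).

```sh
#!/bin/sh
# Annotated version of the one-liner; "seen" replaces "a" for readability.
set -eu
cd "$(mktemp -d)"

printf 'A\nB\nC\n' > fileA
printf 'B\nD\nE\n' > fileB

awk '
  # First file (fileB): the overall line number NR equals the
  # per-file line number FNR.
  NR == FNR {
    seen[$0]    # referencing a key creates it, so seen acts as a set
    next        # do not apply the rule below to lines of fileB
  }
  # Later files (fileA): print lines that are not keys of seen.
  !($0 in seen) { print }
' fileB fileA > result

cat result
```

One pitfall of the NR==FNR test: if fileB is empty, NR equals FNR for every line of fileA as well, so nothing is printed.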

Note that this can also be used to remove blacklisted words:

$ awk '...' badwords allwords > goodwords

With a small change, it can clean several lists at once and write a cleaned version of each:

$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME ".clean")}' bad file1 file2 file3 ...
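A small run of the multi-list variant with made-up example addresses (the file contents are hypothetical; only the awk program comes from the answer). The output file name is parenthesized, which keeps the string concatenation unambiguous across awk implementations.

```sh
#!/bin/sh
# Hypothetical example lists; only the awk program comes from the answer.
set -eu
cd "$(mktemp -d)"

printf 'spam@example.com\n' > bad
printf 'alice@example.com\nspam@example.com\n' > file1
printf 'spam@example.com\nbob@example.com\n' > file2

# Store "bad" in a set, then write each surviving line of fileN to fileN.clean.
awk 'NR == FNR { a[$0]; next }
     !($0 in a) { print > (FILENAME ".clean") }' bad file1 file2

cat file1.clean file2.clean
```

A list in which every line is blacklisted gets no .clean file at all, because the print redirection that would create it never runs.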

2015-09-23 19:04:41

To get a copy of the file with the lines that appear in another file removed:

comm -23 <(sort bigFile.txt) <(sort smallfile.txt) > diff.txt

2021-05-11 01:42:44
