What are the main differences among them? And in which typical scenarios is it better to use each language?


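I wouldn't call sed a fully fledged programming language; it is a stream editor with language constructs aimed at editing text files programmatically.

Awk is more of a general-purpose language, but it is still best suited to text processing.

Perl and Python are fully fledged, general-purpose programming languages. Perl has its roots in text processing and has a number of awk-like constructs (there is even an awk-to-perl script floating around on the net). There are many differences between Perl and Python; your best bet is probably to read the summaries of both languages on something like Wikipedia to get a good grasp of what they are.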


In order of appearance, the languages are sed, awk, Perl, Python.

The sed program is a stream editor and is designed to apply the actions from a script to each line (or, more generally, to specified ranges of lines) of the input file or files. Its language is based on ed, the Unix editor, and although it has conditionals and so on, it is hard to work with for complex tasks. You can work minor miracles with it - but at a cost to the hair on your head. However, it is probably the fastest of the programs when attempting tasks within its remit. (It has the least powerful regular expressions of the programs discussed - adequate for many purposes, but certainly not PCRE - Perl-Compatible Regular Expressions)
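To make the "apply an action to each line, or to a range of lines" model concrete, here is a minimal Python sketch of a sed-style substitution restricted to a line range. The function name `sed_substitute` and its parameters are made up for illustration, and it uses Python's `re` syntax rather than sed's own regular expressions, so it is an analogy rather than a reimplementation of sed.

```python
import re

def sed_substitute(lines, pattern, replacement, first=1, last=None):
    """Apply a regex substitution to lines first..last (1-based, inclusive)."""
    regex = re.compile(pattern)
    result = []
    for number, line in enumerate(lines, start=1):
        in_range = number >= first and (last is None or number <= last)
        # Like sed's s/pattern/replacement/ without the g flag:
        # only the first match on each selected line is replaced.
        result.append(regex.sub(replacement, line, count=1) if in_range else line)
    return result

text = ["this cat", "this dog", "this bird"]
# Roughly what `sed '2,3s/this/that/'` would do to the same three lines.
edited = sed_substitute(text, r"this", "that", first=2, last=3)
print(edited)
```

Line 1 is outside the range and passes through unchanged, which mirrors how sed copies unselected lines to the output.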

The awk program (name from the initials of its authors - Aho, Weinberger, and Kernighan) is a tool initially for formatting reports. It can be used as a souped-up sed; in its more recent versions, it is computationally complete. It uses an interesting idea - the program is based on 'patterns matched' and 'actions taken when the pattern matches'. The patterns are fairly powerful (Extended Regular Expressions). The language for the actions is similar to C. One of the key features of awk is that it splits the input automatically into records and each record into fields.
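The pattern/action idea and the automatic splitting into records and fields can be sketched in Python as follows. This is only an illustration of the model: `run_rules`, the sample data, and the rule itself are hypothetical, records are assumed to be lines, and fields are assumed to be whitespace-separated (awk's defaults). Note that awk numbers fields from 1 (`$1`), while the Python list below is indexed from 0.

```python
import re

def run_rules(text, rules, field_sep=None):
    """Run each (pattern, action) rule against every record of text."""
    outputs = []
    for record in text.splitlines():        # one record per line
        fields = record.split(field_sep)    # None means split on runs of whitespace
        for pattern, action in rules:
            if re.search(pattern, record):  # the action fires only when the pattern matches
                outputs.append(action(record, fields))
    return outputs

report = "alice 30 london\nbob 25 paris\ncarol 41 london\n"
# Roughly the awk program: /london/ { print $1 " is " $2 }
rules = [(r"london", lambda record, fields: f"{fields[0]} is {fields[1]}")]
summary = run_rules(report, rules)
print(summary)
```

In awk the loop over records and the field splitting are implicit, which is why the equivalent awk program fits on one line.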

Perl was written in part as an awk-killer and sed-killer. Two of the programs provided with it are a2p and s2p for converting awk scripts and sed scripts into Perl. Perl is one of the earliest of the next generation of scripting languages (Tcl/Tk can probably claim primacy). It has powerful integrated regular expression handling with a vastly more powerful language. It provides access to almost all system calls and has the extensibility of the CPAN modules. (Neither awk nor sed is extensible.) One of Perl's mottos is "TMTOWTDI - There's more than one way to do it" (pronounced "tim-toady"). Perl has 'objects', but it is more of an add-on than a fundamental part of the language.

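Python was written last, and probably in part as a reaction to Perl. It has some interesting syntactic ideas (indentation to indicate block levels, with no braces or equivalents). It is more fundamentally object-oriented than Perl, and it is just as extensible as Perl.

OK, so when should you use each?

Sed - when you need to do simple text transformations on files.

Awk - when you only need simple formatting and summarization or transformation of data.

Perl - for almost any task, but especially when the task needs complex regular expressions.

Python - for the same tasks that you could do in Perl.

I'm not aware of anything that Perl can do that Python can't, or vice versa. The choice between the two depends on other factors. I learned Perl before there was a Python, so I tend to use it. Python has less accreted syntax and is generally somewhat easier to learn. Perl 6, when it becomes available, will be a fascinating development.

(Note that the 'overviews' of Perl and Python, in particular, are incomplete; whole books could be written on the topic.)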


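First, there are two unrelated kinds of thing in the list "Perl, Python, awk and sed".

Thing 1 - simple text manipulation tools.

sed. It has a fixed and relatively simple domain of work, defined by the idea of reading and examining each line of a file. sed was not designed to be particularly readable. It was designed to be very small and very efficient on very tiny Unix servers.

awk. It has a slightly less fixed and less simple domain of work. However, the main loop of an awk program is still defined by implicitly reading the lines of a source file.

Neither of these is a "complete" programming language. While you can (with some work) write fairly sophisticated programs in awk, they rapidly become complicated and difficult to read.

Thing 2 - general-purpose programming languages. These have a rich variety of statement types, numerous built-in data structures, and no wired-in assumptions or shortcuts to speak of.

Perl.

Python.

When to use them.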

sed. Never. It really doesn't have any value in the modern era of computers with more than 32K of memory. Perl or Python do the same things more clearly.

awk. Never. Like sed, it reflects an earlier era of computing. Rather than maintain this language (in addition to all the others required for a successful system), it's more pleasant to simply do everything in one pleasant language.

Perl. Any programming problem of any kind. If you like free-thinking syntax, where there are many, many ways to do the same thing, Perl is fun.

Python. Any programming problem of any kind. If you like fairly limited syntax, where there are fewer choices, less subtlety, and (perhaps) more clarity. Python's object-oriented nature makes it more suitable for large, complex problems.

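Background - I'm not bashing sed and awk out of ignorance. I learned awk over 20 years ago. I did a lot with it and used it to teach core Unix skills. I learned Perl about 15 years ago and did a lot of sophisticated things with it. I've left them both behind because I can do the same things in Python, and it's simpler and clearer.

There are two serious problems with sed and awk, neither of which is their age.

The incompleteness of their implementation. Everything sed and awk do can be done in Python or Perl, often more simply and sometimes faster. A shell pipeline has some performance advantages because its stages run as separate processes. Python offers a subprocess module that allows me to recover those advantages.

The need to learn yet another language. By using Python (or Perl), your implementation depends on fewer languages, with a resulting increase in clarity.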
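The remark about recovering a shell pipeline's multi-processing with Python's subprocess module can be sketched as follows. This is a minimal illustration, not the answerer's own code: the two stages are hypothetical one-line Python programs run through `sys.executable` so that the example does not depend on any particular external tool being installed. In practice the stages could be any commands.

```python
import subprocess
import sys

# Two hypothetical stages; each runs as its own OS process,
# like `producer | consumer` in a shell.
PRODUCER = "for i in range(5): print(i)"
CONSUMER = "import sys; print(sum(int(line) for line in sys.stdin))"

def run_pipeline():
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,   # connect the producer's output to the consumer's input
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.strip()

total = run_pipeline()
print(total)
```

Both child processes run concurrently, and the operating system handles the buffering between them, which is the same mechanism a shell `|` relies on.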


"When to use: awk - never" - S. Lott.

I think S. Lott slightly missed the mark with this recommendation. The fact is, on Linux and the other UNIX environments, awk is a useful tool to be used with bash, sh, and ksh for quick text processing. The idea of scripting itself is that you solve your problem by gluing together this tool and that tool. Hence in admin scripts, it is common to have ls, grep, |, awk, time, ps, etc. Each is a tool that the scripter combines, like a builder laying brick upon brick, to finish the building (to solve the problem at hand).

For instance, I am a member of the team managing paintball gear supplies dotcom. This e-commerce site is based on the LAMP stack. For automated processing and normalizing of data feeds from various suppliers into the back-end database, we employ and maintain a diversified mix of scripts, including bash, Perl, PHP, and even expect. Each has its strengths based on the available modules and APIs. In the bash scripts we do quick pattern matching and take the appropriate actions on the matches as needed using awk, without the need to switch to Perl. One thing I would also like to point out, which has not been emphasized in the thread, is that a fair number of these scripts were purchased or obtained from open source. If the script came as Perl, we maintain it as Perl; if the script came as PHP, we maintain it as PHP; if it came as bash, we maintain it as bash; we do not rewrite it in another language just because we think it is less efficient in the original language.


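After mastering a few dozen languages, you get tired of people like S. Lott (see his controversial answer to this question, which six years after it was posted had nearly half as many downvotes as upvotes (+45/-22)).

Sed is the best tool for extremely simple command-line pipelines. In the hands of a sed master, it is suitable for one-off code of arbitrary complexity, but it should not be used in production code except in very simple substitution pipelines. Stuff like 's/this/that/'.

Gawk (GNU awk) is by far the best choice for reformatting complex data when there is only a single input source and a single output (or multiple outputs written sequentially). Since a great deal of real-world work fits this description, and a good programmer can learn gawk in two hours, it is the best choice. On this planet, simpler and faster is better!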

Perl or Python are far better than any version of awk or sed when you have very complex input/output scenarios. The more complex the problem is, the better off you are using python, from a maintenance and readability standpoint. Note, however, that a good programmer can write readable code in any language, and a bad programmer can write unmaintainable crap in any useful language, so the choice of perl or python can safely be left to the preferences of the programmer if said programmer is skilled and clever.