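It seems rare to read of a Python "virtual machine", while in Java "virtual machine" is used all the time.
Both interpret bytecode; why call one a virtual machine and the other an interpreter?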
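One reason for the different terminology may be that people normally think of feeding the Python interpreter raw, human-readable source code, without worrying about bytecode and the like.
In Java, you have to explicitly compile to bytecode and then run only the bytecode, not the source code, on the VM.
Even though Python uses a virtual machine under the covers, from the user's perspective this detail can be ignored most of the time.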
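The hidden compile step can be made visible from within Python itself. The following is a minimal sketch, assuming CPython; the variable names are illustrative, and the exact opcode names reported by `dis` differ between CPython versions.

```python
import dis

source = "x = 1 + 2\nprint(x)"

# compile() turns source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# co_code is the raw bytecode, stored as a bytes object.
raw_bytecode = code_obj.co_code

# dis decodes the bytecode into named instructions
# (the names are version-specific, so none are assumed here).
opnames = [ins.opname for ins in dis.get_instructions(code_obj)]

# The code object, not the source text, is what actually gets executed.
namespace = {}
exec(code_obj, namespace)
```

Running a script with `python script.py` performs the same two phases; the user simply never has to ask for the first one.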
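The term interpreter is a legacy term dating back to earlier shell scripting languages. As "scripting languages" have evolved into full-featured languages and their corresponding platforms have become more sophisticated and sandboxed, the distinction between a virtual machine and an interpreter (in the Python sense) is very small or non-existent.
The Python interpreter still functions in the same way as a shell script, in the sense that it can be executed without a separate compile step. Beyond that, the differences between Python's interpreter (or Perl's or Ruby's) and Java's virtual machine are mostly implementation details. (One could argue that Java is more fully sandboxed than Python, but both ultimately provide access to the underlying architecture via a native C interface.)
Don't forget that Python has JIT compilers available for x86, which further confuses the issue (see Psyco).
A stricter interpretation of "interpreted language" only becomes useful when discussing performance issues of the VM; for example, compared to Python, Ruby was considered slower because it is an interpreted language, unlike Python. In other words, context is everything.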
A virtual machine is a virtual computing environment with a specific set of atomic well defined instructions that are supported independent of any specific language and it is generally thought of as a sandbox unto itself. The VM is analogous to an instruction set of a specific CPU and tends to work at a more fundamental level with very basic building blocks of such instructions (or byte codes) that are independent of the next. An instruction executes deterministically based only on the current state of the virtual machine and does not depend on information elsewhere in the instruction stream at that point in time.
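To make the idea of context-free instructions concrete, here is a minimal sketch of a stack-based virtual machine. The opcode names (`PUSH`, `ADD`, `MUL`) and the `run` function are hypothetical, invented for this illustration; they do not correspond to the JVM or CPython instruction sets.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    for opcode, arg in program:
        # Each instruction acts only on the current machine state (the stack);
        # it never needs to look at neighbouring instructions.
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()


# Encodes (2 + 3) * 4 as a flat instruction stream.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
]
result = run(program)
```

Any front end that can emit this instruction list could target the machine, which is the sense in which such a VM is independent of a specific source language.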
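An interpreter, on the other hand, is more sophisticated in that it is tailored to parse a stream of some syntax that belongs to a specific language and a specific grammar, which must be decoded in the context of the surrounding tokens. You can't look at each byte or even each line in isolation and know exactly what to do next. The tokens in the language can't be taken in isolation the way the instructions (bytecodes) of a VM can.
A Java compiler converts the Java language into a bytecode stream, no differently from how a C compiler converts C programs into assembly code. An interpreter, on the other hand, doesn't really convert the program into any well-defined intermediate form; it just performs the program's actions as part of the process of interpreting the source.
Another test of the difference between a VM and an interpreter is whether you think of it as being language independent. What we know as the Java VM is not really Java specific. You could make a compiler for another language that produces bytecode that can run on the JVM. On the other hand, I don't think we would really think of "compiling" some language other than Python into Python for interpretation by the Python interpreter.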
Because of the sophistication of the interpretation process, this can be relatively slow: specifically, parsing and identifying the language tokens and understanding the context of the source well enough to carry out execution within the interpreter. To help accelerate such interpreted languages, we can define intermediate forms of pre-parsed, pre-tokenized source code that are more readily interpreted directly. This sort of binary form is still interpreted at execution time; it just starts from a much less human-readable form to improve performance. However, the logic executing that form is not a virtual machine, because those codes still can't be taken in isolation: the context of the surrounding tokens still matters, they are just now in a different, more computer-efficient form.
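An interpreter translates source code into some efficient intermediate representation (code) and immediately executes it.
A virtual machine explicitly executes stored, pre-compiled code built by a compiler that is part of the interpreter system.
A very important characteristic of a virtual machine is that the software running inside it is limited to the resources the virtual machine provides. To be precise, it cannot break out of its virtual world. Think of the secure execution of remote code, such as Java applets.
In the case of Python, if we keep the pyc files, as mentioned in the comments on this post, then the mechanism becomes more like a VM, and this bytecode executes faster; it is still interpreted, but from a much more computer-friendly form. If we look at it as a whole, the PVM is the last step of the Python interpreter.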
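As a rough illustration of keeping a pyc file, the standard library's `py_compile` module can produce one explicitly. This is a sketch assuming CPython; the file names `hello.py` and `hello.pyc` are arbitrary choices for the example.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w") as f:
        f.write("print('hello')\n")

    # Compile the source file to a stored bytecode file, Java-style.
    pyc_path = py_compile.compile(src, cfile=os.path.join(tmp, "hello.pyc"))
    pyc_exists = os.path.exists(pyc_path)

    # A pyc file starts with a magic number identifying the bytecode version.
    with open(pyc_path, "rb") as f:
        header = f.read(4)

magic_matches = header == importlib.util.MAGIC_NUMBER
```

The resulting file can be run with `python hello.pyc`, at which point only the bytecode-executing part of the interpreter is involved.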
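The bottom line is that when we refer to the Python interpreter, we are referring to it as a whole, and when we say PVM, we are talking about just one part of the Python interpreter, a runtime environment. It is similar to Java, where we refer to the different parts: JRE, JVM, JDK, etc.
For more, see the Wikipedia entries on interpreters and virtual machines, as well as the comparison of application virtual machines. They help in understanding the difference between compilers, interpreters and VMs.
In this post, "virtual machine" refers to process virtual machines, not to system virtual machines like Qemu or Virtualbox. A process virtual machine is simply a program that provides a general programming environment: a program that can be programmed.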
Java has an interpreter as well as a virtual machine, and Python has a virtual machine as well as an interpreter. The reason "virtual machine" is a more common term in Java and "interpreter" is a more common term in Python has a lot to do with the major difference between the two languages: static typing (Java) vs dynamic typing (Python). In this context, "type" refers to primitive data types -- types which suggest the in-memory storage size of the data. The Java virtual machine has it easy. It requires the programmer to specify the primitive data type of each variable. This provides sufficient information for Java bytecode not only to be interpreted and executed by the Java virtual machine, but even to be compiled into machine instructions. The Python virtual machine is more complex in the sense that it takes on the additional task of pausing before the execution of each operation to determine the primitive data types for each variable or data structure involved in the operation. Python frees the programmer from thinking in terms of primitive data types, and allows operations to be expressed at a higher level. The price of this freedom is performance. "Interpreter" is the preferred term for Python because it has to pause to inspect data types, and also because the comparatively concise syntax of dynamically-typed languages is a good fit for interactive interfaces. There's no technical barrier to building an interactive Java interface, but trying to write any statically-typed code interactively would be tedious, so it just isn't done that way.
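The run-time type inspection described above can be observed directly. In this small sketch (the `add` function is illustrative only), a single compiled function body serves integers, strings and lists, because the meaning of `+` is resolved from the operand types at each call rather than being fixed in the bytecode.

```python
def add(a, b):
    # The bytecode for this body carries no type information;
    # the operand types are inspected each time the addition runs.
    return a + b


bytecode = add.__code__.co_code

results = [
    add(1, 2),          # integer addition
    add("py", "thon"),  # string concatenation
    add([1], [2]),      # list concatenation
]
```

A Java method `int add(int a, int b)` would instead compile to an integer-specific add instruction, which is the contrast the paragraph draws.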
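In the Java world, the virtual machine steals the show because it runs programs written in a language that can actually be compiled into machine instructions, and the result is speed and resource efficiency. Java bytecode can be executed by the Java virtual machine with performance approaching that of compiled programs, relatively speaking. This is due to the presence of primitive data type information in the bytecode. The Java virtual machine puts Java in a category of its own:
portable interpreted statically-typed language
The next closest thing is LLVM, but LLVM operates at a different level:
portable interpreted assembly language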
The term "bytecode" is used in both Java and Python, but not all bytecode is created equal. Bytecode is just the generic term for intermediate languages used by compilers/interpreters. Even C compilers like gcc use an intermediate language (or several) to get the job done. Java bytecode contains information about primitive data types, whereas Python bytecode does not. In this respect, the Python (and Bash, Perl, Ruby, etc.) virtual machine truly is fundamentally slower than the Java virtual machine, or rather, it simply has more work to do. It is useful to consider what information is contained in different bytecode formats:
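LLVM: CPU registers
Java: primitive data types
Python: user-defined types
To draw a real-world analogy: LLVM works with atoms, the Java virtual machine works with molecules, and the Python virtual machine works with materials. Since everything must eventually decompose into subatomic particles (real machine operations), the Python virtual machine has the most complex task.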
Interpreters/compilers of statically-typed languages just don't have the same baggage that interpreters/compilers of dynamically-typed languages have. Programmers of statically-typed languages have to take up the slack, for which the payoff is performance. However, just as all nondeterministic functions are secretly deterministic, so are all dynamically-typed languages secretly statically-typed. Performance differences between the two language families should therefore level out around the time Python changes its name to HAL 9000.
The virtual machines of dynamic languages like Python implement some idealized logical machine, and don't necessarily correspond very closely to any real physical hardware. The Java virtual machine, in contrast, is more similar in functionality to a classical C compiler, except that instead of emitting machine instructions, it executes built-in routines. In Python, an integer is a Python object with a bunch of attributes and methods attached to it. In Java, an int is a designated number of bits, usually 32. It's not really a fair comparison. Python integers should really be compared to the Java Integer class. Java's "int" primitive data type can't be compared to anything in the Python language, because the Python language simply lacks this layer of primitives, and so does Python bytecode.
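A quick way to see that a Python integer is a full object rather than a fixed-width primitive is to inspect one. This sketch assumes CPython; the exact byte size reported by `sys.getsizeof` is implementation-specific, so no particular value is claimed.

```python
import sys

n = 42

# An int is an instance of a class, with methods attached.
is_object = isinstance(n, object)
type_name = type(n).__name__
bits = n.bit_length()  # number of bits needed to represent 42

# Memory footprint of the object, including its header;
# this is larger than the 4 bytes of a Java int.
size_in_bytes = sys.getsizeof(n)
```

The closest Java analogue to this object is `java.lang.Integer`, not the `int` primitive.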
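Because Java variables are explicitly typed, one can reasonably expect something like Jython's performance to be in the same ballpark as CPython's. On the other hand, a Java virtual machine implemented in Python is almost guaranteed to be slower than mud. And don't expect Ruby, Perl, etc., to fare any better. They weren't designed to do that. They were designed for "scripting", which is what programming in a dynamic language is called.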
Every operation that takes place in a virtual machine eventually has to hit real hardware. Virtual machines contain pre-compiled routines which are general enough to execute any combination of logical operations. A virtual machine may not be emitting new machine instructions, but it certainly is executing its own routines over and over in arbitrarily complex sequences. The Java virtual machine, the Python virtual machine, and all the other general-purpose virtual machines out there are equal in the sense that they can be coaxed into performing any logic you can dream up, but they are different in terms of what tasks they take on, and what tasks they leave to the programmer.
Psyco for Python is not a full Python virtual machine, but a just-in-time compiler that hijacks the regular Python virtual machine at points where it thinks it can compile a few lines of code, mainly loops where it thinks the primitive type of some variable will remain constant even if the value is changing with each iteration. In that case, it can forgo some of the incessant type determination of the regular virtual machine. You have to be a little careful, though, lest you pull the type out from under Psyco's feet. Psyco, however, usually knows to just fall back to the regular virtual machine if it isn't completely confident the type won't change.
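The moral of the story is that primitive data type information is really helpful to a compiler/virtual machine.
Finally, to put it all in perspective, consider this: a Python program executed by a Python interpreter/virtual machine implemented in Java, running on a Java interpreter/virtual machine implemented in LLVM, running in a qemu virtual machine, running on an iPhone.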
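First of all, you should understand that programming, or computer science in general, is not mathematics, and we don't have rigorous definitions for most of the terms we often use.
Now to your question:
What is an interpreter (in computer science)
It translates source code one smallest executable unit at a time and then executes that unit.
What is a virtual machine
In the case of the JVM, the virtual machine is software that contains an interpreter, class loaders, a garbage collector, a thread scheduler, a JIT compiler and many other things.
As you can see, the interpreter is a part of the JVM, and the whole JVM cannot be called an interpreter because it contains many other components.
Why use the word "interpreter" when talking about Python
With Java, the compilation part is explicit. Python's compilation and interpretation process, on the other hand, is not as explicit as Java's; from the end user's perspective, interpretation is the only mechanism used to execute Python programs.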
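To give a deep answer to the question "Why Java Virtual Machine, but Python interpreter?", let us go back to the field of compilation theory as a starting point for the discussion.
The typical process of program compilation includes the following steps: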
1. Lexical analysis. Splits program text into meaningful "words" called tokens (as part of the process all comments, spaces, new-lines etc. are removed, because they do not affect program behavior). The result is an ordered stream of tokens.
2. Syntax analysis. Builds the so-called Abstract Syntax Tree (AST) from the stream of tokens. The AST establishes relations between tokens and, as a consequence, defines the order of evaluation of the program.
3. Semantic analysis. Verifies the semantic correctness of the AST using information about types and a set of semantic rules of the programming language. (For example, a = b + c is a correct statement from the syntactic point of view, but completely incorrect from the semantic point of view if a was declared as a constant object.)
4. Intermediate code generation. Serializes the AST into a linearly ordered stream of machine-independent "primitive" operations. In fact, the code generator traverses the AST and logs the order of evaluation steps. As a result, from the tree-like representation of the program, we obtain a much simpler list-like representation in which the order of program evaluation is preserved.
5. Machine code generation. The program in the form of machine-independent "primitive" bytecode is translated into the machine code of a particular processor architecture.
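Several of the steps above can be reproduced with Python's own standard library. The following is a sketch assuming CPython; it covers lexical analysis, syntax analysis and intermediate code generation only, since CPython's standard tooling does not expose a separate semantic analysis pass or a machine code generator in this form.

```python
import ast
import dis
import io
import tokenize

source = "a = b + c\n"

# Step 1, lexical analysis: split the text into tokens
# (whitespace-only tokens such as the trailing newline are filtered out).
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]

# Step 2, syntax analysis: build the abstract syntax tree.
tree = ast.parse(source)
statement = tree.body[0]

# Step 4, intermediate code generation: flatten the tree into bytecode.
code_obj = compile(tree, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code_obj)]

# Executing the bytecode with b and c supplied by the caller.
namespace = {"b": 1, "c": 2}
exec(code_obj, namespace)
```

Printing `ast.dump(tree)` or calling `dis.dis(code_obj)` shows the tree-like and list-like representations side by side.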
OK. Now let's define the terms.
An interpreter, in the classical meaning of the word, executes a program by evaluating the AST produced directly from the program text. In that case, a program is distributed in the form of source code and the interpreter is fed program text, frequently in a dynamic way (statement-by-statement or line-by-line). For each input statement, the interpreter builds its AST and immediately evaluates it, changing the "state" of the program. This is typical behavior for scripting languages. Consider for example Bash, Windows CMD etc. Conceptually, Python takes this route too.
If we replace the AST-based execution step in the interpreter with the generation of intermediate, machine-independent binary bytecode, we split the entire process of program execution into two separate phases: compilation and execution. In that case what previously was an interpreter becomes a bytecode compiler, which transforms the program from text into some binary form. The program is then distributed in that binary form rather than as source code. On the user's machine, that bytecode is fed into a new entity, the virtual machine, which in fact interprets that bytecode. Because of this, virtual machines are also called bytecode interpreters. But pay attention here! A classical interpreter is a text interpreter, but a virtual machine is a binary interpreter! This is the approach taken by Java and C#.
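Finally, if we add machine code generation to the bytecode compiler, we arrive at what we call a classical compiler. A classical compiler converts program source code into the machine code of a particular processor. That machine code can then be executed directly on the target processor without any additional mediation (without any kind of interpreter, neither a text interpreter nor a binary interpreter).
Now let's get back to the original question and consider Java versus Python.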
Java was initially designed to have as few implementation dependencies as possible. Its design is based on the principle "write once, run anywhere" (WORA). To implement it, Java was initially designed as a programming language that compiles into machine-independent binary bytecode, which can then be executed on all platforms that support Java without recompilation. You can think of Java as a WORA-based C++. Actually, Java is closer to C++ than to scripting languages like Python. But in contrast to C++, Java was designed to be compiled into binary bytecode which is then executed in the environment of the virtual machine, while C++ was designed to be compiled into machine code and then executed directly by the target processor.
Python was initially designed as a kind of scripting programming language that interprets scripts (programs in the form of text written in accordance with the programming language rules). Because of this, Python has from the start supported dynamic interpretation of one-line commands or statements, as Bash or Windows CMD do. For the same reason, initial implementations of Python did not contain any kind of bytecode compiler or virtual machine for executing such bytecode, but from the start Python required an interpreter capable of understanding and evaluating Python program text.
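Thus, historically, Java developers have tended to talk about the Java Virtual Machine (because Java initially shipped as a package of a Java bytecode compiler and a bytecode interpreter, the JVM), while Python developers have tended to talk about the Python interpreter (because initially Python had no virtual machine and was a kind of classical text interpreter that executed program text directly, without any compilation or transformation into any form of binary code).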
Currently, Python also has a virtual machine under the hood and can compile and interpret Python bytecode. That fact adds to the confusion around "Why Java Virtual Machine, but Python interpreter?", because it seems that implementations of both languages contain virtual machines. But! Even now, interpretation of program text is the primary way Python programs are executed. Python implementations use virtual machines under the hood exclusively as an optimization technique. Interpretation of binary bytecode in the virtual machine is much more efficient than direct interpretation of the original program text. At the same time, the presence of the virtual machine in Python is completely transparent to both Python language designers and Python program developers. The same language can be implemented in interpreters with and without a virtual machine. In the same way, the same programs can be executed in interpreters with and without a virtual machine, and those programs will demonstrate exactly the same behavior and produce the same output from the same input. The only observable difference will be the speed of program execution and the amount of memory consumed by the interpreter. Thus, the virtual machine in Python is not an unavoidable part of the language design, but just an optional extension of the main Python interpreter.
Java can be considered in a similar way. Java has a JIT compiler under the hood and can selectively compile methods of a Java class into machine code for the target platform and then execute it directly. But! Java still uses bytecode interpretation as the primary way of executing Java programs. Like Python implementations, which use virtual machines under the hood exclusively as an optimization technique, Java virtual machines use Just-In-Time compilers exclusively for optimization purposes, simply because direct execution of machine code is at least ten times faster than interpretation of Java bytecode. And as in the case of Python, the presence of a JIT compiler under the hood of the JVM is completely transparent to both Java language designers and Java program developers. The same Java programming language can be implemented by a JVM with or without a JIT compiler. In the same way, the same programs can be executed in JVMs with and without a JIT inside, and they will demonstrate exactly the same behavior and produce the same output from the same input on both JVMs. As in the case of Python, the only observable difference between them will be the speed of execution and the amount of memory consumed by the JVM. And finally, as in the case of Python, the JIT in Java is not an unavoidable part of the language design, but just an optional extension of the major JVM implementations.
From the point of view of design and implementation, the virtual machines of Java and Python differ significantly, while (attention!) both remain virtual machines. The JVM is an example of a low-level virtual machine with simple basic operations and high instruction dispatch cost. Python's, in turn, is a high-level virtual machine, whose instructions exhibit complex behavior and for which instruction dispatch cost is not so significant. Java operates at a very low abstraction level. The JVM operates on a small, well-defined set of primitive types and has a very tight correspondence (typically one to one) between bytecode instructions and native machine code instructions. In contrast, the Python virtual machine operates at a high abstraction level: it operates on complex data types (objects) and supports ad-hoc polymorphism, while its bytecode instructions exhibit complex behavior that can correspond to a series of multiple native machine code instructions. For example, Python supports unbounded-range mathematics. Thus the Python VM is forced to use long arithmetic for potentially big integers, for which the result of an operation can overflow the machine word. Hence, one bytecode instruction for arithmetic in Python can expand into a function call inside the Python VM, while in the JVM an arithmetic operation expands into a simple operation expressed by one or a few native machine instructions.
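The unbounded-range arithmetic mentioned above is easy to demonstrate. In this short sketch, the same multiplication syntax keeps working after the values no longer fit in a 64-bit machine word, which a Java `long` could not do without switching to `BigInteger`.

```python
# Largest value of a signed 64-bit integer, for comparison.
max_int64 = 2 ** 63 - 1

big = 2 ** 64        # already exceeds a 64-bit machine word
product = big * big  # 2 ** 128, computed with arbitrary-precision arithmetic

exceeds_machine_word = big > max_int64
bits_needed = product.bit_length()
```

The multiplication is a single bytecode instruction, but inside the VM it dispatches to a multi-word arithmetic routine.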
Thus, we can draw the following conclusions. It is "Java Virtual Machine" but "Python interpreter" because:
1. The term virtual machine assumes binary bytecode interpretation, while the term interpreter assumes program text interpretation.
2. Historically, Java was designed and implemented for binary bytecode interpretation and Python was initially designed and implemented for program text interpretation. Thus, the term "Java Virtual Machine" is historical and well established in the Java community. Similarly, the term "Python Interpreter" is historical and well established in the Python community. People tend to carry on the tradition and use the same terms that were used long before.
3. Finally, for Java, binary bytecode interpretation is currently the primary way of executing programs, while JIT compilation is just an optional and transparent optimization. For Python, program text interpretation is currently the primary way of executing programs, while compilation into Python VM bytecode is just an optional and transparent optimization.
Therefore, both Java and Python have virtual machines that are binary bytecode interpreters, which can lead to confusion such as "Why Java Virtual Machine, but Python interpreter?". The key point here is that for Python, a virtual machine is not a primary or necessary means of program execution; it is just an optional extension of the classical text interpreter. On the other hand, a virtual machine is a core and unavoidable part of the Java program execution ecosystem. The choice of static or dynamic typing in programming language design mainly affects the abstraction level of the virtual machine, but does not dictate whether or not a virtual machine is needed. Languages using either typing system can be designed to be compiled, interpreted, or executed within the environment of a virtual machine, depending on their desired execution model.
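Python can interpret code without compiling it to bytecode. Java can't.
Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.
(from the documentation).
In Java, every single file has to be compiled to a .class file, which then runs on the JVM. Python, by contrast, compiles only the files imported by your main script, to help speed up subsequent uses of those files.
However, in the typical case, most Python (at least, CPython) code runs in an emulated stack machine, which has nearly identical instructions to those of the JVM, so there's no great difference.
The real reason for the distinction, however, is that from the beginning Java branded itself as "portable, executable bytecode" and Python branded itself as a dynamic, interpreted language with a REPL. Names stick!
I think the lines between the two are blurred; people mostly argue about the meaning of the word "interpreter" and how close a language stands to each side of the "interpreter...compiler" spectrum. None is 100%, however. I think it would be easy to write a Java or Python implementation that sits at any point on the spectrum.
Currently both Java and Python have virtual machines and bytecode, though one operates on concrete value sizes (like a 32-bit integer) while the other has to determine the size on each call, which in my opinion doesn't define the border between the terms.
The argument that Python doesn't have officially defined bytecode and that it exists only in memory doesn't convince me either, just because I am planning to develop devices that will recognize only Python bytecode, with the compilation part done in the browser's JS machine.
Performance is only about the concrete implementation. We don't need to know the size of an object to be able to work with it, and in the end, in most cases, we work with structures, not basic types. It is possible to optimize the Python VM by reusing existing objects, eliminating the need to create a new object each time during expression calculation. Once that is done, there is no global performance difference in calculating the sum of two integers, which is where Java shines.
There is no killer difference between the two, only some implementation nuances and a lack of optimization that are irrelevant to the end user, perhaps up to the point where she starts to notice performance lags, but again that is an implementation issue and not an architecture issue.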
For posts that mention that Python does not need to generate bytecode, I'm not sure that's true. It seems that all callables defined in Python must have a .__code__.co_code attribute which contains the bytecode. I don't see a meaningful reason to call Python "not compiled" just because the compiled artifacts may not be saved; and often they aren't saved by design. For example, every comprehension compiles new bytecode for its input, which is the reason comprehension variable scope is not consistent between compile(mode='exec', ...) and compile(mode='single', ...), such as between running a Python script and using pdb.
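The `.__code__.co_code` attribute mentioned above can be checked directly. A minimal sketch, assuming CPython and using illustrative names (`greet`, `square`): functions, lambdas and dynamically compiled expressions all carry bytecode, whether or not anything is ever written to disk.

```python
def greet(name):
    return "hello " + name


square = lambda x: x * x

# Both callables hold compiled bytecode as a bytes object.
function_bytecode = greet.__code__.co_code
lambda_bytecode = square.__code__.co_code

# Source text handed over at run time is compiled before it is evaluated.
expression = compile("1 + 1", "<string>", "eval")
value = eval(expression)
```

Note that callables implemented in C, such as `len`, have no `__code__` attribute, so the observation applies to callables written in Python.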
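There is probably a reason why the HotSpot runtime is called a virtual machine while CPython is merely referred to as an interpreter.
First, CPython is just your ordinary, run-of-the-mill, stack-based bytecode interpreter. You feed it Python opcodes, and the software stack machine inside CPython evaluates your code, just like a normal interpreter.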
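The stack-based nature of CPython shows up in the code objects it produces. In this sketch (assuming CPython; the function `f` is illustrative), `co_stacksize` records how many evaluation-stack slots the function needs, and `dis` lists the instructions that push and pop those slots. Opcode names vary between CPython versions, so none are assumed.

```python
import dis


def f(a, b, c):
    return a + b * c


# Maximum depth of the evaluation stack this function requires;
# a, b and c must all be on the stack before the multiplication runs.
stack_depth = f.__code__.co_stacksize

# The decoded instruction stream that the stack machine executes.
instructions = [ins.opname for ins in dis.get_instructions(f)]

result = f(1, 2, 3)
```

Calling `dis.dis(f)` prints the same instruction stream in a readable table.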
The Java HotSpot runtime is different. First and foremost, Java has 3 Just-in-Time compilers: C1, C2, and an experimental one that isn't in use yet. But that's not the main reason. The interpreter inside the JVM is a very special kind of interpreter called a Template Interpreter. Instead of just executing bytecode directly in a massive opcode switch-case statement like CPython (and really almost every other interpreter) does, the Template Interpreter inside the JVM contains an enormous arraylist. What does it contain? Key-value pairs of bytecodes and native CPU instructions! The arraylist is empty on startup and is filled, just before your application starts up, with mappings of bytecodes pointing to native machine language to be run directly on the hardware. What this means is that the "interpreter" inside the JVM isn't actually an interpreter at all. It's actually a discount compiler! When Java bytecode is run, the "interpreter" simply maps the input bytecode directly to native machine language and executes the native mapping directly, rather than implementing it in software. I'm not exactly sure why the JVM was made this way, but I suspect it was to execute "interpreted" code together with JIT-compiled code seamlessly, and for speed/performance. If you pitted the JVM without JIT against CPython or most other interpreters it would still probably come out ahead of them, by virtue of its ingenious design, which to my knowledge no other language has used before.