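I want to call a C library from a Python application. I don't want to wrap the whole API, only the functions and datatypes that are relevant to my case. As I see it, I have three choices:
1. Create an actual extension module in C. This is probably overkill, and I'd also like to avoid the overhead of learning how to write extensions.
2. Use Cython to expose the relevant parts of the C library to Python.
3. Do the whole thing in Python, using ctypes to communicate with the external library.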
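I'm not sure whether 2) or 3) is the better choice. The advantage of 3) is that ctypes is part of the standard library and the resulting code would be pure Python, although I'm not sure how big that advantage actually is.
Are there more advantages or disadvantages to either choice? Which approach do you recommend?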
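Edit: Thanks for all your answers; they provide a good resource for anyone looking to do something similar. The decision, of course, still has to be made for each individual case, and there is no single "this is the right thing" sort of answer. For my own case, I'll probably go with ctypes, but I'm also looking forward to trying out Cython in some other project.
Since there is no single correct answer, accepting one is somewhat arbitrary; I chose FogleBird's answer because it provides some good insight into ctypes and is currently also the highest-voted answer. However, I suggest reading all the answers to get a good overview.
Thanks again.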
Warning: what follows is a Cython core developer's opinion.
I almost always recommend Cython over ctypes. The reason is that it has a much smoother upgrade path. If you use ctypes, many things will be simple at first, and it's certainly cool to write your FFI code in plain Python, without compilation, build dependencies and all that. However, at some point, you will almost certainly find that you have to call into your C library a lot, either in a loop or in a longer series of interdependent calls, and you would like to speed that up. That's the point where you'll notice that you can't do that with ctypes. Or, when you need callback functions and you find that your Python callback code becomes a bottleneck, you'd like to speed it up and/or move it down into C as well. Again, you cannot do that with ctypes. So you have to switch languages at that point and start rewriting parts of your code, potentially reverse engineering your Python/ctypes code into plain C, thus spoiling the whole benefit of writing your code in plain Python in the first place.
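To make the callback point concrete, here is a minimal sketch of a Python callback handed to a C function through ctypes. It assumes a POSIX system where the C standard library can be loaded by name, and it borrows libc's `qsort` purely as a stand-in for "some C library that takes a callback"; the names `py_cmp` and `CMPFUNC` are illustrative only. Every comparison `qsort` makes goes back through the Python interpreter, which is the overhead described above.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparator: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)

def py_cmp(a, b):
    # Runs in the interpreter once per comparison made by qsort.
    return (a[0] > b[0]) - (a[0] < b[0])

libc.qsort.argtypes = [
    ctypes.POINTER(ctypes.c_int),  # base of the array
    ctypes.c_size_t,               # number of elements
    ctypes.c_size_t,               # size of one element
    CMPFUNC,                       # comparator callback
]
libc.qsort.restype = None

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
callback = CMPFUNC(py_cmp)  # keep a reference while C may still call it
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), callback)

sorted_values = list(values)
print(sorted_values)
```

This is fine for a handful of calls; the cost only starts to matter when the callback fires very often.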
With Cython, OTOH, you're completely free to make the wrapping and calling code as thin or thick as you want. You can start with simple calls into your C code from regular Python code, and Cython will translate them into native C calls, without any additional calling overhead, and with an extremely low conversion overhead for Python parameters. When you notice that you need even more performance at some point where you are making too many expensive calls into your C library, you can start annotating your surrounding Python code with static types and let Cython optimise it straight down into C for you. Or, you can start rewriting parts of your C code in Cython in order to avoid calls and to specialise and tighten your loops algorithmically. And if you need a fast callback, just write a function with the appropriate signature and pass it into the C callback registry directly. Again, no overhead, and it gives you plain C calling performance. And in the much less likely case that you really cannot get your code fast enough in Cython, you can still consider rewriting the truly critical parts of it in C (or C++ or Fortran) and call it from your Cython code naturally and natively. But then, this really becomes the last resort instead of the only option.
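So, ctypes is nice for doing simple things and for getting something running quickly. However, as soon as things start to grow, you'll most likely reach the point where you notice that you'd have been better off using Cython right from the start.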
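For the "simple things" end of the spectrum, a ctypes call really can be just a few lines. The following is a small sketch, assuming a POSIX system and using libc's `strlen` as a placeholder for whatever function your own library exports:

```python
import ctypes
import ctypes.util

# Assumption: POSIX. find_library("c") locates the C standard library;
# if it returns None, CDLL(None) exposes the process's own symbols instead.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# C expects bytes, so the Python string is encoded explicitly.
length = libc.strlen("hello".encode("utf-8"))
print(length)
```

Declaring `argtypes` and `restype` is optional for a function like this, but doing so lets ctypes reject wrongly typed arguments instead of passing them through to C.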
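I know this is an old question, but it comes up on Google when you search for things like "ctypes vs cython", and most of the answers here were written by people who are already proficient in Cython or C. That may not reflect the actual time you would need to invest in learning them to implement your solution. I am a complete beginner in both. I had never touched Cython before and have very little experience with C/C++.
For the last two days, I was looking for a way to delegate a performance-heavy part of my code to something lower level than Python. I implemented my code in both ctypes and Cython; it consisted basically of two simple functions.
I had a huge list of strings to process. Notice "list" and "string".
Neither type maps perfectly onto a C type, because Python strings are Unicode by default and C strings are not. And Python lists are simply NOT C arrays.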
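To show what that mismatch means in practice, here is a minimal sketch of turning a Python list of Unicode strings into something a C function taking `char **` could accept, and of decoding the result back. The sample strings and variable names are made up for illustration, and UTF-8 is an assumption about what the C side expects.

```python
import ctypes

strings = ["héllo", "wörld", "ctypes"]

# Step 1: Python str is Unicode, C wants bytes, so encode each item.
encoded = [s.encode("utf-8") for s in strings]

# Step 2: a Python list is not a C array, so build a real char*[3].
StringArray = ctypes.c_char_p * len(encoded)
c_strings = StringArray(*encoded)

# c_strings can now be passed to a C function declared with
# argtypes = [ctypes.POINTER(ctypes.c_char_p), ctypes.c_size_t].

# Going the other way: items come back as bytes and must be decoded.
decoded = [c_strings[i].decode("utf-8") for i in range(len(c_strings))]
print(decoded)
```

Both directions need an explicit encoding; mixing encodings between the two steps is one way to end up with decode errors later.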
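Here is my take: use Cython. It integrates more smoothly with Python and is generally easier to work with. When something is wrong, ctypes just throws a segfault at you, whereas Cython at least gives you compile warnings with a stack trace whenever possible, and you can easily return a valid Python object with Cython.
Here is a detailed account of how much time I needed to invest in each to implement the same function. By the way, I have done very little C/C++ programming: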
Ctypes:
About 2h on researching how to transform my list of unicode strings to a c compatible type.
About an hour on how to return a string properly from a c function. Here I actually posted my own solution on SO once I had written the functions.
About half an hour to write the code in c, compile it to a dynamic library.
10 minutes to write a test code in python to check if c code works.
About an hour of doing some tests and rearranging the c code.
Then I plugged the c code into the actual code base, and saw that ctypes does not play well with the multiprocessing module, as its handle is not picklable by default.
About 20 minutes to rearrange my code so that it did not use the multiprocessing module, and to retry.
Then second function in my c code generated segfaults in my code base although it passed my testing code. Well, this is probably my fault for not checking well with edge cases, I was looking for a quick solution.
For about 40 minutes I tried to determine possible causes of these segfaults.
I split my functions into two libraries and tried again. Still had segfaults for my second function.
I decided to let go of the second function and use only the first function of the c code. At the second or third iteration of the python loop that uses it, I got a UnicodeError about a byte at some position that could not be decoded, even though I encoded and decoded everything explicitly.
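The multiprocessing problem in the list above comes down to pickling: ctypes function objects refuse to be pickled, so they cannot be sent to worker processes directly. One possible workaround, which is not necessarily what was tried here, is to hand the workers a plain module-level Python function and load the library lazily inside each process. The sketch below assumes a POSIX system and uses libc's `strlen` as a stand-in for the real library; `_get_libc` and `c_strlen` are hypothetical names.

```python
import ctypes
import ctypes.util

_libc = None  # loaded lazily, once per process


def _get_libc():
    """Load the C library on first use in the current process."""
    global _libc
    if _libc is None:
        # Assumption: POSIX; swap in the path of your own shared library.
        _libc = ctypes.CDLL(ctypes.util.find_library("c"))
        _libc.strlen.argtypes = [ctypes.c_char_p]
        _libc.strlen.restype = ctypes.c_size_t
    return _libc


def c_strlen(text):
    """Plain Python wrapper; only str goes in and only int comes out."""
    return _get_libc().strlen(text.encode("utf-8"))


result = c_strlen("hello")
print(result)
```

Because `c_strlen` is an ordinary module-level function whose arguments and return value are plain Python objects, something like `multiprocessing.Pool(2).map(c_strlen, words)` should be able to pickle it by name, with each worker opening its own library handle on first call.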
At this point I decided to look for an alternative and chose to look into Cython:
Cython:
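10 minutes reading the cython hello world.
15 minutes checking SO on how to use cython with setuptools instead of distutils.
10 minutes reading about cython types and python types. I learned that I can use most of the built-in python types for static typing.
15 minutes re-annotating my python code with cython types.
10 minutes modifying my setup.py to use the compiled module in my code base.
Plugged the module directly into the multiprocessing version of the code base. It works.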
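For the record, I of course did not measure the exact time I invested. It may well be that my perception of time was skewed by the mental effort that dealing with ctypes required. But it should convey the feel of working with cython versus ctypes.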
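Cython is a pretty cool tool in itself, well worth learning, and surprisingly close to Python syntax. If you do any scientific computing with Numpy, then Cython is the way to go, because it integrates with Numpy for fast matrix operations.
Cython is a superset of the Python language. You can throw any valid Python file at it, and it will spit out a valid C program. In this case, Cython will just map the Python calls to the underlying CPython API. This results in perhaps a 50% speedup because your code is no longer interpreted.
To get some optimizations, you have to start telling Cython additional facts about your code, such as type declarations. If you tell it enough, it can boil the code down to pure C. That is, a for loop in Python becomes a for loop in C. This is where you will see massive speed gains. You can also link to external C programs here.
Using Cython code is also incredibly easy. I think the manual makes it sound difficult. You literally just do: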
$ cython mymodule.pyx
$ gcc [some arguments here] mymodule.c -o mymodule.so
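Then you can import mymodule in your Python code and completely forget that it compiles down to C.
In any case, because Cython is so easy to set up and start using, I suggest trying it to see if it suits your needs. It won't be a waste if it turns out not to be the tool you're looking for.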