我一直在思考如何保护我的C/ c++代码不被反汇编和逆向工程破坏。通常情况下,在我的代码中,我绝不会宽恕这种行为;然而,为了各种人的安全,我目前正在研究的协议决不能被检查或理解。

现在这对我来说是一个新的主题,互联网并没有真正的资源来防止逆向工程,而是描述了大量关于如何逆向工程的信息

到目前为止我想到的一些事情是:

Code injection (calling dummy functions before and after actual function calls) Code obfustication (mangles the disassembly of the binary) Write my own startup routines (harder for debuggers to bind to) void startup(); int _start() { startup( ); exit (0) } void startup() { /* code here */ } Runtime check for debuggers (and force exit if detected) Function trampolines void trampoline(void (*fnptr)(), bool ping = false) { if(ping) fnptr(); else trampoline(fnptr, true); } Pointless allocations and deallocations (stack changes a lot) Pointless dummy calls and trampolines (tons of jumping in disassembly output) Tons of casting (for obfuscated disassembly)

我的意思是,这些是我想过的一些事情,但它们都可以在适当的时间框架内由代码分析师解决。我还有别的选择吗?


当前回答

最近有一篇论文叫做“程序混淆和一次性程序”。如果你真的想保护你的应用程序。本文主要围绕使用简单通用硬件的理论不可能结果。

如果你负担不起额外的硬件,那么还有另一篇论文,在所有具有相同功能和相同大小的程序中,给出了理论上的最佳可能混淆“On最佳可能混淆”。然而,本文表明,信息理论的最佳可能意味着多项式层次结构的崩溃。

如果这些结果不能满足你的需要,这些论文至少应该给你足够的参考书目引导去查阅相关文献。

更新:一种新的混淆概念,称为不可区分混淆,可以减轻不可能性结果(论文)

其他回答

正如很多人已经说过的:在一个普通的CPU上,你不能阻止它们,你只能延迟它们。就像我以前的密码学老师告诉我的那样:你不需要完美的加密,破解密码的代价肯定比获得的代价更大。你的困惑也是一样。

但还有3个注意事项:

It is possible to make reverse engineering impossible, BUT (and this is a very very big but), you cant do it on a conventional cpu. I did also much hardware development, and often FPGA are used. E.g. the Virtex 5 FX have a PowerPC CPU on them, and you can use the APU to implement own CPU opcodes in your hardware. You could use this facility to really decrypt incstuctions for the PowerPC, that is not accessible by the outside or other software, or even execute the command in the hardware. As the FPGA has builtin AES encryption for its configuration bitstream, you could not reverse engineer it (except someone manages to break AES, but then I guess we have other problems...). This ways vendors of hardware IP also protect their work. You speak from protocol. You dont say what kind of protocol it is, but when it is a network protocol you should at least protect it against network sniffing. This can you indeed do by encryption. But if you want to protect the en/decryption from an owner of the software, you are back to the obfuscation. Do make your programm undebuggable/unrunnable. Try to use some kind of detection of debugging and apply it e.g. in some formula oder adding a debug register content to a magic constant. It is much harder if your program looks in debug mode is if it where running normal, but makes a complete wrong computation, operation, or some other. E.g. I know some eco games, that had a really nasty copy-protection (I know you dont want copyprotection, but it is similar): The stolen version altered the mined resources after 30 mins of game play, and suddenly you got just a single resource. The pirate just cracked it (i.e. reverse engineered it) - checked if it run, and volia released it. Such slight behaviour changings are very hard to detect, esp. if they do not appear instantly to detection, but only delayed.

所以最后我想建议: 估算逆向工程人员的收益,将其转化为一些时间(例如,使用最便宜的印度工资),并进行逆向工程,使时间成本更大。

很多时候,担心你的产品被逆向工程是多余的。是的,它可以被逆向工程;但它是否会在短时间内变得如此出名,以至于黑客们会发现它值得逆转。它吗?(对于大量的代码行来说,这项工作不是一个小时间活动)。

如果它真的能赚钱,那么你就应该筹集足够的资金,通过专利或版权等合法途径来保护它。

恕我直言,采取你将要采取的基本预防措施,然后释放它。如果它成为逆向工程的一个点,这意味着你已经做得非常好,你自己会找到更好的方法来克服它。祝你好运。

传统的逆向工程技术依赖于智能代理使用反汇编程序回答关于代码的问题的能力。如果你想要更强的安全性,你必须做一些事情,可以证明阻止代理得到这样的答案。

您可以通过依赖停止程序(“程序X停止吗?”)来做到这一点,这通常是无法解决的。向程序中添加难以推理的程序,会使程序难以推理。构建这样的程序要比拆解它们容易。你也可以在程序中添加推理难度不同的代码;一个很好的候选程序是关于别名(“指针”)的推理程序。

Collberg等人有一篇论文(“制造廉价、弹性和隐形的不透明结构”)讨论了这些主题,并定义了各种“不透明”谓词,这些谓词会使对代码的推理变得非常困难:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.39.1946&rep=rep1&type=pdf

我还没有看到Collberg的具体方法应用于产品代码,尤其是C或c++源代码。

DashO Java混淆器似乎使用了类似的想法。 http://www.cs.arizona.edu/~collberg/Teaching/620/2008/Assignments/tools/DashO/

我不认为任何代码都是牢不可破的,但奖励必须非常棒,才能让人们愿意尝试它。

话虽如此,你还是应该做以下事情:

Use the highest optimization level possible (reverse engineering is not only about getting the assembly sequence, it is also about understanding the code and porting it into a higher-level language such as C). Highly optimized code can be a b---h to follow. Make structures dense by not having larger data types than necessary. Rearrange structure members between official code releases. Rearranged bit fields in structures are also something you can use. You can check for the presence of certain values which shouldn't be changed (a copyright message is an example). If a byte vector contains "vwxyz" you can have another byte vector containing "abcde" and compare the differences. The function doing it should not be passed pointers to the vectors but use external pointers defined in other modules as (pseudo-C code) "char *p1=&string1[539];" and "char p2=&string2[-11731];". That way there won't be any pointers pointing exactly at the two strings. In the comparison code you then compare for "(p1-539+i)-*(p2+11731+i)==some value". The cracker will think it is safe to change string1 because no one appears to reference it. Bury the test in some unexpected place.

尝试自己破解汇编代码,看看哪些是容易的,哪些是困难的。您可以尝试一些想法,使代码更难进行反向工程,并使调试更加困难。

自2013年7月以来,人们对密码学上健壮的混淆(以不可区分混淆的形式)重新产生了兴趣,这似乎是由Amit Sahai的原始研究激发的。

Sahai, Garg, Gentry, Halevi, Raykova, Waters,候选人 以及所有电路的功能加密(2013年7月21日)。 Sahai, Waters,《如何使用无区别模糊处理》 可否认加密,以及更多。 Sahai, Barak, Garg, Kalai, Paneth,保护混淆不受代数攻击(2014年2月4日)。

您可以在这篇Quanta Magazine文章和IEEE Spectrum文章中找到一些提炼的信息。

目前,利用这项技术所需的资源数量使其不切实际,但AFAICT的共识是对未来相当乐观。

我这么说很随意,但对于那些习惯于本能地忽视混淆技术的人来说——这是不同的。如果它被证明是真正的工作和实际,这确实是重要的,而不仅仅是为了混淆视听。