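I see a lot of confusion between hashing and encryption algorithms, and I would like to hear some expert advice about:

When to use hashes vs. encryption
What makes a hash or encryption algorithm different (at a theoretical/mathematical level), i.e. what makes hashes irreversible (without the aid of a rainbow table)

Here are some similar SO questions that didn't go into as much detail as I was looking for:

What is the difference between obfuscation, hashing, and encryption?
Difference between encryption and hashing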


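Use hashing when you don't need to get the original input back; use encryption when you do.

Hashing takes some input and turns it into some bits (usually thought of as a number, like a 32-bit integer, a 64-bit integer, etc.). The same input will always produce the same hash, but you principally lose information in the process, so you can't reliably reproduce the original input (there are a few caveats to that, however).

Encryption principally preserves all of the information you put into the encryption function; it just makes it hard (ideally impossible) for anyone to reverse back to the original input without possessing a specific key.

Simple Example of Hashing

Here's a simple example to help you understand why hashing can't (in the general case) get back the original input. Say I'm creating a 1-bit hash. My hash function takes a bit string as input and sets the hash to 1 if an even number of bits are set in the input string, or to 0 if an odd number are set.

Example: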

Input    Hash
0010     0
0011     1
0110     1
1000     0

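Note that there are many input values that result in a hash of 0, and many that result in a hash of 1. If you know the hash is 0, you can't know for sure what the original input was.

By the way, this 1-bit hash isn't exactly contrived... have a look at the parity bit.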
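As a rough sketch of the 1-bit hash described above (the function name `parity_hash` is made up for illustration), the following groups every 4-bit input by its hash value, which shows that half of all inputs land on each output:

```python
def parity_hash(bits: str) -> int:
    """1-bit hash: 1 if an even number of bits are set, else 0."""
    ones = bits.count("1")
    return 1 if ones % 2 == 0 else 0

# Group every possible 4-bit input by its hash value.
buckets = {0: [], 1: []}
for n in range(16):
    bits = format(n, "04b")
    buckets[parity_hash(bits)].append(bits)

print(buckets[0])  # the eight inputs that hash to 0
print(buckets[1])  # the eight inputs that hash to 1
```

Knowing only that the hash is 0 leaves eight equally plausible 4-bit inputs, and the ambiguity grows with longer inputs.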

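Simple Example of Encryption

You might encrypt text by using a simple letter substitution: if the input is A, you write B. If the input is B, you write C. And so on to the end of the alphabet, where if the input is Z, you write A again.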

Input   Encrypted
CAT     DBU
ZOO     APP

Just like the simple hash example, this type of encryption has been used historically.
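A minimal sketch of that letter substitution, assuming uppercase A-Z input only (the helper names `shift`, `encrypt` and `decrypt` are illustrative). Unlike the 1-bit hash, nothing is lost: shifting back by the same key recovers the input.

```python
import string

ALPHABET = string.ascii_uppercase

def shift(text: str, key: int) -> str:
    """Shift each uppercase letter by `key` places, wrapping Z back to A."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)  # leave anything that is not A-Z untouched
    return "".join(out)

def encrypt(plaintext: str, key: int = 1) -> str:
    return shift(plaintext, key)

def decrypt(ciphertext: str, key: int = 1) -> str:
    return shift(ciphertext, -key)

print(encrypt("CAT"))  # DBU
print(decrypt("APP"))  # ZOO
```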


Well, you could look it up on Wikipedia... But since you want an explanation, I'll do my best here:

Hash Functions

They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).

The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.

Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.

Encryption Functions

They provide a 1:1 mapping between an arbitrary length input and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.

Use Cases

Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).

Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.

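Use encryption whenever you need to get the input data back out. Notice the word "need". If you're storing credit card numbers, you need to get them back out at some point, but you don't want to store them in plain text. So instead, store the encrypted version and keep the key as safe as possible.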

Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So, you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.
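The sign-then-verify flow described above might look roughly like this with Python's standard `hmac` module. The secret and the cookie value are made-up placeholders, and note that HMAC mixes the key in more carefully than plain concatenation does.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical shared secret, never transmitted

def sign(message: bytes, secret: bytes = SECRET) -> str:
    """Return the hex HMAC-SHA256 tag for `message`."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, secret: bytes = SECRET) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, secret), tag)

cookie = b"user_id=42"
tag = sign(cookie)  # sent along with the plain-text cookie

print(verify(cookie, tag))            # True
print(verify(b"user_id=1", tag))      # False: the data was changed
print(verify(cookie, tag, b"wrong"))  # False: the verifier has a different secret
```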

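A note on hashes for passwords:

A key feature of cryptographic hash functions is that they should be very fast to compute and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute-force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.

Adding a salt helps matters, since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that, when added to the known salt, produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem, since if they know the salt it's just a matter of running the dictionary through.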
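To make the two points above concrete, here is a small sketch with a made-up four-word dictionary. A precomputed table cracks the unsalted hash instantly. A random salt defeats that table, but an attacker who knows the salt can simply rerun the dictionary.

```python
import hashlib
import os

def sha512_hex(data: bytes) -> str:
    return hashlib.sha512(data).hexdigest()

# A tiny stand-in for an attacker's dictionary (hypothetical word list).
WORDLIST = [b"123456", b"password", b"letmein", b"qwerty"]

# Precomputed table: unsalted hash -> password.
precomputed = {sha512_hex(w): w for w in WORDLIST}

stolen_unsalted = sha512_hex(b"letmein")
hit_unsalted = precomputed.get(stolen_unsalted)
print(hit_unsalted)  # b'letmein'

# With a per-user random salt, the precomputed table no longer matches...
salt = os.urandom(16)
stolen_salted = sha512_hex(b"letmein" + salt)
hit_salted = precomputed.get(stolen_salted)
print(hit_salted)    # None

# ...but an attacker who knows the salt can still rerun the dictionary quickly.
cracked = next((w for w in WORDLIST if sha512_hex(w + salt) == stolen_salted), None)
print(cracked)       # b'letmein'
```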

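There are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (usually thousands). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration), it actually increases the entropy (available space) of the output, reducing the chances of collisions. A trivial implementation is: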

var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}

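There are other, more standard implementations such as PBKDF2 and bcrypt. But this technique is used by quite a few security-related systems (such as PGP, WPA, Apache and OpenSSL).

The bottom line: hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...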
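For comparison with the hand-rolled loop, this is roughly what a standard stretched hash looks like using PBKDF2 from Python's `hashlib`. The iteration count and the helper names are assumptions for illustration, not a recommendation from this answer.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); both are stored alongside the user record."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha512", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, expected)

salt, stored = hash_password("hunter2")
print(check_password("hunter2", salt, stored))  # True
print(check_password("hunter3", salt, stored))  # False
```

Each guess now costs the attacker the full iteration count, which is the slowdown that key stretching is after.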

Another note on trivial stretching

Do not under any circumstances feed the output of one hash directly back into the hash function:

hash = sha512(password + salt); 
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}

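The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than the input space. To see why, let's look at what happens. To preface this, let's assume that sha1() has a 0.001% chance of collision (it's much lower in reality, but this is for demonstration purposes).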

hash1 = sha1(password + salt);

Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimate of the chance of a single collision with md5 is about 1/2^128, or roughly 1/(3×10^38). While that seems small, thanks to the birthday attack it's not really as small as it seems).

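Instead, by re-appending the salt and password each time, you re-introduce data back into the hash function. So any collisions from a particular round are no longer collisions in the next round. So: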

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}

has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.
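The effect is far too small to observe with real SHA-512, so the sketch below uses a toy model: SHA-256 truncated to 12 bits (an assumption made purely so collisions become visible), applied to 4096 made-up passwords. It counts how many distinct final hashes survive each stretching style.

```python
import hashlib

BITS = 12  # deliberately tiny output so collisions are easy to see

def tiny_hash(data: bytes) -> bytes:
    """SHA-256 truncated to BITS bits, returned as 2 bytes."""
    value = int.from_bytes(hashlib.sha256(data).digest()[:2], "big") >> (16 - BITS)
    return value.to_bytes(2, "big")

def stretch_naive(password: bytes, salt: bytes, rounds: int) -> bytes:
    h = tiny_hash(password + salt)
    for _ in range(rounds):
        h = tiny_hash(h)                    # output fed straight back in
    return h

def stretch_remix(password: bytes, salt: bytes, rounds: int) -> bytes:
    h = tiny_hash(password + salt)
    for _ in range(rounds):
        h = tiny_hash(h + password + salt)  # input re-introduced each round
    return h

salt = b"NaCl"
passwords = [b"pw%d" % i for i in range(4096)]

single = {tiny_hash(p + salt) for p in passwords}
naive = {stretch_naive(p, salt, 50) for p in passwords}
remix = {stretch_remix(p, salt, 50) for p in passwords}

# Distinct outputs: one round, remixed stretching, naive stretching.
print(len(single), len(remix), len(naive))
```

In this toy setting the naive loop should end with far fewer distinct outputs than a single hash, while the remixed loop stays in the same range as a single hash.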


Use hashes when you only need to go one way. For example, for passwords in a system, you use hashing because you will only ever verify that the value a user entered, after hashing, matches the value in your repository. With encryption, you can go two ways. Hashing algorithms and encryption algorithms are just mathematical algorithms, so in that respect they are not different: it's all just mathematical formulas. Semantically, though, there is a very big distinction between hashing (one-way) and encryption (two-way). Why are hashes irreversible? Because they are designed to be that way, because sometimes you want a one-way operation.


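When it comes to security for transmitting data, i.e. two-way communication, you use encryption. All encryption requires a key.

When it comes to authorization, you use hashing. There is no key in hashing.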

Hashing takes any amount of data (binary or text) and creates a constant-length hash representing a checksum for the data. For example, the hash might be 16 bytes. Different hashing algorithms produce different size hashes. You obviously cannot re-create the original data from the hash, but you can hash the data again to see if the same hash value is generated. One-way Unix-based passwords work this way. The password is stored as a hash value, and to log onto a system, the password you type is hashed, and the hash value is compared against the hash of the real password. If they match, then you must've typed the correct password

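Why hashes are irreversible:

Hashing isn't reversible because the input-to-hash mapping is not 1-to-1. Having two inputs map to the same hash value is usually referred to as a "hash collision". For security purposes, one of the properties of a "good" hash function is that collisions are rare in practical use.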
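As an illustration of a hash collision, the sketch below deliberately weakens SHA-256 by keeping only its first two bytes (a made-up 16-bit hash) and searches for two different inputs with the same short hash. With the full digest the same search would not finish in any realistic time.

```python
import hashlib
from itertools import count

def short_hash(data: bytes, nbytes: int = 2) -> bytes:
    """First `nbytes` bytes of SHA-256: a deliberately weak hash."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 2):
    """Hash b'msg0', b'msg1', ... until two inputs share a short hash."""
    seen = {}
    for i in count():
        msg = b"msg%d" % i
        h = short_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg

first, second, shared = find_collision()
print(first, second, shared.hex())
```

Given only the shared short hash, there is no way to tell which of the two colliding inputs produced it.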


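A hash function can be thought of as baking a loaf of bread. You start with inputs (flour, water, yeast, etc.), and after applying the hash function (mixing + baking), you end up with an output: a loaf of bread.

Going the other way is extraordinarily difficult. You can't really separate the bread back into flour, water and yeast; some of that was lost during the baking process, and you can never tell exactly how much water, flour or yeast was used for a particular loaf, because that information was destroyed by the hashing function (a.k.a. the oven).

In theory, many different variants of inputs will produce identical loaves (e.g. 2 cups of water and 1 tsp of yeast produce exactly the same loaf as 2.1 cups of water and 0.9 tsp of yeast), but given one of those loaves, you can't tell exactly what combination of inputs produced it.

Encryption, on the other hand, can be viewed as a safe deposit box. Whatever you put in there comes back out, as long as you possess the key with which it was locked in the first place. It's a symmetric operation. Given a key and some input, you get a certain output. Given that output and the same key, you get back the original input. It's a 1:1 mapping.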


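Encryption and hash algorithms work in similar ways. In each case, there is a need to create confusion and diffusion amongst the bits. Boiled down, confusion is creating a complex relationship between the key and the ciphertext, and diffusion is spreading the information of each bit around.

Many hash functions actually use encryption algorithms (or primitives of encryption algorithms). For example, the SHA-3 candidate Skein uses Threefish as the underlying method to process each block. The difference is that instead of keeping each block of ciphertext, the blocks are destructively, deterministically merged together to a fixed length.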
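Diffusion can be observed from the outside as the avalanche effect. In this small sketch (the inputs are arbitrary examples), flipping a single input bit changes roughly half of the 256 output bits of SHA-256.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# 'a' (0x61) and 'c' (0x63) differ in exactly one bit.
diff = bit_difference(b"hello a", b"hello c")
print(f"{diff} of 256 output bits changed")
```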


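A basic overview of hashing and encryption/decryption techniques follows.

Hashing:

If you hash any plain text, you cannot get the same plain text back from the hashed text. Simply put, it's a one-way process.


Encryption and Decryption:

If you encrypt any plain text with a key, you can get the same plain text back by decrypting the encrypted text with the same (symmetric) or a different (asymmetric) key.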


UPDATE: To address the points mentioned in the edited question.

1. When to use hashes vs encryptions

Hashing is useful if you want to send someone a file but are afraid that someone else might intercept the file and change it. One way the recipient can make sure it is the right file is for you to post the hash value publicly. The recipient can then compute the hash value of the file received and check that it matches the published hash value.

Encryption is good if, say, you have a message to send to someone. You encrypt the message with a key, and the recipient decrypts it with the same (or maybe even a different) key to get back the original message. (credits)
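The file check described in point 1 could be sketched as follows. The file contents and helper names are invented for the demo, and the check only helps if the published hash reaches the recipient over a channel the attacker cannot alter.

```python
import hashlib
import hmac
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    return hmac.compare_digest(file_sha256(path), published_hex.lower())

payload = b"important file contents"
published = hashlib.sha256(payload).hexdigest()  # what the sender posts publicly

# A temporary file stands in for the downloaded copy.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

ok_before = matches_published(path, published)
with open(path, "ab") as f:
    f.write(b"!")                                # simulate tampering in transit
ok_after = matches_published(path, published)
os.remove(path)

print(ok_before, ok_after)  # True False
```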


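2. What makes a hash or encryption algorithm different (at a theoretical/mathematical level), i.e. what makes hashes irreversible (without the aid of a rainbow table)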

Basically, hashing is an operation that loses information, while encryption does not. Let's look at the difference in a simple mathematical way for easy understanding; of course, both involve much more complicated mathematical operations with many repetitions.

Encryption/Decryption (Reversible):

Addition: 4 + 3 = 7
This can be reversed by taking the sum and subtracting one of the addends: 7 - 3 = 4

Multiplication: 4 * 5 = 20
This can be reversed by taking the product and dividing by one of the factors: 20 / 4 = 5

So here we could treat one of the addends/factors as the decryption key and the result (7, 20) as the encrypted text.

Hashing (Not Reversible):

Modulo division: 22 % 7 = 1
This cannot be reversed, because there is no operation you can apply to the remainder and the divisor to reconstitute the dividend (or vice versa). Can you find an operation to fill in where the '?' is?

1 ? 7 = 22
1 ? 22 = 7

So hash functions have the same mathematical quality as modulo division: they lose information.

credits


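My two-liner... interviewers generally want the following answer.

Hashing is one way. You cannot convert your data/string back from a hash code.

Encryption is two way: you can decrypt the encrypted string again if you have the key.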


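A hash function turns a variable-sized amount of text into a fixed-sized text.

Source: https://en.wikipedia.org/wiki/Hash_function


Hash functions in PHP

A hash turns a string into a hashed string. See below.

HASH: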

$str = 'My age is 29';
$hash = hash('sha1', $str);
echo $hash; // OUTPUT: 4d675d9fbefc74a38c89e005f9d776c75d92623e

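Passwords are usually stored in their hashed representation rather than as readable text. When an end user wants to access an application protected by a password, the password must be given during authentication. When the user submits the password, a valid authentication system receives it and hashes it. This password hash is compared to the hash known by the system. Access is granted if they are equal.

DEHASH:

SHA1 is a one-way hash, which means that you can't dehash it.

However, you can brute-force the hash. See: https://hashkiller.co.uk/sha1-decrypter.aspx.

MD5 is another hash. An MD5 dehasher can be found on this website: https://www.md5online.org/.

To hamper brute-force attacks on hashes, a salt can be added. In PHP you can use password_hash() to create a password hash. The function password_hash() automatically creates a salt. To verify a password against a password hash (with a salt), use password_verify().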

// Invoke this little script 3 times, and it will give you everytime a new hash
$password = '1234';  
$hash = password_hash($password, PASSWORD_DEFAULT);  

echo $hash; 
// OUTPUT 

$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu 

$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u 

$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW

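One password can be represented by more than one hash. When you verify the password against each of these different password hashes using password_verify(), the password is accepted as valid every time.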

$password = '1234';  

$hash = '$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu';  
var_dump( password_verify($password, $hash) );  

$hash = '$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u';  
var_dump( password_verify($password, $hash) );  

$hash = '$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW';  
var_dump( password_verify($password, $hash) );

// OUTPUT 

boolean true 

boolean true 

boolean true



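Encryption functions transform a text into a nonsensical ciphertext by using an encryption key, and vice versa.

Source: https://en.wikipedia.org/wiki/Encryption


Encryption in PHP

Let's dive into some PHP code that handles encryption.

- Mcrypt extension -

ENCRYPT: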

$cipher = MCRYPT_RIJNDAEL_128;
$key = 'A_KEY';
$data = 'My age is 29';
$mode = MCRYPT_MODE_ECB;

$encryptedData = mcrypt_encrypt($cipher, $key , $data , $mode);
var_dump($encryptedData);

//OUTPUT:
string '„Ùòyªq³¿ì¼üÀpå' (length=16)

DECRYPT:

$decryptedData = mcrypt_decrypt($cipher, $key , $encryptedData, $mode);
$decryptedData = rtrim($decryptedData, "\0\4"); // Remove the nulls and EOTs at the END
var_dump($decryptedData);

//OUTPUT:
string 'My age is 29' (length=12)

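- OpenSSL extension -

The Mcrypt extension was deprecated in PHP 7.1 and removed in PHP 7.2. The OpenSSL extension should be used in PHP 7 instead. See the code snippets below: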

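// NOTE: demo values only. AES-128 expects a 16-byte key (a shorter one such as 'A_KEY' is padded),
// and real code should use a random key and a fresh random IV for every message.
$key = 'A_KEY';
$data = 'My age is 29';
$iv = 'IV_init_vector01'; // 16 bytes, fixed here only so the output is reproducible

// ENCRYPT (options = 0 returns the ciphertext Base64-encoded)
$encryptedData = openssl_encrypt($data, 'AES-128-CBC', $key, 0, $iv);
var_dump($encryptedData);

// DECRYPT
$decryptedData = openssl_decrypt($encryptedData, 'AES-128-CBC', $key, 0, $iv);
var_dump($decryptedData);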

//OUTPUT
string '4RJ8+18YkEd7Xk+tAMLz5Q==' (length=24)
string 'My age is 29' (length=12)

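Symmetric Encryption:

Symmetric encryption may also be referred to as shared-key or shared-secret encryption. In symmetric encryption, a single key is used both to encrypt and to decrypt traffic.

Asymmetric Encryption:

Asymmetric encryption is also known as public-key cryptography. Asymmetric encryption differs from symmetric encryption primarily in that two keys are used: one for encryption and one for decryption. The most common asymmetric encryption algorithm is RSA.

Compared to symmetric encryption, asymmetric encryption imposes a high computational burden and tends to be much slower. Thus, it isn't typically employed to protect payload data. Instead, its major strength is its ability to establish a secure channel over a nonsecure medium (for example, the Internet). This is accomplished by the exchange of public keys, which can only be used to encrypt data. The complementary private key, which is never shared, is used to decrypt.

Hashing:

Finally, hashing is a form of cryptographic security which differs from encryption. Whereas encryption is a two-step process used to first encrypt and then decrypt a message, hashing condenses a message into an irreversible fixed-length value, or hash. Two of the most common hashing algorithms seen in networking are MD5 and SHA-1.

Read more here: http://packetlife.net/blog/2010/nov/23/symmetric-asymmetric-encryption-hashing/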


Encryption

The purpose of encryption is to transform data in order to keep it secret, e.g. sending someone a secret text that only they should be able to read, or sending passwords over the Internet. Rather than focusing on usability, the goal is to ensure that the data can be sent secretly and can be seen only by the intended recipient. Encryption transforms the data into another format using a secret key, and users who have the secret key can see the message by reversing the process. E.g. AES, Blowfish, RSA.

Encrypted output may simply look like this: FhQp6U4N28GITVGjdt37hZN

Hashing

Technically, hashing takes an arbitrary input and produces a fixed-length string. The most important thing here is that you can't go from the output back to the input. It gives strong assurance that the given information has not been modified. The process is to take an input, hash it, and sign the hash with the sender's private key; once the receiver has received it, they can validate it with the sender's public key. If the hash is wrong and doesn't match, the information is not accepted. E.g. MD5, SHA, ...


Cryptography deals with numbers and strings. Basically, every digital thing in the entire universe is numbers. When I say numbers, I mean 0 and 1: binary. The images you see on screen, the music you listen to through your earphones, everything is binary. But our ears and eyes do not understand binary, right? Only the brain could understand it, and even if it could understand binary, it couldn't enjoy it. So we convert the binary data to human-understandable formats such as mp3, jpg, etc. Let's call this process encoding. It's a two-way process, and the result can easily be decoded back to its original form.

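Hashing

Hashing is another cryptographic technique in which data, once converted to some other form, can never be recovered. In layman's terms, there is no process called de-hashing. There are many hash functions to do the job, such as sha-512, md5 and so on.

If the original value cannot be recovered, then where do we use this? Passwords! When you set up a password for your mobile or PC, a hash of your password is created and stored in a secure place. When you make a login attempt next time, the entered string is again hashed with the same algorithm (hash function) and the output is matched against the stored value. If it's the same, you get logged in. Otherwise you are thrown out.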

By applying a hash to the password, we can ensure that an attacker will never get our password even if he steals the stored password file. The attacker will only have the hash of the password. He can probably find a list of the most commonly used passwords, apply sha-512 to each of them, and compare the result with the value in his hand. This is called a dictionary attack. But how long would he keep doing this? If your password is random enough, do you think this method of cracking would work? All the passwords in the databases of Facebook, Google and Amazon are hashed, or at least they are supposed to be hashed.

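Then there is Encryption

Encryption lies in between hashing and encoding. Encoding is a two-way process and should not be used to provide security. Encryption is also a two-way process, but the original data can be retrieved if and only if the encryption key is known. If you don't know how encryption works, don't worry, we will discuss the basics here. That would be enough to understand the basics of SSL. So, there are two types of encryption, namely symmetric and asymmetric encryption.

Symmetric Key Encryption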

I am trying to keep things as simple as I can. So, let's understand symmetric encryption by means of a shift algorithm. This algorithm encrypts letters by shifting them to either the left or the right. Let's take the string CRYPTO and the number +3. Then the encrypted form of CRYPTO will be FUBSWR; that is, each letter is shifted to the right by 3 places. Here, the word CRYPTO is called the plaintext, the output FUBSWR is called the ciphertext, the value +3 is called the encryption key (symmetric key), and the whole process is a cipher. This is one of the oldest and most basic symmetric key encryption algorithms, and its first use was reported during the time of Julius Caesar. So it was named after him, and it is the famous Caesar cipher. Anyone who knows the encryption key can apply the reverse of Caesar's algorithm and retrieve the original plaintext. Hence it is called symmetric encryption.

Asymmetric Key Encryption

We know that in symmetric encryption the same key is used for both encryption and decryption. Once that key is stolen, all the data is exposed. That's a huge risk, and we need a more complex technique. In 1976, Whitfield Diffie and Martin Hellman first published the concept of asymmetric encryption, and their algorithm became known as the Diffie–Hellman key exchange. Then in 1978, Ron Rivest, Adi Shamir and Leonard Adleman of MIT published the RSA algorithm. These can be considered the foundation of asymmetric cryptography.

Compared to symmetric encryption, asymmetric encryption uses two keys instead of one. One is called the public key, and the other is the private key. In theory, we generate the public-private key pair on our own machine during initialization. The private key should be kept in a safe place and should never be shared with anyone. The public key, as the name indicates, can be shared with anyone who wishes to send encrypted text to you. Those who have your public key can then encrypt the secret data with it. If the key pair was generated using the RSA algorithm, then they should use the same algorithm when encrypting the data. Usually the algorithm is specified in the public key. The encrypted data can only be decrypted with the private key, which is owned by you.
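To see the two keys in action, here is a toy sketch of textbook RSA with tiny primes. The numbers are illustrative assumptions only: real RSA uses primes hundreds of digits long plus a padding scheme, and should come from a vetted library rather than code like this.

```python
# Textbook RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, chosen coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)      # can be handed to anyone
private_key = (d, n)     # never shared

def encrypt(m: int, key) -> int:
    exponent, modulus = key
    return pow(m, exponent, modulus)

def decrypt(c: int, key) -> int:
    exponent, modulus = key
    return pow(c, exponent, modulus)

message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)

print(ciphertext)  # 2790
print(recovered)   # 65
```

Anyone holding `public_key` can produce the ciphertext, but only the holder of `private_key` can turn it back into the message.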

Source: SSL/TLS for dummies part 1: Ciphersuite, Hashing, Encryption | WST (https://www.wst.space/ssl-part1-ciphersuite-hashing-encryption/)


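You already got some good answers, but I guess you could see it like this. ENCRYPTION: Encryption has to be decryptable if you have the right key.

Example: Like when you send an e-mail. You might not want everyone in the world to know what you are writing to the person receiving the e-mail, but the person who receives the e-mail would probably want to be able to read it.

HASHES: Hashes work similarly to encryption, but it should not be possible to reverse them at all.

Example: Like when you put your key in a locked door (the kind that locks when you close it). You don't care how the lock works in detail, as long as it unlocks itself when you use the key. If there is trouble, there is probably no fixing it; get a new lock instead. (Like forgetting passwords on every login; at least I do it all the time, and it's a common area to use hashing.)

... I guess you could call rainbow tables a locksmith in this case.

Hope things clear up =)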