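For a long time I assumed a PEM file simply stored a key. It turns out it can hold plenty of other odd bits of information besides the key itself.
First, a quick word on what PEM is. It is a block of text (an example appears further down) that starts with -----BEGIN <TAG>-----, ends with -----END <TAG>-----, and carries Base64-encoded binary data in between, with a line break after every 64 characters (48 bytes once decoded). The decoded bytes are a DER encoding of a structure described in ASN.1. Put simply, DER is a serialization format: it packs the integers, strings and other fields of a structure into a binary blob that is easy to transmit.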
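To make the layout concrete, here is a minimal sketch that strips the BEGIN/END lines and Base64-decodes the body using only the standard library. The function name pem_decode and the "DEMO" tag are made up for this example, and the sketch assumes an unencrypted block with no header lines inside it.

```python
import base64
import re


def pem_decode(pem_text):
    """Return (tag, der_bytes) for the first PEM block in pem_text."""
    match = re.search(
        r"-----BEGIN ([^-]+)-----\s*(.*?)\s*-----END \1-----",
        pem_text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PEM block found")
    tag, body = match.group(1), match.group(2)
    # Remove the line breaks before decoding the Base64 body.
    der_bytes = base64.b64decode("".join(body.split()))
    return tag, der_bytes


# Hypothetical sample block; "DEMO" is not a real PEM tag.
sample = (
    "-----BEGIN DEMO-----\n"
    + base64.b64encode(b"hello PEM").decode("ascii")
    + "\n-----END DEMO-----"
)
tag, payload = pem_decode(sample)
print(tag, payload)
```

The same function should accept the contents of a file such as priv.pem, in which case the tag comes back as RSA PRIVATE KEY and the payload is the DER data examined in the rest of this post.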

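The rest of this post uses an RSA key pair as the example. It can be generated with PyCrypto (PyCryptodome, in my case) or with openssl (omitted):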
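from Crypto.PublicKey import RSA

# 1024 bits keeps the dump short enough to read; far too small for real use
rsa = RSA.generate(1024)
sk = rsa.exportKey()              # PEM, "RSA PRIVATE KEY"
pk = rsa.publickey().exportKey()  # PEM, "PUBLIC KEY"
with open('./pub.pem', 'wb') as f:
    f.write(pk)
with open('./priv.pem', 'wb') as f:
    f.write(sk)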
RSA private key
-----BEGIN RSA PRIVATE KEY-----
MIICXQIBAAKBgQCg0VTVv5fED3eXtEgZ0Jxgj6S1w45w2DvBMmcTjG7/TBqs7+Pd
tXHhtB2RHHq2E2z5BJMYlWNFDh9CcMq7xCB8VMTae4SiAxHPu6voK5/mC99IoI1X
g50M35Rk2EJivMBrwwgJWmmH9grQfWaaMStafkEzITeI7s8lhjJIuRNJ7wIDAQAB
AoGAD4JwxJaQO/Pj7EkSRQ8V7cgcsfz0sVRhWu4R+9Qo5k1AK1qNZtX3cDWPPm35
NbMk6NU0nIPXyZKlmCJJoxc0rLHbGcTI2CkmdRS8Hve7++JC1DUPZ6ACpW0z5W0a
lK3HHGjwINw5q30AZMERsWTia6BpjclKA839UW/9lm6HeUkCQQDKl+ScBYI3+W6Z
EYzjg/kZEsuhFj3pI2GB/3VO8+8aJg+sjS2a7oZtUai2g2mDsFz4UOeGKJtoWZJb
yGlfxnxHAkEAyzYwqv/8spYH8IM9x/BcFD7pL63+l12kz2cZ5xImvuclYuhjEyii
XXNRUHqNQ8EpWrbqJCtgoosQkjOpg/QhGQJAG0oypUGotNmIqF3Q2KTiXRpHC7/v
PwRhEh3TM3twbdlKqzepOQGAYiFp1IwHHpIXM+vSBCRcKsZGDM8GQrx96QJBAI2f
RKfII+qqWPor3SC8yM9rUMRj9Ky1HKlW51x87/fXy9x0rKeriAys05zM7CquMg4A
sIlomb5uQKxDyP4nY/ECQQDGfKbZiPU6vqghWUMaFGUSqNlCl41Kj4Py1CbxCV47
8bW5uLHMu60qMcZAGIBEekX14HkCaQYawTtfaPF3fX8H
-----END RSA PRIVATE KEY-----
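That is an RSA private key I generated. The Base64 text tells you nothing at a glance, so the first step is to convert it to raw bytes and print them as hex: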
import base64

with open('./priv.pem', 'r') as f:
    data = f.read()
# drop the BEGIN/END lines and join the Base64 body
key_64 = ''.join(data.strip().split('\n')[1:-1])
key_hex = base64.b64decode(key_64).hex()
print(key_hex)
'''
3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
'''
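The hex dump on its own is not very revealing either. But since I generated the key with PyCrypto, I can trace the generating code to find out which format it writes. The source lives in the Python library path at .../site-packages/Crypto/PublicKey/RSA.py (PS: this post uses pycryptodome 3.9.9). The private-key export logic is in the export_key method of the RsaKey class (line 225 if you are on the same version):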
class RsaKey(object):
    ... ...
    def export_key(self, format='PEM', passphrase=None, pkcs=1,
                   protection=None, randfunc=None):
		... ...
        # DER format is always used, even in case of PEM, which simply
        # encodes it into BASE64.
        if self.has_private():
            binary_key = DerSequence([0,
                                      self.n,
                                      self.e,
                                      self.d,
                                      self.p,
                                      self.q,
                                      self.d % (self.p-1),
                                      self.d % (self.q-1),
                                      Integer(self.q).inverse(self.p)
                                      ]).encode()
            if pkcs == 1:
                key_type = 'RSA PRIVATE KEY'
                ... ...
			... ...
        if format == 'PEM':
            from Crypto.IO import PEM
            pem_str = PEM.encode(binary_key, key_type, passphrase, randfunc)
            return tobytes(pem_str)
        ... ...
Reading from the bottom up: the return value comes out of PEM.encode, so start with what PEM.encode does. It is the encode function in .../site-packages/Crypto/IO/PEM.py:
def encode(data, marker, passphrase=None, randfunc=None):
    ... ...
    out = "-----BEGIN %s-----\n" % marker
    ... ...
    # Each BASE64 line can take up to 64 characters (=48 bytes of data)
    # b2a_base64 adds a new line character!
    chunks = [tostr(b2a_base64(data[i:i + 48]))
              for i in range(0, len(data), 48)]
    out += "".join(chunks)
    out += "-----END %s-----" % marker
    return out
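So PEM.encode only Base64-encodes the data 48 bytes per line and wraps the result in the BEGIN and END lines. Nothing crucial happens here; what matters is how the input data is produced.
Moving back up, the input data is built by DerSequence from [0, n, e, d, ...], in that order. If this looks familiar, it is because openssl prints an RSA private key in the same order; try openssl rsa -in priv.pem -text -noout. The order is defined in RFC 3447: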
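The wrapping step is easy to reproduce without the library. The following is a rough standard-library equivalent, not the PyCryptodome implementation; pem_encode and the "DEMO" marker are names invented for this sketch.

```python
import base64


def pem_encode(data, marker):
    """Wrap data in a PEM block, 48 input bytes (64 Base64 chars) per line."""
    lines = ["-----BEGIN %s-----" % marker]
    for i in range(0, len(data), 48):
        lines.append(base64.b64encode(data[i:i + 48]).decode("ascii"))
    lines.append("-----END %s-----" % marker)
    return "\n".join(lines)


pem = pem_encode(bytes(range(100)), "DEMO")  # "DEMO" is a made-up marker
body = pem.split("\n")[1:-1]
print([len(line) for line in body])  # [64, 64, 8]
```

Because 48 is a multiple of 3, every full line encodes without padding, so joining the lines and decoding them in one go gives back the original bytes.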
      RSAPrivateKey ::= SEQUENCE {
          version           Version,
          modulus           INTEGER,  -- n
          publicExponent    INTEGER,  -- e
          privateExponent   INTEGER,  -- d
          prime1            INTEGER,  -- p
          prime2            INTEGER,  -- q
          exponent1         INTEGER,  -- d mod (p-1)
          exponent2         INTEGER,  -- d mod (q-1)
          coefficient       INTEGER,  -- (inverse of q) mod p
          otherPrimeInfos   OtherPrimeInfos OPTIONAL
      }
A Version of 0 means ordinary two-prime RSA; 1 means multi-prime RSA:
            Version ::= INTEGER { two-prime(0), multi(1) }
               (CONSTRAINED BY
               {-- version must be multi if otherPrimeInfos present --})
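The last three integers in RSAPrivateKey are redundant given d, p and q; they are stored so that decryption can use the Chinese Remainder Theorem instead of one large exponentiation. A small sketch with toy numbers (the primes 61 and 53 and the helper name decrypt_crt are chosen for this example only, and pow with a negative exponent needs Python 3.8 or later):

```python
# Toy parameters, far too small for real use; chosen only for readability.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # modular inverse, Python 3.8+

exponent1 = d % (p - 1)             # d mod (p-1)
exponent2 = d % (q - 1)             # d mod (q-1)
coefficient = pow(q, -1, p)         # (inverse of q) mod p


def decrypt_crt(c):
    """RSA decryption using the CRT fields instead of d directly."""
    m1 = pow(c, exponent1, p)
    m2 = pow(c, exponent2, q)
    h = (coefficient * (m1 - m2)) % p
    return m2 + h * q


message = 65
cipher = pow(message, e, n)
print(decrypt_crt(cipher) == pow(cipher, d, n) == message)  # True
```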
Next, trace into DerSequence, in .../site-packages/Crypto/Util/asn1.py (line 344):
class DerSequence(DerObject):
		... ...
        def encode(self):
                """Return this DER SEQUENCE, fully encoded as a
                binary string.
                """
                self.payload = b''
                for item in self._seq:
                    if byte_string(item):
                        self.payload += item
                    elif _is_number(item):
                        self.payload += DerInteger(item).encode()
                    else:
                        self.payload += item.encode()
                return DerObject.encode(self)
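encode sorts each item of the input sequence into one of three cases. Byte strings are appended as they are, and objects that are neither bytes nor numbers are appended via their own encode(). Numbers first go through DerInteger(item), so DerInteger is next, in the same file (line 249):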
class DerInteger(DerObject):
		... ...
        def encode(self):
                """Return the DER INTEGER, fully encoded as a
                binary string."""
                number = self.value
                self.payload = b''
                while True:
                    self.payload = bchr(int(number & 255)) + self.payload
                    if 128 <= number <= 255:
                        self.payload = bchr(0x00) + self.payload
                    if -128 <= number <= 255:
                        break
                    number >>= 8
                return DerObject.encode(self)
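The loop above builds the INTEGER payload one byte at a time. As a cross-check, here is a sketch that should produce the same payload bytes with int.to_bytes; der_integer_content is a name made up for this example and is not part of the library.

```python
def der_integer_content(number):
    """Minimal big-endian two's-complement bytes of number (INTEGER payload)."""
    magnitude = number if number >= 0 else ~number
    # One extra bit is needed for the sign, hence "// 8 + 1".
    length = magnitude.bit_length() // 8 + 1
    return number.to_bytes(length, "big", signed=True)


for value in (0, 127, 128, 255, 65537, -129):
    print(value, der_integer_content(value).hex())
# 0 00
# 127 7f
# 128 0080
# 255 00ff
# 65537 010001
# -129 ff7f
```

The 0080 and 00ff cases show the leading zero byte that also appears in front of n, p and q in the key dump.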
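At first glance this is a number-to-bytes routine, and that is what it does: it emits the number as big-endian two's complement, adding a leading 0x00 when the top byte of a positive number is 0x80 or above so that it is not read as negative. The final encoding is done by DerObject.encode, which is also what DerSequence.encode ends with, so on to DerObject.encode, again in the same file (line 165):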
class DerObject(object):
    	... ...
        def encode(self):
                """Return this DER element, fully encoded as a binary byte string."""
                # Concatenate identifier octets, length octets,
                # and contents octets
                output_payload = self.payload
                ... ...
                return (bchr(self._tag_octet) +
                        self._definite_form(len(output_payload)) +
                        output_payload)
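Just look at what is returned: the format is <tag> + <length> + <payload>. The payload is prepared by the calling subclass, so it needs no attention here (already covered above). The tag is the ASN.1 type tag (see the Microsoft page on encoded tag bytes in the references): 0x30 is a SEQUENCE, 0x02 is an INTEGER, and so on. The length is the length of the payload in bytes, but it is first formatted by _definite_form, so trace that too, still in the same file (line 156):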
        def _definite_form(length):
                """Build length octets according to BER/DER
                definite form.
                """
                if length > 127:
                        encoding = long_to_bytes(length)
                        return bchr(len(encoding) + 128) + encoding
                return bchr(length)
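In short: if the length is at most 127 (so the top bit of the byte is still 0), it is emitted as a single byte. If it is greater than 127, the first byte has its top bit set to 1 and its low 7 bits hold x, the number of bytes needed to store the length; the next x bytes hold the length itself. For example, a length of 0x0100 needs 2 bytes and is encoded as 0x820100; a length of 0xf0 has its top bit set and cannot be stored directly, so it takes 1 byte and is encoded as 0x81f0.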
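The length rule and the <tag> + <length> + <payload> layout fit in a few lines. This is a standard-library sketch rather than the library code; der_length and der_tlv are names invented here.

```python
def der_length(length):
    """Length octets in DER definite form."""
    if length <= 127:
        return bytes([length])                        # short form: one byte
    encoding = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(encoding)]) + encoding   # long form


def der_tlv(tag, payload):
    """Concatenate tag, length and payload."""
    return bytes([tag]) + der_length(len(payload)) + payload


print(der_length(0x0100).hex())      # 820100
print(der_length(0xf0).hex())        # 81f0
print(der_tlv(0x02, b"\x00").hex())  # 020100
```

The last line is the INTEGER 0 that opens the private key's SEQUENCE.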
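Taking the RSA private key apart by hand
With the reverse engineering above done, the key can be taken apart. Start with the hex dump from earlier: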
3082025d02010002818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef02030100010281800f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949024100ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47024100cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f4211902401b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de90241008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1024100c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
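30 is the SEQUENCE tag, and 82 says the next two bytes hold the length of this SEQUENCE: 0x025d bytes, which is everything that remains. The following 020100 is the integer 0: 02 is the INTEGER tag, 01 says the integer takes 1 byte, and 00 is the value. The same method decodes 02818100a0... and the integers after it (the generated private-key PEM in fact contains nothing but integers). Laid out, it looks like this: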
3082025d  	# Begin Sequence: len=0x025d
0201  		# Version: (len=0x01)
00
028181		# n: (len=0x81)
00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef
0203		# e: (len=0x03)
010001
028180		# d: (len=0x80)
0f8270c496903bf3e3ec4912450f15edc81cb1fcf4b154615aee11fbd428e64d402b5a8d66d5f770358f3e6df935b324e8d5349c83d7c992a5982249a31734acb1db19c4c8d829267514bc1ef7bbfbe242d4350f67a002a56d33e56d1a94adc71c68f020dc39ab7d0064c111b164e26ba0698dc94a03cdfd516ffd966e877949
0241		# p: (len=0x41)
00ca97e49c058237f96e99118ce383f91912cba1163de9236181ff754ef3ef1a260fac8d2d9aee866d51a8b6836983b05cf850e786289b6859925bc8695fc67c47
0241		# q: (len=0x41)
00cb3630aafffcb29607f0833dc7f05c143ee92fadfe975da4cf6719e71226bee72562e8631328a25d7351507a8d43c1295ab6ea242b60a28b109233a983f42119
0240		# d mod (p-1): (len=0x40)
1b4a32a541a8b4d988a85dd0d8a4e25d1a470bbfef3f0461121dd3337b706dd94aab37a9390180622169d48c071e921733ebd204245c2ac6460ccf0642bc7de9
0241		# d mod (q-1): (len=0x41)
008d9f44a7c823eaaa58fa2bdd20bcc8cf6b50c463f4acb51ca956e75c7ceff7d7cbdc74aca7ab880cacd39cccec2aae320e00b0896899be6e40ac43c8fe2763f1
0241		# (inverse of q) mod p: (len=0x41)
00c67ca6d988f53abea82159431a146512a8d942978d4a8f83f2d426f1095e3bf1b5b9b8b1ccbbad2a31c6401880447a45f5e0790269061ac13b5f68f1777d7f07
			
			# End Sequence
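The manual split above can be automated with a small tag-length-value walker. This is a sketch using only the standard library; read_tlv and parse_integer_sequence are hypothetical helper names, and the input is a toy SEQUENCE built by hand rather than the real key.

```python
def read_tlv(buf, pos):
    """Read one DER element at buf[pos]; return (tag, payload, next_pos)."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first                      # short form
    else:
        count = first & 0x7F                # long form: number of length bytes
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def parse_integer_sequence(der):
    """Return the INTEGERs inside a top-level SEQUENCE as Python ints."""
    tag, body, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("expected SEQUENCE")
    values, pos = [], 0
    while pos < len(body):
        tag, payload, pos = read_tlv(body, pos)
        if tag != 0x02:
            raise ValueError("expected INTEGER")
        values.append(int.from_bytes(payload, "big", signed=True))
    return values


# Toy SEQUENCE {0, 3233, 17}, built by hand for this example.
toy = bytes.fromhex("300a02010002020ca1020111")
print(parse_integer_sequence(toy))  # [0, 3233, 17]
```

Passing the bytes of the private-key hex dump (via bytes.fromhex) should return the nine integers in the order version, n, e, d, p, q, d mod (p-1), d mod (q-1), coefficient.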
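Alternatively, you can from Crypto.Util.asn1 import DerSequence, DerInteger and let PyCrypto do the decoding (omitted).
RSA public key
The public key works much the same way. Start with .../site-packages/Crypto/PublicKey/RSA.py (line 348):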
class RsaKey(object):
    ... ...
    def export_key(self, format='PEM', passphrase=None, pkcs=1,
                   protection=None, randfunc=None):
		... ...
        if self.has_private():
            ... ...
        else:
            key_type = "PUBLIC KEY"
            binary_key = _create_subject_public_key_info(oid,
                                                         DerSequence([self.n,
                                                                      self.e])
                                                         )
		... ...
The part to look at is _create_subject_public_key_info, in .../site-packages/Crypto/PublicKey/__init__.py (line 63):
def _create_subject_public_key_info(algo_oid, secret_key, params=None):
    if params is None:
        params = DerNull()
    spki = DerSequence([
                DerSequence([
                    DerObjectId(algo_oid),
                    params]),
                DerBitString(secret_key)
                ])
    return spki.encode()
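So the key is encoded as nested sequences, and the DER output is the nested spki flattened into one byte string. Despite its name, the secret_key argument here carries the public [n, e] sequence. For reference, the RFC 3447 definition: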
      RSAPublicKey ::= SEQUENCE {
          modulus           INTEGER,  -- n
          publicExponent    INTEGER   -- e
      }
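Taking the RSA public key apart by hand
The process is much the same as for the private key, so I will go through it quickly. First, get the hex dump: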
30819f300d06092a864886f70d010101050003818d0030818902818100a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef0203010001
Then split it up:
30819f 		# Begin Main Sequence: len=0x9f
300d		# Begin Sub1 Sequence: len=0x0d
0609		# algo_oid: (1.2.840.113549.1.1.1 - rsaEncryption, PKCS #1)
2a864886f70d010101
0500		# params: (null)
			# End Sub1 Sequence
03818d		# BitString: len=0x8d ([n, e])
00308189	# 00: unused-bits count of the BitString; 308189: Begin Sub2 Sequence: len=0x89
028181		# n:
00a0d154d5bf97c40f7797b44819d09c608fa4b5c38e70d83bc13267138c6eff4c1aacefe3ddb571e1b41d911c7ab6136cf90493189563450e1f4270cabbc4207c54c4da7b84a20311cfbbabe82b9fe60bdf48a08d57839d0cdf9464d84262bcc06bc308095a6987f60ad07d669a312b5a7e4133213788eecf25863248b91349ef
0203		# e:
010001
			# End Sub2 Sequence
			# End Main Sequence
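The nesting above (SEQUENCE, then algorithm SEQUENCE, then BIT STRING wrapping another SEQUENCE) can be walked with the same kind of tag-length-value reader. A standard-library sketch follows; read_tlv and parse_spki are hypothetical names, and the input is a toy structure with n = 3233 and e = 17 assembled by hand for this example.

```python
def read_tlv(buf, pos=0):
    """Read one DER element at buf[pos]; return (tag, payload, next_pos)."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:
        length = first                      # short form
    else:
        count = first & 0x7F                # long form: number of length bytes
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, buf[pos:pos + length], pos + length


def parse_spki(der):
    """Return (oid_bytes, n, e) from an RSA public key in this layout."""
    _, spki, _ = read_tlv(der)                  # Main Sequence
    _, algorithm, pos = read_tlv(spki, 0)       # Sub1 Sequence {OID, params}
    _, oid_bytes, _ = read_tlv(algorithm, 0)    # OBJECT IDENTIFIER content
    tag, bit_string, _ = read_tlv(spki, pos)    # BitString
    if tag != 0x03 or bit_string[0] != 0:
        raise ValueError("expected BIT STRING with 0 unused bits")
    _, rsa_key, _ = read_tlv(bit_string, 1)     # skip the unused-bits byte
    _, n_bytes, pos = read_tlv(rsa_key, 0)
    _, e_bytes, _ = read_tlv(rsa_key, pos)
    n = int.from_bytes(n_bytes, "big")
    e = int.from_bytes(e_bytes, "big")
    return oid_bytes, n, e


# Toy key (n=3233, e=17) in the same layout, built by hand for this example.
toy = bytes.fromhex(
    "301b300d06092a864886f70d0101010500030a00300702020ca1020111"
)
oid_bytes, n, e = parse_spki(toy)
print(oid_bytes.hex(), n, e)  # 2a864886f70d010101 3233 17
```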
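One more thing: the hex encoding of algo_oid (OBJECT IDENTIFIER) is still a bit confusing; see the crypto.stackexchange.com question listed in the references.
References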
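For what it is worth, the OID bytes decode as follows: the first byte packs the first two arcs as 40 * a + b, and every later arc is written in base 128 with the top bit of each byte set on all but its last byte. A minimal sketch, where decode_oid is a made-up name and the first byte is assumed to be below 80 (true for OIDs starting with 0 or 1):

```python
def decode_oid(content):
    """Decode OBJECT IDENTIFIER content bytes into dotted notation."""
    first = content[0]
    arcs = [first // 40, first % 40]   # first byte packs the first two arcs
    value = 0
    for byte in content[1:]:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:           # top bit clear: last byte of this arc
            arcs.append(value)
            value = 0
    return ".".join(str(arc) for arc in arcs)


print(decode_oid(bytes.fromhex("2a864886f70d010101")))
# 1.2.840.113549.1.1.1
```

So 2a gives 1.2, 86 48 gives 840, 86 f7 0d gives 113549, and the three trailing 01 bytes give 1.1.1.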
https://www.shangyang.me/2017/05/24/encrypt-rsa-keyformat/
https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-introduction-to-asn-1-syntax-and-encoding
https://docs.microsoft.com/en-us/windows/win32/seccertenroll/about-encoded-tag-bytes
https://datatracker.ietf.org/doc/html/rfc3447
https://crypto.stackexchange.com/questions/29115/how-is-oid-2a-86-48-86-f7-0d-parsed-as-1-2-840-113549
https://www.alvestrand.no/objectid/
Original link: https://tover.xyz/p/pem-by-hand/