Q&A / Python: Strange behavior of urllib.request.urlopen when given a file:// URL

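The OS is Arch Linux 5.11.16-arch-1. The docker-compose shipped with my system fails with an error when I run it, and so does the docker-compose downloaded from the official site. The exact error is not the point. After some debugging I found that the exception is raised when jsonschema, while validating docker-compose.yml, uses urllib.request.urlopen to load a local JSON file.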

According to the Python docs, this function can be used with the file scheme, but on my machine it raises an error.
A minimal reproduction:

python -c 'import urllib.request; print(urllib.request.urlopen("file:///dev/urandom").read(2))'

The error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/lib/python3.9/urllib/request.py", line 528, in _open
    result = self._call_chain(self.handle_open, 'default',
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 802, in <lambda>
    meth(r, proxy, type))
  File "/usr/lib/python3.9/urllib/request.py", line 830, in proxy_open
    return self.parent.open(req, timeout=req.timeout)
  File "/usr/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.9/urllib/request.py", line 1375, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.9/urllib/request.py", line 1350, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1345, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 276, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

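Then I started an Ubuntu container with docker, installed Python 3.8.5 with apt, and the one-liner above ran fine.
I also started an Arch Linux container, installed the latest Python from the repos, and it ran fine there too.

Next, on the host, I used pyenv to install 3.8.5 (the same version as in Ubuntu), and the error above still appeared.
Since pyenv installs under my home directory, I can modify the standard library freely.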

diff --git a/versions/3.8.5/lib/python3.8/urllib/request.py b/versions/3.8.5/lib/python3.8/urllib/request.py
index e440738..0530957 100644
--- a/versions/3.8.5/lib/python3.8/urllib/request.py
+++ b/versions/3.8.5/lib/python3.8/urllib/request.py
@@ -99,6 +99,7 @@ import contextlib
 import warnings
 
 
+from icecream import ic
 from urllib.error import URLError, HTTPError, ContentTooShortError
 from urllib.parse import (
     urlparse, urlsplit, urljoin, unwrap, quote, unquote,
@@ -195,6 +196,7 @@ def urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
     installed and makes sure the requests are handled through the proxy.
 
     '''
+    ic(url, 'urlopen')
     global _opener
     if cafile or capath or cadefault:
         import warnings
@@ -505,6 +507,7 @@ class OpenerDirector:
 
     def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
         # accept a URL or a Request object
+        ic(fullurl, 'open')
         if isinstance(fullurl, str):
             req = Request(fullurl, data)
         else:

urlopen calls opener.open: https://github.com/python/cpython/blob/3.8/Lib/urllib/request.py#L222
I added logging in both urlopen and opener.open and found something strange.

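A single line of code that calls urlopen once ends up calling open twice: once with fullurl as a string, and one extra time with a Request instance. The fatal part is that this instance's type attribute is 'http', which sends the rest of the logic down the HTTP path, and then it blows up.

This double call to open reproduced with both Python versions I tried on my machine, but I could not reproduce it in docker or on other machines. I also tried an http URL afterwards, and in that case open is not called twice.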

Python 3.9.3 (default, Apr  8 2021, 23:35:02) 
[GCC 10.2.0]

╰─❯ python -VV                                                                                     
Python 3.8.5 (default, Apr 24 2021, 13:00:31) 
[GCC 10.2.0]

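Someone online seems to have run into the same problem, though: https://github.com/Julian/jsonschema/issues/714

That is the end of my running log. I no longer know how to debug this further, so I am wondering whether any expert on the forum knows what is going on.

As for the docker-compose problem, for now I will work around it by manually patching the jsonschema package's code.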

Weird.

Check whether any proxy is configured on your machine, i.e. the http_proxy and https_proxy environment variables.
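A quick way to run that check from Python itself is sketched below. Note that urllib does not look only at http_proxy and https_proxy: on Linux, urllib.request.getproxies() picks up every environment variable whose name ends in _proxy, in any letter case. `proxy_env_vars` is a hypothetical helper name.

```python
import os
import urllib.request


def proxy_env_vars(environ=os.environ):
    """Return every variable whose name ends in '_proxy' (case-insensitive)."""
    return {name: value for name, value in environ.items()
            if name.lower().endswith("_proxy")}


# Raw environment variables that look like proxy settings.
print(proxy_env_vars())
# The scheme -> proxy URL mapping that urllib will actually use.
print(urllib.request.getproxies())
```

If either output is non-empty in the failing shell but empty inside the docker containers, that difference is worth following up.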

My guess is that one of the open calls is trying to go through an HTTP proxy.
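The traceback in the question does pass through proxy_open, reached from the 'default' handler chain, so this guess can be tested in isolation. The sketch below is only a hypothesis check, not a confirmed diagnosis of the poster's machine: it builds an opener with an explicit proxy mapping and shows that a mapping with the key "default" reproduces the second open with type 'http', while an empty mapping reads the file normally. `RecordingHTTPHandler` and `open_with_proxies` are hypothetical names, the proxy URL is made up, and the HTTP handler is replaced so that nothing is sent over the network.

```python
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


class RecordingHTTPHandler(urllib.request.HTTPHandler):
    """Stand-in for the real HTTP handler: records the request, sends nothing."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def http_open(self, req):
        self.seen.append((req.type, req.full_url))
        raise urllib.error.URLError("http_open reached (no request sent)")


def open_with_proxies(url, proxies):
    """Open url with an explicit proxy mapping; return (result, http calls)."""
    http = RecordingHTTPHandler()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies), http)
    try:
        with opener.open(url) as resp:
            return resp.read(), http.seen
    except urllib.error.URLError as exc:
        return exc, http.seen


with tempfile.NamedTemporaryFile(suffix=".json") as tmp:
    tmp.write(b"{}")
    tmp.flush()
    url = Path(tmp.name).as_uri()
    # Empty mapping: proxies are bypassed and the file handler is used.
    print(open_with_proxies(url, {}))
    # Hypothetical mapping, as would result from a variable named default_proxy.
    print(open_with_proxies(url, {"default": "http://127.0.0.1:8080"}))
```

If the hypothesis holds on the affected machine, installing an opener built with `urllib.request.ProxyHandler({})` would be a less invasive workaround than patching jsonschema.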

    Miigon

    No proxy is configured at all. It really is strange, but on my machine (and only on my machine) it reproduces every time.

    Could some program have registered a handler for the file scheme? (Just a wild guess.)
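One way to look into that guess is to list which handlers the default opener has registered for each scheme. The sketch below reads OpenerDirector.handle_open, which is an implementation detail of CPython's urllib rather than a documented API, so treat it as a debugging aid only; `handlers_by_scheme` is a hypothetical helper name.

```python
import urllib.request


def handlers_by_scheme(opener):
    """Map each scheme key to the class names of the handlers that open it."""
    return {scheme: [type(handler).__name__ for handler in handlers]
            for scheme, handlers in opener.handle_open.items()}


table = handlers_by_scheme(urllib.request.build_opener())
for scheme in sorted(table):
    print(scheme, table[scheme])
```

Comparing this output between the failing host and a working container should show whether anything other than FileHandler is attached to "file", or whether a ProxyHandler appears under a key such as "default".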
