最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【未解决】Selenium中如何打开url后忽略加载部分资源

selenium crifan 3249浏览 0评论

用Selenium调试打开页面:

https://www.befrugal.com/home/

时,网页加载要好半天才能结束:

2017/11/26 12:36:51 LINE 115  INFO    open befrugalHomeUrl=https://www.befrugal.com/home/

2017/11/26 12:37:16 LINE 121  INFO    Found Login button

花了27秒,

但是实际上我此处需要的login的按钮,估计也需要就3,5秒,早就显示出来了:

但是(左下角)会显示

正在登录 d2z2x9m6jf98op.couldfront.net的响应

去Chrome调试发现很多都是:

https://d2z2x9m6jf98op.cloudfront.net/Large/thatsitnutrition.jpg

之类的图片。

此处希望:

在已经得到要显示的元素后,可以手动指定忽略加载某些url

比便于加速

或者是driver.get去加载url时,不用等待url完全加载完毕就进行后续处理

现在的逻辑貌似是:

url加载都是完全加载,然后才回去查找对应的元素

selenium get url omit some resource

selenium load url omit

Selenium WebDriver go to page without waiting for page load – Stack Overflow

仅限于firefox,感觉不容易保证准确性

Selenium Webdriver c# without waiting for page to load – Stack Overflow

IE和Firefox

感觉不容易控制准确性。

那看看是否可以忽略某些url比如上述的地址

Selenium WebDriver: After launching URL using driver.get(), Selenium takes too long to perform the next action in the java code · Issue #5025 · SeleniumHQ/selenium-google-code-issue-archive

URL Loading in Selenium Webdriver: All about get() and navigate() – Make Selenium Easy

试试:

driver.manage().timeouts().pageLoadTimeout(30, TimeUnit.SECONDS);

结果此处没有driver.manage或driver.timeout

只有:

driver.set_page_load_timeout()

但是此处还是不能去设置,否则超过了比如设置的20秒,实际上页面可以30秒后加载出来的,结果却异常,也不行。

How to load a URL in browser without using get() or navigate() method – Make Selenium Easy

Getting Started with Selenium for Automated Website Testing – The Sauce Labs Cookbook – Sauce Labs Documentation Wiki

How To Remove Third Party Resources

selenium load url omit some resource

java – Selenium Webdrivers: Load Page without any resources – Stack Overflow

说是可以用proxy去block这种url

去试试

java – prevent external content to be loaded in selenium webdriver test – Stack Overflow

selenium python black list

selenium webdriver – How to disable loading external urls on seleniumlibrary/robotframework – Stack Overflow

Selenium python should have a way to ‘blacklist’ aka mock responses · Issue #1007 · SeleniumHQ/selenium-google-code-issue-archive

selenium2 – How do you block Google Analytics from Selenium automated visits? – Software Quality Assurance & Testing Stack Exchange

WhiteList and BlackList are not working with the new BrowserMobProxy · Issue #279 · lightbody/browsermob-proxy

How To Remove Third Party Resources

7. WebDriver API — Selenium Python Bindings 2 documentation

好像只有firefox才有proxy?

Exclude Selenium WebDriver traffic from Google Analytics | Yi Zeng’s Blog

jarib/browsermob-proxy-rb: Ruby client for the BrowserMob Proxy

lightbody/browsermob-proxy: A free utility to help web developers watch and manipulate network traffic from their AJAX applications.

Proxy support · Issue #97 · mozilla/geckodriver

selenium python proxy

python – Running selenium behind a proxy server – Stack Overflow

Setting up proxy with selenium / python – Stack Overflow

AutomatedTester/browsermob-proxy-py: A python wrapper for Browsermob Proxy

但是好像还要单独起proxy的sever服务,还是很麻烦

How to Run Web Drivers with Proxies in Python – John Patrick Roach

可以用代理,但是没有blask list的功能。

selenium black list

selenium browsermob-proxy-py phantomjs

server.start() gives error · Issue #52 · AutomatedTester/browsermob-proxy-py

selenium+browsermob network网络爬取 – 色彩时光

Browsermob-proxy by lightbody

Selenium Phantomjs Proxy – pythonexample.com

Welcome to BrowserMob Proxy’s documentation! — BrowserMob Proxy 0.6.0 documentation

再去看:

https://github.com/lightbody/browsermob-proxy

里面说了,可以作为 embed模式运行的

再去

https://bmp.lightbody.net

看简介:

<code>BrowserMob Proxy
A free utility to help web developers watch and manipulate network traffic from their AJAX applications.

</code>

About BrowserMob Proxy (BMP)

BrowserMob proxy is based on technology developed in the Selenium open source project and a commercial load testing and monitoring service originally called BrowserMob and now part of Neustar.

The proxy is a free (Apache 2.0 license) utility that works well with Selenium or can be used independently.

It can capture performance data for web apps (via the HAR format), as well as manipulate browser behavior and traffic, such as whitelisting and blacklisting content, simulating network traffic and latency, and rewriting HTTP requests and responses.

先去:

https://bmp.lightbody.net

https://github.com/lightbody/browsermob-proxy/releases

下载bin文件:

browsermob-proxy-2.1.4-bin.zip

解压得到:

再去参考:

官网示例

https://github.com/lightbody/browsermob-proxy#using-with-selenium

和别人代码:

https://zhuanlan.zhihu.com/p/26238280

https://pythonexample.com/code/selenium-phantomjs-proxy/

去试试

直接去安装:

<code>pip install browsermob-proxy
</code>

结果安装了很多:

<code>➜  ms_store_auto_order git:(master) ✗ pip install browsermob-proxy
Collecting browsermob-proxy
  Downloading browsermob_proxy-0.8.0-py2-none-any.whl
Collecting requests&gt;=2.9.1 (from browsermob-proxy)
  Downloading requests-2.18.4-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 92kB 40kB/s
Collecting certifi&gt;=2017.4.17 (from requests&gt;=2.9.1-&gt;browsermob-proxy)
  Downloading certifi-2017.11.5-py2.py3-none-any.whl (330kB)
    100% |████████████████████████████████| 337kB 11kB/s
Collecting chardet&lt;3.1.0,&gt;=3.0.2 (from requests&gt;=2.9.1-&gt;browsermob-proxy)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 12kB/s
Collecting idna&lt;2.7,&gt;=2.5 (from requests&gt;=2.9.1-&gt;browsermob-proxy)
  Downloading idna-2.6-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 9.6kB/s
Collecting urllib3&lt;1.23,&gt;=1.21.1 (from requests&gt;=2.9.1-&gt;browsermob-proxy)
  Downloading urllib3-1.22-py2.py3-none-any.whl (132kB)
    100% |████████████████████████████████| 133kB 24kB/s
Installing collected packages: certifi, chardet, idna, urllib3, requests, browsermob-proxy
Successfully installed browsermob-proxy-0.8.0 certifi-2017.11.5 chardet-3.0.4 idna-2.6 requests-2.18.4 urllib3-1.22
</code>

好像就用不到官网下载的二进制了。

结果代码:

<code>from browsermobproxy import Server
</code>

proxy = None

if gCfg[“useProxy”]:

    # use browsermob-proxy to black list some url/resource to speed up

    proxyFullPath = os.path.join(os.getcwd(), “libs/proxy/browsermob-proxy”)

    logging.debug(“proxyFullPath=%s”, proxyFullPath)

    proxyServer = Server(proxyFullPath)

    logging.debug(“proxyServer=%s”, proxyServer)

    proxyServer.start()

    proxy = proxyServer.create_proxy()

    logging.debug(“proxy=%s”, proxy)

    # proxy.blacklist([“http://.*/.*.css.*”,”http://.*/.*.jpg.*”,”http://.*/.*.png.*”,”http://.*/.*.gif.*”],200)

    proxy.blacklist([“https?://\w+.cloudfront.net/.*],200)

if gCfg[“webdriver”] == “chrome”:

    # driver = webdriver.Chrome()

    chromeOptions = webdriver.ChromeOptions()

    logging.info(“chromeOptions=%s”, chromeOptions)

    if gCfg[“useProxy”]:

        chromeOptions.add_argument(‘–proxy-server={host}:{port}’.format(host=“localhost”, port=proxy.port))

    driver = webdriver.Chrome(chrome_options=chromeOptions)

elif gCfg[“webdriver”] == “phantomjs”:

    # driver = webdriver.PhantomJS()

    desiredCapabilities = DesiredCapabilities.PHANTOMJS.copy()

    logging.debug(“desiredCapabilities=%s”, desiredCapabilities)

    desiredCapabilities[‘phantomjs.page.customHeaders.User-Agent’] = ‘Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) ‘ \

                                                                     ‘AppleWebKit/537.36 (KHTML, like Gecko) ‘ \

                                                                     ‘Chrome/39.0.2171.95 Safari/537.36’

    logging.debug(“desiredCapabilities=%s”, desiredCapabilities)

    driver = webdriver.PhantomJS(desired_capabilities=desiredCapabilities)

结果运行出错:

<code>    proxyServer.start()
  File "/usr/local/lib/python2.7/site-packages/browsermobproxy/server.py", line 127, in start
    raise ProxyServerError("Can't connect to Browsermob-Proxy")
browsermobproxy.exceptions.ProxyServerError: Can't connect to Browsermob-Proxy
</code>

无意间发现,多了个log文件:server.log

<code>Error: Unable to access jarfile ...libs/lib/browsermob-dist-2.1.4.jar
</code>

好像是路径不对?

然后再去看看,是不是必须是jar包

看到:

不是路径,而是找不到对应的jar包

所以整个解压缩后的文件都放进去

通过:

代码:

好像有显示等待代理

结果试了半天,还是不行:

算了。放弃。

主要是:

本身很麻烦。

并且是Proxy Server启动的时间就很慢,10秒左右

create_proxy也很慢,也有5秒以上。

-》失去了加速的目的。

转载请注明:在路上 » 【未解决】Selenium中如何打开url后忽略加载部分资源

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
85 queries in 0.166 seconds, using 22.09MB memory