【Solved】How to set a parameter that disables automatic redirects in Java/Android network (crawler) requests

Category: Java · Author: crifan

I already had Java code for making network requests:

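    // Apache HttpClient 4.x (the HTTP API bundled with older Android versions)
    import java.io.IOException;
    import java.io.UnsupportedEncodingException;
    import java.util.List;

    import org.apache.http.HttpResponse;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.CookieStore;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.client.methods.HttpUriRequest;
    import org.apache.http.client.protocol.ClientContext;
    import org.apache.http.cookie.Cookie;
    import org.apache.http.impl.client.BasicCookieStore;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.params.HttpParams;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;

    // Cookie store shared across requests. It is a field of the enclosing class,
    // whose declaration was not shown in the original excerpt.
    private final CookieStore localCookies = new BasicCookieStore();

    /** Get response from url, headerParams, postDict (GET if postDict is null, else POST) */
    public HttpResponse getUrlResponse(String url,
                                       HttpParams headerParams,
                                       List<NameValuePair> postDict)
    {
        HttpResponse response = null;
        HttpUriRequest request;
        DefaultHttpClient httpClient = new DefaultHttpClient();

        if (postDict != null)
        {
            HttpPost postReq = new HttpPost(url);
            try {
                // Encode the name/value pairs as an application/x-www-form-urlencoded body
                postReq.setEntity(new UrlEncodedFormEntity(postDict));
            } catch (UnsupportedEncodingException uee) {
                // Don't swallow this silently: the POST would be sent without a body
                uee.printStackTrace();
            }
            request = postReq;
        }
        else
        {
            request = new HttpGet(url);
        }

        if (headerParams != null)
        {
            // Per-request parameters; they take precedence over the client's own params
            request.setParams(headerParams);
        }

        try {
            // Attach the shared cookie store so cookies persist across calls
            HttpContext localContext = new BasicHttpContext();
            localContext.setAttribute(ClientContext.COOKIE_STORE, localCookies);
            response = httpClient.execute(request, localContext);

            List<Cookie> respCookieList = localCookies.getCookies();
            System.out.println("Cookies for " + url);
            for (Cookie ck : respCookieList)
            {
                System.out.println(ck);
            }
        } catch (ClientProtocolException cpe) {
            // HTTP protocol violation (e.g. a malformed response)
            cpe.printStackTrace();
        } catch (IOException ioe) {
            // Network-level failure (connection refused, timeout, ...)
            ioe.printStackTrace();
        }

        return response;
    }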

But I also wanted to support setting a parameter that disables automatic redirects.

【Solution process】

1. I searched online and found:

How to prevent apache http client from following a redirect

->

Chapter 5. HTTP client service

The relevant parameter is:

ClientPNames.HANDLE_REDIRECTS='http.protocol.handle-redirects': defines whether redirects should be handled automatically. This parameter expects a value of type java.lang.Boolean. If this parameter is not set HttpClient will handle redirects automatically.
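This sketch makes the parameter's effect concrete without depending on Apache HttpClient. It uses the JDK's own HttpURLConnection instead, which is a swapped-in API and not the one used in this post; there, setInstanceFollowRedirects plays roughly the role of HANDLE_REDIRECTS. A throwaway local server answers /old with a 302 pointing to /new. With redirects followed, the caller sees the final 200; with them disabled, it sees the 302 itself. The class and method names (RedirectDemo, fetchStatus) are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the JDK's HttpURLConnection stands in for Apache HttpClient.
class RedirectDemo {

    // Starts a local server: /old answers 302 with "Location: /new", /new answers 200.
    private static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", exchange -> {
            exchange.getResponseHeaders().set("Location", "/new");
            exchange.sendResponseHeaders(302, -1); // -1 means "no response body"
            exchange.close();
        });
        server.createContext("/new", exchange -> {
            byte[] body = "landed".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Requests /old and returns the status code the caller ends up seeing.
    static int fetchStatus(boolean followRedirects) {
        HttpServer server = null;
        try {
            server = startServer();
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://127.0.0.1:" + port + "/old").toURL().openConnection();
            // Rough analogue of ClientPNames.HANDLE_REDIRECTS, for this one connection
            conn.setInstanceFollowRedirects(followRedirects);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

RedirectDemo.fetchStatus(true) should return 200 because the client silently went on to /new. RedirectDemo.fetchStatus(false) should return 302, leaving the redirect for the caller to handle.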

But I still didn't see how to express that in the HttpClient code here.

2. Then I referred to:

How to process the redirect in post method using HttpClient?

From that, I could write this code:

    import org.apache.http.client.params.ClientPNames;

    /** Get response from url, headerDict, postDict */
    public HttpResponse getUrlResponse(String url,
                                       HttpParams headerParams,
                                       List<NameValuePair> postDict)
    {
        // init
        HttpResponse response = null;
        HttpUriRequest request = null;
        DefaultHttpClient httpClient = new DefaultHttpClient();

        // Disable automatic redirects for every request sent by this client
        httpClient.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS, Boolean.FALSE);

        // ... rest of the method as before (if (postDict != null) ...) ...

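But that wasn't quite what I wanted. It sets the parameter on the client itself, whereas I wanted to set it through the

HttpParams headerParams

argument that is passed in.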
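The JDK's HttpURLConnection has a close analogue of these two places to put the setting, the client as a whole versus an individual request. The sketch below uses it as a swapped-in illustration, not as Apache HttpClient code. The static setFollowRedirects changes the default for every connection created afterwards, which is JVM-wide and so even broader than one DefaultHttpClient's params. The instance method setInstanceFollowRedirects overrides that default for a single connection. The class name RedirectScopeDemo and the example.invalid URL are hypothetical. Because openConnection() does not connect, no network access is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical demo: global default vs per-connection redirect setting in the JDK.
class RedirectScopeDemo {

    // Returns { follow flag of a connection that uses the global default,
    //           follow flag of a connection with a per-connection override }.
    static boolean[] compareScopes() {
        boolean savedDefault = HttpURLConnection.getFollowRedirects();
        try {
            URL url = URI.create("http://example.invalid/").toURL();

            // Global switch: affects every HttpURLConnection created from now on
            HttpURLConnection.setFollowRedirects(false);
            HttpURLConnection usesDefault = (HttpURLConnection) url.openConnection();

            // Per-connection switch: overrides the global default for this one only
            HttpURLConnection overridden = (HttpURLConnection) url.openConnection();
            overridden.setInstanceFollowRedirects(true);

            return new boolean[] {
                usesDefault.getInstanceFollowRedirects(),
                overridden.getInstanceFollowRedirects()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Restore the JVM-wide default so other code is unaffected
            HttpURLConnection.setFollowRedirects(savedDefault);
        }
    }
}
```

The first flag comes out false because it inherited the global switch. The second comes out true because the per-connection setting wins, which is the kind of per-request control wanted here.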

3. Then I referred to:

[jira] [Updated] (HTTPCLIENT-1312) Decompressing on redirects with redirection support off doesn’t work properly

and wrote this code:

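    import org.apache.http.client.params.ClientPNames;
    import org.apache.http.params.BasicHttpParams;

    /** Get response from url, headerDict, postDict */
    public HttpResponse getUrlResponse(String url,
                                       HttpParams headerParams,
                                       List<NameValuePair> postDict)
    {
        // init
        HttpResponse response = null;
        HttpUriRequest request = null;
        DefaultHttpClient httpClient = new DefaultHttpClient();

        // disable auto redirect (client-wide version, no longer used)
        //httpClient.getParams().setParameter(ClientPNames.HANDLE_REDIRECTS, Boolean.FALSE);

        // Disable auto redirect for this request only, via the per-request params.
        // headerParams may be null (the method checks for that later), so create it if needed.
        if (headerParams == null)
        {
            headerParams = new BasicHttpParams();
        }
        headerParams.setBooleanParameter(ClientPNames.HANDLE_REDIRECTS, false);

        // ... rest of the method as before ...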

This does what I wanted: the parameter is now set through headerParams.

【Summary】

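In Java/Android (Apache HttpClient 4.x), automatic redirects are disabled by setting ClientPNames.HANDLE_REDIRECTS to Boolean.FALSE in an HttpParams object. When it is set on the request's HttpParams (as headerParams is here), it applies to that request only. When it is set on httpClient.getParams(), it applies to every request sent by that client.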
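With automatic redirects disabled, the 3xx response comes back to the caller. Following it, or simply recording where it points (often the reason a crawler turns redirects off), becomes the caller's job. Below is a minimal sketch of the two pieces of logic that job needs, in plain Java with no HttpClient dependency. The class and method names are hypothetical. In Apache HttpClient 4.x, the Location value would be read with response.getFirstHeader("Location").getValue().

```java
import java.net.URI;

// Hypothetical helper for handling redirects manually once auto-follow is off.
class ManualRedirect {

    // Redirect status codes that normally carry a Location header to follow (RFC 9110).
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    // Location may be relative, so resolve it against the URL that was requested.
    static String resolveLocation(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

For example, a request to http://example.com/a/b that is answered with "Location: /login" resolves to http://example.com/login. An absolute Location is returned unchanged.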


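The next step is to flesh out the function so that the HttpParams passed into getUrlResponse from elsewhere are supported, one setting at a time.

For example, if the caller sets:

autoredirect to false

then add this line here: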

headerParams.setBooleanParameter(ClientPNames.HANDLE_REDIRECTS, Boolean.FALSE);

and gradually add support for the other header parameters in the same way.
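One way to organise that plumbing is to map caller-supplied options onto per-request settings in a single place. The sketch below does this with the JDK's HttpURLConnection rather than Apache HttpClient's HttpParams. The option keys (autoredirect, headers) and the RequestOptions class are hypothetical names chosen for illustration. If autoredirect is absent, the JDK default (follow, unless changed globally) stays in place, which mirrors HttpClient's behaviour when HANDLE_REDIRECTS is not set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;

// Hypothetical: turns caller-supplied options into per-request connection settings.
class RequestOptions {

    static HttpURLConnection open(String url, Map<String, ?> options) {
        try {
            // openConnection() only prepares the request; nothing is sent yet
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();

            // "autoredirect": if absent, keep the default behaviour (follow redirects)
            Object autoRedirect = options.get("autoredirect");
            if (autoRedirect instanceof Boolean) {
                conn.setInstanceFollowRedirects((Boolean) autoRedirect);
            }

            // "headers": extra request headers, each added as-is
            Object headers = options.get("headers");
            if (headers instanceof Map) {
                for (Map.Entry<?, ?> h : ((Map<?, ?>) headers).entrySet()) {
                    conn.setRequestProperty(String.valueOf(h.getKey()),
                                            String.valueOf(h.getValue()));
                }
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each new option only needs one more branch in open(), so adding the other header parameters stays incremental.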

Reprint notice: please credit 在路上 » 【Solved】How to set a parameter that disables automatic redirects in Java/Android network (crawler) requests
