
How to scrape pages that require login with C#

A note first: the code snippets came from the web, and I modified them myself. I think good things should be shared.

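First, the principle. When the site we want to scrape only shows a page after login, it does not matter whether the login is cookie-based or session-based: every request we send carries HTTP request headers, and the Cookie header is where the site expects its login information. (Session-based logins rely on a cookie too: the server keeps the session data, and the client only holds a session-ID cookie.) When the site receives the request, it reads the relevant cookie, or looks up the session that the cookie identifies, and the server-side code then decides whether you are allowed to access the current page.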
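To make the principle concrete, here is a small sketch (not part of the original article) showing what ends up in the Cookie header when cookies are stored in a CookieContainer. The host and the cookie names and values are the article's placeholders; CookieContainer.GetCookieHeader returns, roughly, the header value HttpWebRequest sends for the given URI.

```csharp
using System;
using System.Net;

// Sketch: cookies stored in a CookieContainer become the "Cookie"
// request header for URIs on the matching host and path.
public static class CookieHeaderDemo
{
    public static string BuildHeader()
    {
        var site = new Uri("http://www.ibest100.com");
        var container = new CookieContainer();
        // Placeholder values taken from the article's example.
        // No domain/path is given, so they default to this host and "/".
        container.Add(site, new Cookie("username", "youraccount"));
        container.Add(site, new Cookie("id", "yourid"));

        // The header value for a page on the same host,
        // e.g. "username=youraccount; id=yourid".
        return container.GetCookieHeader(new Uri("http://www.ibest100.com/some/page"));
    }
}
```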

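Once the principle is clear, the rest is easy. All we have to do when scraping (that is, when HttpWebRequest submits its request) is put the cookie information into the HTTP request headers.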
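The header can also be written by hand. The sketch below is an illustrative alternative, not one of the article's two methods: it joins name/value pairs into the `name=value; name2=value2` form that browsers send and assigns the result to the request's Cookie header. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Illustrative alternative to CookieContainer: build the Cookie header yourself.
public static class RawCookieHeader
{
    // Joins name/value pairs into the "name=value; name2=value2" form
    // that browsers send in the Cookie request header.
    public static string Build(IEnumerable<KeyValuePair<string, string>> cookies)
    {
        return string.Join("; ", cookies.Select(c => c.Key + "=" + c.Value));
    }

    // Creates a request that carries the given Cookie header.
    // CookieContainer is left null so the hand-written header is the only cookie source.
    public static HttpWebRequest CreateRequest(string url, string cookieHeader)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Headers[HttpRequestHeader.Cookie] = cookieHeader;
        return req;
    }
}
```

Setting the header by hand is convenient when a cookie string is pasted straight from a browser. Without a CookieContainer, however, cookies that the server sets in its responses are not captured automatically.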

Here are two ways to do it.

Method 1: put the cookie information directly into the HttpWebRequest's CookieContainer. Here is the code:

// Page code-behind members; the file needs:
// using System; using System.Collections.Generic; using System.IO; using System.Net; using System.Text;
protected void Page_Load(object sender, EventArgs e)
{
    // Cookie names and values copied from a logged-in session (placeholders)
    var cookies = new Dictionary<string, string>
    {
        { "username", "youraccount" },
        { "id", "yourid" }
    };
    this.Collect(cookies);
}

public void Collect(IDictionary<string, string> cookies)
{
    // Placeholder: the page that can only be scraped after logging in
    string url = "http://www.ibest100.com/需要登錄後才能采集的頁面";
    string host = "http://www.ibest100.com";
    try
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        // Fetching a page needs no request body, so a plain GET is enough
        // (the original sent an empty POST with a JSON content type)
        req.Method = "GET";
        // Put the cookies into a CookieContainer and attach it to the request;
        // HttpWebRequest turns them into the Cookie request header
        CookieContainer cc = new CookieContainer();
        foreach (var pair in cookies)
        {
            cc.Add(new Uri(host), new Cookie(pair.Key, pair.Value));
        }
        req.CookieContainer = cc;
        // GetResponse() is required: it sends the request and returns the page
        using (WebResponse wr = req.GetResponse())
        using (StreamReader reader = new StreamReader(wr.GetResponseStream(), Encoding.UTF8))
        {
            string t = reader.ReadToEnd();
            Response.Write(t);
        }
    }
    catch (Exception ex)
    {
        Response.Write("Exception in Collect: " + ex.Source + ": " + ex.Message);
    }
}
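For newer code, the same idea as method 1 can be expressed with HttpClient, swapped in here for HttpWebRequest (which is marked obsolete in .NET 6 and later). This is a sketch under that assumption, not the article's code; PresetCookieClient, CreateHandler and FetchAsync are hypothetical names, and the cookie names still have to match whatever the target site actually checks.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch of method 1 with HttpClient instead of HttpWebRequest.
public static class PresetCookieClient
{
    // Builds a handler whose CookieContainer already holds the known cookies.
    public static HttpClientHandler CreateHandler(Uri site, IDictionary<string, string> cookies)
    {
        var container = new CookieContainer();
        foreach (var pair in cookies)
        {
            container.Add(site, new Cookie(pair.Key, pair.Value));
        }
        return new HttpClientHandler { CookieContainer = container, UseCookies = true };
    }

    // Fetches a page; the handler attaches the matching cookies automatically.
    public static async Task<string> FetchAsync(HttpClientHandler handler, string url)
    {
        // disposeHandler: false keeps the handler (and its cookies) usable afterwards
        using (var client = new HttpClient(handler, disposeHandler: false))
        {
            return await client.GetStringAsync(url);
        }
    }
}
```

Because the handler owns both the cookie store and the connection pool, reusing one handler across calls keeps the cookies and avoids opening new connections for every page.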

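Method 2: every time the scraper starts, it first simulates a login on the target site to obtain a CookieContainer, and then scrapes with it. Here is the code: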

// Page code-behind member; the file needs:
// using System; using System.IO; using System.Net; using System.Text;
protected void Page_Load(object sender, EventArgs e)
{
    try
    {
        // One container shared by both requests: the login response's cookies are stored here
        CookieContainer cookieContainer = new CookieContainer();
        // Field names must match the target site's login form (change as needed)
        string formatString = "username={0}&password={1}";
        // Escape the values so characters such as '&' or '=' in a password do not break the form
        string postString = string.Format(formatString,
            Uri.EscapeDataString("youradminaccount"), Uri.EscapeDataString("yourpassword"));
        // Convert the form data to a byte array
        byte[] postData = Encoding.UTF8.GetBytes(postString);
        // Placeholder: the site's login page (change as needed)
        string URI = "http://www.ibest100.com/登錄頁面";
        HttpWebRequest request = WebRequest.Create(URI) as HttpWebRequest;
        request.Method = "POST";
        request.KeepAlive = false;
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookieContainer;
        request.ContentLength = postData.Length;
        // Send the form data
        using (Stream outputStream = request.GetRequestStream())
        {
            outputStream.Write(postData, 0, postData.Length);
        }
        // Getting the response is required: that is when the login cookies are stored in cookieContainer
        using (HttpWebResponse loginResponse = request.GetResponse() as HttpWebResponse)
        {
            // The login page body itself is not needed
        }
        // Placeholder: the page that requires login (change as needed)
        URI = "http://www.ibest100.com/需要登錄後才能采集的頁面";
        request = WebRequest.Create(URI) as HttpWebRequest;
        request.Method = "GET";
        request.KeepAlive = false;
        request.CookieContainer = cookieContainer;
        // Receive the protected page; gb2312 matches the example site's encoding
        string srcString;
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312")))
        {
            srcString = reader.ReadToEnd();
        }
        // Output or otherwise process the page
        Response.Write(srcString);
    }
    catch (WebException we)
    {
        Response.Write(we.Message);
    }
}
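Method 2 can be sketched the same way with HttpClient (again a substitution for the article's HttpWebRequest code, not the original). The key point is that one HttpClientHandler, and therefore one CookieContainer, serves both the login POST and the later GET. The form field names username and password are assumptions copied from the article's format string; LoginThenFetch and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch of method 2: log in once, then reuse the cookies for the protected page.
public static class LoginThenFetch
{
    // Builds the login body; FormUrlEncodedContent escapes characters such as
    // '&' or '=' that would break a hand-built "username={0}&password={1}" string.
    public static FormUrlEncodedContent BuildLoginForm(string user, string password)
    {
        return new FormUrlEncodedContent(new[]
        {
            // Field names are assumptions; copy them from the target site's login form.
            new KeyValuePair<string, string>("username", user),
            new KeyValuePair<string, string>("password", password),
        });
    }

    public static async Task<string> RunAsync(string loginUrl, string pageUrl, string user, string password)
    {
        // Cookies set by the login response are stored in this container
        // and sent automatically with the second request.
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using (var client = new HttpClient(handler))
        {
            using (var form = BuildLoginForm(user, password))
            using (HttpResponseMessage login = await client.PostAsync(loginUrl, form))
            {
                login.EnsureSuccessStatusCode();
            }
            return await client.GetStringAsync(pageUrl);
        }
    }
}
```

A 2xx status does not prove the login worked, since many sites answer a failed login with an ordinary page, so check the returned HTML or the stored cookies. GetStringAsync decodes using the charset declared in the response's Content-Type; for a GB2312 site that does not declare it, read the bytes with GetByteArrayAsync and decode them with Encoding.GetEncoding("gb2312"), which on .NET Core and .NET 5+ first requires Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).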

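Some may ask: what if the site's login requires a captcha? In that case use method 1. You just need to analyze the site's cookies first, for example by logging in once in a browser and noting the names and values of the cookies it sets.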
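For the captcha case, one hypothetical way to reuse a browser login with method 1 is to copy the Cookie request header from the browser's developer tools after logging in by hand, and load it into a CookieContainer. The sketch assumes the usual `name=value; name2=value2` format; BrowserCookieImport and FromHeader are illustrative names.

```csharp
using System;
using System.Net;

// Sketch: turn a Cookie header copied from a logged-in browser session
// into a CookieContainer usable by HttpWebRequest or HttpClientHandler.
public static class BrowserCookieImport
{
    public static CookieContainer FromHeader(Uri site, string cookieHeader)
    {
        var container = new CookieContainer();
        foreach (string part in cookieHeader.Split(';'))
        {
            string item = part.Trim();
            int eq = item.IndexOf('=');
            if (eq <= 0) continue; // skip empty or malformed pieces
            string name = item.Substring(0, eq).Trim();
            // Split on the first '=' only, so values like base64 "abc==" survive.
            string value = item.Substring(eq + 1).Trim();
            // Values containing ',' may be rejected with a CookieException unless quoted.
            container.Add(site, new Cookie(name, value));
        }
        return container;
    }
}
```

The resulting container can be assigned to req.CookieContainer exactly as in method 1. The copied cookies stop working when the site's session expires, so they have to be refreshed after that.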

Use cases: scraping data, posting to forums, publishing blog posts.

Thanks to the original article from the web. Editor: dezai

Reprinted from: http://www.aspnetjia.com
