
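URL Encoding

Editor: 關於VC++

The goal of this article is to design a C++ class that performs URL encoding. In a past project I needed to POST data from a VC++ 6.0 application, and that data had to be URL-encoded. I searched MSDN for a class or API that would URL-encode a given string but found none, so I had to design my own URLEncode C++ class.

URLEncoder.exe is an MFC dialog application that uses the URLEncode class.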

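How it works

Some special characters are troublesome to transmit over the Internet. URL encoding treats them specially so that every character can be transmitted safely.

For example, the carriage return has the ASCII value 13, and when FORM data is sent it is interpreted as the end of a line of data.

Applications normally use the HTTP or HTTPS protocol to transfer data between client and server. The server can receive data from the client in two basic ways:

1. The data can travel in the HTTP request (as cookies in the headers, or posted as FORM data)

2. The data can be included in the query part of the URL

When data is included in a URL, it must be encoded according to URL syntax. On the web server the data is decoded automatically. Consider the following URL, in which the data is passed as a query parameter.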

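For example: http://WebSite/ResourceName?Data=Data

WebSite is the host name in the URL

ResourceName can be the name of an ASP page or a Servlet

Data is the data to be sent. If the MIME type is Content-Type: application/x-www-form-urlencoded, the data must be encoded.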
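As an illustration of the URL form above, here is a minimal sketch that appends one encoded query parameter to a resource URL. The helper names encodeQueryValue and buildQueryUrl are hypothetical and not part of the article's class, and the rule used here (leave ASCII letters and digits alone, percent-encode every other byte) is a deliberately conservative assumption rather than the article's exact unsafe list.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: percent-encode every byte that is not an ASCII
// letter or digit (a conservative assumption for this sketch).
std::string encodeQueryValue(const std::string& value)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHexDigits[c >> 4];    // high nibble
            out += kHexDigits[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Hypothetical helper: build "resourceUrl?name=encodedValue".
std::string buildQueryUrl(const std::string& resourceUrl,
                          const std::string& name,
                          const std::string& value)
{
    return resourceUrl + "?" + name + "=" + encodeQueryValue(value);
}
```

Calling buildQueryUrl("http://WebSite/ResourceName", "Data", "a b&c") keeps the separator characters of the URL itself intact and encodes only the value, so a literal & inside the data cannot be mistaken for a parameter separator.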

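RFC 1738

RFC 1738 specifies that the characters in a Uniform Resource Locator (URL) must be a subset of the US-ASCII character set. HTML, on the other hand, allows the entire ISO-8859-1 (ISO-Latin) character set to be used in documents. This means that in data POSTed from an HTML FORM (or sent as part of a query string), every character outside the permitted subset must be encoded.

The ISO-8859-1 (ISO-Latin) character set

The table below covers the complete ISO-8859-1 (ISO-Latin) character set. For each character range (in decimal) it gives the type, the values, and whether the characters in that range are safe.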

Character range (decimal)  Type                      Values                  Safe/Unsafe
0-31                       ASCII control characters  (not printable)         Unsafe
32-47                      Reserved characters       (space)!"#$%&'()*+,-./  Unsafe
48-57                      ASCII digits              0-9                     Safe
58-64                      Reserved characters       :;<=>?@                 Unsafe
65-90                      ASCII letters             A-Z                     Safe
91-96                      Reserved characters       [\]^_`                  Unsafe
97-122                     ASCII letters             a-z                     Safe
123-126                    Reserved characters       {|}~                    Unsafe
127                        Control character         (DEL)                   Unsafe
128-255                    Non-ASCII characters      -                       Unsafe

All unsafe ASCII characters must be encoded, for example those in the ranges 32-47, 58-64, 91-96, and 123-126.
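The range rule from the table can be sketched as a small predicate. The function name isSafeByRange is hypothetical; it only restates the three ranges the table marks as safe and treats everything else as needing encoding.

```cpp
// Hypothetical helper: true only for the decimal ranges the table
// marks as safe (digits, uppercase letters, lowercase letters).
bool isSafeByRange(unsigned char code)
{
    return (code >= 48 && code <= 57)     // 0-9
        || (code >= 65 && code <= 90)     // A-Z
        || (code >= 97 && code <= 122);   // a-z
}
```

Taking an unsigned char keeps the codes 128-255 positive, so they fall through to the unsafe case instead of showing up as negative values.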

The following table describes why these characters are unsafe.

Character  Encoding  Why it is unsafe
<          %3C       Delimiter around URLs in free text
>          %3E       Delimiter around URLs in free text
"          %22       Delimits URLs in some systems
#          %23       Used in the World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it
{          %7B       Gateways and other transport agents are known to sometimes modify such characters
}          %7D       Gateways and other transport agents are known to sometimes modify such characters
|          %7C       Gateways and other transport agents are known to sometimes modify such characters
\          %5C       Gateways and other transport agents are known to sometimes modify such characters
^          %5E       Gateways and other transport agents are known to sometimes modify such characters
~          %7E       Gateways and other transport agents are known to sometimes modify such characters
[          %5B       Gateways and other transport agents are known to sometimes modify such characters
]          %5D       Gateways and other transport agents are known to sometimes modify such characters
`          %60       Gateways and other transport agents are known to sometimes modify such characters
+          %2B       Indicates a space (spaces cannot be used in a URL)
/          %2F       Separates directories and subdirectories
?          %3F       Separates the actual URL from the parameters
&          %26       Separator between parameters specified in the URL

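How it is implemented

URL-encoding a character means converting its 8-bit value to two hexadecimal digits and prefixing them with '%'. For example, the space character in the US-ASCII character set is 32 in decimal, or 20 in hexadecimal, so its URL encoding is %20.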
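The conversion just described can be sketched for a single byte as follows. The name percentEncodeByte is hypothetical, and the sketch uses std::string instead of MFC's CString so that it compiles without MFC.

```cpp
#include <string>

// Hypothetical helper: '%' followed by the two uppercase hexadecimal
// digits of the byte value.
std::string percentEncodeByte(unsigned char byte)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHexDigits[byte >> 4];    // high nibble: 32 >> 4 == 2
    out += kHexDigits[byte & 0x0F];  // low nibble:  32 & 15 == 0
    return out;
}
```

For the space character (decimal 32) the two nibbles are 2 and 0, which gives %20; the carriage return (decimal 13) mentioned earlier gives %0D.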

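URLEncode: URLEncode is a C++ class that URL-encodes a string. The CURLEncode class contains the following functions:

isUnsafe

decToHex

convert

URLEncode

The URLEncode() function performs the encoding. It examines each character to see whether it is safe. A safe character is appended to the result string unchanged; an unsafe character is first converted to % followed by its hexadecimal value and then appended.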
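The per-character loop described above is not part of the listing in this article, so here is a minimal portable sketch of it using std::string rather than CString. The free functions isUnsafe and urlEncode are stand-ins for the class members. The contents of kUnsafeChars are an assumption (the characters from the unsafe-reason table plus '%'), not the article's actual csUnsafeString, which is not shown.

```cpp
#include <string>

// Assumed unsafe list: characters from the unsafe-reason table plus '%'.
const std::string kUnsafeChars = "\"<>%#{}|\\^~[]`+/?&";

// A character is unsafe if it is listed above, or if its code is not
// strictly between 32 and 123.
bool isUnsafe(char ch)
{
    if (kUnsafeChars.find(ch) != std::string::npos)
        return true;
    int code = static_cast<unsigned char>(ch);
    return !(code > 32 && code < 123);
}

// Copy safe characters unchanged; replace unsafe ones with %XX.
std::string urlEncode(const std::string& input)
{
    static const char kHexDigits[] = "0123456789ABCDEF";
    std::string encoded;
    for (char ch : input) {
        if (isUnsafe(ch)) {
            unsigned char byte = static_cast<unsigned char>(ch);
            encoded += '%';
            encoded += kHexDigits[byte >> 4];
            encoded += kHexDigits[byte & 0x0F];
        } else {
            encoded += ch;
        }
    }
    return encoded;
}
```

Building the result in a separate string, rather than modifying the input in place, avoids re-examining the '%' characters that the encoder itself has just written.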

Code snippet:

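#include <afx.h>   // MFC: CString

class CURLEncode
{
private:
    static CString csUnsafeString;           // characters that are always encoded
    CString decToHex(char num, int radix);   // value -> two-digit hex string
    bool isUnsafe(char compareChar);         // true if the character must be encoded
    CString convert(char val);               // character -> "%XX"
public:
    CURLEncode() { }
    virtual ~CURLEncode() { }
    CString URLEncode(CString vData);        // encode a whole string
};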
bool CURLEncode::isUnsafe(char compareChar)
{
    // Characters listed in csUnsafeString are always unsafe.
    int unsafeLen = csUnsafeString.GetLength();
    for (int pos = 0; pos < unsafeLen; pos++)
    {
        if (csUnsafeString.GetAt(pos) == compareChar)
        {
            return true;
        }
    }
    // Otherwise the character is safe only if its code lies strictly
    // between 32 (space) and 123 ('{'). Control characters, the space,
    // '{' and above, and bytes with the high bit set are all unsafe.
    int char_ascii_value = (int) compareChar;
    if (char_ascii_value > 32 && char_ascii_value < 123)
    {
        return false;
    }
    return true;
}
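CString CURLEncode::decToHex(char num, int radix)
{
    // Digit lookup table. The snippet does not show how hexVals is
    // defined, so a local table is assumed here.
    static const char hexVals[] = "0123456789ABCDEF";
    CString csTmp;
    int num_char = (int) num;
    if (num_char < 0)
    {
        num_char = 256 + num_char;   // map a signed char onto 0-255
    }
    // Collect the digits least-significant first.
    while (num_char >= radix)
    {
        int temp = num_char % radix;
        num_char = num_char / radix;
        csTmp += hexVals[temp];      // append (the original overwrote csTmp here)
    }
    csTmp += hexVals[num_char];
    if (csTmp.GetLength() < 2)
    {
        csTmp += '0';                // becomes the leading zero after the reversal
    }
    // The digits were collected in reverse order, so reverse the string.
    csTmp.MakeReverse();
    return csTmp;
}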
CString CURLEncode::convert(char val)
{
    // "%" followed by the two-digit hex value, e.g. ' ' becomes "%20".
    CString csRet;
    csRet += "%";
    csRet += decToHex(val, 16);
    return csRet;
}

