
The purpose of a zero- or one-length array at the end of a struct


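I first ran into this long ago while reading Linux code, and later saw something similar in MFC's memory pool. I even wrote a small memory pool modeled on MFC's (MFC uses return this + 1). Later still I saw a similar definition in Li Xianjing's 《系統程序員成長計劃》 (Growth Plan for System Programmers), so I wanted to write a summary, only to find that others online had already summarized it very well. So I have reposted their work here. Thank you for sharing; it is what keeps me going!
One caveat deserves attention: a zero-length array such as char c[0] is not legal ISO C, including ISO/IEC 9899:1999. It is a GNU C extension that gcc accepts. What C99 did standardize is the flexible array member, written with empty brackets as char c[]. Microsoft's Visual Studio compilers issue a warning about a nonstandard extension. I have not tested whether the newest C/C++ compilers accept the zero-length form.
The main reason for ending a struct with a zero- or one-length array is to make a memory buffer easier to manage. If you use a pointer instead of an array, you must allocate twice: once for the struct and once more for the buffer the pointer refers to. The two blocks are then not contiguous, so each must be allocated and freed separately. With an array, a single allocation covers everything. Freeing works the same way: with an array, one free is enough; with a pointer, you must first free the buffer the pointer refers to and then free the struct, and the order cannot be reversed.
In short, the technique allocates one contiguous block of memory, which reduces memory fragmentation.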

The following is taken from googol4u's blog, originally titled "Arrays of length 0 or 1 at the end of a struct".

On Linux, /usr/include/linux/if_pppox.h contains this struct:
struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char tag_data[0];
} __attribute ((packed));
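The last member is a variable-length array. For TLV (Type-Length-Value) structures, or any other struct that needs a variable length, this is the best way to define it. It is very convenient to use: to create one, malloc a block as large as the struct plus the variable-length data; the variable part can then be accessed as an array; to release it, simply free the whole struct. For example: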
struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag)+sizeof(char)*sample_tag_len);
sample_tag->tag_type = 0xffff;
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data[0]=....
...
To free it:
free(sample_tag);

Could char *tag_data be used instead? In fact the two are very different. To illustrate, I wrote the following program:
Example 1: test_size.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
5
10 struct tag1
20 {
30 int a;
40 int b;
50 }__attribute ((packed));
60
70 struct tag2
80 {
90 int a;
100 int b;
110 char *c;
120 }__attribute ((packed));
130
140 struct tag3
150 {
160 int a;
170 int b;
180 char c[0];
190 }__attribute ((packed));
200
210 struct tag4
220 {
230 int a;
240 int b;
250 char c[1];
260 }__attribute ((packed));
270
280 int main(void)
290 {
300 struct tag2 l_tag2;
310 struct tag3 l_tag3;
320 struct tag4 l_tag4;
330
340 memset(&l_tag2,0,sizeof(struct tag2));
350 memset(&l_tag3,0,sizeof(struct tag3));
360 memset(&l_tag4,0,sizeof(struct tag4));
370
380 printf("size of tag1 = %zu\n",sizeof(struct tag1));
390 printf("size of tag2 = %zu\n",sizeof(struct tag2));
400 printf("size of tag3 = %zu\n",sizeof(struct tag3));
405 printf("size of tag4 = %zu\n",sizeof(struct tag4));
410
420 printf("l_tag2 = %p,&l_tag2.c = %p,l_tag2.c = %p\n",(void *)&l_tag2,(void *)&l_tag2.c,(void *)l_tag2.c);
430 printf("l_tag3 = %p,l_tag3.c = %p\n",(void *)&l_tag3,(void *)l_tag3.c);
440 printf("l_tag4 = %p,l_tag4.c = %p\n",(void *)&l_tag4,(void *)l_tag4.c);
450 exit(0);
460 }

__attribute ((packed)) turns off 4-byte alignment padding, which makes the result easier to read.
The program prints:
size of tag1 = 8
size of tag2 = 12
size of tag3 = 8
size of tag4 = 9
l_tag2 = 0xbffffad0,&l_tag2.c = 0xbffffad8,l_tag2.c = (nil)
l_tag3 = 0xbffffac8,l_tag3.c = 0xbffffad0
l_tag4 = 0xbffffabc,l_tag4.c = 0xbffffac4

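From the program and its output we can see the following. tag1 contains two 32-bit integers, so it takes 8 bytes. tag2 contains two 32-bit integers plus a char * pointer (4 bytes on this 32-bit machine), so it takes 12 bytes. tag3 is where the difference between char c[0] and char *c really shows: the c in char c[0] is not a pointer but a name for an offset, namely the address of the space immediately after a and b, so it occupies no space of its own. tag4 confirms this: its c sits at the same offset of 8, and the struct grows only by the single char element. So if the last member of struct pppoe_tag above were defined as char *tag_data, it would not only cost 4 extra bytes for the pointer variable but would also be less convenient to use:
Method 1: on creation, first allocate a block for struct pppoe_tag and then allocate another block for tag_data. On release, first free the memory used by tag_data and then free the memory used by pppoe_tag.
Method 2: on creation, allocate a single block as large as struct pppoe_tag plus the tag_data payload. As line 420 of Example 1 shows, tag_data must then be initialized explicitly so that it points at the memory right after struct pppoe_tag.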
struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char *tag_data;
} __attribute ((packed));

struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;
Method 1:
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag));
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = (char *)malloc(sizeof(char)*sample_tag_len);
sample_tag->tag_data[0]=...
To free:
free(sample_tag->tag_data);
free(sample_tag);

Method 2:
sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag)+sizeof(char)*sample_tag_len);
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = ((char *)sample_tag)+sizeof(struct pppoe_tag);
sample_tag->tag_data[0]=...
To free:
free(sample_tag);
So whichever method is used, neither is as convenient as the char tag_data[0] definition.
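After all this, the underlying issue is really the difference between arrays and pointers in C. This is also the memory-management point made earlier: an array member occupies contiguous space directly after the other members of the struct, whereas a pointer refers to a contiguous block allocated at some unrelated address. Are the a in char a[1] and the b in char *b the same? 《Programming Abstractions in C》 (Roberts, E. S., China Machine Press, 2004.6), page 82, says: "arr is defined to be identical to &arr[0]". In other words, the a in char a[1] is not a variable that stores an address; in an expression it simply evaluates to &a[0] and cannot be assigned to. char *b, by contrast, is a real pointer variable. So a=b is not allowed, while b=a is. Both support subscript access, so is there any fundamental difference between a[0] and b[0]? An example shows it.
Example 2: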
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char a[10];
    char *b;

    a[2]=0xfe;
    b[2]=0xfe;   /* b is uninitialized: compile only to inspect the assembly, do not run */
    exit(0);
}

After compiling, objdump shows the following assembly:
080483f0 <main>:
80483f0: 55 push %ebp
80483f1: 89 e5 mov %esp,%ebp
80483f3: 83 ec 18 sub $0x18,%esp
80483f6: c6 45 f6 fe movb $0xfe,0xfffffff6(%ebp)
80483fa: 8b 45 f0 mov 0xfffffff0(%ebp),%eax
80483fd: 83 c0 02 add $0x2,%eax
8048400: c6 00 fe movb $0xfe,(%eax)
8048403: 83 c4 f4 add $0xfffffff4,%esp
8048406: 6a 00 push $0x0
8048408: e8 f3 fe ff ff call 8048300 <_init+0x68>
804840d: 83 c4 10 add $0x10,%esp
8048410: c9 leave
8048411: c3 ret
8048412: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
8048419: 8d bc 27 00 00 00 00 lea 0x0(%edi,1),%edi

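As the listing shows, a[2]=0xfe uses direct addressing: 0xfe is written straight to the address &a[0]+2. b[2]=0xfe uses indirect addressing: the content of b (an address) is loaded first, 2 is added to it, and only then is 0xfe written to the computed address. So a[0] and b[0] are fundamentally different.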
But when an array is passed as a function parameter, it is no different from a pointer.
int do1(char a[],int len);
int do2(char *a,int len);
The a in these two functions is exactly the same: in both cases it is a real pointer variable.
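One more remark: the last member of struct pppoe_tag is defined as char tag_data[0], but some compilers do not support zero-length arrays. In that case it can only be defined as char tag_data[1]; it is used in the same way.
The OpenOffice source code contains the following data structure, a Unicode string. It ends with an array of length 1, probably for compatibility or portability across compilers.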

typedef struct _rtl_uString
{
    sal_Int32 refCount;
    sal_Int32 length;
    sal_Unicode buffer[1];
} rtl_uString;
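This is a variable-length string. Roughly, it is used like this:

rtl_uString * str = malloc(256);
str->length = 256;
str->buffer now refers to a buffer of 256 - 8 = 248 bytes, since the two sal_Int32 fields take 8 bytes. Note that this is only a rough illustration: if length is meant to count characters rather than bytes, it should be set to the number of sal_Unicode characters actually stored, not 256.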



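Summary: the reposted article above makes it clear that the advantage of this technique is simpler memory management. The header and the buffer are allocated as one block, so the trailing array takes us straight to the buffer with no gap in between. This is very common on Linux. On Windows I have only seen something similar in MFC and cannot recall other cases. In MFC the idea is that the pointer to the allocated struct (this) can simply be incremented by 1 to reach the actual memory (for details see the post on memory pool techniques in the CE category of my blog). It took me quite a while to work that out at the time. Many things that look complicated turn out to rest on fundamentals, so build a solid foundation; that is the precondition for any tall building. Learning is the same: do not aim too high too soon, but move forward step by step, summarize what you have learned from time to time, and keep checking theory against practice. That is how you go further and see better scenery.
Finally, thanks again to everyone who shares so generously online!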

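Flexible array members
"In C99, the last element of a struct may be an array of unknown size; this is called a flexible array member, but it must be preceded by at least one other member. A flexible array member lets a struct contain an array of variable size. The struct size returned by sizeof does not include the memory for the flexible array. A struct containing a flexible array member is allocated dynamically with malloc(), and the allocation should be larger than the size of the struct to accommodate the expected size of the flexible array."
(from 《C語言大全》, "Flexible array members")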

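Now let's look at flexible array members in the C99 standard:
A neat use of variable-length structs: the zero-element array
Sometimes we need a struct whose length can vary. How can that be done?
Look at this struct definition: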
typedef struct st_type
{
int nCnt;
int item[0];
}type_a;
(Some compilers reject this and fail to compile; in that case it can be changed to:)
typedef struct st_type
{
int nCnt;
int item[];
}type_a;
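This gives us a variable-length struct. sizeof(type_a) is only 4, that is, sizeof(nCnt) = sizeof(int), assuming a 4-byte int; the zero-element array takes no space. We can then make the struct variable-length.
C version:
type_a *p = (type_a*)malloc(sizeof(type_a)+100*sizeof(int));
C++ version:
type_a *p = (type_a*)new char[sizeof(type_a)+100*sizeof(int)];
This gives us a type_a with room for 100 items, and p->item[n] accesses the variable-length elements. The principle is simple: once more memory than sizeof(type_a) has been allocated, int item[]; becomes meaningful. It names the memory right after int nCnt; and needs no storage of its own, while the extra memory allocated is accessed through it. It is a very handy trick.
Freeing is just as simple:
C version:
free(p);
C++ version:
delete [] (char*)p; // allocated as char[], so it must be deleted as char[]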
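This is actually called a flexible array member. C89 does not support it; C99 added it to the standard as a special case. However, what C99 supports is an incomplete type, not a zero-sized array: a declaration of the form int item[0]; is not legal, and the form C99 supports is int item[];. Some compilers simply accept int item[0]; as a nonstandard extension, and that extension already existed before C99 was published. After C99, some compilers came to treat the two forms as the same thing.
Here is the relevant text from C99: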
6.7.2.1 Structure and union specifiers

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length. Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
For example, in VC++6 either form compiles and works, but the compiler emits warning C4200: nonstandard extension used : zero-sized array in struct/union.
In Dev-C++ both forms can be used as well, without any warning.