
.NET Garbage Collection and the Improvements CLR 4.0 Brings to It, Part 2

Editor: .NET實例教程

A survey of garbage collection and the changes CLR 4.0 brings, Part 2, from the series on what is new in CLR 4.0.

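This post continues the previous one, ".NET Garbage Collection and the Improvements CLR 4.0 Brings to It, Part 1".

The changes that CLR 4.0 brings are still not covered in this post; please see the next one.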

Memory release and compaction

After building the reference graph, the garbage collector reclaims every object that is not in the graph, that is, every object that is no longer needed. Releasing those objects leaves the managed heap fragmented. The garbage collector then scans the managed heap for contiguous blocks of free memory and slides the surviving objects down to lower addresses so that the free space forms one consecutive block. Every reference to a moved object is adjusted to point to the object's new location. The effect resembles tamping down the managed heap.
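The sliding step can be pictured with a toy model. The sketch below is not how the CLR is implemented; `ToyHeap`, `Compact`, `UpdateReferences`, and the array-of-slots layout are hypothetical names chosen only for illustration. It slides live objects toward index 0 and returns a forwarding table (old address to new address) that is then used to fix up references.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: the "heap" is an array of object names, null = free slot.
public static class ToyHeap
{
    // Slides every live object (live[i] == true) to the lowest free address
    // and returns a forwarding table: old index -> new index.
    public static Dictionary<int, int> Compact(string[] heap, bool[] live)
    {
        var forwarding = new Dictionary<int, int>();
        int next = 0; // next free slot at the low end of the heap

        for (int i = 0; i < heap.Length; i++)
        {
            if (heap[i] == null || !live[i]) continue; // free slot or garbage
            forwarding[i] = next;
            heap[next] = heap[i];
            next++;
        }

        // Everything from 'next' upward is now one contiguous free block.
        for (int i = next; i < heap.Length; i++) heap[i] = null;
        return forwarding;
    }

    // Rewrites references (stored as heap indexes) to the new locations.
    // Assumes every reference points at an object that survived.
    public static void UpdateReferences(int[] references, Dictionary<int, int> forwarding)
    {
        for (int i = 0; i < references.Length; i++)
            references[i] = forwarding[references[i]];
    }
}
```

A caller would mark the live objects first, call `Compact`, and then pass every stored reference through `UpdateReferences`.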

Next comes the concept of generations, which was introduced to improve the overall performance of the garbage collector.

Generations

Consider what would happen if the garbage collector scanned every object in the managed heap on every collection cycle. Would it be slow? Yes, and that is why Microsoft introduced the concept of generations.

Why do generations improve the performance of the garbage collector? Microsoft studied a large body of real-world programs and made a few assumptions that hold for the great majority of programming practice:

The newer an object is, the shorter its lifetime will be.

The older an object is, the longer its lifetime will be.

Newer objects tend to have strong relationships to each other and are frequently accessed around the same time.

Compacting a portion of the heap is faster than compacting the whole heap.

With generations, most garbage collection work can be confined to a smaller region of memory, which improves the performance of the garbage collector.

Let's see how the garbage collector actually implements generations:

Generation 0: newly created objects, that is, objects that have never been through a garbage collection.

Generation 1: objects that survived a generation 0 collection.

Generation 2: objects that survived a generation 1 or generation 2 collection. Generation 2 is the highest generation the garbage collector supports, so an object that survives a generation 2 collection stays in generation 2.

When the program starts, the garbage collector allocates a block of memory for each generation. The initial sizes are determined by the policies of the .NET Framework. The garbage collector records the start address and size of each of the three blocks. The blocks are contiguous and adjacent: generation 2 sits below generation 1, and generation 1 sits below generation 0.

The CLR always allocates new managed objects in generation 0. If generation 0 has enough memory, the CLR simply advances a pointer to complete the allocation, which is very fast. When generation 0 does not have enough memory for a new object, the CLR triggers the garbage collector to reclaim the objects in generation 0 that are no longer needed. When the collection finishes, the garbage collector tamps (compacts) the surviving generation 0 objects down to lower addresses, moves the pointer to the start of the free space, and updates the affected references to the new addresses. At this point generation 0 is empty, because the objects that survived have moved to generation 1.
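The pointer-advance allocation described above can be sketched as a toy model. `ToyGen0` and its members are hypothetical names, and the model assumes, as the text does, that every survivor of a generation 0 collection moves to generation 1, leaving generation 0 empty.

```csharp
using System;

// Toy model of pointer-bump allocation in generation 0; sizes are in bytes.
public sealed class ToyGen0
{
    private readonly int capacity;
    private int nextFree;                        // the allocation pointer

    public int Collections { get; private set; } // collections triggered so far

    public ToyGen0(int capacity) { this.capacity = capacity; }

    // Returns the address (offset inside generation 0) of the new object.
    public int Allocate(int size)
    {
        if (size <= 0 || size > capacity) throw new ArgumentOutOfRangeException("size");

        if (nextFree + size > capacity)
        {
            // Not enough room: a generation 0 collection runs. Survivors are
            // assumed to move to generation 1, so generation 0 becomes empty
            // and the pointer returns to the start of the block.
            Collections++;
            nextFree = 0;
        }

        int address = nextFree;
        nextFree += size;                        // allocation is just a pointer bump
        return address;
    }
}
```

The point of the sketch is that the common path is a bounds check and an addition; the expensive work happens only when the block is full.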

Collecting only generation 0 is a partial collection, which differs from the full collection described earlier (a difference that exists because of generations). A generation 0 collection also starts from the roots and looks for referenced objects, but the following steps are different. When the garbage collector finds a root that points to an address in generation 1 or generation 2, it ignores that root and moves on to the next one. When it finds a root that points to a generation 0 object, it adds the object to the graph. This way the garbage collector processes only the garbage in generation 0.

This approach has a precondition: the application must not have written to generation 1 or generation 2 memory in a way that makes an object in generation 1 or 2 refer to an object in generation 0. In practice an application may well write to generation 1 or generation 2 memory. To handle this, the CLR keeps a dedicated data structure, the card table, that records whether the application has written to generation 1 or generation 2 memory. If the application wrote to that memory before the generation 0 collection, the generation 1 or 2 objects recorded in the card table are also treated as roots during the generation 0 collection. Only then can generation 0 be collected correctly.
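The role of the card table can be sketched with a toy model. This is an illustration under simplifying assumptions, not the CLR's mechanism: a real card table tracks ranges of memory, whereas the hypothetical `ToyGen0Collector` below simply remembers which older objects were written to. `ToyObject`, `WriteField`, and `MarkGen0` are likewise invented names.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: objects carry their generation and a list of references.
public sealed class ToyObject
{
    public readonly string Name;
    public int Generation;                       // 0, 1 or 2
    public readonly List<ToyObject> Fields = new List<ToyObject>();

    public ToyObject(string name, int generation) { Name = name; Generation = generation; }
}

public sealed class ToyGen0Collector
{
    // Stand-in for the card table: generation 1/2 objects that were written to.
    private readonly HashSet<ToyObject> dirtyOldObjects = new HashSet<ToyObject>();

    // Every reference store goes through here so that writes are recorded.
    public void WriteField(ToyObject owner, ToyObject value)
    {
        owner.Fields.Add(value);
        if (owner.Generation > 0) dirtyOldObjects.Add(owner);
    }

    // Returns the generation 0 objects that survive a generation 0 collection.
    public HashSet<ToyObject> MarkGen0(IEnumerable<ToyObject> roots)
    {
        var survivors = new HashSet<ToyObject>();

        foreach (var root in roots)
            Mark(root, survivors);               // roots into generation 1/2 are ignored by Mark

        // Recorded older objects act as additional roots.
        foreach (var old in dirtyOldObjects)
            foreach (var field in old.Fields)
                Mark(field, survivors);

        return survivors;
    }

    private static void Mark(ToyObject obj, HashSet<ToyObject> survivors)
    {
        // Stay inside generation 0 and skip objects that are already marked.
        if (obj.Generation != 0 || !survivors.Add(obj)) return;
        foreach (var field in obj.Fields) Mark(field, survivors);
    }
}
```

Without the recorded writes, a generation 0 object referenced only from generation 2 would be missed, because the root that leads to it is skipped.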

The paragraphs above named one trigger for a generation 0 collection: generation 0 does not have enough memory for a new object. Calling GC.Collect() also collects generation 0, because it forces a collection of all generations. In addition, the garbage collector maintains a threshold for each generation, and a generation 0 collection starts when generation 0 memory reaches its threshold. A generation 1 collection happens when GC.Collect(1) is called or when generation 1 memory reaches the generation 1 threshold. Generation 2 has similar triggers. Collecting generation 1 also collects generation 0, and collecting generation 2 also collects generations 1 and 0. Objects that survive a collection of generation n are moved to generation n+1; if n = 2, the survivors stay in generation 2.
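Promotion can be observed from managed code with the `GC` class. The following is a minimal sketch; `GenerationDemo` and `TrackPromotion` are hypothetical names, while `GC.GetGeneration`, `GC.Collect(int)`, `GC.KeepAlive`, and `GC.MaxGeneration` are existing framework members. The generations actually reported depend on the runtime and its configuration, so the code records them rather than assuming particular values.

```csharp
using System;

public static class GenerationDemo
{
    // Returns the generation reported for one object when it is first
    // allocated and after each of two forced collections.
    public static int[] TrackPromotion()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);   // freshly allocated
        GC.Collect(0);                          // collect generation 0 only
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect(1);                          // collect generations 0 and 1
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor);                 // keep the object reachable until here
        return seen;
    }
}
```

Printing the result of `GenerationDemo.TrackPromotion()` next to `GC.MaxGeneration` from a console program shows how far the object was promoted; on the Microsoft CLR one would typically expect the values to climb toward 2, but that is an observation to verify rather than a guarantee.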

Finalization of objects

Finalization is a safety net for the case where the programmer forgets to clean up resources with a method such as Close or Dispose. Take the class below: when an instance is created, its memory is allocated in generation 0 and a reference to the object is added to the finalization queue maintained by the CLR.

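using System;

public class BaseObj
{
    public BaseObj() { }

    // C# does not allow overriding Object.Finalize directly;
    // the destructor syntax below is compiled into a Finalize override.
    ~BaseObj()
    {
        // Perform resource cleanup here,
        // for example close a file or a network connection.
        Console.WriteLine("In Finalize.");
    }
}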

When the object becomes garbage, the garbage collector moves its reference from the finalization queue to the to-be-finalized queue (Jeffrey Richter calls it the freachable queue), and the object is added back to the reference graph. A dedicated CLR thread takes objects from the to-be-finalized queue one by one and runs each object's Finalize method to clean up resources. For this reason the object is not reclaimed right away. Only after its Finalize method has finished is its reference removed from the to-be-finalized queue, and the garbage collector reclaims its memory in the next collection.

The GC class has two public static methods worth knowing about: GC.ReRegisterForFinalize and GC.SuppressFinalize. ReRegisterForFinalize adds a reference to the object to the finalization queue, meaning that the object needs to be finalized. SuppressFinalize removes the object's reference from the finalization queue, so the CLR will not run the object's Finalize method.

Because an object with a Finalize method is added to the finalization queue automatically when it is created with new, there are few occasions to use ReRegisterForFinalize. The typical one is resurrection. Resurrection means that the Finalize method makes a root point to the object again, so the object becomes reachable and will not be collected. At that point, however, the object's reference is no longer in the finalization queue, so ReRegisterForFinalize is needed to add it back. Since resurrection itself is rare in real applications, ReRegisterForFinalize is rarely used as well.

By comparison, SuppressFinalize is used more often. It applies when a class implements both a Finalize method and a Dispose() method to release resources. When Dispose() calls GC.SuppressFinalize(this), the CLR will not run the Finalize method. Finalization is a safety net for the case where the programmer forgets to clean up resources with Close or Dispose. If the programmer does call Dispose(), Finalize() will not run and release the resources a second time. If the programmer forgets to call Dispose(), the Finalize method is the last line of defense that makes sure the resources are released. Together the two form a double safeguard.
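The pairing of Dispose() and Finalize described above can be sketched as follows. This is a simplified illustration, not the full recommended Dispose pattern; `ResourceHolder`, `ReleaseResource`, and `ReleaseCount` are hypothetical names, and the counter exists only to make the single release visible.

```csharp
using System;

// Simplified sketch: Dispose releases the resource and suppresses
// finalization; the finalizer is the fallback if Dispose is never called.
public class ResourceHolder : IDisposable
{
    private bool released;

    // Illustration only: counts how many times the resource was released.
    public int ReleaseCount { get; private set; }

    public void Dispose()
    {
        ReleaseResource();
        GC.SuppressFinalize(this);   // the finalizer no longer needs to run
    }

    ~ResourceHolder()
    {
        ReleaseResource();           // safety net when Dispose was forgotten
    }

    private void ReleaseResource()
    {
        if (released) return;        // never release twice
        released = true;
        ReleaseCount++;
        // Close the file, network connection, or other resource here.
    }
}
```

Wrapping the object in a `using` statement calls Dispose() automatically at the end of the block, which is usually the simplest way to make sure the finalizer is never needed.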

Finalization has a significant impact on the performance of the garbage collector, and the CLR cannot guarantee when or in what order Finalize methods are called. Avoid finalization where you can, and clean up resources with a custom method or with a method named Close or Dispose instead. Consider implementing the IDisposable interface and putting the cleanup code in the body of the Dispose method.

Large object heap

The large object heap is reserved for objects of 85,000 bytes or more. Its initial memory block usually lies above the generation 0 block and is not adjacent to it. Generations 0, 1, and 2 together are called the small object heap. When the CLR allocates a new object, it allocates the object in generation 0 if the object is smaller than 85,000 bytes, and in the large object heap if the object is 85,000 bytes or larger.
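One way to see the split from managed code is to ask the runtime which generation it reports for a new array. The sketch below uses the existing `GC.GetGeneration` method; `LargeObjectDemo` and `GenerationOfNewArray` are hypothetical names. It assumes, as is the case on the Microsoft CLR to my knowledge, that objects on the large object heap are reported as belonging to the highest generation.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array of the given length and returns the
    // generation the runtime reports for it right after allocation.
    public static int GenerationOfNewArray(int length)
    {
        var buffer = new byte[length];
        int generation = GC.GetGeneration(buffer);
        GC.KeepAlive(buffer);        // keep the array reachable until here
        return generation;
    }
}
```

Comparing `GenerationOfNewArray(1000)` with `GenerationOfNewArray(100000)` shows whether the second array was placed outside generation 0.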

Because large objects are big, collecting them is expensive, so large objects are collected only during a generation 2 collection. Collection of large objects also starts from the roots and searches for reachable objects; large objects that are not reachable can be reclaimed. After reclaiming them, the garbage collector does not compact the large object heap, because moving large objects is costly. Instead it uses a data structure called the free object table to record which memory ranges can be reused. Two adjacent reclaimed large objects are recorded as a single entry in the free object table. The ranges recorded in the table can then be allocated to new large objects.

Allocating and collecting large objects costs more than allocating and collecting small objects, so in practice it is best to avoid allocating a large object and releasing it shortly afterward. Consider allocating a pool of large objects and reusing it rather than reclaiming large objects frequently.
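A pool of large buffers can be sketched in a few lines. `LargeBufferPool`, `Rent`, `Return`, and `Allocations` are hypothetical names; the class is deliberately minimal, is not thread-safe, and never shrinks, so treat it as an illustration of the idea rather than a production design.

```csharp
using System;
using System.Collections.Generic;

// Minimal, non-thread-safe pool of equally sized byte buffers.
public sealed class LargeBufferPool
{
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly int bufferSize;

    // Number of buffers actually allocated (as opposed to reused).
    public int Allocations { get; private set; }

    public LargeBufferPool(int bufferSize)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");
        this.bufferSize = bufferSize;
    }

    // Hands out a pooled buffer, allocating a new one only when the pool is empty.
    public byte[] Rent()
    {
        if (free.Count > 0) return free.Pop();
        Allocations++;
        return new byte[bufferSize];
    }

    // Puts a buffer back so that a later Rent call can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", "buffer");
        Array.Clear(buffer, 0, buffer.Length);   // do not leak old contents
        free.Push(buffer);
    }
}
```

Because returned buffers stay referenced by the pool, they are not reclaimed and reallocated on every use, which is the behavior the paragraph above recommends. Newer versions of .NET also ship a ready-made pool, `System.Buffers.ArrayPool<T>`.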