這篇文章主要介紹TensorFlow內(nèi)存管理bfc算法的示例分析,文中介紹的非常詳細(xì),具有一定的參考價(jià)值,感興趣的小伙伴們一定要看完!
創(chuàng)新互聯(lián)公司-專業(yè)網(wǎng)站定制、快速模板網(wǎng)站建設(shè)、高性價(jià)比涿鹿網(wǎng)站開發(fā)、企業(yè)建站全套包干低至880元,成熟完善的模板庫,直接使用。一站式涿鹿網(wǎng)站制作公司更省心,省錢,快速模板網(wǎng)站建設(shè)找我們,業(yè)務(wù)覆蓋涿鹿地區(qū)。費(fèi)用合理售后完善,10余年實(shí)體公司更值得信賴。1. 基本介紹
tensorflow設(shè)備內(nèi)存管理模塊實(shí)現(xiàn)了一個(gè)best-fit with coalescing算法(后文簡稱bfc算法)。
bfc算法是Doung Lea's malloc(dlmalloc)的一個(gè)非常簡單的版本。
它具有內(nèi)存分配、釋放、碎片管理等基本功能。
2. bfc基本算法思想
1. 數(shù)據(jù)結(jié)構(gòu)
整個(gè)內(nèi)存空間由一個(gè)按基址升序排列的Chunk雙向鏈表來表示,它們的直接前趨和后繼必須在地址連續(xù)的內(nèi)存空間。Chunk結(jié)構(gòu)體里含有實(shí)際大小、請(qǐng)求大小、是否被占用、基址、直接前趨、直接后繼、Bin索引等信息。
2. 申請(qǐng)
用戶申請(qǐng)一個(gè)內(nèi)存塊(malloc)。根據(jù)chunk雙鏈表找到一個(gè)合適的內(nèi)存塊,如果該內(nèi)存塊的大小是用戶申請(qǐng)的大小的二倍以上,那么就將該內(nèi)存塊切分成兩塊,這就是split操作。
返回其中一塊給用戶,并將該內(nèi)存塊標(biāo)識(shí)為占用
Spilt操作會(huì)新增一個(gè)chunk,所以需要修改chunk雙鏈表以維持前驅(qū)和后繼關(guān)系
如果用戶申請(qǐng)512的空間,正好有一塊1024的chunk2是空閑的,由于1024/512 =2,所以chunk2 被split為2塊:chunk2_1和chunk2_2。返回chunk2_1給用戶并將其標(biāo)志位占用狀態(tài)。
3. 釋放
用戶釋放一個(gè)內(nèi)存塊(free)。先將該塊標(biāo)記為空閑。然后根據(jù)chunk數(shù)據(jù)結(jié)構(gòu)中的信息找到其前驅(qū)和后繼內(nèi)存塊。如果前驅(qū)和后繼塊中有空閑的塊,那么將剛釋放的塊和空閑的塊合并成一個(gè)更大的chunk(這就是merge操作,合并當(dāng)前塊和其前后的空閑塊)。再修改雙鏈表結(jié)構(gòu)以維持前驅(qū)后繼關(guān)系。這就做到了內(nèi)存碎片的回收。
如果用戶要free chunk3,由于chunk3的前驅(qū)chunk2也是空閑的,所以將chunk2和chunk3合并得到一個(gè)新的chunk2',大小為chunk2和chunk3之和。
3. bins
1. bins數(shù)據(jù)結(jié)構(gòu)
bfc算法采取的是被動(dòng)分塊的策略。最開始整個(gè)內(nèi)存是一個(gè)chunk,隨著用戶申請(qǐng)空間的次數(shù)增加,最開始的大chunk會(huì)被不斷的split開來,從而產(chǎn)生越來越多的小chunk。當(dāng)chunk數(shù)量很大時(shí),為了尋找一個(gè)合適的內(nèi)存塊而遍歷雙鏈表無疑是一筆巨大的開銷。為了實(shí)現(xiàn)對(duì)空閑塊的高效管理,bfc算法設(shè)計(jì)了bin這個(gè)抽象數(shù)據(jù)結(jié)構(gòu)。
每個(gè)bin都有一個(gè)size屬性,一個(gè)bin是一個(gè)擁有chunk size >= binsize的空閑chunk的集合。集合中的chunk按照chunk size的升序組織成單鏈表。bfc算法維護(hù)了一個(gè)bin的集合:bins。它由多個(gè)bin以及從屬于每個(gè)bin的chunks組成。內(nèi)存中所有的空閑chunk都由bins管理。
圖中每一列表示一個(gè)bin,列首方格中的數(shù)字表示bin的size。bin size的大小都是256的2^n的倍。每個(gè)bin下面掛載了一系列的空閑chunk,每個(gè)chunk的chunk size都大于等于所屬的bin的bin size,按照chunk size的升序掛載成單鏈表。
2. bins操作
bfc算法針對(duì)bins這個(gè)集合設(shè)計(jì)了三個(gè)操作:search、insert、delete。
search
給定一個(gè)chunk size,從bins中找到大于等于該chunksize的最小的那個(gè)空閑chunk。Search操作具體流程如下。如果bin以數(shù)組的形式組織,那么可以從index = chunk size /256 >>2的那個(gè)bin開始查找。最好的情況是開始查找的那個(gè)bin的chunk鏈表非空,那么直接返回鏈表頭即可。這種情況時(shí)間復(fù)雜度是常數(shù)級(jí)的。最壞的情況是遍歷bins數(shù)組中所有的bin。對(duì)于一般大小的內(nèi)存來說,bins數(shù)組元素非常少,比如4G空間只需要23個(gè)bin就足夠了(256 * 2 ^ 23 > 4G),因此也很快能返回結(jié)果??傮w來說search操作是非常高效的。對(duì)于固定大小內(nèi)存來說,查找時(shí)間是常數(shù)量級(jí)的。
insert
將一個(gè)空閑的chunk插入到一個(gè)bin所掛載的chunk鏈表中,同時(shí)需要維持chunk鏈表的升序關(guān)系。具體流程是直接將chunk插入到index = chunk size /256 >>2的那個(gè)bin中即可。
delete
將一個(gè)空閑的chunk從bins中移除。
4. 總結(jié)
將內(nèi)存分塊管理,按塊進(jìn)行空間分配和釋放。
通過split操作將大內(nèi)存塊分解成用戶需要的小內(nèi)存塊。
通過merge操作合并小的內(nèi)存塊,做到內(nèi)存碎片回收
通過bin這個(gè)抽象數(shù)據(jù)結(jié)構(gòu)實(shí)現(xiàn)對(duì)空閑塊高效管理。
5. 代碼分析
1. 代碼地址
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime
2. 數(shù)據(jù)結(jié)構(gòu)
Chunk
static const int kInvalidChunkHandle = -1; ... struct Chunk { size_t size = 0; // Full size of buffer. // We sometimes give chunks that are larger than needed to reduce // fragmentation. requested_size keeps track of what the client // actually wanted so we can understand whether our splitting // strategy is efficient. size_t requested_size = 0; // allocation_id is set to -1 when the chunk is not in use. It is assigned a // value greater than zero before the chunk is returned from // AllocateRaw, and this value is unique among values assigned by // the parent allocator. int64 allocation_id = -1; void* ptr = nullptr; // pointer to granted subbuffer. // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly // preceding the memory used by this chunk. E.g., It should start // at 'ptr - prev->size' ChunkHandle prev = kInvalidChunkHandle; // If not kInvalidChunkHandle, the memory referred to by 'next' is directly // following the memory used by this chunk. E.g., It should be at // 'ptr + size' ChunkHandle next = kInvalidChunkHandle; // What bin are we in? BinNum bin_num = kInvalidBinNum; bool in_use() const { return allocation_id != -1; } };
Bin
// A Bin is a collection of similar-sized free chunks. struct Bin { // All chunks in this bin have >= bin_size memory. size_t bin_size = 0; struct ChunkComparator { explicit ChunkComparator(BFCAllocator* allocator) : allocator_(allocator) {} // Sort first by size and then use pointer address as a tie breaker. bool operator()(const ChunkHandle ha, const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS { const Chunk* a = allocator_->ChunkFromHandle(ha); const Chunk* b = allocator_->ChunkFromHandle(hb); if (a->size != b->size) { return a->size < b->size; } return a->ptr < b->ptr; } private: BFCAllocator* allocator_; // The parent allocator }; typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet; // List of free chunks within the bin, sorted by chunk size. // Chunk * not owned. FreeChunkSet free_chunks; Bin(BFCAllocator* allocator, size_t bs) : bin_size(bs), free_chunks(ChunkComparator(allocator)) {} };
AllocationRegion
AllocationRegion給一個(gè)連續(xù)的內(nèi)存區(qū)域做指針到ChunkHandle的映射。
RegionManager
RegionManager聚集了一個(gè)或多個(gè)AllocationRegion,并提供一個(gè)從指針到基礎(chǔ)ChunkHandle的間接層,這個(gè)間接層可在多個(gè)不連續(xù)的內(nèi)存區(qū)域進(jìn)行分配。
3. 分配大小
將每次分配的內(nèi)存大小調(diào)整為kMinAllocationSize的N倍,這樣所有內(nèi)存地址都是很好地按字節(jié)對(duì)齊了。
// kMinAllocationSize = 256 static const size_t kMinAllocationBits = 8; static const size_t kMinAllocationSize = 1 << kMinAllocationBits; ... size_t BFCAllocator::RoundedBytes(size_t bytes) { size_t rounded_bytes = (kMinAllocationSize * ((bytes + kMinAllocationSize - 1) / kMinAllocationSize)); DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize); return rounded_bytes; }
4. 初始化bin
typedef int BinNum; static const int kInvalidBinNum = -1; static const int kNumBins = 21; ... // 二進(jìn)制2^8往左移0,1,2位 // (static_cast<size_t>(256) << 0) = 256 // (static_cast<size_t>(256) << 1) = 512 // (static_cast<size_t>(256) << 2) = 1024 size_t BinNumToSize(BinNum index) { return static_cast<size_t>(256) << index; } ... char bins_space_[sizeof(Bin) * kNumBins]; // Map from bin size to Bin Bin* BinFromIndex(BinNum index) { return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)])); } ... // We create bins to fit all possible ranges that cover the // memory_limit_ starting from allocations up to 256 bytes to // allocations up to (and including) the memory limit. for (BinNum b = 0; b < kNumBins; b++) { size_t bin_size = BinNumToSize(b); VLOG(1) << "Creating bin of max chunk size " << strings::HumanReadableNumBytes(bin_size); new (BinFromIndex(b)) Bin(this, bin_size); CHECK_EQ(BinForSize(bin_size), BinFromIndex(b)); CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b)); CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b)); if (b + 1 < kNumBins) { CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b)); } }
5. 查找bin
// 求屬于第幾個(gè)bin BinNum BinNumForSize(size_t bytes) { uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits; int b = std::min(kNumBins - 1, Log2FloorNonZero(v)); return b; } // 最高位非零的二進(jìn)制位數(shù),eg: 0001 0101B 為5 inline int Log2FloorNonZero(uint64 n) { #if defined(__GNUC__) return 63 ^ __builtin_clzll(n); #elif defined(PLATFORM_WINDOWS) unsigned long index; _BitScanReverse64(&index, n); return index; #else int r = 0; while (n > 0) { r++; n >>= 1; } return r; #endif }
6. 查找Chunk
// 先加鎖 mutex_lock l(lock_); void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes); if (ptr != nullptr) { return ptr; } // FindChunkPtr函數(shù)內(nèi)部 void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes, size_t num_bytes) { // First identify the first bin that could satisfy rounded_bytes. for (; bin_num < kNumBins; bin_num++) { // Start searching from the first bin for the smallest chunk that fits // rounded_bytes. Bin* b = BinFromIndex(bin_num); for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end(); ++citer) { // 從之前得到的Bin索引開始,查找合適的空閑Chunk: const BFCAllocator::ChunkHandle h = (*citer); BFCAllocator::Chunk* chunk = ChunkFromHandle(h); DCHECK(!chunk->in_use()); if (chunk->size >= rounded_bytes) { // We found an existing chunk that fits us that wasn't in use, so remove // it from the free bin structure prior to using. RemoveFreeChunkIterFromBin(&b->free_chunks, citer); // If we can break the size of the chunk into two reasonably // large pieces, do so. // // TODO(vrv): What should be the criteria when deciding when // to split? // 具體實(shí)現(xiàn)后面會(huì)分析 if (chunk->size >= rounded_bytes * 2) { SplitChunk(h, rounded_bytes); chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved } // The requested size of the returned chunk is what the user // has allocated. chunk->requested_size = num_bytes; // Assign a unique id and increment the id counter, marking the // chunk as being in use. chunk->allocation_id = next_allocation_id_++; // Update stats. ++stats_.num_allocs; stats_.bytes_in_use += chunk->size; stats_.max_bytes_in_use = std::max(stats_.max_bytes_in_use, stats_.bytes_in_use); stats_.max_alloc_size = std::max<std::size_t>(stats_.max_alloc_size, chunk->size); VLOG(4) << "Returning: " << chunk->ptr; if (VLOG_IS_ON(4)) { LOG(INFO) << "A: " << RenderOccupancy(); } return chunk->ptr; } } } return nullptr; }
7. 拆分Chunk
如果Chunk的大小大于等于申請(qǐng)內(nèi)存大小的2倍,那么將該Chunk拆分成2個(gè):第一個(gè)Chunk的大小等于申請(qǐng)內(nèi)存大小,第二個(gè)Chunk作為它的直接后繼。
if (chunk->size >= rounded_bytes * 2) { SplitChunk(h, rounded_bytes); chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved } void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) { // Allocate the new chunk before we do any ChunkFromHandle ChunkHandle h_new_chunk = AllocateChunk(); Chunk* c = ChunkFromHandle(h); CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum)); // Create a new chunk starting num_bytes after c BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk); new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes); region_manager_.set_handle(new_chunk->ptr, h_new_chunk); // Set the new sizes of the chunks. new_chunk->size = c->size - num_bytes; c->size = num_bytes; // The new chunk is not in use. new_chunk->allocation_id = -1; // Maintain the pointers. // c <-> c_neighbor becomes // c <-> new_chunk <-> c_neighbor BFCAllocator::ChunkHandle h_neighbor = c->next; new_chunk->prev = h; new_chunk->next = h_neighbor; c->next = h_new_chunk; if (h_neighbor != kInvalidChunkHandle) { Chunk* c_neighbor = ChunkFromHandle(h_neighbor); c_neighbor->prev = h_new_chunk; } // Add the newly free chunk to the free bin. InsertFreeChunkIntoBin(h_new_chunk); }
8. 回收chunk
加鎖,獲得ChunkHandle
mutex_lock l(lock_); BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr); FreeAndMaybeCoalesce(h);
FreeAndMaybeCoalesce
void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) { Chunk* c = ChunkFromHandle(h); CHECK(c->in_use() && (c->bin_num == kInvalidBinNum)); // Mark the chunk as no longer in use c->allocation_id = -1; // Updates the stats. stats_.bytes_in_use -= c->size; // This chunk is no longer in-use, consider coalescing the chunk // with adjacent chunks. ChunkHandle chunk_to_reassign = h; // If the next chunk is free, coalesce the two if (c->next != kInvalidChunkHandle) { Chunk* cnext = ChunkFromHandle(c->next); if (!cnext->in_use()) { // VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " << // c->ptr; chunk_to_reassign = h; // Deletes c->next RemoveFreeChunkFromBin(c->next); Merge(h, ChunkFromHandle(h)->next); } } // If the previous chunk is free, coalesce the two c = ChunkFromHandle(h); if (c->prev != kInvalidChunkHandle) { Chunk* cprev = ChunkFromHandle(c->prev); if (!cprev->in_use()) { // VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev " // << cprev->ptr; chunk_to_reassign = c->prev; // Deletes c RemoveFreeChunkFromBin(c->prev); Merge(ChunkFromHandle(h)->prev, h); c = ChunkFromHandle(h); } } InsertFreeChunkIntoBin(chunk_to_reassign); }
Merge
// Merges h2 and h3 when Chunk(h2)->next is h3 and Chunk(h3)->prev is c1. // We merge Chunk(h3) into Chunk(h2). void BFCAllocator::Merge(BFCAllocator::ChunkHandle h2, BFCAllocator::ChunkHandle h3) { Chunk* c1 = ChunkFromHandle(h2); Chunk* c2 = ChunkFromHandle(h3); // We can only merge chunks that are not in use. CHECK(!c1->in_use() && !c2->in_use()); // c1's prev doesn't change, still points to the same ptr, and is // still not in use. // Fix up neighbor pointers // // c1 <-> c2 <-> c3 should become // c1 <-> c3 BFCAllocator::ChunkHandle h4 = c2->next; c1->next = h4; CHECK(c2->prev == h2); if (h4 != kInvalidChunkHandle) { BFCAllocator::Chunk* c3 = ChunkFromHandle(h4); c3->prev = h2; } // Set the new size c1->size += c2->size; DeleteChunk(h3); }
以上是“TensorFlow內(nèi)存管理bfc算法的示例分析”這篇文章的所有內(nèi)容,感謝各位的閱讀!希望分享的內(nèi)容對(duì)大家有幫助,更多相關(guān)知識(shí),歡迎關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道!
網(wǎng)站標(biāo)題:TensorFlow內(nèi)存管理bfc算法的示例分析-創(chuàng)新互聯(lián)
網(wǎng)站路徑:http://jinyejixie.com/article0/pepio.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供外貿(mào)建站、網(wǎng)站設(shè)計(jì)公司、微信公眾號(hào)、做網(wǎng)站、營銷型網(wǎng)站建設(shè)、網(wǎng)頁設(shè)計(jì)公司
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場,如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來源: 創(chuàng)新互聯(lián)
猜你還喜歡下面的內(nèi)容