您现在的位置： Linux教程網 >> UnixLinux > >> Linux綜合 >> Linux資訊 >> Linux文化

Linux inode cache機制分析

Linux inode cache機制實現在fs／inode.c文件中。

1．1．Inode的slab分配器緩存
索引節點緩存（inode cache，簡稱icache）機制的實現是以inode對象的slab分配器緩存為基礎的，因此要從物理內存中申請或釋放一個inode對象，都必須通過kmem_cache_alloc()函數和kmem_cache_free()函數來進行。
Inode對象的slab分配緩存由一個kmem_cache_t類型的指針變量inode_cachep來定義。這個slab分配器緩存是在inode cache的初始化函數inode_init（）中通過kmem_cache_create()函數來創建的。
Linux在inode.c文件中又定義了兩個封裝函數，來實現從inode_cachep slab分配器緩存中分配一個inode對象或將一個不再使用的inode對象釋放給slab分配器，如下所示：
#define alloc_inode() \
((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))
static void destroy_inode(struct inode *inode)
{
if (!list_empty(&inode->i_dirty_buffers))
BUG();
kmem_cache_free(inode_cachep, (inode));
}

1．2 和inode對象相關的一些底層操作
源文件inode.c中實現了一些對inode對象的底層基本操作，如下：
（1）clean_inode()——初始化部分inode對象成員域
該函數用來將一個剛從inode_cachep slab分配器中分配得到的inode對象中的某些成員初始化為已知的值（通常為0），但是有一個例外，既鏈接數i_nlink被初始化為1。這是一個靜態的靜態內部函數，因此它只能被inode.c中的其他函數所調用，如：get_empty_inode()和get_new_inode()。
/*
* This just initializes the inode fields
* to known values before returning the inode..
*
* i_sb, i_ino, i_count, i_state and the lists have
* been initialized elsewhere..
*/
static void clean_inode(struct inode *inode)
{
static struct address_space_operations empty_aops;
static struct inode_operations empty_iops;
static struct file_operations empty_fops;
memset(&inode->u, 0, sizeof(inode->u));
inode->i_sock = 0;
inode->i_op = &empty_iops;
inode->i_fop = &empty_fops;
inode->i_nlink = 1; ／＊ NOTE！i_nlink被初始化為1 ＊／
atomic_set(&inode->i_writecount, 0);
inode->i_size = 0;
inode->i_generation = 0;
memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
inode->i_pipe = NULL;
inode->i_bdev = NULL;
inode->i_data.a_ops = &empty_aops;
inode->i_data.host = inode;
inode->i_mapping = &inode->i_data;
}
（2）get_empty_inode()——從slab分配器中分配一個空的inode對象
該函數通過調用alloc_inode宏從slab分配器中分配一個inode對象，然後把除了I_count引用計數和鏈接計數I_nlink之外的所有域都初始化為NULL（部分域的初始化通過調用clean_inode函數來完成），並將這個inode對象鏈入inode_in_use鏈表中。最後返回這個inode對象的指針，如下所示：
struct inode * get_empty_inode(void)
{
static unsigned long last_ino;
struct inode * inode;

inode = alloc_inode();
if (inode)
{
spin_lock(&inode_lock);
inodes_stat.nr_inodes++;
list_add(&inode->i_list, &inode_in_use);
inode->i_sb = NULL;
inode->i_dev = 0;
inode->i_ino = ++last_ino;
inode->i_flags = 0;
atomic_set(&inode->i_count, 1);
inode->i_state = 0;
spin_unlock(&inode_lock);
clean_inode(inode);
}
return inode;
}
Linux內核模塊通常並不會調用這個函數來分配一個inode對象。那些想獲取一個沒有索引節點號的inode對象的內核模塊（如網絡層等），以及那些沒有任何已知信息的fs，通常會用這個函數來獲取一個新的inode對象。
(3) clear_inode（）——清除一個inode對象中的內容
在調用destroy_inode（）函數釋放一個inode對象之前，通常調用該函數來清除該inode對象中內容，如：使inode引用的緩沖區無效、解除對其它對象的引用等。
/**
* clear_inode - clear an inode
* @inode: inode to clear
*
* This is called by the filesystem to tell us
* that the inode is no longer useful. We just
* terminate it with extreme prejudice.
*/
void clear_inode(struct inode *inode)
{
if (!list_empty(&inode->i_dirty_buffers))
invalidate_inode_buffers(inode);

if (inode->i_data.nrpages)
BUG();
if (!(inode->i_state & I_FREEING))
BUG();
if (inode->i_state & I_CLEAR)
BUG();
wait_on_inode(inode);
if (IS_QUOTAINIT(inode))
DQUOT_DROP(inode);
if (inode->i_sb && inode->i_sb->s_op && inode->i_sb->s_op->clear_inode)
inode->i_sb->s_op->clear_inode(inode);
if (inode->i_bdev) {
bdput(inode->i_bdev);
inode->i_bdev = NULL;
}
inode->i_state = I_CLEAR;
}

1．3 icache數據結構
Linux通過在inode_cachep slab分配器緩存之上定義各種雙向鏈表來實現inode緩存機制，以便有效地管理內存inode對象。這些鏈表包括：正在使用的inode鏈表、未使用的inode鏈表、inode哈希鏈表和匿名inode的哈希鏈表，他們的定義如下：
static LIST_HEAD(inode_in_use);
static LIST_HEAD(inode_unused);
static struct list_head *inode_hashtable;
static LIST_HEAD(anon_hash_chain); /* for inodes with NULL i_sb */
此外，每個超級塊對象super_block中還有一條被修改過的、且正在使用的inode雙向鏈表s_dirty。
每一個inode對象都會存在於兩個分離的雙向鏈表中：
（1）一個就是inode哈希鏈表inode_hashtable，用來加快inode查找，每個inode對象都通過I_hash指針鏈入哈希鏈表中。
（2）另一個就是inode的“類型”鏈表：
l 如果I_count>0、I_nlink>0且該inode不髒，則這個inode就通過其I_list指針鏈入系統全局的inode_in_use雙向鏈表。
l 如果I_count和I_nlink都大於0，但是這個inode為髒（既I_state域中設置了I_DIRTY標志），則這個inode通過I_list指針鏈入他所屬的super_block對象的s_dirty鏈表。
l 如果I_count＝0，則通過其I_list鏈入inode_unused鏈表。
對於那些不屬於任何超級塊對象（即I_sb＝NULL）的匿名inode對象，則通過I_hash指針鏈入系統全局的匿名inode哈希鏈表anon_hash_chain。

1．3．1 對inode緩存鏈表的鎖保護
Linux在inode.c中定義了自旋鎖inode_lock，來實現對所有inode緩存鏈表的互斥訪問。也即，任何訪問任意一條inode緩存鏈表的代碼段，都必須通過調用spin_lock()函數持有該自旋鎖，並在結束訪問後通過spin_unlock()釋放該自旋鎖。Inode_lock的定義如下：
Spinlock_t inode_lock＝SPIN_LOCK_UNLOCKED；
NOTE！如果要改變一個正在使用的inode對象的I_state域，也必須先持有該自旋鎖。

1．3．2 inode緩存的統計信息
全局變量inodes_stat定義了inode cache的統計信息，主要包括cache中的inode對象總數和其中未使用的inode個數，其定義如下：
struct {
int nr_inodes;
int nr_unused;
int dummy[5];
} inodes_stat;

1．3．3 inode哈希鏈表
inode哈希鏈表的主要用途是加快在icache中查找一個特定的inode對象。指針inode_hashtable指向一組哈希鏈表表頭，所有哈希函數值（記為h）相同的inode對象都通過I_hash指針作為接口組成雙向鏈表，並掛在inode_hashtable［h］這個哈希鏈表表頭之後。所有哈希鏈表表頭都放在一起組成一個數組，該數組的首地址由指針inode_hashtable所指向。
在Linux中，inode哈希鏈表表頭數組是存放在2order個連續的物理頁幀中的，其中，order≥1，且它的值與系統總的物理頁幀數num_physpages的值相關。因此，哈希鏈表表頭的個數為：2order＊PAGE_SIZE／sizeof（struct list_head）。由於list_head結構類型的大小是8個字節（2個32位指針），因此：inode哈希鏈表表頭的個數可以表示為：2（order+12-3）。
l 哈希鏈表的初始化
inode cache的初始化工作是由inode_init()函數來完成的,它主要完成兩項工作:(1)inode哈希鏈表的初始化,包括為inode哈希鏈表表頭數組分配物理內存等；（2）創建inode slab分配器緩存。該函數的源代碼如下：
/*
* Initialize the hash tables.
*/
void __init inode_init(unsigned long mempages)
{
struct list_head *head;
unsigned long order;
unsigned int nr_hash;
int i;

／＊計算order的值，但是我不知道為什麼要這樣計算？：）＊／
mempages >>= (14 - PAGE_SHIFT);
mempages *= sizeof(struct list_head);
for (order = 0; ((1UL = 0);

printk("Inode-cache hash table entries: %d (order: %ld, %ld bytes)\n",
nr_hash, order, (PAGE_SIZE > I_HASHBITS) + (tmp >> I_HASHBITS*2);
return tmp & I_HASHMASK;
}
l 對inode哈希鏈表的操作函數
與inode哈希鏈表相關的操作函數有：insert_inode_hash()和remove_inode_hash(),以及find_inode()。
Insert_inode_hash()函數用於將一個未散列的inode對象插入到相應的索引節點哈希鏈表中：
void insert_inode_hash(struct inode *inode)
{
struct list_head *head = &anon_hash_chain;
if (inode->i_sb)
head = inode_hashtable + hash(inode->i_sb, inode->i_ino);
spin_lock(&inode_lock);
list_add(&inode->i_hash, head);
spin_unlock(&inode_lock);
}
remove_inode_hash()函數用於將一個inode對象從他所屬的哈希鏈表中摘除：
void remove_inode_hash(struct inode *inode)
{
spin_lock(&inode_lock);
list_del(&inode->i_hash);
INIT_LIST_HEAD(&inode->i_hash);
spin_unlock(&inode_lock);
}
而find_inode（）函數則用於在某個鏈表中查找由（sb,ino）唯一標識的inode對象，如下所示：
/*
* Called with the inode lock held.
* NOTE: we are not increasing the inode-refcount, you must call __iget()
* by hand after calling find_inode now! This simplifies iunique and won't
* add any additional branch in the common code.
*/
static struct inode * find_inode(struct super_block * sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
{
struct list_head *tmp;
struct inode * inode;

tmp = head;
for (;;) {
tmp = tmp->next;
inode = NULL;
if (tmp == head)
break;
inode = list_entry(tmp, struct inode, i_hash);
if (inode->i_ino != ino)
continue;
if (inode->i_sb != sb)
continue;
if (find_actor && !find_actor(inode, ino, opaque))
continue;
break;
}
return inode;
}
NOTE! 調用find_inode（）函數時一定要持有inode_lock鎖。

1．4 icache中的inode對象存取接口——iget／iput

1．4．1 iget()——引用一個inode對象
其他內核模塊要想訪問一個inode對象，必須通過iget()訪問接口。該函數會首先在icache的哈希鏈表中查找相應的inode對象，如果找到，則將該對象的引用計數加1，然後返回該inode對象的指針。如果沒有找到，則通過調用get_new_inode（）函數分配一個新的inode對象。該函數的源代碼如下所示（include/linux/fs.h）：
static inline struct inode *iget(struct super_block *sb, unsigned long ino)
{
return iget4(sb, ino, NULL, NULL);
}
可以看出，實際的工作是由iget4()函數（帶有4個參數的iget函數，所以叫iget4）來完成的，其源碼如下（inode.c）：
struct inode *iget4(struct super_block *sb, unsigned long ino, find_inode_t find_actor, void *opaque)
{
struct list_head * head = inode_hashtable + hash(sb,ino);
struct inode * inode;

spin_lock(&inode_lock);
inode = find_inode(sb, ino, head, find_actor, opaque);
if (inode) {
__iget(inode);
spin_unlock(&inode_lock);
wait_on_inode(inode);
return inode;
}
spin_unlock(&inode_lock);

/*
* get_new_inode() will do the right thing, re-trying the search
* in case it had to block at any point.
*/
return get_new_inode(sb, ino, head, find_actor, opaque);
}
NOTE:
（1）首先，通過散列函數hash找到該inode對象應該在那個哈希鏈表中，然後調用find_inode（）函數在該哈希鏈表中查找是否存在該inode對象。。
（2）如果在哈希鏈表中找到了相應的inode對象（由超級塊對象指針sb和索引節點號ino唯一確定），則先調用__iget（）函數增加該inode對象的引用計數。然後調用wait_on_inode（）函數等待該inode對象被其他內核模塊解鎖。最後，直接返回這個inode對象的指針。
（3）如果沒有找到，則調用get_new_inode()函數分配一個新的inode對象。
內聯函數__iget()用來增加一個inode對象的引用計數。該函數首先判斷inode對象的I_count的當前值。如果它非0，則將I_count加1後就直接返回。如果它為0，則先將I_count加1，然後就看看這個inode對象當前是否為髒，如果為髒（I_state&I_DIRTY非0），那就什麼也不做，讓該inode對象繼續待在超級塊對象的s_dirty鏈表中。如果不為髒，則將該inode對象從原來的“類型”鏈表（應該在inode_unused鏈表中）中刪除，並將其連入到inode_in_use鏈表中。最後將icache統計信息中的未使用inode個數減1。
static inline void __iget(struct inode * inode)
{
if (atomic_read(&inode->i_count)) {
atomic_inc(&inode->i_count);
return;
}
atomic_inc(&inode->i_count);
if (!(inode->i_state & I_DIRTY)) {
list_del(&inode->i_list);
list_add(&inode->i_list, &inode_in_use);
}
inodes_stat.nr_unused--;
}
wait_on_inode()函數測試一個inode對象是否已經被加鎖，如果是，則調用__wait_on_inode()函數等待該inode被解鎖。否則就直接返回：
static inline void wait_on_inode(struct inode *inode)
{
if (inode->i_state & I_LOCK)
__wait_on_inode(inode);
}
get_new_inode()函數用於得到一個新的inode對象。該函數首先調用alloc_inode宏從inode_cachep這個slab分配器緩存中分配一個新的inode對象。如下所示：
static struct inode * get_new_inode(struct super_block *sb, unsigned long ino, struct list_head *head, find_inode_t find_actor, void *opaque)
{
struct inode * inode;

inode = alloc_inode();
if (inode) {
struct inode * old;

spin_lock(&inode_lock);
/* We released the lock, so.. */
old = find_inode(sb, ino, head, find_actor, opaque);
if (!old) {
inodes_stat.nr_inodes++;
list_add(&inode->i_list, &inode_in_use);
list_add(&inode->i_hash, head);
inode->i_sb = sb;
inode->i_dev = sb->s_dev;
inode->i_ino = ino;
inode->i_flags = 0;
atomic_set(&inode->i_count, 1);
inode->i_state = I_LOCK;
spin_unlock(&inode_lock);

clean_inode(inode);
sb->s_op->read_inode(inode);

/*
* This is special! We do not need the spinlock
* when clearing I_LOCK, because we're guaranteed
* that nobody else tries to do anything about the
* state of the inode when it is locked, as we
* just created it (so there can be no old holders
* that haven't tested I_LOCK).
*/
inode->i_state &= ~I_LOCK;
wake_up(&inode->i_wait);

return inode;
}

/*
* Uhhuh, somebody else created the same inode under
* us. Use the old inode instead of the one we just
* allocated.
*/
__iget(old);
spin_unlock(&inode_lock);
destroy_inode(inode);
inode = old;
wait_on_inode(inode);
}
return inode;
}
（1）由於在調用get_new_inode函數時，調用者已經釋放了inode_lock鎖，因此在inode_lock被釋放到調用get_new_inode函數期間，其他內核模塊有可能已經創建了（sb,ino）所確定的索引節點。所以，get_new_inode函數必須重新調用find_inode函數，以再一次在哈希鏈表中查找是否已存在相應的inode對象。
（2）如果find_inode（）返回一個有效的指針，則說明其它模塊已經創建了（sb,ino）這個inode對象，因此就要調用destroy_inode（）函數來銷毀一開始創建的inode對象，然後調用__iget()函數增加所找到的inode對象的引用計數，然後用wait_on_inode（）函數等待他被解鎖。
（3）如果find_inode()返回NULL，則對先前所分配的inode對象進行初始化。其中，要調用clean_inode（）函數和超級塊的s_op->read_inode()方法，以從塊設備上讀取磁盤索引節點的內容。最後，返回所分配的inode對象的指針。

1．4．2 iput（）函數——釋放對一個inode對象的引用
當其他內核對象或模塊不再需要一個inode對象時，必須通過iput（）函數，來解除對該inode對象的引用。
iput()函數將inode對象的引用計數減1。如果減到了0，則將該inode釋放。否則就直接返回。
/**
* iput - put an inode
* @inode: inode to put
*
* Puts an inode, dropping its usage count. If the inode use count hits
* zero the inode is also then freed and may be destroyed.
*/

void iput(struct inode *inode)
{
if (inode) {
struct super_operations *op = NULL;

if (inode->i_sb && inode->i_sb->s_op)
op = inode->i_sb->s_op;
if (op && op->put_inode)
op->put_inode(inode);

if (!atomic_dec_and_lock(&inode->i_count, &inode_lock))
return;

if (!inode->i_nlink) {
list_del(&inode->i_hash);
INIT_LIST_HEAD(&inode->i_hash);
list_del(&inode->i_list);
INIT_LIST_HEAD(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
spin_unlock(&inode_lock);

if (inode->i_data.nrpages)
truncate_inode_pages(&inode->i_data, 0);

if (op && op->delete_inode) {
void (*delete)(struct inode *) = op->delete_inode;
/* s_op->delete_inode internally recalls clear_inode() */
delete(inode);
} else
clear_inode(inode);
if (inode->i_state != I_CLEAR)
BUG();
} else {
if (!list_empty(&inode->i_hash)) {
if (!(inode->i_state & I_DIRTY)) {
list_del(&inode->i_list);
list_add(&inode->i_list,
&inode_unused);
}
inodes_stat.nr_unused++;
spin_unlock(&inode_lock);
return;
} else {
/* magic nfs path */
list_del(&inode->i_list);
INIT_LIST_HEAD(&inode->i_list);
inode->i_state|=I_FREEING;
inodes_stat.nr_inodes--;
spin_unlock(&inode_lock);
clear_inode(inode);
}
}
destroy_inode(inode);
}
}
對該函數的注釋如下：
（1）函數首先調用超級塊對象中的s_op->put_inode（）方法（如果有的話）來釋放這個inode對象。
（2）通過atomic_dec_and_lock原子操作將I_count減1，並同時對inode_lock進行加鎖。如果I_count在減1後還為非0值，則直接返回。否則就進行下面的處理。
（3）如果該inode的i_nlink值為0，則說明文件系統中已經沒有任何文件鏈接到這個inode對象上。對於這種情況，iput()先將這個inode從其所屬的哈希鏈表和“類型”鏈表中摘除。然後在其i_count中設置I_FREEING標志，表示正在釋放這個inode對象。最後，看看是否定義了超級塊對象方法s_op->delete_inode()，如果有則調用該方法刪除相應的磁盤索引節點，如果未定義該方法，則調用clear_inode（）函數來清除這個inode對象。最後，用destroy_inode()將這個內存索引節點釋放給inode_cachep slab分配器緩存。
（4）如果該inode的i_nlink不為0，則說明雖然沒有人引用這個inode對象，但fs中還有文件鏈接與這個inode對象相關聯，因此這個inode對象有可能再次被引用。對於這種情況，iput()函數首先判斷這個inode對象是否正連接在哈希鏈表中。如果是，則將這個inode對象放到inode_unused鏈表中（當然，前提是這個inode對象不在s_dirty鏈表中，如果他在s_dirty鏈表中，則讓他繼續待在s_dirty鏈表中），然後直接返回就可以了。如果不是，則這個inode對象已經不用繼續留在icache中了（反正通過哈希鏈表也找不到他），因此將這個inode對象從其所屬的“類型”鏈表中摘除，然後調用clear_inode()方法清除這個inode對象，最後調用destroy_indoe（）將這個內存索引節點釋放回inode_cachep slab分配器緩存。注意！這裡不能調用超級塊對象的delete_inode（）方法來刪除對應的磁盤索引節點，因為fs中還有文件鏈接與這個inode相關聯。

上一篇文章： Linux(2.4.0)目錄項緩存dcache機制的實現分析
下一篇文章： linux下串行通信開發

Linux文化

Linux系統：Linux的信號機制

Linux inode cache機制分析

Linux線程實現機制分析

Linux 線程實現機制分析

Linux內存中Swap和Buffer Cache機制

Linux內存中Cache分析

Linux內核的namespace機制分析