您现在的位置： Linux教程網 >> UnixLinux > >> Linux基礎 >> 關於Linux

Porting uclinux-2008R1-RC8 (bf561) to VDSP5 (38): cache and spinlock

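In my earlier implementation, the spinlock was built directly on adi_acquire_lock. That worked without any problems while the cache was disabled, but once icache was enabled the code ran into an infinite loop.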

In VDSP5, adi_acquire_lock is implemented in the header ccblkfn.h as follows:

#pragma inline
#pragma always_inline
static void adi_acquire_lock(testset_t *_t) {
    int tVal;
    csync();
#ifdef __WORKAROUND_L2_TESTSET_STALL
    tVal = __builtin_testset_05000248((char *) _t);
#else
    tVal = __builtin_testset((char *) _t);
#endif
    while (tVal == 0) {
        csync();
#ifdef __WORKAROUND_L2_TESTSET_STALL
        tVal = __builtin_testset_05000248((char *) _t);
#else
        tVal = __builtin_testset((char *) _t);
#endif
    }
}
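To make the control flow of adi_acquire_lock easier to follow, here is a minimal host-side model. It assumes the documented Blackfin TESTSET semantics: the lock byte is tested for zero (CC is set if it was zero) and its most significant bit (0x80) is then set atomically. C11 atomic_fetch_or stands in for the hardware instruction, csync() is omitted, and the names model_testset_t, model_testset, model_acquire_lock and model_release_lock are hypothetical. This is a sketch for reasoning about the loop, not the VDSP5 implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for testset_t: a single lock byte. */
typedef atomic_uchar model_testset_t;

/* Models __builtin_testset(): atomically set bit 7 (0x80) of the byte and
   return 1 if the byte was zero beforehand (lock acquired), otherwise 0. */
static int model_testset(model_testset_t *t)
{
    unsigned char old = atomic_fetch_or(t, 0x80);
    return old == 0;
}

/* Same shape as adi_acquire_lock(): one attempt, then retry while the
   result is 0. csync() has no portable equivalent here and is omitted. */
static void model_acquire_lock(model_testset_t *t)
{
    int tVal = model_testset(t);
    while (tVal == 0) {
        tVal = model_testset(t);
    }
}

/* Releasing the lock clears the byte so the next testset succeeds. */
static void model_release_lock(model_testset_t *t)
{
    assert(atomic_load(t) != 0); /* the lock must currently be held */
    atomic_store(t, 0);
}
```

After model_acquire_lock() the byte reads 0x80, and any further model_testset() on it returns 0, which is exactly the value that keeps the while loop spinning until the holder clears the byte.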

Looking up __WORKAROUND_L2_TESTSET_STALL in the VDSP5 documentation gives this description:

Enables workaround for the anomaly 05-00-0248: “TestSet operation causes stall of the other core.” The avoidance is enforced by the compiler automatically issuing a write to an L2-defined variable immediately following a TESTSET instruction. This is done as part of the code generated for a __builtin_testset() call.

The compiler defines the macro __WORKAROUND_L2_TESTSET_STALL at the compile, assembly, and link build stages when this workaround is enabled.

The VisualDSP++ Silicon Anomaly Support documentation describes anomaly 05-00-0248 in more detail:

Anomaly ID: 05-00-0248
Summary: TestSet operation causes core stall (dual-core).
Compiler option: -workaround l2-testset-stall
Defs: -D__WORKAROUND_L2_TESTSET_STALL
Behavior: In dual-core Blackfins, when one core performs a TESTSET operation to internal L2 memory, it locks out the other core if the latter is accessing L2 memory at the same time. The solution to this is that any TESTSET operation to internal L2 memory should be followed immediately by a write to L2 memory. The compiler automatically works around this anomaly, so that any call to __builtin_testset() will contain the appropriate write to L2 memory.
Assembler: No assembler actions.
Libraries: No runtime library actions.

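Next, here is the assembly generated for a __builtin_testset() call. The final store to address 0xFEB0002C, which lies in L2 SRAM on the BF561, is the extra L2 write that implements the workaround: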

P1 = [FP + -40];
P0.L = 0x2C;
P0.H = 0xfeb0;
TESTSET (P1);
[P0] = P0;

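With the cache disabled, this code works fine. With the cache enabled, however, by the time the PC reaches TESTSET (P1), and before that instruction actually executes, the byte at [P1] has already become 0x80 (a pipeline effect). TESTSET sets CC only if the byte was zero before it sets the byte's most significant bit (0x80), so it now returns 0, and the while loop in adi_acquire_lock never exits: an infinite loop!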
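The failure mode can be reproduced with the same kind of host-side model. The sketch below assumes the TESTSET semantics described above and uses a hypothetical bounded_acquire() whose retry limit exists only so the demonstration terminates; the real adi_acquire_lock loop has no limit. If the lock byte already reads 0x80 when the first TESTSET runs, every attempt sees a nonzero byte and the lock is never acquired.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for testset_t: a single lock byte. */
typedef atomic_uchar model_testset_t;

/* Hypothetical model of TESTSET: set bit 7, report whether the byte was 0. */
static int model_testset(model_testset_t *t)
{
    return atomic_fetch_or(t, 0x80) == 0;
}

/* adi_acquire_lock()-style retry loop that gives up after max_tries attempts,
   so the failure can be observed instead of hanging. Returns the attempt
   number on success, or -1 if the lock was never acquired. */
static int bounded_acquire(model_testset_t *t, int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        if (model_testset(t))
            return i;
    }
    return -1;
}
```

A free byte (0) is acquired on the first attempt, while a byte that is already 0x80 before TESTSET executes fails on every attempt, which in the unbounded loop means spinning forever.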

Rereading the TESTSET documentation more carefully, I found this passage:

The software designer is responsible for executing atomic operations in the proper cacheable / non-cacheable memory space. Typically, these operations should execute in non-cacheable, off-core memory. In a chip implementation that requires tight temporal coupling between processors or processes, the design should implement a dedicated, non-cacheable block of memory that meets the data latency requirements of the system.

I had completely overlooked this.

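Once the cause was known, the first fix that came to mind was disabling the cache, but that would seriously degrade system performance. The other option is to stall the pipeline before the TESTSET instruction so that it cannot run ahead. I tried the following code: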

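static inline void uclinux_acquire_lock(testset_t *_t)
{
    /* Wait for all pending loads/stores before touching the lock byte. */
    csync();
    /* Note: the fixed labels .loop/.out may clash if this function is
       inlined more than once into the same assembly output. */
    asm("p0 = %0;\n\t"
        "csync;\n\t"              /* synchronize again right before TESTSET */
        "testset (p0);\n\t"       /* CC = 1 if the byte was 0 (lock taken) */
        "if cc jump .out;\n"
        ".loop:\n\t"
        "csync;\n\t"              /* stall before every retry */
        "testset (p0);\n\t"
        "if !cc jump .loop;\n"    /* spin until the lock is acquired */
        ".out:\n\t"
        "nop;\n\t"
        : /* no outputs */
        : "d"(_t)
        : "P0", "CC", "memory");  /* P0, CC and the lock byte are modified */
}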

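Because the uClinux kernel does not actually use L2 here, I did not insert the L2 write that the documentation calls for; I only added a csync instruction before each testset. The documentation describes CSYNC as follows: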

Use CSYNC to enforce a strict execution sequence on loads and stores or to conclude all transitional core states before reconfiguring the core modes. For example, issue CSYNC before configuring memory-mapped registers (MMRs). CSYNC should also be issued after stores to MMRs to make sure the data reaches the MMR before the next instruction is fetched.

Typically, the Blackfin processor executes all load instructions strictly in the order that they are issued and all store instructions in the order that they are issued. However, for performance reasons, the architecture relaxes ordering between load and store operations. It usually allows load operations to access memory out of order with respect to store operations. Further, it usually allows loads to access memory speculatively. The core may later cancel or restart speculative loads. By using the Core Synchronize or System Synchronize instructions and managing interrupts appropriately, you can restrict out-of-order and speculative behavior.
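For comparison, the control flow of uclinux_acquire_lock can be mirrored in portable C. In this sketch a C11 atomic_thread_fence(memory_order_seq_cst) stands in for CSYNC before every TESTSET attempt. That is only an analogy: the fence orders memory accesses at the language level and does not model the Blackfin pipeline or cache. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for testset_t: a single lock byte. */
typedef atomic_uchar model_testset_t;

/* Hypothetical TESTSET model: set bit 7, return 1 if the byte was 0. */
static int model_testset(model_testset_t *t)
{
    return atomic_fetch_or(t, 0x80) == 0;
}

/* Mirrors uclinux_acquire_lock(): a full fence plays the role of CSYNC
   before the first attempt and before every retry. */
static void model_uclinux_acquire_lock(model_testset_t *t)
{
    atomic_thread_fence(memory_order_seq_cst);     /* csync;            */
    if (model_testset(t))                          /* testset (p0);     */
        return;                                    /* if cc jump .out;  */
    do {
        atomic_thread_fence(memory_order_seq_cst); /* .loop: csync;     */
    } while (!model_testset(t));                   /* if !cc jump .loop */
}

/* Release: order earlier accesses, then clear the lock byte. */
static void model_release_lock(model_testset_t *t)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store(t, 0);
}
```

The fast path returns after a single successful attempt; only a contended lock enters the fenced retry loop, matching the `if cc jump .out` shortcut in the assembly.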

Heh, it worked in testing!
