Performance problems with VLAN devices on CentOS 6.5

Date: 2017/2/28 13:56:52   Editor: Linux教程

Problem description

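Earlier network performance tests were all run on a layer-3 network. Recently, when re-testing TDocker network performance on a large layer-2 network, I found that the physical host performed worse than a container: inside a container the test reached 600k+, while the physical machine reached only 450k+. Both are far below the expected 1,000k+.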

Because the large layer-2 setup introduced VLAN devices (needed because the Linux bridge does not support VLANs), the VLAN network device was the first suspect.

Profiling with perf shows that a spin lock in dev_queue_xmit consumes a large share of the CPU, more than 70%.

On a 3.10.x kernel, however, the problem does not appear.

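Comparing the two perf profiles, the spin lock overhead on the 3.10.x kernel is small. The call path in the 3.10.x profile also shows that the remaining spin lock time is spent when the sk_buff is passed from the VLAN device down to the physical NIC, not when the protocol stack hands it to the VLAN device. So on CentOS 6.5 (2.6.32-431) the problem appears to lie mainly in the VLAN device.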

Root cause analysis

Start with dev_queue_xmit, the entry point from the protocol stack into the underlying network device.

dev_queue_xmit

//net/core/dev.c
int dev_queue_xmit(struct sk_buff *skb)
{
    struct net_device *dev = skb->dev;
    struct netdev_queue *txq;
    struct Qdisc *q;
...
    txq = netdev_pick_tx(dev, skb);
    q = rcu_dereference(txq->qdisc);

    trace_net_dev_queue(skb);
    if (q->enqueue) { /// a VLAN device has no qdisc queue; see noqueue_qdisc
        rc = __dev_xmit_skb(skb, q, dev, txq);
        goto out;
    }

    /* The device has no queue. Common case for software devices:
       loopback, all the sorts of tunnels...

       Really, it is unlikely that netif_tx_lock protection is necessary
       here.  (f.e. loopback and IP tunnels are clean ignoring statistics
       counters.)
       However, it is possible, that they rely on protection
       made by us here.

       Check this and shot the lock. It is not prone from deadlocks.
       Either shot noqueue qdisc, it is even simpler 8)
     */
    if (dev->flags & IFF_UP) {
        int cpu = smp_processor_id(); /* ok because BHs are off */

        if (txq->xmit_lock_owner != cpu) {

            HARD_TX_LOCK(dev, txq, cpu);

            if (!netif_tx_queue_stopped(txq)) {
                rc = NET_XMIT_SUCCESS;
                if (!dev_hard_start_xmit(skb, dev, txq)) {
                    HARD_TX_UNLOCK(dev, txq);
                    goto out;
                }
            }
            HARD_TX_UNLOCK(dev, txq);
        } 
    }

    rc = -ENETDOWN;
    rcu_read_unlock_bh();

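As the code shows, before handing the sk_buff to the device driver the kernel tries to take the queue's xmit_lock, which prevents several CPUs on an SMP system from entering the driver's transmit path at the same time. In practice, many drivers already do their own locking internally, which makes this xmit_lock redundant. The kernel therefore provides NETIF_F_LLTX: a driver that does its own locking sets the NETIF_F_LLTX flag, and dev_queue_xmit then skips the xmit_lock.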

TX_LOCK

#define HARD_TX_LOCK(dev, txq, cpu) {           \
    if ((dev->features & NETIF_F_LLTX) == 0) {  \
        __netif_tx_lock(txq, cpu);      \
    }                       \
}

static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
{
    spin_lock(&txq->_xmit_lock);
    txq->xmit_lock_owner = cpu;
}

As the code above shows, when a network device sets NETIF_F_LLTX the kernel does not take xmit_lock.

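The CentOS 6.5 (2.6.32-431) kernel, however, does not set NETIF_F_LLTX on VLAN devices. Because a VLAN device has only one queue, all CPUs contend for the same xmit_lock, which drives sys CPU above 70%.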

  • 2.6.32-431
static int vlan_dev_init(struct net_device *dev)
{
    struct net_device *real_dev = vlan_dev_info(dev)->real_dev;
...
    /* IFF_BROADCAST|IFF_MULTICAST; ??? */
    dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI);
    dev->iflink = real_dev->ifindex;
    dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
                      (1<<__LINK_STATE_DORMANT))) |
              (1<<__LINK_STATE_PRESENT);

    dev->features |= real_dev->features & real_dev->vlan_features;
...
  • 3.10.x

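On a 3.10.x kernel a VLAN device also has only one queue, so why is there no performance problem?

The reason is that the 3.10.x kernel sets NETIF_F_LLTX on VLAN devices, so even with a single queue there is no xmit_lock overhead.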

static int vlan_dev_init(struct net_device *dev)
{
    struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
...
    /* IFF_BROADCAST|IFF_MULTICAST; ??? */
    dev->flags  = real_dev->flags & ~(IFF_UP | IFF_PROMISC | IFF_ALLMULTI |
                      IFF_MASTER | IFF_SLAVE);
    dev->iflink = real_dev->ifindex;
    dev->state  = (real_dev->state & ((1<<__LINK_STATE_NOCARRIER) |
                      (1<<__LINK_STATE_DORMANT))) |
              (1<<__LINK_STATE_PRESENT);

    dev->hw_features = NETIF_F_ALL_CSUM | NETIF_F_SG |
               NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |
               NETIF_F_HIGHDMA | NETIF_F_SCTP_CSUM |
               NETIF_F_ALL_FCOE;

    dev->features |= real_dev->vlan_features | NETIF_F_LLTX;

Checking network device features

Normally the features of a network device can be inspected with ethtool -k:

  • 2.6.32-431
# ethtool  -k eth1.11
Features for eth1.11:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

On CentOS 6.5 (2.6.32-431), the features are read from /sys/class/net/${ethX}/features:

# cat /sys/class/net/eth1.11/features
0x114833
--------------------------
1 0001 0100 1000 0011 0011  0x114833
                      0001  NETIF_F_SG         1<<0
                      0010  NETIF_F_IP_CSUM    1<<1
                    1 0000  NETIF_F_IPV6_CSUM  1<<4
                   10 0000  NETIF_F_HIGHDMA    1<<5
            1000 0000 0000  NETIF_F_GSO        1<<11
        100 0000 0000 0000  NETIF_F_GRO        1<<14
     1 0000 0000 0000 0000  NETIF_F_TSO        1<<16
1 0000 0000 0000 0000 0000  NETIF_F_TSO6       1<<20
--------------------------
          1 0000 0000 0000  NETIF_F_LLTX       1<<12  (not set in 0x114833)

This confirms that the CentOS 6.5 kernel does not set the NETIF_F_LLTX flag on VLAN devices.

  • 3.10.x

The 3.10.x kernel no longer has /sys/class/net/${ethX}/features, but it supports the ETHTOOL_GFEATURES command (2.6.32-431 does not), and ethtool uses ETHTOOL_GFEATURES to get the features of a network device:

//net/core/ethtool.c
int dev_ethtool(struct net *net, struct ifreq *ifr)
{
...
    case ETHTOOL_GFEATURES:
        rc = ethtool_get_features(dev, useraddr);
        break;
...

# ./ethtool -k eth1.11 | grep tx-lockless
tx-lockless: on [fixed]
# ./ethtool -k eth1 | grep tx-lockless
tx-lockless: off [fixed]

This confirms that the 3.10.x kernel does set the NETIF_F_LLTX flag on VLAN devices.

  • ethtool implementation
//ethtool-3.5
static struct feature_state *
get_features(struct cmd_context *ctx, const struct feature_defs *defs)
{
...
    if (defs->n_features) { /// the kernel supports ETHTOOL_GFEATURES
        state->features.cmd = ETHTOOL_GFEATURES;
        state->features.size = FEATURE_BITS_TO_BLOCKS(defs->n_features);
        err = send_ioctl(ctx, &state->features);
        if (err)
            perror("Cannot get device generic features");
        else
            allfail = 0;
    } else {
        /* We should have got VLAN tag offload flags through
         * ETHTOOL_GFLAGS.  However, prior to Linux 2.6.37
         * they were not exposed in this way - and since VLAN
         * tag offload was defined and implemented by many
         * drivers, we shouldn't assume they are off.
         * Instead, since these feature flag values were
         * stable, read them from sysfs.
         */
        char buf[20]; /// read features from /sys/class/net/%s/features
        if (get_netdev_attr(ctx, "features", buf, sizeof(buf)) > 0)
            state->off_flags |=
                strtoul(buf, NULL, 0) &
                (ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
    }


static int get_netdev_attr(struct cmd_context *ctx, const char *name,
            char *buf, size_t buf_len)
{
#ifdef TEST_ETHTOOL
    errno = ENOENT;
    return -1;
#else
    char path[40 + IFNAMSIZ];
    ssize_t len;
    int fd;

    len = snprintf(path, sizeof(path), "/sys/class/net/%s/%s",
               ctx->devname, name);
    assert(len < sizeof(path));
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return fd;
    len = read(fd, buf, buf_len - 1);
    if (len >= 0)
        buf[len] = 0;
    close(fd);
    return len;
#endif
} 
