
Advanced Linux Text Processing: the gawk printf Command and Functions

Date: 2017/2/28 13:43:52   Editor: Linux教程

1. Formatting Output with printf

printf lets you print results in exactly the format you want, flexibly and with little effort.

Syntax:

printf "print format", variable1,variable2,etc.

Special characters in printf

printf does not use OFS or ORS; it prints data solely according to the "format" string.
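
The difference between print and printf can be seen in a minimal sketch. The separator values "-" and "!" and the shell variable out are arbitrary choices made for this illustration:

```shell
# print honors OFS/ORS; printf uses only its format string.
out=$(awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    print "a", "b"              # fields joined with OFS, terminated with ORS
    printf "%s %s\n", "a", "b"  # OFS and ORS are not applied
}')
printf '%s\n' "$out"
```

The first line comes out as a-b! and the second as a b, because only print consults the two variables.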

printf format specifiers

Example 1:

[root@localhost ~]# cat pri.awk 
BEGIN {
    printf "s--> %s\n", "String"
    printf "c--> %c\n", "String"
    printf "s--> %s\n", 101.23
    printf "d--> %d\n", 101,23
    printf "e--> %e\n", 101,23
    printf "f--> %f\n", 101,23
    printf "g--> %g\n", 101,23
    printf "o--> %o\n", 0x8
    printf "x--> %x\n", 16
    printf "percentage--> %%\n", 17
}
[root@localhost ~]# awk -f pri.awk 
s--> String
c--> S
s--> 101.23
d--> 101
e--> 1.010000e+02
f--> 101.000000
g--> 101
o--> 10
x--> 10
percentage--> %
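
Note that the %d, %e, %f and %g lines in the listing above pass 101,23 with a comma, so printf receives 101 plus an unused extra argument 23, which is why every result shows 101. A minimal sketch, assuming the decimal value 101.23 was intended (the shell variable out is only there for the illustration):

```shell
out=$(awk 'BEGIN {
    printf "d--> %d\n", 101.23   # integer part only
    printf "e--> %e\n", 101.23   # scientific notation
    printf "f--> %f\n", 101.23   # fixed point, 6 decimals by default
    printf "g--> %g\n", 101.23   # shorter of the %e and %f forms
}')
printf '%s\n' "$out"
```

With the decimal point in place, the four lines read 101, 1.012300e+02, 101.230000 and 101.23.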

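printf modifiers:

Modifier form: #[.#] where the first number sets the display width and the second # sets the precision after the decimal point

- Left-justify (the default is right-justified), e.g. %-15s

+ Show the sign of a numeric value, e.g. %+d; 0 also gets a plus sign

$ To put a dollar sign in front of a price, simply write a literal $ in the format string right before the %, e.g. "$%.2f"

0 Pad on the left with zeros instead of spaces; put a 0 before the width of a numeric conversion, for example use "%05d" instead of "%5d"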
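
The modifiers listed above can be tried together in one short sketch. The sample values 7, 42, "left" and 3.5 are made up for the illustration:

```shell
out=$(awk 'BEGIN {
    printf "|%+d|%+d|\n", 7, 0       # explicit sign, also for zero
    printf "|%5d|%05d|\n", 42, 42    # space padding vs. zero padding
    printf "|%-15s|\n", "left"       # left-justified in a 15-column field
    printf "|$%.2f|\n", 3.5          # literal dollar sign before the number
}')
printf '%s\n' "$out"
```

The vertical bars only mark the field boundaries so that the padding is visible.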

Example 2:

[root@localhost ~]# awk 'BEGIN { printf "|%6s%7.3f|\n", "Good","2.1" }'  
|  Good  2.100|
[root@localhost ~]# awk 'BEGIN { printf "|%-6s%-7.3f|\n", "Good","2.1" }'
|Good  2.100  |

Redirecting output to a file:

In awk, the output of print and printf statements can be redirected to a named file.

Example 3:

[root@localhost ~]# awk 'BEGIN{a=5;printf "%3d\n",a> "report.txt"}'
[root@localhost ~]# cat report.txt 
  5

Another way is to let the shell redirect all of the output: awk -f script.awk file > redirectfile
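
When redirecting inside awk, ">" truncates the file only the first time it is opened, and ">>" appends. A minimal sketch, assuming a throwaway directory from mktemp; the names dir, report and out are hypothetical:

```shell
dir=$(mktemp -d)
report="$dir/report.txt"
awk -v out="$report" 'BEGIN {
    printf "%3d\n", 5 > out     # first write truncates the file
    printf "%3d\n", 6 > out     # same open file: written after the 5
    close(out)
    printf "%3d\n", 7 >> out    # reopened in append mode
}'
cat "$report"
```

All three numbers end up in the file, one per line, each right-aligned in three columns.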

Running an awk script directly:

Example 4:

[root@localhost ~]# cat fz.awk      
#!/bin/awk -f
BEGIN {
FS=",";
OFS=",";
total1 = total2 = total3 = total4 = total5 = 10;
total1 += 5; print total1;
total2 -= 5; print total2;
total3 *= 5; print total3;
total4 /= 5; print total4;
total5 %= 5; print total5;
}
[root@localhost ~]# chmod +x fz.awk   
[root@localhost ~]# ./fz.awk        
15
5
50
2
0

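2. awk Built-in and User-Defined Functions

Numeric functions:

The rand() function

rand() returns a random number between 0 and 1. The value can be 0 but is never 1. The numbers look random within a single awk run, but unless srand() is called gawk produces the same sequence on every run, so they are predictable across runs.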
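
The range of rand() can be checked with a small sketch. The counter name bad and the 1000 iterations are arbitrary choices for the illustration:

```shell
out=$(awk 'BEGIN {
    for (i = 0; i < 1000; i++) {
        x = rand()
        if (x < 0 || x >= 1) bad++     # outside the half-open range [0, 1)
        n = int(x * 100)               # scaled to an integer in 0..99
        if (n < 0 || n > 99) bad++
    }
    print bad + 0                      # number of out-of-range values
}')
printf '%s\n' "$out"
```

Because rand() never returns 1, int(rand()*100) tops out at 99 and never reaches 100.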

Example 1: Generate 1000 random integers (from 0 to 99)

[root@localhost ~]# cat occ.awk 
BEGIN {
    while(i<1000)
    {
        n = int(rand()*100);
        rnd[n]++;
        i++;
    }
    for(i=0;i<=100;i++)
    {
        print i,"Occured",rnd[i],"times";
    }
}
[root@localhost ~]# awk -f occ.awk 
0 Occured 11 times
1 Occured 8 times
2 Occured 9 times
3 Occured 15 times
4 Occured 16 times
5 Occured 5 times
6 Occured 8 times
7 Occured 9 times
8 Occured 7 times
9 Occured 7 times
10 Occured 11 times
11 Occured 7 times
12 Occured 10 times
13 Occured 9 times
14 Occured 6 times
15 Occured 18 times
16 Occured 10 times
17 Occured 10 times
18 Occured 9 times
19 Occured 8 times
20 Occured 11 times
21 Occured 13 times
22 Occured 10 times
23 Occured 9 times
24 Occured 15 times
25 Occured 8 times
26 Occured 3 times
27 Occured 17 times
28 Occured 9 times
29 Occured 13 times
30 Occured 11 times
31 Occured 9 times
32 Occured 12 times
33 Occured 12 times
34 Occured 9 times
35 Occured 6 times
36 Occured 13 times
37 Occured 15 times
38 Occured 6 times
39 Occured 9 times
40 Occured 7 times
41 Occured 8 times
42 Occured 6 times
43 Occured 8 times
44 Occured 10 times
45 Occured 7 times
46 Occured 10 times
47 Occured 8 times
48 Occured 16 times
49 Occured 12 times
50 Occured 6 times
51 Occured 15 times
52 Occured 6 times
53 Occured 12 times
54 Occured 8 times
55 Occured 13 times
56 Occured 6 times
57 Occured 16 times
58 Occured 5 times
59 Occured 7 times
60 Occured 11 times
61 Occured 12 times
62 Occured 14 times
63 Occured 11 times
64 Occured 9 times
65 Occured 6 times
66 Occured 7 times
67 Occured 10 times
68 Occured 8 times
69 Occured 12 times
70 Occured 13 times
71 Occured 9 times
72 Occured 10 times
73 Occured 11 times
74 Occured 7 times
75 Occured 13 times
76 Occured 13 times
77 Occured 10 times
78 Occured 5 times
79 Occured 12 times
80 Occured 17 times
81 Occured 8 times
82 Occured 7 times
83 Occured 10 times
84 Occured 12 times
85 Occured 12 times
86 Occured 11 times
87 Occured 14 times
88 Occured 4 times
89 Occured 8 times
90 Occured 15 times
91 Occured 10 times
92 Occured 15 times
93 Occured 8 times
94 Occured 11 times
95 Occured 5 times
96 Occured 12 times
97 Occured 11 times
98 Occured 7 times
99 Occured 11 times
100 Occured  times

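Note: with 1000 draws spread over only 100 possible values, every value necessarily repeats, about 10 times on average. The last row (100) has an empty count because int(rand()*100) never produces 100.

The srand(n) function

srand(n) initializes the random number generator with the seed n. Every run that uses the same seed n produces the same sequence of random numbers. If n is omitted, awk uses the current time of day as the seed.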
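
The effect of a fixed seed can be seen by running the same program twice. This is a minimal sketch; the seed 5 and the variable names first and second are arbitrary:

```shell
first=$(awk 'BEGIN { srand(5); print rand(), rand(), rand() }')
second=$(awk 'BEGIN { srand(5); print rand(), rand(), rand() }')
if [ "$first" = "$second" ]; then
    echo "same seed, same sequence"
fi
```

Calling srand() without an argument instead would normally give a different sequence on runs started at different times.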

Example 2: Generate 5 random numbers between 5 and 50

[root@localhost ~]# cat srand.awk 
BEGIN {
    # Initialize the seed with 5.
    srand(5);
    # Generate 5 numbers in total.
    total = 5;
    # Maximum number is 50.
    max = 50;
    count = 0;
    while(count < total)
    {
        # rnd ranges over 0..49; values below 5 are counted here
        # but are not printed by the loop below.
        rnd = int(rand()*max);
        if( array[rnd] == 0 )
        {
            count++;
            array[rnd]++;
        }
    }
    for ( i=5;i<=max;i++)
    {
        if (array[i])
            print i;
    }
}
[root@localhost ~]# awk -f srand.awk 
14
16
23
33
35
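
The script above draws from 0 to 49 and only prints values of 5 or more, so it can print fewer than 5 numbers. A possible variant, sketched under the assumption that 5 distinct integers from 5 to 50 inclusive are wanted (the names min, seen and seed are introduced here and are not part of the original script):

```shell
out=$(awk -v seed=5 'BEGIN {
    srand(seed)
    min = 5; max = 50; total = 5
    while (count < total) {
        n = int(rand() * (max - min + 1)) + min   # integer in min..max
        if (!(n in seen)) {                       # keep only new values
            seen[n] = 1
            count++
        }
    }
    for (i = min; i <= max; i++)
        if (i in seen) print i                    # ascending order
}')
printf '%s\n' "$out"
```

Shifting the scaled value by min keeps every draw inside the requested range, so exactly total numbers are printed.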

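Common string functions

The length function:

length([s]) returns the length of the string s; without an argument it returns the length of $0.

Example 1: the length function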

[root@bash ~]# awk 'BEGIN{print length("young")}'
5
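
A short sketch of length with and without an argument; the input line "hello world" is made up for the illustration:

```shell
out=$(echo "hello world" | awk '{
    # whole record, first field, and a concatenated string
    print length(), length($1), length($2 "!")
}')
printf '%s\n' "$out"
```

This prints 11 5 6: the whole line has 11 characters, the first field 5, and "world!" 6.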

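The sub function:

sub(r,s,[t]) searches the string t for text matching the pattern r (regular expressions are allowed) and replaces the first match with the string s. If t is omitted, $0 is used.

Example 1: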

[root@bash ~]# awk 'BEGIN{a="geek young";sub("young","xixi",a);print a}' 
geek xixi  # note: string arguments must be quoted

Example 2:

[root@bash ~]# echo "geek young hahahaha"|awk '
>{sub(/\<young\>/,"xixi",$2);  # a regex literal goes between slashes, without quotes
>print $2}'   
xixi

Example 3:

[root@bash ~]# echo "2008:08:08:08 08:08:08" | awk 'sub(/:/,"",$1)'
200808:08:08 08:08:08

Example 4:

[root@bash ~]# cat sub.awk
BEGIN {
state="CA is California"
sub("C[Aa]","KA",state);
print state;
}
[root@bash ~]# awk -f sub.awk
KA is California
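
sub also returns the number of substitutions it made (0 or 1), and an & in the replacement stands for the matched text. A minimal sketch with a made-up string:

```shell
out=$(awk 'BEGIN {
    s = "geek young young"
    n = sub(/young/, "[&]", s)   # & stands for the matched text
    print n, s
    m = sub(/absent/, "x", s)    # no match: returns 0, s is unchanged
    print m, s
}')
printf '%s\n' "$out"
```

The return value is what makes a bare sub(...) usable as a pattern, as in Example 3: the record is printed only when a substitution happened.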

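The gsub function:

gsub(r,s,[t]) searches the string t for text matching the pattern r (regular expressions are allowed) and replaces every match with s. If t is omitted, $0 is used.

Example 1: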

[root@bash ~]# echo "2008:08:08:08 08:08:08" | awk 'gsub(/:/,"",$1)'
2008080808 08:08:08
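
gsub returns the number of replacements, and without a third argument it works on the whole record. A minimal sketch that reuses the input of the example above; the "-" replacement is an arbitrary choice:

```shell
out=$(echo "2008:08:08:08 08:08:08" | awk '{
    n = gsub(/:/, "-")      # no third argument: operates on $0
    print n, $0
}')
printf '%s\n' "$out"
```

All five colons in the record are replaced, where the example above touched only those in $1.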

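The split function:

split(s,array,[r]) splits the string s using r as the separator and stores the pieces in array; the first piece gets index 1, the second index 2, and so on. If r is omitted, FS is used.

Example 1: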

[root@bash ~]# echo "192.168.1.1:80"|awk '
>{split($1,ip,":");
>print ip[1],"----",ip[2]}'                       
192.168.1.1 ---- 80

Example 2:

[root@bash ~]# netstat -tan | awk '
>/^tcp\>/{split($5,ip,":");
>count[ip[1]]++}  # using a value from one array as the index of another and incrementing it is a common way to count occurrences
>END{for (i in count){print i,count[i]}}'
116.211.167.193 3
0.0.0.0 4
192.168.1.116 1

Example 3:

[root@bash ~]# cat items-sold1.txt   
101:2,10,5,8,10,12
102:0,1,4,3,0,2
103:10,6,11,20,5,13
104:2,3,4,0,6,5
105:10,2,5,7,12,6
[root@bash ~]# cat split.awk
BEGIN {
FS=":"
} {
split($2,quantity,",");
total=0;
for(x in quantity)
total=total+quantity[x];
print "Item",$1,":",total,"quantities sold";
}
[root@bash ~]# awk -f split.awk items-sold1.txt
Item 101 : 47 quantities sold
Item 102 : 10 quantities sold
Item 103 : 65 quantities sold
Item 104 : 20 quantities sold
Item 105 : 42 quantities sold
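
split returns the number of pieces it stored, which allows a numeric loop instead of for (x in array), whose traversal order is not specified. A minimal sketch using the first record of items-sold1.txt as inline input:

```shell
out=$(echo "101:2,10,5,8,10,12" | awk -F: '{
    n = split($2, quantity, ",")    # n is the number of pieces
    total = 0
    for (i = 1; i <= n; i++)        # numeric loop visits elements in order
        total += quantity[i]
    print "Item", $1, ":", n, "values, total", total
}')
printf '%s\n' "$out"
```

For a plain sum the order does not matter, but the numeric loop is the safer habit whenever the position of each piece is significant.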

The substr function

Syntax:

substr(input-string,location,length)
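substr extracts a specified part (a substring) of a string. In the syntax above:

  • input-string: the string that contains the substring

  • location: the starting position of the substring (positions are counted from 1)

  • length: the number of characters to take, starting at location. This argument is optional; if it is omitted, substr takes everything from location to the end of the string

Example 1: Print everything from the 5th character to the end of the line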

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk '{ print substr($0,5) }' items.txt
HD Camcorder,Video,210,10
Refrigerator,Appliance,850,2
MP3 Player,Audio,270,15
Tennis Racket,Sports,190,20
Laser Printer,Office,475,5

Example 2: Print 5 characters of the 2nd field, starting at its 1st character

[root@localhost ~]# awk -F"," '{ print substr($2,1,5) }' items.txt
HD Ca
Refri
MP3 P
Tenni
Laser
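
The three argument combinations can be compared side by side. This is a small sketch on a literal string rather than on items.txt; the length 100 is an arbitrary over-long value:

```shell
out=$(awk 'BEGIN {
    s = "HD Camcorder"
    print substr(s, 4)        # from position 4 to the end
    print substr(s, 1, 2)     # 2 characters starting at position 1
    print substr(s, 4, 100)   # a length past the end is clipped
}')
printf '%s\n' "$out"
```

Asking for more characters than remain is not an error; substr simply stops at the end of the string.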

Calling shell commands

Two-way pipe |&

awk (gawk) can use "|&" to communicate with an external process; the communication is two-way.

Example 1:

[root@localhost ~]# cat doub.awk 
BEGIN {
    command = "sed 's/Awk/Sed and Awk/'"
    print "Awk is Great!" |& command
    close(command,"to");  # close the write end of the pipe so that sed sees end-of-file
    command |& getline tmp
    print tmp;
    close(command);
}
[root@localhost ~]# awk -f doub.awk 
Sed and Awk is Great!

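Explanation:

  • "|&" denotes a two-way pipe. The command on the right of "|&" receives as input what the statement on the left prints.

  • close(command,"to"): once all input has been sent, close the "to" end of the pipe so that the command sees end-of-file.

  • command |& getline tmp: now that the command has received all of its input, use getline to read its output, which is stored in the variable "tmp".

  • close(command): finally, close the command.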

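The system function

system() executes an operating system command. Any string can be passed as its argument; it is executed exactly as if it had been typed as a command, and its exit status is returned (this differs from the two-way pipe, where awk reads the command's output itself).

Example 1: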

[root@localhost ~]# awk 'BEGIN{system("hostname");}' # no print statement is needed
localhost.localdomain  
[root@localhost ~]# awk 'BEGIN{system("pwd")}'
/root
[root@localhost ~]# awk 'BEGIN{system("date")}'
Fri Jan 20 23:57:55 CST 2017
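
The value returned by system() is the exit status of the command, not its output. A minimal sketch; the commands "exit 3" and "true" are chosen only because their statuses are known in advance:

```shell
out=$(awk 'BEGIN {
    rc = system("exit 3")        # shell command that ends with status 3
    print "exit status:", rc
    rc = system("true")          # command that succeeds
    print "exit status:", rc
}')
printf '%s\n' "$out"
```

A script can test this value to decide whether an external step succeeded before continuing.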

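The getline function

The getline command lets you control how awk reads data from the input file (or from other files). Note that once getline completes, awk resets built-in variables such as NF, NR, FNR and $0.

Example 1: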

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk -F"," '
>{getline;print $0;}' items.txt # like the n command in sed, this alters awk's normal flow
102,Refrigerator,Appliance,850,2
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
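  • When the body block starts, before any command is executed, awk reads the first line of items.txt and stores it in $0

  • getline: forces awk to read the next line and store it in $0 (overwriting the previous contents)

  • print $0: since $0 now holds the second line, print $0 prints the second line of the file (not the first)

  • The body block keeps running this way, so only the even-numbered lines are printed. (Notice that the last line, 105, is printed as well: at end of input getline reads nothing and leaves $0 unchanged.)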
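
The stray last line can be avoided by checking the return value of getline, which is 1 when a line was read, 0 at end of input and -1 on error. A minimal sketch on three made-up input lines:

```shell
out=$(printf '1\n2\n3\n' | awk '{
    if ((getline) > 0)   # print only when a next line was actually read
        print $0
}')
printf '%s\n' "$out"
```

Only 2 is printed here: on the third record getline hits end of input, returns 0, and the print is skipped.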

Besides placing the line in $0, getline can also store it in a variable.

Example 2: Print the odd-numbered lines

[root@localhost ~]# awk -F"," '{getline tmp; print $0;}' items.txt
101,HD Camcorder,Video,210,10
103,MP3 Player,Audio,270,15
105,Laser Printer,Office,475,5

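Explanation:

  • When the body block starts, before any command is executed, awk reads the first line of items.txt and stores it in $0

  • getline tmp: forces awk to read the next line and store it in the variable tmp

  • print $0: $0 still holds the first line, because getline tmp does not overwrite $0, so the first line is printed (not the second)

  • The body block keeps running this way, so only the odd-numbered lines are printed.

Example 3: getline from another file into a variable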

[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# cat items-sold.txt 
101 2 10 5 8 10 12
102 0 1 4 3 0 2
103 10 6 11 20 5 13
104 2 3 4 0 6 5
105 10 2 5 7 12 6
[root@localhost ~]# awk -F"," '{
>print $0; 
>getline tmp < "items-sold.txt";
>print tmp;}' items.txt
101,HD Camcorder,Video,210,10
101 2 10 5 8 10 12
102,Refrigerator,Appliance,850,2
102 0 1 4 3 0 2
103,MP3 Player,Audio,270,15
103 10 6 11 20 5 13
104,Tennis Racket,Sports,190,20
104 2 3 4 0 6 5
105,Laser Printer,Office,475,5
105 10 2 5 7 12 6
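
To read a whole file this way, getline is usually placed in a loop that stops at end of file. A minimal sketch, assuming a temporary two-line file created just for the illustration (the names sold, file and line are hypothetical):

```shell
sold=$(mktemp)
printf '101 2 10\n102 0 1\n' > "$sold"
out=$(awk -v file="$sold" 'BEGIN {
    while ((getline line < file) > 0) {   # stop at end of file or on error
        n++
        print n ": " line
    }
    close(file)                           # allow the file to be reread later
}')
printf '%s\n' "$out"
```

Comparing with > 0 rather than testing the bare result matters, because getline returns -1 for an unreadable file and a bare test would loop forever.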

Example 4: Run an external command with getline

[root@localhost ~]# cat get.awk 
BEGIN {
    FS=",";
    "date" | getline
    close("date")
    print "Timestamp:" $0
} {
if ( $5 <= 5)
    print "Buy More:Order",$2,"immediately!"
else
    print "Sell More:Give discount on",$2,"immediately!"
}
[root@localhost ~]# cat items.txt 
101,HD Camcorder,Video,210,10
102,Refrigerator,Appliance,850,2
103,MP3 Player,Audio,270,15
104,Tennis Racket,Sports,190,20
105,Laser Printer,Office,475,5
[root@localhost ~]# awk -f get.awk items.txt 
Timestamp:Sat Jan 21 00:23:53 CST 2017
Sell More:Give discount on HD Camcorder immediately!
Buy More:Order Refrigerator immediately!
Sell More:Give discount on MP3 Player immediately!
Sell More:Give discount on Tennis Racket immediately!
Buy More:Order Laser Printer immediately!

Example 5: Instead of $0, the command output can also be stored in any awk variable

[root@localhost ~]# cat get2.awk              
BEGIN {FS=",";
    "date" | getline timestamp
    close("date")
    print "Timestamp:" timestamp
} {
if ( $5 <= 5)
    print "Buy More: Order",$2,"immediately!"
else
    print "Sell More: Give discount on",$2,"immediately!"
}
[root@localhost ~]# awk -f get2.awk items.txt 
Timestamp:Sat Jan 21 00:26:29 CST 2017
Sell More: Give discount on HD Camcorder immediately!
Buy More: Order Refrigerator immediately!
Sell More: Give discount on MP3 Player immediately!
Sell More: Give discount on Tennis Racket immediately!
Buy More: Order Laser Printer immediately!
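
When a command prints more than one line, the same form can be put in a loop. A minimal sketch; the command string "echo one; echo two" is made up so that the output is predictable:

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    while ((cmd | getline line) > 0)   # one output line per iteration
        print "got: " line
    close(cmd)                         # lets the command be run again later
}')
printf '%s\n' "$out"
```

Keeping the command in a variable ensures that the string passed to close() is identical to the one used to open the pipe.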

User-defined functions in awk

Format:

function name ( parameter, parameter, ... ) {
statements
return expression
}

Example 1:

[root@localhost ~]# cat fun.awk
function max(v1,v2) {
    # return the larger of the two arguments
    var = (v1 > v2) ? v1 : v2
    return var
}
BEGIN{a=3;b=2;print max(a,b)}
[root@localhost ~]# awk -f fun.awk
3
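
Variables assigned inside an awk function are global unless they appear in the parameter list, so extra parameters are commonly used as local variables. A minimal sketch of that convention; the extra parameter m and the value "global" are introduced only for this illustration:

```shell
out=$(awk '
function max(a, b,    m) {      # m is an extra parameter used as a local
    m = (a > b) ? a : b
    return m
}
BEGIN {
    m = "global"                # not touched by the calls below
    print max(3, 2), max(10, 20), m
}')
printf '%s\n' "$out"
```

The output is 3 20 global, which shows that the m inside max is separate from the global m.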