Advanced Linux Text-Processing Tools: sed

Date: 2017/2/28 13:43:46   Editor: Linux教程

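sed (Stream Editor) is a non-interactive, stream-oriented text editor. It can process many lines across several files in a single run. By default it leaves the original file untouched and prints the whole processed text to the screen, or only the lines that match a pattern. It can also modify the original file directly, in which case no result is printed to the screen.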

Basic Concepts

I. The syntax of the sed command is as follows:

sed [options] script filename

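Options of the sed command:

-n: suppress the default output, so that only lines printed explicitly (for example with the p command) appear

-e: add a script; repeat it to apply several edits, for example -e script1 -e script2 -e script3

-f: read the sed instructions from a file; -f filename runs the instructions stored in filename

-r: use extended regular expressions (GNU sed also accepts -E for the same purpose)

-i: modify the file contents in place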
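As a small sketch of two of these options, the snippet below uses -E for an extended regular expression and -i with a backup suffix. The scratch directory and the .bak suffix are chosen only for illustration, and a sed that accepts -E and -i.bak (such as GNU sed) is assumed.

```shell
# Scratch directory so that no real file is modified (illustrative names).
dir=$(mktemp -d)
printf '%s\n' 'Eric Adams, 20 Post Road, Sudbury MA' > "$dir/list"

# -E (same meaning as GNU -r): extended syntax, so ( | ) need no backslashes.
extended=$(sed -E 's/ (MA|PA)$/ [\1]/' "$dir/list")

# -i with a suffix: edit the file in place and keep the original as list.bak.
sed -i.bak 's/ MA$/, Massachusetts/' "$dir/list"
edited=$(cat "$dir/list")
backup=$(cat "$dir/list.bak")

echo "$extended"
echo "$edited"
echo "$backup"
rm -rf "$dir"
```

Keeping a backup copy is a cautious habit when editing in place, since -i gives no preview of the result.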

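Like most Linux commands, sed reads its input from stdin and writes its output to stdout. When a filename is given, the input is read from that file instead. The output can be redirected to a file, but that file must never be the input file itself, because the shell empties the redirection target before sed gets to read it.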
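The following sketch shows both input modes and one safe way to save the result. The file name list.new and the scratch directory are made up for this example.

```shell
# Scratch directory so that no real file is touched (illustrative names).
dir=$(mktemp -d)
printf '%s\n' 'Eric Adams, 20 Post Road, Sudbury MA' > "$dir/list"

# 1) Input from stdin, output to stdout.
from_stdin=$(sed 's/MA/Massachusetts/' < "$dir/list")

# 2) Input from a named file, output redirected to a DIFFERENT file,
#    which then replaces the original.
sed 's/MA/Massachusetts/' "$dir/list" > "$dir/list.new" &&
  mv "$dir/list.new" "$dir/list"
from_file=$(cat "$dir/list")

echo "$from_stdin"
echo "$from_file"

# Do NOT write:  sed '...' "$dir/list" > "$dir/list"
# The shell truncates the target first, so sed would read an empty file.
rm -rf "$dir"
```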

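options are sed's command-line arguments. They are not the focus here, and there are not many of them.

script is one or more instructions to apply to the input. sed reads the input file one line at a time into a buffer and applies the instructions in the script to that buffer, so the changes do not affect the original file (note: if sed is run with -i, the original file is modified). If there are many instructions, they can be put in a file to keep the command readable, and the file is passed with -f: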

sed -f scriptfile filename
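A minimal sketch of this usage follows. The script file name states.sed and the two sample lines are assumptions made for the example; the script holds one instruction per line.

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Terry Kalkas, 402 Lans Road, Beaver Falls PA' > "$dir/list"

# One instruction per line; a line starting with # is a comment.
cat > "$dir/states.sed" <<'EOF'
# expand state abbreviations at the end of a line
s/ MA$/, Massachusetts/
s/ PA$/, Pennsylvania/
EOF

# Run the script file, then the same instructions given inline for comparison.
with_file=$(sed -f "$dir/states.sed" "$dir/list")
inline=$(sed -e 's/ MA$/, Massachusetts/' -e 's/ PA$/, Pennsylvania/' "$dir/list")

echo "$with_file"
rm -rf "$dir"
```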

Note:

It is best to wrap instructions given on the command line in single quotes, so that the shell does not interpret their special characters.
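To illustrate the advice, here is a short sketch. The sample text and the variable name word are invented for the example.

```shell
line='price: $5'

# Single quotes: the shell passes the script to sed untouched, so \$5
# reaches sed and matches a literal dollar sign followed by 5.
single=$(printf '%s\n' "$line" | sed 's/\$5/five dollars/')

# Double quotes are only needed when a shell variable has to be expanded
# inside the script (word is a hypothetical variable).
word='cost'
double=$(printf '%s\n' "$line" | sed "s/price/$word/")

echo "$single"
echo "$double"
```

With double quotes the shell processes $ and backslashes before sed sees them, which is why single quotes are the safer default.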

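II. How sed works

1. Read a new line of input into the buffer.

2. Take the first instruction from the script and check whether its pattern matches.

3. If it does not match, skip that instruction's editing command and go back to step 2 to take the next instruction.

4. If it matches, run the editing command on the buffered line, then go back to step 2 to take the next instruction.

5. When all instructions have been applied, output the contents of the buffer, then go back to step 1 to read the next line.

6. When all lines have been processed, stop.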
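The cycle described in these steps can be imitated with a plain shell loop. This is only a rough sketch for a single instruction (/ MA$/s/ MA$/, Massachusetts/) and says nothing about how sed is implemented internally; the variable names are made up.

```shell
input=$(printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Alice Ford, 22 East Broadway, Richmond VA')

# Hand-written imitation of the cycle for one instruction.
emulated=$(
  printf '%s\n' "$input" |
  while IFS= read -r buffer; do    # step 1: read one line into the buffer
    case $buffer in
      (*' MA')                     # steps 2-4: does the pattern match?
        buffer="${buffer% MA}, Massachusetts" ;;  # yes: edit the buffered line
    esac
    printf '%s\n' "$buffer"        # step 5: output the buffer, then next line
  done                             # step 6: stop when the input is exhausted
)

# The same job done by sed itself.
real=$(printf '%s\n' "$input" | sed '/ MA$/s/ MA$/, Massachusetts/')

echo "$emulated"
```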

Figure: how sed works

III. Simple examples

Example 1: replace MA with Massachusetts

[root@localhost ~]# cat list
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA

[root@localhost ~]# sed -e 's@MA@Massachusetts@' list
John Daggett, 341 King Road, Plymouth Massachusetts
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury Massachusetts
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston Massachusetts

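Example 2: apply several instructions. The -e option is optional in general; it is only needed when several instructions are passed as separate arguments on the command line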

[root@localhost ~]# sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' list 
John Daggett, 341 King Road, Plymouth, Massachusetts
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
Eric Adams, 20 Post Road, Sudbury, Massachusetts
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston, Massachusetts

Even with several instructions, -e is not required, and I usually leave it out. The example above can be rewritten as:

[root@localhost ~]# sed 's/ MA/, Massachusetts/;s/ PA/, Pennsylvania/' list

Note: instructions can be separated with semicolons, just as shell commands can be separated with semicolons.
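As a quick check, the sketch below writes the same two instructions in three ways: with -e, with a semicolon, and with a newline inside the quoted script. The variable names are chosen only for this example.

```shell
input=$(printf '%s\n' \
  'John Daggett, 341 King Road, Plymouth MA' \
  'Terry Kalkas, 402 Lans Road, Beaver Falls PA')

# Separate -e arguments.
with_e=$(printf '%s\n' "$input" |
  sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/')

# One script, instructions separated by a semicolon.
with_semicolon=$(printf '%s\n' "$input" |
  sed 's/ MA/, Massachusetts/;s/ PA/, Pennsylvania/')

# One script, instructions separated by a newline.
with_newline=$(printf '%s\n' "$input" |
  sed 's/ MA/, Massachusetts/
s/ PA/, Pennsylvania/')

echo "$with_semicolon"
```

All three variables should hold the same two output lines.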

Example 3: print only the lines that were changed

[root@localhost ~]# sed -n 's@MA@Massachusetts@p' list 
John Daggett, 341 King Road, Plymouth Massachusetts
Eric Adams, 20 Post Road, Sudbury Massachusetts
Sal Carpenter, 73 6th Street, Boston Massachusetts

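Note: this command uses -n, which suppresses sed's default output; the p flag on the s command then prints only the lines on which a substitution was made.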
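The interplay between -n and the p flag can be seen by trying the three combinations. This is a small sketch on two made-up input lines.

```shell
input=$(printf '%s\n' 'Plymouth MA' 'Richmond VA')

# -n alone: the default output is suppressed and nothing asks for printing.
only_n=$(printf '%s\n' "$input" | sed -n 's/MA/Massachusetts/')

# p alone: a changed line appears twice (once from p, once by default).
only_p=$(printf '%s\n' "$input" | sed 's/MA/Massachusetts/p')

# -n together with p: only the changed lines appear, once each.
both=$(printf '%s\n' "$input" | sed -n 's/MA/Massachusetts/p')

echo "only_n: [$only_n]"
echo "$only_p"
echo "both: [$both]"
```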

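Pattern Space and Address Matching

I. How the pattern space changes

sed keeps only one line in the pattern space at a time. The benefit is that sed can process very large files without trouble, unlike editors that run out of memory because they load a large part of the file at once. The simple example below walks through how the pattern space changes.

Suppose every "Unix System" and "UNIX System" in a text should become "UNIX Operating System". Two substitution commands do this: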

s/Unix /UNIX /
s/UNIX System/UNIX Operating System/

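The process is as follows:

1. The line The Unix System is read into the pattern space.

2. The first substitution command replaces Unix with UNIX.

3. The pattern space now contains The UNIX System.

4. The second substitution command replaces UNIX System with UNIX Operating System.

5. The pattern space now contains The UNIX Operating System.

6. All editing commands have run, so the line in the pattern space is output by default.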
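Because each command works on whatever the previous command left in the pattern space, the order of the two commands matters. The sketch below runs them in both orders on the same input line.

```shell
# Commands in the order used above: the second sees the result of the first.
in_order=$(printf '%s\n' 'The Unix System' |
  sed -e 's/Unix /UNIX /' -e 's/UNIX System/UNIX Operating System/')

# Reversed order: "UNIX System" does not exist yet when it is searched for,
# so only the Unix -> UNIX change takes effect.
reversed=$(printf '%s\n' 'The Unix System' |
  sed -e 's/UNIX System/UNIX Operating System/' -e 's/Unix /UNIX /')

echo "$in_order"
echo "$reversed"
```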

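II. Address matching

By default sed applies the given editing command to every input line, because it reads the lines one after another and each in turn becomes the current line. So s/CA/California/g replaces CA with California on all input lines. vi/vim behaves differently: its substitute command works on the current line only, unless you write %s to cover the whole file.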
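Two different kinds of "global" are involved here: sed visits every line without being asked, while the g flag controls how many matches are replaced within one line. A brief sketch on made-up input:

```shell
input=$(printf '%s\n' 'CA to CA' 'CA only')

# Without g: every line is processed, but only the first match per line changes.
first_only=$(printf '%s\n' "$input" | sed 's/CA/California/')

# With g: every match on every line changes.
every_match=$(printf '%s\n' "$input" | sed 's/CA/California/g')

echo "$first_only"
echo "$every_match"
```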

Example 1: in the list file, replace MA with Massachusetts only on lines that contain Sal

[root@localhost ~]# sed -e /Sal/'s@MA@Massachusetts@' list 
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston Massachusetts

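Note: /Sal/ is a regular expression that matches lines containing Sal, so a line without Sal, such as "Eric Adams, 20 Post Road, Sudbury MA", is left unchanged.

A sed command can have zero, one, or two addresses (an address pair). An address can be a regular expression (such as /Sal/), a line number, or a special line symbol (such as $ for the last line):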

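● If no address is given, the editing command is applied to all lines;

● If one address is given, the editing command is applied only to the lines that match it;

● If an address pair (addr1,addr2) is given, the editing command is applied to all lines in that range (start and end included);

● If an address is followed by an exclamation mark (!), the editing command is applied to all lines that do not match the address;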
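The address forms listed above can be compared side by side with the print command. In this sketch, show is a hypothetical helper that runs one sed script on five fixed input lines and joins the selected lines with commas.

```shell
# show SCRIPT: run sed -n SCRIPT on five sample lines, join output with commas.
show() {
  printf '%s\n' one two three four five | sed -n "$1" | paste -s -d, -
}

no_address=$(show 'p')       # no address: every line
single_line=$(show '2p')     # one address, a line number
last_line=$(show '$p')       # one address, the last line
by_regex=$(show '/^t/p')     # one address, a regular expression
pair=$(show '2,4p')          # address pair, both ends included
negated=$(show '2,4!p')      # lines NOT in the range

echo "$no_address"
echo "$single_line $last_line"
echo "$by_regex"
echo "$pair"
echo "$negated"
```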

Example 2: to make the rules above easier to follow, take the delete command (d) as an example. With no address, it deletes all lines

[root@localhost ~]# sed 'd' list
[root@localhost ~]#

Example 3: delete specified lines

[root@localhost ~]# cat -n list
 1  John Daggett, 341 King Road, Plymouth MA
 2  Alice Ford, 22 East Broadway, Richmond VA
 3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
 4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
 5  Eric Adams, 20 Post Road, Sudbury MA
 6  Hubert Sims, 328A Brook Road, Roanoke VA
 7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
 8  Sal Carpenter, 73 6th Street, Boston MA

[root@localhost ~]# sed '1d' list  # delete the first line of the list file
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA

[root@localhost ~]# sed '$d' list  # delete the last line of the list file
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

[root@localhost ~]# sed /MA/'d' list   # delete the lines that contain MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

[root@localhost ~]# sed '/MA/d' list  # same as above: delete the lines that contain MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

Example 4: an address pair deletes every line in the range, for example from line 2 to the last line

[root@localhost ~]# cat list
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA
[root@localhost ~]# sed '2,$d' list
John Daggett, 341 King Road, Plymouth MA

Example 5: with regular expressions, delete everything from the line containing Alice through the line containing Hubert

[root@localhost ~]# cat list
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA
[root@localhost ~]# sed '/Alice/,/Hubert/d' list
John Daggett, 341 King Road, Plymouth MA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA

Example 6: a line number and a regular expression can be mixed in one address pair

[root@localhost ~]# cat -n list
     1  John Daggett, 341 King Road, Plymouth MA
     2  Alice Ford, 22 East Broadway, Richmond VA
     3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
     4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
     5  Eric Adams, 20 Post Road, Sudbury MA
     6  Hubert Sims, 328A Brook Road, Roanoke VA
     7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
     8  Sal Carpenter, 73 6th Street, Boston MA
[root@localhost ~]# sed '2,/Amy/d' list   # delete from line 2 through the line containing Amy
John Daggett, 341 King Road, Plymouth MA
Sal Carpenter, 73 6th Street, Boston MA

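Example 7: an exclamation mark (!) after the address applies the command to the lines that do not match the address

[root@localhost ~]# sed '1,3!d' list   # delete every line except lines 1 to 3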
John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK

Example 8: run several editing commands on one address. In sed, {} groups commands, much like a statement block in a programming language

[root@localhost ~]# cat -n list
     1  John Daggett, 341 King Road, Plymouth MA
     2  Alice Ford, 22 East Broadway, Richmond VA
     3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
     4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
     5  Eric Adams, 20 Post Road, Sudbury MA
     6  Hubert Sims, 328A Brook Road, Roanoke VA
     7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
     8  Sal Carpenter, 73 6th Street, Boston MA
[root@localhost ~]# sed -n '1,4{s/ MA/, Massachusetts/;s/ PA/, Pennsylvania/;p}' list
John Daggett, 341 King Road, Plymouth, Massachusetts
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania

Example 9: print the odd-numbered lines of the list file

[root@localhost ~]# sed -n '1~2p' list  # 1~2 means start at line 1 and step by 2 lines
John Daggett, 341 King Road, Plymouth MA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Eric Adams, 20 Post Road, Sudbury MA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

Example 10: print the even-numbered lines of the list file

[root@localhost ~]# cat -n list
     1  John Daggett, 341 King Road, Plymouth MA
     2  Alice Ford, 22 East Broadway, Richmond VA
     3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
     4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
     5  Eric Adams, 20 Post Road, Sudbury MA
     6  Hubert Sims, 328A Brook Road, Roanoke VA
     7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
     8  Sal Carpenter, 73 6th Street, Boston MA

[root@localhost ~]# sed -n '2~2p' list
Alice Ford, 22 East Broadway, Richmond VA
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Hubert Sims, 328A Brook Road, Roanoke VA
Sal Carpenter, 73 6th Street, Boston MA
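The first~step address is a GNU sed extension, so the sketch below assumes GNU sed for the ~ form. For comparison it also shows a way to get the same selection with the standard n command, which loads the next input line into the pattern space. The numbered input is made up for the example.

```shell
input=$(printf '%s\n' 1 2 3 4 5 6 7 8)

# GNU extension first~step: odd and even line numbers.
odd_gnu=$(printf '%s\n' "$input" | sed -n '1~2p' | paste -s -d, -)
even_gnu=$(printf '%s\n' "$input" | sed -n '2~2p' | paste -s -d, -)

# Without the extension: p prints the current line, n then loads the next
# line, which is not printed because of -n.
odd_plain=$(printf '%s\n' "$input" | sed -n 'p;n' | paste -s -d, -)

# n first skips a line, p prints the one that follows it.
even_plain=$(printf '%s\n' "$input" | sed -n 'n;p' | paste -s -d, -)

echo "$odd_gnu / $odd_plain"
echo "$even_gnu / $even_plain"
```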

Example 11: print line 6 of the list file and the three lines after it

[root@localhost ~]# sed -n '6,+3p' list
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA

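Note: #,+n selects line number # and the n lines that follow it. The list file has only 8 lines, so the output above stops after three lines.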
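The addr,+n form is also a GNU sed extension. The sketch below, which assumes GNU sed and uses made-up numbered input, shows a range that fits inside the file, a range cut short by the end of the file, and the equivalent plain line-number pair.

```shell
input=$(printf '%s\n' 1 2 3 4 5 6 7 8)

# Line 2 plus the 3 lines after it.
inside=$(printf '%s\n' "$input" | sed -n '2,+3p' | paste -s -d, -)

# Line 6 plus the 3 lines after it: the input ends at line 8.
clipped=$(printf '%s\n' "$input" | sed -n '6,+3p' | paste -s -d, -)

# The first range written with explicit line numbers.
explicit=$(printf '%s\n' "$input" | sed -n '2,5p' | paste -s -d, -)

echo "$inside"
echo "$clipped"
echo "$explicit"
```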
