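Most administrative files on UNIX are plain text files that can be edited, printed, and read without any special-purpose tools.
Most of them live in the standard directory /etc. For example:
The familiar password and group files: (passwd, group)
The filesystem mount tables: (fstab, vfstab)
The hosts file: (hosts)
The default shell startup file: (profile)
The system startup and shutdown shell scripts: (stored in the subdirectory trees rc0.d, rc1.d ... rc6.d)
Extracting data from structured text files
Exercise 1: cut the first and seventh fields out of passwd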
[linuxidc@test ~]$ vi patest.sh
#!/bin/bash
umask 077
PERSON=/tmp/pd.key.person.$$
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
trap "exit 1" HUP INT PIPE QUIT TERM
trap "rm -f $PERSON $OFFICE $TELEPHONE $USER " EXIT
awk -F: '{ print $1 ":" $7 }' /etc/passwd > $USER
awk -F: '{ print $1}' < $USER | sort >$PERSON
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort >$OFFICE
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort >$TELEPHONE
join -t: $PERSON $OFFICE |
join -t: - $TELEPHONE |
sort -t: -k1,1 -k2,2 -k3,3 |
awk -F: '{ printf("%-39s\t%s\t%s\n", $1,$2,$3) }'
[linuxidc@test ~]$ chmod +x patest.sh
[linuxidc@test ~]$ bash +x patest.sh
adm sbin nologin
alert2system bin bash
alert2systemtest bin bash
avahi sbin nologin
bin sbin nologin
cvsroot bin bash
dbus sbin nologin
dovecot sbin nologin
ftp sbin nologin
ftpuser bin bash
games sbin nologin
gdm sbin nologin
git_test usr local/git/bin/git-shell
gopher sbin nologin
gup sbin nologin
....
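For comparison, when only whole fields are needed, the same two fields can be pulled out with cut alone. The following is a minimal sketch, not the script above: the function name user_and_shell and the three sample records are made up for illustration.

```shell
#!/bin/bash
# Sketch: print "login:shell" (fields 1 and 7) from passwd-style
# records read on standard input, sorted by login name.
# user_and_shell and the sample records are hypothetical.
user_and_shell() {
    cut -d: -f1,7 | sort -t: -k1,1
}

user_and_shell <<'EOF'
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
cvsroot:x:500:500::/home/cvsroot:/bin/bash
EOF
```

Unlike the script above, this keeps the shell path in one piece (for example /sbin/nologin) instead of splitting it at the slashes.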
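Exercise 2: suppose the fifth field of /etc/passwd holds a name, an office number, and a telephone number,
as in the file below. Build an office directory from it.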
[linuxidc@test ~]$ vi passwd1
gz_willwu:x:843:843:Will wu/SN091/555-6728:/home/gz_willwu:/bin/bash
ninf_thomaschan:x:853:853:Thomas chan/INF002/554-4565:/home/sninf_thomaschan:/bin/bash
llwu:x:843:843:Will wu/SN091/555-6728:/home/gz_willwu:/bin/bash
sninf_thomaschan:x:853:853:Thomas chan/INF002/554-4565:/home/sninf_thomaschan:/bin/bash
sninf_tonyhung:x:856:856:Tonny huang/HK0501/553-6465:/home/sninf_tonyhung:/bin/bash
gz_kinma:x:857:857:Kin ma/SN021/555-6733:/home/gz_kinma:/bin/bash
linuxidc:x:859:859:Field yang/SN001/555-6765:/home/linuxidc:/bin/bash
gz_hilwu:x:843:843:hil wu/SN021/555-6744:/home/gz_willwu:/bin/bash
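Step-by-step breakdown:
①[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=/.*==' -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2='
ninf_thomaschan:chan, Thomas
llwu:wu, Will
sninf_thomaschan:chan, Thomas
sninf_tonyhung:huang, Tonny
gz_kinma:ma, Kin
linuxidc:yang, Field
gz_willwu:wu, Will
# ^\([^:]*\) matches the username field, e.g. gz_willwu
# \(.*\)□ matches the text up to the last space (□ marks the space), e.g. Will in Will□wu
# \([^ ]*\) matches the remaining run of non-space characters, e.g. wu
# \1:\3, \2 emits the text of the first group, a colon, the third group, a comma and a space, then the second group
# Result, e.g.: sninf_thomaschan:chan, Thomas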
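The name swap can be tried in isolation on a couple of lines. This is a small sketch with the three \( \) groups written out; the function name swap_name and the second sample record (jdoe) are hypothetical.

```shell
# Sketch: turn "login:First Last" into "login:Last, First".
# Group 1 = login, group 2 = everything before the last space,
# group 3 = the final run of non-space characters.
swap_name() {
    sed -e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2='
}

printf '%s\n' 'gz_kinma:Kin ma' 'jdoe:John Q Doe' | swap_name
```

Because \(.*\) is greedy, a middle name stays with the first name: the second line comes out as jdoe:Doe, John Q.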
②[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2='
ninf_thomaschan:INF002
llwu:SN091
sninf_thomaschan:INF002
sninf_tonyhung:HK0501
gz_kinma:SN021
linuxidc:SN001
gz_willwu:SN091
③[linuxidc@test ~]$ awk -F: '{ print $1 ":" $5 }' passwd1 |
> sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2='
ninf_thomaschan:554-4565
llwu:555-6728
sninf_thomaschan:554-4565
sninf_tonyhung:553-6465
gz_kinma:555-6733
linuxidc:555-6765
gz_willwu:555-6728
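As an alternative to the two sed expressions in steps ② and ③, the fifth field can be split on / with awk's split(). This is a sketch of a different technique, not the method used in this exercise; it assumes the field has exactly the form name/office/phone, and the function name gecos_parts is hypothetical.

```shell
# Sketch: print "login:office:phone" by splitting field 5 on "/".
# Records whose fifth field does not have exactly three parts are skipped.
gecos_parts() {
    awk -F: '{
        n = split($5, part, "/")   # part[1]=name, part[2]=office, part[3]=phone
        if (n == 3)
            print $1 ":" part[2] ":" part[3]
    }'
}

gecos_parts <<'EOF'
gz_kinma:x:857:857:Kin ma/SN021/555-6733:/home/gz_kinma:/bin/bash
EOF
```

For the sample record this prints gz_kinma:SN021:555-6733.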
The actual script, which builds the office directory, is as follows:
[linuxidc@test ~]$ vi patest2.sh
#!/bin/bash
# Filter an input stream formatted like /etc/passwd
# and derive an office directory from that data
#
#
umask 077
PERSON=/tmp/pd.key.person.$$
OFFICE=/tmp/pd.key.office.$$
TELEPHONE=/tmp/pd.key.telephone.$$
USER=/tmp/pd.key.user.$$
trap "exit 1" HUP INT PIPE QUIT TERM
trap "rm -f $PERSON $OFFICE $TELEPHONE $USER " EXIT
awk -F: '{ print $1 ":" $5 }' passwd1 > $USER
# s=/.*== deletes everything from the first / to the end of the line, leaving e.g. gz_willwu:Will wu
sed -e 's=/.*==' \
-e 's=^\([^:]*\):\(.*\) \([^ ]*\)=\1:\3, \2=' < $USER | sort >$PERSON
sed -e 's=^\([^:]*\):[^/]*/\([^/]*\)/.*$=\1:\2=' < $USER | sort >$OFFICE
sed -e 's=^\([^:]*\):[^/]*/[^/]*/\([^/]*\)=\1:\2=' < $USER | sort >$TELEPHONE
join -t: $PERSON $OFFICE |
# combine the personal information with the office location
join -t: - $TELEPHONE |
# add the telephone number
cut -d: -f 2- |
# drop the key by keeping field 2 through the end of the line
sort -t: -k1,1 -k2,2 -k3,3 |
# with : as the field separator, sort on fields 1, 2, and 3 in turn
awk -F: '{ printf("%-39s\t%s\t%s\n", $1,$2,$3) }'
# reformat the output
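The join step is the heart of the script: both inputs must be sorted on the key, and the key is dropped afterwards. Here is a minimal sketch of just that step, using two tiny hand-made files. The file contents are illustrative, and mktemp -d is assumed to be available (it is not used in the script above).

```shell
# Sketch: join two key-sorted files on the login name, then drop the key.
tmp=$(mktemp -d) || exit 1      # assumption: mktemp -d exists on this system
printf '%s\n' 'gz_kinma:ma, Kin' 'linuxidc:yang, Field' > "$tmp/person"
printf '%s\n' 'gz_kinma:SN021'   'linuxidc:SN001'       > "$tmp/office"

# join -t: matches lines with the same first field; cut removes that field
directory=$(join -t: "$tmp/person" "$tmp/office" | cut -d: -f2-)
rm -rf "$tmp"
printf '%s\n' "$directory"
```

This prints ma, Kin:SN021 and yang, Field:SN001, one per line. If either file were not sorted on the login name, join could silently miss matches, which is why every intermediate file in the script goes through sort.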
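Appendix:
$# is the number of arguments passed to the script
$0 is the name of the script itself
$1 is the first argument passed to the shell script
$2 is the second argument passed to the shell script
$@ is the list of all arguments passed to the script
$* expands to all arguments passed to the script as a single string; unlike the numbered positional variables, it is not limited to nine arguments
$$ is the process ID of the running script
$? is the exit status of the last command; 0 means no error, any other value indicates an error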
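These parameters can be observed directly. In the sketch below a shell function stands in for a script, since a function receives its own $#, $1, and "$@" in the same way; the function name show_args is hypothetical.

```shell
# Sketch: print the argument count, the first argument, and each argument.
show_args() {
    echo "count=$#"
    echo "first=$1"
    for arg in "$@"; do      # quoted "$@" keeps each argument intact
        echo "arg=$arg"
    done
}

show_args one "two words"

# $? holds the exit status of the command that just ran
false || echo "status of false: $?"
```

The call prints count=2, first=one, arg=one, and arg=two words; the last line reports a status of 1, because false always fails.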
[linuxidc@test ~]$ ./patest2.sh
chan, Thomas INF002 554-4565
chan, Thomas INF002 554-4565
huang, Tonny HK0501 553-6465
ma, Kin SN021 555-6733
wu, hil SN021 555-6744
wu, Will SN091 555-6728
yang, Field SN001 555-6765
[linuxidc@test ~]$
Exercise 3: write a script that looks up words matching a given pattern
[linuxidc@test ~]$ vi puzzle-help.sh
#!/bin/bash
# Match a pattern against a collection of word lists
# Usage: ./puzzle-help.sh egrep-pattern [word-list-file]
FILES="/usr/share/dict/words
/usr/dict/words
/usr/share/lib/dict/words
/usr/local/share/dict/words.biology
/usr/local/share/dict/words.chemistry
/usr/local/share/dict/words.general
/usr/local/share/dict/words.knuth
/usr/local/share/dict/words.latin
/usr/local/share/dict/words.manpages
/usr/local/share/dict/words.mathematics
/usr/local/share/dict/words.physics
/usr/local/share/dict/words.roget
/usr/local/share/dict/words.sciences
/usr/local/share/dict/words.UNIX
/usr/local/share/dict/words.webster
"
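# FILES holds the built-in list of word-list files; each local site can customize it
pattern="$1"
egrep -h -i "$pattern" $FILES 2>/dev/null | sort -u -f
# Only $1 is used: as written, a word-list file given as a second argument is ignored
# grep -h: do not print file names in the output; -i: ignore case
# sort -u: keep only unique records, discarding records with equal keys
# sort -f: ignore case when sorting, treating all letters as uppercase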
①[linuxidc@test ~]$ ./puzzle-help.sh '^b.....[xz]...$' | fmt
Babelizing bamboozled bamboozler bamboozles baronizing Bellinzona
Belshazzar bigamizing bilharzial Birobizhan botanizing Brontozoum
Buitenzorg bulldozers bulldozing
# Matches words that start with b, followed by any five characters, then x or z, then any three more characters
②[linuxidc@test ~]$ ./puzzle-help.sh '[^aeiouy]{7}' /usr/dict/words |fmt
2,4,5-t A.M.D.G. arch-christendom arch-christianity A.R.C.S.
branch-strewn B.R.C.S. bright-striped drought-stricken earth-sprung
earth-strewn first-string K.C.M.G. latch-string light-spreading
light-struck Llanfairpwllgwyngyll night-straying night-struck
Nuits-St-Georges pgnttrp R.C.M.P. rock-'n'-roll R.S.V.P. scritch-scratch
scritch-scratching strength-bringing substrstrata thought-straining
tight-stretched tsktsks witch-stricken witch-struck world-schooled
world-spread world-strange world-thrilling
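# Finds words containing seven consecutive characters that are not vowels (aeiouy); punctuation counts too, hence entries such as A.M.D.G.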
[linuxidc@test ~]$ ./puzzle-help.sh '[^aeiouy]{8}' /usr/dict/words |fmt
B.R.C.S. K.C.M.G. R.C.M.P. rock-'n'-roll R.S.V.P.
③[linuxidc@test ~]$ ./puzzle-help.sh '[aeiouy]{6}' /usr/dict/words |fmt
AAAAAA euouae
# Finds words containing six consecutive vowels (aeiouy)
[linuxidc@test ~]$ ./puzzle-help.sh '[aeiouy]{5}' /usr/dict/words |fmt
AAAAAA Aeaea Aeaean AIEEE ayuyu Bayeau Blueeye cadiueio Chaouia cooeeing
cooeyed cooeying euouae fooyoung gayyou Guauaenok Iyeyasu Jayuya
Liaoyang Mayeye miaoued miaouing Pauiie queueing Taiyuan taoiya theyaou
trans-Paraguayian ukiyoye Waiyeung
[linuxidc@test ~]$
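The matching step can be tried without any dictionary files by feeding a few words on standard input. This is a sketch only: the function name match_words and the four sample words are chosen for illustration, and grep -E is used in place of egrep.

```shell
# Sketch: filter words on stdin with an extended regular expression,
# ignoring case, then sort case-insensitively and drop duplicates.
match_words() {
    grep -E -h -i "$1" | sort -u -f
}

printf '%s\n' bamboozled Bulldozers banana Belshazzar |
    match_words '^b.....[xz]...$'
```

Only the ten-letter words whose seventh letter is x or z survive, so banana is dropped.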
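Exercise 4: write a script that acts as a word-frequency filter
[linuxidc@test ~]$ vi wf.sh
#!/bin/bash
# Read a text stream from standard input and output a list of the n most frequent words,
# each with its frequency count, ordered by descending count,
# on standard output
# Usage: ./wf.sh [n] < file
#
tr -cs A-Za-z\' '\n' |
# replace every run of non-letter characters (apostrophes count as letters) with one newline:
# -c complements the set A-Za-z', -s squeezes repeated output characters into one
tr A-Z a-z |
# map uppercase letters to lowercase
sort |
uniq -c |
# collapse duplicates and prefix each word with its count
sort -k1,1nr -k2 |
# sort by descending count, then by word in ascending order
# sort -k: defines the sort key, i.e. which field to sort on
# sort -n: compare numerically
# sort -r: reverse the order, largest first
# sort -k1,1nr: the key starts at the beginning of field 1 and ends at the end of field 1, compared numerically in reverse
sed ${1:-25}q
# print the first n lines, 25 by default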
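To see what each stage contributes, the same pipeline can be run on a single short sentence. The sketch below wraps it in a function (the name wf and the sample sentence are illustrative) so that ${1:-25} refers to the function's first argument.

```shell
# Sketch: the word-frequency pipeline as a function.
wf() {
    tr -cs "A-Za-z'" '\n' |    # one word per line
        tr A-Z a-z |           # fold to lowercase
        sort |
        uniq -c |              # count adjacent duplicates
        sort -k1,1nr -k2 |     # highest count first, then alphabetical
        sed "${1:-25}q"        # keep the first n lines (default 25)
}

printf 'The cat and the hat. The end\n' | wf 2
```

With this input the two lines kept are the (count 3) and and (count 1, first alphabetically among the words that occur once).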
[linuxidc@test ~]$ vi test # create a test file from an arbitrary excerpt of text
Patent interference cases are historically rare; but they’ve become basically
non-existent since a change in the patent law in 2013. Today, patents are
awarded on a “first to file” basis. However, prior to 2013, patents were granted
on a “first to invent” basis, meaning whoever could prove they invented the idea
first would have rights to the patent. Since Doudna’s and Zhang’s patents were filed
before the switch went into effect, the case falls under the “first to invent” standard.
In the past, patent interference cases like this were concluded within a year,
Sherkow said, but given the value of this patent, it seems more than likely that
the losing party will appeal the decision. That process could stretch out for years.
Test cases:
①. Formatted output with the default settings
[linuxidc@test ~]$ ./wf.sh < test | pr -c4 -t -w80
10 the 3 were 2 interferenc 2 they
5 patent 2 are 2 invent 2 this
5 to 2 basis 2 on 1 and
4 a 2 but 2 s 1 appeal
4 first 2 cases 2 since 1 awarded
3 in 2 could 2 that 1 basically
3 patents
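# pr -cn: produce n-column output (can be abbreviated to -n)
# pr -t: omit the page headers
# pr -wn: at most n characters per line
②. Keep the first 12 lines, then format the output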
[linuxidc@test ~]$ ./wf.sh 12 < test | pr -c4 -t -w80
10 the 4 a 3 patents 2 basis
5 patent 4 first 3 were 2 but
5 to 3 in 2 are 2 cases
③. Count how many distinct words appear once duplicates are removed
[linuxidc@test ~]$ ./wf.sh 9999 < test | wc -l
82
[linuxidc@test ~]$ ./wf.sh 9999 < test | wc -w
164
[linuxidc@test ~]$ ./wf.sh 999 < test | wc -c
1153
# wc -l: count lines, -c: count bytes, -w: count words
④. Show the least frequent words
[linuxidc@test ~]$ ./wf.sh 999 < test | tail -n -12 | pr -c4 -t -w80
1 today 1 ve 1 will 1 year
1 under 1 went 1 within 1 years
1 value 1 whoever 1 would 1 zhang
⑤. Count the words that occur exactly once in the test document
[linuxidc@test ~]$ ./wf.sh 999 < test | grep -c '^ *1.'
62
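# The . after the 1 is meant to represent the Tab character that follows the count. Typed literally, . is a regular-expression wildcard that matches any single character, so a count such as 10 matches as well
# The argument 999 has no special meaning; any number larger than the number of words in the document will do
# grep -c: print the number of matching lines for each file
⑥. Count the frequently occurring core words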
[linuxidc@test ~]$ ./wf.sh 999 < test | awk '$1 >=3' | wc -l
8
[linuxidc@test ~]$
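Comparing the count field numerically with awk, as in case ⑥, also works for counting the words that occur exactly once, and avoids the pitfall that a pattern such as '^ *1.' also matches a count of 10. This is a minimal sketch; the function name count_singletons is hypothetical, and it repeats the counting stages so that it can be run on its own.

```shell
# Sketch: count the words that occur exactly once in the text on stdin.
count_singletons() {
    tr -cs "A-Za-z'" '\n' |
        tr A-Z a-z |
        sort |
        uniq -c |
        awk '$1 == 1 { n++ } END { print n + 0 }'   # n + 0 prints 0 if nothing matched
}

printf 'a b b c\n' | count_singletons
```

For the sample input, a and c occur once and b occurs twice, so the result is 2.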