Slippery Rock University Dr. Deborah Whitfield Go Browns!



Chapter 10: Filters using REs


Regular expressions:
Syntax     Description
zx         'z' followed by 'x'
zy$        ends in zy
^b         starts with b
^b.*zy     starts with b (^b), followed by any number of characters (.*), then zy
[aeiou]    contains one of a, e, i, o, u
[a-z]      a lowercase letter
[A-Z]      an uppercase letter
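
To try the patterns above without a full dictionary, here is a minimal sketch that runs grep against a small made-up word list. The five sample words and the variable names are assumptions for illustration only.

```shell
# Hypothetical five-line word list standing in for a dictionary file.
words=$(mktemp)
printf '%s\n' buzzy lazy zxcv Apple jazz > "$words"

has_zx=$(grep 'zx' "$words")          # 'z' followed by 'x'
ends_zy=$(grep 'zy$' "$words")        # ends in zy
b_then_zy=$(grep '^b.*zy' "$words")   # starts with b, then zy somewhere later
has_upper=$(grep '[A-Z]' "$words")    # contains an uppercase letter

echo "zx:      $has_zx"               # zxcv
echo "zy\$:     $(echo $ends_zy)"     # buzzy lazy
echo "^b.*zy:  $b_then_zy"            # buzzy
echo "[A-Z]:   $has_upper"            # Apple

rm -f "$words"
```

Each pattern is wrapped in single quotes so the shell hands it to grep unchanged.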


expr:
Command                       Description
expr                          perform arithmetic operations
expr value operation value    general form; each value and the operation must be a separate argument
expr 3 + 4                    prints 7
man expr                      check out all of the expr possibilities!
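
A few more expr calls, as a short sketch. The variable names are illustrative. The multiplication sign is escaped because an unquoted * would otherwise be expanded by the shell into file names.

```shell
sum=$(expr 3 + 4)          # addition
product=$(expr 6 \* 7)     # * must be escaped or quoted
quotient=$(expr 7 / 2)     # integer division, the remainder is dropped
remainder=$(expr 7 % 2)    # remainder of the division

echo "$sum $product $quotient $remainder"   # prints: 7 42 3 1
```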


Basic Grep:
Command                                        Description
grep zx /usr/share/dict/words                  find anything containing 'z' followed by 'x'
grep zy /usr/share/dict/words | more           find anything containing 'z' followed by 'y' (note: piping to more is often a good idea when using grep, since the output may be very large)
grep 'zy$' /usr/share/dict/words | more        find anything ending in 'zy'
grep '^b.*zy' /usr/share/dict/words            find anything starting with b (^b), followed by any number of characters (.*), then zy
grep '^[aeious].*zy' /usr/share/dict/words     find anything starting with a, e, i, o, u, or s (^[aeious]), followed by any number of characters (.*), then zy
vowel=[aeiou]                                  define vowels
grep "^$vowel.*zy" /usr/share/dict/words       find anything beginning with a vowel and containing 'zy'
constant=[a-z]                                 define lowercase letters
grep "^$constant.*zy" /usr/share/dict/words    find anything beginning with a lowercase letter and containing 'zy'
grep "^z.*$vowel\$" /usr/share/dict/words      find anything beginning with z and ending with a vowel
w=/usr/share/dict/words                        define a symbol that refers to the file name, to avoid typing it again and again
echo $w                                        prints /usr/share/dict/words
echo w                                         prints w (without the $, w is just a letter)
grep '^z.*z$' $w                               find anything beginning and ending with z
grep '^y.*y$' $w                               find anything beginning and ending with y
Quote any pattern containing characters the shell treats specially (*, [, $): single quotes for fixed patterns, double quotes when the pattern contains a shell variable.
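
The variable-based searches can be tried on a small sample, sketched below. The five-word list is made up, and w points at a temporary file rather than /usr/share/dict/words, so the results are easy to check by eye.

```shell
# Hypothetical sample list; on a real system w would be /usr/share/dict/words.
w=$(mktemp)
printf '%s\n' azygous enzyme zebra zigzag crazy > "$w"

vowel='[aeiou]'

# Double quotes let the shell expand $vowel but leave * and [ alone.
vowel_zy=$(grep "^$vowel.*zy" "$w")
# \$ keeps the final dollar sign as the end-of-line anchor.
z_to_vowel=$(grep "^z.*$vowel\$" "$w")

echo "$vowel_zy"      # azygous and enzyme, one per line
echo "$z_to_vowel"    # zebra

rm -f "$w"
```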


More Grep:
Command                          Description
history                          lists the commands you issued
history | wc -l                  count the lines in history
history | grep ls | wc -l        count the history lines containing "ls" (roughly, the number of times ls was issued)
history | grep cd | wc -l        count the history lines containing "cd"
history | grep grep | wc -l      count the history lines containing "grep"
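
Because grep matches substrings, a count such as the one for ls also picks up commands like false. The sketch below shows the difference on a made-up five-line file laid out like history output (a line number, then the command). The file contents and the stricter pattern are assumptions for illustration, and the stricter pattern would still count other commands that begin with "ls".

```shell
# Hypothetical stand-in for the output of the history command.
log=$(mktemp)
printf '%s\n' '    1  ls -l' '    2  cd /tmp' '    3  false' \
              '    4  ls' '    5  grep ls notes.txt' > "$log"

# Any line containing "ls", including "false" and the grep command.
loose=$(grep -c 'ls' "$log")
# Only lines where the command itself starts with ls.
strict=$(grep -c '^ *[0-9][0-9]* *ls' "$log")

echo "$loose $strict"    # prints: 4 2

rm -f "$log"
```

grep -c reports the number of matching lines directly, so no pipe to wc is needed.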


Grep Flags:

-c (count), -v (invert the match), and -i (ignore case) flags


Command                        Description
w=/usr/share/dict/words
wc -l $w                       How many words (lines) are in the whole dictionary?
grep -c '^e' $w                The -c flag counts matching lines. How many start with an 'e'?
grep -c '^[^e]' $w             A ^ inside the brackets negates the set, so this counts the words that do not start with an e.
grep -c e $w                   How many contain an 'e'?
grep -c '^[^e]*$' $w           How many contain only non-e's from start (^) to end ($)?
                               Note: these last two counts should add up to the total number in the dictionary.
grep -cv e $w                  The same thing using the -v flag (look for non-matches).
grep -cv '^e' $w               How many do not start with an e?

grep -c d $w                   How many words have a d?
grep -ic d $w                  How many words have a d, ignoring case?
grep -c t $w                   How many words have a t?
grep -ic t $w                  How many words have a t, ignoring case?
grep -i d $w | grep -ic t      How many have both?
grep -ic '[dt]' $w             How many have either?
grep -icv '[dt]' $w            How many have neither?
grep -ic '^[^dt]*$' $w         Another way to determine how many have neither.
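
The claim that the "contains an e" and "contains no e" counts add up to the total can be checked on a small sample. This is a sketch using a made-up six-word list; the variable names are illustrative.

```shell
# Hypothetical six-line sample in place of the dictionary.
w=$(mktemp)
printf '%s\n' Eel eagle dog cat Tad bird > "$w"

total=$(grep -c '' "$w")            # the empty pattern matches every line
with_e=$(grep -c e "$w")            # lines containing a lowercase e
no_e=$(grep -c '^[^e]*$' "$w")      # lines made up only of non-e characters
no_e_v=$(grep -cv e "$w")           # same count through -v

either=$(grep -ic '[dt]' "$w")              # a d or a t, any case
both=$(grep -i d "$w" | grep -ic t)         # a d and a t, any case
neither=$(grep -icv '[dt]' "$w")            # no d and no t

echo "$total $with_e $no_e $no_e_v"   # prints: 6 2 4 4
echo "$either $both $neither"         # prints: 4 1 2

rm -f "$w"
```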

complex regular expressions -- counting and remembering

Command                                     Description
w=/usr/share/dict/words
grep "[aeiou]\{5\}" $w                      Find words having five vowels in a row.
grep "a[b-df-hj-np-tv-xz]\{5\}" $w          Find words having five consonants in a row, directly following an 'a'.
grep "\(.\)\(.\)\(.\)\(.\).*\1\2\3\4" $w    Find words in which a sequence of four characters appears twice. Each parenthesized string is remembered and recalled through \n.
grep "\(....\).*\1" $w                      A shorter way to do the same thing, using only one remembered string.
grep "\(..\)\(.*\1\)\{2\}" $w               Find a sequence of two characters and remember it. Then find any string followed by those same two characters, two times over, so the pair appears three times in all.
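
The counting and remembering patterns can be exercised on a handful of strings, as in the sketch below. The sample list is made up (hahaha is not a dictionary word) and was chosen so that each pattern matches exactly one line.

```shell
# Hypothetical sample; each pattern below should match exactly one line.
w=$(mktemp)
printf '%s\n' queueing catchphrase hotshots hahaha banana > "$w"

five_vowels=$(grep '[aeiou]\{5\}' "$w")                  # u-e-u-e-i
five_cons=$(grep 'a[b-df-hj-np-tv-xz]\{5\}' "$w")        # a + t-c-h-p-h
four_twice=$(grep '\(....\).*\1' "$w")                   # hots ... hots
pair_thrice=$(grep '\(..\)\(.*\1\)\{2\}' "$w")           # ha ha ha

echo "$five_vowels $five_cons $four_twice $pair_thrice"
# prints: queueing catchphrase hotshots hahaha

rm -f "$w"
```

banana matches none of the four: its repeated pairs (an, na) each occur only twice.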

sed info:

FORMAT

sed -f script filename

sed -n 'instruction' file // suppress automatic printing; only lines printed explicitly (p) appear

sed -e 'instruction;instruction' file // for more than one instruction

INSTRUCTIONS

Delete

[address]d

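address

a line number, $ (the last line), or a /pattern/; two addresses separated by a comma select a range

^ start of line (inside a /pattern/)

$ end of line (inside a /pattern/); used alone as an address, $ means the last line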

1,\$d -- deletes 1st to last line

1,/^$/d -- deletes 1st through blank line
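
A minimal sketch of the delete addresses on a made-up five-line file laid out like a mail message (headers, a blank line, then a body). Inside single quotes the $ needs no backslash; the \$ form above is for scripts that are unquoted or in double quotes.

```shell
# Hypothetical sample file: two header lines, a blank line, two body lines.
f=$(mktemp)
printf '%s\n' 'Subject: hello' 'From: me' '' 'body line 1' 'body line 2' > "$f"

body=$(sed '1,/^$/d' "$f")                 # delete line 1 through the first blank line
nothing=$(sed '1,$d' "$f")                 # delete line 1 through the last line
last_kept=$(sed '$d' "$f" | tail -n 1)     # delete only the last line

echo "$body"         # body line 1 and body line 2, one per line
echo "[$nothing]"    # prints: []
echo "$last_kept"    # body line 1

rm -f "$f"
```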

Substitute

[address]s/pattern/replacement/flags

replacement

& Replaced by the string matched by the RE

\n n is a single digit; replaced by the nth substring previously specified in the RE using "\(" and "\)"

\ Escapes a literal & or \ in the replacement

flags

n -- any number from 1 to 512 -- replace only nth occurrence

g -- global

p -- prints the line if a substitution was made

w file -- writes the line to the file if a substitution was made
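
The replacement characters and flags can be seen side by side in the sketch below. The input strings and variable names are made up for illustration.

```shell
line='one two one two one'

amp=$(printf '%s\n' "$line" | sed 's/two/[&]/')            # & is the matched text
swap=$(printf '%s\n' 'hello world' |
       sed 's/\(hello\) \(world\)/\2 \1/')                 # \1 and \2 recall the groups
literal=$(printf '%s\n' 'a b' | sed 's/a/\&/')             # \& is a plain ampersand
second=$(printf '%s\n' "$line" | sed 's/one/1/2')          # only the 2nd occurrence
global=$(printf '%s\n' "$line" | sed 's/one/1/g')          # every occurrence
printed=$(printf '%s\n' "$line" 'no match' |
          sed -n 's/two/2/p')                              # print only changed lines

echo "$amp"       # one [two] one two one
echo "$swap"      # world hello
echo "$literal"   # & b
echo "$second"    # one two 1 two one
echo "$global"    # 1 two 1 two 1
echo "$printed"   # one 2 one two one
```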

Append

[line-address]a\

text

for multiple lines of text, end each line with a \

Insert

[line-address]i\

text

Change

[line-address]c\

text
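
Append, insert, and change all take their text on the line after the backslash. A short sketch on a made-up three-line file:

```shell
# Hypothetical three-line sample file.
f=$(mktemp)
printf '%s\n' first second third > "$f"

# The text goes on the line following a\, i\, or c\.
appended=$(sed '2a\
after second' "$f")

inserted=$(sed '2i\
before second' "$f")

changed=$(sed '2c\
SECOND' "$f")

echo "$appended"   # first, second, after second, third
echo "$inserted"   # first, before second, second, third
echo "$changed"    # first, SECOND, third

rm -f "$f"
```

Each echo prints one item per line; the comments list them in order.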

Transform

[address]y/abc/xyz/ transforms each a to an x, b to a y, and c to a z
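
A one-line sketch of the transform command. The input string is made up; note that y works character by character and does not use regular expressions.

```shell
out=$(printf '%s\n' 'a cab backs up' | sed 'y/abc/xyz/')

echo "$out"    # prints: x zxy yxzks up
```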

Next

Read

Write


sed Examples:
Command                        Description
sed 's/he/she/' file           string substitution: replace the first occurrence of he on each line with she
sed 's/he/she/g' file          string substitution: replace all occurrences of he on each line with she
sed '1s/he/she/g' file         string substitution: replace all occurrences of he on the first line with she
sed 's/word.*/word/' file      replace word and all that follows it with word
sed 's/word.*/word./' file     replace word and all that follows it with word, adding a '.' at the end
sed '2d' file                  delete the second line from our display of the file
sed '1,2d' file                delete a range of lines (lines 1 through 2)
sed '/more/d' file             delete any line that matches a regular expression
sed '/more/p' file             duplicate any line that matches a regular expression
sed -n '/more/p' file          produce only the lines that match a regular expression; the -n flag keeps sed "quiet" unless overtly told to print a line through the 'p' command
sed '/test/s/is/at/' file      substitute the first 'is' by 'at', but only in lines matching the regular expression 'test'; lines that do not match are unaffected
sed 's/i/x/2' file             modify the 2nd occurrence of 'i' in each line
sed = file                     print each line's number on a line of its own, ahead of the line itself
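
Several of these commands, run on a made-up three-line file so the effect of each is visible. The file contents and variable names are assumptions for illustration.

```shell
# Hypothetical sample file.
f=$(mktemp)
printf '%s\n' 'this is line one' 'this is a test' 'one more line' > "$f"

only_test=$(sed '/test/s/is/at/' "$f")   # first "is" on the test line is inside "this"
second_i=$(sed 's/i/x/2' "$f")           # 2nd i on each line, if there is one
matching=$(sed -n '/more/p' "$f")        # print only matching lines
remaining=$(sed '/more/d' "$f")          # drop matching lines
numbered=$(sed = "$f")                   # line numbers on their own lines

echo "$only_test"   # this is line one / that is a test / one more line
echo "$second_i"    # this xs line one / this xs a test / one more line
echo "$matching"    # one more line
echo "$remaining"   # this is line one / this is a test
echo "$numbered"    # 1 / this is line one / 2 / this is a test / 3 / one more line

rm -f "$f"
```

In the comments, a slash marks a line break in the output.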

more sed

Command                                    Description
sed -n '/^[a-e].*xy$/p' $w                 find the words beginning with a through e and ending in 'xy'
grep '^[a-e].*xy$' $w | sed 's/p/m/'       find those words and substitute p by m (first occurrence only)
grep '^[a-e].*xy$' $w | sed 's/p/m/g'      find those words and substitute all p's by m's ('g' means "global")
sed -n '/^[a-e].*xy$/s/p/m/gp' $w          similar to above, but showing only those lines affected by the substitution of p by m
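
The difference between the grep pipeline and the single sed command shows up on words that match the address but contain no p. A sketch on a made-up five-word list (zzxy is not a real word):

```shell
# Hypothetical sample in place of the dictionary.
w=$(mktemp)
printf '%s\n' apoplexy cataplexy epoxy zzxy boxy > "$w"

found=$(sed -n '/^[a-e].*xy$/p' "$w")
first_p=$(grep '^[a-e].*xy$' "$w" | sed 's/p/m/')
all_p=$(grep '^[a-e].*xy$' "$w" | sed 's/p/m/g')
changed_only=$(sed -n '/^[a-e].*xy$/s/p/m/gp' "$w")

echo $found          # apoplexy cataplexy epoxy boxy
echo $first_p        # amoplexy catamlexy emoxy boxy
echo $all_p          # amomlexy catamlexy emoxy boxy
echo $changed_only   # amomlexy catamlexy emoxy

rm -f "$w"
```

boxy passes through the pipeline unchanged but is left out of the last result, because the p flag prints a line only when a substitution happened. The unquoted echo arguments simply join the lines with spaces for display.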

All work herein is subject to copyright. Original content to Dr. Deborah Whitfield, text content (Your UNIX/Linux) to Prentice Hall publishing.