Monday, April 3, 2017

Stream Editor - Strings

Substitute Command

Text substitution operations like "find and replace" are common in any text editor. In this section, we illustrate how SED performs text substitution. Given below is the syntax of the substitution command.
[address1[,address2]]s/pattern/replacement/[flags]
Here, address1 and address2 are the starting and ending addresses, each of which can be either a line number or a pattern. Both addresses are optional. The pattern is the regular expression to search for, and replacement is the string substituted for each match. Optional flags at the end of the command modify how the substitution is performed.
In the books.txt file, we have used comma(,) to separate each column. Let us use vertical bar(|) to separate each column. To do this, replace comma(,) with vertical bar(|).
[jerry]$ sed 's/,/ | /' books.txt
On executing the above code, you get the following result:
1) A Storm of Swords | George R. R. Martin, 1216 
2) The Two Towers | J. R. R. Tolkien, 352 
3) The Alchemist | Paulo Coelho, 197 
4) The Fellowship of the Ring | J. R. R. Tolkien, 432 
5) The Pilgrimage | Paulo Coelho, 288 
6) A Game of Thrones | George R. R. Martin, 864 
If you observe carefully, only the first comma on each line is replaced and the second remains as it is. Why? By default, SED replaces only the first match on a line and then moves on to the next line. To replace all occurrences, use the global flag (g) as follows:
[jerry]$ sed 's/,/ | /g' books.txt
On executing the above code, you get the following result:
1) A Storm of Swords | George R. R. Martin | 1216 
2) The Two Towers | J. R. R. Tolkien | 352 
3) The Alchemist | Paulo Coelho | 197 
4) The Fellowship of the Ring | J. R. R. Tolkien | 432 
5) The Pilgrimage | Paulo Coelho | 288 
6) A Game of Thrones | George R. R. Martin | 864
Now every comma (,) is replaced with a vertical bar (|).
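As a quick check of the difference between the default behaviour and the g flag, here is a minimal sketch on a one-line input. The sample string and the variable names (line, first_only, all) are made up for illustration and are not part of books.txt.

```shell
# Illustrative input: one line with three commas.
line='a,b,c,d'

# Without a flag, only the first comma on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/,/|/')

# With the g flag, every comma on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/,/|/g')

echo "$first_only"   # a|b,c,d
echo "$all"          # a|b|c|d
```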
We can also restrict the substitution to lines that match an address pattern. The following example replaces comma (,) with vertical bar (|) only on lines that contain the pattern The Pilgrimage.
[jerry]$ sed '/The Pilgrimage/ s/,/ | /g' books.txt 
On executing the above code, you get the following result:
1) A Storm of Swords, George R. R. Martin, 1216 
2) The Two Towers, J. R. R. Tolkien, 352 
3) The Alchemist, Paulo Coelho, 197 
4) The Fellowship of the Ring, J. R. R. Tolkien, 432 
5) The Pilgrimage | Paulo Coelho | 288 
6) A Game of Thrones, George R. R. Martin, 864
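The syntax shown at the start of this section also allows line numbers as addresses. The sketch below is only an illustration: it assumes input lines laid out like the books.txt output above, supplies them inline so that it is self-contained, and matches a comma plus the following space to keep the spacing tidy. The variable names are hypothetical.

```shell
# Sample lines modelled on the books.txt output shown above (assumed layout).
books='1) A Storm of Swords, George R. R. Martin, 1216
2) The Two Towers, J. R. R. Tolkien, 352
3) The Alchemist, Paulo Coelho, 197
4) The Fellowship of the Ring, J. R. R. Tolkien, 432'

# Single line-number address: only line 2 is edited.
line2=$(printf '%s\n' "$books" | sed '2 s/, / | /g')

# Two-address range: lines 2 through 3 are edited, the rest pass through unchanged.
range=$(printf '%s\n' "$books" | sed '2,3 s/, / | /g')

printf '%s\n' "$range"
```

With the range form, lines 1 and 4 keep their commas while lines 2 and 3 use vertical bars.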
In addition to this, SED can replace a specific occurrence of the pattern. Let us replace only the second instance of comma(,) with vertical bar(|).
[jerry]$ sed 's/,/ | /2' books.txt
On executing the above code, you get the following result:
1) A Storm of Swords, George R. R. Martin | 1216 
2) The Two Towers, J. R. R. Tolkien | 352 
3) The Alchemist, Paulo Coelho | 197 
4) The Fellowship of the Ring, J. R. R. Tolkien | 432 
5) The Pilgrimage, Paulo Coelho | 288 
6) A Game of Thrones, George R. R. Martin | 864
In the above example, the number 2 in the flag position tells SED to replace only the second occurrence of the pattern on each line.
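A number can also be combined with the g flag. In GNU SED this replaces the Nth match and every match after it; POSIX does not define the combination, so treat the second command below as GNU-specific. The input string and variable names are illustrative only.

```shell
# Illustrative input: one line with three commas.
line='a,b,c,d'

# Replace only the second comma.
second=$(printf '%s\n' "$line" | sed 's/,/|/2')

# GNU SED: replace the second comma and all later ones.
from_second=$(printf '%s\n' "$line" | sed 's/,/|/2g')

echo "$second"        # a,b|c,d
echo "$from_second"   # a,b|c|d
```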
SED can also show only the lines that were changed. The p flag prints a line whenever a substitution is made on it, and the -n option suppresses SED's default printing of every line, so together they list only the changed lines. The following example shows this.
[jerry]$ sed -n 's/Paulo Coelho/PAULO COELHO/p' books.txt
On executing the above code, you get the following result:
3) The Alchemist, PAULO COELHO, 197 
5) The Pilgrimage, PAULO COELHO, 288 
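To see why -n matters here, the following sketch runs the same substitution with and without it. The two-line input and the variable names are assumptions made for the illustration.

```shell
# Illustrative two-line input.
input='alpha
beta'

# With -n, only the line changed by the substitution is printed.
with_n=$(printf '%s\n' "$input" | sed -n 's/beta/BETA/p')

# Without -n, every line is printed once by default,
# and the changed line is printed a second time by the p flag.
without_n=$(printf '%s\n' "$input" | sed 's/beta/BETA/p')

printf '%s\n' "$with_n"
printf '%s\n' "$without_n"
```

The first command prints BETA once; the second prints alpha, then BETA twice.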
We can store the changed lines in another file as well. To do this, use the w flag followed by a file name. The following example shows how to do it.
[jerry]$ sed -n 's/Paulo Coelho/PAULO COELHO/w junk.txt' books.txt
This is the same substitution as before, with the w flag in place of p. Because of the -n option, nothing is printed on the terminal. Let us verify the contents of the junk.txt file.
[jerry]$ cat junk.txt
On executing the above code, you get the following result:
3) The Alchemist, PAULO COELHO, 197 
5) The Pilgrimage, PAULO COELHO, 288
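The same idea can be tried without touching books.txt. This is a minimal sketch that writes to a temporary file created with mktemp; the input words and variable names are illustrative. Note that the file name must be the last thing in the command, because SED reads it up to the end of the line.

```shell
# Temporary file that receives the changed lines.
tmp=$(mktemp)

# -n suppresses normal output; the w flag writes each changed line to "$tmp".
stdout=$(printf 'one\ntwo\nthree\n' | sed -n "s/t/T/w $tmp")

saved=$(cat "$tmp")
rm -f "$tmp"

printf 'stdout: [%s]\n' "$stdout"   # nothing was printed by sed
printf '%s\n' "$saved"              # the changed lines: Two, Three
```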
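To perform case-insensitive substitution, use the I flag, which stands for ignore case. This flag is a GNU extension, and GNU SED also accepts the lowercase i for the substitute command. The following example performs case-insensitive substitution.
[jerry]$ sed -n 's/pAuLo CoElHo/PAULO COELHO/pi' books.txt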
On executing the above code, you get the following result:
3) The Alchemist, PAULO COELHO, 197 
5) The Pilgrimage, PAULO COELHO, 288
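One practical use of case-insensitive matching is normalising inconsistent capitalisation. The sketch below assumes GNU SED and uses made-up input lines; the variable name out is hypothetical.

```shell
# Illustrative input with inconsistent capitalisation.
out=$(printf 'Paulo Coelho\nPAULO coelho\nJ. R. R. Tolkien\n' |
      sed -n 's/paulo coelho/Paulo Coelho/Ip')

# Both spellings match regardless of case and are rewritten the same way;
# the third line does not match and is not printed because of -n.
printf '%s\n' "$out"
```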
So far, we have used only the forward slash (/) as a delimiter, but SED accepts any character other than a backslash or a newline, for example vertical bar (|), at sign (@), caret (^), or exclamation mark (!). The following examples show how to use other characters as a delimiter.
Let us assume you need to replace the path /bin/sed with /home/jerry/src/sed/sed-4.2.2/sed. With the default delimiter, every slash inside the paths must be escaped, so the SED command looks like this:
[jerry]$ echo "/bin/sed" | sed 's/\/bin\/sed/\/home\/jerry\/src\/sed\/sed-4.2.2\/sed/'
On executing the above code, you get the following result:
/home/jerry/src/sed/sed-4.2.2/sed
We can make this command more readable and easy to understand. Let us use vertical bar(|) as delimiter and see the result.
[jerry]$ echo "/bin/sed" | sed 's|/bin/sed|/home/jerry/src/sed/sed-4.2.2/sed|'
On executing the above code, you get the following result:
/home/jerry/src/sed/sed-4.2.2/sed
Indeed! We got the same result and the syntax is more readable. Similarly, we can use the "at" sign (@) as a delimiter as follows:
[jerry]$ echo "/bin/sed" | sed 's@/bin/sed@/home/jerry/src/sed/sed-4.2.2/sed@'
On executing the above code, you get the following result:
/home/jerry/src/sed/sed-4.2.2/sed 
In addition to this, we can use caret(^) as a delimiter.
[jerry]$ echo "/bin/sed" | sed 's^/bin/sed^/home/jerry/src/sed/sed-4.2.2/sed^'
On executing the above code, you get the following result:
/home/jerry/src/sed/sed-4.2.2/sed 
We can also use exclamation mark (!) as a delimiter as follows:
[jerry]$ echo "/bin/sed" | sed 's!/bin/sed!/home/jerry/src/sed/sed-4.2.2/sed!'
On executing the above code, you get the following result:
/home/jerry/src/sed/sed-4.2.2/sed 
Generally, the forward slash (/) is used as a delimiter, but sometimes it is more convenient to use one of the other delimiters.
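An alternative delimiter is especially handy when the paths come from shell variables, since the slashes then need no escaping. The sketch below reuses the paths from the examples above; the variable names are hypothetical, and it assumes that neither value contains the chosen delimiter or regular expression metacharacters.

```shell
# Paths taken from the examples above; variable names are illustrative.
old='/bin/sed'
new='/home/jerry/src/sed/sed-4.2.2/sed'

# Double quotes let the shell expand the variables before sed sees the script.
# With | as the delimiter, the slashes in both paths are ordinary characters.
out=$(echo "path=$old" | sed "s|$old|$new|")

echo "$out"   # path=/home/jerry/src/sed/sed-4.2.2/sed
```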

Creating a Substring

We learnt the powerful substitute command. Let us see if we can find a substring from a matched text. Let us understand how to do it with the help of an example.
Let us consider the following text:
[jerry]$ echo "Three One Two"
Suppose we have to rearrange the words into sequence, that is, print One first, then Two, and finally Three. The following one-liner does this.
[jerry]$ echo "Three One Two" | sed 's|\(\w\+\) \(\w\+\) \(\w\+\)|\2 \3 \1|'
On executing the above code, you get the following result:
One Two Three
Note that in the above example, vertical bar (|) is used as a delimiter.
In SED's basic regular expressions, a substring is captured by enclosing part of the pattern in a group, and the parentheses must be prefixed with an escape character, i.e., \( and \).
\w is a regular expression that matches any letter, digit, or underscore, and \+ matches one or more occurrences of the preceding item. Both are GNU extensions. In other words, the regular expression \(\w\+\) matches a single word from the input string.
In the input string, there are three words separated by spaces, hence there are three regular expressions separated by spaces. The first group stores the first word, i.e., Three; the second stores the word One; and the third stores the word Two.
These substrings are referred to as \N, where N is the substring number. Hence, \2 prints the second substring, i.e., One; \3 prints the third substring, i.e., Two; and \1 prints the first substring, i.e., Three.
Let us separate these words by commas(,) and modify the regular expression accordingly.
[jerry]$ echo "Three,One,Two" | sed 's|\(\w\+\),\(\w\+\),\(\w\+\)|\2,\3,\1|'
On executing the above code, you get the following result:
One,Two,Three
Note that now there is comma(,) instead of space in the regular expression.
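The same rearrangement can be written with extended regular expressions, where parentheses and + need no backslashes. This is a sketch under two assumptions: the installed SED supports the -E option (GNU and BSD variants do), and the POSIX bracket expression [[:alnum:]_] is an acceptable stand-in for \w. The variable name out is illustrative.

```shell
# -E switches to extended regular expressions: ( ) and + are written unescaped.
# [[:alnum:]_] matches a letter, digit, or underscore, like \w in GNU SED.
out=$(echo "Three,One,Two" |
      sed -E 's|([[:alnum:]_]+),([[:alnum:]_]+),([[:alnum:]_]+)|\2,\3,\1|')

echo "$out"   # One,Two,Three
```

The back-references \1, \2, and \3 in the replacement work the same way in both syntaxes.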

String Replacement Flags (GNU SED only)

In the previous section, we saw some examples of the substitution command. The GNU SED provides some special escape sequences which can be used in the replacement string. Note that these string replacement flags are GNU specific and may not work with other variants of SED. Here we will discuss string replacement flags.
  • \L: When \L is specified in the replacement string, all the remaining characters of the replacement after \L are converted to lowercase, until a \U or \E is found. For example, the characters "ULO" are converted to lowercase.
[jerry]$ sed -n 's/Paulo/PA\LULO/p' books.txt
On executing the above code, you get the following result:
3) The Alchemist, PAulo Coelho, 197
5) The Pilgrimage, PAulo Coelho, 288
  • \u: When \u is specified in the replacement string, it treats the immediate character after \u as an uppercase character. In the following example, \u is used before the characters 'a' and 'o'. Hence SED treats these characters as uppercase letters.
[jerry]$ sed -n 's/Paulo/p\uaul\uo/p' books.txt
On executing the above code, you get the following result:
3) The Alchemist, pAulO Coelho, 197 
5) The Pilgrimage, pAulO Coelho, 288
  • \U: When \U is specified in the replacement string, all the remaining characters of the replacement after \U are converted to uppercase, until a \L or \E is found.
[jerry]$ sed -n 's/Paulo/\Upaulo/p' books.txt 
On executing the above code, you get the following result:
3) The Alchemist, PAULO Coelho, 197 
5) The Pilgrimage, PAULO Coelho, 288
  • \E: This sequence is used together with \L or \U. It stops the case conversion started by \L or \U. In the following example, only the first word is converted to uppercase letters.
[jerry]$ sed -n 's/Paulo Coelho/\Upaulo \Ecoelho/p' books.txt
On executing the above code, you get the following result:
3) The Alchemist, PAULO coelho, 197 
5) The Pilgrimage, PAULO coelho, 288
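These sequences become more useful when combined with the captured substrings from the previous section, because the matched text itself can be re-cased instead of being retyped. The sketch below assumes GNU SED; the input lines and variable names are illustrative.

```shell
# \U...\E uppercases only the first captured word; the second is left as matched.
upper_first=$(echo "The Alchemist, Paulo Coelho, 197" |
              sed 's/\(Paulo\) \(Coelho\)/\U\1\E \2/')

# \u uppercases just the next character, here the first letter of each group.
capitalised=$(echo "paulo coelho" |
              sed 's/\(\w\+\) \(\w\+\)/\u\1 \u\2/')

echo "$upper_first"   # The Alchemist, PAULO Coelho, 197
echo "$capitalised"   # Paulo Coelho
```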
