Usage of the sed, tr, uniq, and awk shell commands.
Hello!
There are many data-manipulation shell commands available out there; here are a few that I have used and found interesting, with quick examples after the descriptions below.
- sed (Stream Editor): sed is a powerful stream editor used for text manipulation. It takes input from a file, a pipeline, or standard input and applies the specified transformations to each line. The transformations are driven by commands supplied to sed; common operations include searching, replacing, inserting, and deleting text. sed is particularly useful for editing large amounts of text or automating text-processing tasks.
- tr (Translate): tr is a command-line utility for character translation or deletion. It reads from standard input (often redirected from a file), processes the stream, and writes the result to standard output. tr is commonly used to replace or remove specific characters in a text stream. It can also squeeze multiple consecutive occurrences of a character into a single instance, which makes it handy for tasks like converting between character sets or transliterating text.
- uniq (Unique): uniq is a command-line tool that filters out repeated lines. It reads from a file or standard input and writes only the unique lines to standard output, keeping a single copy of any run of adjacent duplicate lines. To work effectively, uniq expects its input to be sorted, because it only compares adjacent lines for uniqueness.
- awk: awk is a versatile and powerful text-processing tool and programming language designed for data extraction and reporting. It operates line by line, processing input from a file or standard input. Users define patterns and actions: patterns specify which lines to process, and actions define what to do with those lines. awk excels at handling structured text where data is organized in columns, and it is commonly used for parsing log files, generating reports, and performing all kinds of data manipulation.
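Before the worked example, here are a few quick one-liners, one per command, to make the descriptions concrete. These are illustrative sketches of my own; the file name notes.txt is just a hypothetical placeholder:
# sed: replace every occurrence of "foo" with "bar" on each line
sed 's/foo/bar/g' notes.txt
# tr: convert lowercase letters to uppercase, then squeeze runs of spaces
tr 'a-z' 'A-Z' < notes.txt | tr -s ' '
# uniq: sort first so duplicate lines become adjacent, then drop repeats
sort notes.txt | uniq
# awk: print the second whitespace-separated column of every line
awk '{print $2}' notes.txt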
Example:
I was given an assignment to achieve the following in a single shell command.
Input:
double double toil and trouble
fire burn and cauldron bubble bubble
tomorrow and tomorrow and tomorrow
creeps in this this petty pace from day toto day
to the last syllable of recorded time time
Expected response:
double toil and trouble
fire burn and cauldron bubble
tomorrow and tomorrow and tomorrow
creeps in this petty pace from day toto day
to the last syllable of recorded time
Through trial and error, I arrived at the following combination of commands to achieve this.
sed -z 's/\n/ , /g' duplicate.txt | tr " " "\n" | uniq | tr "\n" " " | sed -z 's/ , /\n/g'
How does it work?
Step-1:
The following command replaces each newline with “ , ”, which acts as a placeholder for the original line breaks. The -z option makes sed treat the whole input as a single record (null-separated), so the \n characters can actually be matched and replaced.
sed -z 's/\n/ , /g' duplicate.txt
Expected Response for the above command:
double double toil and trouble , fire burn and cauldron bubble bubble , tomorrow and tomorrow and tomorrow , creeps in this this petty pace from day toto day , to the last syllable of recorded time time ,
Step-2:
The output of step 1 is then piped into tr, which replaces each space with a newline so that every word (and each “,” placeholder) lands on its own line.
sed -z 's/\n/ , /g' duplicate.txt | tr " " "\n"
Expected Response for the above command:
double
double
toil
and
trouble
,
fire
<----------->
day
,
to
the
last
syllable
of
recorded
time
time
,
Step-3:
The output of step 2 is then piped into uniq, which removes the CONSECUTIVE duplicate words. Since each word is now on its own line, adjacent duplicate lines are exactly the repeated words.
sed -z 's/\n/ , /g' duplicate.txt | tr " " "\n" | uniq
Expected Response for the above command:
double
toil
and
trouble
,
fire
burn
and
cauldron
bubble
,
tomorrow
<---------->
syllable
of
recorded
time
,
Step-4:
The output of step 3 is then piped into tr again, which replaces each newline with a space to join all the words back into a single line.
sed -z 's/\n/ , /g' duplicate.txt | tr " " "\n" | uniq | tr "\n" " "
Expected Response for the above command:
double toil and trouble , fire burn and cauldron bubble , tomorrow and tomorrow and tomorrow , creeps in this petty pace from day toto day , to the last syllable of recorded time ,
Step-5:
Finally, the output of step 4 is piped into sed, which replaces each “ , ” placeholder with a newline, restoring the original line structure of the input.
sed -z 's/\n/ , /g' duplicate.txt | tr " " "\n" | uniq | tr "\n" " " | sed -z 's/ , /\n/g'
Expected Response for the above command:
double toil and trouble
fire burn and cauldron bubble
tomorrow and tomorrow and tomorrow
creeps in this petty pace from day toto day
to the last syllable of recorded time
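As an aside, the same consecutive-duplicate removal can be expressed as a single awk program. The sketch below is my own per-line alternative, not the assignment's pipeline; it assumes duplicates never span a line boundary, which holds for this input:
# walk the fields of each line, keep a field only when it differs from the previous one
awk '{ out = ""; prev = ""; for (i = 1; i <= NF; i++) { if ($i != prev) out = out (out == "" ? "" : " ") $i; prev = $i }; print out }' duplicate.txt
It rebuilds each line from the de-duplicated fields and produces the same expected response as the five-step pipeline.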
I didn’t use awk in the pipeline above, but it’s also a powerful CLI utility for filtering command output. Here is an example of using awk to monitor CPU usage in near real time with the top command:
top -b -d 1 | awk 'NR > 7 {print "PID: "$1", CPU%: "$9}'
Explanation:
top -b -d 1: The top command displays real-time information about system processes. The -b option runs top in batch mode, which is needed when piping its output to another command, and the -d 1 option refreshes the output every second.
awk 'NR > 7 {print "PID: "$1", CPU%: "$9}': awk processes the output of top. The NR > 7 condition skips the first 7 header lines of the top output, as they contain non-process information (the system summary and column headers). For each subsequent line, representing a process, it prints the process ID (PID, field 1) and the CPU usage percentage (field 9). Note that top reprints its header on every refresh, so NR > 7 only cleanly skips the header of the first iteration; for a quick monitor this is usually good enough.
Expected output:
PID: 1234, CPU%: 12.3
PID: 5678, CPU%: 9.8
PID: 9101, CPU%: 5.6
...
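As a usage variation, the awk pattern can also filter numerically. Here is a hedged sketch of my own that only prints processes using more than 50% CPU (the threshold is an arbitrary choice):
# $9+0 forces awk to compare the field as a number rather than a string
top -b -d 1 | awk 'NR > 7 && $9+0 > 50 {print "PID: "$1", CPU%: "$9}'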
More shell scripts are available here:
https://github.com/infinite8loop/shell
That’s it, folks. Thanks for reading, and feel free to hit me up on LinkedIn for any AWS/DevOps-related discussions.
Happy scripting!!