How to get the specific column from the command output in Linux

The awk command in Linux can be used to extract a specific column from a command's output or from a file.

Example:

awk -F ' ' '{print $8}'

The above command prints the 8th column of whatever output is piped into it.
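
For example, the command name is in the 8th column of ps -ef output, so the same pattern lists the (first word of the) command for every running process:

ps -ef | awk -F ' ' '{print $8}'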

Below is a typical use case:

I had a requirement to purge files older than 7 days. For that, I first had to get the list of file names older than 7 days.

HDFS file listing:

hadoop fs -ls -R /user/spark/applicationHistory
-rwxrwx---+ 3 spark spark 192323026 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123426
-rwxrwx---+ 3 spark spark 26722320 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123427
-rwxrwx---+ 3 spark spark 1533339 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123428
-rwxrwx---+ 3 spark spark 34200498 2022-01-08 07:09 /user/spark/applicationHistory/application_1592823123197_1123429
-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422

To get the file name column:

Before starting, we need to confirm two things:

  • Delimiter
  • Column position

Delimiter

In this case, the delimiter is a space (" "), so we use a space as the field separator in the command to split the columns.
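
As a side note, a single space is also awk's default field separator (awk splits on runs of whitespace by default), so the command below behaves the same even without -F ' ':

hadoop fs -ls /user/spark/applicationHistory | awk '{print $8}'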

Column position

Count the column position from left to right, starting from 1.

In this case, the file name is in the 8th column

-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422
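
If you are unsure of the position, one quick check is to have awk number every field of a sample line:

echo '-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422' | awk '{for (i = 1; i <= NF; i++) print i": "$i}'
1: -rwxrwx---+
2: 3
3: spark
4: spark
5: 2342382364
6: 2022-07-08
7: 08:54
8: /user/spark/applicationHistory/application_1592823123197_1123422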

Final command:

hadoop fs -ls  /user/spark/applicationHistory | awk -F ' ' '{print $8}'

Output:

/user/spark/applicationHistory/application_1592823123197_1123426
/user/spark/applicationHistory/application_1592823123197_1123427
/user/spark/applicationHistory/application_1592823123197_1123428
/user/spark/applicationHistory/application_1592823123197_1123429
/user/spark/applicationHistory/application_1592823123197_1123422
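
Tying this back to the purge requirement: the date is in the 6th column, so awk can also do the 7-day filtering before deleting. Below is a rough sketch (assuming GNU date and xargs are available; not the exact production command):

# Cutoff date, 7 days ago, in the same YYYY-MM-DD format as column 6
CUTOFF=$(date -d "7 days ago" +%Y-%m-%d)

# Print the path (column 8) of every entry whose date (column 6) is older than
# the cutoff; YYYY-MM-DD strings compare correctly as plain text, and NF >= 8
# skips the "Found N items" header line
hadoop fs -ls /user/spark/applicationHistory \
  | awk -v cutoff="$CUTOFF" 'NF >= 8 && $6 < cutoff {print $8}' \
  | xargs -r -n 100 hadoop fs -rm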

To print multiple columns:

We can tweak the same command to print multiple columns. For example, suppose we need both the file size and the file name.

The command can be tweaked as below:

hadoop fs -ls  /user/spark/applicationHistory | awk -F ' ' '{print $5 " " $8}'

Output:

192323026 /user/spark/applicationHistory/application_1592823123197_1123426
26722320 /user/spark/applicationHistory/application_1592823123197_1123427
1533339 /user/spark/applicationHistory/application_1592823123197_1123428
34200498 /user/spark/applicationHistory/application_1592823123197_1123429
2342382364 /user/spark/applicationHistory/application_1592823123197_1123422
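
The same idea also lets you reorder or format the columns, for example with awk's printf to put the path first and right-align the size (just a variation, not required here):

hadoop fs -ls /user/spark/applicationHistory | awk 'NF >= 8 {printf "%-70s %12s\n", $8, $5}'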

For a different delimiter:

Let’s say you have a different delimiter such as “,” or “|”. We just have to change the delimiter in the command, as below:

Example:

cat test.txt 
a|b|c|d
c|d|f|g
cat test.txt | awk -F '|' '{print $3}'       
c
f
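
The same works for comma-separated data, for example with a hypothetical users.csv (awk can also read the file directly instead of piping from cat):

cat users.csv
alice,admin,active
bob,developer,inactive
awk -F ',' '{print $2}' users.csv
admin
developer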

Conclusion:

The awk command in Linux can be used to get a specific column from a command output or from a file. We just have to tweak the command based on the delimiter and the column position.

Good luck with your learning!!
