How to get a specific column from command output in Linux

The awk command can be used in Linux to get the value of a specific column from a command output or a file.
Example:
awk -F ' ' '{print $8}'
The above command prints the 8th column of each input line.
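To try this without HDFS, you can pipe a sample line into awk. The sample line below is made up for illustration and only mimics an ls-style listing; it is not real output.

```shell
#!/bin/sh
# Made-up sample line in an ls-style layout (illustration only)
line='-rw-r--r-- 1 user group 1024 2022-01-06 13:52 /tmp/example.txt'

# Print the 8th whitespace-separated field of the line
col8=$(printf '%s\n' "$line" | awk -F ' ' '{print $8}')
echo "$col8"   # prints: /tmp/example.txt
```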
Below is a typical use case:
I had a requirement to delete files older than 7 days as a purge operation. For that, I needed the list of file names older than 7 days.
HDFS file listing:
hadoop fs -ls -R /user/spark/applicationHistory
-rwxrwx---+ 3 spark spark 192323026 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123426
-rwxrwx---+ 3 spark spark 26722320 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123427
-rwxrwx---+ 3 spark spark 1533339 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123428
-rwxrwx---+ 3 spark spark 34200498 2022-01-08 07:09 /user/spark/applicationHistory/application_1592823123197_1123429
-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422
To get the file name column
Before starting, we need to identify two things:
- Delimiter
- Column position
Delimiter
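In this case, the delimiter is a space (" "), so we pass a space as the delimiter with -F ' '. A single space is a special case in awk: runs of spaces and tabs count as one separator and leading blanks are ignored, so padded, column-aligned output still splits cleanly.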
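The difference matters when the output is padded with several spaces. This sketch compares -F ' ' with -F '[ ]', a regular expression that matches exactly one space; the input string is made up for the demo.

```shell
#!/bin/sh
# Sample input padded with several spaces, as aligned listings often are
sample='a   b c'

# -F ' ' (a single space) is special: runs of blanks act as ONE separator
default_split=$(printf '%s\n' "$sample" | awk -F ' ' '{print $2}')

# -F '[ ]' is a regex for exactly one space, so the extra spaces
# create empty fields: $1="a", $2="", $3="", $4="b", $5="c"
strict_split=$(printf '%s\n' "$sample" | awk -F '[ ]' '{print $2}')

echo "default: [$default_split]"   # prints: default: [b]
echo "strict:  [$strict_split]"    # prints: strict:  []
```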
Column position
Count the column position from left to right, starting from 1.
In this case, the file name (the full path) is in the 8th column:
-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422
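If counting by eye is error-prone, awk can number the fields for you. This sketch uses the sample listing line above; NF is awk's built-in field count, so $NF always refers to the last column.

```shell
#!/bin/sh
# The sample HDFS listing line from above
line='-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422'

# Print every field with its position so the column number can be read off
printf '%s\n' "$line" | awk '{ for (i = 1; i <= NF; i++) print i ": " $i }'

# NF is the number of fields; $NF is the last field (here the file path)
nf=$(printf '%s\n' "$line" | awk '{print NF}')
last=$(printf '%s\n' "$line" | awk '{print $NF}')
echo "$nf fields, last one: $last"
```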
Final command:
hadoop fs -ls /user/spark/applicationHistory | awk -F ' ' '{print $8}'
Output:
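/user/spark/applicationHistory/application_1592823123197_1123426
/user/spark/applicationHistory/application_1592823123197_1123427
/user/spark/applicationHistory/application_1592823123197_1123428
/user/spark/applicationHistory/application_1592823123197_1123429
/user/spark/applicationHistory/application_1592823123197_1123422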
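For the purge use case, a condition on the date column ($6) can be added before printing $8. This is a sketch under stated assumptions: the listing is a hard-coded sample taken from the example above (plus a "Found N items" header line, which hadoop fs -ls typically prints for a directory), the cutoff date is a hypothetical fixed value, and YYYY-MM-DD dates are compared as plain strings.

```shell
#!/bin/sh
# Sample listing (from the example above) instead of live `hadoop fs -ls` output
listing='Found 3 items
-rwxrwx---+ 3 spark spark 192323026 2022-01-06 13:52 /user/spark/applicationHistory/application_1592823123197_1123426
-rwxrwx---+ 3 spark spark 34200498 2022-01-08 07:09 /user/spark/applicationHistory/application_1592823123197_1123429
-rwxrwx---+ 3 spark spark 2342382364 2022-07-08 08:54 /user/spark/applicationHistory/application_1592823123197_1123422'

# Hypothetical fixed cutoff; on GNU systems it could be computed with:
#   cutoff=$(date -d '7 days ago' +%F)
cutoff='2022-01-07'

# NF == 8 skips the "Found N items" header line.
# YYYY-MM-DD dates sort correctly as strings, so $6 < cutoff
# keeps only files modified before the cutoff date.
old_files=$(printf '%s\n' "$listing" |
  awk -F ' ' -v cutoff="$cutoff" 'NF == 8 && $6 < cutoff {print $8}')
echo "$old_files"
```

Only the file dated 2022-01-06 is printed here. Review the list before passing it to any delete command.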
To print multiple columns:
We can tweak the same command to print multiple columns. For example, to print the file size (5th column) and the file name (8th column), the command becomes:
hadoop fs -ls /user/spark/applicationHistory | awk -F ' ' '{print $5 " " $8}'
Output:
192323026 /user/spark/applicationHistory/application_1592823123197_1123426
26722320 /user/spark/applicationHistory/application_1592823123197_1123427
1533339 /user/spark/applicationHistory/application_1592823123197_1123428
34200498 /user/spark/applicationHistory/application_1592823123197_1123429
2342382364 /user/spark/applicationHistory/application_1592823123197_1123422
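Instead of gluing columns together with " ", awk's print can take a comma-separated list; each comma inserts the output field separator OFS (a space by default). This sketch sets OFS to a tab to produce tab-separated output, using one line from the sample listing.

```shell
#!/bin/sh
line='-rwxrwx---+ 3 spark spark 34200498 2022-01-08 07:09 /user/spark/applicationHistory/application_1592823123197_1123429'

# Commas in print insert OFS between values; -v OFS='\t' makes it a tab
size_and_path=$(printf '%s\n' "$line" | awk -F ' ' -v OFS='\t' '{print $5, $8}')
printf '%s\n' "$size_and_path"
```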
For a different delimiter:
If your input uses a different delimiter, such as "," or "|", just change the value passed to -F:
Example:
Contents of test.txt:
a|b|c|d
c|d|f|g
Command:
cat test.txt | awk -F '|' '{print $3}'
Output:
c
f
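awk can also read a file directly, so piping through cat is optional. This sketch creates a throwaway comma-separated file (its contents are made up for the demo) and prints the 3rd column with -F ','.

```shell
#!/bin/sh
# Create a temporary CSV file for the demo
tmp=$(mktemp)
printf 'a,b,c,d\nc,d,f,g\n' > "$tmp"

# Pass the file name to awk directly; -F ',' splits each line on commas
third=$(awk -F ',' '{print $3}' "$tmp")
echo "$third"   # prints "c" and "f" on separate lines

rm -f "$tmp"
```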
Conclusion:
The awk command in Linux can be used to get a specific column from a command output or from a file. We just have to adjust the command for the delimiter and the column position.
Good luck with your learning!