Linux Tutorial — Part 4: Text Processing Tools

By Suraj Ahir · November 27, 2025 · 6 min read

Linux Tutorial · Part 4 of 12

Linux systems generate enormous amounts of text data — logs, configuration files, code, data exports. The ability to search, filter, transform, and extract information from text files using command-line tools is a superpower. These tools are used daily by system administrators, DevOps engineers, and developers.

grep — Search for Patterns

grep searches files for lines matching a pattern:

grep Examples
grep "error" app.log              # Lines containing "error"
grep -i "error" app.log           # Case-insensitive search
grep -n "error" app.log           # Show line numbers
grep -r "config" /etc/            # Recursive search in directory
grep -v "DEBUG" app.log           # Lines NOT containing DEBUG
grep -c "error" app.log           # Count matching lines
grep -l "error" /var/log/*.log    # List files with matches
grep "error\|warning" app.log     # Multiple patterns (OR)
grep -A 3 "ERROR" app.log         # 3 lines after match
grep -B 2 "ERROR" app.log         # 2 lines before match
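All of these flags work on piped input as well as on files, so they are easy to try without a real log. A quick self-contained check (the sample lines below are invented for the demo):

```shell
# grep reads stdin when no file is given; -c counts, -i ignores case
printf 'INFO start\nERROR one\nerror two\nDEBUG x\n' | grep -ci "error"
```

This prints 2, since both the upper- and lower-case error lines match once -i is in effect.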

find — Locate Files

find Examples
find / -name "nginx.conf"               # Find by name
find /home -name "*.py"                 # Find all Python files
find /var/log -name "*.log" -size +10M  # Log files over 10MB
find . -mtime -7                        # Modified in last 7 days
find /tmp -type f -mtime +30 -delete    # Delete old temp files
find . -name "*.py" -exec grep -l "import os" {} \;

sed — Stream Editor

sed modifies text in files or streams — essential for automation:

sed Examples
sed 's/old/new/' file.txt              # Replace first occurrence per line
sed 's/old/new/g' file.txt             # Replace all occurrences
sed -i 's/old/new/g' file.txt          # Replace in-place (modify file)
sed -n '10,20p' file.txt               # Print lines 10 to 20
sed '/^#/d' config.txt                 # Delete comment lines
sed 's/  */ /g' file.txt               # Squeeze runs of spaces to one
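Because sed also reads from a pipe, a substitution can be rehearsed on stdin before committing to -i (note that BSD/macOS sed requires an argument after -i, e.g. sed -i ''):

```shell
# Try a substitution on sample input before editing the real file
printf 'old old old\n' | sed 's/old/new/g'
```

This prints "new new new", confirming the pattern before any file is modified.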

awk — Data Processing

awk is a full programming language for processing structured text — especially CSV-like data:

awk Examples
awk '{print $1}' file.txt             # Print first column
awk '{print $1, $3}' file.txt         # Print columns 1 and 3
awk -F: '{print $1}' /etc/passwd      # Use : as delimiter
awk '{sum += $2} END {print sum}' data.txt  # Sum column 2
awk 'NR > 1' file.txt                 # Skip header line
awk '/ERROR/ {print NR": "$0}' app.log  # Print error lines with numbers
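The running-total idiom from the examples above can be verified in one line on inline sample data (the two-column input here is made up for the demo):

```shell
# Sum the second column: awk accumulates per line, END runs after the last line
printf 'a 10\nb 20\nc 5\n' | awk '{sum += $2} END {print sum}'
```

The END block fires once after all input is consumed, printing 35.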

cut — Extract Columns

cut Examples
cut -d: -f1 /etc/passwd               # Extract first field
cut -d, -f2,4 data.csv                # Extract columns 2 and 4
cut -c1-50 file.txt                   # First 50 characters
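A quick stdin check of the delimiter and field flags, using a passwd-style record:

```shell
# Second field of a colon-separated record
echo 'root:x:0:0' | cut -d: -f2
```

This prints "x", the second colon-delimited field.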

sort and uniq — Organizing Data

sort and uniq
sort names.txt                        # Alphabetical sort
sort -n numbers.txt                   # Numerical sort
sort -r file.txt                      # Reverse sort
sort -k2 data.txt                     # Sort by column 2
sort file.txt | uniq                  # Remove duplicates
sort file.txt | uniq -c               # Count occurrences
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10  # Top 10 IPs

Pipes — Combining Commands

The pipe | sends output of one command as input to another. This is where Linux command-line power really shows:

Pipe Examples
cat app.log | grep ERROR | sort | uniq -c | sort -rn
# Read log → filter errors → sort → count → sort by count

ps aux | grep nginx | grep -v grep
# List processes → find nginx → remove grep itself

find . -name "*.py" | xargs grep "import requests"
# Find Python files containing "import requests"
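The first pipeline above can be tried end to end on a throwaway log (the file name and log lines here are invented for the demo):

```shell
# Create a small disposable log
printf 'ERROR disk full\nINFO started\nERROR disk full\nERROR timeout\n' > /tmp/demo.log

# Filter error lines, group identical lines, rank by frequency
grep ERROR /tmp/demo.log | sort | uniq -c | sort -rn
```

"ERROR disk full" is ranked first with a count of 2, ahead of the single "ERROR timeout".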

In Part 5, we move to process management — how to start, stop, monitor, and control running programs in Linux.

awk — The Swiss Army Knife of Text Processing

While grep searches and sed transforms, awk is a complete programming language for processing structured text. It is particularly powerful for tabular data. awk processes files line by line, splitting each line into fields by a delimiter (space by default). $1 refers to the first field, $2 to the second, and so on:

awk Examples
# Print second column of each line
awk '{print $2}' file.txt

# Print lines where third column is greater than 100
awk '$3 > 100 {print $0}' data.txt

# Calculate sum of a column
awk '{sum += $1} END {print "Total:", sum}' numbers.txt

# Use comma as delimiter (for CSV-like files)
awk -F',' '{print $1, $3}' data.csv

# Print with formatting
awk '{printf "Name: %-20s Score: %d\n", $1, $2}' grades.txt

awk may look intimidating at first but is one of the most powerful tools in a Linux administrator's toolkit. Learning even the basics of awk dramatically expands your ability to process and analyze log files, system output, and data files from the command line.

Combining Tools with Pipelines

The real power of Linux text processing tools is combining them with pipes (|) to build processing pipelines. For example, to find the top five IP addresses in a web server access log: awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -5. This pipeline extracts the first field (IP address), sorts all IPs, counts unique occurrences, sorts by count in reverse, and shows the top five. Understanding how to compose these pipelines is a core DevOps and system administration skill.
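The same top-N pattern can be verified on a tiny hand-made access log (the file name and IP addresses below are invented for illustration):

```shell
# Three requests: 10.0.0.1 appears twice, 10.0.0.2 once
printf '10.0.0.1 GET /\n10.0.0.2 GET /\n10.0.0.1 GET /about\n' > /tmp/access_demo.log

# Extract the IP field, then count and rank
awk '{print $1}' /tmp/access_demo.log | sort | uniq -c | sort -rn | head -5
```

10.0.0.1 comes out on top with a count of 2, exactly as the pipeline description predicts.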

Practice Exercise

Download a sample log file or generate one using: for i in {1..100}; do echo "$(date) INFO Process $((RANDOM % 10)) completed task $i"; done > sample.log. Then use grep, cut, awk, and sort in combination to: extract all unique process numbers, count how many times each process number appears, and display the results sorted by frequency. This exercise builds practical text processing skill in a realistic scenario.
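One possible solution sketch (assuming bash, for the {1..100} expansion; grep -o is used so the answer does not depend on how many fields the local date format prints):

```shell
# Generate the sample log described in the exercise
for i in {1..100}; do
  echo "$(date) INFO Process $((RANDOM % 10)) completed task $i"
done > sample.log

# Unique process numbers
grep -o 'Process [0-9]*' sample.log | cut -d' ' -f2 | sort -n | uniq

# Count occurrences of each process number, most frequent first
grep -o 'Process [0-9]*' sample.log | cut -d' ' -f2 | sort | uniq -c | sort -rn
```

Extracting just the "Process N" fragment first keeps the rest of the pipeline simple: cut isolates the number, and the familiar sort | uniq -c | sort -rn idiom does the counting and ranking.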

Linux in Your Daily Engineering Practice

Linux command-line proficiency is not something you learn once and then stop improving. It is a skill that deepens continuously as you encounter new tools, new use cases, and new problems to solve. The engineers who are most effective at the command line did not become that way by reading comprehensive guides — they became that way by spending years solving real problems at the terminal, gradually accumulating a toolkit of commands, aliases, scripts, and muscle memory. The best approach is to use Linux for real work as much as possible, to look up better ways to do things you already do, and to make note of efficient patterns you observe in others' work. Over time, the terminal becomes faster than any GUI for the tasks you do repeatedly.

Disclaimer: This content is for educational purposes only. SRJahir Tech does not guarantee any specific outcome or job placement. Learning requires consistent practice.