🆔 Command: uniq (The Duplicate Handler)
The uniq command is used to filter out or report repeated lines in a file. It is the perfect partner to the sort command because uniq only detects duplicate lines if they are adjacent (one after the other).
1. The "Why"
For a developer and Linux power user, uniq is essential for data cleaning and statistics.
- Counting Occurrences: See how many times a specific error appears in your journalctl or Supabase logs.
- Cleaning Lists: If you are compiling a list of research topics, use `uniq` to ensure you don't have the same entry twice.
- Log Auditing: Find out which IP addresses are most active in your server logs.
- Code Maintenance: Identify redundant imports or duplicate lines in your Java source code.
2. How uniq Thinks
Because uniq only compares each line with the one directly before it, you almost always pipe sort into it:
sort file.txt | uniq
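To make the adjacency rule concrete, here is a minimal sketch using made-up sample words (the variable names are only for illustration). It runs the same input through uniq with and without sorting first:

```shell
# Hypothetical sample input: "apple" appears on lines 1, 3 and 4.
input=$(printf '%s\n' apple banana apple apple)

# Without sorting, only the adjacent pair at the end is collapsed,
# so "apple" still appears twice: apple, banana, apple.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorting first puts every copy next to each other,
# so uniq leaves one line per value: apple, banana.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```

If the input is already grouped (for example, a log in chronological order where repeats come in bursts), skipping sort can be intentional, since uniq then collapses each burst separately.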
3. Essential Flags
| Flag | Purpose | Description |
|---|---|---|
| `-c` | Count | Prefixes each line with the number of times it occurred. |
| `-d` | Repeated Only | Only prints the lines that had duplicates (hides unique lines). |
| `-u` | Unique Only | Only prints lines that appeared exactly once. |
| `-i` | Ignore Case | Treats "Java" and "java" as the same word. |
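The flags are easiest to compare side by side on one small input. The sketch below uses an invented word list; the exact padding in front of the counts printed by `-c` differs between uniq implementations, so treat the comments as approximate layouts.

```shell
# Hypothetical sample data, sorted so that equal lines are adjacent.
words=$(printf '%s\n' rust java python rust java rust)
sorted=$(printf '%s\n' "$words" | sort)        # java java python rust rust rust

counts=$(printf '%s\n' "$sorted" | uniq -c)    # 2 java / 1 python / 3 rust (left-padded)
repeated=$(printf '%s\n' "$sorted" | uniq -d)  # java, rust (one copy of each repeated line)
singles=$(printf '%s\n' "$sorted" | uniq -u)   # python (lines that occurred exactly once)

# -i compares lines case-insensitively; the first line of each run is the one kept.
folded=$(printf '%s\n' Java java | uniq -i)    # Java

printf -- '-c:\n%s\n\n-d:\n%s\n\n-u:\n%s\n\n-i:\n%s\n' \
  "$counts" "$repeated" "$singles" "$folded"
```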
4. Practical Examples for Your Workflow
A. Counting Errors in a Log
If you want to see which specific errors are most common in your application log:
grep "ERROR" app.log | sort | uniq -c | sort -nr
This sorts the errors so identical lines are adjacent, counts each one, and then sorts the counts numerically in reverse (-n for numeric, -r for descending) so the most frequent error is at the top.
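As a rough illustration of that pipeline, the sketch below feeds it a handful of invented log lines held in a variable instead of a real app.log. The messages are assumptions made up for the example.

```shell
# Hypothetical log content standing in for app.log.
log_lines=$(printf '%s\n' \
  'INFO started' \
  'ERROR db timeout' \
  'ERROR auth failed' \
  'ERROR db timeout' \
  'INFO request ok' \
  'ERROR db timeout' \
  'ERROR auth failed' \
  'ERROR disk full')

# Same stages as above: filter, group, count, rank by count.
ranking=$(printf '%s\n' "$log_lines" | grep "ERROR" | sort | uniq -c | sort -nr)

# Most frequent error first: "db timeout" (3), then "auth failed" (2), then "disk full" (1).
printf '%s\n' "$ranking"
```

Appending `| head -n 5` to the pipeline keeps only the top five entries, which is handy on large logs.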
B. Finding Duplicate Packages
If you have a text file where you keep track of Arch Linux packages to install and want to see if you accidentally listed any twice:
sort packages.txt | uniq -d
C. Identifying Unique Users
If you have a log of usernames accessing your documentation viewer and want to know how many different people logged in:
awk '{print $1}' access.log | sort | uniq | wc -l
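A small sketch of the same idea on invented data, assuming the username is the first whitespace-separated field of each line (the log format here is hypothetical):

```shell
# Hypothetical access log: username, method, path.
access=$(printf '%s\n' \
  'alice GET /docs' \
  'bob GET /docs/intro' \
  'alice GET /docs/faq' \
  'carol GET /docs' \
  'bob GET /docs')

# Extract the first field, group identical names, drop repeats, count what is left.
user_count=$(printf '%s\n' "$access" | awk '{print $1}' | sort | uniq | wc -l)

# Three distinct users: alice, bob, carol.
printf 'distinct users: %d\n' "$user_count"
```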
D. Ignoring Specific Fields (-f)
If you have a log where the first word is a timestamp but you want to compare the rest of the line:
# Ignores the first field (the timestamp) when looking for duplicates
uniq -f 1 logs.txt
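One thing to keep in mind is that -f does not change the adjacency rule. The sketch below uses invented log entries to show what `uniq -f 1` collapses on its own, and one possible way to group non-adjacent repeats first by sorting on the second field onward (`sort -k2`); that extra sort is an addition for illustration, not part of the original command.

```shell
# Hypothetical log entries: a timestamp field followed by a message.
entries=$(printf '%s\n' \
  '10:00:01 cache miss' \
  '10:00:02 cache miss' \
  '10:00:03 disk full' \
  '10:00:04 cache miss')

# Only the adjacent pair collapses; the later "cache miss" survives:
# 10:00:01 cache miss / 10:00:03 disk full / 10:00:04 cache miss
deduped=$(printf '%s\n' "$entries" | uniq -f 1)

# Sorting by everything after the timestamp makes equal messages adjacent:
# 10:00:01 cache miss / 10:00:03 disk full
grouped=$(printf '%s\n' "$entries" | sort -k2 | uniq -f 1)

printf 'uniq -f 1:\n%s\n\n' "$deduped"
printf 'sort -k2 | uniq -f 1:\n%s\n' "$grouped"
```

In both cases uniq prints the first line of each run, so the timestamp shown is the earliest one in that run.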
5. sort -u vs. sort | uniq
You might wonder why we don't just use sort -u.
- `sort -u`: Simply removes duplicates and gives you the list.
- `sort | uniq -c`: Gives you the count of each item, which is much more useful for analysis and debugging.
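The difference is easy to see on a tiny input. This sketch (sample words invented for the example) shows that `sort -u` and `sort | uniq` produce the same list, while `uniq -c` adds the per-item counts:

```shell
# Hypothetical sample items.
items=$(printf '%s\n' pear apple pear fig apple pear)

distinct=$(printf '%s\n' "$items" | sort -u)         # apple, fig, pear
plain=$(printf '%s\n' "$items" | sort | uniq)        # same three lines
counted=$(printf '%s\n' "$items" | sort | uniq -c)   # 2 apple / 1 fig / 3 pear (left-padded)

printf 'sort -u:\n%s\n\n' "$distinct"
printf 'sort | uniq -c:\n%s\n' "$counted"
```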
6. Pro-Tips
- Visualizing Trends: You can use `uniq -c` to see if a specific bug in your app is happening more often over time by piping timestamps into it.
- Arch Linux Context: You can check which groups your user belongs to and ensure there's no redundancy (though the system handles this, it's a good learning exercise): `id -Gn | tr ' ' '\n' | sort | uniq`.
- Data Pipelines: In your tutorial writing workflow, if you're gathering data from multiple sources, `uniq` is your final "sanity check" to ensure a clean dataset for your readers.
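The trend idea from the first tip can be sketched as follows, assuming ISO-style timestamps (an assumed format; yours may differ) that are already in chronological order. Trimming each timestamp to its date and hour turns `uniq -c` into a per-hour counter, and no sort is needed because equal hours are already adjacent.

```shell
# Hypothetical timestamps of one recurring error, in chronological order.
stamps=$(printf '%s\n' \
  '2024-05-01T09:12:44' \
  '2024-05-01T10:03:10' \
  '2024-05-01T10:47:31' \
  '2024-05-01T11:05:02' \
  '2024-05-01T11:20:18' \
  '2024-05-01T11:59:59')

# Keep the first 13 characters (date plus hour), then count each run.
per_hour=$(printf '%s\n' "$stamps" | cut -c1-13 | uniq -c)

# One line per hour with its count: 1 at 09, 2 at 10, 3 at 11.
printf '%s\n' "$per_hour"
```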
7. Summary Reference
| Goal | Command |
|---|---|
| Remove duplicates | `sort file.txt \| uniq` |
| Count occurrences | `sort file.txt \| uniq -c` |
| See only duplicates | `sort file.txt \| uniq -d` |
| See only unique lines | `sort file.txt \| uniq -u` |