SQL Query to Find Duplicate Records (With Example & Output) – Easy Guide
🔍 Introduction
SQL query to find duplicate records is a very important concept for both beginners and experienced developers. Duplicate data can cause incorrect results, data inconsistency, and performance issues in databases.
In this guide, you will learn how to use sql query to find duplicate records with simple examples, different methods, and real-world scenarios.
In this article, you will learn:
- ✅ How to find duplicate records in SQL
- ✅ Real examples with output
- ✅ Multiple query methods
- ✅ Interview tips
📌 What Are Duplicate Records?
Duplicate records are rows where one or more columns have the same values.
👉 Example:
| Id | Name | |
|---|---|---|
| 1 | John | john@gmail.com |
| 2 | Amit | amit@gmail.com |
| 3 | John | john@gmail.com |
➡️ Here, John + john@gmail.com is duplicated
🧪 Sample Table
CREATE TABLE Employees (
Id INT,
Name VARCHAR(50),
Email VARCHAR(100)
);
INSERT INTO Employees VALUES
(1, 'John', 'john@gmail.com'),
(2, 'Amit', 'amit@gmail.com'),
(3, 'John', 'john@gmail.com'),
(4, 'Sara', 'sara@gmail.com'),
(5, 'Amit', 'amit@gmail.com');
✅ Method 1: Using GROUP BY
SELECT Name, Email, COUNT(*) AS DuplicateCount
FROM Employees
GROUP BY Name, Email
HAVING COUNT(*) > 1;
FROM Employees
GROUP BY Name, Email
HAVING COUNT(*) > 1;
📊 Output:
| Name | DuplicateCount | |
|---|---|---|
| John | john@gmail.com | 2 |
| Amit | amit@gmail.com | 2 |
👉 This query finds duplicate combinations of columns
✅ Method 2: Using ROW_NUMBER()
SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY Id) AS RowNum FROM Employees) t
WHERE RowNum > 1;
WHERE RowNum > 1;
✅ Method 3: Using EXISTS
SELECT e1.* FROM Employees e1
WHERE EXISTS ( SELECT 1 FROM Employees e2
WHERE e1.Name = e2.Name
AND e1.Email = e2.Email
AND e1.Id <> e2.Id );
WHERE EXISTS ( SELECT 1 FROM Employees e2
WHERE e1.Name = e2.Name
AND e1.Email = e2.Email
AND e1.Id <> e2.Id );
✅ Method 4: Using JOIN
SELECT e1.* FROM Employees e1
JOIN Employees e2
ON e1.Name = e2.Name
AND e1.Email = e2.Email AND e1.Id <> e2.Id;
JOIN Employees e2
ON e1.Name = e2.Name
AND e1.Email = e2.Email AND e1.Id <> e2.Id;
🚀 How to Delete Duplicate Records
DELETE FROM Employees
WHERE Id NOT IN ( SELECT MIN(Id) FROM Employees
GROUP BY Name, Email );
WHERE Id NOT IN ( SELECT MIN(Id) FROM Employees
GROUP BY Name, Email );
⚠️ Important Tips
- Always backup data before delete
- Use ROW_NUMBER() for safer deletion
- Identify correct columns
🎯 Interview Questions
- How do you find duplicate records in SQL?
- Difference between GROUP BY and ROW_NUMBER()?
- What is PARTITION BY in SQL?
📌 Conclusion
SQL query to find duplicate records is useful for maintaining clean and accurate data.
❓ Frequently Asked Questions (FAQ) :
1. What is a SQL query to find duplicate records?
A SQL query to find duplicate records is used to identify rows that have the same values in one or more columns. It is commonly done using GROUP BY with HAVING COUNT(*) > 1.
2. How do I find duplicate rows in SQL?
You can find duplicate rows in SQL by using GROUP BY and COUNT(*) or by using ROW_NUMBER() with PARTITION BY to identify repeated records.
3. How can I remove duplicate records in SQL?
You can remove duplicate records in SQL using DELETE with GROUP BY or by using ROW_NUMBER() and deleting rows where the row number is greater than 1.
4. What is the difference between GROUP BY and ROW_NUMBER()?
GROUP BY is used to identify duplicate values, while ROW_NUMBER() helps to return actual duplicate rows and is useful when deleting duplicates safely.
5. Why is it important to find duplicate records in SQL?
Finding duplicate records is important to maintain data accuracy, avoid incorrect reports, and improve database performance.
6. Can we find duplicates in a single column in SQL?
Yes, you can find duplicates in a single column by using GROUP BY on that column and applying HAVING COUNT(*) > 1.
7. Which method is best to find duplicate records in SQL?
The GROUP BY method is the most commonly used, but ROW_NUMBER() is better when you need to identify and remove specific duplicate rows.