How to Clean Messy Data in Excel
Every data analyst knows the pain of receiving a spreadsheet that looks like it was assembled during an earthquake. Names in three different formats, dates that Excel doesn't recognize, random spaces everywhere. Sound familiar?
Data cleaning isn't glamorous, but it's essential. Bad data leads to bad analysis. Here's your comprehensive guide to transforming chaos into clarity.
The Data Cleaning Checklist
Before diving in, scan your data for these common issues:
- Duplicate rows
- Extra spaces (leading, trailing, multiple)
- Inconsistent capitalization
- Text stored as numbers (or vice versa)
- Dates not recognized as dates
- Empty cells where there shouldn't be
- Invalid entries and typos
- Merged cells (the enemy of analysis)
Removing Duplicates
Quick Method: Remove Duplicates Tool
- Select your data range (including headers)
- Go to Data > Remove Duplicates
- Check the columns that define "duplicate"
- Click OK
Excel shows how many duplicates were removed and how many unique values remain.
Formula Method: Identify Before Deleting
Sometimes you want to see duplicates before removing them. Use COUNTIF:
=COUNTIF($A$2:$A$1000, A2) > 1
This returns TRUE for duplicate values. Filter to show only TRUE, then decide what to delete.
Fixing Text Issues
Remove Extra Spaces: TRIM
The TRIM function removes:
- Leading spaces
- Trailing spaces
- Multiple spaces between words (leaving single spaces)
=TRIM(A2)
Standardize Capitalization
=UPPER(A2) → JOHN SMITH
=LOWER(A2) → john smith
=PROPER(A2) → John Smith
Remove Non-Printable Characters: CLEAN
=CLEAN(A2)
Removes ASCII characters 0-31 (non-printable characters that sometimes sneak in from imports).
The Ultimate Cleanup Combo
Combine TRIM and CLEAN:
=TRIM(CLEAN(A2))
Extracting and Splitting Data
Text to Columns
When names are "LastName, FirstName" and you need separate columns:
- Select the column
- Go to Data > Text to Columns
- Choose Delimited
- Select comma as delimiter
- Finish
Formula Extraction
Extract first name from "FirstName LastName":
=LEFT(A2, FIND(" ", A2) - 1)
Extract last name:
=RIGHT(A2, LEN(A2) - FIND(" ", A2))
With TEXTSPLIT (Excel 365):
=TEXTSPLIT(A2, " ")
Fixing Date Problems
Dates Stored as Text
When dates look like dates but Excel treats them as text:
Method 1: Multiply by 1
=A2 * 1
Method 2: DATEVALUE
=DATEVALUE(A2)
Method 3: Text to Columns
- Select the date column
- Data > Text to Columns
- Next > Next
- Choose Date format (MDY, DMY, etc.)
- Finish
Inconsistent Date Formats
When you have "1/5/2024" and "January 5, 2024" mixed:
Create a helper column with:
=TEXT(DATEVALUE(A2), "YYYY-MM-DD")
Find and Replace Magic
Ctrl + H opens Find and Replace. It's more powerful than most realize.
Remove All Spaces
Find: (one space) Replace: (empty)
Standardize Abbreviations
Find: Street Replace: St
Find: Avenue Replace: Ave
Remove Line Breaks
Find: Ctrl + J (inserts line break character) Replace: (space or empty)
Use Wildcards
Enable "Use wildcards" option:
*matches any sequence of characters?matches any single character~escapes special characters
Find: Mr.*
Replace: (empty)
Removes "Mr." followed by anything.
Handling Blank Cells
Find and Select Blanks
- Select your data range
- Press F5 (or Ctrl+G) > Special
- Choose "Blanks"
- All blank cells are selected
Now you can:
- Delete rows: Right-click > Delete > Entire Row
- Fill with value: Type a value and press Ctrl + Enter
- Fill from above: Press Ctrl + D
Fill Blanks with Value Above
- Select data range
- Go to (F5) > Special > Blanks
- Type
=and click the cell above (or press Up Arrow) - Press Ctrl + Enter
Number Issues
Text to Numbers
When numbers have the green triangle warning:
- Select the range
- Click the warning icon
- Choose "Convert to Number"
Or use formula:
=VALUE(A2)
Remove Number Formatting Symbols
Clean currency, percentages, etc.:
=VALUE(SUBSTITUTE(SUBSTITUTE(A2, "$", ""), ",", ""))
Power Query: The Professional Choice
For complex or recurring cleaning tasks, Power Query is the answer.
Access Power Query
- Select your data
- Go to Data > From Table/Range
- Power Query Editor opens
Common Power Query Transformations
- Remove duplicates: Right-click column header
- Split column: Right-click > Split Column
- Replace values: Right-click > Replace Values
- Change type: Click column type icon
- Trim/Clean: Add Column > Format > Trim/Clean
Why Power Query?
- Repeatable: Steps are recorded and can be applied to new data
- Non-destructive: Original data remains unchanged
- Powerful: Handles transformations formulas can't easily do
- Connected: Can pull from multiple sources
Data Validation: Prevent Future Messes
After cleaning, prevent new errors:
Dropdown Lists
- Select target cells
- Data > Data Validation
- Allow: List
- Source: Your list of valid options
Number Ranges
- Data > Data Validation
- Allow: Whole number (or Decimal)
- Set minimum and maximum
Date Ranges
Restrict to valid date ranges to prevent typos.
Automation with Macros
If you clean similar data repeatedly, record a macro:
- Go to View > Macros > Record Macro
- Perform your cleaning steps
- Stop recording
- Run the macro on new data
Final Steps
After cleaning:
- Spot check: Manually review a sample
- Validate: Use COUNTBLANK, COUNTIF to verify
- Document: Note what you did (for next time)
- Backup: Keep original data in a separate sheet
Conclusion
Data cleaning is where real analysis begins. A well-cleaned dataset reveals insights that messy data hides. Master these techniques, and you'll spend less time fighting your data and more time finding answers in it.
Start with the basics: TRIM, Remove Duplicates, Find & Replace. Graduate to Power Query for complex transformations. Your future self (and anyone else who uses your spreadsheets) will thank you.