
SQL Automated Data Cleaning Project

Link to database: Click here

Objective
Designed and executed a procedure to clean and standardize the US Household Income dataset, ensuring accuracy and consistency for further analysis.

Tools Used
MySQL

Process
I created a new table called "us_household_income_cleaned" to store the cleaned and standardized data, with an additional timestamp column to track changes.
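A minimal MySQL sketch of this step follows. The raw table name `us_household_income` and its `row_id` and `id` columns are assumptions about the source schema. Only the columns named in this write-up are declared, so the remaining raw columns would be added to both the table definition and the INSERT in the same way.

```sql
-- Sketch: source table name, row_id/id columns and the short column list are assumptions.
CREATE TABLE IF NOT EXISTS us_household_income_cleaned (
    row_id      INT,
    id          INT,
    State_Name  TEXT,
    County      TEXT,
    City        TEXT,
    Place       TEXT,
    `Type`      TEXT,
    -- Extra column: when the row was copied in, so separate loads can be told apart.
    `TimeStamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Copy the raw rows; CURRENT_TIMESTAMP is fixed for the whole statement,
-- so every row copied by this INSERT gets the same time.
INSERT INTO us_household_income_cleaned
    (row_id, id, State_Name, County, City, Place, `Type`, `TimeStamp`)
SELECT row_id, id, State_Name, County, City, Place, `Type`, CURRENT_TIMESTAMP
FROM us_household_income;
```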
The cleaning process removed duplicate rows, using the ROW_NUMBER() window function to number repeated records and keep only the first of each, and then addressed common data quality issues.
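The deduplication can be sketched as below (MySQL 8.0 or later, which introduced window functions). It assumes `id` identifies a record and `row_id` is unique for each physical row within a load. Partitioning by `TimeStamp` as well is a design choice of this sketch, so that rows copied on different runs are not treated as duplicates of each other.

```sql
-- Number the rows sharing an id within one load; delete every row numbered above 1.
DELETE FROM us_household_income_cleaned
WHERE (row_id, `TimeStamp`) IN (
    SELECT row_id, `TimeStamp`
    FROM (
        SELECT row_id,
               `TimeStamp`,
               ROW_NUMBER() OVER (
                   PARTITION BY id, `TimeStamp`
                   ORDER BY row_id          -- the lowest row_id is kept
               ) AS row_num
        FROM us_household_income_cleaned
    ) AS numbered
    WHERE row_num > 1
);
```

The inner derived table (`numbered`) matters: MySQL rejects a subquery that reads directly from the table being deleted from (error 1093), while a derived table that uses a window function is materialized first, which avoids the error.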
This included correcting typos in the State_Name column (e.g., changing "georia" to "Georgia"), standardizing text fields such as County, City, Place, and State_Name to uppercase for uniformity, and resolving inconsistencies in the Type column (e.g., standardizing "CPD" to "CDP").
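These fixes map onto plain UPDATE statements, sketched here against the cleaned table. The literal values are the examples given in the text; a real run would likely need more typo corrections than the single one shown.

```sql
-- Correct the misspelled state name.
UPDATE us_household_income_cleaned
SET State_Name = 'Georgia'
WHERE State_Name = 'georia';

-- Uppercase the location text fields so each place is spelled one way.
UPDATE us_household_income_cleaned
SET County     = UPPER(County),
    City       = UPPER(City),
    Place      = UPPER(Place),
    State_Name = UPPER(State_Name);

-- Standardize the transposed Type code.
UPDATE us_household_income_cleaned
SET `Type` = 'CDP'
WHERE `Type` = 'CPD';
```

Under MySQL's default case-insensitive collation the first WHERE matches "georia" in any letter case, so the order relative to the uppercase step is not critical. MySQL Workbench's safe-update mode may reject the uppercase UPDATE because it has no key-based WHERE clause; `SET SQL_SAFE_UPDATES = 0;` disables that check for the current session.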
To automate these steps, I created a stored procedure named "copy_and_clean_data" that creates the cleaned table, copies the raw data into it, and runs all of the cleaning steps.
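A sketch of how such a procedure could look in MySQL 8.0. The raw table name `us_household_income`, the `row_id`/`id` columns, and the shortened column list are assumptions. `DELIMITER` is a client command (mysql CLI, MySQL Workbench) that lets the semicolons inside the body pass through to the server unchanged.

```sql
DELIMITER $$

DROP PROCEDURE IF EXISTS copy_and_clean_data $$
CREATE PROCEDURE copy_and_clean_data()
BEGIN
    -- 1. Create the cleaned table on the first run only (column list assumed).
    CREATE TABLE IF NOT EXISTS us_household_income_cleaned (
        row_id      INT,
        id          INT,
        State_Name  TEXT,
        County      TEXT,
        City        TEXT,
        Place       TEXT,
        `Type`      TEXT,
        `TimeStamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    -- 2. Copy the raw rows; one INSERT gives every row of this run the same timestamp.
    INSERT INTO us_household_income_cleaned
        (row_id, id, State_Name, County, City, Place, `Type`, `TimeStamp`)
    SELECT row_id, id, State_Name, County, City, Place, `Type`, CURRENT_TIMESTAMP
    FROM us_household_income;

    -- 3. Keep the first row per id within each run; delete the rest.
    DELETE FROM us_household_income_cleaned
    WHERE (row_id, `TimeStamp`) IN (
        SELECT row_id, `TimeStamp`
        FROM (
            SELECT row_id, `TimeStamp`,
                   ROW_NUMBER() OVER (PARTITION BY id, `TimeStamp` ORDER BY row_id) AS row_num
            FROM us_household_income_cleaned
        ) AS numbered
        WHERE row_num > 1
    );

    -- 4. Standardize values.
    UPDATE us_household_income_cleaned
    SET State_Name = 'Georgia'
    WHERE State_Name = 'georia';

    UPDATE us_household_income_cleaned
    SET County = UPPER(County), City = UPPER(City),
        Place = UPPER(Place), State_Name = UPPER(State_Name);

    UPDATE us_household_income_cleaned
    SET `Type` = 'CDP'
    WHERE `Type` = 'CPD';
END $$

DELIMITER ;

-- Run the whole pipeline.
CALL copy_and_clean_data();
```

As written, each call appends a fresh, timestamped copy of the raw data and cleans it, so the table accumulates one cleaned snapshot per run rather than overwriting the previous one.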
Results
I created a standardized and cleaned dataset in the "us_household_income_cleaned" table.
I ensured data consistency by removing duplicates and fixing inaccuracies.
Finally, I automated the cleaning process, making it repeatable and efficient for future updates.
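These results can be spot-checked with two queries, sketched here assuming the cleaned table's `id` and `TimeStamp` columns. After a successful run the first should return no rows and the second a count of 0.

```sql
-- Any id still appearing more than once within one load (expect no rows).
SELECT id, `TimeStamp`, COUNT(*) AS copies
FROM us_household_income_cleaned
GROUP BY id, `TimeStamp`
HAVING COUNT(*) > 1;

-- Rows still carrying one of the fixed problems (expect 0).
SELECT COUNT(*) AS remaining_issues
FROM us_household_income_cleaned
WHERE State_Name = 'georia'
   OR `Type` = 'CPD'
   -- binary comparison, so lowercase letters are caught despite a case-insensitive collation
   OR CAST(City AS BINARY) <> CAST(UPPER(City) AS BINARY);
```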