SQL Automated Data Cleaning Project

​​​​

Link to code: Click here

Link to database: Click here

Objective

I designed and ran a stored procedure to clean and standardize the US Household Income dataset, ensuring accuracy and consistency for further analysis.

​

Tools Used

MySQL

 

Process

I created a new table called "us_household_income_cleaned" to store the cleaned and standardized data, with an additional timestamp column to track changes.
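A minimal sketch of what creating and filling such a table could look like in MySQL. The raw table name us_household_income, the row_id and id columns, the column types, and the timestamp column name are assumptions for illustration; the column list is abbreviated to the fields named on this page.

```sql
-- Sketch only: column list is abbreviated and partly assumed.
CREATE TABLE IF NOT EXISTS us_household_income_cleaned (
    row_id      INT,           -- assumed: unique identifier of each raw row
    id          INT,           -- assumed: identifier of the geographic record
    State_Name  VARCHAR(100),
    County      VARCHAR(100),
    City        VARCHAR(100),
    Place       VARCHAR(100),
    `Type`      VARCHAR(50),
    `TimeStamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP  -- assumed name for the tracking column
);

-- Copy the raw rows and stamp them with the load time.
-- us_household_income is an assumed name for the raw table.
INSERT INTO us_household_income_cleaned
    (row_id, id, State_Name, County, City, Place, `Type`, `TimeStamp`)
SELECT row_id, id, State_Name, County, City, Place, `Type`, CURRENT_TIMESTAMP
FROM us_household_income;
```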

The data cleaning process involved removing duplicate rows using the ROW_NUMBER() function to ensure unique records and addressing common data quality issues.
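One common way to do this kind of deduplication in MySQL 8.0 or later is sketched below. It assumes that duplicates share the same id and that row_id is unique per row; both column names and the choice of partition key are assumptions, not details confirmed by this page.

```sql
-- Number the rows within each id group, then delete every row after the first.
-- The extra derived table (numbered) is needed because MySQL does not allow
-- a DELETE to select directly from the table it is deleting from.
DELETE FROM us_household_income_cleaned
WHERE row_id IN (
    SELECT row_id
    FROM (
        SELECT row_id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id) AS row_num
        FROM us_household_income_cleaned
    ) AS numbered
    WHERE row_num > 1   -- row_num = 1 is the copy that is kept
);
```

Ordering by row_id inside the window keeps the earliest copy of each record, which makes the result deterministic.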

The data quality fixes included correcting typos in the State_Name column (e.g., changing "georia" to "Georgia"), standardizing text fields such as County, City, Place, and State_Name to uppercase for uniformity, and resolving inconsistencies in the Type column (e.g., standardizing "CPD" to "CDP").
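These fixes could be written as plain UPDATE statements, roughly as follows. This is a sketch that covers only the examples mentioned above, run in the order they are listed.

```sql
-- Fix the known typo in State_Name.
-- With MySQL's default case-insensitive collations this also matches 'Georia'.
UPDATE us_household_income_cleaned
SET State_Name = 'Georgia'
WHERE State_Name = 'georia';

-- Standardize the text fields to uppercase.
UPDATE us_household_income_cleaned
SET County     = UPPER(County),
    City       = UPPER(City),
    Place      = UPPER(Place),
    State_Name = UPPER(State_Name);

-- Resolve the inconsistent Type code.
UPDATE us_household_income_cleaned
SET `Type` = 'CDP'
WHERE `Type` = 'CPD';
```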

To make these steps repeatable, I created a stored procedure named "copy_and_clean_data" that automates creating the cleaned table, copying the raw data, and running all cleaning steps.
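A condensed sketch of how such a procedure might be structured is shown below. The procedure name comes from this page; the raw table name, the row_id and id columns, and the decision to rebuild the cleaned table from scratch on every call are assumptions made so that the sketch stays safe to rerun.

```sql
DELIMITER $$   -- client-side command of the mysql CLI / MySQL Workbench

DROP PROCEDURE IF EXISTS copy_and_clean_data $$

CREATE PROCEDURE copy_and_clean_data()
BEGIN
    -- Assumption: start from an empty table on each run.
    DROP TABLE IF EXISTS us_household_income_cleaned;

    CREATE TABLE us_household_income_cleaned (
        row_id      INT,           -- assumed unique per raw row
        id          INT,           -- assumed record identifier
        State_Name  VARCHAR(100),
        County      VARCHAR(100),
        City        VARCHAR(100),
        Place       VARCHAR(100),
        `Type`      VARCHAR(50),
        `TimeStamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy the raw data (us_household_income is an assumed table name).
    INSERT INTO us_household_income_cleaned
        (row_id, id, State_Name, County, City, Place, `Type`, `TimeStamp`)
    SELECT row_id, id, State_Name, County, City, Place, `Type`, CURRENT_TIMESTAMP
    FROM us_household_income;

    -- Remove duplicates, keeping the first row per id.
    DELETE FROM us_household_income_cleaned
    WHERE row_id IN (
        SELECT row_id
        FROM (
            SELECT row_id,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id) AS row_num
            FROM us_household_income_cleaned
        ) AS numbered
        WHERE row_num > 1
    );

    -- Fix typos and standardize values.
    UPDATE us_household_income_cleaned
    SET State_Name = 'Georgia'
    WHERE State_Name = 'georia';

    UPDATE us_household_income_cleaned
    SET County     = UPPER(County),
        City       = UPPER(City),
        Place      = UPPER(Place),
        State_Name = UPPER(State_Name);

    UPDATE us_household_income_cleaned
    SET `Type` = 'CDP'
    WHERE `Type` = 'CPD';
END $$

DELIMITER ;

-- Run the whole pipeline with one call.
CALL copy_and_clean_data();
```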

​

Results

I created a standardized and cleaned dataset in the "us_household_income_cleaned" table.

I ensured data consistency by removing duplicates and fixing inaccuracies.

Finally, I automated the cleaning process, making it repeatable and efficient for future updates.
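The results above can be spot-checked with a few queries like the ones sketched here. The id column is an assumed name for the record identifier; the other columns are the ones named on this page.

```sql
-- Should return no rows if duplicates were removed (assumes id identifies a record).
SELECT id, COUNT(*) AS copies
FROM us_household_income_cleaned
GROUP BY id
HAVING COUNT(*) > 1;

-- Lists the remaining Type values so leftover variants such as 'CPD' stand out.
SELECT `Type`, COUNT(*) AS n_rows
FROM us_household_income_cleaned
GROUP BY `Type`
ORDER BY `Type`;

-- Lists the distinct state names to confirm spelling and casing.
SELECT DISTINCT State_Name
FROM us_household_income_cleaned
ORDER BY State_Name;
```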

​

 

Click here to see complete code on GitHub

Click here to see the database
