Context : 

With two comprehensive SQL courses completed, it is time to apply that newly acquired knowledge in a real-world context. Using SQL queries and data cleaning techniques, the project turns a messy and disorganised dataset into one that is ready for analysis.

Project overview and objective : 

The primary objective of this project is to utilise SQL statements and techniques to clean and prepare a raw dataset for further analysis or visualisation. 

By carefully selecting and executing appropriate SQL statements, the project aims to address inconsistencies, redundancies, and inaccuracies in the dataset and transform it into a more structured and organised format. 

While the project does not include further analysis or visualisations, the cleaned dataset can serve as a valuable resource for future investigations and inform data-driven decision-making in a variety of domains.

Tools used : 

PostgreSQL
Jupyter Notebook

Data source : 

link to data source

Tasks performed : 

Please see this file for a detailed view

  • Changing data types for columns
  • Dealing with missing data
  • Breaking out columns
  • Cleaning up inconsistencies
  • Removing duplicates
  • Dropping columns
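
The tasks above can be sketched end to end in a notebook cell. This is only an illustrative sketch, not the project's actual queries: the `listings` table, its columns, and the sample rows are hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in so it runs without a PostgreSQL server.

```python
import sqlite3


def clean_listings():
    """Run each cleaning task on a tiny, made-up table and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical raw table: everything except id is stored as text.
    cur.execute(
        "CREATE TABLE listings "
        "(id INTEGER, ref TEXT, address TEXT, price TEXT, is_vacant TEXT)"
    )
    cur.executemany(
        "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A-1", "12 Oak St, Springfield", "250000", "Y"),
            (2, "A-1", None, "250000", "Y"),  # missing address, same ref as row 1
            (3, "B-2", "9 Elm Rd, Shelbyville", "180000", "No"),
            (4, "C-3", "4 Pine Ave, Springfield", "310000", "N"),
        ],
    )

    # Dealing with missing data: borrow the address from a row with the same ref.
    cur.execute(
        "UPDATE listings SET address = ("
        "  SELECT b.address FROM listings AS b"
        "  WHERE b.ref = listings.ref AND b.address IS NOT NULL LIMIT 1"
        ") WHERE address IS NULL"
    )

    # Breaking out columns: split 'street, city' on the first comma.
    cur.execute("ALTER TABLE listings ADD COLUMN street TEXT")
    cur.execute("ALTER TABLE listings ADD COLUMN city TEXT")
    cur.execute(
        "UPDATE listings SET "
        "street = trim(substr(address, 1, instr(address, ',') - 1)), "
        "city = trim(substr(address, instr(address, ',') + 1))"
    )

    # Cleaning up inconsistencies: map 'Y'/'N' onto 'Yes'/'No'.
    cur.execute(
        "UPDATE listings SET is_vacant = CASE is_vacant "
        "WHEN 'Y' THEN 'Yes' WHEN 'N' THEN 'No' ELSE is_vacant END"
    )

    # Removing duplicates: keep the first row of each identical group.
    cur.execute(
        "DELETE FROM listings WHERE rowid NOT IN ("
        "  SELECT MIN(rowid) FROM listings"
        "  GROUP BY ref, address, price, is_vacant"
        ")"
    )

    # Changing data types and dropping columns: copy only the wanted columns
    # into a new table, casting price from text to integer on the way.
    cur.execute(
        "CREATE TABLE listings_clean AS "
        "SELECT id, ref, street, city, CAST(price AS INTEGER) AS price, is_vacant "
        "FROM listings"
    )

    rows = cur.execute("SELECT * FROM listings_clean ORDER BY id").fetchall()
    conn.close()
    return rows


cleaned_rows = clean_listings()
for row in cleaned_rows:
    print(row)
```

In PostgreSQL itself, the last step would more likely be done in place with `ALTER TABLE ... ALTER COLUMN ... TYPE ... USING` and `ALTER TABLE ... DROP COLUMN`, and the address split could use `SPLIT_PART`. The copy-into-a-new-table approach is used here only because it behaves the same across SQLite versions.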

Conclusion : 

This dataset provides an ideal opportunity to apply newly acquired SQL skills and techniques to clean and prepare the data for future analysis. 

By leveraging SQL to address inconsistencies, redundancies, and inaccuracies, the project aims to transform the data into a more organised and structured format that is suitable for further analysis.

Link to file