Cleaning up data is the first and also a very important step towards preparing data for visualization. It saves a great deal of time and effort for visualization when cleaning and formatting data is nicely done. If you have some data in speadsheet you would like to visualize, it is a great idea to go clean it up before moving on to visualize it.
This page introduces some tools to help prepare data format before visualization.
If you have your numeric / textual mixed data in spreadsheets, please make sure that you set different columns in the right format and then validate your data by format to make sure no mis-formatted data in your dataset.
1. Organize your data in consistent format (some basic format issues are listed below. All formats listed above could be found under Format Cells in Excel):
2. Validate your dataset by format and set up error message:
Note: please keep your data format consistent. Do not leave blank space. You can put in N/A or NONE for blank spaces.
More cleaning tricks are shown in the demo below:
A free open-source tool to help you identify inconsistent data field, suggest refining options and merge field together under your chosen display.
Go to OpenRefine
To learn basic data manipulation skills: OpenRefine Help Resources.
Regular expression for advanced data manipulation: OpenRefine Cheat Sheet.
A free web-based data cleaning tool. Users can copy paste your data into the website for cleaning. It has multiple functions and will provide a comprehensive list of editing functions including extract, merge, drop, delete, etc. The tool provides suggested editing options to manipulate the data.
Go to Data Wrangler
SAS is a powerful tool to manipulate data. Some of its data cleaning functions include:
Below is a video of cleaning data in SAS. It, however, doesn't show all the data cleaning techniques in SAS. Please contact your librarian if you have further questions about data cleaning in SAS.