Applications of Data Visualisation in Data Science
The use of data visualisation in data science can be broadly categorised under three main applications:
- Data checking and cleaning
- Exploration and discovery
- Presentation and communication of results
Data checking and cleaning
When you first get a new data set, it is good practice to create a few preliminary plots, both as a sanity check and to gain a deeper understanding of your data. This does not have to be extensive and could be as simple as a histogram or scatterplot of the individual features. These initial plots let you check that there are no obvious errors and get a feel for the distribution of values.
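As a rough illustration of such a first pass, the sketch below draws a histogram and a scatterplot side by side. It assumes matplotlib is installed; the data is synthetic, and the feature names (`ages`, `incomes`), the planted error and the helper `preliminary_plots` are all hypothetical, made up for this example.

```python
import random
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for a new data set (assumption: two numeric features).
random.seed(0)
ages = [random.gauss(40, 12) for _ in range(500)]
incomes = [1000 + 50 * a + random.gauss(0, 400) for a in ages]
ages[10] = -5  # deliberately planted error: an impossible negative age


def preliminary_plots(x, y, x_name, y_name, path):
    """Save a histogram of x next to a scatterplot of y against x."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram: shows the distribution and makes out-of-range values visible.
    ax_hist.hist(x, bins=30)
    ax_hist.set_xlabel(x_name)
    ax_hist.set_ylabel("count")
    ax_hist.set_title(f"Distribution of {x_name}")

    # Scatterplot: shows how the two features relate and where outliers sit.
    ax_scatter.scatter(x, y, s=8, alpha=0.6)
    ax_scatter.set_xlabel(x_name)
    ax_scatter.set_ylabel(y_name)
    ax_scatter.set_title(f"{y_name} against {x_name}")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return Path(path)


saved = preliminary_plots(ages, incomes, "age", "income", "preliminary_plots.png")
```

In the saved figure, the planted negative age should appear as a point to the left of zero in the scatterplot, which is the kind of obvious error these plots are meant to surface.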
Exploration and discovery
The preliminary plots created for data checking and cleaning lay the foundation for a more intensive exploration and discovery phase to understand your data. Hilary Mason, one of the world’s leading data scientists, says that when she gets a new data set, she starts by making a dozen or more scatter plots, trying to get a sense of what might be interesting. Visualisation reveals possible connections and patterns that can then be confirmed (or refuted) using other kinds of analysis.
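One simple way to produce many scatter plots at once is to plot every pair of features in a grid. The sketch below is only an illustration of that idea, not a description of any particular person's workflow. It assumes matplotlib is installed, and the synthetic features and the helper `pairwise_scatter` are hypothetical.

```python
import itertools
import random
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Synthetic features (assumption): some related to each other, one not.
random.seed(1)
n = 300
base = [random.uniform(0, 10) for _ in range(n)]
features = {
    "base": base,
    "linear": [2 * b + random.gauss(0, 2) for b in base],          # linear in base
    "curved": [(b - 5) ** 2 + random.gauss(0, 2) for b in base],   # non-linear in base
    "noise": [random.gauss(5, 2) for _ in range(n)],               # unrelated
}


def pairwise_scatter(data, path, ncols=3):
    """Save one scatter plot per pair of features and return the pairs plotted."""
    pairs = list(itertools.combinations(data, 2))
    nrows = -(-len(pairs) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(4 * ncols, 3.5 * nrows), squeeze=False
    )
    flat_axes = list(axes.flat)

    for ax, (x_name, y_name) in zip(flat_axes, pairs):
        ax.scatter(data[x_name], data[y_name], s=6, alpha=0.5)
        ax.set_xlabel(x_name)
        ax.set_ylabel(y_name)

    # Hide any grid cells left over when the pairs do not fill the last row.
    for ax in flat_axes[len(pairs):]:
        ax.set_visible(False)

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return pairs


output_path = Path("pairwise_scatter.png")
plotted_pairs = pairwise_scatter(features, output_path)
```

With four features this gives six panels. Any pattern that stands out in a panel is only a candidate finding, and would still need to be confirmed or refuted with other kinds of analysis.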
Presentation and communication of results
As a data scientist, your communication skills are just as important as the technical skills you bring, if not more so. If you cannot communicate your insights and propositions to technical and non-technical stakeholders, then did you really do any analysis? The presentation and delivery of the results from your analysis is therefore crucial, and this is where data visualisation fits comfortably. A well-designed visualisation will serve two main purposes:
- to help you and other modellers/analysts understand the results
- to communicate the results to other stakeholders in an intuitive and easily digestible manner