Data visualisation is the practice of representing data graphically, so that the findings it contains can be communicated effectively.
Data visualisation gives us a visual overview of our data. The human mind processes and understands data more easily when it is presented as images, maps, and graphs. Both small and large data sets benefit from data visualisation, but it matters most for very large data sets, where it is impractical to view every value manually, let alone analyse and understand it all.
Python has a number of charting libraries, including Matplotlib and Seaborn, along with many other data visualisation tools. Between them they offer a wide range of features for building informative, customised, and visually appealing charts that present data simply and effectively.
The first step is to import the modules; after that, we can start plotting.
Importing the modules:
import matplotlib.pyplot as plt  # for Matplotlib
import seaborn as sns  # for Seaborn
Matplotlib vs Seaborn
Matplotlib | Seaborn |
Used for basic graphs such as line graphs and bar charts. | Used to visualise statistics; it can produce more sophisticated visualisations with fewer instructions. |
Works mainly with datasets and arrays. | Works with entire datasets. |
Works efficiently with datasets and arrays, and treats figures and axes as objects. | More organised and functional; treats the entire dataset as a single unit. |
More flexible, and works well with Pandas and NumPy for exploratory data analysis. | Has more built-in themes and is mainly used for statistical analysis. |
Line Chart in Matplotlib
A line chart is a graph that shows data as a series of points connected by straight lines. Each data point is drawn as a marker and linked to the next one by a line or curve.
As shown above, the first step is to import the modules.
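As a minimal sketch of a first line chart, the snippet below plots a short list of values. The name `yield_apples` and the numbers in it are hypothetical, made up only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly values, invented purely for illustration
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

# With a single list, Matplotlib plots the values against their index (0, 1, 2, ...)
plt.plot(yield_apples)
plt.show()
```

Because only one list is passed, the x-axis simply counts the positions of the values.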
To make the chart easier to read, we can also supply values for the x-axis, in this case years.
Next, we name the axes.
We can also plot multiple datasets on the same axes.
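One way to do this, sketched here with hypothetical years and values, is to pass two lists to `plt.plot`: the first is used for the x-axis and the second for the y-axis.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

# First argument: x values (years); second argument: y values
plt.plot(years, yield_apples)
plt.show()
```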
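Axis names can be set with `plt.xlabel` and `plt.ylabel`. The sketch below assumes the same kind of hypothetical data as before; the label texts are only examples.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

plt.plot(years, yield_apples)

# Name the axes so the reader knows what each one measures
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.show()
```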
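Calling `plt.plot` once per dataset draws every line on the same axes. In this sketch both series (`yield_apples` and `yield_oranges`) are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]

# Each call adds one more line to the current axes
plt.plot(years, yield_apples)
plt.plot(years, yield_oranges)
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.show()
```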
With the marker parameter, we can display a marker at each data point on our graph. Matplotlib offers a wide variety of marker shapes, including circles, crosses, squares, and diamonds. We can also show a legend to make the graph clearer.
We can also change the size of the figure, and with it the size of the graph, by passing a figsize (width and height in inches) to plt.figure.
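A short sketch of both ideas, again on hypothetical data: `marker="o"` draws circles, `marker="x"` draws crosses, and `plt.legend` labels the lines in the order they were plotted.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]
yield_oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]

# marker sets the shape drawn at each data point
plt.plot(years, yield_apples, marker="o")
plt.plot(years, yield_oranges, marker="x")

# Legend entries follow the order in which the lines were plotted
plt.legend(["Apples", "Oranges"])
plt.show()
```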
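The figure size is given in inches as a (width, height) pair through the `figsize` argument of `plt.figure`. The size of 12 by 6 inches below is just an example, and the data is hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented purely for illustration
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

# Create the figure first so the plot is drawn at the requested size
plt.figure(figsize=(12, 6))
plt.plot(years, yield_apples, marker="o")
plt.show()
```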
Plotting Barplot with Seaborn
We’ll use the tips dataset to work with the Seaborn library.
The dataset consists of:
- Sex (gender)
- Time of the day
- Total Bill
- The tips given by customers
To see how the average bill amount changes over the course of the week, we can create a bar graph. One way is to calculate the day-wise averages ourselves and pass them to plt.bar. The Seaborn library also provides a barplot function that computes the averages automatically.
If we want to compare another variable side by side, we can use the hue parameter, which splits each bar into one bar per value of that third variable.
If you want to make a horizontal barplot, simply swap the x and y arguments. Try it yourself. 🙂
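Both routes can be sketched as follows. To keep the example self-contained, it builds a small hypothetical sample that borrows the `day` and `total_bill` column names from Seaborn's tips dataset; the numbers are invented. The full dataset can be loaded with `sns.load_dataset("tips")`, which downloads it on first use.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical sample using column names from the tips dataset.
# For the full dataset: tips = sns.load_dataset("tips")
tips = pd.DataFrame({
    "total_bill": [16.0, 24.0, 20.0, 30.0, 18.0, 22.0, 26.0, 34.0],
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
})

# Manual route: compute the day-wise averages, then draw them with plt.bar
bill_avg = tips.groupby("day", sort=False)["total_bill"].mean()
plt.figure()
plt.bar(bill_avg.index, bill_avg.values)
plt.ylabel("Average total bill")
plt.show()

# Seaborn route: barplot averages total_bill for each day on its own
plt.figure()
ax = sns.barplot(x="day", y="total_bill", data=tips)
plt.show()
```

By default `sns.barplot` also draws error bars around each average, which the manual `plt.bar` version does not.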
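As a sketch, passing `hue="sex"` to `sns.barplot` draws one bar per sex within each day. The sample below is hypothetical and only borrows column names from the tips dataset (`sns.load_dataset("tips")` loads the full one).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical sample using column names from the tips dataset
tips = pd.DataFrame({
    "total_bill": [18.0, 20.0, 24.0, 26.0, 22.0, 24.0, 30.0, 32.0],
    "sex": ["Female", "Female", "Male", "Male", "Female", "Female", "Male", "Male"],
    "day": ["Sat", "Sat", "Sat", "Sat", "Sun", "Sun", "Sun", "Sun"],
})

# hue splits every day into side-by-side bars, one per value of "sex"
ax = sns.barplot(x="day", y="total_bill", hue="sex", data=tips)
plt.show()
```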
Histograms with Seaborn
For this, we will use another dataset, the iris data, which contains measurements of flowers, including the sepal width. A histogram is another way to visualise data: it divides the range of values into intervals and uses a bar to show how many values fall within each interval.
The Histogram for this data is:
Note that these histograms can also be plotted with the Matplotlib library, using the function plt.hist(data).
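The Matplotlib version can be sketched like this, using a few illustrative sepal width values. The choice of `bins=5` is an assumption made for the example, not a recommendation.

```python
import matplotlib.pyplot as plt

# A few illustrative sepal width values (in cm)
sepal_width = [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0]

# plt.hist returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, _ = plt.hist(sepal_width, bins=5)
plt.xlabel("Sepal width")
plt.show()
```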
Scatterplot using Seaborn
Next come scatter plots. A scatter plot shows the relationship between two variables by drawing one point per observation, with one variable on each axis. The data is not grouped into ranges, as in a histogram, but is spread across the graph. A further variable can be shown by giving the points distinct colours. Let’s draw a scatter plot using the “Iris” dataset.
This time we use the complete dataset, instead of just the sepal width.
We can enhance it further by using the hue parameter to colour the points by species. Let’s see what we get:
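Here is a minimal sketch with `sns.scatterplot`. Plotting sepal length against sepal width is an assumption made for this example, and the rows are a small illustrative sample using iris column names (`sns.load_dataset("iris")` loads the full data).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small illustrative sample using column names from the iris dataset
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# One point per flower: sepal length on the x-axis, sepal width on the y-axis
ax = sns.scatterplot(x="sepal_length", y="sepal_width", data=iris)
plt.show()
```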
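With `hue="species"`, Seaborn colours each point by its species and adds a legend. The sketch below reuses the same kind of small illustrative sample, with column names borrowed from the iris dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small illustrative sample using column names from the iris dataset
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.1, 3.3, 2.7, 3.0],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# hue assigns a distinct colour to every species
ax = sns.scatterplot(x="sepal_length", y="sepal_width", hue="species", data=iris)
plt.show()
```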
Heatmaps in Seaborn
Moving further, we have another data visualisation technique called heatmaps. Heatmaps are used to observe how values change gradually across two dimensions. Different colours represent different values, and the variation in hue and intensity tells us how the measured quantity fluctuates. Let’s use the flights dataset in Seaborn to visualise the monthly passenger footfall at an airport over a 12-year period using heatmaps.
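A heatmap needs the data as a grid, so a common approach is to pivot the table (months as rows, years as columns) and pass the result to `sns.heatmap`. The sketch below uses a small hypothetical excerpt with the column names of the flights dataset (`year`, `month`, `passengers`); the full 12-year table can be loaded with `sns.load_dataset("flights")`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical excerpt using the column names of the flights dataset.
# For the full dataset: flights = sns.load_dataset("flights")
month_order = ["Jan", "Feb", "Mar"]
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950, 1951, 1951, 1951],
    # An ordered categorical keeps the months in calendar order after pivoting
    "month": pd.Categorical(month_order * 3, categories=month_order, ordered=True),
    "passengers": [112, 118, 132, 115, 126, 141, 145, 150, 178],
})

# Reshape to a grid: one row per month, one column per year
table = flights.pivot(index="month", columns="year", values="passengers")

# Each cell is coloured according to its passenger count
ax = sns.heatmap(table)
plt.show()
```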
Here you go, with a beautiful heatmap of the dataset:
Wind-up
In this article, we went through different methods of data visualisation along with examples. Data visualisation is a vast field, and the examples here are only meant to give you an insight into how it works. To go further, explore more resources, read the documentation of Seaborn and Matplotlib for a broader understanding, and then try things yourself to get hands-on experience. The Pandas library was discussed in the previous article, along with a YouTube video; there you should go through Reading a csv Dataset using Pandas and also access the Pandas Cheatsheet.
I will soon share a cheat sheet for Seaborn and Matplotlib, which should be very useful. In the upcoming lessons, we’ll dive deeper into this and take our first step into Machine Learning, a big part of Data Science.
I hope you liked the article and learned from it. Please give your feedback in the comments below, and if you have any questions, feel free to ask me in the comments section or by e-mail: immadshahid@gmail.com