{"id":436,"date":"2023-06-08T02:30:56","date_gmt":"2023-06-07T21:30:56","guid":{"rendered":"https:\/\/immadshahid.com\/?p=436"},"modified":"2023-06-08T02:40:51","modified_gmt":"2023-06-07T21:40:51","slug":"data-science-lecture-2-data-visualization","status":"publish","type":"post","link":"https:\/\/immadshahid.com\/blog\/data-science-lecture-2-data-visualization\/","title":{"rendered":"Data Science Lecture 2: Data Visualization"},"content":{"rendered":"\n<p>Data visualisation is the study of how to portray data visually. It communicates findings from data effectively by displaying them graphically.<\/p>\n\n\n\n<p>Data visualisation gives us a visual overview of our data. The human mind processes and comprehends data more easily when it is presented as images, maps, and graphs. Both small and large data sets benefit from data visualisation, but it really shines on very large data sets, where it is difficult to view all of the data manually, let alone analyse and comprehend it.<\/p>\n\n\n\n<p>Python has a number of charting libraries, including Matplotlib and Seaborn, along with many other data visualisation tools that offer a variety of capabilities for building informative, distinctive, and visually appealing charts that present data simply and powerfully.<\/p>\n\n\n\n<p>The first step is to import the modules; after that, we can start plotting.<\/p>\n\n\n\n<p>Importing the modules:<\/p>\n\n\n\n<p><code>import matplotlib.pyplot as plt<\/code>  #For Matplotlib<\/p>\n\n\n\n<p><code>import seaborn as sns<\/code> #For Seaborn<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Matplotlib vs Seaborn<\/h2>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Matplotlib<\/strong><\/td><td><strong>Seaborn<\/strong><\/td><\/tr><tr><td>used for basic graphs such as line graphs, bar charts, etc.<\/td><td>used to visualise statistics and is capable of producing more sophisticated visualisations with fewer 
instructions.<\/td><\/tr><tr><td>mainly works with datasets and arrays<\/td><td>works on entire datasets<\/td><\/tr><tr><td>works efficiently with datasets and arrays, and treats axes and figures as objects<\/td><td>more organized and functional, treats the entire dataset as a single unit<\/td><\/tr><tr><td>For exploratory data analysis, Matplotlib is more flexible and works well with Pandas and NumPy.<\/td><td>has more built-in themes and is mainly used for statistical analysis<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\"><em>Table: Matplotlib vs Seaborn<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Line Chart in Matplotlib<\/h2>\n\n\n\n<p>A line chart is a graph that shows data as a series of points connected by straight lines. Each data point is drawn as a marker and linked to the next by a line or curve.<\/p>\n\n\n\n<p>The first step is to import the modules, as shown above. <\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-1.png\" alt=\"\" class=\"wp-image-437\" width=\"446\" height=\"394\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-1.png 861w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-1-300x265.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-1-768x678.png 768w\" sizes=\"(max-width: 446px) 100vw, 446px\" \/><\/figure>\n<\/div>\n\n\n<p><\/p>\n\n\n\n<p>To make the chart easier to read, we can also supply values for the x-axis, in this case years.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-2.png\" alt=\"\" class=\"wp-image-438\" width=\"444\" height=\"424\" 
srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-2.png 733w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-2-300x286.png 300w\" sizes=\"(max-width: 444px) 100vw, 444px\" \/><\/figure>\n<\/div>\n\n\n<p>Next, we&#8217;ll label the axes.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-3.png\" alt=\"\" class=\"wp-image-439\" width=\"426\" height=\"343\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-3.png 802w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-3-300x242.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-3-768x619.png 768w\" sizes=\"(max-width: 426px) 100vw, 426px\" \/><\/figure>\n<\/div>\n\n\n<p>We can also plot multiple datasets on a single set of axes.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-4.png\" alt=\"\" class=\"wp-image-440\" width=\"426\" height=\"395\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-4.png 816w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-4-300x278.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-4-768x712.png 768w\" sizes=\"(max-width: 426px) 100vw, 426px\" \/><\/figure>\n<\/div>\n\n\n<p>Using the marker parameter, we can display a marker at each data point on our graph. Matplotlib offers a wide variety of marker shapes, including circles, crosses, squares, and diamonds. We can also show a legend to make the graph clearer. 
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-5.png\" alt=\"\" class=\"wp-image-441\" width=\"436\" height=\"422\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-5.png 726w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-5-300x290.png 300w\" sizes=\"(max-width: 436px) 100vw, 436px\" \/><\/figure>\n<\/div>\n\n\n<p>We can also change the size of the figure by specifying its width and height.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Plotting Barplot with Seaborn<\/h2>\n\n\n\n<p>We&#8217;ll use the tips dataset to work with the Seaborn library.<\/p>\n\n\n\n<p>The dataset includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Sex (gender)<\/li>\n\n\n\n<li>Time of the day<\/li>\n\n\n\n<li>Total bill<\/li>\n\n\n\n<li>The tips given by customers<\/li>\n<\/ul>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-6.png\" alt=\"\" class=\"wp-image-443\" width=\"255\" height=\"234\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-6.png 595w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-6-300x276.png 300w\" sizes=\"(max-width: 255px) 100vw, 255px\" \/><\/figure>\n<\/div>\n\n\n<p>To see how the average bill amount changes over the course of the week, we can create a bar graph. One way is to calculate the day-wise averages ourselves and then draw them with plt.bar. 
The Seaborn library also provides a barplot function that computes the averages automatically.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-7.png\" alt=\"\" class=\"wp-image-444\" width=\"356\" height=\"299\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-7.png 693w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-7-300x252.png 300w\" sizes=\"(max-width: 356px) 100vw, 356px\" \/><\/figure>\n<\/div>\n\n\n<p>If we want to compare another variable side by side, we can use the hue parameter, which splits the bars according to that third variable.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-8.png\" alt=\"\" class=\"wp-image-445\" width=\"380\" height=\"315\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-8.png 688w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-8-300x249.png 300w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/figure>\n<\/div>\n\n\n<p>If you want to make a horizontal barplot, simply switch the x and y axes. Try it yourself. \ud83d\ude42<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Histograms with Seaborn<\/h2>\n\n\n\n<p>For this, we will use another dataset, the iris dataset, which includes measurements of flower sepal width. 
Moving on, histograms are another way to visualise data: they show how values are distributed over a range, using bars to represent the number of values that fall within each interval.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-9.png\" alt=\"\" class=\"wp-image-446\" width=\"235\" height=\"161\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-9.png 475w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-9-300x205.png 300w\" sizes=\"(max-width: 235px) 100vw, 235px\" \/><figcaption class=\"wp-element-caption\"><em>The Iris data of Sepal Width<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>The histogram for this data is:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-10.png\" alt=\"\" class=\"wp-image-447\" width=\"417\" height=\"332\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-10.png 721w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-10-300x239.png 300w\" sizes=\"(max-width: 417px) 100vw, 417px\" \/><figcaption class=\"wp-element-caption\"><em>A histogram using seaborn<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>Note that these histograms can also be plotted with the Matplotlib library using the function <code>plt.hist(data)<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Scatterplot using Seaborn<\/h2>\n\n\n\n<p>Next, we come to scatter plots. We use a scatter plot when we want to see how two variables relate to each other: each observation is drawn as a point positioned by its values on the two axes. The data is not grouped into ranges and is dispersed over the graph. Additional variables can be shown by giving the points distinct colours. 
Let&#8217;s draw a scatter plot using the &#8220;Iris&#8221; dataset.<\/p>\n\n\n\n<p>This time we use the complete dataset rather than just the sepal width.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-11.png\" alt=\"\" class=\"wp-image-448\" width=\"434\" height=\"347\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-11.png 723w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-11-300x239.png 300w\" sizes=\"(max-width: 434px) 100vw, 434px\" \/><figcaption class=\"wp-element-caption\"><em>A Scatterplot<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>We can enhance it further by using the hue parameter to colour the points by species. Let&#8217;s see what we get:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-12.png\" alt=\"\" class=\"wp-image-449\" width=\"467\" height=\"303\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-12.png 889w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-12-300x195.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-12-768x498.png 768w\" sizes=\"(max-width: 467px) 100vw, 467px\" \/><figcaption class=\"wp-element-caption\"><em>With species added as the hue, here is what you get<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Heatmaps in Seaborn<\/h2>\n\n\n\n<p>Moving further, we have another data visualization technique called heatmaps. Heatmaps are used to observe changes in behaviour or gradual changes in data. Different colours are used to represent different values. 
The way these colours vary in hue and intensity tells us how the values fluctuate. Let&#8217;s use the flights dataset in Seaborn to visualise the monthly passenger footfall at an airport over a 12-year period using a heatmap.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-13-1024x523.png\" alt=\"\" class=\"wp-image-450\" width=\"507\" height=\"259\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-13-1024x523.png 1024w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-13-300x153.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-13-768x392.png 768w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-13.png 1227w\" sizes=\"(max-width: 507px) 100vw, 507px\" \/><figcaption class=\"wp-element-caption\"><em>Dataset of Flights<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<p>Here you go, with a beautiful heatmap of the dataset:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/immadshahid.com\/wp-content\/uploads\/2023\/06\/image-14.png\" alt=\"\" class=\"wp-image-451\" width=\"415\" height=\"337\" srcset=\"https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-14.png 780w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-14-300x244.png 300w, https:\/\/immadshahid.com\/blog\/wp-content\/uploads\/2023\/06\/image-14-768x624.png 768w\" sizes=\"(max-width: 415px) 100vw, 415px\" \/><figcaption class=\"wp-element-caption\"><em>A Heatmap of the dataset of Flights<\/em><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Wind-up<\/h2>\n\n\n\n<p>In this article, we went through different methods of data visualization along with examples. 
Data Visualization is a vast field. We used a few examples just to give you an insight into how it works. To go further, you should explore more resources and read the documentation of Seaborn and Matplotlib, which will give you a broader understanding, and then try it yourself to get hands-on experience. The pandas library was discussed in the previous article, along with a YouTube video; you should go through <a href=\"https:\/\/immadshahid.com\/data-science-lecture-1-reading-a-csv-dataset-using-pandas\/\" data-type=\"URL\" data-id=\"https:\/\/immadshahid.com\/data-science-lecture-1-reading-a-csv-dataset-using-pandas\/\">Reading a csv Dataset using Pandas<\/a> and also check the <a href=\"https:\/\/immadshahid.com\/pandas-cheat-sheet\/\" data-type=\"URL\" data-id=\"https:\/\/immadshahid.com\/pandas-cheat-sheet\/\">Pandas Cheatsheet<\/a>.<\/p>\n\n\n\n<p>I will soon share a cheat sheet for Seaborn and Matplotlib, which will be very useful. In the upcoming lessons, we&#8217;ll dive further into this and step into Machine Learning, a big part of Data Science. <\/p>\n\n\n\n<p>I hope you liked the article and learned from it. Please give your feedback in the comments below, and if you have any questions, feel free to ask in the comments section or by e-mail: immadshahid@gmail.com <\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The study of how to visually portray data is known as data visualisation. It effectively communicates findings from data by visually displaying the data. We may obtain a visual overview of our data via data visualisation. The human mind processes and comprehends any given data more easily when it is presented with images, maps, and graphs. 
Both small and big [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":453,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[130,124,128,138,92,54,90,139,140,141,21,42,121,136,25,89,129,125,127,137,122,23,123],"tags":[133,134,131,135],"class_list":["post-436","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cheat-sheet","category-data-analytics","category-data-science","category-data-visualization","category-earning-and-learning","category-education","category-freelancing","category-graphs","category-heatmaps","category-histograms","category-internet","category-language","category-machine-learning","category-matplotlib","category-media","category-online-earning","category-pandas","category-predictive-analysis","category-python","category-seaborn","category-sklearn","category-social-media","category-training-a-dataset","tag-data-visualization","tag-matplotlib","tag-pandas","tag-seaborn"],"_links":{"self":[{"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/posts\/436","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/comments?post=436"}],"version-history":[{"count":5,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/posts\/436\/revisions"}],"predecessor-version":[{"id":457,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/posts\/436\/revisions\/457"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/media\/453"}],"wp:attachment":[{"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/media?parent=436"}],"wp:term":[{"taxonomy"
:"category","embeddable":true,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/categories?post=436"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/immadshahid.com\/blog\/wp-json\/wp\/v2\/tags?post=436"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}