Thursday, February 22, 2024

LIS 4317 Visual Analytics - Module 7 Assignment

For this week's assignment, we are tasked with creating visual analytics based on distribution analysis. I will be working with the mtcars dataset to understand the distribution of horsepower (hp).

I generated four visuals of the horsepower (hp) distribution:

Scatter plot:

Boxplot:

Line Graph:

Histogram:

Reflecting on Few's recommendations and best practices for conducting distribution analysis, each of my graphs has strengths and weaknesses. To begin, Few notes that there are three main characteristics to use when describing a distribution. These are...

Spread: A simple measure of dispersion, or how spread out the values are. In its simplest form, it is the full range of values from lowest to highest.

Center: An estimate of the middle of a set of values, most often represented by the mean or the median.

Shape: Where the values are located throughout the spread. 
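As a rough numeric companion to these three characteristics, here is a minimal Python sketch. The post does not say which tool produced the charts, so Python is used purely for illustration. The `HP` list is the hp column transcribed from R's built-in mtcars dataset, `describe` is a hypothetical helper name, and the skewness measure (Pearson's second coefficient) is one simple choice among several, not something the post or Few prescribes.

```python
import statistics

# hp column transcribed from R's built-in mtcars dataset (32 cars).
HP = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,
      205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113,
      264, 175, 335, 109]


def describe(values):
    """Summarize spread, center, and a rough indicator of shape."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return {
        # Spread: the full range from lowest to highest value.
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        # Center: both common estimates of the middle.
        "mean": mean,
        "median": median,
        # Shape: Pearson's second skewness coefficient (an assumed choice).
        # It is positive when the mean sits above the median (right skew).
        "skew": 3 * (mean - median) / sd,
    }


summary = describe(HP)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

A mean above the median, and therefore a positive skew value, is the numeric counterpart of a distribution that looks skewed to the right.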

For the most part, my visuals do a good job of showing spread. The possible exception is the boxplot: it simplifies the values shown at the y-axis tick marks, so the full spread is still visible but slightly downplayed. As for center, visuals 1, 3, and 4 show horsepower's mean and median and where each falls on the chart, while the second visual, the boxplot, shows only the median. Moving on to shape, visuals 1, 3, and 4 appear slightly skewed to the right. In the histogram, one can also see a brief gap near the 300 tick mark and an outlier where hp equals 325.
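The gap and the isolated bar described for the histogram can be checked by counting values per bin. The sketch below is only an approximation of the chart: the post does not state its bin settings, so `BIN_START = 50` and `BIN_WIDTH = 25` are assumptions, and `bin_counts` is a hypothetical helper. The `HP` list is again the hp column transcribed from R's mtcars dataset.

```python
# hp column transcribed from R's built-in mtcars dataset (32 cars).
HP = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,
      205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113,
      264, 175, 335, 109]

BIN_START = 50  # assumed left edge of the first bin
BIN_WIDTH = 25  # assumed bin width; the post does not state one


def bin_counts(values, start, width):
    """Count values in equal-width bins [start, start + width), and so on."""
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return counts


counts = bin_counts(HP, BIN_START, BIN_WIDTH)

# Text histogram: one row per bin, one '#' per car.
for i, count in enumerate(counts):
    lo = BIN_START + i * BIN_WIDTH
    print(f"{lo:3d}-{lo + BIN_WIDTH - 1:3d} | {'#' * count}")
```

Under these assumed bins, the two bins covering 275 to 324 are empty and a single car sits in the bin that starts at 325. The largest hp value in mtcars is 335, so that bar may be the one the post reads as hp = 325.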

As for whether these visuals follow Few's best practices for distribution analysis, I believe they do a fairly good job when it comes to interval consistency but fall short when it comes to outlier resistance. As one can tell from the visuals, there is a clear outlier where hp equals 325. The mean is sensitive to outliers: a single extreme value pulls it in the direction of that value, and we can clearly see that happening here. Therefore, it might be a good idea to remove that outlier from the dataset before conducting visual analysis.
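To put a number on how far the outlier pulls the mean, here is a minimal Python sketch. It flags outliers with the common 1.5 × IQR boxplot rule, which is an assumption here since the post does not say how its outlier was identified, and then compares mean and median with and without the flagged values. The `HP` list is the hp column transcribed from R's mtcars dataset, where the largest value is 335.

```python
import statistics

# hp column transcribed from R's built-in mtcars dataset (32 cars).
HP = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,
      205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113,
      264, 175, 335, 109]

# Quartiles with linear interpolation between the closest data points.
q1, _, q3 = statistics.quantiles(HP, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in HP if v < lower_fence or v > upper_fence]
trimmed = [v for v in HP if lower_fence <= v <= upper_fence]

print("outliers:", outliers)
print(f"mean   with / without: {statistics.mean(HP):.2f} / {statistics.mean(trimmed):.2f}")
print(f"median with / without: {statistics.median(HP):.2f} / {statistics.median(trimmed):.2f}")
```

In this sketch the mean drops once the flagged value is removed, while the median does not move, which is the difference in outlier resistance discussed above.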

All in all, Few's recommendations are incredibly helpful when it comes to deciphering data when it is visualized.

~ Katie
