Thursday, February 22, 2024

LIS 4317 Visual Analytics - Module 7 Assignment

For this week's assignment, we are tasked with creating visual analytics based on distribution analysis. I will be working with the mtcars dataset to understand the distribution of horsepower (hp).

I generated four visuals of the horsepower (hp) distribution:

Scatter plot:

Boxplot:

Line Graph:

Histogram:
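For readers who want to reproduce charts like these, here is a minimal base-R sketch of the four chart types. It is not the original code behind the visuals above: the titles, colors, bin count, and the helper name mark_center() are all assumptions made for illustration.

```r
# Sketch only: base-R versions of the four chart types, not the original code.
hp <- mtcars$hp
hp_mean <- mean(hp)
hp_median <- median(hp)

# Hypothetical helper that marks the mean and median as horizontal lines.
mark_center <- function() {
  abline(h = hp_mean, col = "red", lty = 2)
  abline(h = hp_median, col = "blue", lty = 3)
}

# Scatter plot: one point per car, in dataset order.
plot(hp, main = "Horsepower (scatter)", xlab = "Car index", ylab = "hp")
mark_center()

# Boxplot: shows the median and quartiles, and draws outliers as points.
boxplot(hp, main = "Horsepower (boxplot)", ylab = "hp")

# Line graph: the same values connected in dataset order.
plot(hp, type = "l", main = "Horsepower (line)", xlab = "Car index", ylab = "hp")
mark_center()

# Histogram: counts per bin; here the center markers are vertical lines.
# breaks = 10 is only a suggestion to hist(), which picks "pretty" bin edges.
h <- hist(hp, breaks = 10, main = "Horsepower (histogram)", xlab = "hp")
abline(v = hp_mean, col = "red", lty = 2)
abline(v = hp_median, col = "blue", lty = 3)
```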

Reflecting on Few's recommendations and best practices for conducting distribution analysis, each of my graphs has strengths and weaknesses. To begin, Few notes that there are three main characteristics for describing distributions. These are...

Spread: A simple measure of dispersion, or how spread out the values are. It is essentially the full range of values from highest to lowest.

Center: An estimate of the middle of a set of values, most often expressed as either the mean or the median.

Shape: Where the values are located throughout the spread. 
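The three characteristics above can also be checked numerically. The sketch below is one way to do that for hp in base R; the variable names are my own, and comparing the mean to the median is only a rough hint about shape, not a formal skewness test.

```r
hp <- mtcars$hp

# Spread: the full range from the lowest to the highest value.
hp_range <- range(hp)
hp_spread <- diff(hp_range)

# Center: the mean and the median.
hp_mean <- mean(hp)
hp_median <- median(hp)

# Shape: a rough check. A mean above the median hints at a right skew.
right_skew_hint <- hp_mean > hp_median

print(c(min = hp_range[1], max = hp_range[2], spread = hp_spread))
print(c(mean = hp_mean, median = hp_median))
print(right_skew_hint)
```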

For the most part, my visuals do a good job of showing spread. The possible exception is the boxplot, which simplifies the values shown on the y-axis tick marks; the full spread is still shown, albeit slightly downplayed. As for center, visuals 1, 3, and 4 provide horsepower's mean and median and show where each lies on the chart. The second visual, the boxplot, only provides the median. Moving on to shape, one can note that visuals 1, 3, and 4 appear slightly skewed to the right. In the histogram, one can also see a brief gap near the 300 tick mark and an outlier where hp equals 335.

As for whether these visuals correspond to Few's distribution analysis best practices, I believe my visuals do a fairly good job when it comes to interval consistency but fall short when it comes to outlier resistance. As one can tell from the visuals, there is a clear outlier where hp equals 335. The mean can be heavily affected by outliers and shifted in the direction of the outlier, and we can clearly see that happening here. Therefore, it might be a good idea to remove that outlier from the dataset before conducting visual analysis.
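To see how much the outlier matters, one can compare the mean and median with and without it. This sketch assumes the usual boxplot rule (values more than 1.5 times the interquartile range beyond the hinges) as the definition of an outlier, which is my choice here rather than something taken from Few.

```r
hp <- mtcars$hp

# boxplot.stats() flags values beyond 1.5 * IQR from the hinges.
outliers <- boxplot.stats(hp)$out
hp_trimmed <- hp[!hp %in% outliers]

# Compare how far each measure of center moves once the outlier is dropped.
mean_shift <- mean(hp) - mean(hp_trimmed)
median_shift <- median(hp) - median(hp_trimmed)

print(outliers)
print(c(mean_shift = mean_shift, median_shift = median_shift))
```

A positive mean_shift alongside a median_shift of zero would show the mean being pulled toward the outlier while the median stays put.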

All in all, Few's recommendations are incredibly helpful when it comes to deciphering data when it is visualized.

~ Katie

Wednesday, February 21, 2024

LIS 4370 R Programming - Module 7 Assignment

For this week's assignment, I will start out by examining the iris dataset and then transition to my own dataset when it comes to creating two examples of S3 and S4.

Question 1: Determine if a generic function can be applied to your dataset

To begin, I used the following functions on the iris dataset and came up with the following output:

Based on this output, I can confirm that a generic function can be applied to my chosen dataset.
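As a rough illustration of this kind of check (not necessarily the same functions I ran above), the sketch below calls one generic on iris and then asks R whether data.frame methods exist for a few common generics. The list of generics is an arbitrary choice.

```r
# iris is a data.frame, so generics dispatch to data.frame methods.
print(class(iris))

# summary() is a generic; this call dispatches to summary.data.frame().
iris_summary <- summary(iris)
print(iris_summary)

# Check that a data.frame method exists for a few common generics.
generics <- c("summary", "print", "plot", "head")
has_method <- sapply(generics, function(g) {
  !is.null(getS3method(g, "data.frame", optional = TRUE))
})
print(has_method)
```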

Question 2: How do you tell what OO system (S3 vs. S4) an object is associated with?

In the pryr library, one can use the function otype() to determine which OO system an object is associated with.

In this case, the iris dataset is associated with S3.
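If pryr is not installed, base R can give a similar answer. The sketch below is only an approximation of what otype() reports: it ignores reference classes, and the variable names are my own. The pryr call runs only when the package is available.

```r
# Base-R checks (an approximation of what pryr::otype() reports).
is_s4 <- isS4(iris)           # TRUE only for S4 objects
has_class <- is.object(iris)  # TRUE when a class attribute is set

oo_system <- if (is_s4) "S4" else if (has_class) "S3" else "base"
print(oo_system)

# If pryr is installed, otype() can be called directly.
if (requireNamespace("pryr", quietly = TRUE)) {
  print(pryr::otype(iris))
}
```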

Question 3: How do you determine what the base type of an object is?

Using the typeof() function can help in determining an object's base type. Continuing with the iris dataset, we can check the base type of each of the variables within the dataset:

Question 4: What is a generic function?
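A minimal sketch of that check, using sapply() to apply typeof() to every column (the variable name column_types is my own):

```r
# The data frame itself is stored as a list of columns.
print(typeof(iris))

# Base type of each variable in the dataset.
column_types <- sapply(iris, typeof)
print(column_types)

# Species is a factor: its class is "factor" but its base type is "integer".
print(class(iris$Species))
```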

A generic function is a function that performs a common task, such as printing (print()) or plotting (plot()), and dispatches to a class-specific method based on its argument. Generic functions can also be thought of as extended function objects, because they contain the information used to create the function and to dispatch its methods.
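To make the idea concrete, here is a small S3 sketch. The generic describe() and its two methods are hypothetical names invented for this example; the only real mechanism on display is UseMethod(), which picks a method based on the class of the first argument.

```r
# describe() is a hypothetical generic: it does nothing except dispatch.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("An object of base type", typeof(x))
}

# Method chosen when x inherits from "data.frame".
describe.data.frame <- function(x, ...) {
  paste("A data frame with", nrow(x), "rows and", ncol(x), "columns")
}

print(describe(iris))
print(describe(1:3))
```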

Question 5: What are the main differences between S3 and S4?

To put it simply, S3 is considered more convenient, while S4 is safer. S3 classes are very straightforward to implement, and S3 dispatches only on the first argument. However, S3 lets mistakes such as misspelled or missing values slip through without alerting the programmer. S4 classes and methods, on the other hand, are far more formal and more closely tied to object-oriented concepts. Unlike S3, S4 will complain about such misspellings and other issues, alerting the programmer that the code needs to be fixed.
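The difference is easy to see with a deliberate typo. In this sketch the class names student_s3 and StudentS4, the field names, and the values are all made up for illustration; it is not the code from my GitHub link.

```r
library(methods)

# S3: a class is just an attribute, so a misspelled field goes unnoticed.
s3_student <- structure(list(nmae = "Ana", age = 20), class = "student_s3")
print(is.null(s3_student$name))  # the typo means there is no "name" field

# S4: slots are declared up front and checked when an object is created.
setClass("StudentS4", slots = c(name = "character", age = "numeric"))

# The same typo is rejected by new(), so we catch the error to show it.
s4_result <- tryCatch(
  new("StudentS4", nmae = "Ana", age = 20),
  error = function(e) "rejected"
)
print(s4_result)

# With the slot name spelled correctly, creation succeeds.
s4_student <- new("StudentS4", name = "Ana", age = 20)
print(s4_student@name)
```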

Question 6: Create two examples of S3 and S4. (Code will be linked to GitHub)

S3 Code:

Output:

S4 Code:

Output:

After conducting this brief code experiment, I must admit that I greatly prefer the form of S4 over S3 just for its ease of creating instances of the class. 

Link to GitHub Code: Module 7 Code

~ Katie

LIS 4370 R Programming - sentimentTextAnalyzer2 Final Project

For this class's major final project, I set out to make the process of analyzing textual files and URL links for sentiment insights much...