Tuesday, March 5, 2024

LIS 4370 R Programming - Module 9 Assignment

This week in R Programming, we are asked to select a dataset from the Vincent Arel-Bundock dataset list and create visualizations from that data.

Link to the list: Vincent Arel Bundock Datasets

I decided to work with the pizzaplace.csv dataset, which contains sales, pizza type, and size data over an entire year.

The instructions mention three ways to make visualizations in R: base R, the lattice package, and the ggplot2 package. I will generate one visualization with each of these methods.

In base R, let's make a pie chart that shows how often each of the 4 types of pizza was sold:
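As a rough idea of what the base R code for a chart like this could look like, here is a minimal sketch. The small `pizza` data frame is a made-up stand-in for pizzaplace.csv, and the `type` column name is my assumption about how the file is laid out.

```r
# Hypothetical stand-in for pizzaplace.csv: one row per pizza sold.
# The column name `type` is an assumption about the real file.
pizza <- data.frame(
  type = c("classic", "veggie", "chicken", "supreme",
           "classic", "veggie", "classic", "chicken")
)

# Count how many pizzas of each type were sold
type_counts <- table(pizza$type)

# Draw the pie chart with one slice per pizza type
pie(type_counts,
    main = "Pizzas sold by type",
    col = c("gold", "tomato", "steelblue", "darkseagreen"))
```

With the real data, `pizza` would come from `read.csv()` on the downloaded file instead of being typed in by hand.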

For the second visual, let's use the lattice package to explore the relationship between price and pizza size:
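A lattice plot of price against size could be sketched roughly like this. The data frame below is invented for illustration (including one deliberately pricey small pizza), the `size` and `price` column names are assumptions, and a box plot is only one of several lattice plot types that would fit here.

```r
library(lattice)

# Hypothetical stand-in data; column names `size` and `price` are assumptions.
pizza <- data.frame(
  size  = factor(c("S", "S", "S", "M", "M", "L", "L", "L"),
                 levels = c("S", "M", "L")),
  price = c(9.75, 12.00, 23.65, 12.50, 16.00, 15.25, 20.50, 20.75)
)

# One box per size, so unusually priced pizzas show up as outliers
price_by_size <- bwplot(price ~ size, data = pizza,
                        xlab = "Pizza size", ylab = "Price",
                        main = "Price by pizza size")

# lattice plots are objects, so print() is what actually draws them
print(price_by_size)
```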

For the third visual, I used the ggplot2 package and split the data into facets to make it easier to compare sales over time across the four types of pizza:
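A faceted ggplot2 chart along these lines might look like the sketch below. The `daily` data frame is a hypothetical pre-aggregated table of pizzas sold per day, and the column names `date`, `type`, and `sold` are my own choices for the example rather than the ones in the real file.

```r
library(ggplot2)

# Hypothetical stand-in: pizzas sold per day for each type.
# Column names and values are made up for illustration.
daily <- data.frame(
  date = rep(as.Date("2015-01-01") + 0:2, times = 4),
  type = rep(c("chicken", "classic", "supreme", "veggie"), each = 3),
  sold = c(18, 22, 20, 31, 27, 33, 24, 21, 25, 19, 23, 20)
)

# One panel per pizza type, all sharing the same axes for easy comparison
sales_plot <- ggplot(daily, aes(x = date, y = sold)) +
  geom_line() +
  facet_wrap(~ type) +
  labs(title = "Pizzas sold over time by type",
       x = "Date", y = "Pizzas sold")

print(sales_plot)
```

Keeping the axes shared across panels is what makes it possible to read one type's peak against another's at a glance.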

After creating the visuals using the three different methods, I must admit that it is interesting to see how each method has its pros and cons. For example, I do like the base R method, but things can get complicated fast when every piece of the plot has to be set up with its own function call. To make the first visual better, I should include percentages for each of the four types of pizza sold. Moving on to the second visual, I do not have much experience with the lattice package, but I do think the visual did well at telling a story with the data. For instance, it still weirds me out that someone bought a super expensive small pizza that surpassed the price of a large pizza. Lastly, the ggplot2 visual really puts into perspective which pizza type leads in sales, with classic going above 30.
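As for the percentages on the pie chart, one way to add them in base R is to build the slice labels by hand. This is only a sketch, and the counts below are hypothetical rather than taken from pizzaplace.csv.

```r
# Hypothetical counts per pizza type (not the real totals)
type_counts <- c(chicken = 2, classic = 3, supreme = 1, veggie = 2)

# Convert counts to percentages of the total, rounded to one decimal place
pct <- round(100 * type_counts / sum(type_counts), 1)

# Build labels such as "classic (37.5%)"
slice_labels <- paste0(names(type_counts), " (", pct, "%)")

pie(type_counts, labels = slice_labels,
    main = "Share of pizzas sold by type")
```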

Check out the code here: Module 9 Code

~ Katie 

LIS 4317 Visual Analytics - Module 9 Assignment

This week in Visual Analytics, we are asked to create a multivariate visualization with a dataset of our choice.

In this case, I decided to work with a dataset called nyc_squirrels.csv, which contains very detailed observations from squirrel watching in New York City's Central Park. From what the squirrels were doing and what sounds they made to the exact geographic coordinates of each sighting, it is all noted down in the data.

Link to where I found the data: NYC Squirrels Data

From this data, I decided that I wanted to better understand the spatial distribution of squirrel sightings and see whether there is any difference between sightings that occurred in the AM and in the PM.

To begin my analysis, I made a point of cleaning the data by removing entries with NA values and dropping variables that were not relevant to the analysis.
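A cleaning step like this can be sketched in base R as follows. The tiny `squirrels` data frame and its column names (`long`, `lat`, `shift`, `notes`) are assumptions made up for the example, and dropping the unused columns before removing NA rows is one possible order, not necessarily the one used for the actual analysis.

```r
# Hypothetical stand-in for nyc_squirrels.csv; column names are assumptions.
squirrels <- data.frame(
  long  = c(-73.956, -73.969, -73.970, -73.968, -73.954),
  lat   = c(40.794, 40.783, 40.776, 40.769, NA),
  shift = c("AM", "PM", NA, "PM", "AM"),
  notes = c(NA, "eating", "running", NA, "climbing")
)

# Keep only the variables needed for the map
keep <- c("long", "lat", "shift")

# Drop the other columns first, then remove rows that still contain NA
clean <- na.omit(squirrels[, keep])
```

Dropping columns first means a row is not thrown away just because a variable that is not being plotted happens to be missing.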

With the data ready, I used the ggplot2 package in R to graph the points:
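The plotting code could look something like this minimal sketch. The `sightings` data frame is invented for illustration, the column names `long`, `lat`, and `shift` are assumptions about the cleaned data, and the exact color names are just one warm and cool pairing.

```r
library(ggplot2)

# Hypothetical cleaned data: one row per sighting (values are made up)
sightings <- data.frame(
  long  = c(-73.956, -73.969, -73.970, -73.968, -73.959, -73.975),
  lat   = c(40.794, 40.783, 40.776, 40.769, 40.790, 40.772),
  shift = c("AM", "PM", "AM", "PM", "PM", "AM")
)

# Plot each sighting by coordinates, colored by AM or PM shift:
# a warm color for AM and a cool color for PM
squirrel_map <- ggplot(sightings, aes(x = long, y = lat, color = shift)) +
  geom_point(alpha = 0.7) +
  scale_color_manual(values = c(AM = "orange", PM = "navy")) +
  labs(title = "Squirrel sightings in Central Park",
       x = "Longitude", y = "Latitude", color = "Shift")

print(squirrel_map)
```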

Here is the visual: 

As you can see, when the geographic coordinates are plotted, the points make a rough outline of Central Park. The big empty gap is the Jacqueline Kennedy Onassis Reservoir, so it makes sense that no squirrels were spotted there. For the most part, I do not see any particular difference in where squirrels were sighted in the AM versus the PM, but there do appear to be more sightings in the PM than in the AM.

For fun, let's see what this plot looks like in Tableau with a map underneath the points:

See the map up close here: NYC Squirrel Sightings

Wrapping up, visualizing multiple variables together can be very helpful for understanding the subtle relationships between them. It is definitely interesting to compare AM sightings to PM sightings along with where they occurred in Central Park, and doing so gives a better understanding of the dataset.

As for applying the 5 principles of design, alignment is used for the axis labels, legend, and title for better readability. For repetition, shape style, color, font size, and font type are kept consistent. To highlight the difference between day and night, I opted to use cool colors, most often associated with the night, for PM and warm colors for AM, which checks off the contrast requirement. Moving on to proximity, related visual elements like the legend entries are placed close together to show that they are connected. Lastly, with balance, I must admit that the Tableau visual is not as balanced as the previous ggplot visual. It has a very small legend, which gives it uneven visual weight. To fix this, I should think about adding more data elements to make the visual more balanced.

~ Katie

LIS 4370 R Programming - sentimentTextAnalyzer2 Final Project

For this class's major final project, I set out to make the process of analyzing textual files and URL links for sentiment insights much...