Wednesday, January 31, 2024

LIS4317 Visual Analytics - Module 4 Assignment

For this assignment, I will generate visualizations based on monthly modal time series data from Data.gov. 

In the visualizations, I decided to use the following seven variables:

Primary USA City, Year, Vehicle Revenue Miles, Vehicle Revenue Hours, Ridership, Collisions with Motor Vehicle, and Collisions with Person

Thinking about how best to visualize this data, the first thing I wanted to see was the overall rate of collisions for each year recorded.

To do this, I went with a bubble chart. As you can see, 2019 had the lowest collision rate of the years recorded.
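For anyone who wants to try a similar yearly comparison in R, here is a minimal sketch. It is not how the chart above was built. The numbers are made up for illustration, the column names are hypothetical, and the rate definition (collisions per 100,000 vehicle revenue miles) is an assumption on my part.

```r
# Made-up yearly totals for illustration; these are NOT values from the Data.gov file.
transit <- data.frame(
  year = c(2017, 2018, 2019),
  vehicle_revenue_miles = c(4.0e6, 4.2e6, 4.1e6),
  collisions_motor_vehicle = c(120, 130, 90),
  collisions_person = c(30, 28, 20)
)

# Total collisions of both kinds for each year.
transit$collisions <- transit$collisions_motor_vehicle + transit$collisions_person

# Assumed rate definition: collisions per 100,000 vehicle revenue miles.
transit$rate_per_100k_miles <-
  transit$collisions / transit$vehicle_revenue_miles * 1e5

# Year with the lowest rate in this toy data.
lowest_year <- transit$year[which.min(transit$rate_per_100k_miles)]
print(transit[, c("year", "rate_per_100k_miles")])

# Simple bubble chart: one circle per year, sized by total collisions.
symbols(transit$year, transit$rate_per_100k_miles,
        circles = sqrt(transit$collisions), inches = 0.3,
        xlab = "Year", ylab = "Collisions per 100,000 revenue miles")
```

Dividing by vehicle revenue miles is only one possible denominator. Vehicle revenue hours or ridership would give a different rate and possibly a different ranking of years.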

Legend for reference:

While this is interesting, I wanted to further compare collisions across select US major cities and came up with the following:


Legend for reference:

The top line chart shows collisions with motor vehicles, while the bottom chart shows collisions with people. Immediately, we can see that New York City experiences the highest number of collisions, but it is interesting to see how quickly that number drops over the single year between 2018 and 2019. The other cities trend downward as well, but they look steady compared to New York City.
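A comparison like this could be sketched in base R as follows. This is an illustration rather than the method used for the charts above. The city labels and counts are placeholders, not figures from the dataset.

```r
# Placeholder cities and counts for illustration; NOT values from the dataset.
collisions <- data.frame(
  city = rep(c("City A", "City B"), each = 3),
  year = rep(2017:2019, times = 2),
  motor_vehicle = c(50, 60, 30, 20, 18, 17)
)

# Reshape to a matrix with one row per year and one column per city.
wide <- tapply(collisions$motor_vehicle,
               list(collisions$year, collisions$city), sum)

# Year-over-year change for each city (rows: 2018, 2019).
change <- diff(wide)
print(change)

# One line per city.
matplot(as.numeric(rownames(wide)), wide, type = "l", lty = 1, col = 1:2,
        xlab = "Year", ylab = "Collisions with motor vehicle")
legend("topright", legend = colnames(wide), col = 1:2, lty = 1)
```

Printing the year-over-year change next to the chart makes it easier to back up a statement like "dropped quickly between 2018 and 2019" with an actual number.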

See the visualizations up close here:

Bubble Chart

Line Chart

~ Katie


Tuesday, January 30, 2024

LIS4370 R Programming - Module 4 Assignment

For this assignment, I will be analyzing data from a local hospital that contains general patient information, such as blood pressure, as well as the decisions made by the general doctor, the external doctor, and the head of the emergency unit, who makes the final decision. Additionally, I will generate a boxplot and histograms representing patient blood pressure and the decisions made by the healthcare professionals.

To begin, I organized the data and created the following data frame, called patientInfo, which consists of 10 observations of 5 variables.

Immediately, we can see that the column named first contains an NA value. While I could remove that row, the dataset is so small that I instead kept it and converted the column to numeric.
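As a rough sketch of that step, the snippet below builds a small stand-in for patientInfo and converts the column to numeric while keeping the NA. The values are illustrative only and are not the assignment's actual data. Storing first as text before the conversion is also an assumption on my part.

```r
# Illustrative stand-in for patientInfo; NOT the assignment's actual values.
patientInfo <- data.frame(
  Freq = c(0.6, 0.3, 0.4, 0.2, 0.9),
  bloodp = c(103, 87, 42, 205, 135),
  first = c("1", "0", NA, "1", "1"),   # assumed to start out as text
  second = c(0, 0, 1, 1, 1),
  finaldecision = c(0, 1, 0, 1, 1)
)

# Convert the column to numeric; the NA is kept rather than dropped.
patientInfo$first <- as.numeric(patientInfo$first)

str(patientInfo)
sum(is.na(patientInfo$first))  # number of missing values still present
```

Keeping the NA means later summaries need na.rm = TRUE, for example mean(patientInfo$first, na.rm = TRUE).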

Moving on to plotting, let's take a look at the boxplot containing all the variables of patientInfo:

Just looking at the graph, the first thing that sticks out is the variable bloodp compared with the other variables. For the most part, the other variables are clustered around the 0 mark, while bloodp has its median line a little below 100. Furthermore, we can see the outliers of 42 and 205 clearly represented. It makes sense to see first, second, and finaldecision around 0 and 1, as their values consist of these two numbers. Additionally, frequency of visit is recorded as a decimal, so it makes sense to see it around 0 as well.
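A boxplot of this kind can be sketched with base R's boxplot(), which accepts a data frame and draws one box per column. The data frame below uses illustrative values only, not the assignment's data, so the boxes will not match the figure above.

```r
# Illustrative stand-in for patientInfo; NOT the assignment's actual values.
patientInfo <- data.frame(
  Freq = c(0.6, 0.3, 0.4, 0.2, 0.9),
  bloodp = c(103, 87, 42, 205, 135),
  first = c(1, 0, NA, 1, 1),
  second = c(0, 0, 1, 1, 1),
  finaldecision = c(0, 1, 0, 1, 1)
)

# One box per column; NA values are dropped within each column.
box_stats <- boxplot(patientInfo,
                     main = "patientInfo variables",
                     ylab = "Value")

# boxplot() also returns the numbers behind the plot (quartiles, outliers).
box_stats$names
box_stats$stats
```

Because bloodp is on a much larger scale than the 0/1 columns, plotting it on its own, for example with boxplot(patientInfo$bloodp), makes its spread easier to read.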

As for histograms, I decided to make use of ggplot to plot each of the variables:

Freq:

bloodp:

first:

second:

finaldecision:
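For reference, one way to produce a histogram per variable with ggplot2 is sketched below. It assumes the ggplot2 package is installed and uses illustrative values rather than the assignment's data. The bin count is an arbitrary choice.

```r
library(ggplot2)

# Illustrative stand-in for patientInfo; NOT the assignment's actual values.
patientInfo <- data.frame(
  Freq = c(0.6, 0.3, 0.4, 0.2, 0.9),
  bloodp = c(103, 87, 42, 205, 135),
  first = c(1, 0, NA, 1, 1),
  second = c(0, 0, 1, 1, 1),
  finaldecision = c(0, 1, 0, 1, 1)
)

# Build one histogram per column; .data[[v]] looks the column up by name.
plots <- lapply(names(patientInfo), function(v) {
  ggplot(patientInfo, aes(x = .data[[v]])) +
    geom_histogram(bins = 5, fill = "steelblue", color = "white") +
    labs(title = v, x = v, y = "Count")
})
names(plots) <- names(patientInfo)

# Draw a single plot, e.g. blood pressure.
print(plots$bloodp)
```

For the 0/1 decision columns, geom_bar() on a factor would arguably be a cleaner fit than a histogram, since those variables only take two values.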


Looking at the histograms, what first sticks out to me is the spread of blood pressure readings in the bloodp histogram. Most of the readings are around 100, but there are a few high readings and a few low readings. As for the decisions made by the general doctor, external doctor, and the head of the emergency unit, one can first see that the general doctor rated the health of the patients as bad (1) more often than good (0). The external doctor rated the patients as high (1) more often than low (0). Lastly, in the finaldecision made by the head of the emergency unit, patients were also rated as high (1) more often than low (0).

Naturally, more information is needed to understand what the medical professionals mean by high and low. Looking at the graphs, though, it appears that high may mean someone is in poor health rather than good health, and hopefully this means that the medical professionals are making the right decisions for their patients.

Here's a link to the full code via GitHub:

Module 4 Code

~ Katie

LIS 4370 R Programming - sentimentTextAnalyzer2 Final Project

For this class's major final project, I set out to make the process of analyzing textual files and URL links for sentiment insights much...