Sunday, April 14, 2024

LIS 4317 Visual Analytics - Final Project

For this project, I want to answer the following research question:

How do the employment and unemployment rates of recent 4-year college graduates change over time in the state of Florida? Furthermore, how do their median earnings change over time?

Project Objectives:

  • Perform trend analysis on employment and unemployment rates of graduates from 4-year public universities in Florida.
  • Analyze changes in the median earning income of recent graduates from 4-year public universities in Florida over time.
  • Address data gaps resulting from inconsistent reporting by certain universities.
  • Focus the analysis specifically on public universities in Florida.

To answer this question, I will be using an extensive dataset called scorecard from Vincent Arel-Bundock's dataset archive. This dataset contains the variables needed for my research question for the years 2007 to 2016. It is not limited to schools in Florida: it contains data from schools across the United States and also includes trade schools and community colleges.

Problem Description:

Based on this information, the goal of this project is to perform trend analysis to understand how the employment and unemployment rates of graduates from 4-year public universities in Florida have changed over time. In addition, I would like to see how the median earning income of graduates has changed over time as well.

Note that some of the listed schools do not provide data for every year, so there may be gaps in the time series. Upon analyzing the data from the Florida schools, it seems that they reported these values every other year. Given how extensive the data is, it may not be feasible to show every school in Florida, including community colleges and trade schools. So, I will filter the data to only show public universities in the state of Florida.

Related Work:

This problem relates to time series analysis, as I want to figure out whether there are notable patterns over this stretch of time. In chapter 4 of Nathan Yau’s book, Visualize This, he recommends looking at the big picture, that is, the full recorded length of time, because it lets us spot irregularities, spikes, and dips in the data. However, the full period may also bring up outliers that do not add much to the overall visualization, so it is important to consider whether the full time period should be graphed or only part of it. For exploratory purposes, I intend to graph the full time period to determine if there are any spikes or dips I should be concerned with. Then, when I see a particular stretch of time that is noteworthy, I will graph that subset of the data.

Secondly, there is the concept of discrete points in time. As Yau points out, it is preferred that the recorded values are from specific points or blocks of time and there is a finite number of possible values. Thankfully, the observations in my selected dataset are finite from the count of college graduates working or not working after completing their degree to the median value of their earnings upon graduating college.

Reviewing Yau’s time series analysis visuals, I found this one to be the most interesting for its simplicity while also being quite informative to readers regarding when a new record was made:

To touch upon the ideas of time series analysis from Stephen Few's Now You See It, the concept of trend will play a big role in understanding earnings as well as how employment and unemployment rates increase and decrease over time.

Lastly, here is a line graph that I located from the R-Graph Gallery website which served as inspiration regarding line graph design:

Solution:

Before I began my visualization, I had to preprocess my data to identify the schools in Florida. Thankfully, trade schools, community colleges, and four-year institutions could all be easily identified through the variable pred_degree_awarded_ipeds, which uses the following numbering system to indicate the type of school:

1 - Trade Schools

2 - Community Colleges

3 - Four Year Institutions

Then, to prevent having too many schools in the visual, I focused solely on schools that were assigned 3 and filtered my data to include only public institutions, excluding private ones.
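The filtering steps above can be sketched roughly as follows. This is a minimal illustration on a made-up data frame, not my actual preprocessing script: apart from pred_degree_awarded_ipeds, the column names (inst_name, state_abbr) and the hand-made list of public universities are assumptions for the example, and the rows are toy values.

```r
# Toy stand-in for the scorecard data. Only pred_degree_awarded_ipeds is
# named in the post; inst_name and state_abbr are assumed column names.
scorecard_toy <- data.frame(
  inst_name = c("University of Florida", "University of Miami",
                "Miami Dade College",
                "University of South Florida-Main Campus",
                "Georgia State University"),
  state_abbr = c("FL", "FL", "FL", "FL", "GA"),
  pred_degree_awarded_ipeds = c(3, 3, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Hypothetical hand-made list of public universities. This sketch assumes
# there is no public/private column to filter on directly.
public_fl <- c("University of Florida",
               "University of South Florida-Main Campus")

# Keep Florida rows, four-year institutions (code 3), and public schools.
fl_public_4yr <- subset(
  scorecard_toy,
  state_abbr == "FL" &
    pred_degree_awarded_ipeds == 3 &
    inst_name %in% public_fl
)

print(fl_public_4yr$inst_name)
```

If the data did include a column marking public versus private control, that column could replace the name list in the last condition.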

After some filtering, I was able to prepare my data for visualization:

After some consideration, it became clear that the line graph would be the preferred method of visualization. Given that I am dealing with time series data which does involve some spikes and dips in the values, it was clear that it was the most suitable.

I decided to create three line graphs using the ggplot2 package but to give myself more options when it came to design, I also enabled the packages “hrbrthemes” and “viridis”.

When it came to plotting the lines, I knew that I had to go beyond the standard ggplot2 design defaults. Although I do enjoy the default color scheme, I needed to make sure that each of the lines was easily distinguishable from the others. After searching the internet for palettes, I found the "Paired" palette to be the most aesthetically pleasing of the color palettes I looked at. As for the design of the graph itself, I made use of theme_ipsum from the hrbrthemes package to give my graphs a more mature design.
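A line graph along these lines might look like the sketch below. It assumes the ggplot2 and hrbrthemes packages are installed; the data frame, its column names (year, inst_name, earnings_med), and the school labels are made up for illustration and are not values from the scorecard dataset.

```r
library(ggplot2)
library(hrbrthemes)

# Made-up values for illustration only; not taken from the scorecard data.
toy <- data.frame(
  year = rep(c(2007, 2009, 2011, 2012, 2014), times = 2),
  inst_name = rep(c("School A", "School B"), each = 5),
  earnings_med = c(40000, 39000, 38500, 36000, 41000,
                   37000, 36500, 36000, 34000, 38000)
)

# One line per institution, colored with the ColorBrewer "Paired" palette
# and styled with theme_ipsum() from hrbrthemes.
p <- ggplot(toy, aes(x = year, y = earnings_med,
                     color = inst_name, group = inst_name)) +
  geom_line() +
  geom_point() +
  scale_color_brewer(palette = "Paired") +
  labs(title = "Median Earning Income of College Graduates",
       x = "Year", y = "Median earnings", color = "Institution") +
  theme_ipsum()
```

Printing p draws the chart. The "Paired" palette provides up to 12 colors, so it only works as-is when there are 12 or fewer lines.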

Below are the graphs:

Median incomes over time

Unemployment after graduation over time

Employment after graduation over time

Key takeaways:

Note: It is important to understand that the employment and unemployment values as seen in visuals 2 and 3 are simply counts of how many individuals responded saying that they were employed or not. In this case, individuals were more likely to report that they were employed after graduation as opposed to being unemployed after graduation. Additionally, it seems that USF alumni might be overrepresented in the dataset.

Viewing the first line graph, Median Earning Income of College Graduates, we can immediately see that University of Florida has clearly turned out graduates who earn a high income upon graduating. Not far behind are Florida International University and Florida State University. What is notable about the graph is the sudden dip in incomes in 2012, followed by an immediate spike back up. From this graph, we can conclude that median incomes first followed a downward trend and then reversed into an upward trend. More data is needed to further understand this pattern.

Moving on to the second graph, Number of Unemployed Students After Graduation, we see a somewhat concerning upward trend with University of South Florida-Main Campus leading the way. Not good! That said, it is hard to say whether unemployment here is necessarily a bad thing. For instance, it could just mean that the graduates went on immediately to pursue an advanced degree such as a master's. More data is needed to further understand this situation.

As for the third graph, Number of Employed Students After Graduation, USF redeems itself as its number of students employed after graduation gradually increases, but I am surprised to see the University of Florida in fourth place behind University of Central Florida and Florida State University. As I mentioned for the previous graph, it could be the case that these numbers are down because these students have gone on to pursue advanced degrees elsewhere.

All in all, these graphs are informative, but I still have some concerns about using counts of employment and unemployment, as counts are not the best way to show variation across institutions. For example, the researchers could have gotten a really small number of responses from some of these institutions, in which case the data could be skewed and might not accurately reflect the actual trend.

To combat this issue, I have calculated the percentage of both employment and unemployment to determine whether these trends are really as significant as the counts make them out to be:
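A minimal sketch of this kind of conversion from counts to percentages is shown here. It assumes each percentage is the count divided by the sum of the working and not-working counts for that school and year; the column names (count_working, count_not_working) and the numbers are hypothetical.

```r
# Made-up counts: School A has ten times as many responses as School B.
toy <- data.frame(
  inst_name = c("School A", "School B"),
  count_working = c(900, 90),
  count_not_working = c(100, 10)
)

# Assumed denominator: everyone who reported working or not working.
total <- toy$count_working + toy$count_not_working

toy$pct_working <- 100 * toy$count_working / total
toy$pct_not_working <- 100 * toy$count_not_working / total

print(toy)
```

In this toy example the two schools have very different counts but the same unemployment percentage, which is why percentages are easier to compare across institutions of different sizes.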

Unemployment Percentage:

Employment Percentage:

Looking at both of these graphs, we can see clear spikes and dips throughout the years, but the trends themselves tell a slightly different story. First, in the Unemployment Percentage graph, there seems to be a downward trend, and the percentages themselves are very low: the highest recorded value is 15 percent of students at Florida Gulf Coast University reporting being unemployed after graduation in 2007. The Employment Percentage graph follows more of a flat trend. Yes, there are highs and lows, but there really is no trend in the employment percentage. This shows that although the count of students employed has increased, the overall employment rate has not really changed.

Although the universities appear to be going in the right direction, it is also critical to determine how a line of best fit impacts how one reads the graph:

Unemployment Percentage:

Employment Percentage:

The line of best fit deviates from the story told by the previous plots. The line of best fit for unemployment (not working) is flat or slightly increasing, and the line of best fit for employment (working) is flat or slightly decreasing, which suggests little to no change, or a slight deterioration, in employment and unemployment rates over the years for these institutions.
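To illustrate how a line of best fit can look nearly flat even when the yearly values bounce around, here is a small base R sketch using lm(). The numbers and column names are made up; in ggplot2, a comparable line can be drawn with geom_smooth(method = "lm").

```r
# Made-up unemployment percentages with visible spikes and dips.
toy <- data.frame(
  year = c(2007, 2009, 2011, 2012, 2014),
  pct_not_working = c(12, 9, 13, 10, 12)
)

# Ordinary least squares fit of percentage on year.
fit <- lm(pct_not_working ~ year, data = toy)

# The slope is the average change in percentage points per year.
slope <- unname(coef(fit)["year"])
print(slope)
```

A slope close to zero means the fitted line is essentially flat, so the year-to-year swings do not add up to a sustained increase or decrease.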

Furthermore, we must also look towards outside factors like:

Stagnation in Employment and Unemployment: The flat or slightly changing trend lines indicate that there hasn't been significant improvement or worsening in employment and unemployment rates over the years. This could imply a stagnant job market or consistent labor force dynamics within these institutions.

Economic Stability or Stagnation: It may suggest overall economic stability or stagnation in the regions where these institutions operate. Stable economic conditions might lead to steady employment rates, while stagnant conditions might result in little change in both employment and unemployment rates.

Structural Factors: There could be underlying structural factors within these institutions or industries that contribute to the observed trends. For example, if these institutions operate in sectors with slow growth or high job security, it could lead to relatively stable employment and unemployment rates over time.

Overall, interpreting the implications of these trends requires considering broader economic context, institutional factors, and potential limitations of the data. Further analysis or contextual information may help provide a clearer understanding of the observed patterns.

~ Katie
