Thursday, March 28, 2024

LIS 4370 R Programming - Module 12 Assignment

In the following link, please see the main functions I have created for the sentimentTextAnalyzer package. Each function is fully functional, but I still have some testing to do to make sure they can handle a variety of text, URL, and file types.

Link to RMD file on GitHub: RMD File

easyRead:

The first function I created is called easyRead. Its main purpose is to do any preprocessing before the file or link is cleaned by easyClean, which is called within easyRead. The input is the user's selected file or link, and the output is a ready-to-use matrix, the appropriate format for analysis.

easyClean:

easyClean takes the preprocessed text from easyRead and cleans it by making the words lowercase, removing punctuation, removing numbers, and removing common English stopwords. The input is the preprocessed text and the output is a matrix.
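
The cleaning steps described above can be sketched in base R. This is only a rough illustration and not the package's actual easyClean code: the function name clean_text_sketch, the tiny stopword list, and the one-column matrix layout are all assumptions made for the example.

```r
# Hypothetical sketch of the cleaning steps; NOT the real easyClean.
# The short stopword list below is an assumption for illustration only.
clean_text_sketch <- function(text,
                              stopwords = c("the", "a", "an", "and", "are",
                                            "is", "of", "to", "in")) {
  text <- tolower(text)                      # make words lowercase
  text <- gsub("[[:punct:]]", " ", text)     # remove punctuation
  text <- gsub("[[:digit:]]", " ", text)     # remove numbers
  words <- unlist(strsplit(text, "\\s+"))    # split on whitespace
  words <- words[nzchar(words)]              # drop empty strings
  words <- words[!words %in% stopwords]      # remove stopwords
  # Assumed output shape: a one-column character matrix of words
  matrix(words, ncol = 1, dimnames = list(NULL, "word"))
}

cleaned <- clean_text_sketch("The 3 dogs are HAPPY, and the cat is sad!")
print(cleaned)
```

A fuller stopword list (for example, one from a text-mining package) would replace the short vector used here.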

easyFrequency:

easyFrequency takes the previously created matrix and outputs the frequency of words found within the text. By reading in positive and negative lexicons, the function then determines the frequency of positive and negative words found within the text. The inputs are the word_matrix and the positive and negative lexicons, and the output is a list of the frequency results.
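
To make the idea concrete, here is a minimal sketch of counting words and matching them against lexicons. It is not the actual easyFrequency implementation: the name frequency_sketch, the list element names, and the toy lexicons are assumptions, and the word matrix is assumed to hold one word per cell.

```r
# Hypothetical sketch of lexicon-based frequency counting; NOT the real easyFrequency.
frequency_sketch <- function(word_matrix, positive, negative) {
  words <- as.vector(word_matrix)                   # flatten the matrix to a word vector
  counts <- sort(table(words), decreasing = TRUE)   # overall word frequencies
  list(
    word_counts    = counts,
    positive_count = sum(words %in% positive),      # words found in the positive lexicon
    negative_count = sum(words %in% negative)       # words found in the negative lexicon
  )
}

# Toy inputs, assumed for illustration
word_matrix <- matrix(c("happy", "dogs", "sad", "happy", "cat"), ncol = 1)
result <- frequency_sketch(word_matrix,
                           positive = c("happy", "good"),
                           negative = c("sad", "bad"))
print(result)
```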

easyWordCloud:

This function takes in a dataframe and returns a default wordcloud. At this time, users must create a dataframe from the easyFrequency results for this function to work properly. 
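
As a rough idea of what that conversion step might look like, the sketch below turns a table of word counts into a dataframe. The column names word and freq are assumptions, not necessarily what easyWordCloud expects, and the plotting call uses the wordcloud package directly (only if it is installed) rather than easyWordCloud itself.

```r
# Hypothetical sketch: convert word counts into a dataframe for plotting.
# Column names "word" and "freq" are assumptions for illustration.
word_counts <- table(c("happy", "dogs", "sad", "happy", "cat"))

freq_df <- as.data.frame(word_counts, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "freq")
freq_df <- freq_df[order(-freq_df$freq), ]   # most frequent words first

# Draw a basic word cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = freq_df$word, freq = freq_df$freq,
                       min.freq = 1, max.words = 50)
}
```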

Quick Demo:



Insights, Challenges, Improvements:

For the most part, I am satisfied with easyRead and easyClean, but easyFrequency and easyWordCloud could use some polish. At the moment, users have to input their own lexicons, which I understand is not feasible for everyone. Thus, I will have to figure out how to include a few ready-to-use lexicons. Additionally, I think I will try to change the output to a dataframe rather than a list, since users currently have to do a bit of coding to get the results ready for visualization. As for easyWordCloud, it works but it could be better. I would like to include some style options for the user to choose from and provide more control over the number of words shown on the wordcloud.

~ Katie

Tuesday, March 26, 2024

LIS 4317 Visual Analytics - Module 11 Assignment

After reviewing the many visualizations Dr. Piwek made on his website,

Tufte and Minard Post

I decided to replicate the following visuals:

Density Plot Code:

Visual:

Box Plot Code:

Visual:

Reflection:

Going through Dr. Piwek's post on graphing visuals inspired by Tufte and Minard was quite interesting. There were many complex visualizations included, and I found the ones with added interactivity through the highcharter package to be particularly fascinating. Going forward, I will refer back to the post when I need a refresher on style.

~ Katie

LIS 4370 R Programming - sentimentTextAnalyzer2 Final Project

For this class's major final project, I set out to make the process of analyzing textual files and URL links for sentiment insights much...