In the following link, please see the main functions I have created for the sentimentTextAnalyzer package. Each of the functions is fully functional, but I still have some testing to do to make sure they can handle a variety of text, URL, and file types.
Link to RMD file on GitHub: RMD File
easyRead:
The first function I created is called easyRead. Its main purpose is to do any preprocessing before the file or link is cleaned by easyClean, which is called within easyRead. The input is the user's selected file or link, and the output is a ready-to-use matrix, the appropriate format for analysis.
easyClean:
easyClean takes the preprocessed text from easyRead and cleans it by making the words lowercase, removing punctuation, removing numbers, and removing common English stopwords. The input is the preprocessed text and the output is a matrix.
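To make the cleaning steps concrete, here is a minimal base R sketch of the same sequence (lowercase, strip punctuation, strip numbers, drop stopwords, return a matrix). It is not the package's actual easyClean code: the function name clean_text_sketch, the tiny stopwords_demo list, and the one-column matrix layout are all assumptions made for illustration.

```r
# Hypothetical stand-in for a real English stopword list (assumption).
stopwords_demo <- c("the", "a", "an", "and", "of", "is", "in", "to", "it")

# Illustrative helper, not the package's easyClean implementation.
clean_text_sketch <- function(text, stopwords = stopwords_demo) {
  text  <- tolower(text)                        # make everything lowercase
  text  <- gsub("[[:punct:]]", " ", text)       # replace punctuation with spaces
  text  <- gsub("[[:digit:]]", " ", text)       # replace digits with spaces
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]                 # drop empty strings left by splitting
  words <- words[!words %in% stopwords]         # remove stopwords
  matrix(words, ncol = 1, dimnames = list(NULL, "word"))
}

cleaned <- clean_text_sketch("The 2 dogs ARE happy, and it is a great day!")
print(cleaned)
```

Replacing punctuation with a space rather than deleting it keeps words such as "happy,and" from being glued together, at the cost of splitting contractions like "don't" into two tokens.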
easyFrequency:
easyFrequency takes the previously created matrix and outputs the frequency of the words found within the text. By reading in positive and negative lexicons, the function then determines the frequency of those types of words within the text. The inputs are the word_matrix and the positive and negative lexicons, and the output is a list of the frequency results.
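The counting step can be sketched in base R roughly as follows. This is only an illustration of the idea, not the package's easyFrequency code: the function name frequency_sketch, the toy lexicons, and the names of the list elements are assumptions.

```r
# Toy inputs (assumptions): a one-column word matrix and two small lexicons.
word_matrix      <- matrix(c("happy", "great", "sad", "happy",
                             "day", "awful", "happy"), ncol = 1)
positive_lexicon <- c("happy", "great", "good")
negative_lexicon <- c("sad", "awful", "bad")

# Illustrative helper, not the package's easyFrequency implementation.
frequency_sketch <- function(word_matrix, positive, negative) {
  words  <- as.vector(word_matrix)
  counts <- sort(table(words), decreasing = TRUE)   # frequency of every word
  list(
    word_counts    = counts,
    positive_count = sum(words %in% positive),      # tokens found in the positive lexicon
    negative_count = sum(words %in% negative),      # tokens found in the negative lexicon
    total_words    = length(words)
  )
}

result <- frequency_sketch(word_matrix, positive_lexicon, negative_lexicon)
print(result$positive_count)
print(result$negative_count)
```

Here the lexicon counts are token counts, so a positive word that appears three times contributes three to positive_count.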
easyWordCloud:
This function takes in a dataframe and returns a default wordcloud. At this time, users must create a dataframe from the easyFrequency results for this function to work properly.
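As a rough idea of the extra step users currently have to do, the sketch below turns a table of word counts into a two-column dataframe. The column names word and freq, and the assumption that the frequency results contain a table of counts, are mine rather than the package's documented interface.

```r
# Toy word counts (assumption): stands in for the counts inside the easyFrequency results.
word_counts <- table(c("happy", "great", "happy", "day", "happy", "great"))

# Convert the table to a dataframe with one row per word.
freq_df <- as.data.frame(word_counts, stringsAsFactors = FALSE)
names(freq_df) <- c("word", "freq")

# Sort so the most frequent words come first.
freq_df <- freq_df[order(-freq_df$freq), ]
rownames(freq_df) <- NULL
print(freq_df)

# If the wordcloud package is installed, a dataframe like this could be plotted with:
# wordcloud::wordcloud(words = freq_df$word, freq = freq_df$freq, min.freq = 1)
```

The commented-out call uses the wordcloud package directly and is only meant to show the shape of data a wordcloud needs; easyWordCloud may expect different column names.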
Quick Demo:
Insights, Challenges, Improvements:
For the most part, I am satisfied with easyRead and easyClean, but easyFrequency and easyWordCloud could use some polish. At the moment, users have to input their own lexicons, which I understand is not feasible for everyone, so I will have to figure out how to include a few more ready-to-use lexicons. Additionally, I think I will try to change the output of easyFrequency to a dataframe rather than a list, since users currently have to do a bit of coding to get the results ready for visualization. As for easyWordCloud, it works, but it could be better. I would like to include some style options for the user to choose from and provide more control over the number of words shown in the wordcloud.
~ Katie