The main goal of this master's thesis is to extract and analyse publicly available data from social media (Twitter, Reddit), Stack Overflow, and Git, and to determine whether there is any relationship between the sentiment of open-source tool users and the frequency and size of the tools' releases.
The second part of the thesis deals with automatically pairing Git issues with Stack Overflow discussions based on text similarity. These tasks required the development of an algorithm based on natural language processing and machine learning. The thesis covers all important steps of developing such an algorithm: from choosing a training dataset, through preprocessing the data and evaluating and fine-tuning the classifiers, to presenting the results. The performance of various algorithms from several development packages is studied on several datasets. The final implementation is based on the Scikit-learn Python module and uses Term Frequency-Inverse Document Frequency (TF-IDF). In the last part of the thesis, the results are presented to users of the open-source projects and compared with their personal experience. Finally, two additional real-world use cases in which my classifier was applied are described.