
Bibliographic Metadata

Title
Sentiment analysis and linking of social media with open-source software repositories
Author
Ďurček, Martin
Censor
Breu, Ruth ; Ďurček, Martin
Thesis advisor
Felderer, Michael
Published
2018
Institutional Note
Innsbruck, Univ., Masterarb., 2018
Date of Submission
October 2018
Language
English
Document type
Master Thesis
Keywords (DE)
Sprachverarbeitung (language processing)
Keywords (EN)
twitter / sentiment analysis / computational linguistics / natural language processing / text processing / GitHub / open-source / software repository mining / reddit / bug / term frequency - inverse document frequency / string similarity / scikit-learn
URN (Persistent Identifier)
urn:nbn:at:at-ubi:1-29766
Restriction Information
The work is publicly available
Files
Sentiment analysis and linking of social media with open-source software repositories [2.68 MB]
Abstract (English)

The main goal of this master thesis is to extract and analyse publicly available data from social media (Twitter, Reddit), Stack Overflow, and Git, and to determine whether there is any relationship between the sentiment of open-source tool users and the frequency and size of releases.

The second part of the thesis covers the automatic pairing of Git issues with Stack Overflow discussions based on text similarity. These tasks required the development of an algorithm based on natural language processing and machine learning. The thesis covers all important steps of developing such an algorithm: choosing a training dataset, preprocessing the data, evaluating and fine-tuning classifiers, and presenting the results. The performance of various algorithms from various development packages is studied on several datasets. The final implementation is based on the Scikit-learn Python module and uses Term Frequency - Inverse Document Frequency (Tf-Idf). In the last part of the thesis, the results are presented to users of the open-source projects and compared with their personal experience. Two additional real-world use cases in which my classifier was applied are also described.
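
The idea of pairing issues with discussions by Tf-Idf text similarity can be illustrated with a minimal sketch. It is not the implementation from the thesis, which is based on Scikit-learn. To stay dependency-free, this sketch uses only the Python standard library, with a smoothed idf and L2 normalization. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `best_matches`) and the sample texts are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalized sparse Tf-Idf vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed idf, assumed here: ln((1 + n) / (1 + df)) + 1.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {term: count * idf[term] for term, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def best_matches(issues, discussions):
    """For each issue, return (issue index, best discussion index, score)."""
    # Fit idf on both corpora so that the vectors share one vocabulary.
    vectors = tfidf_vectors(issues + discussions)
    issue_vecs = vectors[: len(issues)]
    disc_vecs = vectors[len(issues):]

    pairs = []
    for i, issue_vec in enumerate(issue_vecs):
        scores = [cosine(issue_vec, disc_vec) for disc_vec in disc_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((i, j, scores[j]))
    return pairs


# Invented sample texts, not data from the thesis.
issues = [
    "Crash when parsing empty JSON config file",
    "Login button unresponsive on mobile Safari",
]
discussions = [
    "Why does the login button not respond in Safari on iPhone?",
    "Application crashes while parsing an empty JSON configuration",
    "How to change the color theme",
]

for issue_idx, disc_idx, score in best_matches(issues, discussions):
    print(f"issue {issue_idx} -> discussion {disc_idx} (similarity {score:.2f})")
```

In practice, a similarity threshold would likely be needed so that issues without any related discussion are left unpaired; the appropriate value depends on the data.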
