{Han19} Junxiao Han, Shuiguang Deng, Xin Xia, Dongjing Wang and Jianwei Yin. Characterization and Prediction of Popular Projects on GitHub. Proc. of IEEE COMPSAC, 2019. A random forest classifier is built on project features to identify popular GitHub projects
{Coelho20} Jailton Coelho, Marco Tulio Valente, Luciano Milen, Luciana L. Silva. Is this GitHub project maintained? Measuring the level of maintenance activity of open-source projects. Information and Software Technology, 2020, 122:106274.
{Golzadeh21} Mehdi Golzadeh, Alexandre Decan, Damien Legay, Tom Mens. A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments. Journal of Systems and Software, 2021, 175:110911.
{Bao21} Lingfeng Bao, Xin Xia, David Lo and Gail C. Murphy. A Large Scale Study of Long-Time Contributor Prediction for GitHub Projects. IEEE Transactions on Software Engineering, 2021, 47(6):1277-1298. predicts which newcomers in OSS projects will become long-time contributors (LTCs) based on their activity data collected from GitHub
{Xiao22} W. Xiao, H. He, W. Xu, X. Tan, J. Dong and M. Zhou. Recommending Good First Issues in GitHub OSS Projects. Proc. of IEEE/ACM ICSE, 2022. propose RecGFI, an effective practical approach for the recommendation of good first issues to newcomers, which can be used to relieve maintainers' burden and help newcomers onboard. Data(https://github.com/mcxwx123/RecGFI)
{Abdellatif22} Ahmad Abdellatif, Mairieli Wessel, Igor Steinmacher, Marco A. Gerosa, and Emad Shihab. BotHunter: an approach to detect software bots in GitHub. Proc. of MSR, 2022. propose using a machine learning-based approach to identify the bot accounts regardless of their activity level. Data(https://zenodo.org/record/5884330)
{Feng22} Runhan Feng, Ziyang Yan, Shiyan Peng, and Yuanyuan Zhang. Automated detection of password leakage from public GitHub repositories. Proc. of ICSE, 2022. presents PassFinder, an automated approach to effectively detecting password leakage from public repositories that involve various programming languages on a large scale
{Wang23} Tianlei Wang, Shaowei Wang, Tse-Hsun (Peter) Chen. Study the correlation between the readme file of GitHub projects and their popularity. Journal of Systems and Software, 2023, 205:111806. Data(https://github.com/TianleiWang0414/WIP2022-GitHub_Readme-code/)
{Xiao23} Wenxin Xiao, Hao He, Weiwei Xu, Yuxia Zhang, Minghui Zhou. How Early Participation Determines Long-Term Sustained Activity in GitHub Projects? Proc. of ESEC/FSE, 2023. early participants have a positive effect on project’s future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment
Behavior
{Lima14} Antonio Lima, Luca Rossi, Mirco Musolesi. Coding Together at Scale: GitHub as a Collaborative Social Network. Proc. of AAAI ICWSM, 2014.
{Tsay14} Jason Tsay, Laura Dabbish, James Herbsleb. Influence of Social and Technical Factors for Evaluating Contribution in GitHub. Proc. of ICSE, 2014. project managers made use of information signaling both good technical contribution practices for a pull request and the strength of the social connection between the submitter and project manager when evaluating pull requests
{Tsay14} Jason Tsay, Laura Dabbish, James Herbsleb. Let's Talk About It: Evaluating Contributions through Discussion in GitHub. Proc. of ACM FSE, 2014. how developers in open work environments evaluate and discuss pull requests
{Gousios16} Georgios Gousios, Margaret-Anne Storey, Alberto Bacchelli. Work Practices and Challenges in Pull-Based Development: The Contributor’s Perspective. Proc. of ICSE, 2016.
{Steinmacher18} Igor Steinmacher, Gustavo Pinto, Igor Scaliante Wiese, Marco Aurélio Gerosa. Almost there: a study on quasi-contributors in open source software projects. Proc. of ICSE, 2018. quasi-contributors
{Claes18} Maëlick Claes, Mika V. Mäntylä, Miikka Kuutila, Bram Adams. Do Programmers Work at Night or During the Weekend? Proc. of ICSE, 2018. the first large-scale study of software engineers' working hours by investigating the time stamps of commit activities of 86 large open source software projects
{Hu18} Yan Hu, Shanshan Wang, Yizhi Ren, Kim-Kwang Raymond Choo. User influence analysis for Github developer social networks. Expert Systems with Applications, 2018, 108:108-118.
{Gong19} Qingyuan Gong, Jiayun Zhang, Yang Chen, Qi Li, Yu Xiao, Xin Wang, Pan Hui. Detecting Malicious Accounts in Online Developer Communities Using Deep Learning. Proc. of ACM CIKM, 2019.
{Prana19} Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu and David Lo. Categorizing the Content of GitHub README Files. Empirical Software Engineering, 2019, 24:1296–1327. we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories and we design and evaluate a classifier and a set of features that can categorize these sections automatically
{Imtiaz19} Nasif Imtiaz, Justin Middleton, Joymallya Chakraborty, Neill Robson, Gina Bai and Emerson Murphy-Hill. Investigating the Effects of Gender Bias on GitHub. Proc. of IEEE/ACM ICSE, 2019. shows that the effects of gender bias are largely invisible on the GitHub platform itself, although there are still signals of women concentrating their work in fewer places and being more restrained in communication than men
{Mezouar19} Mariam El Mezouar, Feng Zhang and Ying Zou. An empirical study on the teams structures in social coding using GitHub projects. Empirical Software Engineering, 2019, 24:3790–3823. The evolution of the team structures of projects over time reveals that only a low percentage of the projects witness a change towards team structures associated with faster pull request processing
{Zhang19} Yu Zhang, Frank F. Xu, Sha Li, Yu Meng, Xuan Wang, Qi Li, Jiawei Han. HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories. Proc. of IEEE ICDM, 2019. targets the automatic repository classification problem as keyword-driven hierarchical classification
{Meli19} Michael Meli, Matthew R. McNiece, Bradley Reaves. How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories. Proc. of NDSS, 2019.
{Tan20} Xin Tan, Minghui Zhou, Zeyu Sun. A First Look at Good First Issues on GitHub. Proc. of ESEC/FSE, 2020. Newcomer onboarding is an important but challenging question in open source projects and our work enables a better understanding of GFI mechanism and its problems, as well as highlights ways in improving them
{Shao20} Huajie Shao, Dachun Sun, Jiahao Wu, Zecheng Zhang, Aston Zhang, Shuochao Yao, Shengzhong Liu, Tianshi Wang, Chao Zhang, and Tarek Abdelzaher. Paper2repo: GitHub Repository Recommendation for Academic Papers. Proc. of WWW, 2020. describe a novel item-item cross-platform recommender system, paper2repo, that recommends relevant repositories on GitHub that match a given paper in an academic search system such as Microsoft Academic
{Brisson20} Scott Brisson, Ehsan Noei, Kelly Lyons. We Are Family: Analyzing Communication in GitHub Software Repositories and Their Forks. Proc. of SANER, 2020. We study communications within 385 software families comprised of 13,431 software repositories. We find that the fork depth, the number of users who have contributed to multiple repositories in the same family, the number of followers from outside the family, familial pull requests, and reported issues share a statistically significant relationship with repository stars
{Zhang21} Jiayun Zhang, Yang Chen, Qingyuan Gong, Aaron Yi Ding, Yu Xiao, Xin Wang, Pan Hui. Understanding the Working Time of Developers in IT Companies in China and the United States. IEEE Software, 2021, 38(2):96-106.
{Zhang24} Yuxia Zhang, Mian Qin, Klaas-Jan Stol, Minghui Zhou, and Hui Liu. How Are Paid and Volunteer Open Source Developers Different? A Study of the Rust Project. Proc. of IEEE/ACM ICSE, 2024. presents an empirical study of paid developers and volunteers in Rust, a popular open source programming language project
{Ziegler24} Albert Ziegler, Eirini Kalliamvakou, X. Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian. Measuring GitHub Copilot's Impact on Productivity. Communications of the ACM, 2024, 67(3):54–63. https://doi.org/10.1145/3633453
CI
{Vasilescu15} Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, Vladimir Filkov. Quality and Productivity Outcomes Relating to Continuous Integration in GitHub. Proc. of ESEC/FSE, 2015.
{Hilton16} Michael Hilton, Timothy Tunnell, Kai Huang, Darko Marinov, Danny Dig. Usage, costs, and benefits of continuous integration in open-source projects. Proc. of IEEE/ACM ASE, 2016.
{Zhao17} Yangyang Zhao, Alexander Serebrenik, Yuming Zhou, Vladimir Filkov, Bogdan Vasilescu. The impact of continuous integration on other software development practices: A large-scale empirical study. Proc. of IEEE/ACM ASE, 2017.
{Gallaba18} Keheliya Gallaba, Christian Macho, Martin Pinzger, and Shane McIntosh. Noise and heterogeneity in historical build data: an empirical study of Travis CI. Proc. of IEEE/ACM ASE, 2018.
{Widder19} David Gray Widder, Michael Hilton, Christian Kästner, and Bogdan Vasilescu. A conceptual replication of continuous integration pain points in the context of Travis CI. Proc. of ESEC/FSE, 2019.
{Cassee20} Nathan Cassee, Bogdan Vasilescu, Alexander Serebrenik. The Silent Helper: The Impact of Continuous Integration on Code Reviews. Proc. of SANER, 2020. with the introduction of CI, pull requests are being discussed less; in presence of CI developers perform the same amount of work by communicating less, giving rise to the idea of CI as a silent helper
{Zhang24} Yang Zhang, Yiwen Wu, Tingting Chen, Tao Wang, Hui Liu, and Huaimin Wang. How do Developers Talk about GitHub Actions? Evidence from Online Software Development Community. Proc. of IEEE/ACM ICSE, 2024. leads to the first comprehensive taxonomy of problems related to GHA, covering 4 categories and 16 sub-categories