collaboration networks and citation patterns - chenyang03/Reading GitHub Wiki
survey
{Fortunato18} Santo Fortunato, Carl T. Bergstrom, Katy Börner, James A. Evans, Dirk Helbing, Staša Milojević, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, and Albert-László Barabási. Science of science. Science 359, eaao0185 (2018). DOI: 10.1126/science.aao0185
{Liu23} Lu Liu, Benjamin F. Jones, Brian Uzzi, and Dashun Wang. Data, measurement and empirical methods in the science of science. Nature Human Behaviour 7, 1046–1058 (2023). https://doi.org/10.1038/s41562-023-01562-4
collaboration networks
{Moody04} James Moody. The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999. American Sociological Review, 2004, 69(2):213-238.
{Guimerà05} Roger Guimerà, Brian Uzzi, Jarrett Spiro, and Luís A. Nunes Amaral. Team assembly mechanisms determine collaboration network structure and team performance. Science, 2005, 308(5722):697-702. The authors develop a model for the assembly of teams of creative agents in which the selection of team members is controlled by three parameters: (i) the number, m, of team members; (ii) the probability, p, of selecting incumbents, that is, agents already belonging to the network; and (iii) the propensity, q, of incumbents to select past collaborators.
{Tang08} Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: extraction and mining of academic social networks. Proc. of ACM KDD, 2008.
{Uzzi13} Brian Uzzi, Satyam Mukherjee, Michael Stringer, Ben Jones. Atypical Combinations and Scientific Impact. Science, 2013, 342:468. Highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations. Papers of this type were twice as likely to be highly cited works.
{Fu14} Tom Z. Fu, Qianqian Song, and Dah Ming Chiu. The academic social network. Scientometrics, 2014, 101(1): 203-239.
{Staša14} Staša Milojević. Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences 111.11 (2014): 3984-3989.
{Guan16} JianCheng Guan, KaiRui Zuo, KaiHua Chen, and Richard C.M. Yam. Does country-level R&D efficiency benefit from the collaboration network structure? Research Policy, 2016, 45(4):770-784.
{Liang16} Ronghua Liang and Xiaorui Jiang. 2016. Scientific ranking over heterogeneous academic hypernetwork. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16). AAAI Press, 20–26.
{Kong19} Xiangjie Kong, Yajie Shi, Wei Wang, Kai Ma, Liangtian Wan, and Feng Xia. The Evolution of Turing Award Collaboration Network: Bibliometric-Level and Network-Level Metrics. IEEE Transactions on Computational Social Systems, 2019, 6(6):1318-1328. The paper finds that although the Turing Award Collaboration Network has small-world properties, it is not a scale-free network.
{Wu19} Lingfei Wu, Dashun Wang & James A. Evans. Large teams develop and small teams disrupt science and technology. Nature, 2019, 566:378–382. Work from larger teams builds on more-recent and popular developments, and attention to their work comes immediately. By contrast, contributions by smaller teams search more deeply into the past, are viewed as disruptive to science and technology and succeed further into the future—if at all.
{Lin23} Yiling Lin, Carl Benedikt Frey, and Lingfei Wu. Remote Collaboration Fuses Fewer Breakthrough Ideas. Nature, 2023, 623:987–991.
{Sheng23} Jingran Sheng, Bo Liang, Lin Wang, Xiaofan W. Evolution of scientific collaboration based on academic ages. Physica A, 2023, 624:128846. This paper sets up a scientific research dataset containing over 150 million articles in a wide range of disciplines and fields. After a preliminary cleansing, we select a subset of papers from this dataset with the publication year from 1955 to 2015, including more than 7.9 million authors and 22 million papers. Based on this dataset, we investigate and analyze the scientific collaborative patterns between authors and between countries and regions from the perspective of the authors’ academic age (AA).
{Liu23} Lu Liu, Benjamin F. Jones, Brian Uzzi & Dashun Wang. Data, measurement and empirical methods in the science of science. Nature Human Behaviour, 2023, 7:1046–1058. This Review considers this growing, multidisciplinary literature through the lens of data, measurement and empirical methods. We discuss the purposes, strengths and limitations of major empirical approaches, seeking to increase understanding of the field’s diverse methodologies and expand researchers’ toolkits.
{Wu23} Youyou Wu, Yang Yang, and Brian Uzzi. A discipline-wide investigation of the replicability of Psychology papers over the past two decades. PNAS, 2023, 120(6):e2208863120. Using a validated machine learning model that estimates a paper’s likelihood of replication, we found evidence that both supports and refutes speculations drawn from a relatively small sample of manual replications.
{Zhang23} Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, and Jiawei Han. 2023. Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23). Association for Computing Machinery, New York, NY, USA, 3458–3469. https://doi.org/10.1145/3580305.3599544
{Xue24} Zhikai Xue, Guoxiu He, Zhuoren Jiang, Sichen Gu, Yangyang Kang, Star Zhao, and Wei Lu. 2024. Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement. In Proceedings of CIKM '24. Association for Computing Machinery, New York, NY, USA, 2764–2774. https://doi.org/10.1145/3627673.3679546
{Gao24} Jian Gao & Dashun Wang. Quantifying the use and potential benefits of artificial intelligence in scientific research. Nature Human Behaviour, 2024, 8:2281–2292. We find that the use and benefits of AI appear widespread throughout the sciences, growing especially rapidly since 2015. However, there is a substantial gap between AI education and its application in research, highlighting a misalignment between AI expertise supply and demand.
{Peng24} Hao Peng, Huilian Sophie Qiu, Henrik Barslund Fosse, and Brian Uzzi. Promotional language and the adoption of innovative ideas in science. PNAS, 2024, 121(25):e2320066121. Using three longitudinal samples of funded and unfunded grant applications from three of the world’s largest funders—the NIH, the NSF, and the Novo Nordisk Foundation—we find that the percentage of promotional language in a grant proposal is associated with the grant’s probability of being funded, its estimated innovativeness, and its predicted levels of citation impact.
{Hill25} Ryan Hill, Yian Yin, Carolyn Stein, Xizhao Wang, Dashun Wang & Benjamin F. Jones. The pivot penalty in research. Nature, 2025, 642:999–1006. We find a pervasive ‘pivot penalty’, in which the impact of new research steeply declines the further a researcher moves from their previous work.
citation patterns
{Ke15} Qing Ke, Emilio Ferrara, Filippo Radicchi, and Alessandro Flammini. Defining and identifying Sleeping Beauties in science. PNAS, 2015, 112(24):7426-7431.
{Jiang21} Song Jiang, Bernard J. Koch, Yizhou Sun. HINTS: Citation Time Series Prediction for New Publications via Dynamic Heterogeneous Information Network Embedding. Proc. of WWW, 2021. We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication.
{Rungta22} Mukund Rungta, Janvijay Singh, Saif M. Mohammad, Diyi Yang. Geographic Citation Gaps in NLP Research. Proc. of EMNLP, 2022. We first created a dataset of 70,000 papers from the ACL Anthology, extracted their meta-information, and generated their citation network. We then show that not only are there substantial geographical disparities in paper acceptance and citation but also that these disparities persist even when controlling for a number of variables such as venue of publication and sub-field of NLP.
{Yang23} C. Yang and J. Han. Revisiting Citation Prediction with Cluster-Aware Text-Enhanced Heterogeneous Graph Neural Networks. Proc. of IEEE ICDE, 2023, pp. 682-695. https://doi.org/10.1109/ICDE55515.2023.00058
{Yan24} Pengwei Yan, Yangyang Kang, Zhuoren Jiang, Kaisong Song, Tianqianjin Lin, Changlong Sun, and Xiaozhong Liu. 2024. Modeling Scholarly Collaboration and Temporal Dynamics in Citation Networks for Impact Prediction. In Proceedings of SIGIR '24 (short paper). Association for Computing Machinery, New York, NY, USA, 2522–2526. https://doi.org/10.1145/3626772.3657926
{Nandi24} Rabindra Nath Nandi, Suman Kalyan Maity, Brian Uzzi, and Sourav Medya. An Experimental Analysis on Evaluating Patent Citations. Proc. of EMNLP, 2024. We create a semantic graph of patents based on their semantic similarities, enabling the use of Graph Neural Network (GNN)-based approaches for predicting citations.
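Note on {Guimerà05} (listed under collaboration networks): the three-parameter team assembly model summarized in that entry can be sketched in a few lines of Python. This is only a rough illustration, not the authors' implementation. The function names `assemble_team` and `simulate`, the integer agent ids, and the exact way q is applied (with probability q an incumbent is drawn from past collaborators of members already on the team, otherwise uniformly from all incumbents) are assumptions made for this sketch. Any mechanism beyond m, p, and q is left out.

```python
import random
from itertools import combinations


def assemble_team(incumbents, collaborators, m, p, q, rng, next_id):
    """Pick m members for one team. Returns (team, next_id).

    incumbents: list of agent ids that have been on a team before.
    collaborators: dict mapping agent id -> set of past collaborators.
    """
    team = []
    for _ in range(m):
        candidate = None
        available = [a for a in incumbents if a not in team]
        if available and rng.random() < p:
            # Incumbent branch. Assumption: with probability q, draw from
            # past collaborators of members already on the team.
            past = sorted({c for a in team
                           for c in collaborators.get(a, ())
                           if c not in team})
            if past and rng.random() < q:
                candidate = rng.choice(past)
            else:
                candidate = rng.choice(available)
        if candidate is None:
            # Newcomer branch: create a fresh agent id.
            candidate = next_id
            next_id += 1
        team.append(candidate)
    return team, next_id


def simulate(steps, m, p, q, seed=0):
    """Assemble `steps` teams and return (teams, collaborators)."""
    rng = random.Random(seed)
    incumbents, collaborators, teams = [], {}, []
    next_id = 0
    for _ in range(steps):
        team, next_id = assemble_team(
            incumbents, collaborators, m, p, q, rng, next_id)
        # Newcomers become incumbents once they have been on a team.
        for a in team:
            if a not in collaborators:
                collaborators[a] = set()
                incumbents.append(a)
        # Every pair of teammates becomes a collaboration link.
        for a, b in combinations(team, 2):
            collaborators[a].add(b)
            collaborators[b].add(a)
        teams.append(team)
    return teams, collaborators


if __name__ == "__main__":
    teams, collaborators = simulate(steps=200, m=4, p=0.6, q=0.7, seed=42)
    print("agents:", len(collaborators), "teams:", len(teams))
```

With p = 0 every member is a newcomer, so the teams never overlap. Raising p and q reuses incumbents and past collaborators, which links teams into a larger collaboration network.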
AI impact on science
{Hao26} Qianyue Hao, Fengli Xu, Yong Li, and James Evans. Artificial intelligence tools expand scientists’ impact but contract science’s focus. Nature 649, 1237–1243 (2026).
{Kusumegi25} Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby Stuart, and Yian Yin. Scientific production in the era of large language models. Science 390, 1240-1243 (2025). DOI: 10.1126/science.adw3000
{Noy23} Shakked Noy and Whitney Zhang. Experimental evidence on the productivity effects of generative artificial intelligence. Science 381, 187-192 (2023). DOI: 10.1126/science.adh2586
{Liang25} Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D. Manning, and James Zou. Quantifying large language model usage in scientific papers. Nature Human Behaviour 9, 2599–2609 (2025).
{Wu25} Kevin Wu, Eric Wu, Kevin Wei, Angela Zhang, Allison Casasola, Teresa Nguyen, Sith Riantawan, Patricia Shi, Daniel Ho, and James Zou. An automated framework for assessing how well LLMs cite relevant medical references. Nature Communications 16, 3615 (2025). https://doi.org/10.1038/s41467-025-58551-6
{Richardson25} R.A.K. Richardson, S.S. Hong, J.A. Byrne, T. Stoeger, & L.A.N. Amaral, The entities enabling scientific fraud at scale are large, resilient, and growing rapidly, Proc. Natl. Acad. Sci. U.S.A. 122 (32) e2420092122, https://doi.org/10.1073/pnas.2420092122 (2025).
{Kobak25} Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, and Jan Lause. Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Sci. Adv. 11, eadt3813 (2025). DOI: 10.1126/sciadv.adt3813
{He26} Y. He, & Y. Bu, Academic journals’ AI policies fail to curb the surge in AI-assisted academic writing, Proc. Natl. Acad. Sci. U.S.A. 123 (9) e2526734123, https://doi.org/10.1073/pnas.2526734123 (2026).
{Park26} Minsu Park, Suman Kalyan Maity, Stefan Wuchty, Dashun Wang, Interdisciplinary papers supported by disciplinary grants garner deep and broad scientific impact, PNAS Nexus, Volume 5, Issue 3, March 2026, pgag057, https://doi.org/10.1093/pnasnexus/pgag057
{Ibrahim26} Ibrahim, S.K.M., Mahmoud, Z.A.Z. Generative AI in academic writing: a comparison of human-authored and ChatGPT-generated research article titles. Humanities and Social Sciences Communications 13, 394 (2026). https://doi.org/10.1057/s41599-026-06956-z
{Bianchi26} Federico Bianchi, Owen Queen, Nitya Thakkar, Eric Sun and James Zou. Exploring the use of AI authors and reviewers at Agents4Science. Nature Biotechnology 44, 11–14 (2026). https://doi.org/10.1038/s41587-025-02963-8
{Zhang25} Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy, Saso Dzeroski, Jesper Tegner, and Hector Zenil. Exploring the role of large language models in the scientific method: from hypothesis to discovery. npj Artificial Intelligence 1, 14 (2025). https://doi.org/10.1038/s44387-025-00019-5
{Karpatne25} Anuj Karpatne, Aryan Deshwal, Xiaowei Jia, Wei Ding, Michael Steinbach, Aidong Zhang, and Vipin Kumar. AI-enabled scientific revolution in the age of generative AI: second NSF workshop report. npj Artificial Intelligence 1, 18 (2025). https://doi.org/10.1038/s44387-025-00018-6
{Li25} Baoxue Li and Chunhui Zhao. Self-reflection enhances large language models towards substantial academic response. npj Artificial Intelligence 1, 42 (2025). https://doi.org/10.1038/s44387-025-00045-3
{Tang26} Zhisheng Tang, Ke Shen, and Mayank Kejriwal. An evaluation of estimative uncertainty in large language models. npj Complexity 3, 8 (2026). https://doi.org/10.1038/s44260-026-00070-6