Literature Review - CankayaUniversity/ceng-407-408-2023-2024-Insightio GitHub Wiki

1. Literature Review

This file contains the text of the literature review paper required by the CENG 407 project development and documentation guidelines. Each section of the paper was committed by its respective author.

References for citations follow the guidelines set forth in APA 7.

Authors:

Eda Nur Altunok, Ekin Nalçacı, Günalp Güngör, Utku Özbek, Zeynep Polat

Summary

This research examines the growing importance of people counting systems, which are used to monitor human presence effectively in areas ranging from the retail sector to emergency evacuation scenarios. The paper focuses on the difficulties of accurately identifying and quantifying individuals in crowded areas, taking into account factors such as lighting, weather conditions, and image quality. It also shows the development of more sophisticated computer vision-based strategies that have replaced more traditional density regression techniques. It emphasizes the importance of standardized evaluation methods for ensuring consistent assessments across different study objectives. By highlighting the vital role crowd counting plays in contemporary operational and safety measures, this analysis presents a comprehensive account of the achievements and challenges in the field.

Abstract

This study investigates the growing importance of people counting systems for efficiently managing human presence in a variety of contexts, from retail to emergency evacuations. The paper discusses the challenges of precisely identifying and counting individuals in crowded locations, taking into account factors such as lighting, weather, and image quality. It also traces the development of computer vision-based strategies, showing how more sophisticated approaches replaced conventional density regression techniques. To enable uniform assessments across different study objectives, the analysis underlines the need for standardized evaluation techniques. In summary, by offering a comprehensive overview of the field's accomplishments and obstacles, this analysis highlights the critical role that crowd counting plays in contemporary operational and safety measures.

1. Introduction

In today's global landscape, managing human presence has become increasingly complex as populations continue to grow. Businesses, public spaces, and communities that need to navigate and analyze crowded environments efficiently require robust tools for collecting and processing data. The global market for people counting systems was valued at 969.9 million USD in 2021 and is projected to grow at an annual rate of 12.2% from 2022 to 2030 (People Counting System Market Size, Share & Trends Analysis Report by End-use (BFSI, Corporate), by Offering (Hardware, Software), by Mounting Platform (Ceiling, Wall), by Type, by Technology, and Segment Forecasts, 2022 - 2030, n.d.).

People counting methods are essential in many industries, and retail is one of the clearest examples. The transition from manual counting to automated, computer-based solutions has been crucial for retailers seeking to modernize operations and improve customer experiences. According to Chakraborty (2023), these systems provide real-time analysis, identify high-traffic areas, help optimize staffing, reveal consumer behavior, and support forecasting.

Real-time tracking is also valuable in emergency evacuations, where human-centered sensing helps monitor crowd behavior as it unfolds and thereby strengthens incident management strategies (Crowd Models for Emergency Evacuation: A Review Targeting Human-Centered Sensing, 2013).

These figures and use cases underscore the broad impact of people counting technology across industries. Businesses and organizations are increasingly adopting it to improve operational efficiency, strengthen security, and support data-informed decision-making.

2. Main Findings

The findings of the literature review are divided into two parts. The first part examines the challenges of accurately distinguishing individuals in crowded spaces, including variations in lighting, weather, and image quality, as well as the complications caused by overlapping individuals and occluded areas.

The second part explores the development of computer vision-based approaches, emphasizing notable advances in computational methods and their usefulness for crowd counting. It discusses methods such as density regression and dense detection and how they improve the accuracy and efficiency of counting. Together, these studies provide a thorough understanding of the basic concepts and recent developments in crowd counting.

2.1. Challenges and Difficulties in Human Detection and Counting

Accurately identifying and counting people in congested spaces is a complex task with many interacting factors. External factors such as variations in lighting, weather, and image quality introduce unpredictability and can significantly reduce human detection accuracy (Raghavachari et al., 2015). Overlapping people and occluded areas make accurate counting even harder, calling for techniques that can separate individuals who are only partially visible (Sam et al., 2020).

Public spaces with fluctuating crowd densities add another layer of complexity: maintaining consistent accuracy is difficult as these spaces move between peak and off-peak conditions (Raghavachari et al., 2015). The choice of camera orientation is also critical to detection performance, since different viewing angles present people differently and each angle brings its own challenges (Sam et al., 2020).

Unfavorable conditions such as poor lighting and camera glare can blur images, making objects of interest harder to perceive (Bhangale et al., 2020). These environmental influences make the already difficult task of precise counting even more problematic. In response, researchers apply sophisticated computer vision techniques and machine learning algorithms; for example, deep learning models have the potential to improve human detection accuracy across a variety of conditions (Bhangale et al., 2020). Complementary techniques have also been proposed to handle specific issues such as occlusion and varying crowd sizes, including trajectory clustering, feature-based regression, and individual pedestrian detection (Huang & Chung, 2004). Together, these efforts illustrate the ongoing work to overcome the difficulties of identifying and counting people in dynamic environments.
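Feature-based regression, one of the complementary techniques listed above, maps image features directly to a count instead of locating each person. The sketch below is a minimal illustration, assuming a single hypothetical feature per frame (the number of foreground pixels, for example from background subtraction) and an ordinary least-squares fit. The function names and data are illustrative only and are not taken from Huang and Chung (2004).

```python
import numpy as np

def fit_count_regressor(features, counts):
    """Fit counts ~ features @ weights + bias by ordinary least squares."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # one feature per frame -> column vector
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X1, np.asarray(counts, dtype=float), rcond=None)
    return coef[:-1], coef[-1]  # (weights, bias)

def predict_count(weights, bias, features):
    """Apply the fitted linear model to new feature values."""
    X = np.asarray(features, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return X @ weights + bias

# Hypothetical training data: foreground pixel area per frame and manual head counts.
areas = [1000, 2000, 3000, 4000]
counts = [3, 5, 7, 9]
w, b = fit_count_regressor(areas, counts)
print(predict_count(w, b, [5000]))
```

Because occlusion tends to make foreground area grow more slowly than the number of people as a crowd gets denser, practical regressors use richer features or non-linear models; the linear fit only shows the mechanics.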

2.2. Computer Vision-Based Approaches to Crowd Counting

Computer vision algorithms for crowd counting have evolved notably. Early approaches relied on density regression, estimating crowd density maps and summing them to infer counts (Raghavachari et al., 2015). However, this approach has limitations: because it predicts density rather than detecting each person, it cannot localize individuals precisely, particularly in dense crowds (Sam et al., 2020). To address this, recent work has shifted toward dense detection methods, which offer more practical and accurate solutions (Sam et al., 2020). Deep learning models, including Convolutional Neural Networks (CNNs) such as VGG and ResNet, have played a crucial role in this transition.
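The link between a density map and a count can be made concrete with a small sketch. Assuming point annotations (one (x, y) head position per person) and a fixed-width Gaussian kernel, each person contributes a blob whose values sum to exactly 1, so summing the whole map recovers the count. The fixed `sigma` and the per-point normalization are simplifying assumptions; many published methods use adaptive kernel widths instead.

```python
import numpy as np

def density_map_from_points(points, shape, sigma=4.0):
    """Place a normalized Gaussian at each annotated head position (x, y)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) coordinate grids
    density = np.zeros((h, w), dtype=float)
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # normalize so each person adds exactly 1
    return density

def count_from_density(density):
    """The estimated count is the integral (sum) of the density map."""
    return float(density.sum())

heads = [(10, 12), (30, 40), (31, 42)]  # hypothetical annotations
dm = density_map_from_points(heads, (64, 64))
print(round(count_from_density(dm), 6))  # 3.0
```

A density regression model is trained to predict maps like `dm` from images; the map says how many people are present but, as noted above, not exactly where each one is.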

According to Bhangale et al. (2020), these models excel in feature extraction and pattern recognition, rendering them well-suited for crowd counting tasks. Moreover, they have been adeptly adapted to handle the intricacies of crowd analysis, effectively capturing complex spatial arrangements and density variations (Sam et al., 2020; Sjöberg & Hyberg, 2023; Hussain, 2023). This adaptability showcases the versatility of CNNs in addressing the challenges posed by dynamic crowd environments.

In addition to the CNN architectures above, YOLO (You Only Look Once) models have emerged as powerful tools for object detection, including the detection of people, and have been effective in crowd counting tasks (Hussain, 2023). These models use a single neural network to predict bounding boxes and class probabilities for multiple objects simultaneously, in a single pass over the image. This makes them well suited to real-time processing and to applications such as object detection and tracking in video streams. YOLO models, including the YOLO-v8 release, have demonstrated high classification performance and fast detection, making them increasingly relevant in industrial settings (Hussain, 2023).
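Turning a single-pass detector's output into a head count usually involves filtering by class and confidence and then applying non-maximum suppression (NMS) to merge duplicate boxes. The sketch below assumes detections are already available as hypothetical `(box, score, class_id)` tuples with boxes in `(x1, y1, x2, y2)` form, and that class 0 means "person" (as in COCO-trained models). The thresholds are illustrative defaults, not values from Hussain (2023).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def count_people(detections, person_class=0, conf_thr=0.25, iou_thr=0.45):
    """Count people after confidence filtering and greedy NMS."""
    candidates = sorted(
        (d for d in detections if d[2] == person_class and d[1] >= conf_thr),
        key=lambda d: d[1],
        reverse=True,  # highest-confidence boxes are kept first
    )
    kept = []
    for box, _score, _cls in candidates:
        # Keep a box only if it does not heavily overlap an already kept box.
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return len(kept), kept

detections = [
    ((0, 0, 10, 20), 0.90, 0),     # person
    ((1, 0, 11, 20), 0.80, 0),     # duplicate box on the same person
    ((50, 0, 60, 20), 0.70, 0),    # second person
    ((100, 0, 110, 20), 0.10, 0),  # below the confidence threshold
    ((200, 0, 240, 30), 0.95, 2),  # a different class, ignored
]
n, boxes = count_people(detections)
print(n)  # 2
```

The IoU threshold matters in crowds: set too low, NMS also suppresses genuinely overlapping people, which is one reason occlusion lowers counting accuracy.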

Techniques such as multi-scale analysis have also proved indispensable, since crowd images often contain individuals of widely varying sizes (Huang & Chung, 2004; Hussain, 2023). Methods such as image pyramid decomposition and feature pyramids allow an image to be analyzed at several resolutions, so that individuals are detected and counted regardless of their scale, which strengthens the overall robustness of the counting system (Huang & Chung, 2004; Hussain, 2023). Multi-scale analysis is therefore central to accurate crowd estimates, especially in scenes with diverse crowd compositions.
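Image pyramid decomposition can be sketched in a few lines. This minimal version assumes a single-channel (grayscale) image and halves the resolution by averaging 2×2 blocks, a simple stand-in for the Gaussian smoothing used in classic pyramids. A detector run on every level sees small, distant people enlarged relative to its fixed window at the finer levels and large, nearby people shrunk at the coarser ones.

```python
import numpy as np

def downsample2x(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (grayscale only)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2  # drop an odd last row/column
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def image_pyramid(img, min_size=32):
    """Return [full, 1/2, 1/4, ...] levels until the next one would be too small."""
    levels = [np.asarray(img, dtype=float)]
    while min(levels[-1].shape) // 2 >= min_size:
        levels.append(downsample2x(levels[-1]))
    return levels

print([lvl.shape for lvl in image_pyramid(np.zeros((256, 192)))])
# [(256, 192), (128, 96), (64, 48)]
```

Feature pyramids apply the same idea to a network's internal feature maps rather than to the input image.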

Integrating detection and tracking has also proven effective in dynamic environments. As described by Huang and Chung (2004) and Sjöberg and Hyberg (2023), detection algorithms identify individuals, while tracking algorithms maintain each person's identity across frames. This combination significantly improves counting reliability, particularly under occlusion and rapid movement (Sjöberg & Hyberg, 2023; Hussain, 2023), because a person who is briefly hidden can be re-associated with an existing track instead of being counted again.
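A simple way to see how tracking keeps identities continuous is a greedy IoU tracker: each new detection is linked to the previous-frame track it overlaps most, and unmatched detections start new tracks. The total number of distinct track IDs then serves as a count of people seen. This is a hypothetical, minimal sketch (class name and thresholds are illustrative), far simpler than the trackers used in the cited works.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class IoUTracker:
    """Greedy IoU tracker; tracks survive up to `max_missed` frames without a match."""

    def __init__(self, iou_thr=0.3, max_missed=5):
        self.iou_thr = iou_thr
        self.max_missed = max_missed
        self.tracks = {}  # track id -> [last box, consecutive missed frames]
        self.next_id = 0

    def update(self, boxes):
        assigned = {}
        unmatched = set(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_thr
            for tid in unmatched:
                score = iou(box, self.tracks[tid][0])
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no sufficiently overlapping track: a new person
                best_id = self.next_id
                self.next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = [box, 0]
            assigned[best_id] = box
        for tid in unmatched:  # tracks not seen this frame (e.g. occluded)
            self.tracks[tid][1] += 1
            if self.tracks[tid][1] > self.max_missed:
                del self.tracks[tid]
        return assigned

    @property
    def total_count(self):
        """Number of distinct people (track IDs) seen so far."""
        return self.next_id

frames = [
    [(0, 0, 10, 20), (50, 0, 60, 20)],
    [(1, 0, 11, 20), (51, 0, 61, 20)],
    [(2, 0, 12, 20), (52, 0, 62, 20), (100, 0, 110, 20)],
]
tracker = IoUTracker()
for boxes in frames:
    tracker.update(boxes)
print(tracker.total_count)  # 3
```

Keeping unmatched tracks alive for a few frames is what lets a briefly occluded person resume the same ID instead of being counted twice.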

Other methods have also contributed significantly to the evolution of crowd counting. For instance, CSRNet, a CNN designed for congested scene recognition, has become a prominent tool in crowd counting applications (Bhangale et al., 2020). It uses a single-column architecture: a VGG-16 front end for feature extraction followed by a back end of dilated convolutional layers, which enlarges the receptive field without further reducing spatial resolution. Trained on density maps derived from point annotations, CSRNet produces accurate crowd density maps even in highly congested scenes, which underlines its robustness for crowd counting.
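Dilated convolutions are the ingredient CSRNet's back end uses to see more context without pooling. The sketch below shows the arithmetic under the assumption of stride-1 layers: a k×k kernel with dilation d spaces its taps d pixels apart, so each layer widens the receptive field by (k − 1)·d. The layer counts in the example are illustrative, not a claim about CSRNet's exact configuration.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 conv layers given (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer widens the field by (k - 1) * d pixels
    return rf

def dilate_kernel(kernel, d):
    """Spread a 1D kernel's taps d positions apart by inserting zeros between them."""
    out = [0.0] * ((len(kernel) - 1) * d + 1)
    for i, v in enumerate(kernel):
        out[i * d] = v
    return out

print(receptive_field([(3, 1)] * 6), receptive_field([(3, 2)] * 6))  # 13 25
print(dilate_kernel([1, 2, 3], 2))  # [1, 0.0, 2, 0.0, 3]
```

The dilated stack nearly doubles the context seen by each output pixel with the same number of weights and no loss of output resolution, which matters when the density map must resolve tightly packed heads.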

Beyond CSRNet, advanced image descriptors such as Perspective Invariant Histograms of Oriented Gradients (HOGp) have enabled accurate crowd counting without the need for camera calibration (Reis, 2014). This technique also addresses privacy concerns and is particularly suited to systems with limited resources, which makes it valuable for urban planning and crowd control applications. Descriptor-based counting of this kind offers an alternative route to accurate crowd counts.
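The core of any HOG-style descriptor is a magnitude-weighted histogram of gradient orientations computed per image cell. The sketch below shows only that building block for standard HOG, with unsigned orientations (0 to 180 degrees) and 9 bins as common defaults. It does not implement the perspective-invariant extension (HOGp) described by Reis (2014), nor the block normalization a full HOG descriptor adds.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    gy, gx = np.gradient(np.asarray(cell, dtype=float))  # axis 0 = rows (y), axis 1 = columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # fold opposite directions together
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude)
    return hist

# A vertical edge (dark left half, bright right half) has horizontal gradients,
# so all the gradient energy lands in the 0-20 degree bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
print(hog_cell_histogram(cell).argmax())  # 0
```

Concatenating such histograms over a grid of cells yields a descriptor that captures the outline of heads and shoulders while being fairly robust to small changes in illumination.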

Hidden Markov Models (HMMs) also show promise for recognizing human activities such as walking and sitting (Huang & Chung, 2004). HMMs provide a robust framework for modeling temporal dependencies in crowd behavior, adding a further layer of sophistication to crowd counting algorithms (Huang & Chung, 2004). Combining HMMs with other detection and tracking methods yields a more holistic approach to crowd analysis.
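How an HMM turns a noisy sequence of per-frame observations into an activity sequence can be shown with Viterbi decoding. In this minimal sketch, the two hidden states (walking, sitting), the two observation symbols (moving, still), and all probabilities are hypothetical values chosen for illustration; in a real system the observations would come from tracked motion features, and the model of Huang and Chung (2004) differs.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for an HMM, computed in log space."""
    log_start, log_trans, log_emit = (
        np.log(np.asarray(p, dtype=float)) for p in (start_p, trans_p, emit_p)
    )
    T, n_states = len(obs), log_start.shape[0]
    score = np.empty((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans  # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):  # follow back-pointers to recover the path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

STATES = ["walking", "sitting"]
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]  # activities tend to persist between frames
emit = [[0.9, 0.1], [0.1, 0.9]]   # rows: state; columns: observation 0=moving, 1=still
observations = [0, 0, 1, 1, 1]
print([STATES[s] for s in viterbi(observations, start, trans, emit)])
# ['walking', 'walking', 'sitting', 'sitting', 'sitting']
```

The transition matrix is what encodes temporal dependency: a single ambiguous frame is unlikely to flip the decoded activity when neighboring frames agree.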

In summary, crowd counting has moved from traditional density regression techniques toward dense detection strategies. While deep learning models such as VGG, ResNet, and YOLO have played pivotal roles, other notable contributions include models like CSRNet, descriptors like HOGp, and the integration of HMMs. Together, these methods show the versatility and adaptability of computer vision in producing accurate crowd estimates across domains ranging from smart building management to urban planning and public safety.

3. Evaluating Model Accuracies

Precisely evaluating the performance of crowd counting models is a complex process that depends on many variables, and the assessment procedure depends on the approach taken. For the HOGp approach in urban settings (Reis, 2014), comparing models with and without HOGp descriptors provides a solid framework for measuring differences in counting accuracy. This comparison shows the usefulness of the idea: HOGp reduces counting errors without requiring camera calibration.

To evaluate YOLOv8 for pedestrian detection, Sjöberg and Hyberg (2023) use mean Average Precision at an Intersection over Union (IoU) threshold of 0.5 (mAP50) as a key metric, revealing how detection accuracy varies across lighting conditions. This assessment provides insight into the model's viability for applications such as self-driving vehicles, while also highlighting areas for improvement, particularly under varying illumination.
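mAP50 is the mean, over classes, of the average precision (AP) obtained when a detection counts as correct only if its IoU with an unmatched ground-truth box is at least 0.5; for a single pedestrian class it reduces to AP50. The sketch below computes AP for one class with greedy matching and all-point interpolation. It is an illustrative implementation: benchmark toolkits differ in interpolation and matching details, so its numbers are not directly comparable to published scores.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class. detections: (image_id, score, box); ground_truth: image_id -> boxes."""
    n_gt = sum(len(b) for b in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    tp = fp = 0
    precisions, recalls = [], []
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap >= best_iou:  # greedy: best unmatched box
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # All-point interpolation: area under the monotone (non-increasing) precision envelope.
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap

gt = {"frame1": [(0, 0, 10, 20), (30, 0, 40, 20)]}
dets = [
    ("frame1", 0.95, (70, 0, 80, 20)),  # false positive with the highest score
    ("frame1", 0.90, (0, 0, 10, 20)),
    ("frame1", 0.80, (30, 0, 40, 20)),
]
print(round(average_precision(dets, gt), 4))  # 0.6667
```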

The contour-based matching method for human detection (Rahman, 2017) is evaluated using metrics such as average accuracy and precision on still images. The evaluation acknowledges the limitations of still-image analysis, especially when people overlap, and demonstrates the method's effectiveness within those constraints.

In agricultural pest and disease detection (Zhang et al., 2023), accuracy evaluation focuses on model performance metrics covering both detection accuracy and recognition. The reported results show measurable improvements in model precision for pest and disease detection.

Real-time crowd counting systems (Bhangale et al., 2020) are evaluated with metrics such as mean absolute error (MAE) and mean squared error (MSE). Under these metrics, the model shows higher accuracy than traditional methods, supporting its usefulness for crowd counting in a variety of real-time settings.
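Both metrics compare the predicted and ground-truth count for each test image. The sketch below computes MAE, MSE, and the root mean squared error (RMSE) side by side, because much of the crowd counting literature reports the square root under the label "MSE"; it is worth checking which definition a paper uses before comparing numbers. The example counts are hypothetical.

```python
import math

def counting_errors(predicted, actual):
    """Summarize per-image count errors as MAE, MSE and RMSE."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)  # average size of the miss
    mse = sum(d * d for d in diffs) / len(diffs)   # penalizes large misses more
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

print(counting_errors([10, 20, 30], [12, 18, 30]))
```

MAE reflects typical accuracy, while the squared variants expose occasional large failures, such as badly undercounted dense scenes.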

For YOLO models (Mokayed et al., 2022), accuracy percentages are the key evaluation metrics, and they expose the balance between processing speed and model accuracy. This evaluation highlights the trade-offs that must be weighed when deploying YOLO models in applications where both speed and accuracy matter.
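One way to make the speed and accuracy trade-off explicit is to measure each candidate model's throughput and then choose the most accurate model that still meets a frame-rate budget. In this sketch, `infer` stands for any hypothetical per-frame inference callable, and the candidate names and numbers are illustrative placeholders rather than results from Mokayed et al. (2022).

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Average frames per second of an inference callable over a list of frames."""
    for frame in frames[:warmup]:
        infer(frame)  # untimed warm-up runs absorb one-off start-up costs
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def pick_model(candidates, min_fps):
    """Most accurate candidate that still meets the frame-rate budget, or None."""
    eligible = [c for c in candidates if c["fps"] >= min_fps]
    return max(eligible, key=lambda c: c["accuracy"], default=None)

# Hypothetical measurements for two model sizes.
candidates = [
    {"name": "small", "fps": 90.0, "accuracy": 0.81},
    {"name": "large", "fps": 25.0, "accuracy": 0.88},
]
print(pick_model(candidates, min_fps=30)["name"])  # small
print(pick_model(candidates, min_fps=20)["name"])  # large
```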

3.1. Unified Evaluation Protocol

A unified evaluation protocol is necessary to ensure consistent model assessment across situations, datasets, and research goals. However, establishing one is hindered by differences between datasets, contextual differences, and the rapid development of new approaches, which make uniform evaluation frameworks both necessary and difficult to build. In response, the scientific community is working toward more standardized evaluation practices.

The absence of a single, universally applicable evaluation process stems from the many different applications of computer vision and machine learning, each with its own research objectives. Differences in dataset size, quality, and annotation further complicate the creation of a uniform framework, and the rapid advancement of deep learning and computer vision techniques makes it difficult for any assessment process to stay up to date.

Despite these challenges, there is a growing recognition within the research community of the necessity for unified evaluation protocols. This collective effort signifies a commitment to enhancing the rigor and comparability of evaluations in the field. As the field continues to advance, establishing standardized evaluation protocols remains a crucial area of focus to ensure impartial and consistent assessments across various research endeavors.

4. Deployment Challenges and Solutions in Dockerized Object Detection Systems

Docker plays a key role in deploying object detection systems uniformly across different computing environments. However, setting up a Docker environment for GPU-accelerated applications requires matching the NVIDIA driver, the CUDA version, and the PyTorch build so that all three are compatible. Frequent updates and compatibility problems in these fast-moving technologies make this task more difficult. Docker simplifies the deployment of multi-component applications by providing isolated, consistent environments, which is essential for scalability and reproducibility.

The difficulty of troubleshooting issues such as backend connectivity and CUDA initialization errors further highlights the need for a structured deployment strategy. Despite these obstacles, the scientific community increasingly recognizes Docker's value in speeding up deployment across heterogeneous infrastructures. This recognition drives ongoing efforts to improve deployment methods and to ensure efficient, reliable setup of computer vision systems.
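When a containerized detector fails with a CUDA initialization error, a first diagnostic step is to check what PyTorch sees from inside the container. The sketch below uses only standard PyTorch calls (`torch.version.cuda`, `torch.cuda.is_available`, `torch.cuda.device_count`, `torch.cuda.get_device_name`) and degrades gracefully when PyTorch is not installed. The function name is a hypothetical helper, not part of any existing project.

```python
import importlib.util

def gpu_diagnostics():
    """Summarize the PyTorch/CUDA setup visible from inside the container."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}
    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "torch_built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
        info["device_name"] = torch.cuda.get_device_name(0)
    return info

if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Run inside the container (for example one started with `docker run --gpus all ...`). If `cuda_available` is False on a GPU host, common causes are a container started without GPU access, a CPU-only PyTorch build (`torch_built_with_cuda` is None), or a host NVIDIA driver too old for the CUDA version PyTorch was built with.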

5. Conclusion

Our research and literature review highlight the substantial potential of pre-trained networks, particularly YOLO-based models, in key areas of visual processing such as object detection, head counting, and human counting. The reviewed studies show the important role these technologies can play in real-world applications. Real-time object detection and counting are especially important in security, surveillance, and crowd management, where the algorithms behind object detection and tracking are the cornerstone of a system's effectiveness.

Beyond head counting, density estimation enables a more nuanced analysis of visual data. However, progress in this area requires larger, high-resolution datasets and standardized evaluation protocols to ensure both accuracy and practicality. Faster and more precise real-time counting methods also remain a critical requirement in many applications. Studies comparing different algorithms demonstrate the robustness and adaptability of methods such as HOG in diverse scenarios. While these methods are promising, challenges such as varying lighting conditions still need to be addressed through further research.

In summary, the reviewed studies reveal numerous opportunities for advancement in object detection, human counting, and density estimation. Standardized evaluation protocols and specialized algorithms act as catalysts for further progress, and these research contributions have a significant impact on real-time applications, security, and crowd management. The research community's ongoing efforts to establish unified evaluation protocols reflect a commitment to more rigorous and comparable evaluations, and such protocols will remain a pivotal focus as the field moves forward.

References

Bhangale, U., Patil, S., Vishwanath, V., Thakker, P., Bansode, A., & Navandhar, D. (2020). Near Real-time Crowd Counting using Deep Learning Approach. Procedia Computer Science, 171, 770–779. https://doi.org/10.1016/j.procs.2020.04.084

Chakraborty, A. (2023). How People Counting Systems Enhance Customer Experience in Retail Stores. LinkedIn. https://www.linkedin.com/pulse/how-people-counting-systems-enhance-customer-retail-chakraborty/

Crowd Models for Emergency Evacuation: A Review Targeting Human-Centered Sensing. (2013, January 1). IEEE Xplore. https://ieeexplore.ieee.org/abstract/document/6479853

Huang, C., & Chung, C. (2004). A Real-Time Model-Based Human Motion Tracking and Analysis for Human-Computer Interface Systems. EURASIP Journal on Advances in Signal Processing, 2004(11). https://doi.org/10.1155/s1110865704401206

Hussain, M. (2023). YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines, 11(7), 677. https://doi.org/10.3390/machines11070677

People Counting System Market Size, Share & Trends Analysis Report By End-use (BFSI, Corporate), By Offering (Hardware, Software), By Mounting Platform (Ceiling, Wall), By Type, By Technology, And Segment Forecasts, 2022 - 2030. (n.d.). Grand View Research. https://www.grandviewresearch.com/industry-analysis/people-counting-system-market-report

Raghavachari, C., Aparna, V., Chithira, S., & Balasubramanian, V. (2015). A Comparative Study of Vision Based Human Detection Techniques in People Counting Applications. Procedia Computer Science, 58, 461–469. https://doi.org/10.1016/j.procs.2015.08.064

Rahman, M. A. (2017). Computer Vision Based Human Detection. International Journal of Engineering and Information Systems (IJEAIS), 1(5), 62–85. https://hal.science/hal-01571292/document

Reis, J. (2014). Image Descriptors for Counting People with Uncalibrated Cameras [Master's thesis, University of Porto]. U. Porto. https://paginas.fe.up.pt/~ee08266/thesis.html

Sam, D. B., Peri, S. V., Sundararaman, M. N., Kamath, A., & Babu, R. V. (2020). Locate, size and count: Accurately resolving people in dense crowds via detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1. https://doi.org/10.1109/tpami.2020.2974830

Sjöberg, A., & Hyberg, J. (2023). Investigation Regarding the Performance of YOLOv8 in Pedestrian Detection [Bachelor's thesis, KTH Royal Institute of Technology]. DiVA Portal. https://kth.diva-portal.org/smash/get/diva2:1778368/FULLTEXT01.pdf