SRS Documentation - DrAlzahraniProjects/csusb_fall2024_cse6550_team4 GitHub Wiki

Software Requirements Specification

for

SE research paper chatbot

Prepared by

Name: Upputuri, Sai Hemanth([email protected])

Group Name: csusb_fall2024_cse6550_team4

Instructor: Dr. Alzahrani, Nabeel

Course: CSE 6550: Software Engineering Concepts, Fall 2024


Table of Contents

  1. Introduction
    1.1 Purpose
    1.2 Document Conventions
    1.3 Intended Audience and Reading Suggestions
    1.4 Product Scope
    1.5 References
  2. Overall Description
    2.1 Product Perspective
    2.2 Product Features
    2.3 User Classes and Characteristics
    2.4 Operating Environment
    2.5 Design and Implementation Constraints
    2.6 User Documentation
    2.7 Assumptions and Dependencies
  3. External Interface Requirements
    3.1 User Interfaces
    3.2 Hardware Interfaces
    3.3 Software Interfaces
    3.4 Communication Interfaces
  4. System Features
    4.1 System Feature 1: Query Answering
    4.2 System Feature 2: Feedback Mechanism
  5. Other Nonfunctional Requirements
    5.1 Performance Requirements
    5.2 Safety Requirements
    5.3 Security Requirements
    5.4 Software Quality Attributes
  6. Other Requirements
  7. Appendices

1. Introduction

This Software Requirements Specification (SRS) document describes the requirements and specifications for the Paper Chatbot, a system designed to facilitate interaction with academic papers. The Paper Chatbot allows users to upload research papers and use a conversational interface to ask questions, receive summaries, and clarify complex topics within the document. The primary goal is to enable users, including students, researchers, and professionals, to efficiently extract information from lengthy or complex research papers without needing to read the entire document. This document will serve as a blueprint for the development team, detailing the functional and non-functional requirements and the expected interactions between users and the chatbot.

1.1 Purpose

The purpose of this Software Requirements Specification (SRS) document is to outline the requirements for the development of the Paper Chatbot, an AI-powered tool designed to assist users in extracting, summarizing, and understanding content from academic and research papers. The chatbot will provide an interactive Q&A experience, enabling users to query specific sections of a paper and receive accurate, context-based answers. This SRS will serve as a foundational document to guide the development team, stakeholders, and end-users, ensuring all requirements are thoroughly captured and agreed upon before implementation begins.

1.2 Document Conventions

Throughout this document, we use a set of conventions to ensure clarity and consistency. The term "system" refers to the Paper Chatbot application, and "user" designates an individual interacting with the system. Requirements are numbered sequentially and prefixed with an identifier for their type, for example FR for Functional Requirements. Definitions and abbreviations of technical terms are provided in the glossary (Appendix A) to aid comprehension.

1.3 Intended Audience and Reading Suggestions

This SRS is intended for several key audiences, including project stakeholders, developers, testers, and end-users interested in understanding the scope and functionalities of the Paper Chatbot. Stakeholders may focus on sections detailing the chatbot’s product scope and overall description. Developers and testers will find value in the system feature descriptions and the specific requirements. End-users, such as students or researchers, can refer to this document to understand the chatbot’s intended functionality and limitations.

1.4 Product Scope

The Paper Chatbot is designed to serve as an advanced assistant for navigating and comprehending research papers. By utilizing natural language processing and machine learning, the chatbot will enable users to interact with content in ways that simplify learning and support research activities. The chatbot will allow users to upload documents, generate summaries, and answer questions based on the content. The system is intended for deployment as a web-based platform accessible via standard web browsers. This scope also includes the Paper Chatbot's potential to integrate into learning management systems or research tools, making it a valuable addition for institutions and individuals alike.

1.5 References


2. Overall Description

2.1 Product Perspective

The Paper Chatbot functions within the broader context of digital tools for knowledge acquisition and academic research assistance. With advancements in artificial intelligence, it has become feasible to design systems that can process, summarize, and analyze documents effectively. The chatbot seeks to bridge the gap between users and dense academic texts, enabling users to ask questions in conversational language and receive targeted, concise answers. Compared to traditional text analysis tools, the Paper Chatbot offers an interactive approach, empowering users to query specific document sections, understand complex topics more easily, and extract actionable insights from lengthy papers.

2.2 Product Features

The Paper Chatbot will provide several primary functions to users. First, it allows users to upload and process documents, extracting and storing text for efficient querying. Following the upload, users can request a summary, which condenses the document's content into an easily digestible format. The core functionality lies in the chatbot’s ability to respond to user questions based on the document's contents, parsing user inputs and returning contextually relevant responses. Additionally, the chatbot will provide in-text references or citations within responses, ensuring users understand the information source.

2.3 User Classes and Characteristics

The chatbot is designed with several user types in mind. Students represent a primary user group, typically seeking assistance in understanding complex academic content for their studies. Researchers form another significant user base, as they often need to identify relevant information quickly within extensive research papers. Lastly, professionals who rely on technical literature in fields such as engineering, law, or medicine may use the chatbot to obtain quick summaries and extract specific insights without needing to read through every detail manually. Each user class is characterized by a need for reliable, accurate, and clear information processing within the chatbot environment.

2.4 Operating Environment

The Paper Chatbot will operate as a web-based application accessible on various devices, including desktops, laptops, and tablets. The application is optimized for compatibility with major operating systems, including Windows, macOS, and Linux. The Paper Chatbot system can be hosted either on university servers or in the cloud, depending on how it's set up. This setup is built to handle a large number of users, allowing up to 100 people to use it at the same time, while keeping response times under 3 seconds.

Using cloud-based hosting makes it easier to update and maintain the chatbot without needing to take it offline, so users always have access to the latest version. The system can also connect with the university’s login system, so only authorized users, like students and teachers, are able to access the chatbot’s features.

2.5 Design and Implementation Constraints

The SE Research Paper Chatbot is designed and deployed under specific constraints so that it meets its performance and security targets. For natural language processing it uses AI tools such as ChatGPT and the LangChain framework, and it uses FAISS for fast document retrieval. The Mistral 7b language model improves response quality. NeMo Curator manages the large datasets that underpin document retrieval, and NeMo Guardrails enforces interaction and content guidelines so that all responses meet safety and appropriateness standards.

The chatbot runs inside Docker containers, which package the main application, configuration files, and dependencies (components such as app.py, Dockerfile, and requirements.txt) so that it performs consistently across environments and is easy to deploy and scale. Team 4's deployment is served on port 5004. The system supports up to 100 concurrent users with response times under 3 seconds. To guard against Denial-of-Service (DoS) attacks, each user is limited to 10 queries per minute.

Development, documentation, and collaboration are facilitated through GitHub features, including Pages, Wiki, Copilot, Projects, Issue Tracking, and Pull Requests. These tools streamline workflows and support the project's overall progress.

2.6 User Documentation

The SE paper chatbot project offers detailed documentation to help both developers and users understand and work with the system. The README file covers the project’s goals, how to install it, and how to deploy the chatbot using Docker. It also gives a quick overview of the main technologies the chatbot uses, like ChatGPT, FAISS, LangChain, and Mistral 7b.

There’s also a Wiki, which serves as a central place for more detailed information, including the Software Requirements Specification (SRS), architecture design, and a step-by-step setup guide. Additionally, a Jupyter Notebook is included, providing an interactive way for users and developers to explore the chatbot’s features. The notebook explains key code elements, offers tutorials, and demonstrates how the chatbot processes queries and retrieves information, which is especially helpful for users who want a deeper technical understanding of the system.

2.7 Assumptions and Dependencies

The successful implementation of the chatbot assumes that uploaded documents are in English and formatted correctly for text extraction. Dependencies include the availability and compatibility of large language model (LLM) frameworks and of libraries for PDF processing. Additionally, we assume that users have reliable internet access so they can interact with the online platform effectively.


3. External Interface Requirements

3.1 User Interfaces

The chatbot interface consists of a simple chat window where users can type questions and receive answers, a file upload button for document submission, and a response window displaying answers. The user interface should be intuitive, with minimal input fields and buttons, ensuring that users of all technical skill levels can navigate the application easily.

The SE Research Paper Chatbot interface is simple and easy to use, making it accessible to people with different levels of technical experience. Built with Streamlit, the interface lets users type in questions, get quick answers, and use other features such as giving feedback on how accurate the responses are.

Fig. 1: Low-Fidelity Diagram for SE Research Paper Chatbot Interface


Along with handling queries, the user interface shows feedback statistics. These statistics include how many questions have been asked, how accurate the chatbot's responses are, and the keywords or topics users ask about most often. There is also a performance dashboard that gives an overview of how well the chatbot is doing, displaying metrics such as response times and user satisfaction levels. This dashboard is updated every day, so administrators always have recent information about the system’s performance.

3.2 Hardware Interfaces

The SE Research Paper Chatbot doesn't need any special hardware, just standard web-based systems. It can run on university servers or cloud services, using Docker containers to manage its environment. These servers should have enough memory and processing power to support multiple users at the same time and provide real-time responses. They need to handle up to 100 users at once while keeping response times under 3 seconds. The server setup must also let the chatbot scale with changes in user traffic, especially during busy times like exams or assignment deadlines. Additionally, the hardware must follow the university's data protection policies to securely handle and store any user data collected during interactions with the chatbot.

3.3 Software Interfaces

The chatbot works with several software components to provide a smooth and efficient user experience. It uses LangChain for natural language processing and FAISS for quick document retrieval, allowing the system to respond to user questions rapidly. These backend systems are essential for managing the chatbot’s main functions effectively. Docker is used to package all parts of the system into a lightweight container, making it easy to deploy the chatbot in different environments while ensuring consistent performance.

The chatbot communicates with users using HTTP/HTTPS protocols, and it supports real-time interaction through technologies like WebSockets. This setup allows for fast, interactive communication, ensuring that responses are delivered within the required 3 seconds. Security tools such as Burp Suite Community Edition (BSCE) and Zed Attack Proxy (ZAP) are used to find and fix any vulnerabilities in the system, while Docker Scout helps identify and address issues within the container environment. This combination of tools helps keep the chatbot secure, responsive, and able to handle increased user demand.

3.4 Communication Interfaces

The chatbot communicates with users using standard HTTP/HTTPS protocols. It will have an API to manage user requests and send back responses. To ensure quick, interactive replies to user questions, the chatbot will also use WebSockets or similar real-time communication methods. Additionally, the system will keep communication secure by following encryption standards to protect user data and maintain privacy. The chatbot will interact with backend services like FAISS and LangChain through internal APIs designed for fast and efficient data retrieval and processing.


4. System Features

4.1 System Feature 1: Query Answering

The main feature of the SE Research Paper Chatbot is its ability to answer user questions based on the research papers. When a user asks a question, the chatbot finds the relevant sections of the research papers and gives a detailed answer. Along with the answer, the chatbot also provides the page number and a direct link to the corresponding page in the research paper's PDF version. This allows users to check the response against the original content, which boosts the chatbot's credibility and educational value.
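The page reference and direct link described above could be attached to an answer roughly as follows. This is a minimal sketch, not the project's implementation: the `Citation` class, the `format_answer` helper, and the example URL are hypothetical, and it assumes the PDF is served over HTTP and opened in a viewer that honors the `#page=N` open parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Where an answer came from: the paper's PDF URL and a 1-based page number."""
    pdf_url: str
    page: int

    def link(self) -> str:
        # "#page=N" is the PDF open parameter that common browser viewers support.
        return f"{self.pdf_url}#page={self.page}"


def format_answer(answer: str, citation: Citation) -> str:
    """Append the page number and a direct page link to the answer text."""
    return f"{answer}\n\nSource: page {citation.page} ({citation.link()})"


if __name__ == "__main__":
    cite = Citation("https://example.edu/papers/sample.pdf", 12)  # hypothetical URL
    print(format_answer("Mutation testing measures test suite quality.", cite))
```

Keeping the page number alongside each retrieved passage is what makes this possible, so the retrieval step needs to store it when the paper is indexed.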

4.2 System Feature 2: Feedback Mechanism

The chatbot has a feedback system that lets users rate responses as either “right” or “wrong.” This feedback is recorded and stored for future analysis, helping to improve the chatbot's accuracy over time. Users can also see aggregated feedback statistics through the user interface. These statistics offer insights into how well the chatbot is performing, including accuracy rates, user satisfaction levels, and common topics that users marked as correct or incorrect. By showing this feedback to users, the chatbot creates a transparent relationship between the system and its users, encouraging them to participate and trust in the chatbot's abilities.
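A minimal sketch of how "right"/"wrong" ratings could be recorded and aggregated is shown here. The `FeedbackStore` class and its in-memory list are assumptions made for illustration; the document does not specify how feedback is stored.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """In-memory record of user ratings (hypothetical helper, not the project's code)."""
    ratings: list = field(default_factory=list)  # (question, rating) pairs

    def record(self, question: str, rating: str) -> None:
        # Only the two ratings named in the requirement are accepted.
        if rating not in ("right", "wrong"):
            raise ValueError("rating must be 'right' or 'wrong'")
        self.ratings.append((question, rating))

    def stats(self) -> dict:
        """Aggregate counts and the share of answers marked 'right'."""
        counts = Counter(rating for _, rating in self.ratings)
        total = len(self.ratings)
        accuracy = counts["right"] / total if total else 0.0
        return {
            "total": total,
            "right": counts["right"],
            "wrong": counts["wrong"],
            "accuracy": accuracy,
        }


if __name__ == "__main__":
    store = FeedbackStore()
    store.record("What is RAG?", "right")
    store.record("Who wrote the paper?", "wrong")
    print(store.stats())
```

A deployed system would persist these records so that statistics survive restarts; the aggregation logic would stay the same.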

Fig. 2: Architecture Diagram



5. Other Nonfunctional Requirements

5.1 Performance Requirements

Performance is very important for the SE Research Paper Chatbot, especially since it will be used in academic settings. The system needs to support up to 100 users at the same time, with each user possibly submitting multiple questions in a short period. To keep things running smoothly, the chatbot is optimized to provide answers within 3 seconds, so users won’t experience delays, even during busy times. The chatbot’s backend is designed to scale horizontally, which means more instances can be added when user demand increases. This helps the system stay responsive and reliable under heavy load. Regular performance testing will be done to make sure the system meets these standards, especially during critical times like exams.
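The performance testing mentioned above could start from a small concurrency check like the one below. It is a sketch under stated assumptions: `answer_query` is a stand-in that only sleeps briefly, and `timed_call` and `run_load_test` are hypothetical helpers. A real test would call the deployed chatbot instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def answer_query(question: str) -> str:
    """Stand-in for the real chatbot call; sleeps briefly to simulate work."""
    time.sleep(0.05)
    return f"answer to: {question}"


def timed_call(question: str) -> float:
    """Return how long one query took, in seconds."""
    start = time.perf_counter()
    answer_query(question)
    return time.perf_counter() - start


def run_load_test(users: int = 100, limit_seconds: float = 3.0) -> dict:
    """Issue one query per simulated user in parallel and check the slowest one."""
    questions = [f"question {i}" for i in range(users)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed_call, questions))
    worst = max(latencies)
    return {"users": users, "worst_seconds": worst, "within_limit": worst < limit_seconds}


if __name__ == "__main__":
    print(run_load_test())
```

Checking the slowest response rather than the average matches the requirement as written, since the 3-second target applies to every user.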

5.2 Safety Requirements

The SE Research Paper Chatbot places a strong emphasis on safety to ensure it operates securely and reliably. First, it must follow the data protection policies set by the CSE department and the university. This means handling sensitive user information, such as query history and feedback data, securely. Data transmission will be encrypted, and personal information will only be accessible to authorized personnel.

To protect against Denial-of-Service (DoS) attacks that could overload the system, users are limited to a maximum of 10 queries per minute. If they exceed this limit, they will receive a message asking them to wait for 3 minutes before submitting more queries. This helps safeguard the system’s resources and prevents both intentional and unintentional overloads. In addition to DoS protection, the chatbot will use NeMo Guardrails, a tool designed to help AI applications follow specific safety and interaction guidelines. This will ensure that the chatbot stays within set boundaries and does not generate inappropriate or unsafe responses. NeMo Guardrails will enforce content moderation rules, ensuring the chatbot provides responses that meet educational and institutional standards. This extra layer of protection is crucial for maintaining a secure and trustworthy system.
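The query limit described above can be sketched as a sliding-window counter with a cooldown. The `QueryRateLimiter` class is hypothetical. The limits (10 queries per minute, a 3-minute wait) come from this section, while the sliding-window approach is one possible choice and not necessarily what the project uses.

```python
import time
from collections import defaultdict, deque


class QueryRateLimiter:
    """Allow at most `max_queries` per window; block for a cooldown once exceeded."""

    def __init__(self, max_queries=10, window_seconds=60.0,
                 cooldown_seconds=180.0, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock                    # injectable so tests can fake time
        self._history = defaultdict(deque)    # user id -> timestamps of recent queries
        self._blocked_until = {}              # user id -> time the cooldown ends

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        # Still inside a cooldown: reject without recording the attempt.
        if now < self._blocked_until.get(user_id, float("-inf")):
            return False
        recent = self._history[user_id]
        # Drop timestamps that have left the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_queries:
            self._blocked_until[user_id] = now + self.cooldown_seconds
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = QueryRateLimiter()
    results = [limiter.allow("student-1") for _ in range(11)]
    print(results.count(True), "allowed,", results.count(False), "rejected")
```

When `allow` returns `False`, the interface would show the message asking the user to wait before submitting more queries.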

Furthermore, the chatbot will have regular vulnerability scans using tools like Burp Suite Community Edition (BSCE), Zed Attack Proxy (ZAP), and Docker Scout. These tools will help find and fix any potential security weaknesses. Any issues found during these scans will be addressed promptly, and reports generated will help improve the system’s security over time. Finally, the system will have a solid backup and recovery plan to prevent data loss in case of system failures. This plan ensures that the chatbot remains available to users with minimal downtime and can recover quickly from unexpected crashes or errors.

5.3 Security Requirements

Security is a top priority for the SE Research Paper Chatbot, especially because it operates in an academic setting where sensitive user data is involved. The system must follow strict security protocols to guard against potential vulnerabilities. Regular security audits are carried out using tools like Burp Suite Community Edition (BSCE), Zed Attack Proxy (ZAP), and Docker Scout to find and fix any weaknesses in the system. These tools help ensure that the chatbot’s containerized environment is secure and that any risks are addressed quickly.

Additionally, the chatbot uses encryption for all communications between users and the server, making sure that sensitive data is protected from unauthorized access. This is especially important for keeping student information safe and maintaining the privacy of user interactions. If there is a system failure, backup and recovery plans are in place to minimize data loss and restore the system to full functionality as quickly as possible.

5.4 Software Quality Attributes

The SE Research Paper Chatbot is built to be highly reliable, easy to maintain, and scalable. The system targets 99.9% uptime during the academic year, so it should be available to students and teachers nearly whenever they need it. Backup and recovery systems are in place to protect data in case of a system failure, reducing downtime and minimizing data loss.
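To make the uptime target concrete, the downtime it permits can be computed directly. The sketch below uses a hypothetical helper, `allowed_downtime_hours`, and assumes a 365-day period purely for illustration, since the length of the academic year is not specified here.

```python
def allowed_downtime_hours(uptime_percent: float, period_days: float) -> float:
    """Hours of downtime permitted over a period at a given uptime percentage."""
    return period_days * 24 * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 365 days is an assumed period; a shorter academic year scales the budget down.
    print(f"{allowed_downtime_hours(99.9, 365):.2f} hours per 365 days")
    print(f"{allowed_downtime_hours(99.9, 30) * 60:.1f} minutes per 30 days")
```

At 99.9%, a full year allows roughly 8.76 hours of downtime in total, which is the budget that maintenance windows and recovery from failures have to fit within.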

The chatbot is also designed to be modular, allowing developers to update individual components without affecting the entire system. This modular approach helps the chatbot adapt and improve over time based on user feedback and new features. Lastly, the system's scalability ensures it can handle more users as more people start using the chatbot.


6. Other Requirements

In addition to the functional and non-functional requirements mentioned earlier, the SE Research Paper Chatbot must follow university guidelines for using third-party libraries and services. The chatbot uses GitHub for version control, which means all code changes are tracked, reviewed, and approved through pull requests. It also relies on cloud infrastructure to host its services, which helps with scalability and cost-effectiveness.

Any updates or changes to the chatbot need to be thoroughly tested in a staging environment before being deployed to the live system. This helps ensure that the system stays stable and that any issues are found and fixed before they impact end-users. Additionally, the system should provide administrators with detailed usage reports so they can monitor performance, user engagement, and feedback trends.


7. Appendices

Appendix A: Glossary

Retrieval-Augmented Generation (RAG) combines document retrieval with AI-generated responses, enabling the chatbot to answer queries by utilizing existing documents. LangChain, a robust framework for building applications powered by large language models such as ChatGPT, plays a pivotal role in enhancing the chatbot’s natural language processing capabilities.

The deployment and scalability of the chatbot are streamlined using Docker, a platform that packages applications and their dependencies into portable containers. The frontend interface is built with Streamlit, a Python-based framework for creating interactive web applications, particularly in data science and machine learning.

To ensure safe and controlled interactions, the chatbot leverages NeMo Guardrails, a tool designed to enforce predefined safety and interaction rules. Developers document and test the chatbot’s functionalities using Jupyter Notebook, an interactive development environment that supports live code, visualizations, and collaborative documentation.

Security is a priority, with Denial-of-Service (DoS) Protection implemented to prevent server overload by limiting the number of queries a user can submit within a specific timeframe. The chatbot is developed in Python, a versatile programming language known for its simplicity and extensive support for AI, machine learning, and web development.

Visual Studio Code serves as the Integrated Development Environment (IDE) of choice, offering features like debugging, version control, and code completion to streamline development. GitHub acts as the central platform for version control, collaboration, and project management, hosting the project’s source code, documentation, and resources such as issues, pull requests, and workflows.

Appendix B: Analysis Models

The SE Research Paper Chatbot is designed with a modular system, ensuring that different components work together smoothly to provide accurate and quick responses to user questions. It uses a Retrieval-Augmented Generation (RAG) architecture, which combines document retrieval with AI-generated responses.

When a user submits a question through the chatbot interface, the backend processes the input using LangChain, which helps interpret the query and orchestrate the response. The system then uses FAISS to pull relevant sections from the indexed research papers. After gathering the needed information, a generative AI model like ChatGPT creates a detailed answer, which is presented to the user along with citations and highlighted sections from the research papers.
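The retrieve-then-generate flow can be illustrated with a small, dependency-free sketch. It does not use FAISS, LangChain, or ChatGPT: word-count vectors and brute-force cosine similarity stand in for learned embeddings and a FAISS index, and the final step only assembles the prompt that would be sent to the language model. All function names and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list, k: int = 2) -> list:
    """Return the k passages most similar to the question (brute-force search)."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list) -> str:
    """Assemble the context-plus-question prompt a generative model would receive."""
    context = "\n".join(f"[{h['paper']}, p. {h['page']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    passages = [  # made-up sample passages for illustration
        {"paper": "Paper A", "page": 3, "text": "Mutation testing injects faults to assess test suites."},
        {"paper": "Paper B", "page": 7, "text": "Agile teams hold daily standup meetings."},
    ]
    question = "What is mutation testing?"
    print(build_prompt(question, retrieve(question, passages, k=1)))
```

Storing the paper name and page with each passage is what lets the generated answer carry citations back to the source.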

Throughout this process, NeMo Guardrails ensures that the chatbot follows predefined safety and content guidelines, providing accurate and appropriate answers. There’s also a feedback mechanism that lets users rate how correct the responses are, and all feedback is collected and stored for future improvements. Various performance metrics, such as response times and accuracy rates, are continuously monitored to make sure the chatbot stays responsive and efficient, even when user demand varies.

Security is a key part of the chatbot’s design. To prevent overload and ensure fair use of resources, a DoS Protection system limits users to 10 queries per minute. Additionally, vulnerability scanning tools like Burp Suite Community Edition (BSCE), Zed Attack Proxy (ZAP), and Docker Scout are used to identify and fix security risks. These tools help ensure that the system operates securely and complies with the university's data protection policies.

Appendix C: Project Data

Key metrics tracked include:

The system tracks several performance and functionality metrics to ensure reliable operation and user satisfaction. The chatbotResponseTime metric measures the chatbot's response speed, with a target of less than 3 seconds to maintain a seamless user experience. The concurrent users metric monitors system load, with a target of supporting up to 100 simultaneous users without compromising performance.

To gather insights into user satisfaction, feedbackRating collects feedback on response quality. Additionally, DoSProtection safeguards against system overload by limiting the number of queries a user can submit in a short period.

For research-focused queries, the citation list maintains a record of the research paper citations retrieved for each query. Lastly, the user engagement metrics capture data on user interactions, including the number of questions asked, satisfaction ratings, and response accuracy, enabling analysis and continuous improvement of the chatbot’s performance.