ChatNOC - leofurtadonyc/Network-Automation GitHub Wiki

Overview

ChatNOC: Network Automation Chatbot & Orchestrator

Author: Leonardo Furtado

Repository: Network-Automation

ChatNOC is an agent designed to answer common Network Operations questions about its environment, including the status of the network control plane, reachability testing (ping, traceroute), device health monitoring, customer service health monitoring, and routing analysis through RIB and BGP Local-RIB tables, along with other capabilities I plan to add. Built with Python, ChatNOC leverages an LLM (via Ollama) for query interpretation and Netmiko for SSH connectivity to execute device commands. It supports multiple operation modes, including live mode, demo mode (using pre-saved outputs), and general mode (for open-ended networking questions).

Features

Natural Language Interface - Uses an LLM to turn plain-language queries into:

  • Device-specific commands for performing tasks on live devices.
  • Diagnosis and troubleshooting of various network and customer-impacting services.
  • Health checks against established baselines to promptly address root causes.
  • As the project progresses, additional features will be introduced!

Modular Architecture

The project is organized into separate packages for core CLI functionality, command mapping, device inventory, executors, LLM integration, explanations, health checks, configuration, and authentication.

Multiple Modes

  • Normal Mode: Executes commands live on devices.
  • Demo Mode: Uses pre-saved command outputs for safe testing.
  • General Mode: Answers open-ended networking questions without executing device commands.

Device Inventory

Loads device definitions (e.g., IP addresses, device types, SSH ports) from a YAML file.
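
As a rough illustration of what the loaded inventory might look like, the sketch below builds Device objects from a dictionary shaped like a parsed YAML file. The class names come from the directory structure of this project, but the field names (ip, device_type, ssh_port) and the top-level devices key are assumptions made for the example; the real devices.yaml schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    ip: str
    device_type: str
    ssh_port: int = 22  # assumed default when the entry omits a port


class DeviceInventory:
    """Holds devices keyed by lowercase name (hypothetical layout)."""

    def __init__(self, raw: dict):
        # `raw` is what yaml.safe_load() would return for a file such as:
        #   devices:
        #     p1: {ip: 192.168.255.1, device_type: cisco_ios}
        self._devices = {
            name.lower(): Device(name=name.lower(), **fields)
            for name, fields in raw.get("devices", {}).items()
        }

    def get(self, name: str) -> Device:
        """Look up a device by name, ignoring case."""
        try:
            return self._devices[name.lower()]
        except KeyError:
            raise KeyError(f"unknown device: {name}") from None


inventory = DeviceInventory(
    {"devices": {"P1": {"ip": "192.168.255.1", "device_type": "cisco_ios"}}}
)
print(inventory.get("p1"))
```

Keying the lookup on lowercase names lets queries refer to "P1" or "p1" interchangeably, as the usage examples on this page do.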

Extensible Command Mapper

Easily add new commands and their corresponding explanations for various platforms (Cisco, Juniper, Huawei, Nokia).
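
A minimal sketch of how such a mapping could be structured is shown below. COMMAND_MAP and get_command() are the names used in commands/command_mapper.py, but the dictionary layout, the action keys, and the function signature here are assumptions for illustration only.

```python
# Hypothetical layout: action -> platform -> command template.
COMMAND_MAP = {
    "ospf_neighbors": {
        "cisco": "show ip ospf neighbor | include FULL",
        "juniper": "show ospf neighbor",
    },
    "ping": {
        "cisco": "ping {destination} source {source}",
        "juniper": "ping {destination} source {source} count 5",
    },
}


def get_command(action: str, platform: str, **params: str) -> str:
    """Return the command for an action on a platform, filling in parameters."""
    try:
        template = COMMAND_MAP[action][platform]
    except KeyError:
        raise ValueError(f"no mapping for {action!r} on {platform!r}") from None
    return template.format(**params)


print(get_command("ping", "cisco", destination="100.65.255.14", source="loopback0"))
```

With a layout like this, supporting a new platform or command means adding one dictionary entry rather than changing the lookup logic.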

Health Check Capability

Perform automated health checks based on baseline configuration files.

Authentication Options

Supports both plain-text and secure (bcrypt-hashed) authentication with credentials configurable per device.

History and Help

Built-in commands to display the last 50 queries and a detailed help file with usage instructions.
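
One simple way to keep a bounded query history is a fixed-length deque, sketched below. This is not taken from core/history.py; the function names are hypothetical and only the 50-entry limit comes from the description above.

```python
from collections import deque

HISTORY_LIMIT = 50  # matches the "last 50 queries" described above

history = deque(maxlen=HISTORY_LIMIT)


def record_query(query: str) -> None:
    """Append a query; the oldest entry is dropped once the limit is reached."""
    history.append(query)


def show_history() -> list:
    """Return numbered history lines, oldest first."""
    return [f"{i}: {q}" for i, q in enumerate(history, start=1)]


for n in range(60):
    record_query(f"query {n}")

print(len(history))       # 50
print(show_history()[0])  # 1: query 10
```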

Installation

Clone the Repository:

git clone https://github.com/leofurtadonyc/Network-Automation.git
cd Network-Automation/chatnoc

Create and Activate a Virtual Environment:

python3 -m venv venv
source venv/bin/activate

Install Dependencies:

pip install -r requirements.txt

Set Up Configuration Files:

  • config/config.yaml: Contains LLM settings and global authentication defaults.
  • devices.yaml: Contains device inventory data.
  • baseline/ folder: Contains YAML files for health-check baselines and approved topics.
  • auth/credentials.json: For secure authentication (generated via the provided auth/ssh-account.py script).

Directory Structure

chatnoc/
├── chatnoc.py                       # Main entry point; starts the CLI loop.
├── __init__.py
├── core/                            # Core CLI functionality.
│   ├── __init__.py
│   ├── cli.py                       # Main CLI loop and mode management.
│   └── history.py                   # History display functionality.
├── commands/                        # Command mapping templates.
│   ├── __init__.py
│   └── command_mapper.py            # Defines COMMAND_MAP and get_command().
├── executors/                       # Device connection and command execution.
│   ├── __init__.py
│   └── netmiko_executor.py          # Uses Netmiko for SSH connections.
├── inventory/                       # Device inventory management.
│   ├── __init__.py
│   └── devices_inventory.py         # Loads devices.yaml and defines Device and DeviceInventory.
├── llm/                             # LLM interface.
│   ├── __init__.py
│   └── llm_interface.py             # Provides get_llm() for interfacing with Ollama.
├── explanations/                    # Explanation functions for commands.
│   ├── __init__.py                  # Re-exports all explanation functions.
│   ├── ospf_explanation.py
│   ├── bgp_explanation.py
│   ├── ldp_explanation.py
│   ├── route_explanation.py
│   ├── general_explanation.py
│   ├── mpls_interfaces_explanation.py
│   ├── mpls_forwarding_explanation.py
│   ├── ospf_database_explanation.py
│   ├── ip_explicit_paths_explanation.py
│   ├── l2vpn_atom_vc_explanation.py
│   ├── mpls_traffic_eng_explanation.py
│   ├── version_explanation.py
│   ├── bgp_vpnv4_all_explanation.py
│   └── bgp_vpnv4_vrf_explanation.py
├── healthcheck/                     # Health-check logic.
│   ├── __init__.py
│   └── healthcheck.py               # Contains health check functions.
├── config/                          # Configuration loader.
│   ├── __init__.py
│   └── config_loader.py             # Loads config.yaml.
├── auth/                            # Authentication utilities.
│   ├── __init__.py
│   ├── auth_manager.py              # Loads credentials from credentials.json.
│   ├── ssh-account.py               # Script to add secure user credentials.
│   └── jumpserver_credentials.json  # Jump server credentials file (not used at this moment).
├── baseline/                        # Baseline data.
│   ├── healthcheck_baseline.yaml
│   └── index.yaml                   # Approved topics for general mode.
├── demo/                            # Demo outputs for dry-run mode.
│   ├── p1/
│   ├── p2/
│   └── ...                          # Additional device folders.
├── help.txt                         # Detailed usage instructions.
└── requirements.txt                 # Python dependencies.

Configuration

config/config.yaml: Contains settings for Ollama (LLM), global SSH defaults, and authentication mode.

# config/config.yaml
ollama:
  host: "192.168.0.213"
  port: 11434

ssh_port: 22

# authentication:
#   type: "plain-text"   # Options: "plain-text" or "secure"
#   username: "operador"
#   password: "Operador"

authentication:
  type: "secure" # Use secure authentication
  credentials_file: "credentials.json"  # File that holds hashed credentials

preferred_language: "en"  # e.g., "en" for English, "pt" for Portuguese, etc.

# Jumpserver settings
jumpserver:
  enabled: true
  host: "192.168.0.108"
  port: 22
  credentials_file: "jumpserver_credentials.json"
  conn_timeout: 10
  banner_timeout: 10
  extra_options:
    cisco: "-o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    juniper: "-o KexAlgorithms=curve25519-sha256,[email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    nokia: "-o KexAlgorithms=diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"

# Logging settings for debug purposes
debug: false
# ... additional devices ...

auth/credentials.json: Stores secure credentials (generated by running auth/ssh-account.py).

baseline/: Contains YAML files for health-check baselines and approved topics.

Usage

Start ChatNOC:

(network-automation-venv) operador@eve-ng:~/Network-Automation/chatnoc-v2# python chatnoc.py 
  ____ _           _   _   _  ___   ____ 
 / ___| |__   __ _| |_| \ | |/ _ \ / ___|
| |   | '_ \ / _` | __|  \| | | | | |    
| |___| | | | (_| | |_| |\  | |_| | |___ 
 \____|_| |_|\__,_|\__|_| \_|\___/ \____|
                                         

Welcome to ChatNOC interactive shell V2025.1
By Leonardo Furtado - https://github.com/leofurtadonyc

I am an agent designed to answer questions about your network!
Type 'help' for usage instructions.
Type 'history' to display the last 50 queries.
Type 'demo' to enter demo mode (dry-run with pre-saved outputs).
Type 'general' to enter general mode for open-ended networking topics.
In demo or general mode, type 'exit demo' or 'exit general' to return to normal mode.
In normal mode, 'exit' or 'quit' will terminate the program.

ChatNOC >

Commands

  • Normal queries (e.g., "Can device P1 ping 100.65.255.14 using its loopback0 as the source IP?")
  • Healthchecks (e.g., "Healthcheck P1")
  • Demo Mode: type demo to use pre-saved outputs.
  • General Mode: type general for open-ended networking topics.
  • Other utility commands: help, history, approved topics.

Exiting:

  • In normal mode: type exit or quit to terminate.
  • In demo or general mode: type exit demo or exit general to return to normal mode.

ChatNOC > How many OSPF adjacencies does P1 have?

Executing on p1 (192.168.255.1): show ip ospf neighbor | include FULL

Device Output:
100.65.255.2      0   FULL/  -        00:00:31    100.65.0.18     Ethernet0/3
100.65.255.12     0   FULL/  -        00:00:34    100.65.0.9      Ethernet0/2
100.65.255.3      0   FULL/  -        00:00:35    100.65.0.22     Ethernet0/1
100.65.255.11     0   FULL/  -        00:00:36    100.65.0.1      Ethernet0/0

------------------------------

Command issued:
show ip ospf neighbor | include FULL

Explanation:
This command displays OSPF neighbor adjacencies, showing which neighbors have reached FULL state.
It is used to verify that OSPF routing adjacencies are healthy and that the exchange of routing information is complete.

Course of action:
Actions:
  - Verify connectivity between OSPF neighbors if any expected neighbors are missing or not in FULL state.
  - Check OSPF configurations and interface settings.

Summary:
Input Query: How many OSPF adjacencies does P1 have?
Command Executed: show ip ospf neighbor | include FULL on device(s) p1

ChatNOC > can device p1 ping 100.65.255.14 using its loopback0 as the source IP?

Executing on p1 (192.168.255.1): ping 100.65.255.14 source loopback0

Device Output:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.65.255.14, timeout is 2 seconds:
Packet sent with a source address of 100.65.255.1 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms

------------------------------

Command issued:
ping 100.65.255.14 source loopback0

Explanation:
This command retrieves network information based on your query.
It provides raw output which you should review for any inconsistencies or unexpected results.

Course of action:
Actions:
  - Review the output carefully.
  - Verify the device's configuration and connectivity if the output does not match expectations.

Summary:
Input Query: can device p1 ping 100.65.255.14 using its loopback0 as the source IP?
Command Executed: ping 100.65.255.14 source loopback0 on device(s) p1

ChatNOC > I believe there is a problem in my network, can you check P1?

Performing health check on device p1 (192.168.255.1)...
Checking connectivity on p1...
Enter password for secure authentication: 

Health Check Results for p1:

== BACKBONE_INTERFACES ==
Confirmations:
  - Interface Ethernet 0/0 is healthy: up/up.
  - Interface Ethernet 0/1 is healthy: up/up.
  - Interface Ethernet 0/2 is healthy: up/up.
Issues:
  - Interface Ethernet 0/3: expected up/up but got administratively/down.

== OSPF_ADJACENCIES ==
Confirmations:
  - OSPF adjacency on Ethernet 0/0 is FULL.
  - OSPF adjacency on Ethernet 0/1 is FULL.
  - OSPF adjacency on Ethernet 0/2 is FULL.
Issues:
  - No OSPF neighbor entry found for interface Ethernet 0/3.

== LDP_SESSIONS ==
Confirmations:
  - LDP session for interface Ethernet 0/0 appears established.
  - LDP session for interface Ethernet 0/1 appears established.
  - LDP session for interface Ethernet 0/2 appears established.
  - Label bindings for 100.65.255.0/24 are present.
Issues:
  - LDP session for interface Ethernet 0/3 not found.

== BGP_SESSIONS ==
Confirmations:
  - BGP session with 100.65.255.3 is present.
  - BGP session with 100.65.255.4 is present.

--------------------------------------------------------------------------------
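
The Confirmations and Issues in the report above suggest a straightforward baseline comparison. The sketch below shows the general idea for interface states only; the function name, the data shapes, and the message wording are assumptions modeled on the sample output, not the actual code in healthcheck/healthcheck.py.

```python
def check_interfaces(baseline: dict, observed: dict) -> dict:
    """Compare observed interface states with the expected baseline states."""
    results = {"confirmations": [], "issues": []}
    for interface, expected in baseline.items():
        actual = observed.get(interface)
        if actual is None:
            # The device reported nothing for an interface the baseline expects.
            results["issues"].append(f"Interface {interface}: no state found.")
        elif actual == expected:
            results["confirmations"].append(
                f"Interface {interface} is healthy: {actual}."
            )
        else:
            results["issues"].append(
                f"Interface {interface}: expected {expected} but got {actual}."
            )
    return results


# Expected states would come from a baseline YAML file; observed states
# would be parsed from device output.
baseline = {"Ethernet 0/0": "up/up", "Ethernet 0/3": "up/up"}
observed = {"Ethernet 0/0": "up/up", "Ethernet 0/3": "administratively/down"}
report = check_interfaces(baseline, observed)
print(report["issues"])
```

The same pattern (iterate over what the baseline expects, then classify each item as a confirmation or an issue) would extend naturally to OSPF adjacencies, LDP sessions, and BGP sessions.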

Direct Commands

ChatNOC lets you run commands directly on network devices. It displays the output, analyzes the results, and identifies potential issues with recommended solutions. To use this feature, simply type your command in quotes followed by the device name.

ChatNOC > run a "show mpls lsp" on pe1-junos

Device Output:
Ingress LSP: 2 sessions
To              From            State Rt P     ActivePath       LSPname
100.65.255.13   100.65.255.11   Up     0 *     ExpPath=>PE3-via-P2-P4-P6-P8 LSP=>PE3
100.65.255.13   100.65.255.11   Up     0 *                      LSP=>PE3_Low_Latency
Total 2 displayed, Up 2, Down 0

Egress LSP: 2 sessions
To              From            State   Rt Style Labelin Labelout LSPname 
100.65.255.11   100.65.255.13   Up       0  1 SE       3        - LSP=>PE1
100.65.255.11   100.65.255.13   Up       0  1 SE       3        - LSP=>PE1_low_latency
Total 2 displayed, Up 2, Down 0

Transit LSP: 0 sessions
Total 0 displayed, Up 0, Down 0


**Explanation:**

The `show mpls lsp` command is used to display information about Multiprotocol Label Switching (MPLS) Label Switched Paths (LSPs) on a Junos-based device. This command provides detailed information about the LSPs, including their state, active paths, and labels.

In this specific output, we can see that there are three types of LSPs: Ingress, Egress, and Transit. The differences between them are as follows:

* **Ingress LSP**: These are LSPs that enter a network through an ingress router (in this case, PE1). They show the source IP address, destination IP address, and the state of the LSP.
* **Egress LSP**: These are LSPs that exit a network through an egress router (in this case, also PE1). They show similar information to the ingress LSP, but with the roles reversed (source and destination IPs swapped).
* **Transit LSP**: These are LSPs that traverse a network through one or more transit routers. In this output, there are no transit LSPs.

**Output Analysis:**

Let's analyze every detail of the output:

1. The first section shows the Ingress LSPs:
        * There are 2 sessions.
        * Both sessions have an "Up" state, indicating that they are operational.
        * The source IP address is `100.65.255.11` and the destination IP address is `100.65.255.13`.
        * Each session has a unique LSP name (e.g., `LSP=>PE3-via-P2-P4-P6-P8`).
        * One of the sessions has an "ExpPath" which might indicate that this path is used as an explicit path for this particular LSP.
2. The second section shows the Egress LSPs:
        * There are also 2 sessions, similar to the ingress LSPs.
        * The source IP address is now `100.65.255.13` and the destination IP address is `100.65.255.11`.
        * Each session has a unique LSP name (e.g., `LSP=>PE1_low_latency`).
        * One of the sessions uses "SE" (Shortest Explicit) path style.
3. The third section shows that there are no Transit LSPs.

**Course of Action:**

Based on this analysis, here are some recommendations:

* **Verify MPLS configuration:** Ensure that the MPLS configuration is correct and consistent across all devices in the network.
* **Check for inconsistencies:** Verify that the source and destination IP addresses in the ingress and egress LSPs match. If not, it might indicate a misconfiguration or an issue with the routing protocol.
* **Monitor LSP performance:** Monitor the performance of the LSPs to ensure they are operating as expected.
* **Troubleshoot explicit paths:** Investigate why one of the sessions uses an explicit path ("ExpPath").
* **Check for missing transit LSPs:** If there are no transit LSPs, it might indicate that MPLS is not enabled or configured correctly on some devices in the network.

In conclusion, this output indicates that the Ingress and Egress LSPs are operational, but there are no Transit LSPs. Further investigation is required to ensure that the MPLS configuration and routing protocols are correct and consistent across all devices in the network.
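
The quoted-command syntax used in this section (a command in double quotes followed by a device name) could be recognized with a small regular expression. The sketch below is one possible approach; the pattern and the function name are hypothetical, and ChatNOC may well rely on the LLM rather than a regex for this step.

```python
import re

# Matches a double-quoted command followed by "on <device>".
DIRECT_COMMAND = re.compile(
    r'"(?P<command>[^"]+)"\s+on\s+(?P<device>[\w.-]+)', re.IGNORECASE
)


def parse_direct_command(query: str):
    """Return (command, device) for a direct-command query, or None."""
    match = DIRECT_COMMAND.search(query)
    if match is None:
        return None
    return match.group("command"), match.group("device").lower()


print(parse_direct_command('run a "show mpls lsp" on pe1-junos'))
# ('show mpls lsp', 'pe1-junos')
print(parse_direct_command("How many OSPF adjacencies does P1 have?"))
# None
```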

More Info on Modes:

  • Normal Mode: Live device connections, executing commands via SSH.
  • Demo Mode: Uses pre-saved command outputs from the demo folder (no live connections).
  • General Mode: Answers open-ended networking questions using the LLM (does not execute device commands).
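
The mode transitions described on this page (demo and general to enter a mode, exit demo and exit general to return, exit or quit to terminate from normal mode) can be modeled as a small state function. This is a sketch under those stated rules, with a hypothetical function name; how core/cli.py treats other inputs, such as a bare exit typed in demo mode, is an assumption here.

```python
def handle_mode_command(mode: str, text: str):
    """Return (new_mode, keep_running) for a mode-related input.

    Inputs that are not mode commands leave the mode unchanged.
    """
    command = text.strip().lower()
    if mode == "normal":
        if command in ("exit", "quit"):
            return mode, False          # terminate the program
        if command in ("demo", "general"):
            return command, True        # enter the requested mode
    elif command == f"exit {mode}":
        return "normal", True           # leave demo/general mode
    return mode, True


mode, running = "normal", True
for text in ["demo", "exit demo", "general", "exit", "exit general", "quit"]:
    mode, running = handle_mode_command(mode, text)
    print(f"{text!r:15} -> mode={mode}, running={running}")
```

In this sketch a bare exit typed in general mode is ignored, so the operator cannot terminate the program by accident while in a non-normal mode.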

Using Jump servers (aka "netbastion")

Many organizations enforce strict security policies restricting direct SSH access to network devices. Instead, they require connections to be routed through a dedicated jump server (or bastion host). ChatNOC now supports this setup by allowing users to configure a jump server in the global configuration. When enabled, ChatNOC establishes an SSH tunnel through the jump server before connecting to the target network device.

Configuration

To enable jump server support, update your config/config.yaml with the following settings:

# Jumpserver settings.
jumpserver:
  enabled: true
  host: "192.168.0.108"
  port: 22
  credentials_file: "jumpserver_credentials.json"
  conn_timeout: 10
  banner_timeout: 10
  extra_options:
    cisco: "-o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    juniper: "-o KexAlgorithms=curve25519-sha256,[email protected],ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    nokia: "-o KexAlgorithms=diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"

Note:

  • The jump server connection is implemented using key-based authentication.
  • If your jumpserver is configured for key-based authentication, you do not need a password entry in jumpserver_credentials.json.
  • The target device credentials (for connecting from the jump server to the device) are still provided via credentials.json or via the secure authentication process.

How It Works

When jump server support is enabled in the configuration:

  1. Configuration Loading:

    ChatNOC reads the jumpserver settings from config/config.yaml.

  2. ProxyCommand Setup:

    The code constructs a ProxyCommand that tells SSH to tunnel through the jumpserver. For example, the command might look like:

ssh -q -W %h:%p [email protected] -p 22 -o StrictHostKeyChecking=no -o BatchMode=yes

In this command:

  • %h:%p are placeholders that SSH replaces with the target device's host and port.
  • The jumpserver credentials (e.g., username from jumpserver_credentials.json) are used if provided; otherwise, options can be taken from the configuration directly.
  3. Connection Flow:
    • ChatNOC first connects to the jumpserver using the configured ProxyCommand.
    • Once connected to the jump server, it then connects to the target network device using the standard credentials from credentials.json (or through the secure method).
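
The ProxyCommand described in step 2 can be assembled from the jumpserver settings roughly as follows. This is a sketch, not ChatNOC's actual code: the function name and the netops username are made up for the example, and only the option string itself mirrors the command shown above.

```python
import shlex


def build_proxy_command(jumpserver: dict, username: str) -> str:
    """Build an OpenSSH ProxyCommand string that tunnels through a jump server.

    `jumpserver` mirrors the jumpserver block of config/config.yaml; `username`
    would come from the jump server credentials file.
    """
    parts = [
        "ssh", "-q",
        "-W", "%h:%p",                    # forward stdio to the target host:port
        f"{username}@{jumpserver['host']}",
        "-p", str(jumpserver.get("port", 22)),
        "-o", "StrictHostKeyChecking=no",
        "-o", "BatchMode=yes",            # key-based auth only, never prompt
    ]
    # Quote each part so unexpected characters in config values stay literal.
    return " ".join(shlex.quote(part) for part in parts)


config = {"enabled": True, "host": "192.168.0.108", "port": 22}
print(build_proxy_command(config, "netops"))
# ssh -q -W %h:%p netops@192.168.0.108 -p 22 -o StrictHostKeyChecking=no -o BatchMode=yes
```

A string like this is typically handed to the SSH library as its proxy command (Paramiko, for instance, provides a ProxyCommand class); how ChatNOC wires it into Netmiko is not shown here.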

Troubleshooting

  • Device Connection Issues: Verify device IPs, SSH ports (as defined in devices.yaml), credentials, and firewall settings.
  • Configuration Issues: Ensure that config files (config.yaml, devices.yaml, baseline files) are correctly placed and formatted.
  • Authentication Problems: Check your authentication mode in config.yaml and ensure that secure credentials are generated and stored in the auth folder.
  • LLM Errors: Confirm that Ollama is running and reachable as per your configuration.
  • If you experience issues when using jump server support, consider the following steps:
  1. Verify Jump server Configuration: Ensure that your jumpserver is correctly set up for key-based authentication. If your jump server requires a password, note that the current implementation is optimized for key-based connections.

  2. Check the ProxyCommand: Enable debugging in ChatNOC (the debug setting in config/config.yaml) to view the constructed ProxyCommand. This will help you verify that the command is correctly formatted for your environment.

  3. Test Directly via CLI: From your local machine, run a command similar to the following to test the jump server tunnel manually:

sshpass -p 'Operador' ssh -o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa \
  -o ProxyCommand="sshpass -p 'root_password' ssh -q -W %h:%p [email protected] -p 22" [email protected] 'show ip bgp summary'

This manual test should help isolate issues with your jumpserver settings.

  4. Review Logs: Check the debug logs (if enabled) to see the detailed connection parameters and any errors reported by Paramiko or Netmiko.

  5. Verify SSH Keys: Make sure you have created your SSH keys and copied the public key to the jump server (for example, with ssh-keygen and ssh-copy-id).
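
For the device-connection and LLM checks in this list, a quick TCP probe can rule out basic reachability problems before digging into credentials or SSH options. The helper below is a generic sketch and not part of ChatNOC; it only confirms that a port accepts connections and says nothing about whether authentication will succeed.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-test against a throwaway local listener, so the example runs anywhere.
with socket.socket() as listener:
    listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    print(is_reachable("127.0.0.1", port))  # True
```

In practice you would call it with the values from your own configuration, for example is_reachable("192.168.0.213", 11434) for the Ollama host or is_reachable("192.168.0.108", 22) for the jump server in the sample config.yaml.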

Future Work

  • Expand Device and Command Support: Extend support for additional device types and command sets to broaden ChatNOC's applicability across diverse network environments.
  • Improve LLM Prompting: Refine the LLM prompt to better distinguish between networking queries and general questions, ensuring more accurate intent extraction.
  • Enhance Health-Check Logic: Develop more detailed and granular health-check analysis to improve diagnostic accuracy and provide actionable insights.
  • Increase Logging and Testing: Add robust logging and unit tests to improve reliability, simplify troubleshooting, and facilitate future enhancements.
  • Implement Intent-Based Networking: Integrate intent-based networking capabilities to verify that the network operates according to defined policies and intents, thereby improving availability and routing consistency.
  • Analyze Control Plane Behavior: Implement functionality to analyze network control plane behavior by examining routing and forwarding tables, which will assist engineers in diagnosing traffic flow paths and network performance issues.
  • Inspect BGP Local-RIB Tables: Develop tools to inspect the BGP Local-RIB (Routing Information Base) to help engineers understand, verify, and resolve routing issues more effectively.
  • Enable Maintenance Mode: Introduce a maintenance mode that supports traffic redirection (e.g., "tshifts") when devices fail health checks or baseline tests, thereby maintaining network availability during troubleshooting or scheduled maintenance.
  • Configuration and Deployment: Introduce configuration, service activation generators, and deployment features to allow engineers to safely add customers and scale networks. This includes comprehensive checks at every stage: pre-deployment verification, in-flight monitoring, post-deployment validation, configuration versioning, and rollback capabilities.
  • Traffic Shifts: Introduce a feature that enables engineers to control traffic flows between Autonomous Systems by using regex patterns for prefixes or AS-Paths, along with specified upstream providers. This allows for either dropping matched traffic or redirecting it to alternate upstream providers.

ChatNOC is designed to provide network operators with a powerful yet flexible tool for managing networks using natural language. Its modular architecture makes it easy to extend and customize as new requirements emerge. Contributions and feedback are welcome as the project continues to evolve.