ChatNOC - leofurtadonyc/Network-Automation GitHub Wiki
Overview
ChatNOC: Network Automation Chatbot & Orchestrator
Author: Leonardo Furtado
Repository: Network-Automation
ChatNOC is an agent designed to answer common Network Operations questions about its environment, including the status of the network control plane, reachability testing (ping, traceroute), device health monitoring, customer service health monitoring, and routing analysis through RIB and BGP Local-RIB tables, along with other capabilities I plan to add. Built with Python, ChatNOC uses an LLM (via Ollama) to interpret queries and Netmiko for SSH connectivity to execute device commands. It supports multiple operation modes: normal (live) mode, demo mode (using pre-saved outputs), and general mode (for open-ended networking questions).
Features
Natural Language Interface - Using an LLM, ChatNOC transforms user-friendly queries into:
- Device-specific commands for performing tasks on live devices.
- Diagnosis and troubleshooting of various network and customer-impacting services.
- Health checks against established baselines to promptly address root causes.
- As the project progresses, additional features will be introduced!
Modular Architecture
The project is organized into separate packages for core CLI functionality, command mapping, device inventory, executors, LLM integration, explanations, health checks, configuration, and authentication.
Multiple Modes
- Normal Mode: Executes commands live on devices.
- Demo Mode: Uses pre-saved command outputs for safe testing.
- General Mode: Answers open-ended networking questions without executing device commands.
Device Inventory
Loads device definitions (e.g., IP addresses, device types, SSH ports) from a YAML file.
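To illustrate what a Device/DeviceInventory pair might look like, the sketch below builds name-indexed entries from already-parsed YAML. The key layout (devices, host, device_type, port) is an assumption, not the actual devices.yaml schema; in practice the dict would come from a YAML parser such as PyYAML's yaml.safe_load().

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    """One inventory entry. Field names are hypothetical."""
    name: str
    host: str
    device_type: str
    port: int = 22


class DeviceInventory:
    """Name-indexed view over parsed devices.yaml data (assumed layout)."""

    def __init__(self, data: Dict, default_port: int = 22):
        # Assumed layout: {"devices": {"p1": {"host": ..., "device_type": ..., "port": ...}}}
        self._devices: Dict[str, Device] = {}
        for name, attrs in data.get("devices", {}).items():
            key = name.lower()
            self._devices[key] = Device(
                name=key,
                host=attrs["host"],
                device_type=attrs["device_type"],
                port=int(attrs.get("port", default_port)),
            )

    def get(self, name: str) -> Device:
        # Case-insensitive lookup so "P1" and "p1" resolve to the same device.
        try:
            return self._devices[name.lower()]
        except KeyError:
            raise KeyError(f"Unknown device: {name}") from None

    def names(self) -> List[str]:
        return sorted(self._devices)
```

For example, DeviceInventory({"devices": {"p1": {"host": "192.168.255.1", "device_type": "cisco_ios"}}}).get("P1").host returns "192.168.255.1"; cisco_ios is one of Netmiko's standard device_type strings.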
Extensible Command Mapper
Easily add new commands and their corresponding explanations for various platforms (Cisco, Juniper, Huawei, Nokia).
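The directory layout names a COMMAND_MAP and a get_command() function in commands/command_mapper.py. One hypothetical shape for them, keyed by (intent, platform) and using command strings taken from the examples on this page, is sketched here; the intent names and the cisco_ios/juniper_junos platform keys are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical mapping: (intent, platform) -> command template.
COMMAND_MAP: Dict[Tuple[str, str], str] = {
    ("ospf_neighbors", "cisco_ios"): "show ip ospf neighbor | include FULL",
    ("ping_from_source", "cisco_ios"): "ping {target} source {source}",
    ("mpls_lsp", "juniper_junos"): "show mpls lsp",
}


def get_command(intent: str, platform: str, **params: str) -> str:
    """Render the command template for an intent on a given platform."""
    try:
        template = COMMAND_MAP[(intent, platform)]
    except KeyError:
        raise ValueError(f"No command for intent {intent!r} on {platform!r}") from None
    # str.format raises KeyError if a required placeholder is not supplied.
    return template.format(**params)
```

With this shape, supporting another platform means adding one entry per (intent, platform) pair, and get_command("ping_from_source", "cisco_ios", target="100.65.255.14", source="loopback0") renders "ping 100.65.255.14 source loopback0".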
Health Check Capability
Perform automated health checks based on baseline configuration files.
Authentication Options
Supports both plain-text and secure (bcrypt-hashed) authentication with credentials configurable per device.
History and Help
Built-in commands display the last 50 queries and a detailed help file with usage instructions.
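A "last 50 queries" history maps naturally onto a bounded deque, which discards the oldest entry once the limit is reached. This is a minimal sketch; the class name QueryHistory and its output format are hypothetical, not the contents of core/history.py.

```python
from collections import deque


class QueryHistory:
    """Keeps only the most recent queries (50 by default)."""

    def __init__(self, limit: int = 50):
        # deque(maxlen=...) silently drops the oldest item when full.
        self._queries = deque(maxlen=limit)

    def add(self, query: str) -> None:
        self._queries.append(query)

    def render(self) -> str:
        # Numbered listing, oldest first, as a 'history' command might print it.
        return "\n".join(f"{i}. {q}" for i, q in enumerate(self._queries, start=1))
```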
Installation
Clone the Repository
git clone https://github.com/leofurtadonyc/Network-Automation.git
cd Network-Automation/chatnoc
Create and Activate a Virtual Environment:
python3 -m venv venv
source venv/bin/activate
Install Dependencies:
pip install -r requirements.txt
Set Up Configuration Files:
- config/config.yaml: Contains LLM settings and global authentication defaults.
- devices.yaml: Contains device inventory data.
- baseline/ folder: Contains YAML files for health-check baselines and approved topics.
- auth/credentials.json: For secure authentication (generated via the provided auth/ssh-account.py script).
Directory Structure
chatnoc/
├── chatnoc.py # Main entry point; starts the CLI loop.
├── __init__.py
├── core/ # Core CLI functionality.
│   ├── __init__.py
│   ├── cli.py # Main CLI loop and mode management.
│   └── history.py # History display functionality.
├── commands/ # Command mapping templates.
│   ├── __init__.py
│   └── command_mapper.py # Defines COMMAND_MAP and get_command().
├── executors/ # Device connection and command execution.
│   ├── __init__.py
│   └── netmiko_executor.py # Uses Netmiko for SSH connections.
├── inventory/ # Device inventory management.
│   ├── __init__.py
│   └── devices_inventory.py # Loads devices.yaml and defines Device and DeviceInventory.
├── llm/ # LLM interface.
│   ├── __init__.py
│   └── llm_interface.py # Provides get_llm() for interfacing with Ollama.
├── explanations/ # Explanation functions for commands.
│   ├── __init__.py # Re-exports all explanation functions.
│   ├── ospf_explanation.py
│   ├── bgp_explanation.py
│   ├── ldp_explanation.py
│   ├── route_explanation.py
│   ├── general_explanation.py
│   ├── mpls_interfaces_explanation.py
│   ├── mpls_forwarding_explanation.py
│   ├── ospf_database_explanation.py
│   ├── ip_explicit_paths_explanation.py
│   ├── l2vpn_atom_vc_explanation.py
│   ├── mpls_traffic_eng_explanation.py
│   ├── version_explanation.py
│   ├── bgp_vpnv4_all_explanation.py
│   └── bgp_vpnv4_vrf_explanation.py
├── healthcheck/ # Health-check logic.
│   ├── __init__.py
│   └── healthcheck.py # Contains health check functions.
├── config/ # Configuration loader.
│   ├── __init__.py
│   └── config_loader.py # Loads config.yaml.
├── auth/ # Authentication utilities.
│   ├── __init__.py
│   ├── auth_manager.py # Loads credentials from credentials.json.
│   ├── ssh-account.py # Script to add secure user credentials.
│   └── jumpserver_credentials.json # Secure jumpserver credentials (not used at this moment...)
├── baseline/ # Baseline data.
│   ├── healthcheck_baseline.yaml
│   └── index.yaml # Approved topics for general mode.
├── demo/ # Demo outputs for dry-run mode.
│   ├── p1/
│   ├── p2/
│   └── ... # Additional device folders.
├── help.txt # Detailed usage instructions.
└── requirements.txt # Python dependencies.
Configuration
config/config.yaml: Contains settings for Ollama (LLM), global SSH defaults, and authentication mode.
# config/config.yaml
ollama:
  host: "192.168.0.213"
  port: 11434
ssh_port: 22
# authentication:
#   type: "plain-text"   # Options: "plain-text" or "secure"
#   username: "operador"
#   password: "Operador"
authentication:
  type: "secure"   # Use secure authentication
  credentials_file: "credentials.json"   # File that holds hashed credentials
preferred_language: "en"   # e.g., "en" for English, "pt" for Portuguese, etc.
# Jumpserver settings
jumpserver:
  enabled: true
  host: "192.168.0.108"
  port: 22
  credentials_file: "jumpserver_credentials.json"
  conn_timeout: 10
  banner_timeout: 10
  extra_options:
    cisco: "-o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    juniper: "-o KexAlgorithms=curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    nokia: "-o KexAlgorithms=diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
# Logging settings for debug purposes
debug: false
# ... additional devices ...
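Ollama serves a REST API on the host and port configured in the ollama section (11434 is Ollama's default port), and its POST /api/generate endpoint accepts a JSON body with model, prompt, and stream fields. The sketch below shows how llm_interface.py might turn the parsed config into such a request; it is an illustration only, and the model name "llama3" is a placeholder because config.yaml does not specify one.

```python
import json
import urllib.request
from typing import Dict


def build_generate_request(cfg: Dict, prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but do not send) a POST request to Ollama's /api/generate endpoint."""
    ollama = cfg["ollama"]
    url = f"http://{ollama['host']}:{ollama.get('port', 11434)}/api/generate"
    # stream=False asks Ollama for a single JSON reply instead of streamed chunks.
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen() and reading the "response" field of the returned JSON yields the model's answer when stream is false.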
auth/credentials.json: Stores secure credentials (generated by running auth/ssh-account.py).
baseline/: Contains YAML files for health-check baselines and approved topics.
Usage
Start ChatNOC:
(network-automation-venv) operador@eve-ng:~/Network-Automation/chatnoc-v2# python chatnoc.py
____ _ _ _ _ ___ ____
/ ___| |__ __ _| |_| \ | |/ _ \ / ___|
| | | '_ \ / _` | __| \| | | | | |
| |___| | | | (_| | |_| |\ | |_| | |___
\____|_| |_|\__,_|\__|_| \_|\___/ \____|
Welcome to ChatNOC interactive shell V2025.1
By Leonardo Furtado - https://github.com/leofurtadonyc
I am an agent designed to answer questions about your network!
Type 'help' for usage instructions.
Type 'history' to display the last 50 queries.
Type 'demo' to enter demo mode (dry-run with pre-saved outputs).
Type 'general' to enter general mode for open-ended networking topics.
In demo or general mode, type 'exit demo' or 'exit general' to return to normal mode.
In normal mode, 'exit' or 'quit' will terminate the program.
ChatNOC >
Commands
- Normal queries (e.g., "Can device P1 ping 100.65.255.14 using its loopback0 as the source IP?")
- Healthchecks (e.g., "Healthcheck P1")
- Demo Mode: type 'demo' to use pre-saved outputs.
- General Mode: type 'general' for open-ended networking topics.
- Other utility commands: 'help', 'history', 'approved topics'.
- Exiting:
  - In normal mode: type 'exit' or 'quit' to terminate.
  - In demo or general mode: type 'exit demo' or 'exit general' to return to normal mode.
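The mode-switching and exit rules listed above form a small state machine. The sketch below is a hypothetical rendering of them, not the code in core/cli.py; anything other than the listed keywords, including a bare 'exit' typed in demo or general mode, is treated here as an ordinary query, which is an assumption about behavior the text does not specify.

```python
from typing import Tuple

NORMAL, DEMO, GENERAL = "normal", "demo", "general"


def next_mode(mode: str, user_input: str) -> Tuple[str, bool]:
    """Return (new_mode, should_terminate) for one line typed at the prompt."""
    text = user_input.strip().lower()
    if mode == NORMAL:
        if text in ("exit", "quit"):
            return mode, True
        if text == DEMO:
            return DEMO, False
        if text == GENERAL:
            return GENERAL, False
    elif text == f"exit {mode}":
        # 'exit demo' / 'exit general' return to normal mode instead of quitting.
        return NORMAL, False
    # Any other input stays in the current mode and is handled as a query.
    return mode, False
```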
ChatNOC > How many OSPF adjacencies does P1 have?
Executing on p1 (192.168.255.1): show ip ospf neighbor | include FULL
Device Output:
100.65.255.2 0 FULL/ - 00:00:31 100.65.0.18 Ethernet0/3
100.65.255.12 0 FULL/ - 00:00:34 100.65.0.9 Ethernet0/2
100.65.255.3 0 FULL/ - 00:00:35 100.65.0.22 Ethernet0/1
100.65.255.11 0 FULL/ - 00:00:36 100.65.0.1 Ethernet0/0
------------------------------
Command issued:
show ip ospf neighbor | include FULL
Explanation:
This command displays OSPF neighbor adjacencies, showing which neighbors have reached FULL state.
It is used to verify that OSPF routing adjacencies are healthy and that the exchange of routing information is complete.
Course of action:
Actions:
- Verify connectivity between OSPF neighbors if any expected neighbors are missing or not in FULL state.
- Check OSPF configurations and interface settings.
Summary:
Input Query: How many OSPF adjacencies does P1 have?
Command Executed: show ip ospf neighbor | include FULL on device(s) p1
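The adjacency count in this example can also be derived deterministically from the filtered output. A minimal sketch, assuming the standard Cisco IOS column order (Neighbor ID, Pri, State, Dead Time, Address, Interface); it is not necessarily how ChatNOC computes its answer.

```python
def count_full_adjacencies(output: str) -> int:
    """Count OSPF neighbor rows whose State column starts with FULL."""
    count = 0
    for line in output.splitlines():
        fields = line.split()
        # Column 3 (index 2) is State, e.g. "FULL/" or "FULL/DR".
        if len(fields) >= 3 and fields[2].startswith("FULL"):
            count += 1
    return count
```

Applied to the four neighbor lines shown above, it returns 4.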
ChatNOC > can device p1 ping 100.65.255.14 using its loopback0 as the source IP?
Executing on p1 (192.168.255.1): ping 100.65.255.14 source loopback0
Device Output:
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.65.255.14, timeout is 2 seconds:
Packet sent with a source address of 100.65.255.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
------------------------------
Command issued:
ping 100.65.255.14 source loopback0
Explanation:
This command retrieves network information based on your query.
It provides raw output which you should review for any inconsistencies or unexpected results.
Course of action:
Actions:
- Review the output carefully.
- Verify the device's configuration and connectivity if the output does not match expectations.
Summary:
Input Query: can device p1 ping 100.65.255.14 using its loopback0 as the source IP?
Command Executed: ping 100.65.255.14 source loopback0 on device(s) p1
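The generic explanation above leaves the verdict to the reader, but the IOS "Success rate" line is regular enough to parse. A minimal sketch of such a check (the function name and returned keys are hypothetical):

```python
import re
from typing import Dict, Optional

# Matches e.g. "Success rate is 100 percent (5/5)".
_SUCCESS_RE = re.compile(r"Success rate is (\d+) percent \((\d+)/(\d+)\)")


def parse_ping(output: str) -> Optional[Dict]:
    """Extract the IOS ping success rate; return None if the line is absent."""
    m = _SUCCESS_RE.search(output)
    if m is None:
        return None
    percent, received, sent = (int(g) for g in m.groups())
    return {"percent": percent, "received": received, "sent": sent, "reachable": received > 0}
```

For the output above, it reports 100 percent with 5 of 5 echoes returned.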
ChatNOC > I believe there is a problem in my network, can you check P1?
Performing health check on device p1 (192.168.255.1)...
Checking connectivity on p1...
Enter password for secure authentication:
Health Check Results for p1:
== BACKBONE_INTERFACES ==
Confirmations:
- Interface Ethernet 0/0 is healthy: up/up.
- Interface Ethernet 0/1 is healthy: up/up.
- Interface Ethernet 0/2 is healthy: up/up.
Issues:
- Interface Ethernet 0/3: expected up/up but got administratively/down.
== OSPF_ADJACENCIES ==
Confirmations:
- OSPF adjacency on Ethernet 0/0 is FULL.
- OSPF adjacency on Ethernet 0/1 is FULL.
- OSPF adjacency on Ethernet 0/2 is FULL.
Issues:
- No OSPF neighbor entry found for interface Ethernet 0/3.
== LDP_SESSIONS ==
Confirmations:
- LDP session for interface Ethernet 0/0 appears established.
- LDP session for interface Ethernet 0/1 appears established.
- LDP session for interface Ethernet 0/2 appears established.
- Label bindings for 100.65.255.0/24 are present.
Issues:
- LDP session for interface Ethernet 0/3 not found.
== BGP_SESSIONS ==
Confirmations:
- BGP session with 100.65.255.3 is present.
- BGP session with 100.65.255.4 is present.
--------------------------------------------------------------------------------
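The health-check results above follow a confirmations/issues pattern against a baseline. The sketch below reproduces that pattern for interface status; it assumes both the baseline and the observed state are simple {interface: "status/protocol"} dicts, which is a hypothetical structure (the real healthcheck_baseline.yaml layout and the parsing of device output are not shown here).

```python
from typing import Dict, List, Tuple


def check_interfaces(expected: Dict[str, str], observed: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare interface status against a baseline; return (confirmations, issues)."""
    confirmations: List[str] = []
    issues: List[str] = []
    for ifname, want in expected.items():
        got = observed.get(ifname)
        if got is None:
            issues.append(f"Interface {ifname}: not found on device.")
        elif got == want:
            confirmations.append(f"Interface {ifname} is healthy: {got}.")
        else:
            issues.append(f"Interface {ifname}: expected {want} but got {got}.")
    return confirmations, issues
```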
Direct Commands
ChatNOC lets you run commands directly on network devices. It displays the output, analyzes the results, and identifies potential issues with recommended solutions. To use this feature, simply type your command in quotes followed by the device name.
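Because the command is quoted and followed by "on <device>", the request can be split without the LLM. The pattern below is a hypothetical sketch of that parsing step; ChatNOC may instead rely on the LLM for it.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a double-quoted command followed by "on <device>".
_DIRECT_RE = re.compile(r'"(?P<command>[^"]+)"\s+on\s+(?P<device>[\w.-]+)', re.IGNORECASE)


def parse_direct_command(query: str) -> Optional[Tuple[str, str]]:
    """Return (command, device) for a direct-command query, else None."""
    m = _DIRECT_RE.search(query)
    if m is None:
        return None
    return m.group("command").strip(), m.group("device").lower()
```

For the example below, parse_direct_command('run a "show mpls lsp" on pe1-junos') returns ("show mpls lsp", "pe1-junos").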
ChatNOC > run a "show mpls lsp" on pe1-junos
Device Output:
Ingress LSP: 2 sessions
To From State Rt P ActivePath LSPname
100.65.255.13 100.65.255.11 Up 0 * ExpPath=>PE3-via-P2-P4-P6-P8 LSP=>PE3
100.65.255.13 100.65.255.11 Up 0 * LSP=>PE3_Low_Latency
Total 2 displayed, Up 2, Down 0
Egress LSP: 2 sessions
To From State Rt Style Labelin Labelout LSPname
100.65.255.11 100.65.255.13 Up 0 1 SE 3 - LSP=>PE1
100.65.255.11 100.65.255.13 Up 0 1 SE 3 - LSP=>PE1_low_latency
Total 2 displayed, Up 2, Down 0
Transit LSP: 0 sessions
Total 0 displayed, Up 0, Down 0
**Explanation:**
The `show mpls lsp` command is used to display information about Multiprotocol Label Switching (MPLS) Label Switched Paths (LSPs) on a Junos-based device. This command provides detailed information about the LSPs, including their state, active paths, and labels.
In this specific output, we can see that there are three types of LSPs: Ingress, Egress, and Transit. The differences between them are as follows:
* **Ingress LSP**: These are LSPs that enter a network through an ingress router (in this case, PE1). They show the source IP address, destination IP address, and the state of the LSP.
* **Egress LSP**: These are LSPs that exit a network through an egress router (in this case, also PE1). They show similar information to the ingress LSP, but with the roles reversed (source and destination IPs swapped).
* **Transit LSP**: These are LSPs that traverse a network through one or more transit routers. In this output, there are no transit LSPs.
**Output Analysis:**
Let's analyze every detail of the output:
1. The first section shows the Ingress LSPs:
* There are 2 sessions.
* Both sessions have an "Up" state, indicating that they are operational.
* The source IP address is `100.65.255.11` and the destination IP address is `100.65.255.13`.
* Each session has a unique LSP name (e.g., `LSP=>PE3-via-P2-P4-P6-P8`).
* One of the sessions has an "ExpPath" which might indicate that this path is used as an explicit path for this particular LSP.
2. The second section shows the Egress LSPs:
* There are also 2 sessions, similar to the ingress LSPs.
* The source IP address is now `100.65.255.13` and the destination IP address is `100.65.255.11`.
* Each session has a unique LSP name (e.g., `LSP=>PE1_low_latency`).
* One of the sessions uses "SE" (Shortest Explicit) path style.
3. The third section shows that there are no Transit LSPs.
**Course of Action:**
Based on this analysis, here are some recommendations:
* **Verify MPLS configuration:** Ensure that the MPLS configuration is correct and consistent across all devices in the network.
* **Check for inconsistencies:** Verify that the source and destination IP addresses in the ingress and egress LSPs match. If not, it might indicate a misconfiguration or an issue with the routing protocol.
* **Monitor LSP performance:** Monitor the performance of the LSPs to ensure they are operating as expected.
* **Troubleshoot explicit paths:** Investigate why one of the sessions uses an explicit path ("ExpPath").
* **Check for missing transit LSPs:** If there are no transit LSPs, it might indicate that MPLS is not enabled or configured correctly on some devices in the network.
In conclusion, this output indicates that the Ingress and Egress LSPs are operational, but there are no Transit LSPs. Further investigation is required to ensure that the MPLS configuration and routing protocols are correct and consistent across all devices in the network.
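The per-section totals in Junos "show mpls lsp" output can be checked deterministically before (or alongside) an LLM explanation. A minimal sketch, assuming one "Total ... displayed" line per section in the same order as the section headers, as in the output above:

```python
import re
from typing import Dict

_SECTION_RE = re.compile(r"^\s*(Ingress|Egress|Transit) LSP: (\d+) sessions?", re.MULTILINE)
_TOTAL_RE = re.compile(r"^\s*Total (\d+) displayed, Up (\d+), Down (\d+)", re.MULTILINE)


def summarize_lsps(output: str) -> Dict[str, Dict[str, int]]:
    """Map each LSP section to its session, up, and down counts."""
    sections = _SECTION_RE.findall(output)
    totals = _TOTAL_RE.findall(output)
    # Pair each section header with the Total line that follows it.
    return {
        name.lower(): {"sessions": int(n), "up": int(up), "down": int(down)}
        for (name, n), (_, up, down) in zip(sections, totals)
    }
```

On a PE router that only originates and terminates LSPs, a transit count of 0 is expected, so a rule built on this sketch would more usefully flag a non-zero Down count than an empty Transit section.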
More Info on Modes:
- Normal Mode: Live device connections, executing commands via SSH.
- Demo Mode: Uses pre-saved command outputs from the demo folder (no live connections).
- General Mode: Answers open-ended networking questions using the LLM (does not execute device commands).
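Demo mode reads pre-saved outputs from per-device folders under demo/ (for example, demo/p1/). The file-naming convention is not documented here, so the slug scheme below (command words joined by underscores, plus a .txt suffix) is purely hypothetical; the sketch only shows how a command could be mapped to a saved file with a graceful fallback.

```python
import re
from pathlib import Path
from typing import Optional, Union


def demo_output_path(demo_root: Path, device: str, command: str) -> Path:
    """Map a device and command to a hypothetical saved-output file."""
    # e.g. "show ip ospf neighbor | include FULL" -> "show_ip_ospf_neighbor_include_full.txt"
    slug = "_".join(re.findall(r"[a-z0-9]+", command.lower()))
    return demo_root / device.lower() / f"{slug}.txt"


def load_demo_output(demo_root: Union[str, Path], device: str, command: str) -> Optional[str]:
    """Return the saved output, or None if no file exists for this command."""
    path = demo_output_path(Path(demo_root), device, command)
    return path.read_text() if path.is_file() else None
```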
Using Jump servers (aka "netbastion")
Many organizations enforce strict security policies restricting direct SSH access to network devices. Instead, they require connections to be routed through a dedicated jump server (or bastion host). ChatNOC now supports this setup by allowing users to configure a jump server in the global configuration. When enabled, ChatNOC establishes an SSH tunnel through the jump server before connecting to the target network device.
Configuration
To enable jump server support, update your config/config.yaml with the following settings:
# Jumpserver settings.
jumpserver:
  enabled: true
  host: "192.168.0.108"
  port: 22
  credentials_file: "jumpserver_credentials.json"
  conn_timeout: 10
  banner_timeout: 10
  extra_options:
    cisco: "-o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    juniper: "-o KexAlgorithms=curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
    nokia: "-o KexAlgorithms=diffie-hellman-group14-sha1 -o HostKeyAlgorithms=+ssh-rsa"
Note:
- The jump server connection is implemented using key-based authentication.
- If your jump server is configured for key-based authentication, you do not need a password entry in jumpserver_credentials.json.
- The target device credentials (used for the device session that is tunneled through the jump server) are still provided via credentials.json or via the secure authentication process.
How It Works
When jump server support is enabled in the configuration:
- Configuration Loading: ChatNOC reads the jumpserver settings from config/config.yaml.
- ProxyCommand Setup: The code constructs a ProxyCommand that tells SSH to tunnel through the jump server. For example, the command might look like:
  ssh -q -W %h:%p [email protected] -p 22 -o StrictHostKeyChecking=no -o BatchMode=yes
  In this command:
  - %h:%p are placeholders that SSH replaces with the target device's host and port.
  - -W forwards the SSH client's standard input and output to that host and port, so the device session runs end-to-end from ChatNOC, with the jump server only relaying traffic.
  - The jumpserver credentials (e.g., the username from jumpserver_credentials.json) are used if provided; otherwise, options can be taken from the configuration directly.
- Connection Flow:
  - ChatNOC first connects to the jump server using the configured ProxyCommand.
  - Through that tunnel, it then connects to the target network device using the standard credentials from credentials.json (or through the secure method).
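The ProxyCommand construction described above can be sketched as plain string building. In this sketch the jump server username "netops" is hypothetical, and the second helper substitutes %h and %p itself, on the assumption that the command is used outside an ssh_config file (where nothing else would expand those tokens).

```python
def build_proxy_command(jump_user: str, jump_host: str, jump_port: int = 22) -> str:
    """ProxyCommand template with the OpenSSH %h/%p tokens left in place."""
    return (
        f"ssh -q -W %h:%p {jump_user}@{jump_host} -p {jump_port} "
        "-o StrictHostKeyChecking=no -o BatchMode=yes"
    )


def expand_tokens(template: str, target_host: str, target_port: int) -> str:
    # Outside an ssh_config file, substitute the target host and port ourselves.
    return template.replace("%h", target_host).replace("%p", str(target_port))
```

The expanded string could be wrapped in paramiko.ProxyCommand(...) and passed to Netmiko's ConnectHandler through its sock argument (available in Netmiko 4.x; check your version). Alternatively, a ProxyCommand line in an ssh_config file referenced by Netmiko's ssh_config_file argument lets Paramiko expand %h and %p itself.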
Troubleshooting
- Device Connection Issues: Verify device IPs, SSH ports (as defined in devices.yaml), credentials, and firewall settings.
- Configuration Issues: Ensure that the config files (config.yaml, devices.yaml, baseline files) are correctly placed and formatted.
- Authentication Problems: Check your authentication mode in config.yaml and ensure that secure credentials are generated and stored in the auth folder.
- LLM Errors: Confirm that Ollama is running and reachable as specified in your configuration.
- If you experience issues when using jump server support, consider the following steps:
  - Verify Jump Server Configuration: Ensure that your jump server is correctly set up for key-based authentication. If your jump server requires a password, note that the current implementation is optimized for key-based connections.
  - Check the ProxyCommand: Enable debugging in ChatNOC (the debug setting in config/config.yaml) to view the constructed ProxyCommand. This will help you verify that the command is correctly formatted for your environment.
  - Test Directly via CLI: From your local machine, run a command similar to the following to test the jump server tunnel manually:
    sshpass -p 'Operador' ssh -o KexAlgorithms=diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1 -o HostKeyAlgorithms=+ssh-rsa -o ProxyCommand="sshpass -p 'root_password' ssh -q -W %h:%p [email protected] -p 22" [email protected] 'show ip bgp summary'
    This manual test should help isolate issues with your jump server settings.
  - Review Logs: Check the debug logs (if enabled) to see the detailed connection parameters and any errors reported by Paramiko or Netmiko.
  - Make sure you have created your SSH keys and copied the public key to the jump server (ssh-copy-id expects its options, including -p, before the target host):
    ssh-keygen -t rsa -b 4096 -C "[email protected]"
    ssh-copy-id -i ~/.ssh/id_rsa.pub -p 22 [email protected] # Jumpserver
Future Work
- Expand Device and Command Support: Extend support for additional device types and command sets to broaden ChatNOC's applicability across diverse network environments.
- Improve LLM Prompting: Refine the LLM prompt to better distinguish between networking queries and general questions, ensuring more accurate intent extraction.
- Enhance Health-Check Logic: Develop more detailed and granular health-check analysis to improve diagnostic accuracy and provide actionable insights.
- Increase Logging and Testing: Add robust logging and unit tests to improve reliability, simplify troubleshooting, and facilitate future enhancements.
- Implement Intent-Based Networking: Integrate intent-based networking capabilities to verify that the network operates according to defined policies and intents, thereby improving availability and routing consistency.
- Analyze Control Plane Behavior: Implement functionality to analyze network control plane behavior by examining routing and forwarding tables, which will assist engineers in diagnosing traffic flow paths and network performance issues.
- Inspect BGP Local-RIB Tables: Develop tools to inspect the BGP Local-RIB (Routing Information Base) to help engineers understand, verify, and resolve routing issues more effectively.
- Enable Maintenance Mode: Introduce a maintenance mode that supports traffic redirection (e.g., "tshifts") when devices fail health checks or baseline tests, thereby maintaining network availability during troubleshooting or scheduled maintenance.
- Configuration and Deployment: Introduce configuration, service activation generators, and deployment features to allow engineers to safely add customers and scale networks. This includes comprehensive checks at every stage: pre-deployment verification, in-flight monitoring, post-deployment validation, configuration versioning, and rollback capabilities.
- Traffic Shifts: Introduce a feature that enables engineers to control traffic flows between Autonomous Systems by using regex patterns for prefixes or AS-Paths, along with specified upstream providers. This allows for either dropping matched traffic or redirecting it to alternate upstream providers.
ChatNOC is designed to provide network operators with a powerful yet flexible tool for managing networks using natural language. Its modular architecture makes it easy to extend and customize as new requirements emerge. Contributions and feedback are welcome as the project continues to evolve.