Future Updates and Known Issues - splunk/ShellSweep GitHub Wiki

Future Updates

Feedback Loop and Model Optimization

Current Implementation

In ShellSweepX, we've opted not to implement a feedback loop, unlike its predecessor ShellSweepML. This decision was based on the current model's robust performance in detecting webshells.

Rationale

Model Efficacy: The existing model demonstrates high accuracy in identifying webshells.
Acceptable False Positive Rate: We've determined that the current low rate of false positives is within acceptable limits for operational use.

Feature Optimization

During our development process, we conducted extensive testing on feature selection:

Initial Testing: Performed in ShellSweepML
Feature Range: Experimented with varying numbers of features:
- Tested with 10,000 features
- Tested with 1,000 features
- Settled on 5,000 features as the optimal balance
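
The wiki does not say what the features are, so the following is only a minimal sketch of how a feature cap works in general. It assumes, purely for illustration, that features are whitespace-separated token counts and that the cap keeps the most frequent tokens. The names `build_vocabulary` and `vectorize` are hypothetical and are not part of ShellSweepML or ShellSweepX.

```python
from collections import Counter

def build_vocabulary(documents, max_features):
    """Keep the max_features most frequent tokens across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(text.split())
    return [token for token, _ in counts.most_common(max_features)]

def vectorize(text, vocabulary):
    """Count how often each vocabulary token occurs in text."""
    counts = Counter(text.split())
    return [counts[token] for token in vocabulary]

# Toy samples (assumed, not taken from any real dataset)
samples = [
    "eval ( base64_decode ( $_POST [ cmd ] ) )",
    "echo hello world",
    "system ( $_GET [ cmd ] )",
]

# A larger cap yields longer vectors: more detail, more computation per file.
for cap in (3, 5, 10):
    vocab = build_vocabulary(samples, cap)
    print(cap, len(vectorize(samples[0], vocab)))
```

The vector length grows with the cap, which is the accuracy versus cost trade-off the feature-count tests explored.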

Conclusion

After thorough evaluation, we concluded that 5,000 features provide the best compromise between model accuracy and computational efficiency. This configuration offers robust webshell detection capabilities without the immediate need for a feedback loop mechanism.

Future Considerations

While not currently implemented, we remain open to introducing a feedback loop in future versions if user experiences and evolving threat landscapes indicate a need for continuous model refinement.

Search the Content of Web Shells

Current Status

We initially tested content searching using the syntax content:"cmd". However, this feature was removed from the initial release due to performance issues.

Reasons for Removal

Increased search delay, significantly impacting user experience
Operational challenges in implementing it correctly
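
To illustrate why content search can add delay, here is a minimal sketch that parses the `content:"..."` syntax and scans file bodies one by one. How ShellSweepX actually implemented the removed feature is not documented here, so the linear scan, the in-memory `files` mapping, and the function names are all assumptions.

```python
import re

# Matches the content:"..." query syntax mentioned above.
CONTENT_QUERY = re.compile(r'content:"([^"]*)"')

def parse_content_query(query):
    """Return the quoted search term, or None if there is no content: clause."""
    match = CONTENT_QUERY.search(query)
    return match.group(1) if match else None

def search_contents(files, term):
    """Naive scan: every stored file body is checked on every query."""
    return [name for name, body in files.items() if term in body]

# Hypothetical stored files, keyed by name
files = {
    "shell.php": "<?php system($_GET['cmd']); ?>",
    "index.html": "<h1>Welcome</h1>",
}

term = parse_content_query('content:"cmd"')
if term is not None:
    print(search_contents(files, term))
```

A scan like this touches every stored file on each query, so its cost grows with the size of the database. That is one plausible source of the delay described above.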

Future Plans

We aim to reintroduce this feature in a future update, focusing on:

Optimizing search performance
Improving the accuracy of content matching
Balancing functionality with user experience

SSL/TLS Implementation

Current Status

SSL/TLS is not currently implemented in ShellSweepX.

Considerations

Internal Hosting: The primary use case for ShellSweepX is internal network deployment, which may reduce the immediate need for SSL/TLS.
Security: While internal hosting provides some security, implementing SSL/TLS would enhance data protection and privacy.

Future Plans

Evaluate the necessity of SSL/TLS based on user feedback and deployment scenarios
Potentially include SSL/TLS support in future releases to enhance security for various deployment options

Note to Users

If deploying ShellSweepX in an environment where additional security layers are required, consider implementing network-level security measures in the interim.

Known Issues

Scikit-learn Version Mismatch Warning

When starting the server, you may encounter warnings about unpickling estimators from an older version of scikit-learn. This is due to a version mismatch between the saved models (v1.3.0) and the currently installed scikit-learn (v1.5.1).

To resolve this issue:

Option 1 (recommended): Update pickled models
- Retrain and save models using scikit-learn 1.5.1
- Update relevant code to use the new models
Option 2 (alternative): Downgrade scikit-learn
```
pip install scikit-learn==1.3.0
```
Option 3 (temporary, not recommended for production): Add this code before loading models:
```
import warnings
# Note: this hides every UserWarning, not only the scikit-learn version warning
warnings.filterwarnings("ignore", category=UserWarning)
```

Note: Option 1 is the best long-term solution. Option 3 should only be used for temporary testing. That said, we have not noticed any issues caused by the mismatch, even with more than 7,000 files in the database.
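
As a hedged sketch of a narrower variant of the temporary workaround, the snippet below checks the installed scikit-learn version before loading and suppresses warnings only while the model is being unpickled. The helper names and the `path` argument are hypothetical. The version string `1.3.0` comes from the warning described above, and the sketch assumes the models are plain pickle files.

```python
import pickle
import warnings
from importlib import metadata

SAVED_MODEL_VERSION = "1.3.0"  # version the models were saved with, per the warning

def installed_version(package="scikit-learn"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_mismatch(saved, installed):
    """True when a version is installed and differs from the saved one."""
    return installed is not None and installed != saved

def load_model_quietly(path):
    """Unpickle a model with UserWarning suppressed for this call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        with open(path, "rb") as handle:
            return pickle.load(handle)

current = installed_version()
if version_mismatch(SAVED_MODEL_VERSION, current):
    print(f"Models saved with {SAVED_MODEL_VERSION}, installed is {current}")
```

Because `warnings.catch_warnings()` restores the previous filters on exit, unrelated warnings elsewhere in the server remain visible.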

Server and Database Load Management

Current Challenge

Sending a large number of files at once, for example from an entire server or a directory with many files, can overload the server and make it unresponsive.

Implemented Solution

To mitigate this issue, we've implemented a batching system:

$batchSize = 50
$waitTime = 20

This approach:

Sends files in batches of 50
Waits 20 seconds between each batch
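
The batching behaviour can be sketched as follows. This is an illustrative Python rendering of the idea, not the actual client script: `send` stands in for whatever uploads a single file, and the `sleep` parameter is an assumption added so the pause can be replaced in tests.

```python
import time

BATCH_SIZE = 50  # mirrors $batchSize above
WAIT_TIME = 20   # seconds, mirrors $waitTime above

def batches(items, size):
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def send_in_batches(files, send, batch_size=BATCH_SIZE,
                    wait_time=WAIT_TIME, sleep=time.sleep):
    """Send files batch by batch, pausing between batches (not after the last)."""
    sent = 0
    for index, batch in enumerate(batches(files, batch_size)):
        if index > 0:
            sleep(wait_time)  # give the server time to process the previous batch
        for item in batch:
            send(item)
        sent += len(batch)
    return sent
```

With these defaults, 120 files would go out as batches of 50, 50, and 20, with two 20-second pauses in between.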

Best Practices for Users

To optimize performance and prevent overload:

Target Specific Directories:
- Focus your scans on directories likely to contain web shells
- Avoid sweeping entire servers if possible
Utilize File Extensions:
- Tailor your scans to relevant file types
- Remember: PHP files are less likely on Windows servers, while ASPX files are uncommon on Linux
Consider Server Environment:
- Adjust your sweep parameters based on the server's OS and typical web technologies
Monitor Performance:
- Keep an eye on server responsiveness during scans
- Adjust batch size or wait time if needed
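
As a rough illustration of targeting directories and file extensions, the sketch below collects only files with the chosen extensions under one directory. The `EXTENSIONS_BY_OS` mapping and the `collect_targets` helper are hypothetical examples based on the guidance above and are not ShellSweepX settings.

```python
from pathlib import Path

# Illustrative defaults only; adjust to the web technologies actually in use.
EXTENSIONS_BY_OS = {
    "windows": {".aspx", ".asp"},
    "linux": {".php"},
}

def collect_targets(root, extensions):
    """Return files under root whose suffix matches one of extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```

For example, `collect_targets("/var/www/html", EXTENSIONS_BY_OS["linux"])` would list only PHP files under that web root (the path is an assumed example), which keeps the number of files sent to the server small.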

Future Improvements

We are continuously working to enhance the efficiency of the scanning process. Future updates may include:

Dynamic batch sizing based on server performance
More granular control over scan parameters
Improved server-side processing to handle larger volumes of data

Remember: Efficient and targeted sweeping not only prevents server overload but also improves the accuracy and speed of your web shell detection efforts.