Future Updates and Known Issues - splunk/ShellSweep GitHub Wiki
Future Updates
Feedback Loop and Model Optimization
Current Implementation
Unlike its predecessor ShellSweepML, ShellSweepX does not implement a feedback loop. We made this decision because the current model already performs well at detecting webshells.
Rationale
- Model Efficacy: The existing model demonstrates high accuracy in identifying webshells.
- Acceptable False Positive Rate: We've determined that the current low rate of false positives is within acceptable limits for operational use.
Feature Optimization
During our development process, we conducted extensive testing on feature selection:
- Initial Testing: Performed in ShellSweepML
- Feature Range: Experimented with varying numbers of features:
  - Tested with 10,000 features
  - Tested with 1,000 features
  - Settled on 5,000 features as the optimal balance
Conclusion
After thorough evaluation, we concluded that 5,000 features provide the best compromise between model accuracy and computational efficiency. This configuration offers robust webshell detection capabilities without the immediate need for a feedback loop mechanism.
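The page does not say how the features are extracted. Assuming they are token counts over file contents, capping the feature count works roughly like the minimal sketch below: only the max_features most frequent tokens across the training corpus are kept, similar in spirit to the max_features parameter of scikit-learn's text vectorizers. The tokenizer regex, build_vocabulary, and vectorize are hypothetical names for illustration, not ShellSweepX code.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifier-like tokens such as eval, base64_decode, Process
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_vocabulary(documents: list[str], max_features: int = 5000) -> dict[str, int]:
    """Map the max_features most frequent tokens (across all documents) to column indices."""
    counts = Counter()
    for doc in documents:
        counts.update(TOKEN_PATTERN.findall(doc))
    # Highest corpus frequency first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = sorted(term for term, _ in ranked[:max_features])
    return {term: index for index, term in enumerate(kept)}


def vectorize(document: str, vocabulary: dict[str, int]) -> list[int]:
    """Count vocabulary tokens in one document; tokens outside the vocabulary are dropped."""
    vector = [0] * len(vocabulary)
    for token in TOKEN_PATTERN.findall(document):
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector
```

Raising the cap (for example to 10,000) admits rarer tokens at the cost of larger vectors and slower training; lowering it (for example to 1,000) discards them. The 5,000 setting sits between those two ends.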
Future Considerations
While not currently implemented, we remain open to introducing a feedback loop in future versions if user experiences and evolving threat landscapes indicate a need for continuous model refinement.
Search the Content of Web Shells
Current Status
We initially tested content searching using the syntax content:"cmd". However, this feature was removed from the initial release because of performance issues.
Reasons for Removal
- Increased search delay, significantly impacting user experience
- Operational challenges in implementing it correctly
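To show why a content search can become slow, here is a minimal sketch of the simplest approach: parse a query of the form content:"cmd" and scan every stored file for the term. It assumes file contents are available in memory as a name-to-text mapping; parse_content_query and search_content are hypothetical helpers, not the code that was tested and removed.

```python
import re
from typing import Optional

# Accepts exactly one term in the form content:"<term>"
CONTENT_QUERY = re.compile(r'\s*content:"([^"]+)"\s*')


def parse_content_query(query: str) -> Optional[str]:
    """Return the quoted term from a query like content:"cmd", or None if it does not match."""
    match = CONTENT_QUERY.fullmatch(query)
    return match.group(1) if match else None


def search_content(query: str, files: dict[str, str]) -> list[str]:
    """Return the names of stored files whose text contains the term (case-sensitive).

    Every query reads the full text of every stored file, so the cost grows
    with the total amount of stored content, not with the number of hits.
    """
    term = parse_content_query(query)
    if term is None:
        raise ValueError(f"unsupported query: {query!r}")
    return sorted(name for name, text in files.items() if term in text)
```

With thousands of stored files, every search re-reads all of them, which is one plausible source of the delay listed above; an index built when files are ingested is a common way to avoid the full scan.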
Future Plans
We aim to reintroduce this feature in a future update, focusing on:
- Optimizing search performance
- Improving the accuracy of content matching
- Balancing functionality with user experience
SSL/TLS Implementation
Current Status
SSL/TLS is not currently implemented in ShellSweepX.
Considerations
- Internal Hosting: The primary use case for ShellSweepX is internal network deployment, which may reduce the immediate need for SSL/TLS.
- Security: While internal hosting provides some security, implementing SSL/TLS would enhance data protection and privacy.
Future Plans
- Evaluate the necessity of SSL/TLS based on user feedback and deployment scenarios
- Potentially include SSL/TLS support in future releases to enhance security for various deployment options
Note to Users
If deploying ShellSweepX in an environment where additional security layers are required, consider implementing network-level security measures in the interim.
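If encryption in transit is needed in the interim, the usual pattern for a Python HTTP server is to wrap its listening socket with an ssl.SSLContext. The sketch below uses only the standard library and assumes you already have a certificate and private key in PEM files; build_tls_context and serve_https are hypothetical helpers, not part of ShellSweepX, and SimpleHTTPRequestHandler is only a placeholder handler. A TLS-terminating reverse proxy in front of the server is a network-level alternative that needs no code changes.

```python
import ssl
from http.server import HTTPServer, SimpleHTTPRequestHandler
from typing import Optional


def build_tls_context(certfile: Optional[str] = None, keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that refuses protocol versions older than TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # PEM certificate chain plus private key (keyfile may be None if the key is inside certfile)
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def serve_https(host: str, port: int, certfile: str, keyfile: str) -> None:
    """Serve HTTPS with a placeholder handler; a real server would plug in its own handler."""
    context = build_tls_context(certfile, keyfile)
    httpd = HTTPServer((host, port), SimpleHTTPRequestHandler)
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()
```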
Known Issues
Scikit-learn Version Mismatch Warning
When starting the server, you may encounter warnings about unpickling estimators from an older version of scikit-learn. This is due to a version mismatch between the saved models (v1.3.0) and the currently installed scikit-learn (v1.5.1).
To resolve this issue:
- Option 1 (recommended): Update the pickled models
  - Retrain and save the models using scikit-learn 1.5.1
  - Update the relevant code to load the new models
- Option 2 (alternative): Downgrade scikit-learn to match the saved models
  pip install scikit-learn==1.3.0
- Option 3 (temporary; not recommended for production): Add this code before loading the models
  import warnings
  # Hides every UserWarning, including scikit-learn's version-mismatch warning
  warnings.filterwarnings("ignore", category=UserWarning)
Note: Option 1 is the best long-term solution, and Option 3 should only be used for temporary testing. That said, we have not noticed any issues in practice, even when handling more than 7,000 files in the database.
Server and Database Load Management
Current Challenge
Sending a large number of files simultaneously, especially from an entire server or a directory with numerous files, can potentially overload the server and cause unresponsiveness.
Implemented Solution
To mitigate this issue, we've implemented a batching system:
$batchSize = 50
$waitTime = 20
This approach:
- Sends files in batches of 50
- Waits 20 seconds between each batch
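The batching above is configured in PowerShell. Purely to illustrate the same loop, here is a Python sketch; the real sweep script may differ (for example, it may also pause after the last batch). send_in_batches is a hypothetical name, and send_batch stands in for whatever uploads one batch to the server.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def send_in_batches(
    items: Sequence[T],
    send_batch: Callable[[Sequence[T]], None],
    batch_size: int = 50,
    wait_time: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send items batch by batch, sleeping wait_time seconds between batches; return the batch count."""
    sent = 0
    for batch in batches(items, batch_size):
        if sent:
            sleep(wait_time)  # pause between batches, not before the first one
        send_batch(batch)
        sent += 1
    return sent
```

At these settings, sweeping 7,000 files means 140 batches and 139 pauses, or 2,780 seconds (about 46 minutes) of waiting alone, which is another reason to narrow the scan.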
Best Practices for Users
To optimize performance and prevent overload:
- Target Specific Directories:
  - Focus your scans on directories likely to contain web shells
  - Avoid sweeping entire servers if possible
- Utilize File Extensions:
  - Tailor your scans to relevant file types
  - Remember: PHP files are less likely on Windows servers, while ASPX files are uncommon on Linux
- Consider Server Environment:
  - Adjust your sweep parameters based on the server's OS and typical web technologies
- Monitor Performance:
  - Keep an eye on server responsiveness during scans
  - Adjust batch size or wait time if needed
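The directory and extension advice above can be combined into a simple pre-filter that runs before any files are sent. In this sketch, EXTENSIONS_BY_OS is a hypothetical, deliberately short starting list (not taken from ShellSweepX), and find_candidate_files is a hypothetical helper; extend the list to match the web technologies actually deployed on the server.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical starting points; adjust to the web stack actually running on the server.
EXTENSIONS_BY_OS = {
    "windows": {".aspx", ".asp", ".ashx", ".asmx"},
    "linux": {".php", ".phtml", ".jsp"},
}


def find_candidate_files(root: str, extensions: Iterable[str]) -> list[Path]:
    """Recursively list files under root whose extension (case-insensitive) is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )
```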
Future Improvements
We are continuously working to enhance the efficiency of the scanning process. Future updates may include:
- Dynamic batch sizing based on server performance
- More granular control over scan parameters
- Improved server-side processing to handle larger volumes of data
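Dynamic batch sizing is only a possible future update, and the page does not say how it would work. One common scheme is additive increase / multiplicative decrease (AIMD): grow the batch a little while the server responds quickly, and halve it when responses slow down. The thresholds below are arbitrary placeholders, and next_batch_size is a hypothetical helper.

```python
def next_batch_size(
    current: int,
    response_seconds: float,
    target_seconds: float = 5.0,
    minimum: int = 10,
    maximum: int = 200,
    step: int = 10,
) -> int:
    """AIMD update: add step when the last batch was fast, halve when it was slow."""
    if response_seconds > target_seconds:
        return max(minimum, current // 2)  # back off quickly under load
    return min(maximum, current + step)  # probe upward slowly while the server keeps up
```

Starting from the current fixed size of 50, a sequence of fast, fast, slow responses would yield batch sizes of 60, 70, then 35.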
Remember: Efficient and targeted sweeping not only prevents server overload but also improves the accuracy and speed of your web shell detection efforts.