Performance Optimization Guide
Overview
Ghost detects process injection, and its performance characteristics are configurable. This guide covers the optimization strategies available today and the performance you can expect.
Performance Characteristics
Expected Detection Engine Performance
- Process Enumeration: 10-50ms for all system processes
- Memory Region Analysis: 1-5ms per process (platform-dependent)
- Thread Enumeration: 1-10ms per process
- Detection Heuristics: <1ms per process
- Memory Usage: ~10-20MB for core engine
Note: Actual performance varies significantly by:
- Number of processes (100-1000+ typical)
- Memory region count per process
- Thread count per process
- Platform (Windows APIs vs Linux procfs)
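To make the ranges above concrete, the following sketch adds them up for a full sequential pass with every analysis enabled. The type and function names are hypothetical, and treating "<1ms" as a range of 0 to 1 ms is an assumption; the other figures are copied from the list above.

```rust
/// Millisecond range (low, high). Hypothetical helper type, not part of Ghost.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsRange {
    pub low: u64,
    pub high: u64,
}

// Figures from the list above.
const ENUMERATION: MsRange = MsRange { low: 10, high: 50 };
const MEMORY_PER_PROCESS: MsRange = MsRange { low: 1, high: 5 };
const THREADS_PER_PROCESS: MsRange = MsRange { low: 1, high: 10 };
// Assumption: "<1ms" is modelled as 0..1 ms.
const HEURISTICS_PER_PROCESS: MsRange = MsRange { low: 0, high: 1 };

/// Estimates one full sequential pass: a single enumeration plus the
/// per-process costs multiplied by the number of processes.
pub fn estimate_full_scan(process_count: u64) -> MsRange {
    let per_process_low =
        MEMORY_PER_PROCESS.low + THREADS_PER_PROCESS.low + HEURISTICS_PER_PROCESS.low;
    let per_process_high =
        MEMORY_PER_PROCESS.high + THREADS_PER_PROCESS.high + HEURISTICS_PER_PROCESS.high;
    MsRange {
        low: ENUMERATION.low + process_count * per_process_low,
        high: ENUMERATION.high + process_count * per_process_high,
    }
}
```

Under these assumptions, 500 processes give roughly 1.0 to 8.1 seconds per pass, which is a useful lower bound when choosing a scan interval.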
Configuration Options
1. Selective Detection
use ghost_core::config::DetectionConfig;
// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true; // Fast: O(n) memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false; // Skip module enumeration
config.thread_detection = true; // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics
2. Preset Modes
// Fast scanning mode
let config = DetectionConfig::performance_mode();
// Thorough scanning mode
let config = DetectionConfig::thorough_mode();
3. Process Filtering
// Skip system processes
config.skip_system_processes = true;
// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10MB per process
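This guide does not say how the engine applies max_memory_scan_size internally. One plausible reading is a per-process byte budget that is spent region by region, sketched below. Region and plan_reads are hypothetical stand-ins, not Ghost types.

```rust
/// Hypothetical stand-in for a memory region; not Ghost's real type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub base: usize,
    pub size: usize,
}

/// Returns (base, bytes_to_read) pairs in region order, truncating the last
/// read and stopping once the per-process budget is used up.
pub fn plan_reads(regions: &[Region], max_scan_size: usize) -> Vec<(usize, usize)> {
    let mut remaining = max_scan_size;
    let mut plan = Vec::new();
    for region in regions {
        if remaining == 0 {
            break; // Budget exhausted: later regions are not read.
        }
        let len = region.size.min(remaining);
        if len > 0 {
            plan.push((region.base, len));
        }
        remaining -= len;
    }
    plan
}
```

With a 10 MB budget and regions of 4 MB, 8 MB, and 1 MB, the plan reads the first region in full, 6 MB of the second, and nothing from the third. A lower budget therefore trades coverage of later regions for a bounded per-process cost.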
Performance Considerations
Platform-Specific Performance
Windows:
- CreateToolhelp32Snapshot: Single syscall, fast
- VirtualQueryEx: Iterative, slower for processes with many regions
- ReadProcessMemory: Cross-process, requires proper handles
- NtQueryInformationThread: Undocumented API call per thread
Linux:
- /proc enumeration: Directory reads, fast
- /proc/[pid]/maps parsing: File I/O, moderate
- /proc/[pid]/mem reading: Requires ptrace-level access to the target (same user, or CAP_SYS_PTRACE)
- /proc/[pid]/task parsing: Per-thread file I/O
macOS:
- sysctl KERN_PROC_ALL: Single syscall, fast
- Memory/thread analysis: Not yet implemented
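As an illustration of the Linux maps-parsing cost, the sketch below parses the address range and permission columns of /proc/[pid]/maps lines and keeps only read-write-execute regions. It uses only the standard library, and the type and function names are hypothetical rather than Ghost's own parser.

```rust
/// Address range and permission string of one /proc/[pid]/maps line.
#[derive(Debug, PartialEq, Eq)]
pub struct MapsEntry {
    pub start: u64,
    pub end: u64,
    pub perms: String,
}

impl MapsEntry {
    /// True when the region is readable, writable, and executable.
    pub fn is_rwx(&self) -> bool {
        self.perms.starts_with("rwx")
    }
}

/// Parses the first two columns of a maps line, e.g.
/// "7f1c2a000000-7f1c2a021000 rwxp 00000000 00:00 0".
/// Returns None for lines that do not match that layout.
pub fn parse_maps_line(line: &str) -> Option<MapsEntry> {
    let mut fields = line.split_whitespace();
    let range = fields.next()?;
    let perms = fields.next()?;
    let (start, end) = range.split_once('-')?;
    Some(MapsEntry {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms: perms.to_string(),
    })
}

/// Single pass over the file contents: O(n) in the number of regions.
pub fn rwx_regions(maps: &str) -> Vec<MapsEntry> {
    maps.lines()
        .filter_map(parse_maps_line)
        .filter(|entry| entry.is_rwx())
        .collect()
}
```

The work is one string split per line, so the cost scales with the region count per process, which is why processes with many mappings dominate scan time.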
Running Tests
# Run all tests including performance assertions
cargo test
# Run tests with timing output
cargo test -- --nocapture
Tuning Guidelines
For Continuous Monitoring
- Adjust scan interval: Configure scan_interval_ms in DetectionConfig
- Skip system processes: Set skip_system_processes = true
- Limit memory scans: Reduce max_memory_scan_size
- Disable heavy detections: Turn off hook_detection and shellcode_detection
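When driving scans from your own loop, a slow scan should shorten the following sleep rather than stretch the whole cycle. The sketch below assumes the interval is measured from scan start to scan start; how Ghost itself interprets scan_interval_ms is not stated here, and both function names are hypothetical.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Time left to sleep so that scans start once per interval.
/// Returns zero when the scan already took longer than the interval.
pub fn sleep_before_next_scan(scan_interval_ms: u64, scan_took: Duration) -> Duration {
    Duration::from_millis(scan_interval_ms).saturating_sub(scan_took)
}

/// Runs `scan_once` a fixed number of times, pacing the starts by the interval.
pub fn run_scans<F: FnMut()>(scan_interval_ms: u64, rounds: usize, mut scan_once: F) {
    for _ in 0..rounds {
        let started = Instant::now();
        scan_once();
        thread::sleep(sleep_before_next_scan(scan_interval_ms, started.elapsed()));
    }
}
```

If the computed sleep is frequently zero, the interval is shorter than the scan itself, and disabling the heavier detections or raising the interval is the next step.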
For One-Time Analysis
- Enable all detections: Use DetectionConfig::thorough_mode()
- Full memory scanning: Increase max_memory_scan_size
- Include system processes: Set skip_system_processes = false
Platform-Specific Optimizations
Windows
- Run as Administrator for full process access
- Use PROCESS_QUERY_LIMITED_INFORMATION when PROCESS_QUERY_INFORMATION fails
- Handle access denied errors gracefully (system processes)
Linux
- Run with appropriate privileges (root or CAP_SYS_PTRACE)
- Handle permission denied for /proc/[pid]/mem gracefully
- Consider using process groups for batch access
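Handling permission errors gracefully usually means separating "skip this process" from "something is actually wrong". The following is a minimal sketch of that classification for procfs reads using only the standard library. ReadFailure, classify_read_error, and read_maps are hypothetical names, and the ESRCH value is the Linux one.

```rust
use std::fs;
use std::io;

/// Linux errno for "No such process" (the target exited mid-read).
const ESRCH: i32 = 3;

/// Why a per-process read did not produce data. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadFailure {
    /// Insufficient privileges: skip the process and keep scanning.
    AccessDenied,
    /// The process exited during the scan: skip it silently.
    ProcessGone,
    /// Anything else is worth reporting.
    Failed(io::ErrorKind),
}

/// Maps an I/O error to a skip-or-report decision.
pub fn classify_read_error(err: &io::Error) -> ReadFailure {
    match err.kind() {
        io::ErrorKind::PermissionDenied => ReadFailure::AccessDenied,
        io::ErrorKind::NotFound => ReadFailure::ProcessGone,
        _ if err.raw_os_error() == Some(ESRCH) => ReadFailure::ProcessGone,
        other => ReadFailure::Failed(other),
    }
}

/// Reads /proc/<pid>/maps, classifying any failure instead of aborting.
pub fn read_maps(pid: u32) -> Result<String, ReadFailure> {
    fs::read_to_string(format!("/proc/{}/maps", pid)).map_err(|e| classify_read_error(&e))
}
```

A scan loop can then continue past AccessDenied and ProcessGone and only log or surface the Failed case.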
macOS
- Limited functionality (process enumeration only)
- Most detection features require kernel extensions or Endpoint Security framework
Troubleshooting Performance Issues
High CPU Usage
- Reduce scan frequency (scan_interval_ms)
- Disable thread analysis for each scan
- Skip memory region enumeration
- Filter out known-good processes
High Memory Usage
- Reduce the baseline cache size (track fewer processes)
- Clear detection history periodically
- Limit memory reading buffer sizes
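If your integration keeps its own detection history, capping it avoids the need for periodic clearing. The sketch below is a fixed-capacity history that drops the oldest entry when full. It is a hypothetical helper built on the standard library, not a structure Ghost provides, and it is a simple FIFO rather than an LRU cache.

```rust
use std::collections::vec_deque::{Iter, VecDeque};

/// Keeps at most `capacity` entries, discarding the oldest first.
pub struct BoundedHistory<T> {
    capacity: usize,
    entries: VecDeque<T>,
}

impl<T> BoundedHistory<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: VecDeque::with_capacity(capacity),
        }
    }

    /// Appends an entry, evicting the oldest one if the history is full.
    pub fn push(&mut self, entry: T) {
        if self.capacity == 0 {
            return; // A zero-capacity history stores nothing.
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(entry);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> Iter<'_, T> {
        self.entries.iter()
    }
}
```

Memory use is then bounded by the capacity times the entry size, regardless of how long monitoring runs.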
Slow Detection Response
- Disable hook detection (expensive module enumeration)
- Skip shellcode pattern matching
- Use performance preset mode
Current Implementation Limits
What's NOT implemented:
- No performance metrics collection system
- No Prometheus/monitoring integration
- No SIMD-accelerated pattern matching
- No parallel/async process scanning (single-threaded)
- No LRU caching of results
- No batch processing APIs
Current architecture:
- Sequential process scanning
- Simple HashMap for baseline tracking
- Basic confidence scoring
- Manual timer-based intervals (TUI)
Testing Performance
#[test]
fn test_detection_performance() {
    use std::time::Instant;
    let mut engine = DetectionEngine::new().unwrap();
    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
    let regions = vec![/* test regions */];
    let start = Instant::now();
    for _ in 0..100 {
        engine.analyze_process(&process, &regions, None);
    }
    let duration = start.elapsed();
    // Should complete 100 analyses in under 100ms
    assert!(duration.as_millis() < 100);
}
Best Practices
- Start with defaults: Use DetectionConfig::default() initially
- Profile specific modules: Identify which detection is slow
- Adjust based on needs: Disable features you don't need
- Handle errors gracefully: Processes may exit during scan
- Test on target hardware: Performance varies by system
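Since there is no built-in metrics collection, profiling a specific stage means timing it yourself. The sketch below is a small wall-clock accumulator you could wrap around calls in your own harness. StageTimer and the stage labels are hypothetical.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Accumulates wall-clock time per named stage. Hypothetical helper.
#[derive(Default)]
pub struct StageTimer {
    totals: BTreeMap<&'static str, Duration>,
}

impl StageTimer {
    /// Runs `work`, adds its elapsed time to `stage`, and returns its result.
    pub fn time<T>(&mut self, stage: &'static str, work: impl FnOnce() -> T) -> T {
        let started = Instant::now();
        let result = work();
        *self.totals.entry(stage).or_default() += started.elapsed();
        result
    }

    /// The stage with the largest accumulated time, if any was recorded.
    pub fn slowest(&self) -> Option<(&'static str, Duration)> {
        self.totals
            .iter()
            .max_by_key(|(_, total)| **total)
            .map(|(stage, total)| (*stage, *total))
    }
}
```

Wrapping each detection in a call such as timer.time("hooks", || ...) over a few scans shows which one to disable first.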
Future Performance Improvements
Potential enhancements (not yet implemented):
- Parallel process analysis using rayon
- Async I/O for file system operations (Linux)
- Result caching with TTL
- Incremental scanning (only changed processes)
- Memory-mapped file parsing
- SIMD pattern matching for shellcode
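To give a sense of what parallel process analysis could look like, the sketch below splits a slice of work items across scoped threads. It uses std::thread::scope (Rust 1.63+) instead of rayon so that it has no dependencies. The function name and the generic analyze callback are hypothetical, and none of this exists in Ghost today.

```rust
use std::thread;

/// Applies `analyze` to every item using up to `workers` threads and
/// returns the results in the original item order.
pub fn analyze_in_parallel<T, R, F>(items: &[T], workers: usize, analyze: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = workers.max(1);
    // Ceiling division so every item lands in exactly one chunk.
    let chunk_size = ((items.len() + workers - 1) / workers).max(1);
    let analyze = &analyze;
    thread::scope(|scope| {
        // Spawn one thread per chunk before joining any of them.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(analyze).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because per-process analysis is mostly independent I/O and parsing, chunking by process is the natural split. Shared state such as the baseline map would need synchronization, which this sketch leaves out.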