# Performance Optimization Guide

## Overview

Ghost is designed for process injection detection with configurable performance characteristics. This guide covers actual optimization strategies and expected performance.

## Performance Characteristics

### Expected Detection Engine Performance

- **Process Enumeration**: 10-50ms for all system processes
- **Memory Region Analysis**: 1-5ms per process (platform-dependent)
- **Thread Enumeration**: 1-10ms per process
- **Detection Heuristics**: <1ms per process
- **Memory Usage**: ~10-20MB for core engine

**Note**: Actual performance varies significantly by:

- Number of processes (100-1000+ typical)
- Memory region count per process
- Thread count per process
- Platform (Windows APIs vs Linux procfs)

### Configuration Options

#### 1. Selective Detection

```rust
use ghost_core::config::DetectionConfig;

// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true;        // Fast: O(n) memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false;      // Skip module enumeration
config.thread_detection = true;     // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics
```

#### 2. Preset Modes

```rust
// Fast scanning mode
let config = DetectionConfig::performance_mode();

// Thorough scanning mode
let config = DetectionConfig::thorough_mode();
```

#### 3. Process Filtering

```rust
// Skip system processes
config.skip_system_processes = true;

// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10MB per process
```

## Performance Considerations

### Platform-Specific Performance

**Windows**:

- CreateToolhelp32Snapshot: Single syscall, fast
- VirtualQueryEx: Iterative, slower for processes with many regions
- ReadProcessMemory: Cross-process, requires proper handles
- NtQueryInformationThread: Undocumented API call per thread

**Linux**:

- /proc enumeration: Directory reads, fast
- /proc/[pid]/maps parsing: File I/O, moderate
- /proc/[pid]/mem reading: Requires ptrace or same user
- /proc/[pid]/task parsing: Per-thread file I/O

**macOS**:

- sysctl KERN_PROC_ALL: Single syscall, fast
- Memory/thread analysis: Not yet implemented

### Running Tests

```bash
# Run all tests including performance assertions
cargo test

# Run tests with timing output
cargo test -- --nocapture
```

## Tuning Guidelines

### For Continuous Monitoring

1. **Adjust scan interval**: Configure `scan_interval_ms` in DetectionConfig
2. **Skip system processes**: Set `skip_system_processes = true`
3. **Limit memory scans**: Reduce `max_memory_scan_size`
4. **Disable heavy detections**: Turn off hook_detection and shellcode_detection

### For One-Time Analysis

1. **Enable all detections**: Use `DetectionConfig::thorough_mode()`
2. **Full memory scanning**: Increase `max_memory_scan_size`
3. **Include system processes**: Set `skip_system_processes = false`

## Platform-Specific Optimizations

### Windows

- Run as Administrator for full process access
- Use `PROCESS_QUERY_LIMITED_INFORMATION` when `PROCESS_QUERY_INFORMATION` fails
- Handle access denied errors gracefully (system processes)

### Linux

- Run with appropriate privileges (root or CAP_SYS_PTRACE)
- Handle permission denied for /proc/[pid]/mem gracefully
- Consider using process groups for batch access

### macOS

- Limited functionality (process enumeration only)
- Most detection features require kernel extensions or Endpoint Security framework

## Troubleshooting Performance Issues

### High CPU Usage

1. Reduce scan frequency (`scan_interval_ms`)
2. Disable thread analysis for each scan
3. Skip memory region enumeration
4. Filter out known-good processes

### High Memory Usage

1. Reduce baseline cache size (limited processes tracked)
2. Clear detection history periodically
3. Limit memory reading buffer sizes

### Slow Detection Response

1. Disable hook detection (expensive module enumeration)
2. Skip shellcode pattern matching
3. Use performance preset mode

## Current Implementation Limits

**What's NOT implemented**:

- No performance metrics collection system
- No Prometheus/monitoring integration
- No SIMD-accelerated pattern matching
- No parallel/async process scanning (single-threaded)
- No LRU caching of results
- No batch processing APIs

**Current architecture**:

- Sequential process scanning
- Simple HashMap for baseline tracking
- Basic confidence scoring
- Manual timer-based intervals (TUI)

## Testing Performance

```rust
#[test]
fn test_detection_performance() {
    use std::time::Instant;

    let mut engine = DetectionEngine::new().unwrap();
    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
    let regions = vec![/* test regions */];

    let start = Instant::now();
    for _ in 0..100 {
        engine.analyze_process(&process, &regions, None);
    }
    let duration = start.elapsed();

    // Should complete 100 analyses in under 100ms
    assert!(duration.as_millis() < 100);
}
```

## Best Practices

1. **Start with defaults**: Use `DetectionConfig::default()` initially
2. **Profile specific modules**: Identify which detection is slow
3. **Adjust based on needs**: Disable features you don't need
4. **Handle errors gracefully**: Processes may exit during scan
5. **Test on target hardware**: Performance varies by system

## Future Performance Improvements

Potential enhancements (not yet implemented):

- Parallel process analysis using rayon
- Async I/O for file system operations (Linux)
- Result caching with TTL
- Incremental scanning (only changed processes)
- Memory-mapped file parsing
- SIMD pattern matching for shellcode
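
To make the "result caching with TTL" idea above more concrete, here is a minimal sketch using only the standard library. It is not part of Ghost: `CachedResult`, `TtlCache`, and all of their methods are hypothetical names chosen for illustration, and the real detection result type would likely carry more fields. Timestamps are passed in explicitly so that expiry behaviour can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cached verdict for one process (not a Ghost type).
#[derive(Debug, Clone, PartialEq)]
pub struct CachedResult {
    pub suspicious: bool,
    pub confidence: f32,
}

/// Hypothetical TTL cache keyed by PID (assumption: PID alone identifies a process).
pub struct TtlCache {
    ttl: Duration,
    entries: HashMap<u32, (Instant, CachedResult)>,
}

impl TtlCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a result together with the time it was produced.
    pub fn insert_at(&mut self, pid: u32, result: CachedResult, now: Instant) {
        self.entries.insert(pid, (now, result));
    }

    /// Return the cached result only if it is younger than the TTL at `now`.
    pub fn get_at(&self, pid: u32, now: Instant) -> Option<&CachedResult> {
        self.entries.get(&pid).and_then(|(stored, result)| {
            if now.saturating_duration_since(*stored) < self.ttl {
                Some(result)
            } else {
                None
            }
        })
    }

    /// Drop entries that have expired at `now`; returns how many were removed.
    pub fn purge_expired_at(&mut self, now: Instant) -> usize {
        let before = self.entries.len();
        let ttl = self.ttl;
        self.entries
            .retain(|_, (stored, _)| now.saturating_duration_since(*stored) < ttl);
        before - self.entries.len()
    }
}
```

A scan loop could call `get_at(pid, Instant::now())` before analyzing a process and skip the analysis on a hit, with `purge_expired_at` run once per scan to bound memory. Keying on PID alone is a simplification, since the operating system can reuse PIDs; pairing the PID with the process start time would be a safer key in practice.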