feat: Add PE header validation and LD_PRELOAD detection

2025-11-17 22:02:41 +02:00
parent 96b0d12099
commit b1f098571d
15 changed files with 2708 additions and 459 deletions
--- a/docs/PERFORMANCE_GUIDE.md
+++ b/docs/PERFORMANCE_GUIDE.md
@@ -2,298 +2,197 @@

 ## Overview

-Ghost is designed for high-performance real-time detection with minimal system impact. This guide covers optimization strategies and performance monitoring.
+Ghost is designed for process injection detection with configurable performance characteristics. This guide covers actual optimization strategies and expected performance.

 ## Performance Characteristics

-### Detection Engine Performance
+### Expected Detection Engine Performance

- **Scan Speed**: 500-1000 processes/second on modern hardware
- **Memory Usage**: 50-100MB base footprint
- **CPU Impact**: <2% during active monitoring
- **Latency**: <10ms detection response time
+- **Process Enumeration**: 10-50ms for all system processes
+- **Memory Region Analysis**: 1-5ms per process (platform-dependent)
+- **Thread Enumeration**: 1-10ms per process
+- **Detection Heuristics**: <1ms per process
+- **Memory Usage**: ~10-20MB for core engine

-### Optimization Techniques
+**Note**: Actual performance varies significantly by:
+- Number of processes (100-1000+ typical)
+- Memory region count per process
+- Thread count per process
+- Platform (Windows APIs vs Linux procfs)

-#### 1. Selective Scanning
+### Configuration Options
+
+#### 1. Selective Detection

 ```rust
-// Configure detection modules based on threat landscape
-let mut config = DetectionConfig::new();
-config.enable_shellcode_detection(true);
-config.enable_hook_detection(false); // Disable if not needed
-config.enable_anomaly_detection(true);
+use ghost_core::config::DetectionConfig;
+
+// Disable expensive detections for performance
+let mut config = DetectionConfig::default();
+config.rwx_detection = true;      // Fast: O(n) memory regions
+config.shellcode_detection = false; // Skip pattern matching
+config.hook_detection = false;    // Skip module enumeration
+config.thread_detection = true;   // Moderate: thread enum
+config.hollowing_detection = false; // Skip heuristics
 ```

-#### 2. Batch Processing
+#### 2. Preset Modes

 ```rust
-// Process multiple items in batches for efficiency
-let processes = enumerate_processes()?;
-let results: Vec<DetectionResult> = processes
-    .chunks(10)
-    .flat_map(|chunk| engine.analyze_batch(chunk))
-    .collect();
+// Fast scanning mode
+let config = DetectionConfig::performance_mode();
+
+// Thorough scanning mode
+let config = DetectionConfig::thorough_mode();
 ```

-#### 3. Memory Pool Management
+#### 3. Process Filtering

 ```rust
-// Pre-allocate memory pools to reduce allocations
-pub struct MemoryPool {
-    process_buffers: Vec<ProcessBuffer>,
-    detection_results: Vec<DetectionResult>,
-}
+// Skip system processes
+config.skip_system_processes = true;
+
+// Limit memory scan size
+config.max_memory_scan_size = 10 * 1024 * 1024; // 10MB per process
 ```

-## Performance Monitoring
+## Performance Considerations

-### Built-in Metrics
+### Platform-Specific Performance

-```rust
-use ghost_core::metrics::PerformanceMonitor;
+**Windows**:
+- CreateToolhelp32Snapshot: Single syscall, fast
+- VirtualQueryEx: Iterative, slower for processes with many regions
+- ReadProcessMemory: Cross-process, requires proper handles
+- NtQueryInformationThread: Undocumented API call per thread

-let monitor = PerformanceMonitor::new();
-monitor.start_collection();
+**Linux**:
+- /proc enumeration: Directory reads, fast
+- /proc/[pid]/maps parsing: File I/O, moderate
+- /proc/[pid]/mem reading: Requires ptrace or same user
+- /proc/[pid]/task parsing: Per-thread file I/O

-// Detection operations...
+**macOS**:
+- sysctl KERN_PROC_ALL: Single syscall, fast
+- Memory/thread analysis: Not yet implemented

-let stats = monitor.get_statistics();
-println!("Avg scan time: {:.2}ms", stats.avg_scan_time);
-println!("Memory usage: {}MB", stats.memory_usage_mb);
-```
-
-### Custom Benchmarks
+### Running Tests

 ```bash
-# Run comprehensive benchmarks
-cargo bench
+# Run all tests including performance assertions
+cargo test

-# Profile specific operations
-cargo bench -- shellcode_detection
-cargo bench -- process_enumeration
+# Run tests with timing output
+cargo test -- --nocapture
 ```

 ## Tuning Guidelines

-### For High-Volume Environments
+### For Continuous Monitoring

-1. **Increase batch sizes**: Process 20-50 items per batch
-2. **Reduce scan frequency**: 2-5 second intervals
-3. **Enable result caching**: Cache stable process states
-4. **Use filtered scanning**: Skip known-good processes
+1. **Adjust scan interval**: Configure `scan_interval_ms` in DetectionConfig
+2. **Skip system processes**: Set `skip_system_processes = true`
+3. **Limit memory scans**: Reduce `max_memory_scan_size`
+4. **Disable heavy detections**: Turn off hook_detection and shellcode_detection

-### For Low-Latency Requirements
+### For One-Time Analysis

-1. **Decrease batch sizes**: Process 1-5 items per batch
-2. **Increase scan frequency**: Sub-second intervals
-3. **Disable heavy detections**: Skip complex ML analysis
-4. **Use memory-mapped scanning**: Direct memory access
-
-### Memory Optimization
-
-```rust
-// Configure memory limits
-let config = DetectionConfig {
-    max_memory_usage_mb: 200,
-    enable_result_compression: true,
-    cache_size_limit: 1000,
-    ..Default::default()
-};
-```
+1. **Enable all detections**: Use `DetectionConfig::thorough_mode()`
+2. **Full memory scanning**: Increase `max_memory_scan_size`
+3. **Include system processes**: Set `skip_system_processes = false`

 ## Platform-Specific Optimizations

 ### Windows

- Use `SetProcessWorkingSetSize` to limit memory
- Enable `SE_INCREASE_QUOTA_NAME` privilege for better access
- Leverage Windows Performance Toolkit (WPT) for profiling
+- Run as Administrator for full process access
+- Use `PROCESS_QUERY_LIMITED_INFORMATION` when `PROCESS_QUERY_INFORMATION` fails
+- Handle access denied errors gracefully (system processes)

 ### Linux

- Use `cgroups` for resource isolation
- Enable `CAP_SYS_PTRACE` for enhanced process access
- Leverage `perf` for detailed performance analysis
+- Run with appropriate privileges (root or CAP_SYS_PTRACE)
+- Handle permission denied for /proc/[pid]/mem gracefully
+- Consider using process groups for batch access
+
+### macOS
+
+- Limited functionality (process enumeration only)
+- Most detection features require kernel extensions or Endpoint Security framework

 ## Troubleshooting Performance Issues

 ### High CPU Usage

-1. Check scan frequency settings
-2. Verify filter effectiveness
-3. Profile detection module performance
-4. Consider disabling expensive detections
+1. Reduce scan frequency (`scan_interval_ms`)
+2. Disable thread analysis for each scan
+3. Skip memory region enumeration
+4. Filter out known-good processes

 ### High Memory Usage

-1. Monitor result cache sizes
-2. Check for memory leaks in custom modules
-3. Verify proper cleanup of process handles
-4. Consider reducing batch sizes
+1. Reduce baseline cache size (limited processes tracked)
+2. Clear detection history periodically
+3. Limit memory reading buffer sizes

 ### Slow Detection Response

-1. Profile individual detection modules
-2. Check system resource availability
-3. Verify network latency (if applicable)
-4. Consider async processing optimization
+1. Disable hook detection (expensive module enumeration)
+2. Skip shellcode pattern matching
+3. Use performance preset mode

-## Benchmarking Results
+## Current Implementation Limits

-### Baseline Performance (Intel i7-9700K, 32GB RAM)
+**What's NOT implemented**:
+- No performance metrics collection system
+- No Prometheus/monitoring integration
+- No SIMD-accelerated pattern matching
+- No parallel/async process scanning (single-threaded)
+- No LRU caching of results
+- No batch processing APIs

-```
-Process Enumeration:     2.3ms (avg)
-Shellcode Detection:     0.8ms per process
-Hook Detection:          1.2ms per process
-Anomaly Analysis:        3.5ms per process
-Full Scan (100 proc):    847ms total
-```
+**Current architecture**:
+- Sequential process scanning
+- Simple HashMap for baseline tracking
+- Basic confidence scoring
+- Manual timer-based intervals (TUI)

-### Memory Usage
-
-```
-Base Engine:            45MB
-+ Shellcode Patterns:   +12MB
-+ ML Models:           +23MB
-+ Result Cache:        +15MB (1000 entries)
-Total Runtime:         95MB typical
-```
-
-## Advanced Optimizations
-
-### SIMD Acceleration
+## Testing Performance

 ```rust
-// Enable SIMD for pattern matching
-#[cfg(target_feature = "avx2")]
-use std::arch::x86_64::*;
+#[test]
+fn test_detection_performance() {
+    use std::time::Instant;

-// Vectorized shellcode scanning
-unsafe fn simd_pattern_search(data: &[u8], pattern: &[u8]) -> bool {
-    // AVX2 accelerated pattern matching
-}
-```
+    let mut engine = DetectionEngine::new().unwrap();
+    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
+    let regions = vec![/* test regions */];

-### Multi-threading
-
-```rust
-use rayon::prelude::*;
-
-// Parallel process analysis
-let results: Vec<DetectionResult> = processes
-    .par_iter()
-    .map(|process| engine.analyze_process(process))
-    .collect();
-```
-
-### Caching Strategies
-
-```rust
-use lru::LruCache;
-
-pub struct DetectionCache {
-    process_hashes: LruCache<u32, u64>,
-    shellcode_results: LruCache<u64, bool>,
-    anomaly_profiles: LruCache<u32, ProcessProfile>,
-}
-```
-
-## Monitoring Dashboard Integration
-
-### Prometheus Metrics
-
-```rust
-use prometheus::{Counter, Histogram, Gauge};
-
-lazy_static! {
-    static ref SCAN_DURATION: Histogram = Histogram::new(
-        "ghost_scan_duration_seconds",
-        "Time spent scanning processes"
-    ).unwrap();
-    
-    static ref DETECTIONS_TOTAL: Counter = Counter::new(
-        "ghost_detections_total",
-        "Total number of detections"
-    ).unwrap();
-}
-```
-
-### Real-time Monitoring
-
-```rust
-// WebSocket-based real-time metrics
-pub struct MetricsServer {
-    connections: Vec<WebSocket>,
-    metrics_collector: PerformanceMonitor,
-}
-
-impl MetricsServer {
-    pub async fn broadcast_metrics(&self) {
-        let metrics = self.metrics_collector.get_real_time_stats();
-        let json = serde_json::to_string(&metrics).unwrap();
-        
-        for connection in &self.connections {
-            connection.send(json.clone()).await.ok();
-        }
+    let start = Instant::now();
+    for _ in 0..100 {
+        engine.analyze_process(&process, &regions, None);
    }
+    let duration = start.elapsed();
+
+    // Should complete 100 analyses in under 100ms
+    assert!(duration.as_millis() < 100);
 }
 ```

 ## Best Practices

-1. **Profile First**: Always benchmark before optimizing
-2. **Measure Impact**: Quantify optimization effectiveness
-3. **Monitor Production**: Continuous performance monitoring
-4. **Gradual Tuning**: Make incremental adjustments
-5. **Document Changes**: Track optimization history
+1. **Start with defaults**: Use `DetectionConfig::default()` initially
+2. **Profile specific modules**: Identify which detection is slow
+3. **Adjust based on needs**: Disable features you don't need
+4. **Handle errors gracefully**: Processes may exit during scan
+5. **Test on target hardware**: Performance varies by system

-## Performance Testing Framework
+## Future Performance Improvements

-```rust
-#[cfg(test)]
-mod performance_tests {
-    use super::*;
-    use std::time::Instant;
-    
-    #[test]
-    fn benchmark_full_system_scan() {
-        let engine = DetectionEngine::new().unwrap();
-        let start = Instant::now();
-        
-        let results = engine.scan_all_processes().unwrap();
-        let duration = start.elapsed();
-        
-        assert!(duration.as_millis() < 5000, "Scan took too long");
-        assert!(results.len() > 0, "No processes detected");
-    }
-    
-    #[test]
-    fn memory_usage_benchmark() {
-        let initial = get_memory_usage();
-        let engine = DetectionEngine::new().unwrap();
-        
-        // Perform operations
-        for _ in 0..1000 {
-            engine.analyze_dummy_process();
-        }
-        
-        let final_usage = get_memory_usage();
-        let growth = final_usage - initial;
-        
-        assert!(growth < 50_000_000, "Memory usage grew too much: {}MB", 
-                growth / 1_000_000);
-    }
-}
-```
-
-## Conclusion
-
-Ghost's performance can be fine-tuned for various deployment scenarios. Regular monitoring and benchmarking ensure optimal operation while maintaining security effectiveness.
-
-For additional performance support, see:
-
- [Profiling Guide](PROFILING.md)
- [Deployment Strategies](DEPLOYMENT.md)
- [Scaling Recommendations](SCALING.md)
+Potential enhancements (not yet implemented):
+- Parallel process analysis using rayon
+- Async I/O for file system operations (Linux)
+- Result caching with TTL
+- Incremental scanning (only changed processes)
+- Memory-mapped file parsing
+- SIMD pattern matching for shellcode