Performance Optimization Guide

Overview

Ghost is a process injection detector with configurable performance characteristics. This guide covers the tuning options that exist today and the performance you can expect from them.

Performance Characteristics

Expected Detection Engine Performance

  • Process Enumeration: 10-50ms for all system processes
  • Memory Region Analysis: 1-5ms per process (platform-dependent)
  • Thread Enumeration: 1-10ms per process
  • Detection Heuristics: <1ms per process
  • Memory Usage: ~10-20MB for core engine

Note: Actual performance varies significantly by:

  • Number of processes (100-1000+ typical)
  • Memory region count per process
  • Thread count per process
  • Platform (Windows APIs vs Linux procfs)

Configuration Options

1. Selective Detection

use ghost_core::config::DetectionConfig;

// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true;      // Fast: O(n) in the number of memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false;    // Skip module enumeration
config.thread_detection = true;   // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics

2. Preset Modes

// Fast scanning mode
let config = DetectionConfig::performance_mode();

// Thorough scanning mode
let config = DetectionConfig::thorough_mode();

3. Process Filtering

let mut config = DetectionConfig::default();

// Skip system processes
config.skip_system_processes = true;

// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10 MiB per process
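This guide does not show how Ghost applies `max_memory_scan_size` internally. As one plausible interpretation, a per-process byte budget could be spent region by region until it runs out. The helper `plan_region_reads` below is hypothetical and only illustrates that idea.

```rust
/// Hypothetical helper (not part of Ghost): given region sizes in bytes,
/// returns how many bytes to read from each region so that the total
/// never exceeds `max_scan_size`.
fn plan_region_reads(region_sizes: &[usize], max_scan_size: usize) -> Vec<usize> {
    let mut remaining = max_scan_size;
    let mut plan = Vec::with_capacity(region_sizes.len());
    for &size in region_sizes {
        // Read the whole region if the budget allows, otherwise what is left.
        let take = size.min(remaining);
        plan.push(take);
        remaining -= take;
    }
    plan
}
```

Under this scheme, regions later in the address space are the first to be truncated or skipped, so a smaller budget trades coverage of those regions for speed.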

Performance Considerations

Platform-Specific Performance

Windows:

  • CreateToolhelp32Snapshot: Single API call that snapshots all processes, fast
  • VirtualQueryEx: Iterative, slower for processes with many regions
  • ReadProcessMemory: Cross-process, requires proper handles
  • NtQueryInformationThread: Native API with limited official documentation, one call per thread

Linux:

  • /proc enumeration: Directory reads, fast
  • /proc/[pid]/maps parsing: File I/O, moderate
  • /proc/[pid]/mem reading: Requires ptrace-level access to the target (typically the same user with a permissive ptrace_scope, or CAP_SYS_PTRACE)
  • /proc/[pid]/task parsing: Per-thread file I/O
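To make the cost of `/proc/[pid]/maps` parsing concrete, here is a minimal sketch of parsing one line and checking for a read-write-execute mapping. It relies only on the documented maps format (`start-end perms offset dev inode [path]`). The type and function names are illustrative, not Ghost's actual parser.

```rust
/// One parsed line of /proc/[pid]/maps (address range and permissions only).
#[derive(Debug, PartialEq)]
struct MapsEntry {
    start: u64,
    end: u64,
    perms: String,
}

/// Parses the first two fields of a maps line; returns None if malformed.
fn parse_maps_line(line: &str) -> Option<MapsEntry> {
    let mut fields = line.split_whitespace();
    let range = fields.next()?;
    let perms = fields.next()?;
    // Addresses are hexadecimal without a "0x" prefix.
    let (start, end) = range.split_once('-')?;
    Some(MapsEntry {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms: perms.to_string(),
    })
}

/// True when the mapping is readable, writable and executable at once.
fn is_rwx(entry: &MapsEntry) -> bool {
    entry.perms.starts_with("rwx")
}
```

Each process needs one file read plus one such parse per mapping, which is why the cost grows with the number of memory regions.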

macOS:

  • sysctl KERN_PROC_ALL: Single syscall, fast
  • Memory/thread analysis: Not yet implemented

Running Tests

# Run all tests including performance assertions
cargo test

# Run tests without capturing stdout, so any printed timings are shown
cargo test -- --nocapture

Tuning Guidelines

For Continuous Monitoring

  1. Adjust scan interval: Configure scan_interval_ms in DetectionConfig
  2. Skip system processes: Set skip_system_processes = true
  3. Limit memory scans: Reduce max_memory_scan_size
  4. Disable heavy detections: Turn off hook_detection and shellcode_detection

For One-Time Analysis

  1. Enable all detections: Use DetectionConfig::thorough_mode()
  2. Full memory scanning: Increase max_memory_scan_size
  3. Include system processes: Set skip_system_processes = false

Platform-Specific Optimizations

Windows

  • Run as Administrator for full process access
  • Use PROCESS_QUERY_LIMITED_INFORMATION when PROCESS_QUERY_INFORMATION fails
  • Handle access denied errors gracefully (system processes)

Linux

  • Run with appropriate privileges (root or CAP_SYS_PTRACE)
  • Handle permission denied for /proc/[pid]/mem gracefully
  • Consider using process groups for batch access
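Handling these errors gracefully usually means skipping the affected process instead of aborting the whole scan. The sketch below shows one way to classify I/O errors from procfs reads. `ScanOutcome` and `classify_read_error` are illustrative names, not Ghost's actual error types.

```rust
use std::io;

/// Illustrative outcome for a failed per-process read (not Ghost's real type).
#[derive(Debug, PartialEq)]
enum ScanOutcome {
    /// The process is skipped and the scan continues.
    Skipped(&'static str),
    /// An unexpected error that is worth reporting.
    Failed,
}

fn classify_read_error(err: &io::Error) -> ScanOutcome {
    match err.kind() {
        // Expected without root or CAP_SYS_PTRACE.
        io::ErrorKind::PermissionDenied => ScanOutcome::Skipped("insufficient privileges"),
        // Opening a /proc entry for a process that has already exited.
        io::ErrorKind::NotFound => ScanOutcome::Skipped("process exited"),
        _ => ScanOutcome::Failed,
    }
}
```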

macOS

  • Limited functionality (process enumeration only)
  • Most detection features require kernel extensions or the Endpoint Security framework

Troubleshooting Performance Issues

High CPU Usage

  1. Reduce scan frequency (scan_interval_ms)
  2. Disable per-scan thread analysis (thread_detection = false)
  3. Skip memory region enumeration
  4. Filter out known-good processes
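Filtering known-good processes can be as simple as a name lookup before any expensive analysis runs. This is a minimal sketch that assumes a caller-maintained allowlist of lowercase process names. `should_scan` is a hypothetical helper and this guide does not describe a built-in allowlist.

```rust
use std::collections::HashSet;

/// Hypothetical pre-filter: returns false for processes whose name is on the
/// allowlist. The allowlist is assumed to contain lowercase names.
fn should_scan(process_name: &str, known_good: &HashSet<String>) -> bool {
    !known_good.contains(&process_name.to_ascii_lowercase())
}
```

Keep such a list short. Injection typically targets legitimate processes, so every name on the allowlist is also a place where a detection can be missed.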

High Memory Usage

  1. Reduce baseline cache size (track fewer processes)
  2. Clear detection history periodically
  3. Limit memory reading buffer sizes

Slow Detection Response

  1. Disable hook detection (expensive module enumeration)
  2. Skip shellcode pattern matching
  3. Use performance preset mode

Current Implementation Limits

What's NOT implemented:

  • No performance metrics collection system
  • No Prometheus/monitoring integration
  • No SIMD-accelerated pattern matching
  • No parallel/async process scanning (single-threaded)
  • No LRU caching of results
  • No batch processing APIs

Current architecture:

  • Sequential process scanning
  • Simple HashMap for baseline tracking
  • Basic confidence scoring
  • Manual timer-based intervals (TUI)

Testing Performance

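// Assumption: these types are exported from the ghost_core crate root;
// adjust the path to match the actual module layout.
use ghost_core::{DetectionEngine, ProcessInfo};

#[test]
fn test_detection_performance() {
    use std::time::Instant;

    let mut engine = DetectionEngine::new().unwrap();
    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
    let regions = vec![/* test regions */];

    let start = Instant::now();
    for _ in 0..100 {
        // Discard the result explicitly; only the elapsed time matters here.
        let _ = engine.analyze_process(&process, &regions, None);
    }
    let duration = start.elapsed();

    // Should complete 100 analyses in under 100ms
    assert!(
        duration.as_millis() < 100,
        "100 analyses took {duration:?}, expected under 100ms"
    );
}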

Best Practices

  1. Start with defaults: Use DetectionConfig::default() initially
  2. Profile specific modules: Identify which detection is slow
  3. Adjust based on needs: Disable features you don't need
  4. Handle errors gracefully: Processes may exit during scan
  5. Test on target hardware: Performance varies by system

Future Performance Improvements

Potential enhancements (not yet implemented):

  • Parallel process analysis using rayon
  • Async I/O for file system operations (Linux)
  • Result caching with TTL
  • Incremental scanning (only changed processes)
  • Memory-mapped file parsing
  • SIMD pattern matching for shellcode