pandaadir05 b1f098571d feat: Add PE header validation and LD_PRELOAD detection

2025-11-17 22:02:41 +02:00

Performance Optimization Guide

Overview

Ghost detects process injection, and its performance characteristics are configurable. This guide covers the optimization strategies available today and the performance you can expect from them.

Performance Characteristics

Expected Detection Engine Performance

Process Enumeration: 10-50ms for all system processes
Memory Region Analysis: 1-5ms per process (platform-dependent)
Thread Enumeration: 1-10ms per process
Detection Heuristics: <1ms per process
Memory Usage: ~10-20MB for core engine

Note: Actual performance varies significantly by:

Number of processes (100-1000+ typical)
Memory region count per process
Thread count per process
Platform (Windows APIs vs Linux procfs)
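The ranges above can be combined into a rough per-scan budget. The sketch below assumes that costs simply add up across processes (scanning is sequential) and treats the "<1ms" heuristics figure as 0 to 1 ms. `estimate_scan_ms` is a hypothetical helper for illustration, not part of ghost_core.

```rust
/// Rough lower/upper bound (in ms) for one full sequential scan,
/// built only from the per-stage ranges quoted above.
/// Illustrative helper; not a ghost_core API.
pub fn estimate_scan_ms(process_count: u64) -> (u64, u64) {
    // Process enumeration: 10-50 ms, paid once per scan.
    let (enum_lo, enum_hi) = (10, 50);
    // Per process: memory regions 1-5 ms, threads 1-10 ms, heuristics 0-1 ms.
    let per_proc_lo = 1 + 1 + 0;
    let per_proc_hi = 5 + 10 + 1;
    (
        enum_lo + process_count * per_proc_lo,
        enum_hi + process_count * per_proc_hi,
    )
}
```

For 200 processes this gives roughly 0.4 to 3.3 seconds per full scan with every stage enabled, which is why the options below focus on skipping stages and processes.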

Configuration Options

1. Selective Detection

use ghost_core::config::DetectionConfig;

// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true;      // Fast: O(n) memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false;    // Skip module enumeration
config.thread_detection = true;   // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics

2. Preset Modes

// Fast scanning mode
let config = DetectionConfig::performance_mode();

// Thorough scanning mode
let config = DetectionConfig::thorough_mode();

3. Process Filtering

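// Start from the defaults (or a preset), then narrow the scan scope
let mut config = DetectionConfig::default();

// Skip system processes
config.skip_system_processes = true;

// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10 MiB per process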
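How Ghost applies `max_memory_scan_size` internally is not described here. One plausible reading is a per-process byte budget that is spent region by region, sketched below. `Region` and `plan_reads` are hypothetical names, and the real ghost_core types may differ.

```rust
/// Hypothetical region descriptor; the real ghost_core type may differ.
pub struct Region {
    pub base: usize,
    pub size: usize,
}

/// Returns (base, bytes_to_read) pairs whose total never exceeds `max_scan`.
/// The last region that fits only partially is truncated to the remaining budget.
pub fn plan_reads(regions: &[Region], max_scan: usize) -> Vec<(usize, usize)> {
    let mut remaining = max_scan;
    let mut plan = Vec::new();
    for r in regions {
        if remaining == 0 {
            break; // budget spent: later regions are not read at all
        }
        let take = r.size.min(remaining);
        if take > 0 {
            plan.push((r.base, take));
            remaining -= take;
        }
    }
    plan
}
```

Under this interpretation a smaller budget trades coverage for speed: regions late in the address space may never be inspected, so a low limit is better suited to continuous monitoring than to one-time analysis.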

Performance Considerations

Platform-Specific Performance

Windows:

CreateToolhelp32Snapshot: Single syscall, fast
VirtualQueryEx: Iterative, slower for processes with many regions
ReadProcessMemory: Cross-process, requires proper handles
NtQueryInformationThread: Native (ntdll) API with only partial official documentation; one call per thread

Linux:

/proc enumeration: Directory reads, fast
/proc/[pid]/maps parsing: File I/O, moderate
/proc/[pid]/mem reading: Requires ptrace-level access (same user, subject to the system's ptrace restrictions, or CAP_SYS_PTRACE)
/proc/[pid]/task parsing: Per-thread file I/O
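To make the cost of /proc/[pid]/maps parsing concrete, here is a minimal sketch of parsing one line and flagging read-write-execute mappings. It relies only on the documented maps layout (`start-end perms offset dev inode [path]`). `MapsEntry` and `parse_maps_line` are illustrative names, not Ghost's actual parser.

```rust
/// One parsed line of /proc/[pid]/maps (only the fields needed here).
#[derive(Debug, PartialEq)]
pub struct MapsEntry {
    pub start: u64,
    pub end: u64,
    pub perms: String,
}

impl MapsEntry {
    /// True when the mapping is readable, writable and executable at once.
    pub fn is_rwx(&self) -> bool {
        self.perms.starts_with("rwx")
    }
}

/// Parses the address range and permission fields; returns None on malformed input.
pub fn parse_maps_line(line: &str) -> Option<MapsEntry> {
    let mut fields = line.split_whitespace();
    let range = fields.next()?;
    let perms = fields.next()?;
    let (start, end) = range.split_once('-')?;
    Some(MapsEntry {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms: perms.to_string(),
    })
}
```

The work is one pass over the file with two hex parses per line, so the cost grows linearly with the number of mappings in the process.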

macOS:

sysctl KERN_PROC_ALL: Single syscall, fast
Memory/thread analysis: Not yet implemented

Running Tests

# Run all tests including performance assertions
cargo test

# Run tests and print their output, including any timings they log
cargo test -- --nocapture

Tuning Guidelines

For Continuous Monitoring

Adjust scan interval: Configure scan_interval_ms in DetectionConfig
Skip system processes: Set skip_system_processes = true
Limit memory scans: Reduce max_memory_scan_size
Disable heavy detections: Turn off hook_detection and shellcode_detection
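The scan interval only helps if the loop accounts for how long each scan took. The sketch below shows one way to pace a timer-based loop. How Ghost consumes `scan_interval_ms` is not shown in this guide, so `sleep_after_scan` and `run_paced` are hypothetical helpers.

```rust
use std::time::{Duration, Instant};

/// Time left to sleep so that scans start roughly every `interval`.
/// Returns zero when the scan itself overran the interval.
pub fn sleep_after_scan(interval: Duration, scan_took: Duration) -> Duration {
    interval.saturating_sub(scan_took)
}

/// Runs `scan` `iterations` times, pacing the start of each run by `interval`.
pub fn run_paced<F: FnMut()>(interval: Duration, iterations: usize, mut scan: F) {
    for _ in 0..iterations {
        let started = Instant::now();
        scan();
        std::thread::sleep(sleep_after_scan(interval, started.elapsed()));
    }
}
```

If scans regularly overrun the interval, the sleep collapses to zero and the loop runs back to back. That is a sign to raise the interval or disable the heavier detections.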

For One-Time Analysis

Enable all detections: Use DetectionConfig::thorough_mode()
Full memory scanning: Increase max_memory_scan_size
Include system processes: Set skip_system_processes = false

Platform-Specific Optimizations

Windows

Run as Administrator for full process access
Use PROCESS_QUERY_LIMITED_INFORMATION when PROCESS_QUERY_INFORMATION fails
Handle access denied errors gracefully (system processes)

Linux

Run with appropriate privileges (root or CAP_SYS_PTRACE)
Handle permission denied for /proc/[pid]/mem gracefully
Consider using process groups for batch access
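Handling these errors gracefully usually means treating "no access" and "process already exited" as reasons to skip a process, not as reasons to fail the scan. Below is a minimal sketch using only the standard library. `ScanOutcome`, `classify` and `read_maps` are hypothetical names, and the ESRCH value is the Linux errno.

```rust
use std::io::{self, ErrorKind};

/// Why a per-process read did not produce data.
#[derive(Debug, PartialEq)]
pub enum ScanOutcome {
    SkipNoAccess,
    SkipExited,
    Fail,
}

/// "No such process" errno on Linux; seen when a process exits mid-read.
const ESRCH: i32 = 3;

/// Maps an I/O error from a /proc read to a skip-or-fail decision.
pub fn classify(err: &io::Error) -> ScanOutcome {
    match err.kind() {
        ErrorKind::PermissionDenied => ScanOutcome::SkipNoAccess,
        ErrorKind::NotFound => ScanOutcome::SkipExited,
        _ if err.raw_os_error() == Some(ESRCH) => ScanOutcome::SkipExited,
        _ => ScanOutcome::Fail,
    }
}

/// Reads /proc/[pid]/maps, converting errors into a ScanOutcome.
pub fn read_maps(pid: u32) -> Result<String, ScanOutcome> {
    std::fs::read_to_string(format!("/proc/{pid}/maps")).map_err(|e| classify(&e))
}
```

A scan loop can then count skipped processes separately from genuine failures, so that running without root does not drown real errors in permission noise.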

macOS

Limited functionality (process enumeration only)
Most detection features require kernel extensions or the Endpoint Security framework

Troubleshooting Performance Issues

High CPU Usage

Reduce scan frequency (scan_interval_ms)
Disable thread analysis (thread_detection = false)
Skip memory region enumeration
Filter out known-good processes
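Filtering known-good processes can be as simple as dropping entries before any per-process work starts. The sketch below matches on process name only, which is an assumption made for brevity. `Proc` and `scan_targets` are hypothetical and not part of ghost_core.

```rust
use std::collections::HashSet;

/// Hypothetical minimal process record.
pub struct Proc {
    pub pid: u32,
    pub name: String,
}

/// Returns the processes that still need scanning after removing allowlisted names.
pub fn scan_targets<'a>(procs: &'a [Proc], known_good: &HashSet<&str>) -> Vec<&'a Proc> {
    procs
        .iter()
        .filter(|p| !known_good.contains(p.name.as_str()))
        .collect()
}
```

Names are easy to spoof, and injection typically targets legitimate processes, so an allowlist like this trades detection coverage for CPU time. Matching on the executable path or a file hash is a stricter alternative.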

High Memory Usage

Reduce baseline cache size (track fewer processes)
Clear detection history periodically
Limit memory reading buffer sizes
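One way to keep detection history from growing without bound is a fixed-capacity buffer that evicts the oldest entry on overflow. This replaces periodic clearing with a hard size limit. `BoundedHistory` is a hypothetical type, not something Ghost provides.

```rust
use std::collections::VecDeque;

/// Keeps at most `cap` items, discarding the oldest first.
pub struct BoundedHistory<T> {
    cap: usize,
    items: VecDeque<T>,
}

impl<T> BoundedHistory<T> {
    pub fn new(cap: usize) -> Self {
        Self { cap, items: VecDeque::with_capacity(cap) }
    }

    /// Appends an item, evicting the oldest one when the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.cap == 0 {
            return; // a zero-capacity history stores nothing
        }
        if self.items.len() == self.cap {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Iterates from oldest to newest.
    pub fn iter(&self) -> impl Iterator<Item = &T> {
        self.items.iter()
    }
}
```

Memory use is then bounded by `cap` times the size of one record, regardless of how long the monitor runs.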

Slow Detection Response

Disable hook detection (expensive module enumeration)
Skip shellcode pattern matching
Use performance preset mode

Current Implementation Limits

What's NOT implemented:

No performance metrics collection system
No Prometheus/monitoring integration
No SIMD-accelerated pattern matching
No parallel/async process scanning (single-threaded)
No LRU caching of results
No batch processing APIs

Current architecture:

Sequential process scanning
Simple HashMap for baseline tracking
Basic confidence scoring
Manual timer-based intervals (TUI)

Testing Performance

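use std::time::Instant;

// Assumed import paths; adjust to wherever these types live in ghost_core.
use ghost_core::{DetectionEngine, ProcessInfo};

#[test]
fn test_detection_performance() {
    let mut engine = DetectionEngine::new().unwrap();
    let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
    let regions = vec![/* test regions */];

    let start = Instant::now();
    for _ in 0..100 {
        // Discard the result; only the elapsed time matters here.
        let _ = engine.analyze_process(&process, &regions, None);
    }
    let duration = start.elapsed();

    // Should complete 100 analyses in under 100ms.
    // Wall-clock limits are sensitive to build profile and machine load.
    assert!(
        duration.as_millis() < 100,
        "100 analyses took {duration:?}"
    );
}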

Best Practices

Start with defaults: Use DetectionConfig::default() initially
Profile specific modules: Identify which detection is slow
Adjust based on needs: Disable features you don't need
Handle errors gracefully: Processes may exit during scan
Test on target hardware: Performance varies by system
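Since no metrics collection is built in, profiling a specific module can start with a small timing wrapper around each stage. The sketch below uses only `std::time`. `timed` and `StageTimes` are hypothetical helpers, and the stage names are whatever labels the caller chooses.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Collects one timing per labelled stage.
#[derive(Default)]
pub struct StageTimes(pub Vec<(&'static str, Duration)>);

impl StageTimes {
    /// Times `f`, stores the duration under `stage`, and passes the result through.
    pub fn record<T>(&mut self, stage: &'static str, f: impl FnOnce() -> T) -> T {
        let (out, took) = timed(f);
        self.0.push((stage, took));
        out
    }

    /// Label of the stage with the longest recorded duration, if any.
    pub fn slowest(&self) -> Option<&'static str> {
        self.0.iter().max_by_key(|(_, d)| *d).map(|(name, _)| *name)
    }
}
```

Wrapping each detection call in `record` for a few scans is usually enough to see which feature to disable first.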

Future Performance Improvements

Potential enhancements (not yet implemented):

Parallel process analysis using rayon
Async I/O for file system operations (Linux)
Result caching with TTL
Incremental scanning (only changed processes)
Memory-mapped file parsing
SIMD pattern matching for shellcode
