feat: Add PE header validation and LD_PRELOAD detection

This commit is contained in:
pandaadir05
2025-11-17 22:02:41 +02:00
parent 96b0d12099
commit b1f098571d
15 changed files with 2708 additions and 459 deletions

View File

@@ -46,11 +46,37 @@ Monitors thread count changes over time. Sudden increases may indicate CreateRem
Threads created by external processes via CreateRemoteThread or NtCreateThreadEx.
**Detection Logic** (Planned):
- Compare thread creator PID with owner PID
- Check thread start addresses against known modules
**Detection Logic**:
- Enumerate threads using CreateToolhelp32Snapshot (Windows) or /proc/[pid]/task (Linux)
- Get thread start addresses via NtQueryInformationThread (Windows) or /proc syscall file (Linux)
- Get thread creation times via GetThreadTimes (Windows) or stat parsing (Linux)
- Track thread state (Running, Waiting, Suspended, Terminated)
- Flag threads starting in private memory regions
## Hook Detection
### Inline API Hooks
**MITRE ATT&CK**: T1055.003
Detects JMP patches at the start of critical API functions.
**Detection Logic**:
- Enumerate loaded modules in target process (EnumProcessModulesEx)
- Check entry points of critical APIs (ntdll, kernel32, user32)
- Detect common hook patterns:
- JMP rel32 (E9 xx xx xx xx)
- JMP [rip+disp32] (FF 25 xx xx xx xx)
- MOV RAX, imm64; JMP RAX (48 B8 ... FF E0)
- PUSH imm32; RET (68 xx xx xx xx C3)
**Critical APIs Monitored**:
- NtCreateThread, NtCreateThreadEx
- NtAllocateVirtualMemory, NtWriteVirtualMemory, NtProtectVirtualMemory
- VirtualAllocEx, WriteProcessMemory, CreateRemoteThread
- LoadLibraryA, LoadLibraryW
- SetWindowsHookExA, SetWindowsHookExW
## Heuristic Analysis
### Confidence Scoring
@@ -74,23 +100,37 @@ Ghost uses weighted confidence scoring:
### Windows
- [x] Classic DLL injection detection
- [x] Memory region analysis
- [x] Thread enumeration
- [x] Memory region analysis (VirtualQueryEx)
- [x] Memory reading (ReadProcessMemory)
- [x] Thread enumeration (CreateToolhelp32Snapshot)
- [x] Thread start addresses (NtQueryInformationThread)
- [x] Thread creation times (GetThreadTimes)
- [x] Inline hook detection (JMP pattern scanning)
- [x] Process hollowing heuristics
- [ ] APC injection detection
- [ ] Process hollowing detection
- [ ] Hook detection (IAT/EAT)
- [ ] Reflective DLL injection
- [ ] SetWindowsHookEx chain enumeration
- [ ] Reflective DLL injection signature matching
### Linux
- [ ] ptrace injection
- [ ] LD_PRELOAD detection
- [x] Process enumeration (/proc filesystem)
- [x] Memory region analysis (/proc/[pid]/maps)
- [x] Memory reading (/proc/[pid]/mem)
- [x] Thread enumeration (/proc/[pid]/task)
- [x] Thread state detection (stat parsing)
- [x] ptrace injection detection
- [x] LD_PRELOAD detection
- [ ] process_vm_writev monitoring
- [ ] Shared memory inspection
### macOS
- [ ] DYLD_INSERT_LIBRARIES
- [x] Process enumeration (sysctl KERN_PROC_ALL)
- [x] Process path retrieval (proc_pidpath)
- [ ] Memory enumeration (vm_region)
- [ ] Memory reading (vm_read)
- [ ] Thread enumeration (task_threads)
- [ ] DYLD_INSERT_LIBRARIES detection
- [ ] task_for_pid monitoring
- [ ] Mach port analysis

View File

@@ -46,7 +46,9 @@ Ghost detection engine coverage mapped to MITRE ATT&CK framework techniques.
- **Indicators**:
- Unmapped main executable image
- Suspicious memory gaps (>16MB)
- PE header mismatches
- PE header validation (DOS/NT signatures)
- Image base mismatches
- Corrupted PE structures
- Unusual entry point locations
- Memory layout anomalies
- **Confidence**: Very High (0.8-1.0)
@@ -121,35 +123,82 @@ Ghost detection engine coverage mapped to MITRE ATT&CK framework techniques.
| Technique | Detection Module | Implementation Status | Test Coverage |
|-----------|------------------|----------------------|---------------|
| T1055.001 | hooks.rs | ✅ Complete | ✅ Tested |
| T1055.002 | shellcode.rs | ✅ Complete | ✅ Tested |
| T1055.003 | thread.rs | ✅ Complete | ✅ Tested |
| T1055.004 | detection.rs | ⚠️ Partial | ✅ Tested |
| T1055.012 | hollowing.rs | ✅ Complete | ✅ Tested |
| T1027 | shellcode.rs | ✅ Complete | ✅ Tested |
| T1036 | process.rs | ⚠️ Partial | ❌ Pending |
| T1106 | detection.rs | ⚠️ Basic | ❌ Pending |
| T1055.001 | hooks.rs | ✅ Inline hooks + Linux LD_PRELOAD | ❌ Basic |
| T1055.002 | shellcode.rs | ⚠️ Heuristic only | ✅ Basic |
| T1055.003 | thread.rs | ✅ Thread enumeration | ✅ Unit tests |
| T1055.004 | detection.rs | ⚠️ Heuristic only | ✅ Basic |
| T1055.012 | hollowing.rs | ✅ PE header validation | ❌ Pending |
| T1027 | shellcode.rs | ⚠️ Basic patterns | ❌ Pending |
| T1036 | process.rs | ❌ Not implemented | ❌ Pending |
| T1106 | detection.rs | ❌ Not implemented | ❌ Pending |
**Implementation Status Legend**:
- ✅ Complete: Full implementation with actual API calls
- ⚠️ Partial: Heuristic-based or incomplete implementation
- ❌ Not implemented: Placeholder or missing
## Current Implementation Details
### What's Actually Implemented
1. **Memory Analysis** (memory.rs)
- Windows: VirtualQueryEx, ReadProcessMemory
- Linux: /proc/[pid]/maps parsing, /proc/[pid]/mem reading
- macOS: Not implemented
2. **Thread Analysis** (thread.rs)
- Windows: Thread32First/Next, NtQueryInformationThread, GetThreadTimes
- Linux: /proc/[pid]/task enumeration, stat parsing
- macOS: Not implemented
3. **Hook Detection** (hooks.rs)
- Windows: Inline hook detection via JMP pattern scanning
- Linux: LD_PRELOAD detection, LD_LIBRARY_PATH monitoring, ptrace detection
- Detects suspicious library loading from /tmp/, /dev/shm/, etc.
- Does NOT enumerate SetWindowsHookEx chains on Windows
- No IAT/EAT hook scanning (pattern detection only)
4. **Process Hollowing Detection** (hollowing.rs)
- Windows: Full PE header validation (DOS/NT signatures, image base)
- Detects corrupted PE structures
- Detects image base mismatches
- Memory layout anomaly detection
- Memory gap analysis
5. **Process Enumeration** (process.rs)
- Windows: CreateToolhelp32Snapshot
- Linux: /proc filesystem
- macOS: sysctl KERN_PROC_ALL
### What's NOT Implemented
- Actual shellcode signature database
- Entropy analysis for obfuscation detection
- SetWindowsHookEx chain parsing (Windows)
- APC injection detection
- MITRE ATT&CK technique attribution (framework only)
- process_vm_writev monitoring (Linux)
## Future Enhancements
### High Priority
- **T1055.008** - Ptrace System Calls (Linux)
- **T1055.009** - Proc Memory (Linux)
- **T1055.008** - Ptrace System Calls (Linux) - ✅ Basic detection implemented
- **T1055.013** - Process Doppelgänging
- **T1055.014** - VDSO Hijacking (Linux)
- Shellcode signature database
### Medium Priority
### Medium Priority
- **T1134** - Access Token Manipulation
- **T1548.002** - Bypass User Account Control
- **T1562.001** - Disable or Modify Tools
- SetWindowsHookEx chain enumeration
- IAT/EAT hook scanning
- LD_PRELOAD detection (Linux) - ✅ Implemented
### Research Areas
- Machine learning-based anomaly detection
- Graph analysis of process relationships
- Timeline analysis for attack progression
- Behavioral analysis over time
- Process relationship analysis
- Integration with threat intelligence feeds
## References

View File

@@ -2,298 +2,197 @@
## Overview
Ghost is designed for high-performance real-time detection with minimal system impact. This guide covers optimization strategies and performance monitoring.
Ghost is designed for process injection detection with configurable performance characteristics. This guide covers actual optimization strategies and expected performance.
## Performance Characteristics
### Detection Engine Performance
### Expected Detection Engine Performance
- **Scan Speed**: 500-1000 processes/second on modern hardware
- **Memory Usage**: 50-100MB base footprint
- **CPU Impact**: <2% during active monitoring
- **Latency**: <10ms detection response time
- **Process Enumeration**: 10-50ms for all system processes
- **Memory Region Analysis**: 1-5ms per process (platform-dependent)
- **Thread Enumeration**: 1-10ms per process
- **Detection Heuristics**: <1ms per process
- **Memory Usage**: ~10-20MB for core engine
### Optimization Techniques
**Note**: Actual performance varies significantly by:
- Number of processes (100-1000+ typical)
- Memory region count per process
- Thread count per process
- Platform (Windows APIs vs Linux procfs)
#### 1. Selective Scanning
### Configuration Options
#### 1. Selective Detection
```rust
// Configure detection modules based on threat landscape
let mut config = DetectionConfig::new();
config.enable_shellcode_detection(true);
config.enable_hook_detection(false); // Disable if not needed
config.enable_anomaly_detection(true);
use ghost_core::config::DetectionConfig;
// Disable expensive detections for performance
let mut config = DetectionConfig::default();
config.rwx_detection = true; // Fast: O(n) memory regions
config.shellcode_detection = false; // Skip pattern matching
config.hook_detection = false; // Skip module enumeration
config.thread_detection = true; // Moderate: thread enum
config.hollowing_detection = false; // Skip heuristics
```
#### 2. Batch Processing
#### 2. Preset Modes
```rust
// Process multiple items in batches for efficiency
let processes = enumerate_processes()?;
let results: Vec<DetectionResult> = processes
.chunks(10)
.flat_map(|chunk| engine.analyze_batch(chunk))
.collect();
// Fast scanning mode
let config = DetectionConfig::performance_mode();
// Thorough scanning mode
let config = DetectionConfig::thorough_mode();
```
#### 3. Memory Pool Management
#### 3. Process Filtering
```rust
// Pre-allocate memory pools to reduce allocations
pub struct MemoryPool {
process_buffers: Vec<ProcessBuffer>,
detection_results: Vec<DetectionResult>,
}
// Skip system processes
config.skip_system_processes = true;
// Limit memory scan size
config.max_memory_scan_size = 10 * 1024 * 1024; // 10MB per process
```
## Performance Monitoring
## Performance Considerations
### Built-in Metrics
### Platform-Specific Performance
```rust
use ghost_core::metrics::PerformanceMonitor;
**Windows**:
- CreateToolhelp32Snapshot: Single syscall, fast
- VirtualQueryEx: Iterative, slower for processes with many regions
- ReadProcessMemory: Cross-process, requires proper handles
- NtQueryInformationThread: Undocumented API call per thread
let monitor = PerformanceMonitor::new();
monitor.start_collection();
**Linux**:
- /proc enumeration: Directory reads, fast
- /proc/[pid]/maps parsing: File I/O, moderate
- /proc/[pid]/mem reading: Requires ptrace or same user
- /proc/[pid]/task parsing: Per-thread file I/O
// Detection operations...
**macOS**:
- sysctl KERN_PROC_ALL: Single syscall, fast
- Memory/thread analysis: Not yet implemented
let stats = monitor.get_statistics();
println!("Avg scan time: {:.2}ms", stats.avg_scan_time);
println!("Memory usage: {}MB", stats.memory_usage_mb);
```
### Custom Benchmarks
### Running Tests
```bash
# Run comprehensive benchmarks
cargo bench
# Run all tests including performance assertions
cargo test
# Profile specific operations
cargo bench -- shellcode_detection
cargo bench -- process_enumeration
# Run tests with timing output
cargo test -- --nocapture
```
## Tuning Guidelines
### For High-Volume Environments
### For Continuous Monitoring
1. **Increase batch sizes**: Process 20-50 items per batch
2. **Reduce scan frequency**: 2-5 second intervals
3. **Enable result caching**: Cache stable process states
4. **Use filtered scanning**: Skip known-good processes
1. **Adjust scan interval**: Configure `scan_interval_ms` in DetectionConfig
2. **Skip system processes**: Set `skip_system_processes = true`
3. **Limit memory scans**: Reduce `max_memory_scan_size`
4. **Disable heavy detections**: Turn off hook_detection and shellcode_detection
### For Low-Latency Requirements
### For One-Time Analysis
1. **Decrease batch sizes**: Process 1-5 items per batch
2. **Increase scan frequency**: Sub-second intervals
3. **Disable heavy detections**: Skip complex ML analysis
4. **Use memory-mapped scanning**: Direct memory access
### Memory Optimization
```rust
// Configure memory limits
let config = DetectionConfig {
max_memory_usage_mb: 200,
enable_result_compression: true,
cache_size_limit: 1000,
..Default::default()
};
```
1. **Enable all detections**: Use `DetectionConfig::thorough_mode()`
2. **Full memory scanning**: Increase `max_memory_scan_size`
3. **Include system processes**: Set `skip_system_processes = false`
## Platform-Specific Optimizations
### Windows
- Use `SetProcessWorkingSetSize` to limit memory
- Enable `SE_INCREASE_QUOTA_NAME` privilege for better access
- Leverage Windows Performance Toolkit (WPT) for profiling
- Run as Administrator for full process access
- Use `PROCESS_QUERY_LIMITED_INFORMATION` when `PROCESS_QUERY_INFORMATION` fails
- Handle access denied errors gracefully (system processes)
### Linux
- Use `cgroups` for resource isolation
- Enable `CAP_SYS_PTRACE` for enhanced process access
- Leverage `perf` for detailed performance analysis
- Run with appropriate privileges (root or CAP_SYS_PTRACE)
- Handle permission denied for /proc/[pid]/mem gracefully
- Consider using process groups for batch access
### macOS
- Limited functionality (process enumeration only)
- Most detection features require kernel extensions or Endpoint Security framework
## Troubleshooting Performance Issues
### High CPU Usage
1. Check scan frequency settings
2. Verify filter effectiveness
3. Profile detection module performance
4. Consider disabling expensive detections
1. Reduce scan frequency (`scan_interval_ms`)
2. Disable thread analysis for each scan
3. Skip memory region enumeration
4. Filter out known-good processes
### High Memory Usage
1. Monitor result cache sizes
2. Check for memory leaks in custom modules
3. Verify proper cleanup of process handles
4. Consider reducing batch sizes
1. Reduce baseline cache size (limited processes tracked)
2. Clear detection history periodically
3. Limit memory reading buffer sizes
### Slow Detection Response
1. Profile individual detection modules
2. Check system resource availability
3. Verify network latency (if applicable)
4. Consider async processing optimization
1. Disable hook detection (expensive module enumeration)
2. Skip shellcode pattern matching
3. Use performance preset mode
## Benchmarking Results
## Current Implementation Limits
### Baseline Performance (Intel i7-9700K, 32GB RAM)
**What's NOT implemented**:
- No performance metrics collection system
- No Prometheus/monitoring integration
- No SIMD-accelerated pattern matching
- No parallel/async process scanning (single-threaded)
- No LRU caching of results
- No batch processing APIs
```
Process Enumeration: 2.3ms (avg)
Shellcode Detection: 0.8ms per process
Hook Detection: 1.2ms per process
Anomaly Analysis: 3.5ms per process
Full Scan (100 proc): 847ms total
```
**Current architecture**:
- Sequential process scanning
- Simple HashMap for baseline tracking
- Basic confidence scoring
- Manual timer-based intervals (TUI)
### Memory Usage
```
Base Engine: 45MB
+ Shellcode Patterns: +12MB
+ ML Models: +23MB
+ Result Cache: +15MB (1000 entries)
Total Runtime: 95MB typical
```
## Advanced Optimizations
### SIMD Acceleration
## Testing Performance
```rust
// Enable SIMD for pattern matching
#[cfg(target_feature = "avx2")]
use std::arch::x86_64::*;
#[test]
fn test_detection_performance() {
use std::time::Instant;
// Vectorized shellcode scanning
unsafe fn simd_pattern_search(data: &[u8], pattern: &[u8]) -> bool {
// AVX2 accelerated pattern matching
}
```
let mut engine = DetectionEngine::new().unwrap();
let process = ProcessInfo::new(1234, 4, "test.exe".to_string());
let regions = vec![/* test regions */];
### Multi-threading
```rust
use rayon::prelude::*;
// Parallel process analysis
let results: Vec<DetectionResult> = processes
.par_iter()
.map(|process| engine.analyze_process(process))
.collect();
```
### Caching Strategies
```rust
use lru::LruCache;
pub struct DetectionCache {
process_hashes: LruCache<u32, u64>,
shellcode_results: LruCache<u64, bool>,
anomaly_profiles: LruCache<u32, ProcessProfile>,
}
```
## Monitoring Dashboard Integration
### Prometheus Metrics
```rust
use prometheus::{Counter, Histogram, Gauge};
lazy_static! {
static ref SCAN_DURATION: Histogram = Histogram::new(
"ghost_scan_duration_seconds",
"Time spent scanning processes"
).unwrap();
static ref DETECTIONS_TOTAL: Counter = Counter::new(
"ghost_detections_total",
"Total number of detections"
).unwrap();
}
```
### Real-time Monitoring
```rust
// WebSocket-based real-time metrics
pub struct MetricsServer {
connections: Vec<WebSocket>,
metrics_collector: PerformanceMonitor,
}
impl MetricsServer {
pub async fn broadcast_metrics(&self) {
let metrics = self.metrics_collector.get_real_time_stats();
let json = serde_json::to_string(&metrics).unwrap();
for connection in &self.connections {
connection.send(json.clone()).await.ok();
}
let start = Instant::now();
for _ in 0..100 {
engine.analyze_process(&process, &regions, None);
}
let duration = start.elapsed();
// Should complete 100 analyses in under 100ms
assert!(duration.as_millis() < 100);
}
```
## Best Practices
1. **Profile First**: Always benchmark before optimizing
2. **Measure Impact**: Quantify optimization effectiveness
3. **Monitor Production**: Continuous performance monitoring
4. **Gradual Tuning**: Make incremental adjustments
5. **Document Changes**: Track optimization history
1. **Start with defaults**: Use `DetectionConfig::default()` initially
2. **Profile specific modules**: Identify which detection is slow
3. **Adjust based on needs**: Disable features you don't need
4. **Handle errors gracefully**: Processes may exit during scan
5. **Test on target hardware**: Performance varies by system
## Performance Testing Framework
## Future Performance Improvements
```rust
#[cfg(test)]
mod performance_tests {
use super::*;
use std::time::Instant;
#[test]
fn benchmark_full_system_scan() {
let engine = DetectionEngine::new().unwrap();
let start = Instant::now();
let results = engine.scan_all_processes().unwrap();
let duration = start.elapsed();
assert!(duration.as_millis() < 5000, "Scan took too long");
assert!(results.len() > 0, "No processes detected");
}
#[test]
fn memory_usage_benchmark() {
let initial = get_memory_usage();
let engine = DetectionEngine::new().unwrap();
// Perform operations
for _ in 0..1000 {
engine.analyze_dummy_process();
}
let final_usage = get_memory_usage();
let growth = final_usage - initial;
assert!(growth < 50_000_000, "Memory usage grew too much: {}MB",
growth / 1_000_000);
}
}
```
## Conclusion
Ghost's performance can be fine-tuned for various deployment scenarios. Regular monitoring and benchmarking ensure optimal operation while maintaining security effectiveness.
For additional performance support, see:
- [Profiling Guide](PROFILING.md)
- [Deployment Strategies](DEPLOYMENT.md)
- [Scaling Recommendations](SCALING.md)
Potential enhancements (not yet implemented):
- Parallel process analysis using rayon
- Async I/O for file system operations (Linux)
- Result caching with TTL
- Incremental scanning (only changed processes)
- Memory-mapped file parsing
- SIMD pattern matching for shellcode