feat: add multi-encoding support, .gitignore integration, and WebSocket optimization

- Add multi-encoding file support (UTF-8, GBK, GB2312, Latin-1)
- Add .gitignore integration for automatic pattern exclusion
- Optimize WebSocket reconnection with exponential backoff
- Change tool response format from list to dict for better compatibility
- Add bilingual README (Chinese and English)
- Update installation instructions
- Add project description to pyproject.toml
This commit is contained in:
wmymz
2025-11-07 13:20:18 +08:00
parent ab4451b777
commit c15139f5ee
10 changed files with 747 additions and 153 deletions

349
README.md
View File

@@ -1,23 +1,44 @@
简体中文 | [English](./README_EN.md)
# Acemcp
MCP server for codebase indexing and semantic search.
代码库索引和语义搜索的 MCP 服务器。
<a href="https://glama.ai/mcp/servers/@qy527145/acemcp">
<img width="380" height="200" src="https://glama.ai/mcp/servers/@qy527145/acemcp/badge" alt="Acemcp MCP server" />
</a>
## Installation
## 安装
### 作为工具安装(推荐)
```bash
uv add mcp httpx fastapi "uvicorn[standard]" toml websockets
uv sync
# 安装到系统
uv tool install acemcp
# 或临时运行(无需安装)
uvx acemcp
```
## Configuration
### 开发安装
The configuration file is automatically created at `~/.acemcp/settings.toml` on first run with default values.
```bash
# 克隆仓库
git clone https://github.com/qy527145/acemcp.git
cd acemcp
Edit `~/.acemcp/settings.toml` to configure:
# 安装依赖
uv sync
# 运行
uv run acemcp
```
## 配置
配置文件会在首次运行时自动创建在 `~/.acemcp/settings.toml`,包含默认值。
编辑 `~/.acemcp/settings.toml` 进行配置:
```toml
BATCH_SIZE = 10
MAX_LINES_PER_BLOB = 800
@@ -27,24 +48,24 @@ TEXT_EXTENSIONS = [".py", ".js", ".ts", ...]
EXCLUDE_PATTERNS = [".venv", "node_modules", ".git", "__pycache__", "*.pyc", ...]
```
**Configuration options:**
- `BATCH_SIZE`: Number of files to upload per batch (default: 10)
- `MAX_LINES_PER_BLOB`: Maximum lines per blob before splitting large files (default: 800)
- `BASE_URL`: API endpoint URL
- `TOKEN`: Authentication token
- `TEXT_EXTENSIONS`: List of file extensions to index
- `EXCLUDE_PATTERNS`: List of patterns to exclude from indexing (supports wildcards like `*.pyc`)
**配置选项:**
- `BATCH_SIZE`: 每批上传的文件数量(默认:10
- `MAX_LINES_PER_BLOB`: 大文件分割前的最大行数(默认:800
- `BASE_URL`: API 端点 URL
- `TOKEN`: 认证令牌
- `TEXT_EXTENSIONS`: 要索引的文件扩展名列表
- `EXCLUDE_PATTERNS`: 要排除的模式列表(支持通配符如 `*.pyc`
You can also configure via:
- **Command line arguments** (highest priority): `--base-url`, `--token`
- **Web management interface** (updates user config file)
- **Environment variables** with `ACEMCP_` prefix
您还可以通过以下方式配置:
- **命令行参数**(最高优先级):`--base-url``--token`
- **Web 管理界面**(更新用户配置文件)
- **环境变量**(使用 `ACEMCP_` 前缀)
## MCP Configuration
## MCP 配置
Add the following to your MCP client configuration (e.g., Claude Desktop):
将以下内容添加到您的 MCP 客户端配置中(例如 Claude Desktop
### Basic Configuration
### 基础配置
```json
{
@@ -60,14 +81,14 @@ Add the following to your MCP client configuration (e.g., Claude Desktop):
```
**Available command line arguments:**
- `--base-url`: Override BASE_URL configuration
- `--token`: Override TOKEN configuration
- `--web-port`: Enable web management interface on specified port (e.g., 8080)
**可用的命令行参数:**
- `--base-url`: 覆盖 BASE_URL 配置
- `--token`: 覆盖 TOKEN 配置
- `--web-port`: 在指定端口启用 Web 管理界面(例如 8080
### Configuration with Web Management Interface
### 启用 Web 管理界面的配置
To enable the web management interface, add the `--web-port` argument:
要启用 Web 管理界面,添加 `--web-port` 参数:
```json
{
@@ -84,111 +105,118 @@ To enable the web management interface, add the `--web-port` argument:
}
```
Then access the management interface at `http://localhost:8888`
然后访问管理界面:`http://localhost:8888`
**Web Management Features:**
- **Configuration Management**: View and edit server configuration (BASE_URL, TOKEN, BATCH_SIZE, MAX_LINES_PER_BLOB, TEXT_EXTENSIONS)
- **Real-time Logs**: Monitor server logs in real-time via WebSocket connection
- **Tool Debugger**: Test and debug MCP tools directly from the web interface
- Test `index_code` tool with any project path
- Test `search_context` tool with project path and query
- View formatted results and error messages
**Web 管理功能:**
- **配置管理**:查看和编辑服务器配置(BASE_URLTOKENBATCH_SIZEMAX_LINES_PER_BLOBTEXT_EXTENSIONS
- **实时日志**:通过 WebSocket 连接实时监控服务器日志,具有智能重连功能
- 指数退避重连策略1秒 → 1.5秒 → 2.25秒 ... 最大 30秒
- 最多 10 次重连尝试,防止无限循环
- 网络故障时自动重连
- 减少日志噪音WebSocket 连接记录在 DEBUG 级别)
- **工具调试器**:直接从 Web 界面测试和调试 MCP 工具
- 测试 `search_context` 工具,输入项目路径和查询
- 查看格式化的结果和错误消息
## Tools
## 工具
### search_context
Search for relevant code context based on a query. This tool **automatically performs incremental indexing** before searching, ensuring results are always up-to-date. It performs **semantic search** across your codebase and returns formatted text snippets showing where relevant code is located.
基于查询搜索相关的代码上下文。此工具在搜索前**自动执行增量索引**,确保结果始终是最新的。它在您的代码库中执行**语义搜索**,并返回格式化的文本片段,显示相关代码的位置。
**Key Features:**
- **Automatic Incremental Indexing**: Before each search, the tool automatically indexes only new or modified files, skipping unchanged files for efficiency
- **No Manual Indexing Required**: You don't need to manually index your project - just search and the tool handles indexing automatically
- **Always Up-to-Date**: Search results reflect the current state of your codebase
**核心特性:**
- **自动增量索引**:每次搜索前,工具自动仅索引新文件或修改过的文件,跳过未更改的文件以提高效率
- **无需手动索引**:您无需手动索引项目 - 只需搜索,工具会自动处理索引
- **始终保持最新**:搜索结果反映代码库的当前状态
- **多编码支持**自动检测和处理多种文件编码UTF-8、GBK、GB2312、Latin-1
- **.gitignore 集成**:索引项目时自动遵守 `.gitignore` 模式
**Parameters:**
- `project_root_path` (string): Absolute path to the project root directory
- **IMPORTANT**: Use forward slashes (`/`) as path separators, even on Windows
- Windows example: `C:/Users/username/projects/myproject`
- Linux/Mac example: `/home/username/projects/myproject`
- `query` (string): Natural language search query to find relevant code context
- Use descriptive keywords related to what you're looking for
- The tool performs semantic matching, not just keyword search
- Returns code snippets with file paths and line numbers
**参数:**
- `project_root_path`(字符串):项目根目录的绝对路径
- **重要**:即使在 Windows 上也使用正斜杠(`/`)作为路径分隔符
- Windows 示例:`C:/Users/username/projects/myproject`
- Linux/Mac 示例:`/home/username/projects/myproject`
- `query`(字符串):用于查找相关代码上下文的自然语言搜索查询
- 使用与您要查找的内容相关的描述性关键词
- 工具执行语义匹配,而不仅仅是关键词搜索
- 返回带有文件路径和行号的代码片段
**What it returns:**
- Formatted text snippets from files that match your query
- File paths and line numbers for each snippet
- Context around the relevant code sections
- Multiple results ranked by relevance
**返回内容:**
- 与您的查询匹配的文件中的格式化文本片段
- 每个片段的文件路径和行号
- 相关代码部分周围的上下文
- 按相关性排序的多个结果
**Query Examples:**
**查询示例:**
1. **Finding configuration code:**
1. **查找配置代码:**
```json
{
"project_root_path": "C:/Users/username/projects/myproject",
"query": "查找所有调用 get_model 的地方"
"query": "日志配置 设置 初始化 logger"
}
```
Returns: Code related to logging setup, logger initialization, and configuration
返回与日志设置、logger 初始化和配置相关的代码
2. **Finding authentication logic:**
2. **查找认证逻辑:**
```json
{
"project_root_path": "C:/Users/username/projects/myproject",
"query": "user authentication login password validation"
"query": "用户认证 登录 密码验证"
}
```
Returns: Authentication handlers, login functions, password validation code
返回:认证处理器、登录函数、密码验证代码
3. **Finding database code:**
3. **查找数据库代码:**
```json
{
"project_root_path": "C:/Users/username/projects/myproject",
"query": "database connection pool initialization"
"query": "数据库连接池 初始化"
}
```
Returns: Database connection setup, pool configuration, initialization code
返回:数据库连接设置、连接池配置、初始化代码
4. **Finding error handling:**
4. **查找错误处理:**
```json
{
"project_root_path": "C:/Users/username/projects/myproject",
"query": "error handling exception try catch"
"query": "错误处理 异常 try catch"
}
```
Returns: Error handling patterns, exception handlers, try-catch blocks
返回:错误处理模式、异常处理器、try-catch
5. **Finding API endpoints:**
5. **查找 API 端点:**
```json
{
"project_root_path": "C:/Users/username/projects/myproject",
"query": "API endpoint routes HTTP handlers"
"query": "API 端点 路由 HTTP 处理器"
}
```
Returns: API route definitions, HTTP handlers, endpoint implementations
返回API 路由定义、HTTP 处理器、端点实现
**Tips for better results:**
- Use multiple related keywords (e.g., "logging configuration setup" instead of just "logging")
- Include technical terms specific to what you're looking for
- Describe the functionality rather than exact variable names
- Try different phrasings if the first query doesn't return what you need
**获得更好结果的技巧:**
- 使用多个相关关键词(例如,"日志配置设置"而不仅仅是"日志"
- 包含您要查找的特定技术术语
- 描述功能而不是确切的变量名
- 如果第一次查询没有返回您需要的内容,尝试不同的措辞
**Indexing Features:**
- **Incremental Indexing**: Only new or modified files are uploaded, unchanged files are skipped
- **Hash-based Deduplication**: Files are identified by SHA-256 hash of path + content
- **Automatic Retry**: Network requests are automatically retried up to 3 times with exponential backoff (1s, 2s, 4s)
- **Batch Resilience**: If a batch upload fails after retries, the tool continues with the next batch
- **File Splitting**: Large files are automatically split into multiple blobs (default: 800 lines per blob)
- **Exclude Patterns**: Automatically skips virtual environments, node_modules, .git, build artifacts, etc.
**索引特性:**
- **增量索引**:仅上传新文件或修改过的文件,跳过未更改的文件
- **基于哈希的去重**:通过路径 + 内容的 SHA-256 哈希识别文件
- **自动重试**:网络请求自动重试最多 3 次采用指数退避1秒、2秒、4秒
- **批次弹性**:如果批次上传在重试后失败,工具会继续处理下一批次
- **文件分割**:大文件自动分割为多个块(默认:每块 800 行)
- **排除模式**:自动跳过虚拟环境、node_modules.git、构建产物等
- **多编码支持**自动检测文件编码UTF-8、GBK、GB2312、Latin-1并在失败时回退到 UTF-8 错误处理
- **.gitignore 集成**:自动从项目根目录加载并遵守 `.gitignore` 模式,与配置的排除模式结合使用
**Search Features:**
- **Automatic Retry**: Search requests are automatically retried up to 3 times with exponential backoff (2s, 4s, 8s)
- **Graceful Degradation**: Returns a clear error message if the search fails after all retries
- **Timeout Handling**: Uses a 60-second timeout to handle long-running searches
- **Empty Result Handling**: Returns a helpful message if no relevant code is found
**搜索特性:**
- **自动重试**:搜索请求自动重试最多 3 次采用指数退避2秒、4秒、8秒
- **优雅降级**:如果所有重试后搜索失败,返回清晰的错误消息
- **超时处理**:使用 60 秒超时来处理长时间运行的搜索
- **空结果处理**:如果未找到相关代码,返回有用的消息
**Default Exclude Patterns:**
**默认排除模式:**
```
.venv, venv, .env, env, node_modules, .git, .svn, .hg, __pycache__,
.pytest_cache, .mypy_cache, .tox, .eggs, *.egg-info, dist, build,
@@ -196,54 +224,127 @@ Search for relevant code context based on a query. This tool **automatically per
pip-log.txt, pip-delete-this-directory.txt, .coverage, htmlcov,
.gradle, target, bin, obj
```
Patterns support wildcards (`*`, `?`) and match against directory/file names or paths.
模式支持通配符(`*``?`),并匹配目录/文件名或路径。
## Usage
**注意:** 如果项目根目录存在 `.gitignore` 文件,其模式将自动加载并与配置的排除模式结合使用。`.gitignore` 模式遵循 Git 的标准 wildmatch 语法。
1. Start the MCP server (automatically started by MCP client)
2. Use `search_context` to search for code context
- The tool automatically indexes your project before searching
- Incremental indexing ensures only new/modified files are uploaded
- No manual indexing step required!
## 高级特性
## Data Storage
### 多编码文件支持
- **Configuration**: `~/.acemcp/settings.toml`
- **Indexed projects**: `~/.acemcp/data/projects.json` (fixed location)
- **Log files**: `~/.acemcp/log/acemcp.log` (with automatic rotation)
- Projects are identified by their absolute path (normalized with forward slashes)
Acemcp 自动检测和处理不同字符编码的文件,适用于国际化项目:
## Logging
- **自动检测**按顺序尝试多种编码UTF-8 → GBK → GB2312 → Latin-1
- **回退处理**:如果所有编码都失败,使用 UTF-8 错误处理以防止崩溃
- **日志记录**记录每个文件成功使用的编码DEBUG 级别)
- **无需配置**:开箱即用,支持大多数常见编码
The application automatically logs to `~/.acemcp/log/acemcp.log` with the following features:
这对以下情况特别有用:
- 混合编码文件的项目例如UTF-8 源代码 + GBK 文档)
- 使用非 UTF-8 编码的遗留代码库
- 具有不同语言文件的国际团队
- **Console output**: INFO level and above (colored output)
- **File output**: DEBUG level and above (detailed format with module, function, and line number)
- **Automatic rotation**: Log files are rotated when they reach 5MB
- **Retention**: Maximum of 10 log files are kept
- **Compression**: Rotated log files are automatically compressed to `.zip` format
- **Thread-safe**: Logging is thread-safe for concurrent operations
### .gitignore 集成
**Log format:**
Acemcp 自动遵守您项目的 `.gitignore` 文件:
- **自动加载**:如果存在,从项目根目录读取 `.gitignore`
- **标准语法**:支持 Git 的标准 wildmatch 模式
- **组合过滤**:与配置的 `EXCLUDE_PATTERNS` 一起工作
- **目录处理**:正确处理带有尾部斜杠的目录模式
- **无需配置**:只需在项目根目录放置 `.gitignore`
**`.gitignore` 模式示例:**
```gitignore
# 依赖
node_modules/
vendor/
# 构建输出
dist/
build/
*.pyc
# IDE 文件
.vscode/
.idea/
# 环境文件
.env
.env.local
```
所有这些模式在索引期间都会自动遵守,并与默认排除模式结合使用。
## 使用方法
1. 启动 MCP 服务器(由 MCP 客户端自动启动)
2. 使用 `search_context` 搜索代码上下文
- 工具在搜索前自动索引您的项目
- 增量索引确保仅上传新文件/修改过的文件
- 无需手动索引步骤!
- 无论编码如何,文件都会自动处理
- 自动遵守 `.gitignore` 模式
## 数据存储
- **配置**`~/.acemcp/settings.toml`
- **已索引项目**`~/.acemcp/data/projects.json`(固定位置)
- **日志文件**`~/.acemcp/log/acemcp.log`(自动轮转)
- 项目通过其绝对路径识别(使用正斜杠规范化)
## 日志记录
应用程序自动记录日志到 `~/.acemcp/log/acemcp.log`,具有以下特性:
- **控制台输出**INFO 级别及以上(彩色输出)
- **文件输出**DEBUG 级别及以上(详细格式,包含模块、函数和行号)
- **自动轮转**:日志文件达到 5MB 时自动轮转
- **保留策略**:最多保留 10 个日志文件
- **压缩**:轮转的日志文件自动压缩为 `.zip` 格式
- **线程安全**:日志记录对并发操作是线程安全的
**日志格式:**
```
2025-11-06 13:51:25 | INFO | acemcp.server:main:103 - Starting acemcp MCP server...
```
The log files are automatically created on first run and require no manual configuration.
日志文件在首次运行时自动创建,无需手动配置。
## Web Management Interface
## Web 管理界面
The web management interface provides:
- **Real-time server status** monitoring
- **Live log streaming** via WebSocket
- **Configuration viewing** (current settings)
- **Project statistics** (number of indexed projects)
Web 管理界面提供:
- **实时服务器状态**监控
- **实时日志流**通过 WebSocket
- **配置查看**(当前设置)
- **项目统计**(已索引项目数量)
To enable the web interface, use the `--web-port` argument when starting the server.
要启用 Web 界面,在启动服务器时使用 `--web-port` 参数。
**Features:**
- Real-time log display with auto-scroll
- Server status and metrics
- Configuration overview
- Responsive design with Tailwind CSS
- No build step required (uses CDN resources)
**功能:**
- 带自动滚动的实时日志显示
- 服务器状态和指标
- 配置概览
- 使用 Tailwind CSS 的响应式设计
- 无需构建步骤(使用 CDN 资源)
- 具有指数退避的智能 WebSocket 重连
## 最近更新
### 版本 0.1.3(最新)
**新特性:**
- ✨ **多编码支持**自动检测和处理多种文件编码UTF-8、GBK、GB2312、Latin-1
- ✨ **.gitignore 集成**:自动从项目根目录加载并遵守 `.gitignore` 模式
-**改进的工具响应格式**:从基于列表的格式改为基于字典的格式,以提高客户端兼容性
**改进:**
- 🔧 **WebSocket 优化**具有指数退避的智能重连1秒 → 最大 30秒
- 🔧 **减少日志噪音**WebSocket 连接现在记录在 DEBUG 级别而不是 INFO
- 🔧 **连接稳定性**:最多 10 次重连尝试,防止无限循环
- 🔧 **更好的错误处理**:对无法用任何编码解码的文件进行优雅回退
**错误修复:**
- 🐛 修复了频繁的 WebSocket 连接/断开循环
- 🐛 修复了读取非 UTF-8 编码文件时的编码错误
- 🐛 改进了对带有目录匹配的 .gitignore 模式的处理