style: unify ASCII punctuation to Chinese punctuation in the bodies of three articles
Build and Deploy Blog / build (push) Successful in 28s

Added scripts/cn-punct.py to do the conversion: it skips code blocks / URLs / the URL part of markdown links, and preserves math formulas, numbered lists, parens attached to English identifiers (DNS(...)), nested math notation (GF(2⁸)), and the like.
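The rules above can be illustrated with a minimal, self-contained sketch of the core neighbor heuristic (illustrative only; the function name `to_cn_punct` is made up here, and the real scripts/cn-punct.py in this commit adds URL/inline-code masking, math detection, and paren-depth tracking on top):

```python
import re

CJK = re.compile(r'[\u4e00-\u9fff]')  # basic CJK Unified Ideographs block
MAP = {',': ',', ':': ':', ';': ';', '?': '?', '!': '!'}


def to_cn_punct(line: str) -> str:
    """Convert ASCII punctuation whose left or right neighbor is a CJK char."""
    out = []
    for i, ch in enumerate(line):
        prev = line[i - 1] if i > 0 else ''
        nxt = line[i + 1] if i + 1 < len(line) else ''
        if ch in MAP and (CJK.match(prev) or CJK.match(nxt)):
            out.append(MAP[ch])  # Chinese neighbor: use the full-width form
        else:
            out.append(ch)       # English/digit context stays ASCII
    return ''.join(out)
```

Running it on `'真站,从协议层消除'` yields `'真站,从协议层消除'`, while `'1:8'` and `'a, b'` pass through unchanged, which is exactly the distinction the full script generalizes.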
2026-05-12 11:00:34 +08:00
parent 3ff184a03a
commit 480f4a0e99
4 changed files with 904 additions and 514 deletions
+10 -10
@@ -5,13 +5,13 @@ lastmod: 2026-05-03
 slug: xray-reality
 tags: ["TLS", "Xray", "VLESS", "Reality", "X25519", "代理协议"]
 categories: ["网络协议"]
-description: "REALITY 协议通过 TLS 1.3 key_share 字段嵌入身份标记 + 主动探测时透明回放真站,从协议层消除 TLS 指纹特征。本文从协议设计到服务端 / 客户端完整搭建。"
+description: "REALITY 协议通过 TLS 1.3 key_share 字段嵌入身份标记 + 主动探测时透明回放真站,从协议层消除 TLS 指纹特征。本文从协议设计到服务端 / 客户端完整搭建。"
 draft: false
 ---
-> 整理自 [bandwh.com](https://www.bandwh.com/net/994.html)(原文 2023-04-11),本文于 2026-05 重新整理发布。
-> 适用系统:Debian 11 | Xray 版本:>= 1.8.0
-> 文中所有 UUID / X25519 密钥均为示例值,实际部署务必使用 `xray uuid` / `xray x25519` 重新生成。
+> 整理自 [bandwh.com](https://www.bandwh.com/net/994.html)(原文 2023-04-11),本文于 2026-05 重新整理发布。
+> 适用系统:Debian 11 | Xray 版本:>= 1.8.0
+> 文中所有 UUID / X25519 密钥均为示例值,实际部署务必使用 `xray uuid` / `xray x25519` 重新生成。
 ---
@@ -19,7 +19,7 @@ draft: false
 ### 1.1 为什么需要 Reality

-传统 v2ray 方案需要购买域名并生成 TLS 证书,通过各种流量伪装来规避检测。然而随着 DPI 检测能力的升级,**v2ray 的 TLS/XTLS 协议特征已可被精准识别**,导致 VPS 的 443 端口频繁被封锁或阻断。
+传统 v2ray 方案需要购买域名并生成 TLS 证书,通过各种流量伪装来规避检测。然而随着 DPI 检测能力的升级,**v2ray 的 TLS/XTLS 协议特征已可被精准识别**,导致 VPS 的 443 端口频繁被封锁或阻断。

 Xray 1.8.0 版本推出了全新的 **REALITY 协议**,配合此前的 **Vision 流控**,组成了当前最新的协议组合:

 ```
@@ -34,7 +34,7 @@ VLESS + Vision + uTLS + REALITY
 | 前向保密 | 仍保有 TLS 前向保密性,历史流量无法被解密 |
 | 抗证书链攻击 | 证书链攻击无效,安全性超越常规 TLS |
 | 无需域名 | 指向他人网站的 SNI,无需自己购买域名或配置 TLS |
-| 中间人防御 | 即使客户端配置泄露,审查方也无法进行有效中间人攻击 |
+| 中间人防御 | 即使客户端配置泄露,审查方也无法进行有效中间人攻击 |
 | SNI 阻断消失 | 据实测,使用 Reality 后 SNI 阻断现象消失 |

 ### 1.3 使用前提
@@ -42,7 +42,7 @@ VLESS + Vision + uTLS + REALITY
 - 一台可访问的 VPS(无需域名)
 - 服务端与客户端 **Xray 均需 >= 1.8.0 版本**
 - 443 端口不被 Nginx、Caddy 等其他程序占用
-- **不支持 CDN 代理**(如 Cloudflare 橙云,会终止 TLS 让 Reality 的端到端伪装失效)。CF **灰云(DNS only)** 只做 DNS 解析、不接管流量,等价于直连 VPS,可正常使用
+- **不支持 CDN 代理**(如 Cloudflare 橙云,会终止 TLS 让 Reality 的端到端伪装失效)。CF **灰云(DNS only)** 只做 DNS 解析、不接管流量,等价于直连 VPS,可正常使用

 官方 GitHub:https://github.com/XTLS/REALITY
@@ -237,13 +237,13 @@ wget --no-check-certificate https://github.com/teddysun/across/raw/master/bbr.sh
 ### 5.1 为什么使用公私钥而非仅 UUID?

-传统方案若使用对称密钥(UUID),攻击者一旦获取客户端配置,即可实施中间人攻击。
+传统方案若使用对称密钥(UUID),攻击者一旦获取客户端配置,即可实施中间人攻击。

 REALITY 使用 **X25519 非对称密钥 + TLSv1.3 key_share** 机制:

-- 即使攻击者获取到客户端公钥,也**无法验证某条连接是否属于 REALITY**
+- 即使攻击者获取到客户端公钥,也**无法验证某条连接是否属于 REALITY**
 - 更无法进行有效的中间人攻击

-> REALITY 的设计原则是:**默认假设客户端配置已泄露**,将安全边界收敛至服务端私钥。只要服务端私钥不泄露,流量就是安全的。即使私钥泄露,攻击者也无法直接解密历史流量(前向保密),只能尝试中间人攻击,但中间人需要持有 Reality 私钥才能伪装服务端,这做不到。
+> REALITY 的设计原则是:**默认假设客户端配置已泄露**,将安全边界收敛至服务端私钥。只要服务端私钥不泄露,流量就是安全的。即使私钥泄露,攻击者也无法直接解密历史流量(前向保密),只能尝试中间人攻击,但中间人需要持有 Reality 私钥才能伪装服务端,这做不到。

 建议:**定期更换公私钥对**,公钥可在多个客户端间安全共享。
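The key-exchange property section 5.1 relies on can be checked directly. Below is a minimal sketch using the third-party pyca/cryptography package (an assumption of this example, not Xray code; `xray x25519` performs the equivalent keypair generation):

```python
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Server keypair: the private key is the entire security boundary.
server_priv = X25519PrivateKey.generate()
server_pub = server_priv.public_key()

# A client only ever holds the server's *public* key (safe to share in configs).
client_priv = X25519PrivateKey.generate()
client_pub = client_priv.public_key()

# Both sides derive the same 32-byte shared secret; an eavesdropper holding
# only server_pub (i.e. a leaked client config) cannot reproduce it.
secret_client = client_priv.exchange(server_pub)
secret_server = server_priv.exchange(client_pub)
assert secret_client == secret_server
```

This is the property the "中间人防御" table row rests on: possession of the public key alone is not enough to impersonate the server.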
File diff suppressed because it is too large
+2 -2
@@ -4,11 +4,11 @@ date: 2026-05-02
 slug: ai-engineer-map
 tags: ["AI", "LLM", "Prompt", "RAG", "MCP", "Agent", "Claude", "Cursor", "Ollama"]
 categories: ["AI"]
-description: "从大模型 / Prompt / RAG / MCP / Agent / 多模态 / 成本控制 / 编码工具一路捋下来,适合有技术背景的开发者快速建立 AI 知识框架。"
+description: "从大模型 / Prompt / RAG / MCP / Agent / 多模态 / 成本控制 / 编码工具一路捋下来,适合有技术背景的开发者快速建立 AI 知识框架。"
 draft: false
 ---
-> 适合有一定技术背景的开发者快速建立 AI 知识框架。涵盖核心概念、工程实践、工具选型,持续更新。
+> 适合有一定技术背景的开发者快速建立 AI 知识框架。涵盖核心概念、工程实践、工具选型,持续更新。

 ---
+390
scripts/cn-punct.py
@@ -0,0 +1,390 @@
"""Convert ASCII punctuation to Chinese punctuation in CJK context.
Strategy:
- Skip fenced code blocks entirely.
- Mask out inline code, markdown links, bare URLs as opaque blobs (placeholder
char from the Private Use Area), so paren matching can span them.
- Process YAML front matter only by converting `description:` / `title:` /
`summary:` string values — leaves `tags: [...]` arrays alone.
- For each prose line in the body, decide if it's "Chinese-flavored" (>= 3 CJK
chars). On Chinese lines, convert ASCII , . : ; ? ! ( ) → Chinese
counterparts where it makes sense.
Preservation rules (kept as ASCII):
- Number lists / decimals: `1,234`, `RS(255, 239)`, `1:8` (ratio).
- Math context: a comma/colon between two math expressions (one of `∇ ∂ √
≡ ≈ ± ¹²³⁴⁵⁶⁷⁸⁹⁰ ₀₁₂₃₄₅₆₇₈₉` or Greek letters within a 20-char window
on each side).
- Nested inside an existing Chinese paren: only converts prose-like content
(no math/digit indicator), preserving notation like `GF(2⁸) 列混合)`.
- English-attached parens: `DNS(Domain Name System)` (the `(` immediately
follows an English letter/digit) stays ASCII.
- Inside ASCII parens we chose to keep ASCII (e.g. `cookie(Ch8, Ch9)`),
inner punctuation stays ASCII.
Usage:
python scripts/cn-punct.py path/to/file1.md path/to/file2.md ...
This is the one-shot conversion script used in the 2026-05 blog cleanup. It is
deliberately conservative; if you re-run it on already-converted files it
should be a near no-op.
"""
import re
import sys

CJK_RE = re.compile(r'[一-鿿]')
PLACEHOLDER = '\ue000'  # private-use char for opaque blobs

# Walk past these to find the "real" neighbor of a punctuation mark.
WEAK = set(' \t*_"\'`)]}>“”‘’')

# Punctuation that signals Chinese context for adjacent ASCII punctuation.
CHINESE_PUNCT = set(
    ',。:;?!、()'
    '【】「」『』'
    '“”‘’…—《》〈〉'
)

# Math-specific characters: superscript / subscript digits, operators, Greek
# letters that almost only appear in formulas. Used to detect comma/colon
# sitting between two math expressions (where it must stay ASCII).
MATH_CHARS = set(
    '∇∂√≡≈±'
    '¹²³⁴⁵⁶⁷⁸⁹⁰'
    '₀₁₂₃₄₅₆₇₈₉'
    'αβγδεζηθικ'
    'λμνξπρστυφ'
    'χψω'
    'ΓΔΘΛΞΠΣΦΨΩ'
    '∞∑∏∫·×÷'
)
def is_cjk(ch: str) -> bool:
    return bool(ch and CJK_RE.match(ch))


def is_chinese_context(ch: str) -> bool:
    return is_cjk(ch) or (ch in CHINESE_PUNCT)


def is_ascii_alnum(ch: str) -> bool:
    return bool(ch) and ord(ch) < 128 and ch.isalnum()


def find_strong_neighbor(text: str, idx: int, direction: int) -> str:
    """Walk past WEAK chars to find the nearest 'strong' character, or '' if
    we hit a boundary."""
    n = len(text)
    i = idx + direction
    while 0 <= i < n:
        ch = text[i]
        if ch in WEAK:
            i += direction
            continue
        return ch
    return ''


def looks_like_math_context(text: str, idx: int, window: int = 20) -> bool:
    """Heuristic: comma at idx is between two math expressions if both sides
    contain math-specific characters within a small window."""
    left_window = text[max(0, idx - window):idx]
    right_window = text[idx + 1:idx + 1 + window]
    return (
        any(c in MATH_CHARS for c in left_window)
        and any(c in MATH_CHARS for c in right_window)
    )
def convert_parens(text: str, aggressive: bool, depth_offset: int = 0) -> str:
    """Convert ( ) to ( ) when in Chinese context.

    Tracks Chinese-paren depth so that a paren nested inside an outer Chinese
    paren only converts if its own content contains CJK — preserves math
    notation like `GF(2⁸) 列混合)`.

    Skips conversion when the immediate preceding char is an English letter or
    digit (e.g. `DNS(Domain Name System)`), since such parens behave like a
    function call / abbreviation expansion in the source language.
    """
    n = len(text)
    out = []
    i = 0
    cn_depth = depth_offset
    while i < n:
        ch = text[i]
        if ch == '(':  # full-width open paren
            cn_depth += 1
            out.append(ch)
            i += 1
            continue
        if ch == ')':  # full-width close paren
            cn_depth = max(0, cn_depth - 1)
            out.append(ch)
            i += 1
            continue
        if ch == '(':
            depth = 1
            j = i + 1
            while j < n and depth > 0:
                if text[j] == '(':
                    depth += 1
                elif text[j] == ')':
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if depth == 0:
                content = text[i + 1:j]
                left = find_strong_neighbor(text, i, -1)
                right = find_strong_neighbor(text, j, +1)
                immediate_prev = text[i - 1] if i > 0 else ''
                content_has_cjk = bool(CJK_RE.search(content))
                neighbor_chinese = (
                    is_chinese_context(left) or is_chinese_context(right)
                )
                content_has_math = bool(re.search(
                    r'[\d=+\-*/×÷^¹²³⁴-⁹⁰₀-₉]',
                    content,
                ))
                if cn_depth > 0:
                    # Nested inside Chinese paren — convert prose-like content
                    should_convert = content_has_cjk or not content_has_math
                elif is_ascii_alnum(immediate_prev):
                    # Attached to English identifier — leave ASCII
                    should_convert = content_has_cjk
                else:
                    should_convert = (
                        content_has_cjk
                        or (aggressive and neighbor_chinese)
                    )
                inner_offset = cn_depth + (1 if should_convert else 0)
                converted_content = convert_parens(content, aggressive, inner_offset)
                if should_convert:
                    out.append('(')
                    out.append(converted_content)
                    out.append(')')
                else:
                    out.append('(')
                    out.append(converted_content)
                    out.append(')')
                i = j + 1
                continue
        out.append(ch)
        i += 1
    return ''.join(out)
def convert_punct(text: str, aggressive: bool) -> str:
    """Convert ASCII , . : ; ? ! to Chinese counterparts.

    Tracks both Chinese-paren depth and ASCII-paren depth:
    - Inside `(...)` (full-width) → aggressive (those are Chinese parentheticals).
    - Inside `(...)` (ASCII) → conservative (the surviving ASCII parens were kept
      ASCII for a reason — likely English-attached or notation).
    """
    chars = list(text)
    n = len(chars)
    out = []
    cn_paren_depth = 0
    ascii_paren_depth = 0
    for i, ch in enumerate(chars):
        if ch == '(':
            cn_paren_depth += 1
            out.append(ch)
            continue
        if ch == ')':
            cn_paren_depth = max(0, cn_paren_depth - 1)
            out.append(ch)
            continue
        if ch == '(':
            ascii_paren_depth += 1
            out.append(ch)
            continue
        if ch == ')':
            ascii_paren_depth = max(0, ascii_paren_depth - 1)
            out.append(ch)
            continue
        prev = chars[i - 1] if i > 0 else ''
        nxt = chars[i + 1] if i + 1 < n else ''
        in_cn_paren = cn_paren_depth > 0
        in_ascii_paren = ascii_paren_depth > 0
        if ch == ',':
            # Number-list separator: digit, [space,] digit → keep ASCII
            prev_is_digit = prev.isascii() and prev.isdigit()
            after_space_nxt = chars[i + 2] if (nxt == ' ' and i + 2 < n) else nxt
            nxt_is_digit = (
                after_space_nxt
                and after_space_nxt.isascii()
                and after_space_nxt.isdigit()
            )
            if prev_is_digit and nxt_is_digit:
                out.append(ch)
                continue
            if looks_like_math_context(text, i):
                out.append(ch)
                continue
            if is_chinese_context(prev) or is_chinese_context(nxt):
                out.append(',')
                continue
            if in_ascii_paren:
                out.append(ch)
                continue
            if in_cn_paren or aggressive:
                out.append(',')
                continue
        elif ch == '.':
            if is_ascii_alnum(nxt):
                pass  # decimal / file ext / version
            elif is_chinese_context(prev):
                out.append('。')
                continue
            elif prev in WEAK:
                left = find_strong_neighbor(text, i, -1)
                if is_chinese_context(left):
                    out.append('。')
                    continue
        elif ch in (':', ';', '?', '!'):
            mapping = {':': ':', ';': ';', '?': '?', '!': '!'}
            if ch == ':':
                # Ratio / time notation like "1:8" or "12:34" → keep ASCII colon
                prev_is_digit = prev.isascii() and prev.isdigit()
                nxt_is_digit = nxt.isascii() and nxt.isdigit()
                if prev_is_digit and nxt_is_digit:
                    out.append(ch)
                    continue
            if ch in (':', ';') and looks_like_math_context(text, i):
                out.append(ch)
                continue
            left = find_strong_neighbor(text, i, -1)
            right = find_strong_neighbor(text, i, +1)
            if is_chinese_context(left) or is_chinese_context(right):
                out.append(mapping[ch])
                continue
            if in_ascii_paren:
                out.append(ch)
                continue
            if in_cn_paren or aggressive:
                out.append(mapping[ch])
                continue
        out.append(ch)
    return ''.join(out)
def line_is_chinese(line: str) -> bool:
    """Heuristic: does this line have enough CJK to call it 'Chinese-flavored'?"""
    cjk_count = sum(1 for c in line if is_cjk(c))
    return cjk_count >= 3


def process_text(text: str, aggressive: bool) -> str:
    return convert_punct(convert_parens(text, aggressive), aggressive)


# Markdown image, markdown link, bare URL, or inline code (in this priority)
OPAQUE_RE = re.compile(
    r'!?\[[^\]]*\]\([^\)]+\)'
    r'|https?://[^\s\)\]\>]+'
    r'|`[^`\n]+`'
)
FENCE_RE = re.compile(r'(```[\s\S]*?```)')


def process_segment(segment: str, aggressive: bool) -> str:
    """Mask opaque blobs, run conversions, restore (recursively processing
    markdown link text)."""
    saved = []

    def stash(m: re.Match) -> str:
        saved.append(m.group(0))
        return PLACEHOLDER

    masked = OPAQUE_RE.sub(stash, segment)
    converted = process_text(masked, aggressive)
    out = []
    idx = 0
    for ch in converted:
        if ch == PLACEHOLDER:
            tok = saved[idx]
            idx += 1
            m = re.match(r'(!?\[)([^\]]*)(\]\()([^\)]+)(\))', tok)
            if m:
                inside = process_text(m.group(2), aggressive)
                tok = m.group(1) + inside + m.group(3) + m.group(4) + m.group(5)
            out.append(tok)
        else:
            out.append(ch)
    return ''.join(out)
def process_body_segment(segment: str) -> str:
    """Process a non-fenced body segment line by line, choosing aggressive mode
    based on whether each logical line is Chinese-flavored."""
    lines = segment.split('\n')
    out_lines = []
    for line in lines:
        out_lines.append(process_segment(line, aggressive=line_is_chinese(line)))
    return '\n'.join(out_lines)


def process_yaml_frontmatter(text: str) -> str:
    """Convert only quoted string values for description / title / summary keys.
    Leaves list/array values like `tags: [...]` alone."""
    def replace_value(m: re.Match) -> str:
        prefix, value, suffix = m.group(1), m.group(2), m.group(3)
        return prefix + process_segment(value, aggressive=line_is_chinese(value)) + suffix

    return re.sub(
        r'^(\s*(?:description|title|summary):\s*")([^"\n]*)(")',
        replace_value,
        text,
        flags=re.MULTILINE,
    )
def process_markdown(content: str) -> str:
    front = ''
    body = content
    if content.startswith('---\n') or content.startswith('---\r\n'):
        m = re.match(r'(---\r?\n[\s\S]*?\r?\n---\r?\n)', content)
        if m:
            front = m.group(1)
            body = content[m.end():]
            front = process_yaml_frontmatter(front)
    parts = FENCE_RE.split(body)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:  # fenced code block
            out.append(part)
        else:
            out.append(process_body_segment(part))
    return front + ''.join(out)


def main() -> None:
    changed = []
    for path in sys.argv[1:]:
        with open(path, encoding='utf-8', newline='') as f:
            content = f.read()
        new_content = process_markdown(content)
        if new_content != content:
            with open(path, 'w', encoding='utf-8', newline='') as f:
                f.write(new_content)
            changed.append(path)
            print(f'updated {path}')
        else:
            print(f'unchanged {path}')
    print(f'\nTotal updated: {len(changed)}')


if __name__ == '__main__':
    main()