# Phase 5 · Streamer Explorer Migration · Implementation Plan (Technical Outline)

> **This plan is a migration plan**: catclaw's `web/server/tiktok/collectors/scout-collector.ts` (759 lines), `backstage-collector.ts` (198 lines), and `qwen-vision.ts` have already run in production. This phase extracts them into Pawcast's `packages/core/src/explorer/` and adapts them to the new IPC and DB schema.

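**Goal:** Implement the full pipeline: open TikTok /live and auto-swipe through cards → extract username → connect via WebSocket → record video → Qwen Vision 14-dimension analysis → one-shot batch invitation through Backstage.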

**Architecture:** Task-driven via TaskRunner (not cron); a single task collects 30 streamers.

**Tech Stack:** Same as catclaw (tiktok-live-connector + Electron BrowserWindow + ffmpeg + DashScope Qwen Vision API)

---

## File Structure (migrated + reworked)

```
packages/core/src/explorer/
  ├── scout-collector.ts          ★ migrated from catclaw (759 lines → main file + 4 extracted modules)
  │   ├── scout-collector.ts      (main flow: scoutOneLiveRoom + loop control)
  │   ├── username-extractor.ts   (5-strategy username extraction, lines 149-263)
  │   ├── streamer-profile.ts     (fetchStreamerProfile, lines 80-114)
  │   ├── highlight-segments.ts   (5 s bucketed highlight algorithm, lines 118-143)
  │   └── video-recorder.ts       (MediaRecorder + ffmpeg transcoding, lines 23-62)
  ├── backstage-collector.ts      ★ migrated from catclaw (lines 39-198)
  ├── task-runner.ts              ★ migrated (lines 146-376)
  └── types.ts                    (TaskConfig / TaskProgress / ScoutSession)

packages/core/src/ai-orchestrator/
  ├── provider.ts                 (Provider interface)
  ├── qwen-vision.ts              ★ migrated from catclaw (qwen-vision.ts)
  ├── anthropic.ts                (used in Phase 6)
  ├── openai.ts                   (used in Phase 6)
  └── usage-tracker.ts

packages/db/src/migrations/
  └── 005-explorer.sql            (broadcasters / scout_sessions / tasks tables)

packages/db/src/repositories/
  ├── broadcasters.ts             (shared by the explore pool and the member roster)
  ├── scout-sessions.ts           (30+ fields per scan)
  └── tasks.ts                    (TaskRunner scheduling records)

packages/ipc-contract/src/
  └── explorer.ts                 (ExplorerContract)

apps/desktop/src/pages/Explore/
  ├── index.tsx                   (main page, masonry feed)
  ├── components/
  │   ├── BroadcasterCard.tsx
  │   ├── DetailDrawer.tsx        (480px drawer sliding in from the right)
  │   ├── RadarChart.tsx          (14-dimension radar chart)
  │   ├── ScanModal.tsx           (start a scan)
  │   ├── ScanProgressModal.tsx   (scan in progress + live counters)
  │   └── InviteFromExploreModal.tsx (one-click invite)
  ├── stores/exploreStore.ts
  └── hooks/useExplorerTask.ts
```

---

## Task List (migration-oriented)

| # | Topic | Source (catclaw) | Changes |
|---|---|---|---|
| 1 | DB migration: broadcasters / scout_sessions / tasks | `database.ts:259-305` | Drop the `tk_*` prefix from field names; rewrite types as TS interfaces |
| 2 | broadcastersRepo + scoutSessionsRepo + tasksRepo | catclaw raw SQL | Extract into the repository pattern |
| 3 | username-extractor (5 strategies) | `scout-collector.ts:149-263` | **Copy the logic as-is**, wrapped as `extractUsername(bridge)` |
| 4 | video-recorder + ffmpeg transcoding | `scout-collector.ts:23-62` | **Copy as-is** |
| 5 | streamer-profile fetcher | `scout-collector.ts:80-114` | **Copy as-is**, then get it working end to end |
| 6 | highlight-segments algorithm | `scout-collector.ts:118-143` | **Copy as-is** (5 s buckets + weights gift=3 / chat=2 / other=1, keep the top 10) |
| 7 | qwen-vision provider | `qwen-vision.ts` | Move into the ai-orchestrator package; put it behind the Provider interface |
| 8 | scoutOneLiveRoom single-room collection | `scout-collector.ts:341-538` | Integrate Tasks 3-7 and write to the Pawcast schema |
| 9 | TaskRunner | `task-runner.ts:146-376` | Adapt to the `tasks` table + IPC progress event push |
| 10 | runScoutCollection main loop | `scout-collector.ts:619-757` | Skip list / stuck detection / reset back to /live |
| 11 | backstage-collector batch invitation | `backstage-collector.ts:39-198` | 30 per batch + parse the returned status table + rate limit of 5-10 s between batches |
| 12 | ExplorerContract IPC | New | startTask / stopTask / listSessions / getDetail / startInvite |
| 13 | Renderer Explore main page (masonry feed) | spec design | BroadcasterCard shows the 14-dimension scores |
| 14 | DetailDrawer + radar chart | spec design | 14-dimension SVG polygon |
| 15 | ScanModal start + progress push | spec design | Subscribe to progress events |
| 16 | Integration test: record one full task run | Feed events from fixtures | Verify the 14 dimensions are persisted |
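The highlight-segments rule in Task 6 (5 s buckets, weights gift=3 / chat=2 / other=1, top 10) can be sketched as below. This is a minimal sketch, not the catclaw implementation: the `LiveEvent` and `HighlightSegment` shapes, the function name, and the tie-breaking order are assumptions made for illustration.

```typescript
// Hypothetical event shape; the real catclaw event type may differ.
interface LiveEvent {
  type: string;        // e.g. 'gift', 'chat', 'like'
  timestampMs: number; // milliseconds since the recording started
}

interface HighlightSegment {
  startSec: number;
  endSec: number;
  score: number;
}

const BUCKET_SEC = 5;
const TOP_N = 10;

// Weights from the plan: gift=3, chat=2, everything else=1.
function eventWeight(type: string): number {
  if (type === 'gift') return 3;
  if (type === 'chat') return 2;
  return 1;
}

function computeHighlightSegments(events: LiveEvent[]): HighlightSegment[] {
  // Sum the weights of all events that fall into the same 5 s bucket.
  const scores = new Map<number, number>();
  for (const ev of events) {
    const bucket = Math.floor(ev.timestampMs / 1000 / BUCKET_SEC);
    scores.set(bucket, (scores.get(bucket) ?? 0) + eventWeight(ev.type));
  }
  // Highest score first; ties go to the earlier bucket (an assumption).
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .slice(0, TOP_N)
    .map(([bucket, score]) => ({
      startSec: bucket * BUCKET_SEC,
      endSec: (bucket + 1) * BUCKET_SEC,
      score,
    }));
}
```

Keeping the function pure (events in, segments out) also makes it easy to drive from the fixtures planned for Task 16.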

---

## Key Code Sketches

### Username Extractor (5 strategies, migrated directly from catclaw)

```typescript
// packages/core/src/explorer/username-extractor.ts
// Migrated from: catclaw web/server/tiktok/collectors/scout-collector.ts:149-263

export async function extractUsername(bridge: BrowserBridge): Promise<string | null> {
  const result = await bridge.executeJS(`(() => {
    // Strategy 1: URL regex
    const urlMatch = location.pathname.match(/@([^/?\\s#]+)/);
    if (urlMatch) return urlMatch[1];

    // Strategy 2: data-e2e attributes
    const e2e = document.querySelector('[data-e2e="live-username"], [data-e2e="live-user-name"], [data-e2e="user-card-nickname"]');
    if (e2e?.textContent) return e2e.textContent.trim();

    // Strategy 3: /@username link in the header
    const link = document.querySelector('header a[href*="/@"]');
    if (link) {
      const m = link.getAttribute('href').match(/@([^/?\\s#]+)/);
      if (m) return m[1];
    }

    // Strategy 4: SIGI_STATE / __NEXT_DATA__
    try {
      const sigi = JSON.parse(document.querySelector('#SIGI_STATE')?.textContent ?? '{}');
      const liveRoom = sigi.LiveRoom?.liveRoomUserInfo?.user?.uniqueId;
      if (liveRoom) return liveRoom;
    } catch {}

    // Strategy 5: parse document.title
    const titleMatch = document.title.match(/^(.+?)\\s*is\\s*LIVE/);
    if (titleMatch) return titleMatch[1];

    return null;
  })()`);
  return result ?? null;
}
```
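Strategies 1, 3, and 5 are plain regex matches over strings, so they can be unit-tested outside the browser. The sketch below pulls the same two regexes into pure helpers. The helper names and `firstUsername` (which also strips a leading `@`) are hypothetical and not part of the catclaw code.

```typescript
// Same pattern as strategies 1 and 3: the handle after "@" up to /, ?, #, or whitespace.
const HANDLE_RE = /@([^/?\s#]+)/;

function usernameFromPath(pathOrHref: string): string | null {
  const m = pathOrHref.match(HANDLE_RE);
  return m ? m[1] : null;
}

// Same pattern as strategy 5: "<name> is LIVE ..." in the page title.
function usernameFromTitle(title: string): string | null {
  const m = title.match(/^(.+?)\s*is\s*LIVE/);
  return m ? m[1] : null;
}

// Hypothetical helper: return the first non-empty candidate,
// trimmed and without a leading "@".
function firstUsername(candidates: Array<string | null | undefined>): string | null {
  for (const c of candidates) {
    const cleaned = c?.trim().replace(/^@/, '');
    if (cleaned) return cleaned;
  }
  return null;
}
```

For example, `firstUsername([usernameFromPath(path), usernameFromTitle(title)])` preserves the priority order of the in-page script while staying testable in isolation.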

### Qwen Vision Provider (migrated from catclaw qwen-vision.ts)

```typescript
// packages/core/src/ai-orchestrator/qwen-vision.ts

import { readFile } from 'node:fs/promises';

export interface ScoutAnalysis {
  ethnicity: string;
  gender: string;
  skinTone: string;
  ageRange: string;
  appearance: string;
  language: string;
  liveCategory: string;
  liveSubCategory: string;
  liveScene: string;
  interactivityScore: number;     // 0-100
  contentQuality: number;          // 0-100
  audienceEngagement: 'very_high' | 'high' | 'medium' | 'low' | 'very_low';
  contentTags: string[];
  summary: string;
  recommendation: '强烈推荐' | '推荐' | '一般' | '不推荐';
}

// The prompt stays in Chinese so the returned values (e.g. recommendation)
// match the ScoutAnalysis literal types above.
const PROMPT = `分析这个 TikTok 直播间的视频内容，输出 JSON：
{
  "ethnicity": "asian/european/african/latin/middle_eastern/mixed/unknown",
  "gender": "male/female/unknown",
  "skinTone": "light/medium/dark",
  "ageRange": "18-25/26-35/36-45/46+",
  "appearance": "中文描述外貌（长发、穿黑色T恤、戴耳机）",
  "language": "中/英/韩/日/泰/西班牙/其他",
  "liveCategory": "娱乐/游戏/生活/教育/电商/音乐/户外/其他",
  "liveSubCategory": "跳舞/唱歌/聊天/才艺/FPS射击/MOBA/美食/等等",
  "liveScene": "室内/室外/车内/工作室/舞台/其他",
  "interactivityScore": 0-100,
  "contentQuality": 0-100,
  "audienceEngagement": "very_high/high/medium/low/very_low",
  "contentTags": ["标签1", "标签2"],
  "summary": "2-3句中文总结",
  "recommendation": "强烈推荐/推荐/一般/不推荐"
}`;

export async function analyzeVideo(videoPath: string, apiKey: string): Promise<ScoutAnalysis> {
  // Electron's main process runs on Node, so read the file with node:fs.
  const base64 = (await readFile(videoPath)).toString('base64');
  const res = await fetch('https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: 'qwen3.6-plus',           // primary
      messages: [{
        role: 'user',
        content: [
          { type: 'video_url', video_url: { url: `data:video/mp4;base64,${base64}` } },
          { type: 'text', text: PROMPT },
        ],
      }],
      response_format: { type: 'json_object' },
    }),
  });
  // Fail loudly on HTTP errors instead of crashing on a missing `choices` field.
  if (!res.ok) {
    throw new Error(`DashScope request failed: ${res.status} ${await res.text()}`);
  }
  const json = await res.json();
  return JSON.parse(json.choices[0].message.content) as ScoutAnalysis;
}

export async function analyzeScreenshot(base64: string, apiKey: string): Promise<ScoutAnalysis> {
  // Same as analyzeVideo, but with image_url instead of video_url; falls back to qwen-vl-max.
  // Body elided in this sketch.
  throw new Error('analyzeScreenshot: not yet migrated');
}
```
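`analyzeVideo` casts the parsed JSON straight to `ScoutAnalysis`, so an out-of-range score or an unexpected enum value would reach the DB unchecked. A minimal sketch of a normalization step for the numeric and enum fields follows. The function names, the choice to clamp rather than reject, and the `'medium'` default are assumptions, not catclaw behavior.

```typescript
type Engagement = 'very_high' | 'high' | 'medium' | 'low' | 'very_low';
const ENGAGEMENT_LEVELS: readonly Engagement[] = ['very_high', 'high', 'medium', 'low', 'very_low'];

interface NormalizedScores {
  interactivityScore: number;
  contentQuality: number;
  audienceEngagement: Engagement;
  contentTags: string[];
}

// Coerce to a number and clamp into the 0-100 range promised by the prompt.
function clampScore(value: unknown): number {
  const n = typeof value === 'number' ? value : Number(value);
  if (!Number.isFinite(n)) return 0;
  return Math.min(100, Math.max(0, Math.round(n)));
}

function normalizeScores(rawContent: string): NormalizedScores {
  const parsed: unknown = JSON.parse(rawContent);
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new Error('Expected a JSON object from the model');
  }
  const obj = parsed as Record<string, unknown>;
  // Unknown engagement labels fall back to 'medium' (an assumed default).
  const audienceEngagement =
    ENGAGEMENT_LEVELS.find((level) => level === obj.audienceEngagement) ?? 'medium';
  // Keep only string tags; anything else is dropped.
  const contentTags = Array.isArray(obj.contentTags)
    ? obj.contentTags.filter((t): t is string => typeof t === 'string')
    : [];
  return {
    interactivityScore: clampScore(obj.interactivityScore),
    contentQuality: clampScore(obj.contentQuality),
    audienceEngagement,
    contentTags,
  };
}
```

The remaining string fields could be passed through unchanged, or checked the same way once their allowed values are fixed.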

### TaskRunner State Machine

```typescript
// packages/core/src/explorer/task-runner.ts
// Migrated from catclaw web/server/tiktok/task-runner.ts

export interface TaskConfig {
  type: 'scout' | 'backstage' | 'scout_then_backstage';
  maxCount?: number;     // defaults to 30
  skipStreamers?: string[];
  autoInvite?: boolean;  // send invitations automatically once the scout finishes
}

export interface TaskProgress {
  collected: number;
  skipped: number;
  failed: number;
  lastUsername?: string;
  lastStatus?: string;
  feedResets: number;
}

export class TaskRunner {
  private runtime: { stopRequested: boolean } = { stopRequested: false };

  async startScoutTask(config: TaskConfig): Promise<number> {
    const taskId = this.tasksRepo.create({ type: config.type, status: 'running', config });
    this.runtime.stopRequested = false;
    this.runScoutLoop(taskId, config).catch((e) => logger.error('TaskRunner', 'scout failed', e));
    return taskId;
  }

  private async runScoutLoop(taskId: number, config: TaskConfig): Promise<void> {
    const max = config.maxCount ?? 30;
    const progress: TaskProgress = { collected: 0, skipped: 0, failed: 0, feedResets: 0 };

    while (progress.collected < max && !this.runtime.stopRequested) {
      // ...flow per Phase 5 plan
    }

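    // Do not fire the batch invite when the task was stopped by the user.
    if (config.autoInvite && !this.runtime.stopRequested) {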
      await this.runBackstageBatch(taskId);
    }

    this.tasksRepo.update(taskId, { status: 'completed', progress });
  }

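  // Note: taskId is unused for now. The runtime holds a single stop flag,
  // so this stops whichever task is currently running.
  stopTask(taskId: number): void {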
    this.runtime.stopRequested = true;
  }
}
```
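The body of the `while` loop is elided above. Task 10 names three control concerns for it: a skip list, stuck detection, and resetting back to /live. One way to keep that logic testable is a small synchronous decision function that the async loop consults after each username extraction. Everything in this sketch is an assumption: the `LoopState` shape, the threshold of 3, and the rule that a repeated or missing username counts as "stuck".

```typescript
type CardAction = 'scout' | 'swipe' | 'reset_feed';

interface LoopState {
  seen: Set<string>;          // skip list plus everything scouted so far
  lastUsername: string | null;
  stuckStreak: number;        // consecutive cards with no new username
  skipped: number;
  feedResets: number;
}

// Assumed threshold: reset the feed after 3 cards in a row without progress.
const STUCK_THRESHOLD = 3;

function createLoopState(skipStreamers: string[] = []): LoopState {
  return {
    seen: new Set(skipStreamers),
    lastUsername: null,
    stuckStreak: 0,
    skipped: 0,
    feedResets: 0,
  };
}

function decideNextAction(state: LoopState, username: string | null): CardAction {
  // No username, or the same card as last time: the feed did not advance.
  if (username === null || username === state.lastUsername) {
    state.stuckStreak += 1;
    if (state.stuckStreak >= STUCK_THRESHOLD) {
      state.stuckStreak = 0;
      state.lastUsername = null;
      state.feedResets += 1;
      return 'reset_feed'; // caller navigates back to /live
    }
    return 'swipe';
  }
  state.stuckStreak = 0;
  state.lastUsername = username;
  if (state.seen.has(username)) {
    state.skipped += 1;
    return 'swipe';
  }
  state.seen.add(username);
  return 'scout'; // caller runs scoutOneLiveRoom for this streamer
}
```

The `skipped` and `feedResets` counters map onto the `TaskProgress` fields of the same names, so the loop can copy them into each progress event.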

### Backstage Batch Invitation (migrated)

```typescript
// packages/core/src/explorer/backstage-collector.ts
// Migrated from catclaw backstage-collector.ts:39-198

const BATCH_SIZE = 30;

export async function runBackstageInvite(bridge: BrowserBridge, sessions: ScoutSession[]) {
  await bridge.navigate('https://live-backstage.tiktok.com/portal/anchor/relation');
  await delay(3000);

  const batches = chunk(sessions, BATCH_SIZE);
  for (const batch of batches) {
    // 1. Click the "邀请主播" (Invite streamers) button
    await bridge.executeJS(`(() => {
      const btns = [...document.querySelectorAll('button')];
      btns.find(b => b.textContent?.includes('邀请主播'))?.click();
    })()`);
    await delay(2000);

    // 2. Fill the textarea (use the native setter so React's listener fires)
    const usernames = batch.map(s => s.streamer_username).join('\n');
    await bridge.executeJS(`(() => {
      const ta = document.querySelector('textarea');
      const setter = Object.getOwnPropertyDescriptor(HTMLTextAreaElement.prototype, 'value').set;
      setter.call(ta, ${JSON.stringify(usernames)});
      ta.dispatchEvent(new Event('input', { bubbles: true }));
    })()`);

    // 3. Click "下一步" (Next) + parse the result table
    await bridge.executeJS(`(() => {
      const next = [...document.querySelectorAll('button')].find(b => b.textContent?.includes('下一步'));
      next?.click();
    })()`);
    await delay(3000);

    const results = await bridge.executeJS(`(() => {
      const rows = [...document.querySelectorAll('table tbody tr')];
      return rows.map(r => ({
        cells: [...r.querySelectorAll('td')].map(c => c.textContent?.trim()),
      }));
    })()`);

    // 4. Parse + update the DB
    for (const row of results) {
      const session = matchBackstageResult(batch, row);
      if (session) {
        scoutSessionsRepo.update(session.id, {
          invite_status: parseStatus(row),  // region mismatch / already invited / invitable / ...
          invite_sent_at: Date.now(),
        });
      }
    }

    await delay(5000 + Math.random() * 5000);  // rate limit: 5-10 s between batches
  }
}
```
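The block above calls `chunk`, `delay`, `matchBackstageResult`, and `parseStatus` without defining them. A minimal sketch of what they could look like is given here. The row and session shapes mirror the snippet, while the matching rule (case-insensitive, leading `@` ignored) and the status labels (taken from the inline comment) are assumptions about Backstage's actual output.

```typescript
interface SessionRef {
  id: number;
  streamer_username: string;
}

interface ResultRow {
  cells: Array<string | undefined>;
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Compare usernames case-insensitively and without a leading "@".
function normalizeHandle(s: string): string {
  return s.trim().replace(/^@/, '').toLowerCase();
}

// Find the session whose username appears as a cell in this result row.
function matchBackstageResult<S extends SessionRef>(batch: S[], row: ResultRow): S | undefined {
  const cells = row.cells
    .filter((c): c is string => typeof c === 'string')
    .map(normalizeHandle);
  return batch.find((s) => cells.includes(normalizeHandle(s.streamer_username)));
}

// Assumed labels, taken from the comment in the snippet above.
const KNOWN_STATUSES = ['地区不匹配', '已邀约', '可邀约'];

// Return the first known status label found in any cell, else 'unknown'.
function parseStatus(row: ResultRow): string {
  for (const cell of row.cells) {
    const hit = KNOWN_STATUSES.find((label) => cell?.includes(label));
    if (hit) return hit;
  }
  return 'unknown';
}
```

Returning `'unknown'` instead of throwing keeps one unrecognized row from aborting the rest of the batch.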

---

## Entry Criteria

- ✅ Phase 1 + 2 are merged

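## Exit Criteria

- ✅ Clicking "Start scan" with the default cap of 30 → all scoring finishes within 30 minutes
- ✅ Each streamer takes roughly 30-60 seconds to analyze (including video frame extraction + Qwen analysis)
- ✅ The scout_sessions table has 30 new rows, each with the 14-dimension AI fields
- ✅ The Backstage batch invitation runs to completion automatically, with all 30 statuses parsed correctly
- ✅ The Renderer main-page masonry feed shows cards in real time (incremental push)
- ✅ Opening the drawer shows the 14-dimension radar chart + AI commentary + video list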
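
For the radar chart in the drawer (Task 14, "14-dimension SVG polygon"), the polygon vertices can be computed as in the sketch below. It assumes every axis has already been mapped to a 0-100 score; how the categorical dimensions become numbers is not specified in this plan. The function name and signature are hypothetical.

```typescript
// Build the `points` attribute of an SVG <polygon> for a radar chart.
// Axes are spread evenly around the circle, starting at 12 o'clock, clockwise.
function radarPolygonPoints(
  scores: number[],
  radius: number,
  cx: number = radius,
  cy: number = radius,
): string {
  const n = scores.length;
  return scores
    .map((score, i) => {
      const angle = -Math.PI / 2 + (2 * Math.PI * i) / n;
      // Clamp to 0-100, then scale to the chart radius.
      const r = (Math.min(100, Math.max(0, score)) / 100) * radius;
      const x = cx + r * Math.cos(angle);
      const y = cy + r * Math.sin(angle);
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(' ');
}
```

In `RadarChart.tsx` the result would be passed as `<polygon points={radarPolygonPoints(scores, 120)} />` inside an SVG whose width and height are twice the radius.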

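## Key Technical Decisions (deviations from the design mockups)

| Design mockup says | What catclaw actually does + what this phase adopts |
|---|---|
| Keyword / topic search entry | **Enter /live and auto-swipe cards** (closer to the real TikTok recommendation feed) |
| 7-dimension scoring | **14-dimension** Qwen Vision analysis |
| GPT-4o Vision | **Qwen Vision** (DASHSCOPE_API_KEY) |
| Playwright | **Electron BrowserWindow + bridge.executeJS** |
| Periodic tasks | **Manually triggered TaskRunner** (30 per run by default) |

→ The design HTML needs to be updated to match (see the design calibration task, which is outside this phase)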

## Self-Review

- ✅ Directly migrates ~1500 lines of production-proven catclaw code
- ✅ Covers all of spec §2 plus the design's drawer / scan / invite modals
- ✅ Phase 6 (DM) reuses this phase's Backstage framework
