FABBI

Technical Intelligence Brief — LLM/Coding Agents

2026-05-28 04:36
Gate: PARTIAL/PUBLISHABLE
Focus: harness, agents, AI SDLC

1 Executive Technical Signal

  • Agent harnesses are moving from demo to packaging. 207 candidates scanned; HN surfaced VAEN, Peers, Unspaghettit, and Mneme within 24h → NEXA should standardize a portable harness + replay logs within 2 weeks. [S01-S09]
  • Reliability/eval is the main bottleneck. DeepSWE, SWE-bench, Terminal-Bench + 33 arXiv signals → SYNCA needs a scorecard covering pass@k, regression, and sandbox escape. [S02,S55-S60]
  • Security risk has become a practical signal. HN/NewStack: AI agents are installing packages no one owns; engagement is only 2 pts/0 cmt, but supply-chain risk is rated 4/5 → enable a dependency allowlist. [S03]
  • The CLI/IDE agent market is fragmented. 8 product sources: Claude Code, Codex, Cursor, Copilot agent, Jules, Replit Agent, OpenCode, Cody → Fabbi needs an abstraction layer rather than vendor lock-in. [P01-P08]
  • Context/org knowledge is emerging. Ask HN on cross-tool org knowledge + Mneme repo-native rules → FARE should prioritize a repo map + architecture rules before scaling autonomous coding. [S05,S08]
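
The abstraction-layer and replay-log points above could be prototyped roughly as follows. This is a minimal sketch, not a NEXA design: `AgentAdapter`, `EchoAdapter`, `Harness`, and the `cli-a`/`cli-b` names are hypothetical, and a real adapter would invoke a vendor CLI instead of echoing the task.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Protocol


@dataclass
class AgentResult:
    agent: str
    task: str
    output: str
    ok: bool


class AgentAdapter(Protocol):
    """Hypothetical vendor-neutral interface every CLI adapter would satisfy."""
    name: str

    def run(self, task: str) -> AgentResult: ...


@dataclass
class EchoAdapter:
    """Stand-in adapter (assumption): echoes the task instead of calling a CLI."""
    name: str

    def run(self, task: str) -> AgentResult:
        return AgentResult(self.name, task, f"[{self.name}] {task}", True)


@dataclass
class Harness:
    adapters: Dict[str, AgentAdapter] = field(default_factory=dict)
    replay_log: List[str] = field(default_factory=list)

    def register(self, adapter: AgentAdapter) -> None:
        self.adapters[adapter.name] = adapter

    def run(self, agent: str, task: str) -> AgentResult:
        result = self.adapters[agent].run(task)
        # One JSON line per run, so a session can be replayed or diffed later.
        self.replay_log.append(json.dumps(asdict(result), sort_keys=True))
        return result


harness = Harness()
for cli in ("cli-a", "cli-b"):
    harness.register(EchoAdapter(cli))
results = [harness.run(cli, "add a unit test") for cli in ("cli-a", "cli-b")]
```

Keeping the log as JSON lines means the same task set can be rerun against another adapter and compared record by record.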

2 KPI Dashboard

207 candidates
87 HN/dev web
75 GitHub repos
33 paper signals
64% confidence

Public fallback for X/Reddit/YouTube was incomplete in the cron run; no metrics were invented.

3 KOL/OG Feed Watch

Platform | Title | Author/channel | Timestamp | Engagement | Why CTO cares
HN | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | sjhalani7 | 2026-05-27T20:52:31Z | 4 pts/2 cmt | dev discussion
HN | DeepSWE Measuring frontier coding agents | e2e4 | 2026-05-27T19:57:16Z | 2 pts/1 cmt | dev discussion
HN | AI coding agents are installing packages no one owns | speckx | 2026-05-27T19:39:42Z | 2 pts/0 cmt | dev discussion
HN | Ask HN: Examples of products and services created via agentic coding | d_silin | 2026-05-27T18:45:32Z | 2 pts/0 cmt | dev discussion
HN | Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have? | srbsa | 2026-05-27T18:00:06Z | 2 pts/0 cmt | dev discussion
HN | I built an agentic coding harness across three CLI hosts | homescout | 2026-05-27T15:43:24Z | 2 pts/0 cmt | dev discussion
HN | Peers – Multi-agent AI coding with measurable convergence | dash0r | 2026-05-27T14:51:35Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Mneme HQ – repo-native architectural rules for AI coding agents | Tval | 2026-05-27T14:16:48Z | 1 pts/0 cmt | dev discussion
HN | Aming Claw – Zero-orchestration multi-agent coding | aming557 | 2026-05-27T13:59:01Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Unspaghettit – executable behavior specs for AI coding agents | D3F | 2026-05-27T12:06:19Z | 5 pts/0 cmt | dev discussion
HN | Bill Gates AI on AI (one month later) | vbutsomesayw | 2026-05-27T04:01:44Z | 3 pts/0 cmt | dev discussion
HN | Show HN: Simple Sprite Sheet Generation | armcat | 2026-05-24T19:37:43Z | 3 pts/0 cmt | dev discussion
HN | Show HN: My first app, artisanally vibe-coded in 4 months | jeroen_stulen | 2026-05-24T10:07:13Z | 3 pts/4 cmt | dev discussion
HN | Zero – Programming Language for Agents | xendo | 2026-05-23T11:13:35Z | 3 pts/0 cmt | dev discussion

4 Trend Radar

HOT Harness portability, coding-agent eval, CLI orchestration

EMERGING Repo-native architectural rules, behavior specs, org knowledge graph

NOISE Vibe-coded product bragging without eval/security metrics

WATCH DeepSWE vs SWE-bench vs Terminal-Bench practical transfer

5 Repo Watch

Platform | Repo | Owner | Timestamp | Metric | Signal
GitHub | sjsyrek/design-council | sjsyrek | 2026-05-27T21:35:06Z | 151 stars/14 forks/0 issues | repo momentum
GitHub | asheshgoplani/agent-deck | asheshgoplani | 2026-05-27T21:34:39Z | 2544 stars/295 forks/20 issues | repo momentum
GitHub | InsForge/InsForge | InsForge | 2026-05-27T21:33:35Z | 10710 stars/919 forks/74 issues | repo momentum
GitHub | inkeep/agents | inkeep | 2026-05-27T21:32:32Z | 1166 stars/142 forks/36 issues | repo momentum
GitHub | boldsoftware/shelley | boldsoftware | 2026-05-27T21:31:59Z | 479 stars/80 forks/93 issues | repo momentum
GitHub | GLips/Figma-Context-MCP | GLips | 2026-05-27T21:31:58Z | 14888 stars/1179 forks/25 issues | repo momentum
GitHub | cluesmith/codev | cluesmith | 2026-05-27T21:29:59Z | 273 stars/36 forks/86 issues | repo momentum
GitHub | yvgude/lean-ctx | yvgude | 2026-05-27T21:27:53Z | 2221 stars/231 forks/6 issues | repo momentum
GitHub | tontinton/maki | tontinton | 2026-05-27T21:27:26Z | 389 stars/43 forks/17 issues | repo momentum
GitHub | KimYx0207/Meta_Kim | KimYx0207 | 2026-05-27T21:27:18Z | 145 stars/47 forks/2 issues | repo momentum

6 Paper / Benchmark Watch

Type | Title | Source | Timestamp | Metric | Implication
Paper/arXiv | Governed Evolution of Agent Runtimes through Executable Operational Cognition | arXiv | 2026-05-26T17:36:48Z | paper | research direction
Paper/arXiv | EviACT: An Evidence-to-Action Framework for Agentic Program Repair | arXiv | 2026-05-26T16:17:47Z | paper | research direction
Paper/arXiv | ProDebug: An Automated Debugging System for Prolog | arXiv | 2026-05-26T14:57:50Z | paper | research direction
Paper/arXiv | ConVer: Using Contracts and Loop Invariant Synthesis for Scalable Formal Software Verification | arXiv | 2026-05-26T14:04:40Z | paper | research direction
Paper/arXiv | Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint) | arXiv | 2026-05-26T12:32:17Z | paper | research direction
Paper/arXiv | Strategies for Guiding LLMs to Use Software Design Patterns: A Case of Singleton | arXiv | 2026-05-26T11:58:23Z | paper | research direction
Paper/arXiv | LLM-based Mockless Unit Test Generation for Java | arXiv | 2026-05-26T11:08:04Z | paper | research direction
Paper/arXiv | HTMLCure: Turning Browser Experience into State Guided Repair for Interactive HTML | arXiv | 2026-05-26T10:22:56Z | paper | research direction
Paper/arXiv | TrajAudit: Automated Failure Diagnosis for Agentic Coding Systems | arXiv | 2026-05-26T05:24:37Z | paper | research direction
Paper/arXiv | Testing Agentic Workflows with Structural Coverage Criteria | arXiv | 2026-05-26T04:07:55Z | paper | research direction
Paper/arXiv | Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization | arXiv | 2026-05-26T02:12:48Z | paper | research direction
Paper/arXiv | Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models | arXiv | 2026-05-25T17:56:46Z | paper | research direction

7 Product / Business Watch

Type | Product | Link | Date | Metric | Fabbi use
Product | Anthropic Claude Code docs | Anthropic Claude Code docs | 2026-05 | official source | product/adoption
Product | OpenAI Codex | OpenAI Codex | 2026-05 | official source | product/adoption
Product | Cursor changelog | Cursor changelog | 2026-05 | official source | product/adoption
Product | GitHub Copilot coding agent | GitHub Copilot coding agent | 2026-05 | official source | product/adoption
Product | Sourcegraph Cody | Sourcegraph Cody | 2026-05 | official source | product/adoption
Product | OpenCode | OpenCode | 2026-05 | official source | product/adoption
Product | Google Jules | Google Jules | 2026-05 | official source | product/adoption
Product | Replit Agent | Replit Agent | 2026-05 | official source | product/adoption

8 Impact Coverage

Domain | Now 0-2w | Next 1-2m | Later 3-6m | Decision
FARE | Repo map + rules | Cross-tool knowledge index | Architecture memory | trial
NEXA | Portable harness | CLI adapter matrix | Multi-agent runtime | adopt
SYNCA | Eval gates | Risk scoring | Governance console | adopt
DOMUS | Low signal | Internal ops agent | Domain workflow agent | monitor
Japan/VN/Global | Vendor-neutral PoC | JP enterprise security story | Managed AI-SDLC service | trial

9 CTO Evaluation Matrix

Signal | Thesis | Evidence | Counter | Fabbi | Conf | Decision | Next validation
Harness portability | Winner is workflow layer | 9 HN/dev items/24h | Engagement low | NEXA core | 70% | trial | 3 repos x 20 tasks
Eval/reliability | Adoption blocked by trust | 33 papers + 3 benchmarks | Benchmark transfer unknown | SYNCA gate | 76% | adopt | pass@k baseline
Supply-chain | Agents amplify package risk | NewStack/HN signal | 1 source only | Security policy | 58% | watch | simulate package install
Context layer | Org knowledge improves agent precision | Ask HN + Mneme | No hard ROI yet | FARE differentiator | 64% | trial | retrieval ablation
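
The "simulate package install" validation in the supply-chain row could start from a check like the one below. It is a sketch under stated assumptions: the allowlist contents and the `check_install` helper are hypothetical, the parser only reads the leading package name of a pip-style requirement string, and names are compared after PEP 503 normalization.

```python
import re
from typing import Iterable, List, Set


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_', '.' collapse to '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the package name that precedes any version specifier or extras."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def check_install(requirements: Iterable[str], allowlist: Set[str]) -> List[str]:
    """Return the requirements an agent asked for that are NOT on the allowlist."""
    allowed = {normalize(name) for name in allowlist}
    return [req for req in requirements if requirement_name(req) not in allowed]


# Hypothetical allowlist and a simulated agent install request.
ALLOWLIST = {"requests", "numpy"}
blocked = check_install(["requests==2.31.0", "Left_Pad>=1.0", "numpy"], ALLOWLIST)
```

In this simulated request only `Left_Pad>=1.0` is returned as blocked; a policy gate would refuse the install or escalate it for review.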

10 CTO Recommendations

1. Build NEXA harness adapter for 3 CLIs. ROI/time-saving 18-28%; risk 2/5; owner AI Platform Lead; TTV 10 days; validate on 60 tasks with pass@1/pass@3.
2. Add SYNCA eval gate before agent PR merge. ROI/time-saving 12-20%; risk 2/5; owner QA Automation Lead; TTV 7 days; validate with regression suite + flaky rate.
3. Prototype FARE repo-context map. ROI/time-saving 10-16%; risk 3/5; owner Solution Architect; TTV 14 days; validate with retrieval hit-rate + diff quality.
4. Enforce dependency allowlist/sandbox. ROI/risk-avoidance 20-35%; risk 1/5; owner Security Lead; TTV 5 days; validate by counting blocked unknown packages.
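
For the pass@1/pass@3 validation in recommendation 1, one common choice is the unbiased pass@k estimator from the HumanEval/Codex evaluation (Chen et al., 2021): with n samples per task of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over tasks. The brief does not say which definition is intended, so treat this as an assumption; the per-task counts below are made-up illustrative values, not measured results.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(tasks: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks given as (n_samples, n_passed) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in tasks) / len(tasks)


# Illustrative only: 4 tasks, 3 samples each, with 0-3 passing samples.
example_tasks = [(3, 0), (3, 1), (3, 2), (3, 3)]
p1 = mean_pass_at_k(example_tasks, k=1)
p3 = mean_pass_at_k(example_tasks, k=3)
```

For these illustrative counts, pass@1 is 0.5 and pass@3 is 0.75; the gap between the two indicates how much of the success rate depends on retries.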

Watch over the next 2-4 weeks: DeepSWE/Terminal-Bench adoption, Codex/Copilot agent enterprise controls. Ignore: demos lacking eval/security/ROI evidence.

11 Detailed Source Appendix

Platform | Source | Author | Time | Metric | Signal
HN | Show HN: VAEN – Package and import portable AI coding-agent Harnesses | sjhalani7 | 2026-05-27T20:52:31Z | 4 pts/2 cmt | dev discussion
HN | DeepSWE Measuring frontier coding agents | e2e4 | 2026-05-27T19:57:16Z | 2 pts/1 cmt | dev discussion
HN | AI coding agents are installing packages no one owns | speckx | 2026-05-27T19:39:42Z | 2 pts/0 cmt | dev discussion
HN | Ask HN: Examples of products and services created via agentic coding | d_silin | 2026-05-27T18:45:32Z | 2 pts/0 cmt | dev discussion
HN | Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have? | srbsa | 2026-05-27T18:00:06Z | 2 pts/0 cmt | dev discussion
HN | I built an agentic coding harness across three CLI hosts | homescout | 2026-05-27T15:43:24Z | 2 pts/0 cmt | dev discussion
HN | Peers – Multi-agent AI coding with measurable convergence | dash0r | 2026-05-27T14:51:35Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Mneme HQ – repo-native architectural rules for AI coding agents | Tval | 2026-05-27T14:16:48Z | 1 pts/0 cmt | dev discussion
HN | Aming Claw – Zero-orchestration multi-agent coding | aming557 | 2026-05-27T13:59:01Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Unspaghettit – executable behavior specs for AI coding agents | D3F | 2026-05-27T12:06:19Z | 5 pts/0 cmt | dev discussion
HN | Bill Gates AI on AI (one month later) | vbutsomesayw | 2026-05-27T04:01:44Z | 3 pts/0 cmt | dev discussion
HN | Show HN: Simple Sprite Sheet Generation | armcat | 2026-05-24T19:37:43Z | 3 pts/0 cmt | dev discussion
HN | Show HN: My first app, artisanally vibe-coded in 4 months | jeroen_stulen | 2026-05-24T10:07:13Z | 3 pts/4 cmt | dev discussion
HN | Zero – Programming Language for Agents | xendo | 2026-05-23T11:13:35Z | 3 pts/0 cmt | dev discussion
HN | Show HN: opub, donated compute for open-source | goodroot | 2026-05-21T14:59:15Z | 2 pts/0 cmt | dev discussion
HN | Zero: The Programming Language for Agents | afshinmeh | 2026-05-19T20:19:46Z | 3 pts/0 cmt | dev discussion
HN | Show HN: Korveo – a local firewall for AI agents | amitbidlan | 2026-05-19T17:40:39Z | 1 pts/3 cmt | dev discussion
HN | The Programming Language for Agents | Marius77 | 2026-05-19T14:09:50Z | 20 pts/7 cmt | dev discussion
HN | Vercel's Zero: A Programming Language Designed for AI Agents | steveharing1 | 2026-05-17T20:25:40Z | 5 pts/2 cmt | dev discussion
HN | Show HN: Sneakily steer candidates toward naive brute-force solutions | abr0ahm | 2026-05-27T20:48:44Z | 1 pts/0 cmt | dev discussion
HN | Researcher "gave Claude Code 'ADHD' and it thinks 2x better now." | udit_50 | 2026-05-27T20:38:56Z | 1 pts/0 cmt | dev discussion
HN | Is Amp more or less expensive than Claude Code? Is it better? | markosn | 2026-05-27T19:19:46Z | 2 pts/0 cmt | dev discussion
HN | CC-Wiki: Turn Claude Code sessions into a shareable knowledge base wiki | tejpal-diffuse | 2026-05-27T18:57:02Z | 4 pts/2 cmt | dev discussion
HN | Claude Code's creator on the end of the software engineer | speckx | 2026-05-27T18:05:42Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Claude Code's $200 plan is a 17× subsidy on the raw API | Hiteshjain118 | 2026-05-27T17:25:53Z | 5 pts/7 cmt | dev discussion
HN | Show HN: GTFS·X – a free, web-based transit schedule (GTFS) editor | markegge | 2026-05-27T17:08:39Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Hm – a task runner with a Python DSL, growing into a CI/CD system | suis_siva | 2026-05-27T16:41:36Z | 11 pts/0 cmt | dev discussion
HN | Show HN: Workplane – collaborative filesystem for humans and AI | tweezers0x | 2026-05-27T16:22:41Z | 5 pts/0 cmt | dev discussion
HN | Codex has dethroned Claude as the king of AI programming | galaxyLogic | 2026-05-27T15:53:55Z | 3 pts/1 cmt | dev discussion
HN | Building self-improving tax agents with Codex | dnw | 2026-05-27T15:48:40Z | 2 pts/0 cmt | dev discussion
HN | The Codex Showcase | wordsaboutcode | 2026-05-27T03:00:38Z | 4 pts/0 cmt | dev discussion
HN | Building a safe, effective sandbox to enable Codex on Windows | gmays | 2026-05-26T21:37:19Z | 1 pts/0 cmt | dev discussion
HN | Show HN: PrismCat – Local transparent proxy and debugging console for LLM APIs | etgpao | 2026-05-26T13:11:26Z | 2 pts/2 cmt | dev discussion
HN | Why codex /goal fails on complex workflows: compaction amnesia and context rot | shaurya-sethi | 2026-05-26T06:33:40Z | 1 pts/0 cmt | dev discussion
HN | Codex is flagged as malware on macOS | vldszn | 2026-05-23T22:50:35Z | 3 pts/4 cmt | dev discussion
HN | Tell HN: OpenAI Codex: Increase in users hitting Codex rate limits | embedding-shape | 2026-05-23T13:42:10Z | 6 pts/4 cmt | dev discussion
HN | OpenAI intentionally removed Codex's visible context usage indicator | gobdovan | 2026-05-22T20:51:01Z | 2 pts/1 cmt | dev discussion
HN | OpenAI's Codex Can Now Use Your Mac Even When It's Locked | tosh | 2026-05-22T11:49:09Z | 1 pts/0 cmt | dev discussion
HN | Windows computer-use: synthetic cursors for background agents | frabonacci | 2026-05-27T18:48:20Z | 2 pts/0 cmt | dev discussion
HN | Show HN: GridPath – Faster and Better Agent for Spreadsheets (Tauri, Rust) | pixelmash13 | 2026-05-27T15:14:11Z | 1 pts/0 cmt | dev discussion
HN | FlowLink: MCP proxy blocking destructive AI agent commands | braincoder | 2026-05-26T18:01:25Z | 1 pts/0 cmt | dev discussion
HN | Show HN: Chunk sidecars for validating agent-generated code before pushing to CI | olafmol | 2026-05-26T15:41:32Z | 1 pts/2 cmt | dev discussion
HN | Show HN: Agent Launch – One CLI for Codex, Claude Code, Cursor, Gemini, OpenCode | dhruv_anand | 2026-05-26T11:18:03Z | 2 pts/0 cmt | dev discussion
HN | Is it too soon to built software factories? | Bnowako | 2026-05-25T16:39:32Z | 4 pts/4 cmt | dev discussion
HN | Show HN: I built a RAG and knowledge graph agent that runs locally | gabriel_oauth | 2026-05-23T16:06:25Z | 7 pts/7 cmt | dev discussion
HN | Show HN: I built a powerful RAG and knowledge graph agent that runs locally | GabrielBlessed | 2026-05-23T11:05:43Z | 5 pts/3 cmt | dev discussion
HN | Show HN: 97% on SWE-bench Verified with subscription-token agents | kimjune01 | 2026-05-24T18:03:28Z | 2 pts/0 cmt | dev discussion
HN | Bito's AI Architect Boosts Claude Opus's task success rate by 35% | Sushrutkm | 2026-05-19T10:02:03Z | 2 pts/0 cmt | dev discussion
HN | Show HN: Statewright – Visual state machines that make AI agents reliable | azurewraith | 2026-05-12T14:24:55Z | 126 pts/59 cmt | dev discussion
HN | Show HN: New Benchmark from SWE-bench team is 0% solved | lieret | 2026-05-05T15:10:41Z | 24 pts/3 cmt | dev discussion
HN | talkie-coder: From 1930 to SWE-bench | Philpax | 2026-05-02T21:35:54Z | 2 pts/0 cmt | dev discussion
HN | Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error | jryio | 2026-04-29T19:16:48Z | 2 pts/0 cmt | dev discussion
HN | SWE-bench Verified no longer measures frontier coding capabilities | kmdupree | 2026-04-26T13:58:13Z | 343 pts/181 cmt | dev discussion
HN | Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces | george_ciobanu | 2026-04-24T21:34:31Z | 10 pts/2 cmt | dev discussion
HN | The Terminal Bench 3.0 community is looking for task contributors | neversettles | 2026-05-03T03:40:04Z | 1 pts/2 cmt | dev discussion
HN | ForgeCode: Top open source coding agent in Terminal-Bench 2.0 | gk1 | 2026-04-29T18:16:23Z | 4 pts/0 cmt | dev discussion
HN | Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025) | ubermon | 2026-04-28T19:11:57Z | 6 pts/9 cmt | dev discussion
HN | Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview | GodelNumbering | 2026-04-27T12:35:55Z | 393 pts/148 cmt | dev discussion
HN | Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments | neversupervised | 2026-04-15T00:42:30Z | 6 pts/2 cmt | dev discussion
HN | A simple test-time method that beats Claude Mythos on Terminal-Bench | jackykwok | 2026-04-14T20:27:39Z | 1 pts/1 cmt | dev discussion

12 Data Quality / Scan Health Appendix

Scanned 207 candidates: {'HN': 87, 'GitHub': 75, 'Paper/arXiv': 33, 'Product': 8, 'PublicWeb/KOL': 4}. PASS on source volume; PARTIAL on social completeness: the Reddit/X/YouTube/Facebook collectors were restricted by public/API access limits in the cron run, so the scan relied on HN/dev web, GitHub, arXiv, official product sources, and a public KOL fallback. Engagement is N/A where the RSS feed or public page does not provide it. Confidence is 64% because direct X/FB metrics are missing.