FABBI

Technical Intelligence Brief — LLM/Coding Agents

2026-05-28 04:36
Gate: PARTIAL/PUBLISHABLE
Focus: harness, agents, AI SDLC

1 Executive Technical Signal

Agent harnesses are moving from demo to packaging. 207 candidates; HN surfaced VAEN/Peers/Unspaghettit/Mneme within 24h → NEXA should standardize a portable harness + replay logs within 2 weeks. [S01-S09]
Reliability/eval is the main bottleneck. DeepSWE, SWE-bench, Terminal-Bench + 33 arXiv signals → SYNCA needs a scorecard covering pass@k, regression, and sandbox escape. [S02,S55-S60]
Security risk has become a practical signal. HN/NewStack: AI agents are installing packages no one owns; engagement is only 2 pts/0 cmt, but supply-chain risk is rated 4/5 → enable a dependency allowlist. [S03]
The CLI/IDE agent market is fragmented. 8 product sources: Claude Code, Codex, Cursor, Copilot agent, Jules, Replit Agent, OpenCode, Cody → Fabbi needs an abstraction layer rather than vendor lock-in. [P01-P08]
Context/org knowledge is emerging. Ask HN on cross-tool org knowledge + Mneme repo-native rules → FARE should prioritize a repo map + architecture rules before scaling autonomous coding. [S05,S08]
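
The pass@k part of the scorecard proposed for SYNCA can be sketched with the commonly used unbiased estimator, pass@k = 1 - C(n-c, k)/C(n, k), where n is the number of samples drawn per task and c the number that passed. This is only a minimal sketch: the function names, the `results` layout, and the choice of k values are assumptions for illustration, not something the brief specifies.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def scorecard(results: dict[str, tuple[int, int]], ks: tuple[int, ...] = (1, 3)) -> dict[str, float]:
    """Average pass@k over tasks. `results` maps a (hypothetical) task id to (n, c)."""
    if not results:
        raise ValueError("no task results")
    return {
        f"pass@{k}": sum(pass_at_k(n, c, k) for n, c in results.values()) / len(results)
        for k in ks
    }
```

Averaging per-task estimates keeps a task that was sampled more often from dominating the score.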

2 KPI Dashboard

207 candidates
HN/dev web · GitHub repos · paper signals
64% confidence

X/Reddit/YouTube public fallback was incomplete in the cron run; no metrics were invented.

3 KOL/OG Feed Watch

Platform	Title	Author/channel	Timestamp	Engagement	Why CTO cares
HN	Show HN: VAEN – Package and import portable AI coding-agent Harnesses	sjhalani7	2026-05-27T20:52:31Z	4 pts/2 cmt	dev discussion
HN	DeepSWE Measuring frontier coding agents	e2e4	2026-05-27T19:57:16Z	2 pts/1 cmt	dev discussion
HN	AI coding agents are installing packages no one owns	speckx	2026-05-27T19:39:42Z	2 pts/0 cmt	dev discussion
HN	Ask HN: Examples of products and services created via agentic coding	d_silin	2026-05-27T18:45:32Z	2 pts/0 cmt	dev discussion
HN	Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have?	srbsa	2026-05-27T18:00:06Z	2 pts/0 cmt	dev discussion
HN	I built an agentic coding harness across three CLI hosts	homescout	2026-05-27T15:43:24Z	2 pts/0 cmt	dev discussion
HN	Peers – Multi-agent AI coding with measurable convergence	dash0r	2026-05-27T14:51:35Z	1 pts/0 cmt	dev discussion
HN	Show HN: Mneme HQ – repo-native architectural rules for AI coding agents	Tval	2026-05-27T14:16:48Z	1 pts/0 cmt	dev discussion
HN	Aming Claw – Zero-orchestration multi-agent coding	aming557	2026-05-27T13:59:01Z	1 pts/0 cmt	dev discussion
HN	Show HN: Unspaghettit – executable behavior specs for AI coding agents	D3F	2026-05-27T12:06:19Z	5 pts/0 cmt	dev discussion
HN	Bill Gates AI on AI (one month later)	vbutsomesayw	2026-05-27T04:01:44Z	3 pts/0 cmt	dev discussion
HN	Show HN: Simple Sprite Sheet Generation	armcat	2026-05-24T19:37:43Z	3 pts/0 cmt	dev discussion
HN	Show HN: My first app, artisanally vibe-coded in 4 months	jeroen_stulen	2026-05-24T10:07:13Z	3 pts/4 cmt	dev discussion
HN	Zero – Programming Language for Agents	xendo	2026-05-23T11:13:35Z	3 pts/0 cmt	dev discussion

4 Trend Radar

HOT: Harness portability, coding-agent eval, CLI orchestration

EMERGING: Repo-native architectural rules, behavior specs, org knowledge graph

NOISE: Vibe-coded product bragging without eval/security metrics

WATCH: DeepSWE vs SWE-bench vs Terminal-Bench practical transfer

5 Repo Watch

Platform	Repo	Owner	Timestamp	Metric	Signal
GitHub	sjsyrek/design-council	sjsyrek	2026-05-27T21:35:06Z	151 stars/14 forks/0 issues	repo momentum
GitHub	asheshgoplani/agent-deck	asheshgoplani	2026-05-27T21:34:39Z	2544 stars/295 forks/20 issues	repo momentum
GitHub	InsForge/InsForge	InsForge	2026-05-27T21:33:35Z	10710 stars/919 forks/74 issues	repo momentum
GitHub	inkeep/agents	inkeep	2026-05-27T21:32:32Z	1166 stars/142 forks/36 issues	repo momentum
GitHub	boldsoftware/shelley	boldsoftware	2026-05-27T21:31:59Z	479 stars/80 forks/93 issues	repo momentum
GitHub	GLips/Figma-Context-MCP	GLips	2026-05-27T21:31:58Z	14888 stars/1179 forks/25 issues	repo momentum
GitHub	cluesmith/codev	cluesmith	2026-05-27T21:29:59Z	273 stars/36 forks/86 issues	repo momentum
GitHub	yvgude/lean-ctx	yvgude	2026-05-27T21:27:53Z	2221 stars/231 forks/6 issues	repo momentum
GitHub	tontinton/maki	tontinton	2026-05-27T21:27:26Z	389 stars/43 forks/17 issues	repo momentum
GitHub	KimYx0207/Meta_Kim	KimYx0207	2026-05-27T21:27:18Z	145 stars/47 forks/2 issues	repo momentum

6 Paper / Benchmark Watch

Type	Title	Source	Timestamp	Metric	Implication
Paper/arXiv	Governed Evolution of Agent Runtimes through Executable Operational Cognition	arXiv	2026-05-26T17:36:48Z	paper	research direction
Paper/arXiv	EviACT: An Evidence-to-Action Framework for Agentic Program Repair	arXiv	2026-05-26T16:17:47Z	paper	research direction
Paper/arXiv	ProDebug: An Automated Debugging System for Prolog	arXiv	2026-05-26T14:57:50Z	paper	research direction
Paper/arXiv	ConVer: Using Contracts and Loop Invariant Synthesis for Scalable Formal Software Verification	arXiv	2026-05-26T14:04:40Z	paper	research direction
Paper/arXiv	Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)	arXiv	2026-05-26T12:32:17Z	paper	research direction
Paper/arXiv	Strategies for Guiding LLMs to Use Software Design Patterns: A Case of Singleton	arXiv	2026-05-26T11:58:23Z	paper	research direction
Paper/arXiv	LLM-based Mockless Unit Test Generation for Java	arXiv	2026-05-26T11:08:04Z	paper	research direction
Paper/arXiv	HTMLCure: Turning Browser Experience into State Guided Repair for Interactive HTML	arXiv	2026-05-26T10:22:56Z	paper	research direction
Paper/arXiv	TrajAudit: Automated Failure Diagnosis for Agentic Coding Systems	arXiv	2026-05-26T05:24:37Z	paper	research direction
Paper/arXiv	Testing Agentic Workflows with Structural Coverage Criteria	arXiv	2026-05-26T04:07:55Z	paper	research direction
Paper/arXiv	Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization	arXiv	2026-05-26T02:12:48Z	paper	research direction
Paper/arXiv	Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models	arXiv	2026-05-25T17:56:46Z	paper	research direction

7 Product / Business Watch

Type	Product	Link	Date	Metric	Fabbi use
Product	Anthropic Claude Code docs	Anthropic Claude Code docs	2026-05	official source	product/adoption
Product	OpenAI Codex	OpenAI Codex	2026-05	official source	product/adoption
Product	Cursor changelog	Cursor changelog	2026-05	official source	product/adoption
Product	GitHub Copilot coding agent	GitHub Copilot coding agent	2026-05	official source	product/adoption
Product	Sourcegraph Cody	Sourcegraph Cody	2026-05	official source	product/adoption
Product	OpenCode	OpenCode	2026-05	official source	product/adoption
Product	Google Jules	Google Jules	2026-05	official source	product/adoption
Product	Replit Agent	Replit Agent	2026-05	official source	product/adoption

8 Impact Coverage

Domain	Now 0-2w	Next 1-2m	Later 3-6m	Decision
FARE	Repo map + rules	Cross-tool knowledge index	Architecture memory	trial
NEXA	Portable harness	CLI adapter matrix	Multi-agent runtime	adopt
SYNCA	Eval gates	Risk scoring	Governance console	adopt
DOMUS	Low signal	Internal ops agent	Domain workflow agent	monitor
Japan/VN/Global	Vendor-neutral PoC	JP enterprise security story	Managed AI-SDLC service	trial
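
The NEXA row above pairs a portable harness with a CLI adapter matrix. One way to picture that abstraction layer is a small adapter interface that turns the same task into a per-host command line. The sketch below is illustrative only: `AgentAdapter`, `TemplateAdapter`, and the binary names and flags in the usage note are hypothetical placeholders, not the real command-line interface of any product listed in this brief.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class AgentAdapter(Protocol):
    """Hypothetical interface each CLI host adapter would satisfy."""

    name: str

    def build_command(self, prompt: str, workdir: str) -> list[str]: ...


@dataclass(frozen=True)
class TemplateAdapter:
    """Fills an argv template for one CLI host (placeholder flags only)."""

    name: str
    argv_template: tuple[str, ...]

    def build_command(self, prompt: str, workdir: str) -> list[str]:
        # Each template part may reference {prompt} or {workdir}.
        return [part.format(prompt=prompt, workdir=workdir) for part in self.argv_template]


def adapter_matrix(adapters: Iterable[AgentAdapter], prompt: str, workdir: str) -> dict[str, list[str]]:
    """Build the command each host would run for the same task."""
    return {a.name: a.build_command(prompt, workdir) for a in adapters}
```

For example, `adapter_matrix([TemplateAdapter("host-a", ("agent-a", "--task", "{prompt}", "--cwd", "{workdir}"))], "fix bug", "/repo")` yields one argv list per host, which a harness could execute and log for replay.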

9 CTO Evaluation Matrix

Signal	Thesis	Evidence	Counter	Fabbi	Conf	Decision	Next validation
Harness portability	Winner is workflow layer	9 HN/dev items/24h	Engagement low	NEXA core	70%	trial	3 repos x 20 tasks
Eval/reliability	Adoption blocked by trust	33 papers + 3 benchmarks	Benchmark transfer unknown	SYNCA gate	76%	adopt	pass@k baseline
Supply-chain	Agents amplify package risk	NewStack/HN signal	1 source only	Security policy	58%	watch	simulate package install
Context layer	Org knowledge improves agent precision	Ask HN + Mneme	No hard ROI yet	FARE differentiator	64%	trial	retrieval ablation
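
The "retrieval ablation" named as the next validation for the context layer could be measured with a simple hit-rate@k comparison between runs with and without the repo-context map. This is a rough sketch under assumed inputs: the query ids, file lists, and function names are hypothetical, and the brief does not define which retrieval metric FARE would use.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]], relevant: dict[str, set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant file among the top-k retrieved."""
    if not relevant:
        raise ValueError("no queries to score")
    hits = sum(
        1
        for query, gold in relevant.items()
        if gold & set(retrieved.get(query, [])[:k])
    )
    return hits / len(relevant)


def ablation_delta(
    with_context: dict[str, list[str]],
    without_context: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int = 5,
) -> float:
    """Hit-rate gain attributable to the context layer (positive means it helps)."""
    return hit_rate_at_k(with_context, relevant, k) - hit_rate_at_k(without_context, relevant, k)
```

A delta near zero on a representative task set would support the "No hard ROI yet" counter in the matrix; a clear positive delta would support the trial decision.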

10 CTO Recommendations

1. Build NEXA harness adapter for 3 CLIs. ROI/time-saving 18-28%; risk 2/5; owner AI Platform Lead; TTV 10 days; validate 60 tasks, pass@1/pass@3.

2. Add SYNCA eval gate before agent PR merge. ROI/time-saving 12-20%; risk 2/5; owner QA Automation Lead; TTV 7 days; validate regression suite + flaky rate.

3. Prototype FARE repo-context map. ROI/time-saving 10-16%; risk 3/5; owner Solution Architect; TTV 14 days; validate retrieval hit-rate + diff quality.

4. Enforce dependency allowlist/sandbox. ROI/risk-avoidance 20-35%; risk 1/5; owner Security Lead; TTV 5 days; validate blocked unknown packages.
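
The dependency allowlist in recommendation 4 can be validated with a check that lists every package an agent tries to add that is not pre-approved. The sketch below assumes a pip-style requirements text and compares names after PEP 503-style normalization; the function names and the example package names are hypothetical, and a real gate would also need to cover lockfiles, option lines, and other ecosystems.

```python
import re

# Leading distribution name of a requirement line, e.g. "requests" in "requests==2.31".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """PEP 503-style normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def blocked_packages(requirements: str, allowlist: set[str]) -> list[str]:
    """Return normalized names that appear in the requirements but not in the allowlist."""
    allowed = {normalize(name) for name in allowlist}
    blocked = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        # Assumption: option lines such as "-r other.txt" are skipped in this
        # sketch; a stricter gate would reject them outright.
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match and normalize(match.group(1)) not in allowed:
            blocked.append(normalize(match.group(1)))
    return blocked
```

An empty result means the change stays within the allowlist; any non-empty result would fail the gate and give the "blocked unknown packages" count used for validation.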

Watch for 2-4 weeks: DeepSWE/Terminal-Bench adoption, Codex/Copilot agent enterprise controls. Ignore: demos lacking eval/security/ROI.

11 Detailed Source Appendix

Platform	Source	Author	Time	Metric	Signal
HN	Show HN: VAEN – Package and import portable AI coding-agent Harnesses	sjhalani7	2026-05-27T20:52:31Z	4 pts/2 cmt	dev discussion
HN	DeepSWE Measuring frontier coding agents	e2e4	2026-05-27T19:57:16Z	2 pts/1 cmt	dev discussion
HN	AI coding agents are installing packages no one owns	speckx	2026-05-27T19:39:42Z	2 pts/0 cmt	dev discussion
HN	Ask HN: Examples of products and services created via agentic coding	d_silin	2026-05-27T18:45:32Z	2 pts/0 cmt	dev discussion
HN	Ask HN: Do coding agents need cross-tool org knowledge? Or, just good to have?	srbsa	2026-05-27T18:00:06Z	2 pts/0 cmt	dev discussion
HN	I built an agentic coding harness across three CLI hosts	homescout	2026-05-27T15:43:24Z	2 pts/0 cmt	dev discussion
HN	Peers – Multi-agent AI coding with measurable convergence	dash0r	2026-05-27T14:51:35Z	1 pts/0 cmt	dev discussion
HN	Show HN: Mneme HQ – repo-native architectural rules for AI coding agents	Tval	2026-05-27T14:16:48Z	1 pts/0 cmt	dev discussion
HN	Aming Claw – Zero-orchestration multi-agent coding	aming557	2026-05-27T13:59:01Z	1 pts/0 cmt	dev discussion
HN	Show HN: Unspaghettit – executable behavior specs for AI coding agents	D3F	2026-05-27T12:06:19Z	5 pts/0 cmt	dev discussion
HN	Bill Gates AI on AI (one month later)	vbutsomesayw	2026-05-27T04:01:44Z	3 pts/0 cmt	dev discussion
HN	Show HN: Simple Sprite Sheet Generation	armcat	2026-05-24T19:37:43Z	3 pts/0 cmt	dev discussion
HN	Show HN: My first app, artisanally vibe-coded in 4 months	jeroen_stulen	2026-05-24T10:07:13Z	3 pts/4 cmt	dev discussion
HN	Zero – Programming Language for Agents	xendo	2026-05-23T11:13:35Z	3 pts/0 cmt	dev discussion
HN	Show HN: opub, donated compute for open-source	goodroot	2026-05-21T14:59:15Z	2 pts/0 cmt	dev discussion
HN	Zero: The Programming Language for Agents	afshinmeh	2026-05-19T20:19:46Z	3 pts/0 cmt	dev discussion
HN	Show HN: Korveo – a local firewall for AI agents	amitbidlan	2026-05-19T17:40:39Z	1 pts/3 cmt	dev discussion
HN	The Programming Language for Agents	Marius77	2026-05-19T14:09:50Z	20 pts/7 cmt	dev discussion
HN	Vercel's Zero: A Programming Language Designed for AI Agents	steveharing1	2026-05-17T20:25:40Z	5 pts/2 cmt	dev discussion
HN	Show HN: Sneakily steer candidates toward naive brute-force solutions	abr0ahm	2026-05-27T20:48:44Z	1 pts/0 cmt	dev discussion
HN	Researcher "gave Claude Code 'ADHD' and it thinks 2x better now."	udit_50	2026-05-27T20:38:56Z	1 pts/0 cmt	dev discussion
HN	Is Amp more or less expensive than Claude Code? Is it better?	markosn	2026-05-27T19:19:46Z	2 pts/0 cmt	dev discussion
HN	CC-Wiki: Turn Claude Code sessions into a shareable knowledge base wiki	tejpal-diffuse	2026-05-27T18:57:02Z	4 pts/2 cmt	dev discussion
HN	Claude Code's creator on the end of the software engineer	speckx	2026-05-27T18:05:42Z	1 pts/0 cmt	dev discussion
HN	Show HN: Claude Code's $200 plan is a 17× subsidy on the raw API	Hiteshjain118	2026-05-27T17:25:53Z	5 pts/7 cmt	dev discussion
HN	Show HN: GTFS·X – a free, web-based transit schedule (GTFS) editor	markegge	2026-05-27T17:08:39Z	1 pts/0 cmt	dev discussion
HN	Show HN: Hm – a task runner with a Python DSL, growing into a CI/CD system	suis_siva	2026-05-27T16:41:36Z	11 pts/0 cmt	dev discussion
HN	Show HN: Workplane – collaborative filesystem for humans and AI	tweezers0x	2026-05-27T16:22:41Z	5 pts/0 cmt	dev discussion
HN	Codex has dethroned Claude as the king of AI programming	galaxyLogic	2026-05-27T15:53:55Z	3 pts/1 cmt	dev discussion
HN	Building self-improving tax agents with Codex	dnw	2026-05-27T15:48:40Z	2 pts/0 cmt	dev discussion
HN	The Codex Showcase	wordsaboutcode	2026-05-27T03:00:38Z	4 pts/0 cmt	dev discussion
HN	Building a safe, effective sandbox to enable Codex on Windows	gmays	2026-05-26T21:37:19Z	1 pts/0 cmt	dev discussion
HN	Show HN: PrismCat – Local transparent proxy and debugging console for LLM APIs	etgpao	2026-05-26T13:11:26Z	2 pts/2 cmt	dev discussion
HN	Why codex /goal fails on complex workflows: compaction amnesia and context rot	shaurya-sethi	2026-05-26T06:33:40Z	1 pts/0 cmt	dev discussion
HN	Codex is flagged as malware on macOS	vldszn	2026-05-23T22:50:35Z	3 pts/4 cmt	dev discussion
HN	Tell HN: OpenAI Codex: Increase in users hitting Codex rate limits	embedding-shape	2026-05-23T13:42:10Z	6 pts/4 cmt	dev discussion
HN	OpenAI intentionally removed Codex's visible context usage indicator	gobdovan	2026-05-22T20:51:01Z	2 pts/1 cmt	dev discussion
HN	OpenAI's Codex Can Now Use Your Mac Even When It's Locked	tosh	2026-05-22T11:49:09Z	1 pts/0 cmt	dev discussion
HN	Windows computer-use: synthetic cursors for background agents	frabonacci	2026-05-27T18:48:20Z	2 pts/0 cmt	dev discussion
HN	Show HN: GridPath – Faster and Better Agent for Spreadsheets (Tauri, Rust)	pixelmash13	2026-05-27T15:14:11Z	1 pts/0 cmt	dev discussion
HN	FlowLink: MCP proxy blocking destructive AI agent commands	braincoder	2026-05-26T18:01:25Z	1 pts/0 cmt	dev discussion
HN	Show HN: Chunk sidecars for validating agent-generated code before pushing to CI	olafmol	2026-05-26T15:41:32Z	1 pts/2 cmt	dev discussion
HN	Show HN: Agent Launch – One CLI for Codex, Claude Code, Cursor, Gemini, OpenCode	dhruv_anand	2026-05-26T11:18:03Z	2 pts/0 cmt	dev discussion
HN	Is it too soon to built software factories?	Bnowako	2026-05-25T16:39:32Z	4 pts/4 cmt	dev discussion
HN	Show HN: I built a RAG and knowledge graph agent that runs locally	gabriel_oauth	2026-05-23T16:06:25Z	7 pts/7 cmt	dev discussion
HN	Show HN: I built a powerful RAG and knowledge graph agent that runs locally	GabrielBlessed	2026-05-23T11:05:43Z	5 pts/3 cmt	dev discussion
HN	Show HN: 97% on SWE-bench Verified with subscription-token agents	kimjune01	2026-05-24T18:03:28Z	2 pts/0 cmt	dev discussion
HN	Bito's AI Architect Boosts Claude Opus's task success rate by 35%	Sushrutkm	2026-05-19T10:02:03Z	2 pts/0 cmt	dev discussion
HN	Show HN: Statewright – Visual state machines that make AI agents reliable	azurewraith	2026-05-12T14:24:55Z	126 pts/59 cmt	dev discussion
HN	Show HN: New Benchmark from SWE-bench team is 0% solved	lieret	2026-05-05T15:10:41Z	24 pts/3 cmt	dev discussion
HN	talkie-coder: From 1930 to SWE-bench	Philpax	2026-05-02T21:35:54Z	2 pts/0 cmt	dev discussion
HN	Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error	jryio	2026-04-29T19:16:48Z	2 pts/0 cmt	dev discussion
HN	SWE-bench Verified no longer measures frontier coding capabilities	kmdupree	2026-04-26T13:58:13Z	343 pts/181 cmt	dev discussion
HN	Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces	george_ciobanu	2026-04-24T21:34:31Z	10 pts/2 cmt	dev discussion
HN	The Terminal Bench 3.0 community is looking for task contributors	neversettles	2026-05-03T03:40:04Z	1 pts/2 cmt	dev discussion
HN	ForgeCode: Top open source coding agent in Terminal-Bench 2.0	gk1	2026-04-29T18:16:23Z	4 pts/0 cmt	dev discussion
HN	Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)	ubermon	2026-04-28T19:11:57Z	6 pts/9 cmt	dev discussion
HN	Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	GodelNumbering	2026-04-27T12:35:55Z	393 pts/148 cmt	dev discussion
HN	Show HN: Terminal-Wrench, a dataset of 331 realistic hackable environments	neversupervised	2026-04-15T00:42:30Z	6 pts/2 cmt	dev discussion
HN	A simple test-time method that beats Claude Mythos on Terminal-Bench	jackykwok	2026-04-14T20:27:39Z	1 pts/1 cmt	dev discussion

12 Data Quality / Scan Health Appendix

Scanned 207 candidates: {'HN': 87, 'GitHub': 75, 'Paper/arXiv': 33, 'Product': 8, 'PublicWeb/KOL': 4}. PASS on source volume; PARTIAL on social completeness: the Reddit/X/YouTube/Facebook collectors were restricted by public/API limits in the cron run, so the scan used HN/dev web, GitHub, arXiv, official product sources, and the public KOL fallback. Engagement is N/A when the RSS/public page does not provide it. Confidence is 64% because direct X/FB metrics are missing.
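
The source breakdown above can be sanity-checked with a few lines that confirm the per-source counts add up to the reported 207 candidates and show each source's share. This is a minimal sketch using only the numbers stated in this appendix; the function name is an assumption.

```python
# Per-source counts as reported in the scan summary.
counts = {"HN": 87, "GitHub": 75, "Paper/arXiv": 33, "Product": 8, "PublicWeb/KOL": 4}


def source_shares(counts: dict[str, int], expected_total: int) -> dict[str, float]:
    """Verify the counts sum to the reported total, then return percentage shares."""
    total = sum(counts.values())
    if total != expected_total:
        raise ValueError(f"counts sum to {total}, expected {expected_total}")
    return {source: round(100 * n / total, 1) for source, n in counts.items()}


shares = source_shares(counts, expected_total=207)
```

The counts do sum to 207, and HN plus GitHub together account for 162 of the 207 candidates, which is consistent with the note that social platforms are under-represented in this run.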