🏆 ViTeX-Bench Leaderboard
🌐 Project page · 📊 Dataset · 🧪 Benchmark code · 🤗 Model & Inference code · 🏆 Leaderboard
Public ranking for video scene text editing under the 13-metric ViTeX-Bench protocol. Methods are ranked by TextScore = ∛(SeqAcc · CharAcc · TTS), the geometric mean of the three text-correctness primitives; the full thirteen-metric vector is shown alongside it.
| # | Method | Authors / Org | Src | TextScore↑ | SeqAcc↑ | CharAcc↑ | TTS↑ | Flk_f↓ | Flk_c↓ | Wp_f↓ | Wp_c↓ | MUSIQ_f↑ | MUSIQ_c↑ | PSNR↑ | SSIM↑ | LPIPS↓ | DSim↓ | Links | Fam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TextCtrl | Zeng et al., 2024 | Admin | 0.5624 | 0.475 | 0.734 | 0.511 | 3.80 | 4.29 | 1.59 | 2.09 | 70.32 | 42.77 | 41.14 | 0.994 | 0.008 | 0.0043 | 🔗🔗 | A |
| 2 | ViTeX-14B (Composite) | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5410 | 0.345 | 0.689 | 0.666 | 3.73 | 3.83 | 1.51 | 1.56 | 70.27 | 44.94 | 42.95 | 0.993 | 0.006 | 0.0023 | 🔗 | Ref |
| 3 | ViTeX-14B | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5338 | 0.341 | 0.688 | 0.648 | 3.27 | 3.42 | 1.55 | 1.53 | 69.64 | 43.53 | 29.08 | 0.951 | 0.060 | 0.0235 | 🔗 | Ref |
| 4 | VideoPainter | Bian et al., 2025 | Admin | 0.5151 | 0.364 | 0.619 | 0.606 | 2.38 | 2.62 | 2.93 | 3.35 | 67.16 | 40.59 | 28.56 | 0.915 | 0.104 | 0.0239 | 🔗🔗 | C |
| 5 | FLUX-Text | Chen et al., 2025 | Admin | 0.5023 | 0.528 | 0.737 | 0.326 | 5.11 | 14.81 | 3.03 | 13.01 | 70.26 | 43.85 | 31.49 | 0.975 | 0.029 | 0.0120 | 🔗🔗 | A |
| 6 | RS-STE | Zhao et al., 2025 | Admin | 0.4908 | 0.354 | 0.626 | 0.534 | 3.73 | 3.66 | 1.61 | 1.81 | 69.57 | 34.26 | 37.00 | 0.983 | 0.024 | 0.0073 | 🔗🔗 | A |
| 7 | AnyText2 | Tuo et al., 2024 | Admin | 0.4074 | 0.280 | 0.633 | 0.382 | 3.34 | 4.95 | 2.04 | 3.95 | 66.68 | 41.65 | 25.56 | 0.905 | 0.091 | 0.0431 | 🔗🔗 | A |
| 8 | TextCtrl + AnyV2V | Composite of Zeng 2024 + Ku 2024 | Admin | 0.1649 | 0.057 | 0.308 | 0.257 | 4.98 | 4.98 | 4.11 | 3.97 | 69.41 | 33.85 | 21.08 | 0.785 | 0.225 | 0.0732 | | B |
| 9 | Wan2.1-VACE-14B | Wan-AI, 2025 | Admin | 0.0000 | 0.000 | 0.298 | 0.689 | 3.78 | 3.84 | 1.69 | 1.56 | 70.54 | 45.26 | 35.21 | 0.976 | 0.022 | 0.0071 | 🔗🔗 | C |
| 10 | Kling Video 3.0 Omni | Kuaishou (closed) | Admin | 0.0000 | 0.000 | 0.208 | 0.641 | 4.25 | 4.08 | 3.12 | 2.90 | 72.23 | 47.75 | 21.18 | 0.843 | 0.176 | 0.0608 | | D |
Flk = Flicker, Wp = Warp, DSim = DreamSim; subscripts f / c = full-frame / text-crop scope. A ⚠ next to a method name marks a published caveat (hover for details).

Upload the `eval.json` produced by `bash scripts/run_benchmark.sh <method>` in the Benchmark code repo. All 13 metrics and TextScore are read directly from the JSON; fill in the method metadata below.
If the admin passphrase field is filled with the correct value, the entry is published immediately with the Admin source badge and skips the review queue. Otherwise the submission is queued for the maintainers to approve.
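Before uploading, it can help to sanity-check the `eval.json` locally. Below is a minimal sketch; the metric key names and the `validate_eval_json` helper are assumptions for illustration — the authoritative schema lives in the Benchmark code repo's docs.

```python
import json

# Assumed key names, mirroring the leaderboard columns; the authoritative
# list lives in the Benchmark code repo (docs/PROTOCOL.md).
EXPECTED_KEYS = {
    "SeqAcc", "CharAcc", "TTS",
    "Flk_f", "Flk_c", "Wp_f", "Wp_c",
    "MUSIQ_f", "MUSIQ_c",
    "PSNR", "SSIM", "LPIPS", "DSim",
}

def validate_eval_json(path: str) -> dict:
    """Load an eval.json and check all 13 metrics are present and numeric."""
    with open(path) as f:
        metrics = json.load(f)
    missing = EXPECTED_KEYS - metrics.keys()
    if missing:
        raise ValueError(f"eval.json is missing metrics: {sorted(missing)}")
    non_numeric = [k for k in sorted(EXPECTED_KEYS)
                   if not isinstance(metrics[k], (int, float))]
    if non_numeric:
        raise ValueError(f"non-numeric metric values: {non_numeric}")
    return metrics
```

A failed check before upload is cheaper than a rejected queue entry.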
Submissions: 10 total · 0 pending · 10 approved · 0 rejected.
Space secrets: ADMIN_PASSPHRASE ✅ set · HF_TOKEN ✅ set.
Configure ADMIN_PASSPHRASE (auth) and HF_TOKEN (persistence) in Space → Settings → Variables and secrets.
ViTeX-Bench evaluates video scene text editing on a frozen 157-clip test split sourced from Panda-70M and InternVid, across three orthogonal axes:
- Text correctness: SeqAcc, CharAcc, and TTS via PP-OCRv5 with per-clip language routing and substring edit distance.
- Visual quality: Flicker, Warp (RAFT-flow), and MUSIQ, each at full-frame and text-crop scope.
- Edit locality: PSNR, SSIM, and LPIPS on a locality-only prediction, plus DreamSim for VAE-noise-robust similarity.
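The "substring edit distance" behind CharAcc can be sketched as a standard Levenshtein dynamic program in which leading and trailing characters of the OCR output may be skipped for free, so the target string is scored against its best-aligned substring. This is an illustrative reimplementation under that assumption, not the benchmark's code, and the `char_acc` normalization shown is hypothetical; the exact rules live in docs/PROTOCOL.md.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with a single rolling DP row."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # delete from a
                        dp[j - 1] + 1,        # insert into a
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[n]

def substring_edit_distance(target: str, ocr_text: str) -> int:
    """Edit distance from target to its best-matching substring of ocr_text.

    Same DP as above, except the first row is all zeros (the match may
    start anywhere) and the answer is the minimum over the last row
    (the match may end anywhere)."""
    m, n = len(target), len(ocr_text)
    dp = [0] * (n + 1)
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (target[i - 1] != ocr_text[j - 1]))
            prev = cur
    return min(dp)

def char_acc(target: str, ocr_text: str) -> float:
    """Hypothetical normalization: 1 - distance / len(target), clipped at 0."""
    if not target:
        return 1.0
    d = substring_edit_distance(target, ocr_text)
    return max(0.0, 1.0 - d / len(target))
```

For example, a target `"SALE"` read by OCR inside a longer line `"BIG SALE TODAY"` would score distance 0, while one substituted character would cost 1/len(target) of accuracy.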
Ranking key: TextScore = ∛(SeqAcc · CharAcc · TTS). Each primitive is natively in [0, 1]; SeqAcc = 0 collapses TextScore to zero, the intended semantics for methods that never produce the requested target string. The full thirteen-metric vector remains the unit of reporting.
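The ranking key itself is a one-liner. A minimal sketch (the function name is illustrative), using two rows from the table above as worked examples:

```python
def text_score(seq_acc: float, char_acc: float, tts: float) -> float:
    """Geometric mean of the three text-correctness primitives.

    Each input must lie in [0, 1]; any zero collapses the score to zero."""
    for name, v in (("SeqAcc", seq_acc), ("CharAcc", char_acc), ("TTS", tts)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {v}")
    return (seq_acc * char_acc * tts) ** (1.0 / 3.0)

# TextCtrl's row: close to the leaderboard's 0.5624 (table inputs are rounded)
text_score(0.475, 0.734, 0.511)
# Wan2.1-VACE-14B: SeqAcc = 0 zeroes the score despite strong TTS
text_score(0.000, 0.298, 0.689)
```

Note the asymmetry this creates: a method with decent CharAcc and TTS but zero exact-sequence matches still ranks below every method that ever produces the exact target string.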
Full protocol, normalization rules, and per-axis weights live in docs/PROTOCOL.md on the Benchmark code repo.
Anonymous release under double-blind review at the NeurIPS 2026 Datasets and Benchmarks Track. The author list and DOI will be updated after deanonymization.