🏆 ViTeX-Bench Leaderboard
🌐 Project page · 📊 Dataset · 🧪 Benchmark code · 🤗 Model & Inference code · 🏆 Leaderboard
Public ranking for video scene text editing under the 13-metric ViTeX-Bench protocol. Methods are ranked by TextScore = ∛(SeqAcc · CharAcc · TTS), the geometric mean of the three text-correctness primitives; the full thirteen-metric vector is shown alongside it.
| # | Method | Authors / Org | Src | TextScore↑ | SeqAcc↑ | CharAcc↑ | TTS↑ | Flk_f↓ | Flk_c↓ | Wp_f↓ | Wp_c↓ | MUSIQ_f↑ | MUSIQ_c↑ | PSNR↑ | SSIM↑ | LPIPS↓ | DSim↓ | Links | Fam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TextCtrl | Zeng et al., 2024 | Admin | 0.5624 | 0.475 | 0.734 | 0.511 | 3.80 | 4.29 | 1.59 | 2.09 | 70.32 | 42.77 | 41.14 | 0.994 | 0.008 | 0.0043 | 🔗🔗 | A |
| 2 | ViTeX-14B (Composite) | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5410 | 0.345 | 0.689 | 0.666 | 3.73 | 3.83 | 1.51 | 1.56 | 70.27 | 44.94 | 42.95 | 0.993 | 0.006 | 0.0023 | 🔗 | Ref |
| 3 | ViTeX-14B | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5338 | 0.341 | 0.688 | 0.648 | 3.27 | 3.42 | 1.55 | 1.53 | 69.64 | 43.53 | 29.08 | 0.951 | 0.060 | 0.0235 | 🔗 | Ref |
| 4 | VideoPainter | Bian et al., 2025 | Admin | 0.5151 | 0.364 | 0.619 | 0.606 | 2.38 | 2.62 | 2.93 | 3.35 | 67.16 | 40.59 | 28.56 | 0.915 | 0.104 | 0.0239 | 🔗🔗 | C |
| 5 | FLUX-Text | Chen et al., 2025 | Admin | 0.5023 | 0.528 | 0.737 | 0.326 | 5.11 | 14.81 | 3.03 | 13.01 | 70.26 | 43.85 | 31.49 | 0.975 | 0.029 | 0.0120 | 🔗🔗 | A |
| 6 | RS-STE | Zhao et al., 2025 | Admin | 0.4908 | 0.354 | 0.626 | 0.534 | 3.73 | 3.66 | 1.61 | 1.81 | 69.57 | 34.26 | 37.00 | 0.983 | 0.024 | 0.0073 | 🔗🔗 | A |
| 7 | AnyText2 | Tuo et al., 2024 | Admin | 0.4074 | 0.280 | 0.633 | 0.382 | 3.34 | 4.95 | 2.04 | 3.95 | 66.68 | 41.65 | 25.56 | 0.905 | 0.091 | 0.0431 | 🔗🔗 | A |
| 8 | TextCtrl + AnyV2V | Composite of Zeng 2024 + Ku 2024 | Admin | 0.1649 | 0.057 | 0.308 | 0.257 | 4.98 | 4.98 | 4.11 | 3.97 | 69.41 | 33.85 | 21.08 | 0.785 | 0.225 | 0.0732 | | B |
| 9 | Wan2.1-VACE-14B | Wan-AI, 2025 | Admin | 0.0000 | 0.000 | 0.298 | 0.689 | 3.78 | 3.84 | 1.69 | 1.56 | 70.54 | 45.26 | 35.21 | 0.976 | 0.022 | 0.0071 | 🔗🔗 | C |
| 10 | Kling Video 3.0 Omni | Kuaishou (closed) | Admin | 0.0000 | 0.000 | 0.208 | 0.641 | 4.25 | 4.08 | 3.12 | 2.90 | 72.23 | 47.75 | 21.18 | 0.843 | 0.176 | 0.0608 | | D |
Flk = Flicker, Wp = Warp, DSim = DreamSim; subscripts f / c = full-frame / text-crop scope. A ⚠ next to a method name marks a published caveat (hover for details).

Upload the `eval.json` produced by `bash scripts/run_benchmark.sh <method>` in the Benchmark code repo. All 13 metrics and TextScore are read directly from the JSON; fill in the method metadata below.
If the admin passphrase field is filled with the correct value, the entry is published immediately with the Admin source badge and skips the review queue. Otherwise the submission is queued for the maintainers to approve.
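Before uploading, it can help to sanity-check the `eval.json` locally. Below is a minimal sketch; the metric key names and the `validate_eval_json` helper are assumptions for illustration — the authoritative schema lives in the Benchmark code repo's docs.

```python
import json

# Assumed key names, mirroring the leaderboard columns; the authoritative
# list lives in the Benchmark code repo (docs/PROTOCOL.md).
EXPECTED_KEYS = {
    "SeqAcc", "CharAcc", "TTS",
    "Flk_f", "Flk_c", "Wp_f", "Wp_c",
    "MUSIQ_f", "MUSIQ_c",
    "PSNR", "SSIM", "LPIPS", "DSim",
}

def validate_eval_json(path: str) -> dict:
    """Load an eval.json and check all 13 metrics are present and numeric."""
    with open(path) as f:
        metrics = json.load(f)
    missing = EXPECTED_KEYS - metrics.keys()
    if missing:
        raise ValueError(f"eval.json is missing metrics: {sorted(missing)}")
    non_numeric = [k for k in sorted(EXPECTED_KEYS)
                   if not isinstance(metrics[k], (int, float))]
    if non_numeric:
        raise ValueError(f"non-numeric metric values: {non_numeric}")
    return metrics
```

A failed check before upload is cheaper than a rejected queue entry.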
Submissions: 10 total · 0 pending · 10 approved · 0 rejected.
Space secrets: ADMIN_PASSPHRASE ✅ set · HF_TOKEN ✅ set.
Configure ADMIN_PASSPHRASE (auth) and HF_TOKEN (persistence) in Space → Settings → Variables and secrets.
ViTeX-Bench evaluates video scene text editing on a frozen 157-clip test split sourced from Panda-70M and InternVid, across three orthogonal axes:
- Text correctness: SeqAcc, CharAcc, and TTS via PP-OCRv5 with per-clip language routing and substring edit distance.
- Visual quality: Flicker, Warp (RAFT-flow), and MUSIQ, each at full-frame and text-crop scope.
- Edit locality: PSNR, SSIM, and LPIPS on a locality-only prediction, plus DreamSim for VAE-noise-robust similarity.
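The "substring edit distance" behind CharAcc can be sketched as a standard Levenshtein dynamic program in which leading and trailing characters of the OCR output may be skipped for free, so the target string is scored against its best-aligned substring. This is an illustrative reimplementation under that assumption, not the benchmark's code, and the `char_acc` normalization shown is hypothetical; the exact rules live in docs/PROTOCOL.md.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with a single rolling DP row."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # delete from a
                        dp[j - 1] + 1,        # insert into a
                        prev + (a[i - 1] != b[j - 1]))  # substitute
            prev = cur
    return dp[n]

def substring_edit_distance(target: str, ocr_text: str) -> int:
    """Edit distance from target to its best-matching substring of ocr_text.

    Same DP as above, except the first row is all zeros (the match may
    start anywhere) and the answer is the minimum over the last row
    (the match may end anywhere)."""
    m, n = len(target), len(ocr_text)
    dp = [0] * (n + 1)
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1,
                        prev + (target[i - 1] != ocr_text[j - 1]))
            prev = cur
    return min(dp)

def char_acc(target: str, ocr_text: str) -> float:
    """Hypothetical normalization: 1 - distance / len(target), clipped at 0."""
    if not target:
        return 1.0
    d = substring_edit_distance(target, ocr_text)
    return max(0.0, 1.0 - d / len(target))
```

For example, a target `"SALE"` read by OCR inside a longer line `"BIG SALE TODAY"` would score distance 0, while one substituted character would cost 1/len(target) of accuracy.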
Ranking key: TextScore = ∛(SeqAcc · CharAcc · TTS). Each primitive is natively in [0, 1]; SeqAcc = 0 collapses TextScore to zero, the intended semantics for methods that never produce the requested target string. The full thirteen-metric vector remains the unit of reporting.
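The ranking key itself is a one-liner. A minimal sketch (the function name is illustrative), using two rows from the table above as worked examples:

```python
def text_score(seq_acc: float, char_acc: float, tts: float) -> float:
    """Geometric mean of the three text-correctness primitives.

    Each input must lie in [0, 1]; any zero collapses the score to zero."""
    for name, v in (("SeqAcc", seq_acc), ("CharAcc", char_acc), ("TTS", tts)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {v}")
    return (seq_acc * char_acc * tts) ** (1.0 / 3.0)

# TextCtrl's row: close to the leaderboard's 0.5624 (table inputs are rounded)
text_score(0.475, 0.734, 0.511)
# Wan2.1-VACE-14B: SeqAcc = 0 zeroes the score despite strong TTS
text_score(0.000, 0.298, 0.689)
```

Note the asymmetry this creates: a method with decent CharAcc and TTS but zero exact-sequence matches still ranks below every method that ever produces the exact target string.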
Full protocol, normalization rules, and per-axis weights live in docs/PROTOCOL.md on the Benchmark code repo.
Anonymous release under double-blind review at the NeurIPS 2026 Datasets and Benchmarks Track. The author list and DOI will be updated after deanonymization.