🏆 ViTeX-Bench Leaderboard

🌐 Project page · 📊 Dataset · 🧪 Benchmark code · 🤖 Model & Inference code · 🏆 Leaderboard

Public ranking for video scene text editing under the 13-metric ViTeX-Bench protocol. Methods are ranked by TextScore = ∛(SeqAcc · CharAcc · TTS), the geometric mean of the three text-correctness primitives; the full thirteen-metric vector is shown alongside it.
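As a rough illustration of the ranking formula above, the sketch below computes the cube root of the product of the three primitives. The function name `text_score` and the non-negativity check are assumptions made for this example, not part of the benchmark code. Because the leaderboard shows the primitives rounded to three decimals, recomputing from displayed values should only be expected to match the listed TextScore to about three decimals.

```python
def text_score(seq_acc: float, char_acc: float, tts: float) -> float:
    """Geometric mean of the three text-correctness primitives.

    Hypothetical helper: assumes each primitive is a non-negative score.
    """
    for name, value in (("SeqAcc", seq_acc), ("CharAcc", char_acc), ("TTS", tts)):
        if value < 0.0:
            raise ValueError(f"{name} must be non-negative, got {value}")
    # Cube root of the product = geometric mean of three values.
    return (seq_acc * char_acc * tts) ** (1.0 / 3.0)


# TextCtrl row, using the rounded values shown on the leaderboard.
print(round(text_score(0.475, 0.734, 0.511), 4))

# A zero in any primitive forces the geometric mean to zero,
# regardless of how high the other two primitives are.
print(text_score(0.0, 0.317, 0.760))
```

One consequence of using a geometric mean is that a method with SeqAcc = 0 receives TextScore = 0 even when its CharAcc and TTS are non-zero, which matches the rows that list a TextScore of 0.0000.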

| # | Method | Authors / Org | Src | TextScore↑ | SeqAcc↑ | CharAcc↑ | TTS↑ | Flk_f↓ | Flk_c↓ | Wp_f↓ | Wp_c↓ | MUSIQ_f↑ | MUSIQ_c↑ | PSNR↑ | SSIM↑ | LPIPS↓ | DSim↓ | Fam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TextCtrl | Zeng et al., 2024 | Admin | 0.5624 | 0.475 | 0.734 | 0.511 | 3.80 | 4.29 | 1.59 | 2.09 | 70.32 | 42.77 | 41.14 | 0.994 | 0.008 | 0.0043 | A |
| 2 | ViTeX-Edit-14B (Composite) | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5410 | 0.345 | 0.689 | 0.666 | 3.73 | 3.83 | 1.51 | 1.56 | 70.27 | 44.94 | 42.95 | 0.993 | 0.006 | 0.0023 | Ref |
| 3 | ViTeX-Edit-14B | Anonymous (NeurIPS 2026 D&B submission) | Admin | 0.5338 | 0.341 | 0.688 | 0.648 | 3.27 | 3.42 | 1.55 | 1.53 | 69.64 | 43.53 | 29.08 | 0.951 | 0.060 | 0.0235 | Ref |
| 4 | VideoPainter | Bian et al., 2025 | Admin | 0.5151 | 0.364 | 0.619 | 0.606 | 2.38 | 2.62 | 2.93 | 3.35 | 67.16 | 40.59 | 28.56 | 0.915 | 0.104 | 0.0239 | C |
| 5 | FLUX-Text | Chen et al., 2025 | Admin | 0.5023 | 0.528 | 0.737 | 0.326 | 5.11 | 14.81 | 3.03 | 13.01 | 70.26 | 43.85 | 31.49 | 0.975 | 0.029 | 0.0120 | A |
| 6 | RS-STE | Zhao et al., 2025 | Admin | 0.4908 | 0.354 | 0.626 | 0.534 | 3.73 | 3.66 | 1.61 | 1.81 | 69.57 | 34.26 | 37.00 | 0.983 | 0.024 | 0.0073 | A |
| 7 | AnyText2 | Tuo et al., 2024 | Admin | 0.4074 | 0.280 | 0.633 | 0.382 | 3.34 | 4.95 | 2.04 | 3.95 | 66.68 | 41.65 | 25.56 | 0.905 | 0.091 | 0.0431 | A |
| 8 | TextCtrl + AnyV2V | Composite of Zeng 2024 + Ku 2024 | Admin | 0.1649 | 0.057 | 0.308 | 0.257 | 4.98 | 4.98 | 4.11 | 3.97 | 69.41 | 33.85 | 21.08 | 0.785 | 0.225 | 0.0732 | B |
| 9 | Identity (sanity) | - | Admin | 0.0000 | 0.000 | 0.317 | 0.760 | 3.72 | 3.68 | 1.46 | 1.27 | 70.33 | 45.12 | 100.00 | 1.000 | 0.000 | -0.0000 | - |
| 10 | Wan2.1-VACE-14B | Wan-AI, 2025 | Admin | 0.0000 | 0.000 | 0.298 | 0.689 | 3.78 | 3.84 | 1.69 | 1.56 | 70.54 | 45.26 | 35.21 | 0.976 | 0.022 | 0.0071 | C |
| 11 | Kling Video 3.0 Omni | Kuaishou (closed) | Admin | 0.0000 | 0.000 | 0.208 | 0.641 | 4.25 | 4.08 | 3.12 | 2.90 | 72.23 | 47.75 | 21.18 | 0.843 | 0.176 | 0.0608 | D |
↑ higher is better, ↓ lower is better. Flk = Flicker, Wp = Warp, DSim = DreamSim; subscripts f / c = full-frame / text-crop scope. A † next to a method name marks a published caveat.
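The ↑ and ↓ markers in the column headers decide the sort direction when the table is re-ranked by a metric other than TextScore. The sketch below shows one way to do this offline. The `rank_by` helper, the `HIGHER_IS_BETTER` map, and the dictionary layout are assumptions for illustration, and only three rows and three metrics are copied from the table to keep the example short.

```python
# Direction per metric, taken from the arrows in the column headers.
# Only a subset of the metrics is listed here.
HIGHER_IS_BETTER = {
    "TextScore": True,   # ↑
    "SeqAcc": True,      # ↑
    "Flk_f": False,      # ↓
}

# Three rows copied from the leaderboard (subset of columns).
ROWS = [
    {"Method": "TextCtrl", "TextScore": 0.5624, "SeqAcc": 0.475, "Flk_f": 3.80},
    {"Method": "VideoPainter", "TextScore": 0.5151, "SeqAcc": 0.364, "Flk_f": 2.38},
    {"Method": "FLUX-Text", "TextScore": 0.5023, "SeqAcc": 0.528, "Flk_f": 5.11},
]


def rank_by(rows, metric):
    """Return rows ordered best-first for the given metric (hypothetical helper)."""
    # reverse=True puts the largest value first for higher-is-better metrics;
    # lower-is-better metrics are sorted ascending instead.
    return sorted(rows, key=lambda r: r[metric], reverse=HIGHER_IS_BETTER[metric])


for metric in ("TextScore", "SeqAcc", "Flk_f"):
    print(metric, [r["Method"] for r in rank_by(ROWS, metric)])
```

On this subset, the ordering changes with the chosen metric: FLUX-Text leads on SeqAcc while VideoPainter leads on full-frame flicker, so a single-metric ranking should be read alongside the rest of the metric vector.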