Apertus is a fully open model. We pair the release of the weights of the Apertus model suite with a full set of reproduction artifacts, including source code, final and intermediate model checkpoints, reproducibility scripts for training data, evaluation suites, and this technical report. (Hernández-Cano et al. 2025, 7)
Open models discussed in the Apertus report (Hernández-Cano et al. 2025, 40)
For the non-controversial prompt-completion pairs (Section 4.1.4 above), we assign rewards with a pretrained reward model. Specifically, we use Skywork-Reward-V2-Llama-3.1-8B (Liu et al., 2025a), an 8B-parameter Llama 3.1 decoder finetuned on 26M preference pairs curated with a human–AI annotation pipeline. As of summer 2025, it ranks highly on reward model benchmarks (Liu et al., 2025a).
… a suite of eight reward models ranging from 0.6B to 8B parameters, trained on a carefully curated subset of 26 million preference pairs from SynPref-40M. (Liu et al. 2025)
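To make the reward-assignment step concrete, below is a minimal sketch of scoring a prompt-completion pair with Skywork-Reward-V2-Llama-3.1-8B through the standard Hugging Face sequence-classification interface (a single scalar logit as the reward). This is an illustration of the general technique, not the Apertus pipeline itself; the excerpt does not show their exact preprocessing, batching, or reward normalization, and the `prompt`/`completion` strings here are placeholders.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Skywork/Skywork-Reward-V2-Llama-3.1-8B"

# Load the reward model as a single-output sequence classifier.
rm = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    num_labels=1,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder prompt-completion pair (hypothetical; stands in for one of the
# non-controversial pairs described in the excerpt).
prompt = "Explain why the sky is blue."
completion = "Sunlight is scattered by air molecules, and shorter (blue) wavelengths scatter most."

# Format the pair as a chat conversation, as Llama-3.1-based reward models expect.
conversation = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": completion},
]
input_ids = tokenizer.apply_chat_template(
    conversation, tokenize=True, return_tensors="pt"
).to(rm.device)

# The scalar logit serves as the reward for this pair.
with torch.no_grad():
    reward = rm(input_ids).logits[0][0].item()
print(f"reward: {reward:.4f}")
```

In practice one would score completions in batches and compare rewards across candidate completions for the same prompt; the raw scalar is only meaningful relative to other scores from the same reward model.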