-
Oct 27, 2024
Experiments with the Platonic Representation Hypothesis
Investigating the validity of PRH in an OOD setting
-
Aug 28, 2024
Understanding Hidden Computations in Chain-of-Thought Reasoning
chain-of-thought is decryptable
-
Mar 24, 2023
Goal-misgeneralization might be ELK-hard
can goal-misgeneralization be formulated as an instance of ELK?
-
Oct 16, 2021
The AGI needs to be honest
building truthful-ai is hard