Home

I am an independent researcher working on AI Alignment/Safety, Agency & Goals, and questions related to the long-term development of life.

Currently, I'm working on three things: understanding whether an agent's goals can be detected via interpretability methods, controlling language model agents, and alignment targets for superintelligent agents. I'm also working with the Autonomous Systems group at the UK AI Safety Institute on risk modeling.

Previously, I did a PhD in math at the University of Warwick. You can read a summary of my PhD work here.

Aside from the above, I'm interested in Tibetan Buddhism, math, the technological singularity, consciousness, philosophy of utopias, and other such topics.

For fun, I enjoy learning, exploring cities & nature, dancing, spending time with friends & family, watching films, meditation & relaxation, and (contemporary) art.

Contact me at: <first name>.<last name>@gmail.com

Follow me on: https://x.com/paulcolognese