Mutation testing is a powerful technique for assessing and improving test suite quality that artificially introduces bugs and checks whether the test suites catch them. However, it is also computationally expensive and thus does not scale to large systems and projects. One promising recent approach to tackling this scalability problem uses machine learning to predict whether the tests will detect the synthetic bugs, without actually running those tests. However, existing predictive mutation testing approaches still misclassify 48% of undetected bugs on a randomly sampled set of mutant-test suite pairs. We propose a novel machine learning approach for predictive mutation testing that simultaneously encodes the source method mutation and the test method, capturing key context in the input representation. We use this input representation to leverage recent advances in transformers for machine learning on source code tasks. We show that our approach, MutationBERT, outperforms the state of the art in both same-project and cross-project settings, with meaningful improvements in precision, recall, and F1 score. We empirically validate our novel input representation and our aggregation approaches for lifting predictions from the test-matrix level to the test-suite level. Finally, we show that our approach saves up to 10,758 test executions compared to the prior approach, depending on whether the model was trained on same-project or cross-project data and on the size of the projects being run.