Massive stars have played a dominant role in shaping our universe since its earliest times, but there is still no consensus on the mechanism by which they form. I review the physics that is important for massive star formation and the connection this process may have with star cluster formation. I then focus on a particular theoretical model, Turbulent Core Accretion, which assumes the initial conditions are massive, turbulent, magnetized cores of gas and dust that are reasonably close to virial equilibrium. Our group has been exploring this scenario via analytic models and numerical simulations of the physics and chemistry of the interstellar medium, ranging from the earliest pre-stellar core phase to protostellar cores impacted by strong self-feedback. Crucially, these models can now be tested in detail with ALMA, SOFIA and other facilities, and I present the latest results from multiple projects that are zooming in on massive star birth in the darkest shadows of giant molecular clouds. Extending this work also has the potential to determine how the full stellar initial mass function is established across different Galactic environments. Finally, I discuss an application of massive star formation theory to the early universe: how massive were the first stars, and could they have been the progenitors of supermassive black holes?