Optimal transport has been an important topic in mathematical analysis since Monge (1781). Its close connections with differential geometry and kinetic descriptions were discovered within the past century, and the seminal work of Kantorovich (1942) demonstrated its power to solve real-world problems. Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for inverse problems, tackling the classical least-squares method's longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. As the research advances, we find that the advantage of changing the data misfit extends to a broader class of data-fitting problems: we examine the preconditioning and "implicit" regularization effects of different mathematical metrics when they serve as the objective function in optimization, as the likelihood function in Bayesian inference, and as the measure of the residual in the numerical solution of PDEs.
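To illustrate why the quadratic Wasserstein distance can behave better than least squares as a data misfit, the sketch below computes the 1-D W2 distance between two normalized signals via their quantile (inverse CDF) functions, for which the distance has a closed form. This is a minimal illustrative sketch, not the implementation used in our work; the function name `w2_distance_1d`, the grid, and the normalization choice are assumptions for the example. For two identical bumps that are pure translates of each other, W2 recovers the shift itself, so the misfit varies convexly with the translation, whereas the least-squares misfit saturates once the bumps no longer overlap.

```python
import numpy as np

def w2_distance_1d(f, g, x):
    """Quadratic Wasserstein distance between two nonnegative 1-D signals
    f, g sampled on a uniform grid x. Each signal is normalized to unit
    mass, then the distance is computed from the quantile functions:
    W2^2 = \int_0^1 |F^{-1}(t) - G^{-1}(t)|^2 dt."""
    dx = x[1] - x[0]
    f = f / (f.sum() * dx)          # normalize to a probability density
    g = g / (g.sum() * dx)
    F = np.cumsum(f) * dx           # cumulative distribution functions
    G = np.cumsum(g) * dx
    t = np.linspace(1e-6, 1 - 1e-6, 2000)   # quantile levels
    Finv = np.interp(t, F, x)       # inverse CDFs by interpolation
    Ginv = np.interp(t, G, x)
    return np.sqrt(np.trapz((Finv - Ginv) ** 2, t))

x = np.linspace(-10.0, 10.0, 2001)
f = np.exp(-(x - 2.0) ** 2)         # Gaussian bump centered at +2
g = np.exp(-(x + 2.0) ** 2)         # the same bump shifted to -2
print(w2_distance_1d(f, g, x))      # close to 4.0, the size of the shift
```

The key point of the example is the convexity with respect to translation: as the separation between the two bumps grows, the W2 misfit grows linearly with it, which is one source of the improved optimization landscape compared to the L2 misfit.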