
In this talk, I will introduce three data science methods that exploit the underlying geometry of the data: Wasserstein optimal transport, manifold learning, and topological data analysis. In a recent paper, we apply all three methods to analyze gene expression data from different sarcoma types. Wasserstein optimal transport is used to compare gene expression distributions across patients, manifold learning to identify the underlying data manifold and reduce its dimension, and topological data analysis to cluster the data. Based on the output of our pipeline, we identify a new signature in the sarcoma data that is mainly characterized by inactivation of tumor suppressor genes. I will conclude with a brief presentation of my current research, which aims to reduce the computational cost of computing Wasserstein distances.
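As a rough illustration of what "comparing distributions with Wasserstein distances" means, the sketch below computes the p-Wasserstein distance between two one-dimensional empirical distributions. It assumes, purely for illustration, that each patient is represented by an equal-size sample of real-valued expression values; the abstract does not specify how the pipeline represents patients, and the function name `wasserstein_1d` and the sample values are hypothetical. In one dimension with uniform weights and equal sample sizes, the optimal transport plan simply matches sorted samples, so the distance costs only a sort. General optimal transport problems instead require solving a linear program, which is one reason computational cost is a concern.

```python
def wasserstein_1d(x, y, p=1.0):
    """p-Wasserstein distance between two equal-size empirical samples on the real line.

    Hypothetical helper for illustration: assumes uniform weights on each sample point.
    In 1D the optimal coupling pairs the i-th smallest value of x with the i-th smallest of y.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    xs, ys = sorted(x), sorted(y)
    # Average transport cost |a - b|^p over the sorted (optimal) matching.
    total = sum(abs(a - b) ** p for a, b in zip(xs, ys))
    return (total / len(xs)) ** (1.0 / p)


# Hypothetical expression values for two patients (not real data).
patient_a = [0.1, 2.3, 1.5, 0.7]
patient_b = [0.4, 2.0, 1.9, 0.2]
print(wasserstein_1d(patient_a, patient_b))  # roughly 0.275
```

Shifting every value of one sample by a constant c changes the 1-Wasserstein distance by exactly |c|, which makes the metric sensitive to where the mass of a distribution sits, not just whether two histograms overlap.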