The Cancer Genome Atlas (TCGA) collected many types of data for each of over 20,000 tumor and normal samples. Each step in the Genome Characterization Pipeline generated numerous data points, such as clinical information (e.g., smoking status) and molecular analyte metadata (e.g., sample portion weight).