TITLE:
Improved Representation of Biological Information by Using Correlation as Distance Function for Heatmap Cluster Analysis
AUTHORS:
Axel Tiessen, Edgar A. Cubedo-Ruiz, Robert Winkler
KEYWORDS:
Clusters, Corn, Dendrogram, Grain Yield, Heterosis, Hybrid Vigour, Plant Breeding, Phenotyping, Pearson Correlation Coefficient, Zea mays
JOURNAL NAME:
American Journal of Plant Sciences, Vol.8 No.3, February 17, 2017
ABSTRACT: Heatmap cluster figures are often used to represent
data sets in the omic sciences. The default option of
the frequently used R heatmap function is to cluster data according to
Euclidean distance, which groups data mainly by their numerical value and not
by their relative behaviour. The disadvantage of using the default clustering dendrograms of R is demonstrated. Instead, a script is provided that uses correlation as
the distance function, which better reveals biologically meaningful information.
This optimized script was used to detect heterotic groups in Vitamaize hybrids
(purple maize with high nutraceutical value). A field trial with different
genetic combinations was performed through an agricultural phenomics approach
(holistic evaluation of the phenotype). The grain yield data and other
phenotypic variables were represented through heatmap figures. In the data set
of Mexican tropical maize germplasm, at least three heterotic groups were
detected, in contrast to only two heterotic groups reported earlier in
temperate yellow maize from the USA and Europe. This optimized script for heatmap
correlation biclustering can also be used to better represent metabolomic
fingerprints and transcriptomic data sets.
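
The contrast between Euclidean and correlation-based clustering described in the abstract can be illustrated with a minimal R sketch. This is not the authors' published script. The toy matrix, the row names, and the helper `cor_dist` are hypothetical, and using 1 minus the Pearson correlation coefficient as the distance is one common convention assumed here, not one confirmed by the abstract.

```r
# Illustrative sketch only; not the authors' published script.
# Toy matrix (hypothetical values): rows are variables, columns are samples.
x <- rbind(
  low_rising   = c(1, 2, 3, 4),
  high_rising  = c(101, 102, 103, 104),
  low_falling  = c(4, 3, 2, 1),
  high_falling = c(104, 103, 102, 101)
)
colnames(x) <- paste0("S", 1:4)

# Hypothetical helper: correlation-based distance between rows (1 - Pearson r).
# cor() correlates columns, so the matrix is transposed first.
cor_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))

# Euclidean distance (the default of dist()) groups rows by magnitude.
eu_groups <- cutree(hclust(dist(x)), k = 2)

# Correlation distance groups rows by the shape of their profile.
cor_groups <- cutree(hclust(cor_dist(x)), k = 2)

# Heatmap clustered on rows and columns with the correlation distance.
# heatmap() applies distfun to x for rows and to t(x) for columns.
heatmap(x, distfun = cor_dist, scale = "row")
```

In this toy case, Euclidean clustering pairs the two low-valued rows and the two high-valued rows regardless of trend, whereas the correlation distance pairs the rising rows and the falling rows regardless of magnitude. Other choices, such as Spearman correlation or a different linkage method passed through `hclustfun`, would be equally plausible under these assumptions.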