A Scalable Method for Cross-Platform Merging of SNP Array Datasets
ABSTRACT
The single nucleotide polymorphism (SNP) array is a recently developed
biotechnology that is extensively used in the study of cancer genomes. The
variety of available platforms makes cross-study validation and comparison
difficult. Meanwhile, sample sizes of the studies are increasing rapidly, which
places a heavy computational burden on even the fastest PC.
Here, we describe a novel method that
can generate a platform-independent dataset given SNP arrays from multiple
platforms. It extracts the common probesets from individual platforms, and
performs cross-platform normalizations and summarizations based on these
probesets. Since different platforms may have different numbers of probes per
probeset (PPP), the above steps produce preprocessed signals with different
noise levels for the platforms. To handle this problem, we adopt a
platform-dependent smoothing strategy, and produce a preprocessed dataset that
demonstrates uniform noise levels for individual samples.
To increase the scalability of the method to a large number of samples, we
devised an algorithm that splits the samples into multiple tasks, and the
probesets into multiple segments, before submitting them to a parallel
computing facility. This
scheme results in a drastically reduced computation time and increased ability
to process ultra-large sample sizes and arrays.
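The common-probeset extraction and the task/segment split described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names (`common_probesets`, `chunk`, `make_jobs`), the platform and probeset identifiers, and the chunk sizes are all hypothetical, and the actual normalization, summarization, and smoothing steps are not shown.

```python
from itertools import product


def common_probesets(platforms):
    """Return the sorted probeset IDs shared by every platform.

    `platforms` maps a (hypothetical) platform name to its probeset IDs.
    """
    id_sets = [set(ids) for ids in platforms.values()]
    return sorted(set.intersection(*id_sets))


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_jobs(samples, probesets, samples_per_task, probesets_per_segment):
    """Pair every sample task with every probeset segment.

    Each (task, segment) pair is an independent unit of work that could be
    submitted to a parallel computing facility.
    """
    tasks = chunk(samples, samples_per_task)
    segments = chunk(probesets, probesets_per_segment)
    return list(product(tasks, segments))


# Toy inputs; real arrays have far more probesets and samples.
platforms = {
    "platform_A": ["ps1", "ps2", "ps3", "ps4", "ps5"],
    "platform_B": ["ps2", "ps3", "ps4", "ps5", "ps6"],
}
samples = ["s1", "s2", "s3", "s4", "s5"]

shared = common_probesets(platforms)
jobs = make_jobs(samples, shared, samples_per_task=2, probesets_per_segment=3)

for task, segment in jobs:
    print(task, segment)
```

With these toy inputs, four probesets are shared, the five samples fall into three tasks, and the shared probesets fall into two segments, giving six independent jobs. Smaller chunks give more parallelism at the cost of more scheduling overhead.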
Share and Cite:
Chen, P. and Hung, Y. (2013) A Scalable Method for Cross-Platform Merging of SNP Array Datasets. Engineering, 5, 502-508. doi: 10.4236/eng.2013.510B103.