Article citations


James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013) An Introduction to Statistical Learning. Springer Texts in Statistics, Springer-Verlag, New York, 856-875.

has been cited by the following article:

  • TITLE: Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making

    AUTHORS: Brigitte Colin, Samuel Clifford, Paul Wu, Samuel Rathmanner, Kerrie Mengersen

    KEYWORDS: Boosted Regression Trees, Remotely Sensed Data, Big Data Modelling Approach, Missing Data

    JOURNAL NAME: Open Journal of Statistics, Vol.7 No.5, October 31, 2017

    ABSTRACT: Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Trees (BRT), can address Big Data challenges to drive decision-making. The challenge in this study is a lack of interoperability, since the data (a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information) are stored in monolithic hardware components. The modelling process required one common input file. Merging the data sources produced a structured but noisy input file containing inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT handles missing data by default, because it allows a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results. Variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms both in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble BRT approach could be tested against existing models in order to improve results for a wide range of data-driven decisions and applications.
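
The missing-data behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' implementation or any particular library's algorithm. It is a minimal, assumed version of a single regression-tree split search on one feature, where the candidate splits include "is the value missing?" alongside ordinary threshold splits. The function names `sse` and `best_split` are hypothetical and chosen only for this illustration.

```python
from math import isnan


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (kind, threshold, missing_side, error) for the lowest-error split.

    kind is "missing" (split on whether x is missing) or "value"
    (split on x <= threshold, with missing rows sent to missing_side).
    Illustrative assumption: squared error is the split criterion.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if not isnan(xi)]
    missing_y = [yi for xi, yi in zip(x, y) if isnan(xi)]

    # Candidate 1: split purely on missingness.
    error = sse([yi for _, yi in observed]) + sse(missing_y)
    best = ("missing", None, None, error)

    # Candidate 2: split on the value, trying both sides for the missing rows.
    # The largest observed value is skipped because nothing would go right.
    thresholds = sorted({xi for xi, _ in observed})[:-1]
    for t in thresholds:
        left = [yi for xi, yi in observed if xi <= t]
        right = [yi for xi, yi in observed if xi > t]
        for side in ("left", "right"):
            if side == "left":
                error = sse(left + missing_y) + sse(right)
            else:
                error = sse(left) + sse(right + missing_y)
            if error < best[3]:
                best = ("value", t, side, error)
    return best


nan = float("nan")

# Missingness itself is informative: rows with a missing x have a different y.
print(best_split([1.0, 2.0, nan, nan, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0, 1.0, 1.0]))
# → ('missing', None, None, 0.0)

# The value is informative: the missing row behaves like the low-x rows.
print(best_split([1.0, 2.0, nan, 3.0, 4.0], [1.0, 1.0, 1.0, 5.0, 5.0]))
# → ('value', 2.0, 'left', 0.0)
```

In a boosted ensemble, a search like this would be repeated for every feature at every node of every tree. Counting how often each feature wins the search gives the split-frequency measure of variable importance that the abstract mentions.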