SMT: Sparse multivariate tree

Title: SMT: Sparse multivariate tree
Publication Type: Journal Article
Year of Publication: 2014
Authors: Deng, H., M. Gokce Baydogan, and G. Runger
Journal: Statistical Analysis and Data Mining
Volume: 7
Issue: 1
Pagination: 53-69
Date Published: 02/2014
ISSN: 1932-1872
Keywords: decision tree, feature extraction, fused Lasso, Lasso, time series classification
Abstract

A multivariate decision tree attempts to improve upon the single-variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L1 regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1 regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT are compared to a large number of competitors on time series and non-time-series data.
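
The core idea in the abstract, recursive partitioning with an L1 regularized LR split at each node, can be illustrated with a minimal sketch. This is not the authors' implementation: the solver (proximal gradient descent), the hyperparameters (`lam`, `step`, `iters`, `max_depth`, `min_size`), the stopping rules, and all function names are assumptions chosen for illustration, and the time-series feature extraction is not covered.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def score(w, b, x):
    """Linear score b + w.x used for the multivariate split."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_l1_logreg(X, y, lam=0.05, step=0.5, iters=500):
    """Binary L1 regularized LR fitted by proximal gradient descent (assumed solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def build_tree(X, y, depth=0, max_depth=3, min_size=4, lam=0.05):
    """Recursively partition the data with a sparse linear split at each node."""
    majority = int(2 * sum(y) >= len(y))
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return {"leaf": majority}
    w, b = fit_l1_logreg(X, y, lam=lam)
    side = [score(w, b, xi) >= 0 for xi in X]
    if all(side) or not any(side):  # the split separates nothing: stop here
        return {"leaf": majority}
    left = [(xi, yi) for xi, yi, s in zip(X, y, side) if not s]
    right = [(xi, yi) for xi, yi, s in zip(X, y, side) if s]
    return {
        "w": w,
        "b": b,
        "left": build_tree([x for x, _ in left], [t for _, t in left],
                           depth + 1, max_depth, min_size, lam),
        "right": build_tree([x for x, _ in right], [t for _, t in right],
                            depth + 1, max_depth, min_size, lam),
    }

def predict(node, x):
    """Route x down the tree until a leaf is reached."""
    while "leaf" not in node:
        node = node["right"] if score(node["w"], node["b"], x) >= 0 else node["left"]
    return node["leaf"]

# Toy data (made up): only feature 0 carries the class signal.
pos = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
noise = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, -0.2, 0.2]
X = [[a, e, -e] for a, e in zip(pos, noise)] + [[-a, e, -e] for a, e in zip(pos, noise)]
y = [1] * 8 + [0] * 8

tree = build_tree(X, y)
print(tree["w"], [predict(tree, x) for x in X])
```

On this toy data the L1 penalty is expected to leave a nonzero weight only on the informative feature, which is the node-level feature selection the abstract refers to. A production version would more likely rely on an established L1 LR solver than on this hand-written loop.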

URL: http://dx.doi.org/10.1002/sam.11208
DOI: 10.1002/sam.11208