Classification of data set #1 and data set #3
This example is from Section IV.C of Tabassum and Ollila (2019). Running this code you are able to reproduce figures for data set #1 (Isolet wovels) and data set #3 (Khan et al.) given the Table 1 in the paper referenced below.
Reference:
[1] M.N. Tabassum and E. Ollila (2019), "A Compressive Classification Framework for High-Dimensional Data," Preprint, Submitted for publication, Oct. 2019.
(c) E. Ollila and M.N. Tabassum, CompressiveRDA MATLAB toolbox.
Contents
Initialize
clear; clc; Q = 3; % Nr of CRDA approaches L = 10; % # of MC splits of data to training / test sets cntrX = true; use_uniform_prior = true; CT = zeros(L,Q); % computation times FSR = zeros(L,Q); % feature selection rate TER = zeros(L,Q); % test error rate print_info = true;
Load and check the data
pt = 1/3; % percentage of training observations dsname = 'IsoletVowels'; % Uncheck these to compute the results for data set #3 %dsname = 'khan2001'; %pt = 0.6; load(sprintf('%s.mat',dsname), 'Xo','yo'); Xo = crda_check_data(Xo,yo,print_info); yo = double(yo); G = max(yo); p = size(Xo,1);
percentage of missing values = 0.00
Start the simulatios
rng('default'); % for reproducibility for mc=1:L % Simulation with L splits of data for training-test sets [yt,Xt,y,X,mu,prior] = crda_create_data(Xo,yo,pt,cntrX); if use_uniform_prior prior = (1/G)*ones(1,G); end Nt = length(yt); fprintf('Data-split# %d ...\n', mc); scurr = mc*1e3; % CRDA1 (Ell1-RSCM, {K,q} = CV) rng(scurr); algo = 1; tic; [yhat1,~,~,K1,~] = CRDA(Xt,X,y,'method','crda1','prior',prior,'mu',mu); CT(mc,algo) = toc; TER(mc,algo) = sum(yhat1 ~= yt)/Nt; FSR(mc,algo) = K1/p; if print_info fprintf('\tCRDA%d : {TER, FSR} = {%5.2f, %5.2f} | CT = %.2f\n', ... algo, 100*TER(mc,algo),100*FSR(mc,algo), CT(mc,algo)); end % CRDA2 (Ell2-RSCM, {K,q} = CV) rng(scurr); algo = algo + 1; tic; [yhat2,~,~,K2,~] = CRDA(Xt,X,y,'method','crda2','prior',prior,'mu',mu); CT(mc,algo) = toc; TER(mc,algo) = sum(yhat2 ~= yt)/Nt; FSR(mc,algo) = K2/p; if print_info fprintf('\tCRDA%d : {TER, FSR} = {%5.2f, %5.2f} | CT = %.2f\n', ... algo, 100*TER(mc,algo),100*FSR(mc,algo), CT(mc,algo)); end % CRDA3 (PSCM, K = Kub, q = CV) rng(scurr); algo = algo + 1; tic; [yhat3,~,~,K3,~] = CRDA(Xt,X,y,'method','crda3','prior',prior,'mu',mu); CT(mc,algo) = toc; TER(mc,algo) = sum(yhat3 ~= yt)/Nt; FSR(mc,algo) = K3/p; if print_info fprintf('\tCRDA%d : {TER, FSR} = {%5.2f, %5.2f} | CT = %.2f\n', ... algo, 100*TER(mc,algo),100*FSR(mc,algo), CT(mc,algo)); end end
Data-split# 1 ... CRDA1 : {TER, FSR} = { 3.00, 9.89} | CT = 0.94 CRDA2 : {TER, FSR} = { 2.00, 7.78} | CT = 0.44 CRDA3 : {TER, FSR} = { 1.50, 46.68} | CT = 1.44 Data-split# 2 ... CRDA1 : {TER, FSR} = { 3.00, 14.91} | CT = 0.75 CRDA2 : {TER, FSR} = { 3.00, 14.75} | CT = 0.41 CRDA3 : {TER, FSR} = { 1.50, 46.35} | CT = 0.80 Data-split# 3 ... CRDA1 : {TER, FSR} = { 1.00, 27.55} | CT = 0.86 CRDA2 : {TER, FSR} = { 2.50, 17.67} | CT = 0.68 CRDA3 : {TER, FSR} = { 0.00, 43.44} | CT = 0.83 Data-split# 4 ... CRDA1 : {TER, FSR} = { 2.50, 9.40} | CT = 0.29 CRDA2 : {TER, FSR} = { 1.00, 11.67} | CT = 0.22 CRDA3 : {TER, FSR} = { 1.00, 46.19} | CT = 0.49 Data-split# 5 ... CRDA1 : {TER, FSR} = { 4.50, 14.59} | CT = 0.32 CRDA2 : {TER, FSR} = { 1.50, 11.67} | CT = 0.30 CRDA3 : {TER, FSR} = { 0.50, 47.65} | CT = 0.59 Data-split# 6 ... CRDA1 : {TER, FSR} = { 2.00, 36.14} | CT = 0.30 CRDA2 : {TER, FSR} = { 2.00, 28.85} | CT = 0.22 CRDA3 : {TER, FSR} = { 0.50, 48.46} | CT = 0.56 Data-split# 7 ... CRDA1 : {TER, FSR} = { 4.00, 9.40} | CT = 0.27 CRDA2 : {TER, FSR} = { 4.00, 9.40} | CT = 0.20 CRDA3 : {TER, FSR} = { 1.50, 45.54} | CT = 0.50 Data-split# 8 ... CRDA1 : {TER, FSR} = { 2.50, 12.32} | CT = 0.34 CRDA2 : {TER, FSR} = { 3.50, 9.89} | CT = 0.21 CRDA3 : {TER, FSR} = { 1.50, 45.54} | CT = 0.58 Data-split# 9 ... CRDA1 : {TER, FSR} = { 1.00, 26.74} | CT = 0.29 CRDA2 : {TER, FSR} = { 4.50, 17.83} | CT = 0.21 CRDA3 : {TER, FSR} = { 1.50, 44.57} | CT = 0.56 Data-split# 10 ... CRDA1 : {TER, FSR} = { 3.00, 11.83} | CT = 0.29 CRDA2 : {TER, FSR} = { 2.50, 9.56} | CT = 0.27 CRDA3 : {TER, FSR} = { 1.50, 46.19} | CT = 0.57
Calculate the naive TER
n = histcounts(y); [~,tmp_indx] = max(n); avgNaiveTER = 100 * ( sum(repmat(tmp_indx,size(yt)) ~= yt) / length(yt) );
Make a table
avgFSR = 100*mean(FSR)'; avgTER = 100*mean(TER)'; avgCT = mean(CT)'; fprintf('\nResults for %s dataset\n',dsname) table({'CRDA1','CRDA2','CRDA3'}',round(avgTER,2),round(avgFSR,2), ... round(avgCT,2), 'VariableNames',{'method','TER','FSR','CT'})
Results for IsoletVowels dataset ans = 3×4 table method TER FSR CT _______ ____ _____ ____ 'CRDA1' 2.65 17.28 0.46 'CRDA2' 2.65 13.91 0.32 'CRDA3' 1.1 46.06 0.69
Make a bar plot
figure(1); clf; names = categorical({'CRDA1','CRDA2','CRDA3'}); subplot(1,2,1) bar(names,avgTER); title('Training error rate'); grid on; set(gca,'FontSize',16,'LineWidth',1.3) subplot(1,2,2); bar(names,avgFSR); title('Feature selection rate','FontSize',18); grid on; set(gca,'FontSize',16,'LineWidth',1.3)
