Portfolio optimization using historical stock data
This example is from Section VI of Ollila and Raninen (2019). Running this code reproduces Figure 7 and Figure 8 of the aforementioned paper.
We investigate the out-of-sample performance of global minimum variance portfolios (GMVPs) based on different RSCM estimators. We use dividend-adjusted daily closing prices downloaded from the Yahoo! Finance (http://finance.yahoo.com) database to obtain the net returns of 50 stocks currently included in the Hang Seng Index (HSI) for two time periods: from Jan. 4, 2010 to Dec. 24, 2011, and from Jan. 1, 2016 to Dec. 27, 2017 (excluding weekends and public holidays). In both cases, the time series contain T=491 trading days. For the first period (2010-2011), we had full-length time series for only p=45 stocks, whereas for the latter period we had full-length time series for all stocks, so p=50.
Our third time series contains the net returns of p=396 stocks from the Standard and Poor's 500 (SP500) index for the time period from Jan. 4, 2016 to Apr. 27, 2018 (excluding weekends and public holidays). In this case, the time series contains T=583 trading days.
Contents
Choose the data set
Analyze the third data set (i.e., the Standard & Poor's 500 data set):
choosedata = 'SP500';
If you want to obtain the results for the Hang Seng Index (HSI) data for the years 2010-2011 or 2016-2017, then replace the line above by one of the following lines, respectively:
choosedata = 'HSI_y2010'; choosedata = 'HSI_y2016';
test_len = 20; % window length for holding the portfolio

switch choosedata
    case 'HSI_y2010'
        load HSIstocks2010to2011.mat;      % net returns of p=45 HSI stocks
        T0 = 50;
        train_len_arr = T0:test_len:200;   % period 2010-2011 HSI data
    case 'HSI_y2016'
        load HSIstocks2016to2017.mat;
        T0 = 130;
        train_len_arr = T0:test_len:270;   % period 2016-2017 HSI data
    case 'SP500'
        load SP500stocks2016to2018.mat
        T0 = 70;
        train_len_arr = T0:test_len:290;   % period 2016-2018 SP500 data
end
Process the data
portfolio_data = stocksdata.Net_returns;
[n, p] = size(portfolio_data);
nwins = length(train_len_arr);
beta_ave = zeros(nwins,4);
risk = zeros(nwins,5);
w = zeros(p,5);
computew = @(M) ( M\ones(p,1) ) / ( ones(1,p)*(M\ones(p,1)) );
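The anonymous function computew evaluates the GMVP weight vector for a given covariance matrix estimate $\mathbf{M}$:

$$\mathbf{w} = \frac{\mathbf{M}^{-1}\mathbf{1}}{\mathbf{1}^\top \mathbf{M}^{-1}\mathbf{1}},$$

which is the solution of $\min_{\mathbf{w}} \, \mathbf{w}^\top \mathbf{M} \mathbf{w}$ subject to $\mathbf{1}^\top \mathbf{w} = 1$, i.e., the minimum variance portfolio under the full-investment constraint. The code solves the linear system via the backslash operator (M\ones(p,1)) rather than forming the explicit inverse.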
Run the sliding-window backtest
for j = 1:nwins
    % OUTER LOOP iterates over different training window lengths
    train_len = train_len_arr(j);   % window length of the training data
    step_size = test_len;           % step size

    % indices for the sliding window
    ind_list  = 1:step_size:(n - train_len);   % start points of training data
    ind_list2 = train_len:step_size:n;         % end points of training data
    tmp = (n - train_len)/step_size;
    if tmp ~= round(tmp)
        ind_list2 = [ind_list2 n];
    end

    % initialize
    n_ind = length(ind_list);
    beta = zeros(n_ind,4);
    rets = zeros(n,5);
    fprintf('\n%d ', j);

    for i = 1:n_ind
        % INNER LOOP slides the window over the time period
        ind = ind_list(i);

        % Compute the training set and the covariance matrix estimates
        X_train = portfolio_data(ind:ind_list2(i),:);
        [RSCM1,~,stat]  = regscm(X_train,'approach','ell1','verbose','off');
        beta(i,1) = stat.beta;
        [RSCM2,~,stat2] = regscm(X_train,'approach','ell2','verbose','off','kappa',stat.kappa);
        beta(i,2) = stat2.beta;
        [LWE,~,stat4]   = regscm(X_train,'approach','lw','verbose','off');
        beta(i,4) = stat4.beta;

        w(:,1) = computew(RSCM1);
        w(:,2) = computew(RSCM2);
        % Ell3-RSCM selects Ell1 or Ell2, whichever has the smaller gamma estimate
        if (stat.gamma - stat2.gamma) < 0
            w(:,3) = w(:,1); beta(i,3) = beta(i,1);
        else
            w(:,3) = w(:,2); beta(i,3) = beta(i,2);
        end
        w(:,4) = computew(LWE);

        % compute the portfolio net returns on the test window
        X_test = portfolio_data((1+ind_list2(i)):ind_list2(i+1),:);
        rets((1+ind_list2(i)):ind_list2(i+1),:) = X_test*w;
        fprintf('.');
    end
    beta_ave(j,:) = mean(beta);
    risk(j,:) = std(rets((1+train_len):end,:),1);
end
Plot the figures
figure(1); clf
K = sqrt(250);  % annualization factor
plot(train_len_arr, risk(:,1)*K,'bo-','Linewidth',1.8, ...
    'DisplayName','Ell1-RSCM','MarkerSize',14)
hold on;
plot(train_len_arr, risk(:,2)*K,'r^-','Linewidth',1.8, ...
    'DisplayName','Ell2-RSCM','MarkerSize',14)
plot(train_len_arr, risk(:,3)*K,'s-','Linewidth',1.8, ...
    'DisplayName','Ell3-RSCM','MarkerSize',17,'Color',[0.4 0.4 0.4])
plot(train_len_arr, risk(:,4)*K,'k*-','Linewidth',1.8, ...
    'DisplayName','LW-RSCM','MarkerSize',14)
xlabel('$n$','FontSize',12,'Interpreter','Latex')
ylabel('Annualized realized risk')
set(gca,'FontSize',22,'LineWidth',1.5)
switch choosedata
    case 'HSI_y2016'
        h = legend('show','Location','best');
        xlim([128 272]); ylim([0.074 0.093]);
    case 'HSI_y2010'
        h = legend('show','Location','southeast');
        xlim([40 200]); ylim([0.12 0.128]);
        yticks([0.075 0.078 0.081 0.084 0.087]);
    case 'SP500'
        h = legend('show','Location','northeast');
        xlim([60 300]); ylim([0.075 0.087]);
end
legend('boxoff')
set(h,'FontSize',22)

figure(2); clf;
plot(train_len_arr, beta_ave(:,1),'bo-','Linewidth',1.8, ...
    'DisplayName','Ell1-RSCM','MarkerSize',14)
hold on;
plot(train_len_arr, beta_ave(:,2),'r^-','Linewidth',1.8, ...
    'DisplayName','Ell2-RSCM','MarkerSize',14)
plot(train_len_arr, beta_ave(:,3),'s-','Linewidth',1.8, ...
    'DisplayName','Ell3-RSCM','MarkerSize',17,'Color',[0.4 0.4 0.4])
plot(train_len_arr, beta_ave(:,4),'k*-','Linewidth',1.8, ...
    'DisplayName','LW-RSCM','MarkerSize',14)
h = legend('show','Location','southeast');
legend('boxoff')
set(h,'FontSize',22)
xlabel('$n$','FontSize',14,'Interpreter','Latex')
ylabel('$\hat \beta_o$','FontSize',14,'Interpreter','Latex')
set(gca,'FontSize',22,'LineWidth',1.5)
switch choosedata
    case 'HSI_y2016'
        xlim([128 272]); ylim([0.75 0.96]);
    case 'HSI_y2010'
        xlim([40 200]); ylim([0.75 0.96]);
    case 'SP500'
        xlim([60 300])
end
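The realized risk plotted above is the sample standard deviation of the daily out-of-sample portfolio returns, annualized by the factor K = sqrt(250), assuming 250 trading days per year:

$$\hat\sigma_{\mathrm{ann}} = \sqrt{250}\,\hat\sigma_{\mathrm{daily}}.$$

The square-root scaling follows from the variance of a sum of (approximately) uncorrelated daily returns growing linearly with the number of days.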
[Figure 1: annualized realized risk versus training window length n. Figure 2: average shrinkage parameter $\hat \beta_o$ versus n.]
For more details, see our paper Ollila and Raninen (2019).
References
- E. Ollila and E. Raninen, "Optimal shrinkage covariance matrix estimation under random sampling from elliptical distributions," IEEE Transactions on Signal Processing, vol. 67, no. 10, pp. 2707-2719, 2019.