function [As,Q,R]=banditE(N,Aq,E)
%FUNCTION [As,Q,R]=banditE(N,Aq,E)
% Performs the N-armed bandit example using the epsilon-greedy strategy.
%
% INPUT:
%   N  = total number of trials
%   Aq = actual mean reward of each bandit; each pull of machine k returns
%        Aq(k) plus standard normal noise
%   E  = epsilon for epsilon-greedy (fraction of trials chosen at random)
%
% OUTPUT:
%   As = action (machine) selected on trial j, j=1..N
%   Q  = final reward estimates (sample-average reward of each machine)
%   R  = reward received on trial j, j=1..N

numbandits=length(Aq);       %Number of bandits
ActNum=zeros(numbandits,1);  %Running count of how many times each action is selected
ActVal=zeros(numbandits,1);  %Running sum of the total reward for each action
Q=zeros(1,numbandits);       %Current reward estimates (initially set to zero)
As=zeros(N,1);               %Storage for the action selected on each trial
R=zeros(N,1);                %Storage for the reward received on each trial

%**************************************************************
% Set up a flag so we know when to choose at random.
% greedy(j)=1 means trial j explores (random machine); 0 means it is greedy.
% Exactly round(E*N) of the N trials explore, placed at random positions,
% rather than drawing an independent random number on every trial.
%**************************************************************
greedy=zeros(1,N);
if E>0
    m=round(E*N);            %Total number of times to choose at random in N trials
    greedy(1:m)=ones(1,m);
    m=randperm(N);
    greedy=greedy(m);        %Put the ones in random order
    clear m
end

%*************************************************************
% Main Loop
%*************************************************************
for j=1:N
    %Select a machine, cQ, and get the reward, cR
    if greedy(j)>0
        %Explore: choose a machine uniformly at random
        cQ=randi(numbandits);
    else
        %Exploit: choose among the machines with the highest estimate,
        %breaking ties at random. Q is a row vector, so find returns
        %the column indices of the maxima.
        idx=find(Q==max(Q));
        m=randi(length(idx));
        cQ=idx(m);
    end
    cR=randn+Aq(cQ);         %Reward = true mean plus N(0,1) noise
    R(j)=cR;
    As(j)=cQ;

    %Update for the next round (sample-average estimate):
    ActNum(cQ)=ActNum(cQ)+1;
    ActVal(cQ)=ActVal(cQ)+cR;
    Q(cQ)=ActVal(cQ)/ActNum(cQ);
end
end
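
% Example usage (a sketch, not part of the original function). The
% 10-armed testbed with true means drawn from N(0,1) and the variable
% names below are assumptions chosen for illustration:
%
%   Aq = randn(1,10);                     % hypothetical true mean rewards
%   [As,Q,R] = banditE(1000, Aq, 0.1);    % 10% of the 1000 trials explore
%   [~, best] = max(Aq);                  % index of the truly best machine
%   avgReward   = mean(R);                % average reward over all trials
%   fracOptimal = mean(As == best);       % fraction of trials on the best machine
%
% Repeating this for E = 0, 0.01 and 0.1 illustrates the exploration
% trade-off. With E = 0 the greedy rule can keep exploiting a suboptimal
% machine for as long as its sample-average estimate stays above the
% initial estimate of zero held by the machines it has not yet tried.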