function [total,unigram,bigram] = tally(text)
% [TOTAL,UNIGRAM,BIGRAM] = TALLY(TEXT): compute TOTAL tallies and
% UNIGRAM and BIGRAM frequencies (fractions) of string TEXT or the open file
% with file id TEXT

fid = -1;       % file id of file, if any: start by assuming TEXT is a string
line = text;    % next line of text to process
go = 1;         % keep reading?
if ~ischar(text)
  fid = text;
  fprintf('reading %s...', fopen(fid));
  line = fgetl(fid);
  go = ischar(line);
end

% while have more text to process, add up bigram tallies
bi = sparse(27,27);   % bigram tallies so far
while go
  line = lower(line) - 'a' + 2;       % +2 because space ' ' is at index 1
  line(line < 1 | line > 27) = 1;     % convert non-letters to space
  bi = bi + sparse([1 line], [line 1], 1, 27, 27);
  go = fid > 0;       % if have a file, then read its next line of text
  if go
    line = fgetl(fid);
    go = ischar(line);
  end
end
if fid>0, fclose(fid); end   % if processed a file, then close it

% compute requested frequencies
bi(1,1) = 0;          % delete double spaces
bigram = full(bi);
unigram = sum(bigram);
total = sum(unigram);
bigram = bigram/total;
unigram = unigram/total;
fprintf('read %d characters\n', total);   % print stats about processed text