Better way to improve the for loop for my case in R?


Problem statement: my data set has about 600000 rows, and the two columns of interest are mstr_program_list and loc_cat. The loc_cat column holds both missing and non-missing cells; the other columns have no NAs. For each program in mstr_program_list, I need to find the total number of loc_cat rows associated with that program, the % of non-missing rows, and, among the non-missing rows, the count of categories they are divided into.

Example: for the program "unknown", the total number of rows = 3 and 1 row is missing in loc_cat, so 2 rows are non-missing. Therefore % = (2/3)*100 = 66.7, and the number of categories is 2 (rests: full and rests: lim).

    > head(data)
            l.name   mstr_program_list       loc_cat
    1   6 j'sgroup             unknown          <NA>
    2  bj's- maine       roasted tomat   rests: full
    3  bj's- maine             unknown   rests: full
    4   brad's q q             unknown    rests: lim

expected output:

     mstr_prog   total_count   %good(non missing rows)   number of loc_cat
     unknown     3             66.7                      2

The code below is taking a lot of time; in fact, the results are not showing at all. Can anyone help me improve the code? In my view, the problem in the code is the way I am adding to the vectors.

Upon research I came to know that, to add values to a vector, I should not use append() and should go for c() instead:

    v <- c(v, 'y')   # adding elements to a vector
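For comparison, `c(v, 'y')` builds a new vector on every call, so growing a result inside a loop this way still copies the vector on each iteration, much as `append()` does. A minimal sketch of preallocating and filling by index instead; the names `n` and `v` and the values stored are hypothetical, chosen only for illustration:

```r
n <- 5                        # hypothetical number of results, known up front
v <- character(n)             # allocate the full vector once

for (i in seq_len(n)) {
  v[i] <- paste0("item", i)   # fill slot i in place instead of v <- c(v, ...)
}
```

This only applies when the final length is known before the loop, as it is when looping over the unique programs.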

code:

    ## Import data (`classes` is assumed to be defined earlier).
    data <- read.csv("mgdata.csv", header = TRUE, na.strings = "", colClasses = classes,
                     nrows = 600338, comment.char = "")
    data_nonull   <- na.omit(data)
    mpl_unique    <- unique(data$mstr_program_list)
    mas_prog_list <- character()
    loc_count     <- numeric()
    per_seg       <- numeric()
    num_seg       <- numeric()
    for (i in seq_along(mpl_unique)) {
        prog <- as.character(mpl_unique[i])   # compare against the program name, not the index i
        l_t <- length(data$mstr_program_list[data$mstr_program_list == prog])                 # loc_cat rows for this specific prog
        l_g <- length(data_nonull$mstr_program_list[data_nonull$mstr_program_list == prog])   # filled ones only, excluding empty
        s <- subset(data_nonull, mstr_program_list == prog, select = c(loc_cat))
        if (!any(prog == mas_prog_list)) {
            no_seg        <- nrow(unique(s))
            mas_prog_list <- c(mas_prog_list, prog)   # adding values to the vectors
            loc_count     <- c(loc_count, l_t)
            perct_seg     <- (l_g / l_t) * 100
            per_seg       <- c(per_seg, perct_seg)
            num_seg       <- c(num_seg, no_seg)
        }
    }
    seg_analysis <- data.frame(mas_prog_list, loc_count, per_seg, num_seg)
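One possible direction is to drop the loop and compute the three per-program summaries with base R's `tapply()`. This is a sketch rather than a tuned solution. The toy data frame mirrors the four example rows from the question, and the output column names `pct_good` and `num_loc_cat` are made up for illustration:

```r
# Toy data mirroring the example rows in the question (illustrative only).
data <- data.frame(
  mstr_program_list = c("unknown", "roasted tomat", "unknown", "unknown"),
  loc_cat           = c(NA, "rests: full", "rests: full", "rests: lim"),
  stringsAsFactors  = FALSE
)

prog   <- data$mstr_program_list
filled <- !is.na(data$loc_cat)          # TRUE where loc_cat is not missing

total_count <- tapply(filled, prog, length)       # rows per program
pct_good    <- tapply(filled, prog, mean) * 100   # % of non-missing rows
num_cat     <- tapply(data$loc_cat, prog,         # distinct non-missing categories
                      function(x) length(unique(x[!is.na(x)])))

# All three results share the same group order, so they line up row by row.
seg_analysis <- data.frame(
  mstr_prog   = names(total_count),
  total_count = as.vector(total_count),
  pct_good    = round(as.vector(pct_good), 1),
  num_loc_cat = as.vector(num_cat)
)
print(seg_analysis)
```

For the `unknown` rows in the toy data this gives 3 total rows, 66.7 % non-missing, and 2 categories, matching the expected output. Each `tapply()` call makes one grouped pass over the data instead of rescanning the whole column once per program.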

I am new to R. Please correct me on any changes needed in the code, and on the naming conventions or terminology used.

