R - preparing data for analysis


I'm new here, new to R, and new to statistics in general. I've got a simple CSV file with 1 million rows of data. There are 4 columns:
col1 - location
col2 - someone's name
col3 - date of visit
col4 - time of visit

When importing into R, the data was translated into a data frame, and all of its columns are character (I used str() to find the structure of the imported data, and class(), which is how I know it's a data.frame).
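A minimal sketch of that inspection step, followed by one way to turn the text date and time columns into proper date types so they can be grouped on. It assumes dates in ISO `yyyy-mm-dd` form and times in `HH:MM`; the sample rows, the column names `date` and `time`, and the derived column `when` are hypothetical. It passes `stringsAsFactors = FALSE`, which has been the default since R 4.0.0 (older R versions read text columns as factors).

```r
# Hypothetical rows in the question's layout; a real file would be read
# with read.csv("visits.csv", stringsAsFactors = FALSE) (file name assumed)
csv_text <- "location,name,date,time
ny,sally,2013-05-14,09:30
mi,jake,2013-05-15,14:05"

visits <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect what was imported, as in the question
str(visits)    # every column is chr at this point
class(visits)  # "data.frame"

# Assumes ISO dates (yyyy-mm-dd); converts to R's Date class
visits$date <- as.Date(visits$date, format = "%Y-%m-%d")

# Assumes HH:MM times; combines date and time into one date-time in UTC
visits$when <- as.POSIXct(paste(visits$date, visits$time),
                          format = "%Y-%m-%d %H:%M", tz = "UTC")

str(visits)
```

If the real file uses a different date layout (for example `dd/mm/yyyy`), only the `format` strings need to change.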

As you can see, none of them are numeric, but I want to be able to do aggregation, e.g. count the number of visits per person, per day, per time and per location, or vice versa.

Do I need to manipulate the data outside R, e.g. import it into SQL and do the aggregation there, or can I do it in R?
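For reference, this kind of counting can be done in R without SQL; a million rows of four short columns is generally small enough to aggregate in memory. The sketch below uses base R's `table()` rather than the plyr approach in the answer, and the four-row `visits` data frame (with a hypothetical `day` column) is a stand-in for the real file.

```r
# Hypothetical stand-in for the 1-million-row file (location, name, day)
visits <- data.frame(
  location = c("ny", "mi", "ny", "ca"),
  name     = c("sally", "jake", "sally", "sue"),
  day      = c("tue", "wed", "fri", "fri"),
  stringsAsFactors = FALSE
)

# Visits per person: table() counts how often each value occurs
per_person <- table(visits$name)
print(per_person)

# Visits per location and day: a two-way contingency table
per_loc_day <- table(visits$location, visits$day)
print(per_loc_day)
```

Note that a two-way table also lists combinations with zero visits, which a grouped summary only containing observed combinations would leave out.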

I hope you can guide me in the right direction... Many thanks, Peddie

I suggest getting familiar with the plyr package.

install.packages("plyr") 

It will ask you to choose a place to download from; choose the one closest to you. Then load the library:

library(plyr) 

OK, let's say we have a data frame that looks like this:

> df
    name  day location
52  jake  wed       mi
25 sally  tue       ny
38   sue  fri       ny
45 sally  tue       mi
42   sue  mon       mi
17 sally  fri       ca
28  jake  tue       ny
14   sue thur       ca
47   jim  tue       mi
67   jim  tue       al
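To reproduce the examples, the data frame can be rebuilt from the printout. The values and row names are copied from it; making `day` and `location` factors with the level orders below is an inference from the order in which the ddply results list them (ca, ny, mi, al), not something the answer states.

```r
# Values and row names copied from the printout
df <- data.frame(
  name = c("jake", "sally", "sue", "sally", "sue",
           "sally", "jake", "sue", "jim", "jim"),
  # Level orders inferred from the order of the ddply results
  day = factor(c("wed", "tue", "fri", "tue", "mon",
                 "fri", "tue", "thur", "tue", "tue"),
               levels = c("mon", "tue", "wed", "thur", "fri")),
  location = factor(c("mi", "ny", "ny", "mi", "mi",
                      "ca", "ny", "ca", "mi", "al"),
                    levels = c("ca", "ny", "mi", "al")),
  row.names = c("52", "25", "38", "45", "42",
                "17", "28", "14", "47", "67"),
  stringsAsFactors = FALSE  # keeps name as character
)
print(df)
```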

We can ask how many times each location was visited:

> ddply(df, .(location), summarise, count=length(name))
  location count
1       ca     2
2       ny     3
3       mi     4
4       al     1
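For comparison, the same per-location count can be sketched in base R with `table()`, swapped in here for ddply rather than taken from the answer. The location values are copied from the example data; making them a factor with levels ca, ny, mi, al is an assumption chosen so the rows come out in the same order as the ddply output.

```r
# Location column from the example data; level order assumed to match ddply's output
location <- factor(c("mi", "ny", "ny", "mi", "mi",
                     "ca", "ny", "ca", "mi", "al"),
                   levels = c("ca", "ny", "mi", "al"))

# table() counts each level; as.data.frame() turns it into location/count columns
counts <- as.data.frame(table(location = location), responseName = "count")
print(counts)
```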

or how many people visited each location on a particular day:

> ddply(df, .(location, day), summarise, count=length(name))
  location  day count
1       ca thur     1
2       ca  fri     1
3       ny  tue     2
4       ny  fri     1
5       mi  mon     1
6       mi  tue     2
7       mi  wed     1
8       al  tue     1
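A base R counterpart for the two-column grouping is `aggregate()` with `FUN = length`; again this is a swap-in for ddply, using the example rows from the answer. Like the ddply call, it only returns combinations that actually occur.

```r
# Example rows copied from the answer's data frame
df <- data.frame(
  name     = c("jake", "sally", "sue", "sally", "sue",
               "sally", "jake", "sue", "jim", "jim"),
  day      = c("wed", "tue", "fri", "tue", "mon",
               "fri", "tue", "thur", "tue", "tue"),
  location = c("mi", "ny", "ny", "mi", "mi",
               "ca", "ny", "ca", "mi", "al"),
  stringsAsFactors = FALSE
)

# Count names per (location, day) group; only observed combinations appear
res <- aggregate(name ~ location + day, data = df, FUN = length)
names(res)[names(res) == "name"] <- "count"
print(res)
```

The rows may come out in a different order from ddply's, but the counts are the same.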

You should find a thorough tutorial on plyr, but here is what the commands above are doing: ddply splits the data frame into the unique combinations of values in the columns you specify, then summarizes each piece using the function you specify (in our case, length).
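The split, summarize and combine steps just described can be spelled out by hand in base R. This illustrates the idea only and is not how plyr is implemented internally; the data are the example's name and location columns, with the factor level order assumed from the ddply output.

```r
# name and location columns from the example data; level order assumed
name <- c("jake", "sally", "sue", "sally", "sue",
          "sally", "jake", "sue", "jim", "jim")
location <- factor(c("mi", "ny", "ny", "mi", "mi",
                     "ca", "ny", "ca", "mi", "al"),
                   levels = c("ca", "ny", "mi", "al"))

# Split: one vector of names per location level
pieces <- split(name, location)

# Apply: summarize each piece with length()
counts <- vapply(pieces, length, integer(1))

# Combine: gather the per-group results into one data frame
result <- data.frame(location = names(counts), count = unname(counts),
                     stringsAsFactors = FALSE)
print(result)
```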

I hope this helps.

