scala - How to transpose an RDD in Spark


I have an RDD like this:

1 2 3
4 5 6
7 8 9

It is a matrix. I want to transpose the RDD so that it looks like this:

1 4 7
2 5 8
3 6 9

How can I do this?

Say you have an n×m matrix.

If both n and m are so small that you can hold n×m items in memory, it doesn't make much sense to use an RDD. But transposing it is easy:

val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
// Collect all rows to the driver, transpose locally, and redistribute.
val transposed = sc.parallelize(rdd.collect.toSeq.transpose)
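To see what the collect-and-transpose step does once the rows are on the driver, here is a minimal sketch that uses plain Scala collections instead of an RDD. The object name LocalTranspose and the up-front check on row lengths are illustrative additions, not part of the answer above; transpose is the standard collections method, and it fails on rows of unequal length.

```scala
object LocalTranspose {
  // Stand-in for rdd.collect.toSeq: the whole matrix held in driver memory.
  val rows: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Assumption for this sketch: every row has the same length.
  // transpose fails on ragged input, so check up front for a clearer error.
  require(rows.map(_.length).distinct.size <= 1, "rows must all have the same length")

  // Row i of the result is column i of the input.
  val transposed: Seq[Seq[Int]] = rows.transpose

  def main(args: Array[String]): Unit =
    transposed.foreach(row => println(row.mkString(" ")))
}
```

Running main prints the three rows 1 4 7, 2 5 8 and 3 6 9.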

If n or m is so large that you cannot hold n or m entries in memory, then you cannot have an RDD line of that size. Either the original or the transposed matrix is impossible to represent in this case.

n and m may be of medium size: you can hold n or m entries in memory, but you cannot hold n×m entries. In this case you have to blow up the matrix and put it together again:

val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
// Split the matrix into one number per line.
val byColumnAndRow = rdd.zipWithIndex.flatMap {
  case (row, rowIndex) => row.zipWithIndex.map {
    case (number, columnIndex) => columnIndex -> (rowIndex, number)
  }
}
// Build up the transposed matrix. Group and sort by column index first.
val byColumn = byColumnAndRow.groupByKey.sortByKey().values
// Then sort by row index.
val transposed = byColumn.map {
  indexedRow => indexedRow.toSeq.sortBy(_._1).map(_._2)
}
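The blow-up-and-regroup steps can be traced without a cluster. The sketch below mirrors them on plain Scala collections, so groupBy followed by sortBy stands in for the RDD's groupByKey and sortByKey. The object name BlowUpTranspose is hypothetical, and this is only meant to make the intermediate values inspectable, not to replace the Spark version.

```scala
object BlowUpTranspose {
  val matrix: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Step 1: one record per cell, keyed by column index.
  val byColumnAndRow: Seq[(Int, (Int, Int))] =
    matrix.zipWithIndex.flatMap { case (row, rowIndex) =>
      row.zipWithIndex.map { case (number, columnIndex) =>
        (columnIndex, (rowIndex, number))
      }
    }

  // Step 2: group the cells by column index and order the groups by that index.
  // (Local stand-in for groupByKey followed by sortByKey().values.)
  val byColumn: Seq[Seq[(Int, Int)]] =
    byColumnAndRow
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (_, cells) => cells.map(_._2) }

  // Step 3: within each former column, order by row index and drop the index.
  val transposed: Seq[Seq[Int]] = byColumn.map(_.sortBy(_._1).map(_._2))

  def main(args: Array[String]): Unit =
    transposed.foreach(row => println(row.mkString(" ")))
}
```

For the 3×3 example, byColumnAndRow holds nine (columnIndex, (rowIndex, number)) records, and transposed comes out as the rows 1 4 7, 2 5 8 and 3 6 9. The explicit sort by row index in step 3 matters in the Spark version, where the order of values inside a group is not guaranteed after the shuffle.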
