Tuesday 10 March 2015

Apache Spark word count in Scala and Python

Word count in Scala

bin/spark-shell

scala> val textFile = sc.textFile("/home/username/word.txt")

scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala> counts.collect()

Word count in Python

bin/pyspark

>>> text_file = sc.textFile("/home/username/word.txt")

>>> counts = text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

>>> counts.collect()
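To see what each transformation in the Scala and Python versions contributes, the same pipeline can be emulated in plain Python (this is a sketch of the semantics, not the Spark API, and it uses the two-line sample input from this post hard-coded as a list):

```python
from collections import defaultdict

lines = ["I love bigdata", "I like bigdata"]  # same sample input as word.txt

# flatMap: split each line into words and flatten everything into one list
words = [word for line in lines for word in line.split(" ")]

# map: pair every word with the count 1
pairs = [(word, 1) for word in words]

# reduceByKey: sum the 1s for each distinct word
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(dict(counts))  # {'I': 2, 'love': 1, 'bigdata': 2, 'like': 1}
```

In Spark the same three steps run in parallel across partitions, with reduceByKey combining partial sums per word.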

Input file (word.txt)

I love bigdata
I like bigdata

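For this input, counts.collect() should return each distinct word paired with its frequency: I appears 2 times, bigdata 2 times, and love and like once each (Spark does not guarantee the order of the pairs). A quick plain-Python check of the expected counts:

```python
from collections import Counter

# the two lines of word.txt, hard-coded for the check
lines = ["I love bigdata", "I like bigdata"]
expected = Counter(word for line in lines for word in line.split(" "))
print(sorted(expected.items()))
# [('I', 2), ('bigdata', 2), ('like', 1), ('love', 1)]
```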
Spark web UI

While spark-shell or pyspark is running, the jobs above can be inspected in the Spark web UI, which by default is served at http://localhost:4040.

