Wednesday 18 March 2015

Apache Spark and Hadoop Integration with an Example

Step 1: Install Hadoop (1.x or 2.x) on your machine. You also need to set the Java and Scala paths in .bashrc (for setting the paths, refer to the Apache Spark Installation post below).

Step 2: Check that all Hadoop daemons are up and running.
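A quick way to check is the jps command (it ships with the JDK and lists running Java processes); the daemon names you should see depend on the Hadoop version:

jps
# Hadoop 2.x: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
# Hadoop 1.x: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker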



Step 3: Write some data to HDFS (here the file name in HDFS is word).
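For example, assuming you have a local file word.txt (the paths here are placeholders; adjust them to your setup), copy it into HDFS from the Hadoop directory and verify it:

bin/hadoop fs -put /home/username/word.txt /word
bin/hadoop fs -cat /word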

Step 4: Download Apache Spark prebuilt for Hadoop 1.x or 2.x, matching the Hadoop version you installed in Step 1.



Step 5: Untar the downloaded Spark tarball.
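For instance, if the tarball from Step 4 was the Spark 1.2.1 build for Hadoop 2.4 (substitute the exact file name you downloaded):

tar -xvf spark-1.2.1-bin-hadoop2.4.tgz
cd spark-1.2.1-bin-hadoop2.4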

Step 6: Start the Spark shell.
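From the extracted Spark directory:

bin/spark-shell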



Step 7: Once the Spark shell has started, type the following commands.
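As a minimal sketch, here is a word count over the HDFS file from Step 3; the NameNode URI (hdfs://localhost:9000) and the output path are assumptions that depend on your Hadoop configuration:

scala> val file = sc.textFile("hdfs://localhost:9000/word")
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.collect()
scala> counts.saveAsTextFile("hdfs://localhost:9000/word_output")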



Step 8: See the output in the terminal.



Step 9: Check the NameNode UI (localhost:50070).

Step 10: Check the Spark UI (localhost:4040) to monitor the job.


Tuesday 10 March 2015

Apache Spark word count in Scala and Python

Wordcount in Scala

bin/spark-shell

scala> val textFile = sc.textFile("/home/username/word.txt")

scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala> counts.collect()

Wordcount in Python

bin/pyspark

>>> text_file = sc.textFile("/home/username/word.txt")

>>> counts = text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

>>> counts.collect()

Input file (word.txt):

I love bigdata
I like bigdata
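With this input, counts.collect() in the Scala shell should return the pairs below (the ordering can vary between runs; the Python version gives the same pairs as a list of tuples):

Array((I,2), (love,1), (like,1), (bigdata,2))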

Spark web UI


Friday 6 March 2015

Apache Spark videos

Apache Spark Quick Introduction: Lesson 1



Apache Spark word count in Scala and Python



Apache Spark Installation

Step 1: Download Spark (click here).

Step 2: Download Scala (click here).


Step 3: Download Java (click here).

NOTE: Install git. Go to the terminal --> sudo apt-get install git

Step 4: Untar Spark, Scala, and the JDK.
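For example, assuming these archive names (substitute the versions you actually downloaded):

tar -xvf spark-1.2.1.tgz
tar -xvf scala-2.10.4.tgz
tar -xvf jdk-7u79-linux-x64.tar.gz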

Step 5: Set the environment paths in .bashrc.
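A minimal sketch of the .bashrc entries, assuming everything was extracted under /home/username (adjust the paths and versions to your layout):

export JAVA_HOME=/home/username/jdk1.7.0_79
export SCALA_HOME=/home/username/scala-2.10.4
export SPARK_HOME=/home/username/spark-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin

Run source ~/.bashrc afterwards so the current terminal picks up the new paths.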


Step 6: Build Spark using sbt.
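From the Spark source directory, run the bundled sbt launcher; the script location varies by Spark release (sbt/sbt in older versions, build/sbt in later ones), and the first build downloads dependencies so it can take a while:

sbt/sbt assembly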

Step 7: Start the Spark shell.
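Launch the shell and, as a quick sanity check, run a tiny job (this example is just an illustration):

bin/spark-shell
scala> sc.parallelize(1 to 100).count()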



For a video walkthrough, see the Apache Spark installation video.