Thursday, 9 April 2015

Apache Storm single node installation

Video Reference 


Step 1:  Download Zookeeper

wget http://www.eu.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

Step 2:
 tar -zxvf zookeeper-3.4.6.tar.gz
 cd zookeeper-3.4.6
 cd conf

 Step 3:
 cp zoo_sample.cfg zoo.cfg
 vi zoo.cfg
 tickTime=2000
 dataDir=/home/hadoop/zookeeper
 clientPort=2181
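
It is a good idea to create the dataDir before starting ZooKeeper (assuming the path above):

mkdir -p /home/hadoop/zookeeper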


Step 4: Download Storm

wget http://apache.mesi.com.ar/storm/apache-storm-0.9.3/apache-storm-0.9.3.tar.gz

Step 5:
tar -zxvf apache-storm-0.9.3.tar.gz
cd apache-storm-0.9.3
cd conf

Step 6:
vi storm.yaml

storm.zookeeper.servers:
    - "localhost"

storm.zookeeper.port: 2181
nimbus.host: "localhost"


Step 7: Start all the services (ZooKeeper + Storm)

Zookeeper

bin/zkServer.sh start
jps
QuorumPeerMain


Storm 
 bin/storm nimbus


 bin/storm supervisor

 bin/storm ui
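
Nimbus, the supervisor and the UI all run in the foreground, so either open a separate terminal for each or push them into the background, for example:

nohup bin/storm nimbus &       # master daemon
nohup bin/storm supervisor &   # worker daemon
nohup bin/storm ui &           # web UI (port 8080)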

Run jps to verify that all the services are running.



Step 8: Check the Storm UI

localhost:8080


Additional native dependencies (optional, but needed when you move on to advanced use):

wget http://download.zeromq.org/zeromq-2.1.7.tar.gz

tar -xzf zeromq-2.1.7.tar.gz

cd  zeromq-2.1.7

./configure

make

sudo make install

Download and installation commands for JZMQ:

Obtain JZMQ using

git clone https://github.com/nathanmarz/jzmq.git

cd jzmq

sudo apt-get install autoconf
sudo apt-get install automake
sudo apt-get install libtool


./autogen.sh

./configure

make

sudo make install






Wednesday, 18 March 2015

Apache Spark and Hadoop Integration with example

Step 1: Install Hadoop on your machine (1.x or 2.x). You also need to set the Java and Scala paths in .bashrc (for setting the paths, refer to the Spark installation post).

Step 2: Check that all Hadoop daemons are up and running.
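
You can verify this with jps; on a typical Hadoop 2.x single-node setup you would expect to see something like NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager (JobTracker and TaskTracker on 1.x):

jps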



Step 3: Write some data into HDFS (here the file name in HDFS is word).
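
A minimal sketch, assuming a small local file and the HDFS file name word used in this post:

echo -e "I love bigdata\nI like bigdata" > word.txt    # sample local file
hadoop fs -put word.txt /word                          # copy it into HDFS as /word
hadoop fs -cat /word                                   # verify the contents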

Step 4: Download Apache Spark prebuilt for Hadoop 1.x or 2.x, depending on the Hadoop version you installed in Step 1.



Step 5: Untar the Spark-for-Hadoop archive.
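
For example, assuming the Hadoop 2.x prebuilt package was downloaded (the exact file name depends on the version chosen in Step 4):

tar -zxvf spark-1.2.1-bin-hadoop2.4.tgz
cd spark-1.2.1-bin-hadoop2.4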

Step 6: Start the Spark shell.
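
From the extracted Spark directory:

bin/spark-shell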



Step 7: Once the Spark shell has started, type the following commands.
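
A minimal word-count sketch, assuming the input was loaded into HDFS as /word and that the NameNode listens on port 9000 (adjust the hdfs:// URI to match your fs.defaultFS setting):

scala> val textFile = sc.textFile("hdfs://localhost:9000/word")
scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.collect()
scala> counts.saveAsTextFile("hdfs://localhost:9000/word_output")

collect() prints the counts in the terminal (Step 8), and saveAsTextFile() writes them back to HDFS so they show up in the NameNode UI (Step 9).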



Step 8: See the output in the terminal.



Step 9: Check the Namenode UI (localhost:50070)







Step 10: Check the Spark UI (localhost:4040) to monitor the job.


Tuesday, 10 March 2015

Apache Spark word count in Scala and Python

Wordcount in Scala

bin/spark-shell

scala>val textFile = sc.textFile("/home/username/word.txt")

scala>val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala>counts.collect()

Wordcount in Python

bin/pyspark

>>>text_file = sc.textFile("/home/username/word.txt")

>>>counts = text_file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

>>>counts.collect()

Input file: word.txt

I love bigdata
I like bigdata
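
With that input, counts.collect() should return something like the following (the order of the pairs may vary):

Array((love,1), (like,1), (I,2), (bigdata,2))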

Spark web UI


Friday, 6 March 2015

Apache Spark videos

Apache Spark Quick introduction  Lesson 1



Apache Spark wordcount in scala and python 



Apache Spark Installation

Step 1 Download Spark Click here

Step 2 Download Scala Click here


Step 3 Download Java

Click here

NOTE: Install git. Go to the terminal and run: sudo apt-get install git

Step 4 Untar Spark, Scala and the JDK
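
For example (the archive names below are assumptions; use whatever versions you downloaded in Steps 1-3):

tar -zxvf spark-1.2.1.tgz
tar -zxvf scala-2.10.4.tgz
tar -zxvf jdk-7u71-linux-x64.tar.gz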




Step 5 Set the environment path in .bashrc
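
A sketch of the .bashrc entries, assuming everything was extracted under /home/username (adjust the paths to your machine):

export JAVA_HOME=/home/username/jdk1.7.0_71
export SCALA_HOME=/home/username/scala-2.10.4
export SPARK_HOME=/home/username/spark-1.2.1
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

Run source ~/.bashrc (or open a new terminal) for the changes to take effect.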


Step 6 Start building spark using sbt 
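
For a source download of Spark from this era, the build is driven by the bundled sbt script, roughly:

cd spark-1.2.1        # the extracted Spark source directory (name assumed)
sbt/sbt assembly      # builds Spark; this can take a while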




Step 7 Start spark shell



For Video Apache Spark installation  



Thursday, 13 November 2014

Top 30 Hive Interview Questions

1. What is the Hive shell?

The shell is the primary way we interact with Hive, using HiveQL commands. In other words, the shell is simply a prompt where you enter HiveQL commands to interact with Hive.

2. How can we enter the Hive shell from a normal terminal?

Just by entering the hive command, e.g. 'bin/hive'.

3. How can we check whether the Hive shell is working or not?

After entering the Hive shell, just run another HiveQL command such as 'show databases;'.
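
For example, a minimal session from the Hive installation directory:

bin/hive
hive> show databases;
hive> quit;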

4. Is it necessary to add a semicolon (;) at the end of HiveQL commands?

Yes, we have to add a semicolon (;) at the end of every HiveQL command.

Wednesday, 12 November 2014

Top 60 Hadoop Interview Questions

1. What is Hadoop framework?
Ans: Hadoop is an open-source framework written in Java by the Apache Software Foundation. The framework is used to write software applications that need to process vast amounts of data (it can handle multiple terabytes of data). It works in parallel on large clusters, which can have thousands of computers (nodes). It also processes data in a very reliable and fault-tolerant manner.

2. On What concept the Hadoop framework works?
Ans: It works on the MapReduce concept, which was devised by Google.

3. What is MapReduce?

Ans: MapReduce is an algorithm or concept for processing huge amounts of data in a faster way. As per its name, you can divide it into Map and Reduce.
• Map task: the MapReduce job usually splits the input data set into independent chunks (big data sets into multiple small data sets) that are processed in parallel.
• Reduce task: the output of the map tasks becomes the input of the reduce tasks, which produce the final result.
Your business logic is written in the Map task and the Reduce task. Typically, both the input and the output of the job are stored in a file system (not a database). The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.