Top 10 most popular false beliefs about Hadoop
Hadoop and Big Data are practically synonymous these days.
There is plenty of information on Hadoop and Big Data out there, yet as the Big Data
hype machine gears up, there is still a lot of confusion about where Hadoop actually
fits into the overall Big Data landscape. Let’s have a look at some of the most
popular myths about Hadoop.
false belief #1: Hadoop is a database
Hadoop is often talked about like it's a database, but it
isn't. Hadoop is primarily a distributed file system and doesn’t contain
database features like query optimization, indexing and random access to data.
However, Hadoop can be used to build a database system.
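To make the distinction concrete, here is a minimal sketch of working with HDFS through the Hadoop Java FileSystem API. It assumes a reachable cluster configured via core-site.xml on the classpath, and the path /tmp/example.txt is purely illustrative. Notice that reading data back is a plain sequential scan over bytes: there is no query, no index and no WHERE clause anywhere.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIsAFileSystem {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a file -- plain byte streams, no schema, no indexes.
        Path path = new Path("/tmp/example.txt");   // hypothetical path
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read it back -- a full sequential scan; nothing like a database lookup.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}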
false belief #2: Hadoop is a complete, single product
It's not. This is the biggest myth of all! Hadoop consists
of multiple open source products like HDFS (Hadoop Distributed File System),
MapReduce, Pig, Hive, HBase, Ambari, Mahout, Flume and HCatalog. Basically,
Hadoop is an ecosystem -- a family of open source products and technologies
overseen by the Apache Software Foundation (ASF).
false belief #3: Hadoop is cheap
This is a common misconception associated with anything open
source. Just because you're able to reduce or eliminate the initial costs of
purchasing software doesn't mean you'll necessarily save money. Though Hadoop
is open source, there are a lot of costs associated with deploying Hadoop.
false belief #4: Hadoop needs a bunch of
programmers
This totally depends on what the organization plans to do.
If the plan is to build a fancy Hadoop-based Big Data suite, then programmers come
into the picture. If not, programming should not be a worry at all, as most data
integration tools have GUIs that abstract away MapReduce programming complexity
and provide pre-built templates.
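For a sense of what those GUIs are hiding, here is the canonical word-count job hand-coded against the Hadoop MapReduce Java API; the input and output paths come from the command line and are illustrative. This is roughly the boilerplate a drag-and-drop integration tool would generate on your behalf.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}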
false belief #5: Hadoop can only handle web analytics
Web analytics gets highlighted because most companies adopt
Hadoop to analyze web logs and other web data. But its application is not limited
to web analytics alone: Hadoop can handle a much wider range of data and analytics,
which makes it appealing to a broader range of organizations.
false belief #6: Big Data can do without Hadoop
When we say Big Data, the first thing that comes to mind is
Hadoop, despite the other options available in the market. Therefore, when dealing
with Big Data, there has to be Hadoop; the two have become synonymous.
false belief #7: Hive resembles SQL
Hive’s query language, HiveQL, does look like SQL, so people
who know SQL can quickly learn to hand-code Hive queries. But that resemblance does
not solve compatibility issues with SQL-based tools. Over time, it is expected that
Hadoop products will support standard SQL and that SQL-based vendor tools will
support Hadoop.
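As an illustration, here is a sketch of querying Hive from Java over JDBC with the standard HiveServer2 driver (org.apache.hive.jdbc.HiveDriver); the host name, table and columns are hypothetical. The statements read like SQL, yet details such as the ROW FORMAT storage clause have no ANSI SQL equivalent, which is exactly where generic SQL-based tools run into trouble.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC URL -- host, port and database are assumptions.
        String url = "jdbc:hive2://hiveserver.example.com:10000/default";
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // HiveQL reads like SQL, but it is not ANSI SQL:
            // the ROW FORMAT clause below has no standard-SQL equivalent.
            stmt.execute("CREATE TABLE IF NOT EXISTS page_views ("
                    + " url STRING, hits INT)"
                    + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");

            try (ResultSet rs = stmt.executeQuery(
                    "SELECT url, SUM(hits) AS total FROM page_views"
                    + " GROUP BY url ORDER BY total DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString("url") + "\t" + rs.getLong("total"));
                }
            }
        }
    }
}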
false belief #8: Hadoop requires MapReduce
Hadoop and MapReduce are related, but they are not married
to each other; nor are they mutually exclusive. Variations of MapReduce work with a
variety of storage technologies, including HDFS and some relational DBMSs. Some
users opt to deploy HDFS with Hive or HBase, but not MapReduce.
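For example, the following sketch uses the plain HBase Java client API for single-row reads and writes; no MapReduce job is launched anywhere. The table name "users" and column family "info" are assumptions for illustration, and the cluster location is expected to come from hbase-site.xml on the classpath (a reasonably recent HBase client is assumed).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseWithoutMapReduce {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath; cluster details are assumed.
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Random write of a single row -- no MapReduce job involved.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);

            // Random read by row key, again without MapReduce.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            String name = Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")));
            System.out.println("name = " + name);
        }
    }
}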
false belief #9: MapReduce only controls analytics
MapReduce does far more than control analytics. It handles
parallel programming and fault tolerance for a wide variety of hand-coded logic and
other applications, not just analytics.
false belief #10: Hadoop is too risky for
enterprise use
Many organizations fear that Hadoop is too new and untested
to be suited for the enterprise. Nothing could be further from the truth.
Today, Hadoop is used by everyone from Netflix to Twitter to eBay, and major
vendors including Microsoft, IBM and Oracle all sell Hadoop tools.