Checking the Health of an HDFS Cluster
ISSUE: How do I check the health of my HDFS cluster (NameNode and all DataNodes)? SOLUTION: Hadoop includes the dfsadmin command-line tool for HDFS administration. This tool allows the...
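A minimal sketch of such a health check, assuming a Hadoop 1.x-era cluster with the `hadoop` CLI on the PATH (the guard makes the script a harmless no-op elsewhere):

```shell
# Report capacity, DFS usage, and the status of every DataNode, then
# spot-check block-level health of the namespace.
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfsadmin -report       # NameNode summary plus per-DataNode status
  hadoop fsck / | tail -n 20    # block-level health report for /
else
  echo "hadoop CLI not found; run this on a cluster node" >&2
fi
```

`dfsadmin -report` also lists dead or decommissioned DataNodes, which is usually the first thing to check when the cluster misbehaves.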
Failure of Active NameNode in Hadoop Prior to HA
ISSUE: Failure of the active NameNode in a non-HA deployment. SOLUTION: The best approach to mitigating the risk of data loss due to a NameNode failure is to harden the NameNode system and components to...
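One common hardening step in pre-HA Hadoop is writing the NameNode metadata to several directories at once, with one of them on remote NFS storage so a copy survives loss of the NameNode host. A sketch of the hdfs-site.xml setting (the paths are placeholder assumptions):

```xml
<!-- hdfs-site.xml: keep redundant copies of the NameNode metadata.
     /grid/0/... is local disk; /mnt/nfs/namenode is an NFS mount. -->
<property>
  <name>dfs.name.dir</name>
  <value>/grid/0/hadoop/hdfs/namenode,/mnt/nfs/namenode</value>
</property>
```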
Install the Latest MySQL on a Linux Target
ISSUE: HCatalog (hcat) requires a persistent database to store schema information.
SOLUTION 1 (specific host access only): Grab the latest package:
> yum -y install mysql-server
Configure autostart at boot...
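The steps above can be sketched as a script for a RHEL/CentOS host; it is guarded behind an opt-in variable so it never installs anything by accident, and the root password is a placeholder:

```shell
# Install MySQL server, enable it at boot, start it, and set a root password.
# Set RUN_MYSQL_SETUP=yes (as root) to actually run the steps.
if [ "${RUN_MYSQL_SETUP:-no}" = "yes" ]; then
  yum -y install mysql-server
  chkconfig mysqld on                      # autostart at boot
  service mysqld start
  mysqladmin -u root password 'changeme'   # placeholder -- choose a real password
else
  echo "set RUN_MYSQL_SETUP=yes to run the install steps" >&2
fi
```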
Linux File Systems for HDFS
ISSUE: Choosing the appropriate Linux file system for an HDFS deployment. SOLUTION: The Hadoop Distributed File System is platform-independent and can function on top of any underlying file system and...
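Whatever file system you choose, one mount-time detail matters for DataNode disks: mounting with `noatime` so reads don't trigger access-time metadata writes. A sketch of an /etc/fstab entry (device, mount point, and ext4 are placeholder assumptions):

```
# /etc/fstab -- example HDFS data-disk entry; adjust device and mount point
/dev/sdb1  /grid/0  ext4  defaults,noatime,nodiratime  0 0
```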
Optimal Way to Shut Down an HDP Slave Node
ISSUE: What is the optimal way to shut down an HDP slave node? SOLUTION: HDP slave nodes are usually configured to run the DataNode and TaskTracker processes. If HBase is installed, then the slave nodes...
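A hedged sketch of the shutdown order, assuming HDP 1.x daemon-script paths and service users (both are assumptions; the file-existence guards make it a no-op elsewhere):

```shell
# Stop slave-node daemons in a safe order before powering the host down:
# HBase region server first, then TaskTracker, then DataNode.
HADOOP_DAEMON=/usr/lib/hadoop/bin/hadoop-daemon.sh
HBASE_DAEMON=/usr/lib/hbase/bin/hbase-daemon.sh

if [ -x "$HBASE_DAEMON" ]; then
  su - hbase -c "$HBASE_DAEMON stop regionserver"    # drain HBase first
fi
if [ -x "$HADOOP_DAEMON" ]; then
  su - mapred -c "$HADOOP_DAEMON stop tasktracker"   # stop accepting tasks
  su - hdfs   -c "$HADOOP_DAEMON stop datanode"      # then stop serving blocks
fi
```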
Testing HBase Setup
ISSUE: How do I test that HBase is working properly? Or: what is a simple set of HBase commands? SOLUTION: If the HBase processes are not running, start them with the following commands: To start the HBase...
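Once the processes are up, a minimal smoke test from the HBase shell looks like this (the table and column-family names are arbitrary examples; the guard skips the test on machines without HBase):

```shell
# Create a table, write one cell, read it back, then clean up.
if command -v hbase >/dev/null 2>&1; then
  hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:greeting', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF
else
  echo "hbase CLI not found; run this on an HBase node" >&2
fi
```

If the `scan` prints the row you just `put`, the master and at least one region server are working.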
Testing HDFS Setup
ISSUE: How do I run simple Hadoop Distributed File System tasks? Or: how do I test that the HDFS services are working? SOLUTION: Make sure the NameNode and the DataNodes are started. To start the name...
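With the daemons started, a quick round-trip through HDFS confirms the services work (paths are arbitrary examples; guarded for machines without Hadoop):

```shell
# Write a small file into HDFS, read it back, and clean up.
if command -v hadoop >/dev/null 2>&1; then
  echo "hello hdfs" > /tmp/smoke.txt
  hadoop fs -mkdir /tmp/hdfs-smoke
  hadoop fs -put /tmp/smoke.txt /tmp/hdfs-smoke/
  hadoop fs -cat /tmp/hdfs-smoke/smoke.txt    # should print "hello hdfs"
  hadoop fs -rmr /tmp/hdfs-smoke              # Hadoop 1.x recursive remove
else
  echo "hadoop CLI not found; run this on a cluster node" >&2
fi
```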
Testing MapReduce Setup
ISSUE: How do I run an example MapReduce job? Or: how do I test that the MapReduce services are working? SOLUTION: Make sure the JobTracker and the TaskTrackers are started. To start the JobTracker: su...
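A simple end-to-end check is to run the bundled pi-estimation example; the jar path below follows the HDP 1.x layout and is an assumption:

```shell
# Run the pi example: 10 map tasks, 100 samples each. A completed job
# proves the JobTracker and TaskTrackers can schedule and run work.
EXAMPLES_JAR=/usr/lib/hadoop/hadoop-examples.jar
if command -v hadoop >/dev/null 2>&1 && [ -f "$EXAMPLES_JAR" ]; then
  hadoop jar "$EXAMPLES_JAR" pi 10 100
else
  echo "hadoop or examples jar not found; run on a cluster node" >&2
fi
```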
Using Apache Sqoop for Data Import from Relational DBs
ISSUE: How do I use Apache Sqoop to import data from a relational database? SOLUTION: Apache Sqoop can be used to import data from any relational database into HDFS, Hive, or HBase. To import data into HDFS, use...
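As a hedged sketch, importing one MySQL table into HDFS with four parallel map tasks might look like this (host, database, table, user, and target directory are all placeholder assumptions):

```shell
# Import the "customers" table into HDFS; -P prompts for the password,
# -m 4 runs four parallel map tasks.
if command -v sqoop >/dev/null 2>&1; then
  sqoop import \
    --connect jdbc:mysql://dbhost/salesdb \
    --username sqoop_user -P \
    --table customers \
    --target-dir /user/hdfs/customers \
    -m 4
  # Add --hive-import to land the same table in Hive instead of raw HDFS.
else
  echo "sqoop not found" >&2
fi
```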
Working with Files in HCatalog Tables
ISSUE: How can I use HCatalog to discover which files are associated with a partition in a table, so that the files can be read directly from HDFS? How do I place files in HDFS and then add them as a...
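A sketch of both directions, assuming a partitioned table named `web_logs` with a `dt` partition key (table name, key, and paths are placeholder assumptions):

```shell
# Register an existing HDFS directory as a partition, then look up the
# HDFS location backing a partition so its files can be read directly.
if command -v hcat >/dev/null 2>&1; then
  hcat -e "ALTER TABLE web_logs ADD PARTITION (dt='2013-01-01') LOCATION '/data/web_logs/2013-01-01';"
  # The Location field names the directory whose files hold the partition data:
  hive -e "DESCRIBE FORMATTED web_logs PARTITION (dt='2013-01-01');" | grep -i location
else
  echo "hcat not found" >&2
fi
```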
Big Data Defined
Big Data is defined in terms of transformative economics. A Big Data system has four properties:
- It uses local storage to be fast but inexpensive
- It uses clusters of commodity hardware to be...
Hadoop Distributed File System (HDFS) Defined
The best place for a deep dive into HDFS is the HDFS Architecture page. Here we’ll take an abbreviated view of what HDFS is, and why it matters. The Hadoop Distributed File System is the backbone of a...
Hadoop MapReduce Defined
Hadoop MapReduce is the way Hadoop processes data. MapReduce uses the Hadoop Distributed File System to handle the distribution of data on the cluster. MapReduce is how Hadoop parallelizes its...
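The model itself can be sketched with ordinary Unix pipes; this is a local illustration of the map, shuffle, and reduce stages, not Hadoop itself:

```shell
# Word count, MapReduce-style, on a local shell pipeline.
printf 'the quick fox the fox\n' |
  tr ' ' '\n' |   # map: emit one record per input token
  sort |          # shuffle: bring identical keys together
  uniq -c         # reduce: count the records in each key group
# counts: fox 2, quick 1, the 2
```

Hadoop runs the same three stages, but with the map and reduce steps executing in parallel across the cluster and the shuffle moving data between nodes.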
HOWTO: Ambari on EC2
This document is an informal guide to setting up a test cluster on Amazon AWS, specifically the EC2 service. It is not a best-practice guide, nor is it suitable for a full PoC or production install of...
Get Started: Ambari for Provisioning, Managing, and Monitoring Hadoop
Ambari is 100% open source and included in HDP, greatly simplifying installation and initial configuration of Hadoop clusters. In this article we’ll run through some installation steps to get...
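The server-side setup condenses to a few commands on a RHEL/CentOS host; this sketch assumes the Ambari yum repository is already configured and is guarded behind an opt-in variable:

```shell
# Install, configure, and start the Ambari server.
# Set RUN_AMBARI_SETUP=yes (as root) to actually run the steps.
if [ "${RUN_AMBARI_SETUP:-no}" = "yes" ]; then
  yum -y install ambari-server
  ambari-server setup -s    # -s accepts the defaults silently
  ambari-server start
  # Then browse to http://<server-host>:8080 and log in as admin/admin.
else
  echo "set RUN_AMBARI_SETUP=yes to run the install" >&2
fi
```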
How To: Install and Configure the Hortonworks ODBC driver on Mac OS X
This document describes how to install and configure the Hortonworks ODBC driver on Mac OS X. After you install and configure the ODBC driver, you will be able to access Hortonworks sandbox data using...
How To: Install and Configure the Hortonworks ODBC driver on Windows 7
This document describes how to install and configure the Hortonworks ODBC driver on Windows 7. After you install and configure the ODBC driver, you will be able to access Hortonworks sandbox data using...