Channel: Hortonworks » Knowledgebase
Browsing all 31 articles

Checking the Health of HDFS Cluster

ISSUE: How do I check the health of my HDFS cluster (name node and all data nodes)? SOLUTION: Hadoop includes the dfsadmin command-line tool for HDFS administration. This tool allows the...

View Article


Failure of Active NameNode in Hadoop Prior to HA

ISSUE: Failure of the active NameNode in a non-HA deployment. SOLUTION: The best approach to mitigating the risk of data loss due to a NameNode failure is to harden the NameNode system and components to...

View Article


Install the Latest MySQL on a Linux Target

ISSUE: hcat requires a persistent database to store schema information.
SOLUTION 1: Specific host access only.
Grab the latest package:
> yum -y install mysql-server
Configure autostart at boot...

View Article

Linux File Systems for HDFS

ISSUE: Choosing the appropriate Linux file system for an HDFS deployment. SOLUTION: The Hadoop Distributed File System is platform-independent and can function on top of any underlying file system and...

View Article

Optimal Way to Shut Down an HDP Slave Node

ISSUE: What is the optimal way to shut down an HDP slave node? SOLUTION: HDP slave nodes are usually configured to run the DataNode and TaskTracker processes. If HBase is installed, then the slave nodes...

View Article


Testing HBase Setup

ISSUE: How do I test that HBase is working properly? Or, what is a simple set of HBase commands? SOLUTION: If HBase processes are not running, start them with the following commands: To start the HBase...

View Article

Testing HDFS Setup

ISSUE: How do I run simple Hadoop Distributed File System tasks? Or, how do I test that the HDFS services are working? SOLUTION: Make sure the name node and the data nodes are started. To start the name...

View Article

Testing MapReduce Setup

ISSUE: How do I run an example MapReduce job? Or, how do I test that the MapReduce services are working? SOLUTION: Make sure the job tracker and the task trackers are started. To start the job tracker: su...

View Article


Using Apache Sqoop for Data Import from Relational DBs

ISSUE: How do I use Apache Sqoop to import data from a relational database? SOLUTION: Apache Sqoop can be used to import data from a relational database into HDFS, Hive, or HBase. To import data into HDFS, use...

View Article


Working with Files in HCatalog Tables

ISSUE: How can I use HCatalog to discover which files are associated with a partition in a table so that the files can be read directly from HDFS? How do I place files in HDFS and then add them as a...

View Article

Big Data Defined

Big Data is defined in terms of transformative economics. A Big Data system has four properties:
It uses local storage to be fast but inexpensive
It uses clusters of commodity hardware to be...

View Article

Hadoop Distributed File System (HDFS) Defined

The best place for a deep dive into HDFS is the HDFS Architecture page. Here we’ll take an abbreviated view of what HDFS is, and why it matters. The Hadoop Distributed File System is the backbone of a...

View Article

Hadoop MapReduce Defined

Hadoop MapReduce is the way Hadoop processes data. MapReduce uses the Hadoop Distributed File System to handle the distribution of data on the cluster. MapReduce is how Hadoop parallelizes its...

View Article
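
As a rough illustration of the idea in the MapReduce entry above, the following is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. It shows the three conceptual steps: a map step that emits key/value pairs, a grouping step, and a reduce step that combines the values for each key.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes over data stored in HDFS; here everything runs sequentially in one process purely to show the data flow.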


HOWTO: Install the Latest MySQL on a Linux Target

ISSUE: hcat requires a persistent database to store schema information.
SOLUTION 1: Specific host access only.
Grab the latest package:
> yum -y install mysql-server
Configure autostart at boot...

View Article

HOWTO: Ambari on EC2

This document is an informal guide to setting up a test cluster on Amazon AWS, specifically the EC2 service. This is not a best practice guide nor is it suitable for a full PoC or production install of...

View Article


Get Started: Ambari for provisioning, managing and monitoring Hadoop

Ambari is 100% open source and included in HDP, greatly simplifying installation and initial configuration of Hadoop clusters. In this article we’ll be running through some installation steps to get...

View Article

How To: Install and Configure the Hortonworks ODBC driver on Mac OS X

This document describes how to install and configure the Hortonworks ODBC driver on Mac OS X. After you install and configure the ODBC driver, you will be able to access Hortonworks sandbox data using...

View Article


How To: Install and Configure the Hortonworks ODBC driver on Windows 7

This document describes how to install and configure the Hortonworks ODBC driver on Windows 7. After you install and configure the ODBC driver, you will be able to access Hortonworks sandbox data using...

View Article

HOWTO: Optimal Way to Shut Down an HDP Slave Node

ISSUE: What is the optimal way to shut down an HDP slave node? SOLUTION: HDP slave nodes are usually configured to run the DataNode and TaskTracker processes. If HBase is installed, then the slave nodes...

View Article

HOWTO: Test HBase Setup

ISSUE: How do I test that HBase is working properly? Or, what is a simple set of HBase commands? SOLUTION: If HBase processes are not running, start them with the following commands: To start the HBase...

View Article