Using Apache Sqoop for Data Import from Relational DBs
ISSUE: How do I use Apache Sqoop for importing data from a relational DB? SOLUTION: Apache Sqoop can be used to import data from any relational DB into HDFS, Hive or HBase. To import data into HDFS, use...
Working with Files in HCatalog Tables
ISSUE: How can I use HCatalog to discover which files are associated with a partition in a table so that the files can be read directly from HDFS? How do I place files in HDFS and then add them as a...
HOWTO: Ambari on EC2
This document is an informal guide to setting up a test cluster on Amazon AWS, specifically the EC2 service. It is not a best-practice guide, nor is it suitable for a full PoC or production install of...
Get Started: Ambari for provisioning, managing and monitoring Hadoop
Ambari is 100% open source and included in HDP, greatly simplifying installation and initial configuration of Hadoop clusters. In this article we’ll be running through some installation steps to get...
HOW TO: Connect Tableau to Hortonworks Sandbox
Tableau, Apache Hive and the Hortonworks Sandbox: As with most BI tools, Tableau can use Apache Hive (via an ODBC connection) as the de facto standard for SQL access in Hadoop. Establishing a connection from...
HOW TO: Connect/Write a File to Hortonworks Sandbox from Talend Studio
Writing a file to Hortonworks Sandbox from Talend Studio: I recently needed to quickly build some test data for my Hadoop environment and was looking for a tool to help me out. What I discovered was...
HOWTO: Make the Sandbox run faster
If you are having performance issues with the Sandbox, try the following: run only one virtual machine at a time; reboot the virtual machine; allocate more RAM to the Sandbox VM. This assumes you have more...
Storm on YARN Install on HDP2 Beta Cluster
These are the installation instructions for Storm on YARN. Our work is based on the code and documentation provided by Yahoo in the Storm-YARN repository at https://github.com/yahoo/storm-yarn. We...
Using Apache Spark: Technical Preview with HDP 2.2
Introduction: The Spark Technical Preview lets you evaluate Apache Spark 1.2.0 on YARN with HDP 2.2. With YARN, Hadoop can now support various types of workloads; Spark on YARN becomes yet another...
HDFS Transparent Data Encryption
Many HDP users are increasing their focus on security within Hadoop and are looking for ways to encrypt their data. Fortunately, Hadoop provides several options for encrypting data at rest. At the...