Writing a file to Hortonworks Sandbox from Talend Studio
I recently needed to build some test data quickly for my Hadoop environment and went looking for a tool to help me out. It turns out this is a very simple process within Talend Studio. (You can get the latest Talend Studio from the Talend site.)
Here is how…
Step 1 – Generating Test Data within Talend Studio
- Create a New Job within the Job Designer
- Drag a tRowGenerator onto the Designer
- Double-click your tRowGenerator component and add the fields you want to generate
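
For readers who want to see what this step produces, here is a rough Python sketch of generating rows of test data. It is not what Talend runs internally; the three fields (id, name, amount) and the function name generate_rows are made up for illustration.

```python
import random
import string


def generate_rows(count, seed=None):
    """Return `count` rows of hypothetical (id, name, amount) test data."""
    rng = random.Random(seed)  # a fixed seed gives reproducible rows
    rows = []
    for row_id in range(1, count + 1):
        # Random 8-letter lowercase string standing in for a name field.
        name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        # Random amount between 0 and 1000, rounded to two decimals.
        amount = round(rng.uniform(0, 1000), 2)
        rows.append((row_id, name, amount))
    return rows


if __name__ == "__main__":
    for row in generate_rows(5, seed=42):
        print(row)
```

Passing the same seed twice returns the same rows, which is handy when you want repeatable test data.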
Step 2 – Connecting to HDFS from Talend
- Drag a tHDFSConnection onto the Designer
- Change the “Name Node URI” property to point to your Hortonworks Sandbox on port 8020.
- Change the user name to “sandbox”.
- Right-click the tHDFSConnection and add an OK trigger that connects the tHDFSConnection to the tRowGenerator
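
The Name Node URI is easy to mistype, so here is a small Python sketch that builds it and sanity-checks its shape. The helper names and the address 192.168.56.101 are placeholders I made up; substitute your own Sandbox IP.

```python
from urllib.parse import urlparse


def namenode_uri(host, port=8020):
    """Build a Name Node URI such as hdfs://<host>:8020/."""
    return f"hdfs://{host}:{port}/"


def looks_like_namenode_uri(uri):
    """Check that the URI uses the hdfs scheme and has a host and a port."""
    parsed = urlparse(uri)
    return (
        parsed.scheme == "hdfs"
        and bool(parsed.hostname)
        and parsed.port is not None
    )


if __name__ == "__main__":
    uri = namenode_uri("192.168.56.101")  # placeholder IP, not a real default
    print(uri, looks_like_namenode_uri(uri))
```

This only checks the format of the string; it does not confirm that the Sandbox is reachable.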
Step 3 – Writing to HDFS
- Drag a tHDFSOutput onto the Designer
- Change the “Name Node URI” property to point to your Hortonworks Sandbox on port 8020. Example: “hdfs://<YOUR SANDBOX IP>:8020/”
- Change the user name to “sandbox”.
- Set the name of the output file in the File Name field
- Right-click the tRowGenerator and add a Row > Main link that connects the tRowGenerator to the tHDFSOutput
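
To get a feel for what ends up in the output file, here is a Python sketch that turns rows into delimited text. I am assuming a semicolon field separator and a newline row separator; check the separators configured on your tHDFSOutput component, since yours may differ. The function name and the sample rows are made up.

```python
import csv
import io


def rows_to_delimited(rows, delimiter=";"):
    """Serialize rows to delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [(1, "alice", 10.5), (2, "bob", 3.0)]  # made-up rows
    print(rows_to_delimited(sample), end="")
```

The sketch builds the text in memory only; in the job itself, Talend handles the actual write to HDFS.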
Step 4 – Running the Job from Talend
- Click on the “Run” Tab and press the “Run” button
Step 5 – Viewing the file in the Hortonworks Sandbox
- Open your web browser and enter the URL: http://<YOUR SANDBOX IP>:8000
- Click on the File Browser icon on the top bar
- Your file should have appeared within the sandbox user’s home directory
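
If you would rather check from a script than from the browser, the Python sketch below lists the home directory through the WebHDFS REST API. It rests on several assumptions that are not part of the steps above: that WebHDFS is enabled on your Sandbox, that the NameNode web port is 50070, and that the home directory is /user/sandbox (the usual HDFS layout). The helper names and the IP address are placeholders.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def home_directory(user):
    """Assumed HDFS home directory for a user (/user/<name>)."""
    return f"/user/{user}"


def liststatus_url(host, user, port=50070):
    """Build a WebHDFS LISTSTATUS URL for the user's home directory."""
    path = quote(home_directory(user))
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def list_home_directory(host, user, port=50070):
    """Return the file names in the home directory (needs a live Sandbox)."""
    with urlopen(liststatus_url(host, user, port)) as response:
        payload = json.load(response)
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Only prints the URL; call list_home_directory() against a running Sandbox.
    print(liststatus_url("192.168.56.101", "sandbox"))  # placeholder IP
```

Your output file name should show up in the list returned by list_home_directory when the job has run successfully.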
VOILA!
The post HOW TO: Connect/Write a File to Hortonworks Sandbox from Talend Studio appeared first on Hortonworks.