Sqoop is a tool used to transfer data between HDFS and any relational database that has a JDBC driver.
Stambia provides tools for both the Sqoop and Sqoop2 versions, which differ significantly in usage.
This article explains the basics to start using Sqoop in Stambia.
Prerequisites: You must install the Hadoop connector to be able to work with Sqoop.
Please refer to the following article, which will guide you through this.
Important:
Sqoop uses JDBC drivers to transfer data.
Therefore, the JDBC driver corresponding to the technology you want to import data from or export data to must be installed on the Sqoop server.
Otherwise, Sqoop will not be able to transfer any data.
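For example, making a driver available to Sqoop1 typically means dropping the jar into Sqoop's lib directory on the server. The paths and driver version below are illustrative assumptions; adjust them to your installation:

```shell
# Hypothetical paths and driver version - adapt to your environment.
# Copy the JDBC driver jar into Sqoop's lib directory so that the
# sqoop-import / sqoop-export utilities can load it at runtime.
cp mysql-connector-java-5.1.49.jar /usr/lib/sqoop/lib/
```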
Metadata
The first step, when you want to work with Sqoop in Stambia DI, consists of creating and configuring the Sqoop Metadata.
Here is an example of a common Metadata configuration:
Metadata creation
First, create the Sqoop Metadata, as usual, by selecting the technology in the Metadata Creation Wizard:
Click Next, choose a name, and click Finish.
Configuration of the server properties
Now that the Metadata is created, you can configure the Sqoop server properties according to your requirements and environment:
The available properties are different depending on the Sqoop Version selected.
Here are the common Properties:
Property | Description |
Name | Logical label (alias) for the server |
Version | Sqoop version that should be used |
Here are the Sqoop1 Properties:
Property | Description | Examples |
Default API | The API to use by default with the Sqoop1 version | CommandLine |
Sqoop Home | Directory where the sqoop commands are installed. This must be set to the parent of the 'bin' folder. | If the sqoop-import / sqoop-export utilities are under /usr/bin/, the value should be /usr/ |
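To find the right value for Sqoop Home, you can locate the Sqoop1 utilities on the server and take the parent of their 'bin' folder. The path shown is an illustrative assumption:

```shell
# Locate the Sqoop1 command-line utilities on the server.
which sqoop-import
# If this prints e.g. /usr/bin/sqoop-import, the 'Sqoop Home' property
# must be set to the parent of the 'bin' folder, i.e. /usr/
```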
If you are using the CommandLine over SSH API, you must drag and drop an SSH Metadata Link containing the SSH connection information into the Sqoop Metadata.
Rename it to 'SSH'.
Here are the Sqoop2 Properties:
Property | Description | Examples |
Default API | The API to use by default with the Sqoop2 version | REST |
URL | Sqoop2 REST API base URL | http://<hostname>:12000/sqoop |
Hadoop Configuration Directory | Path to a directory on the remote server containing Hadoop configuration files, such as core-site.xml, hdfs-site.xml, ... Required when creating a Sqoop2 HDFS Link | /home/cloudera/stambia/conf |
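A quick way to validate the REST base URL before configuring it in the Metadata is to query the Sqoop2 server's version resource. The hostname below is an illustrative assumption:

```shell
# Hypothetical hostname - replace with your Sqoop2 server.
# The /version resource of the Sqoop2 REST API returns the server and
# supported API versions as JSON if the base URL is correct.
curl http://sqoop2-host.example.com:12000/sqoop/version
```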
Configuration of the Kerberos Security
When working with Kerberos-secured Hadoop clusters, connections are protected, so you need to specify in Stambia the credentials and information required to perform the Kerberos connection.
A Kerberos Metadata is available to specify everything required.
- Create a new Kerberos Metadata (or use an existing one)
- Define inside it the Kerberos principal to use for Sqoop
- Drag and drop it in the Sqoop Metadata
- Rename the Metadata Link to 'KERBEROS'
Note: Kerberos is only supported for the Sqoop1 version.
Refer to this dedicated article for further information about the Kerberos Metadata configuration.
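Outside of Stambia, you can sanity-check on the Sqoop server that the principal you defined in the Kerberos Metadata is able to obtain a ticket. The keytab path and principal name are illustrative assumptions:

```shell
# Hypothetical keytab path and principal - adapt to your cluster.
# Obtain a Kerberos ticket for the principal defined in the Metadata:
kinit -kt /etc/security/keytabs/sqoop.keytab sqoop/host.example.com@EXAMPLE.COM
# List the ticket cache; the principal should appear with a valid expiry:
klist
```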
Using Sqoop in Stambia
Sqoop1
Stambia provides two Process tools to work with Sqoop1:
Tool | Description |
TOOL Sqoop Export | Export data from HDFS to any database having a JDBC driver. |
TOOL Sqoop Import | Import data from any database having a JDBC driver to HDFS. |
To use a tool:
- Drag and drop it in a Process
- Drag and drop the HDFS Folder Metadata Link from which you want to import or export data on it
- Drag and drop a Sqoop Metadata Link on it
- Drag and drop the Database Table Metadata Link from which you want to import or export data on it
- Execute the Process
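Under the CommandLine API, the steps above result in Sqoop being invoked on the server. For reference, a roughly equivalent hand-written Sqoop1 import looks like the sketch below; the connection URL, credentials, table, and target directory are illustrative assumptions:

```shell
# Illustrative Sqoop1 import: copy a database table into HDFS.
# Connection details, table name, and target path are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username app_user -P \
  --table customers \
  --target-dir /user/hive/staging/customers \
  --num-mappers 4
```

The TOOL Sqoop Export works the same way in the opposite direction, reading from an HDFS directory and writing into the target table.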
Note: For further information, please consult the tool's Process and parameters description.
Sqoop2
Sqoop2 has the same goal as Sqoop1 but is completely different in usage.
Please refer to the Sqoop2 documentation to understand its concepts and usage.
Stambia provides all the necessary tools to create and manage Sqoop2 Links and Jobs.
Tool | Description |
TOOL Sqoop2 Describe Connectors | Retrieve information about the Sqoop driver and the available connectors. |
TOOL Sqoop2 Create Link | Create a Sqoop2 Link. |
TOOL Sqoop2 Monitor Link | Monitor Sqoop2 Links (enable, disable, delete). |
TOOL Sqoop2 Create Job | Create a Sqoop2 Job. |
TOOL Sqoop2 Set Job FROM HDFS | Generate the HDFS From part of a Job. |
TOOL Sqoop2 Set Job FROM JDBC | Generate the JDBC From part of a Job. |
TOOL Sqoop2 Set Job TO HDFS | Generate the HDFS To part of a Job. |
TOOL Sqoop2 Set Job TO JDBC | Generate the JDBC To part of a Job. |
TOOL Sqoop2 Monitor Job | Monitor Sqoop2 Jobs (enable, disable, delete, start, stop, status). |
To use a tool:
- Drag and drop it in a Process
- Set the properties to your needs
Note: For further information, please consult the tool's Process and parameters description.
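For comparison, the Link/Job workflow that these tools automate corresponds to the following session in the interactive sqoop2-shell. Connector, link, and job names are illustrative assumptions, and the exact command syntax varies between Sqoop2 releases:

```shell
# Sketch of the Sqoop2 Link/Job lifecycle in the interactive shell.
# Names are placeholders; check your Sqoop2 release for exact syntax.
sqoop2-shell
# sqoop:000> show connector
# sqoop:000> create link --connector generic-jdbc-connector
# sqoop:000> create link --connector hdfs-connector
# sqoop:000> create job --from jdbc-link --to hdfs-link
# sqoop:000> start job --name customers-to-hdfs
```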
Demonstration Project
The Hadoop demonstration project, available on the download page, contains examples for Sqoop.
Do not hesitate to have a look at this project for samples and examples of how to use Sqoop in Stambia.