Cloudera Enterprise 5.15.x

Sqoop 1 Installation

Apache Sqoop 1 is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop 1 to import data from external structured datastores into the Hadoop Distributed File System (HDFS) or related systems such as Hive and HBase. Conversely, you can use Sqoop 1 to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

  Note:

To see which version of Sqoop 1 is shipping in CDH 5, check the CDH Version and Packaging Information. For important information on new and changed components, see the CDH 5 Release Notes.

Upgrading Sqoop 1 from an Earlier CDH 5 release

These instructions assume that you are upgrading Sqoop 1 as part of an upgrade to the latest CDH 5 release, and have already performed the steps under Upgrading from an Earlier CDH 5 Release to the Latest Release.

To upgrade Sqoop 1 from an earlier CDH 5 release, install the new version of Sqoop 1 using one of the methods described below: Installing the Sqoop 1 RPM or Debian Packages or Installing the Sqoop 1 Tarball.

  Important: Configuration files
  • If you install a newer version of a package that is already on the system, configuration files that you have modified will remain intact.
  • If you uninstall a package, the package manager renames any configuration files you have modified from <file> to <file>.rpmsave. If you then re-install the package (probably to install a new version) the package manager creates a new <file> with applicable defaults. You are responsible for applying any changes captured in the original configuration file to the new configuration file. In the case of Ubuntu and Debian upgrades, you will be prompted if you have made changes to a file for which there is a new version. For details, see Automatic handling of configuration files by dpkg.

Sqoop 1 Packaging

The packaging options for installing Sqoop 1 are:

  • RPM packages
  • Tarball
  • Debian packages

Sqoop 1 Prerequisites

Sqoop 1 requires the following:

  • An operating system supported by CDH 5.
  • Oracle JDK.
  • Services that you want to use with Sqoop, such as HBase, Hive HCatalog, and Accumulo. When you run Sqoop, it checks whether these services are installed and configured, and logs a warning for each service it does not find. These warnings, shown below, are harmless if you do not use the corresponding service. You can suppress them by setting the variables $HBASE_HOME, $HCAT_HOME, and $ACCUMULO_HOME to any existing directory.
    > Warning: /usr/lib/sqoop/../hbase does not exist! HBase imports will fail.
    > Please set $HBASE_HOME to the root of your HBase installation.
    > Warning: /usr/lib/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.
    > Please set $HCAT_HOME to the root of your HCatalog installation.
    > Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    > Please set $ACCUMULO_HOME to the root of your Accumulo installation. 
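
As a minimal sketch of the suppression described above, the following sets each variable to a placeholder directory only when it is not already set. The name placeholder_dir and the choice of the temporary directory are assumptions made for illustration; the text above requires only that the directory exists. Do this only for services you do not use with Sqoop, because a placeholder value hides the warning without making the service available.

```shell
# Assumption: placeholder_dir is a hypothetical name; any existing directory works.
placeholder_dir="${TMPDIR:-/tmp}"

# Assign the placeholder only when the variable is unset or empty, so a real
# installation path that is already configured is left untouched.
: "${HBASE_HOME:=$placeholder_dir}"
: "${HCAT_HOME:=$placeholder_dir}"
: "${ACCUMULO_HOME:=$placeholder_dir}"

# Export the variables so that the sqoop launcher script can see them.
export HBASE_HOME HCAT_HOME ACCUMULO_HOME
```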

Installing the Sqoop 1 RPM or Debian Packages

Installing the Sqoop 1 RPM or Debian packages is more convenient than installing the Sqoop 1 tarball because the packages:

  • Handle dependencies
  • Provide for easy upgrades
  • Automatically install resources to conventional locations

The Sqoop 1 packages consist of:

  • sqoop — Complete Sqoop 1 distribution
  • sqoop-metastore — For installation of the Sqoop 1 metastore only
  Note: Install Cloudera Repository
Before using the instructions on this page to install or upgrade:
  • Install the Cloudera yum, zypper/YaST or apt repository.
  • Install or upgrade CDH 5 and make sure it is functioning correctly.
For instructions, see Installing the Latest CDH 5 Release and Upgrading Unmanaged CDH Using the Command Line.

To install Sqoop 1 on a RHEL-compatible system:

$ sudo yum install sqoop

To install Sqoop 1 on an Ubuntu or other Debian system:

$ sudo apt-get install sqoop

To install Sqoop 1 on a SLES system:

$ sudo zypper install sqoop

If you have already configured CDH on your system, there is no further configuration necessary for Sqoop 1. You can start using Sqoop 1 by using commands such as:

$ sqoop help
$ sqoop version
$ sqoop import

Installing the Sqoop 1 Tarball

The Sqoop 1 tarball is a self-contained package containing everything necessary to use Sqoop 1 with YARN on a Unix-like system.

  Important:

Make sure you have read and understood the section on tarballs before you proceed with a tarball installation.

To install Sqoop 1 from the tarball, unpack the tarball in a convenient location. Once it is unpacked, add the bin directory to the shell path for easy access to Sqoop 1 commands. Documentation for users and developers can be found in the docs directory.

To install the Sqoop 1 tarball on Linux-based systems:

Run the following command:

$ (cd /usr/local/ && sudo tar -zxvf <path_to_sqoop.tar.gz>)
  Note:

When installing Sqoop 1 from the tarball package, you must make sure that the environment variables JAVA_HOME and HADOOP_MAPRED_HOME are configured correctly. The variable HADOOP_MAPRED_HOME should point to the root directory of the Hadoop installation. If you intend to use any Hive or HBase functionality, you must also make sure that Hive or HBase is installed, and configure the variables HIVE_HOME and HBASE_HOME to point to the root directories of their respective installations.
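
One way to check the variables named in the note above before running Sqoop is sketched here. The function name check_env_dir is hypothetical and is not part of Sqoop; it reports whether a named variable is set and points to an existing directory.

```shell
# check_env_dir NAME
# Prints NAME=value and returns 0 when the variable NAME is set and points to
# an existing directory; otherwise prints a message to stderr and returns 1.
# (Hypothetical helper for illustration; not shipped with Sqoop.)
check_env_dir() {
  var_name="$1"
  # Read the value of the variable whose name is stored in var_name.
  eval "var_value=\${$var_name:-}"
  if [ -z "$var_value" ]; then
    echo "$var_name is not set" >&2
    return 1
  elif [ ! -d "$var_value" ]; then
    echo "$var_name points to a missing directory: $var_value" >&2
    return 1
  fi
  echo "$var_name=$var_value"
}

# Required for a tarball installation. Add HIVE_HOME and HBASE_HOME to the
# list if you use Hive or HBase functionality.
for required in JAVA_HOME HADOOP_MAPRED_HOME; do
  check_env_dir "$required" || echo "Fix $required before running Sqoop" >&2
done
```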

Installing the JDBC Drivers for Sqoop 1

Sqoop 1 does not ship with third-party JDBC drivers. You must download them separately and save them to the /var/lib/sqoop/ directory on the server. The following sections show how to install the most common JDBC drivers.
  Note:
  • The JDBC drivers need to be installed only on the machine where Sqoop runs; you do not need to install them on all hosts in your Hadoop cluster.

  • Kerberos authentication is not supported by the Sqoop Connector for Teradata.

Before you begin:

Make sure the /var/lib/sqoop directory exists and has the correct ownership and permissions:

$ sudo mkdir -p /var/lib/sqoop
$ sudo chown sqoop:sqoop /var/lib/sqoop
$ sudo chmod 755 /var/lib/sqoop

This sets permissions to drwxr-xr-x.

For JDBC drivers for Hive, Impala, Teradata, or Netezza, see the Connectors documentation.

Installing the MySQL JDBC Driver

Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html. You will need to sign up for an account if you do not already have one, and log in, before you can download it. Then copy it to the /var/lib/sqoop/ directory. For example:

$ sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop/
  Note:
At the time of publication, version was 5.1.31, but the version may have changed by the time you read this.
  Important:

Make sure you have at least version 5.1.31. Some systems ship with an earlier version that may not work correctly with Sqoop.

Installing the Oracle JDBC Driver

You can download the JDBC Driver from the Oracle website, for example http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to the /var/lib/sqoop/ directory:

$ sudo cp ojdbc6.jar /var/lib/sqoop/

Installing the Microsoft SQL Server JDBC Driver

Download the Microsoft SQL Server JDBC driver from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774 and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
$ sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/

Installing the PostgreSQL JDBC Driver

Download the PostgreSQL JDBC driver from http://jdbc.postgresql.org/download.html and copy it to the /var/lib/sqoop/ directory. For example:

$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar
$ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop/
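
After copying a driver, it can help to confirm that the JAR file landed in the expected directory. The following is a small sketch; the function name list_jdbc_jars is hypothetical and is not part of Sqoop.

```shell
# list_jdbc_jars [DIR]
# Prints every .jar file in DIR (default: /var/lib/sqoop) and returns 0 if at
# least one was found, 1 otherwise.
# (Hypothetical helper for illustration; not shipped with Sqoop.)
list_jdbc_jars() {
  driver_dir="${1:-/var/lib/sqoop}"
  result=1
  for jar in "$driver_dir"/*.jar; do
    # When nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$jar" ] || continue
    echo "$jar"
    result=0
  done
  return "$result"
}

list_jdbc_jars /var/lib/sqoop || echo "No JDBC driver JARs found in /var/lib/sqoop" >&2
```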

Syntax for Configuring JDBC Connection Strings

These are the JDBC connection strings for supported databases.

MySQL Connection String

Syntax:

jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:mysql://my_mysql_server_hostname:3306/my_database_name

Oracle Connection String

Syntax:

jdbc:oracle:thin:@<HOST>:<PORT>:<DATABASE_NAME>

Example:

jdbc:oracle:thin:@my_oracle_server_hostname:1521:my_database_name

PostgreSQL Connection String

Syntax:

jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name

Netezza Connection String

Syntax:

jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>

Example:

jdbc:netezza://my_netezza_server_hostname:5480/my_database_name

Teradata Connection String

  Note: Kerberos authentication is not supported by the Sqoop Connector for Teradata.

Syntax:

jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>

Example:

jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name
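
The connection string patterns in this section can be collected into a small shell helper, sketched below. The function name jdbc_url is hypothetical and is not part of Sqoop. The host, database, user, and table names in the commented usage are placeholders.

```shell
# jdbc_url TYPE HOST PORT DATABASE
# Prints a JDBC connection string for one of the database types in this section.
# (Hypothetical helper for illustration; not shipped with Sqoop.)
jdbc_url() {
  db_type="$1"; host="$2"; port="$3"; database="$4"
  case "$db_type" in
    mysql)      echo "jdbc:mysql://${host}:${port}/${database}" ;;
    # Oracle thin driver URLs take a colon before the @ sign.
    oracle)     echo "jdbc:oracle:thin:@${host}:${port}:${database}" ;;
    postgresql) echo "jdbc:postgresql://${host}:${port}/${database}" ;;
    netezza)    echo "jdbc:netezza://${host}:${port}/${database}" ;;
    teradata)   echo "jdbc:teradata://${host}/DBS_PORT=${port}/DATABASE=${database}" ;;
    *)          echo "unsupported database type: $db_type" >&2; return 1 ;;
  esac
}

# Print the MySQL example from this section.
jdbc_url mysql my_mysql_server_hostname 3306 my_database_name

# Example use with Sqoop (not run here; my_user and my_table are placeholders):
#   sqoop import \
#     --connect "$(jdbc_url mysql my_mysql_server_hostname 3306 my_database_name)" \
#     --username my_user -P \
#     --table my_table \
#     --target-dir /user/my_user/my_table
```

The -P option makes Sqoop prompt for the password, which keeps it out of the shell history.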

Setting HADOOP_MAPRED_HOME for Sqoop 1

  • For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop 1 in a YARN installation, make sure that the HADOOP_MAPRED_HOME environment variable is set correctly, as follows:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  • For each user who will be submitting MapReduce jobs using MapReduce v1 (MRv1), or running Pig, Hive, or Sqoop 1 in an MRv1 installation, set the HADOOP_MAPRED_HOME environment variable as follows:
    $ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce
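
The two settings above could be wrapped in a small function, for example in a user's shell profile, so that the choice between YARN and MRv1 is explicit. This is only a sketch; the function name set_mapred_home is hypothetical, and the paths are the ones given above.

```shell
# set_mapred_home yarn|mrv1
# Exports HADOOP_MAPRED_HOME for the chosen MapReduce framework.
# (Hypothetical helper for illustration; not shipped with CDH.)
set_mapred_home() {
  case "$1" in
    yarn) export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce ;;
    mrv1) export HADOOP_MAPRED_HOME=/usr/lib/hadoop-0.20-mapreduce ;;
    *)    echo "usage: set_mapred_home yarn|mrv1" >&2; return 1 ;;
  esac
}

# Select the YARN (MapReduce v2) location and show the result.
set_mapred_home yarn
echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"
```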

Viewing the Sqoop 1 Documentation

For additional documentation see the Sqoop user guides.

Page generated May 18, 2018.