Configuring Sqoop 2
This section explains how to configure the Sqoop 2 server.
Configuring which Hadoop Version to Use
The Sqoop 2 client does not interact directly with Hadoop MapReduce, and so it does not require any MapReduce configuration.
- To use YARN:
alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.dist
- To use MRv1:
alternatives --set sqoop2-tomcat-conf /etc/sqoop2/tomcat-conf.mr1
In earlier releases, the mechanism for setting the MapReduce version was the CATALINA_BASEvariable in the /etc/defaults/sqoop2-server file. This does not work as of CDH 5 Beta 2, and in fact could cause problems. Check your /etc/defaults/sqoop2-server file and make sure CATALINA_BASE is not set.
Configuring Sqoop 2 to Use PostgreSQL instead of Apache Derby
Deciding which Database to Use
- Derby runs in embedded mode and it is not possible to monitor its health.
- Though it might be possible, Cloudera currently has no live backup strategy for the embedded Derby database.
- Under load, Cloudera has observed locks and rollbacks with the embedded Derby database that do not happen with server-based databases.
Cloudera currently has no recommended way to migrate data from an existing Derby database into the new PostgreSQL database.
Use the procedure that follows to configure Sqoop 2 to use PostgreSQL instead of Apache Derby.
- Install PostgreSQL 8.4.x or 9.0.x
- Create the Sqoop User and Sqoop Database
- Stop the Sqoop 2 Server
- Configure Sqoop 2 to use PostgreSQL
- Restart the Sqoop 2 Server
Install PostgreSQL 8.4.x or 9.0.x
Create the Sqoop User and Sqoop Database
For example, using the PostgreSQL psql command-line tool:
$ psql -U postgres Password for user postgres: ***** postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE; CREATE ROLE postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop ENCODING = 'UTF8' TABLESPACE = pg_default LC_COLLATE = 'en_US.UTF8' LC_CTYPE = 'en_US.UTF8' CONNECTION LIMIT = -1; CREATE DATABASE postgres=# \q
Stop the Sqoop 2 Server
$ sudo /sbin/service sqoop2-server stop
Configure Sqoop 2 to use PostgreSQL
Edit the sqoop.properties file (normally /etc/sqoop2/conf) as follows:
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler org.apache.sqoop.repository.jdbc.transaction.isolation=isolation level org.apache.sqoop.repository.jdbc.maximum.connections=max connections org.apache.sqoop.repository.jdbc.url=jdbc URL org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver org.apache.sqoop.repository.jdbc.user=username org.apache.sqoop.repository.jdbc.password=password org.apache.sqoop.repository.jdbc.properties.property=value
- Replace isolation level with a value such as READ_COMMITTED.
- Replace max connections with a value such as 10.
- Replace jdbc URL with the hostname on which you installed PostgreSQL.
- Replace username with (in this example) sqoop
- Replace password with (in this example) sqoop
- Use org.apache.sqoop.repository.jdbc.properties.property to set each additional property you want to configure; see https://jdbc.postgresql.org/documentation/head/connect.html for details. For example, replace property with loglevel and value with 3
Restart the Sqoop 2 Server
$ sudo /sbin/service sqoop2-server start
Installing the JDBC Drivers
The JDBC drivers need to be installed only on the machine running Sqoop. You do not need to install them on all hosts in your Hadoop cluster.
Installing the MySQL JDBC Driver
Download the MySQL JDBC driver here. You must sign up for an account if you do not already have one, then log in before you can download the driver. Copy it to the /var/lib/sqoop2/ directory. For example:
$ sudo cp mysql-connector-java-version/mysql-connector-java-version-bin.jar /var/lib/sqoop2/At the time of publication, version was 5.1.31, but the version might change by the time you read this.
Make sure you have at least version 5.1.31. Some systems ship with an earlier version that might not work correctly with Sqoop.
Installing the Oracle JDBC Driver
You can download the JDBC Driver from the Oracle website, for example here. You must accept the license agreement before you can download the driver. Download the ojdbc6.jar file and copy it to /var/lib/sqoop2/ directory:
$ sudo cp ojdbc6.jar /var/lib/sqoop2/
Installing the Microsoft SQL Server JDBC Driver
Download the Microsoft SQL Server JDBC driver here and copy it to the /var/lib/sqoop2/ directory. For example:
$ curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz $ sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/
Installing the PostgreSQL JDBC Driver
Download the PostgreSQL JDBC driver here and copy it to the /var/lib/sqoop2/ directory. For example:
$ curl -L 'http://jdbc.postgresql.org/download/postgresql-9.2-1002.jdbc4.jar' -o postgresql-9.2-1002.jdbc4.jar $ sudo cp postgresql-9.2-1002.jdbc4.jar /var/lib/sqoop2/
Syntax for Configuring JDBC Connection Strings
These are the JDBC connection strings for supported databases.
MySql Connection String
Syntax:
jdbc:mysql://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:mysql://my_mysql_server_hostname:3306/my_database_name
Oracle Connection String
Syntax:
jdbc:oracle:thin@<HOST>:<PORT>:<DATABASE_NAME>
Example:
jdbc:oracle:thin@my_oracle_server_hostname:1521:my_database_name
PostgreSQL Connection String
Syntax:
jdbc:postgresql://<HOST>:<PORT>/<DATABASE_NAME>
Example:
jdbc:postgresql://my_postgres_server_hostname:5432/my_database_name
Netezza Connection String
Syntax:
jdbc:netezza://<HOST>:<PORT>/<DATABASE_NAME>Example:
jdbc:netezza://my_netezza_server_hostname:5480/my_database_name
Teradata Connection String
Syntax:
jdbc:teradata://<HOST>/DBS_PORT=1025/DATABASE=<DATABASE_NAME>
jdbc:teradata://my_teradata_server_hostname/DBS_PORT=1025/DATABASE=my_database_name
<< Installing Sqoop 2 | ©2016 Cloudera, Inc. All rights reserved | Starting, Stopping, and Accessing the Sqoop 2 Server >> |
Terms and Conditions Privacy Policy |