SQOOP

Sqoop is used to transfer data from and to external sources to Hadoop.

Basic Syntax of sqoop job

In the above example we are scooping from mysql data base (Jdbc:mysql), the url is where the sql is installed (quickstart.cloudera), and we are using the default port (3306) to get connected to my sql.

/test indicates that we are connecting to database called test. user name and password are usual and the table that we are reading from is cities. >>
 * 1) ==To import a single Table==
 * sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities -m 1;
 * 1) ==To import more then One Table==
 * sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities,Students -m 1;
 * 1) ==To import all the table from a given Schema==
 * sqoop import-all-tables --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx ;
 * 1) ==Specifying Target Directory==
 * 2) sqoop import-all-tables --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx --target-dir /user/cloudera/;
 * 3) ==Specifying the Target with Warehouse==
 * 4) sqoop import-all-tables --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx --warehouse-dir /user/cloudera/;
 * 5) ==Difference between Target Directory and Warehouse Directory==
 * 6) When importing more then one table, its a good practice to use Warehouse directory as that would create a table specific directory in the destination
 * 7) ==Using file format other then CSV==
 * 8) Binary format
 * Sequencefile: is a binary format, used for PDF and images
 * sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --as-sequencefile -m 1;
 * avro file : is fast it contains the metadata of the file
 * sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --as-avrodatafile -m 1;
 * 1) ==Compressing Imported Data==
 * 2) sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --compress -m 1;
 * 3) ==Speeding up Transfer==
 * 4) sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --direct -m 1;
 * 5) ==Overriding type mapping==
 * 6) ==Controlling Parallelism==
 * 7) sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --num-mappers 10;
 * 8) ==Encoding Null Values==
 * 9) sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --null-string '\\N';//
 * 10) //sqoop import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --null-non-string '\\N';

How to Create a Sqoop Job sqoop job --create Cities -- import --connect jdbc:mysql://quickstart.cloudera:3306/test --username root --password xxxxx table cities --as-sequencefile -m 1;

NOTE: space between -- and Import