Monday, February 15, 2016

Sqoop Example - Session 2

Create the hadoopguide database in MySQL

mysql> CREATE DATABASE hadoopguide;
mysql> GRANT ALL PRIVILEGES ON hadoopguide.* TO '%'@'localhost';
mysql> GRANT ALL PRIVILEGES ON hadoopguide.* TO ''@'localhost';

% mysql hadoopguide

mysql> CREATE TABLE widgets(id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
                widget_name VARCHAR(64) NOT NULL,
                price DECIMAL(10,2),
                design_date DATE,
                version INT,
                design_comment VARCHAR(100));
mysql> INSERT INTO widgets VALUES (NULL, 'sprocket', 0.25, '2010-02-10', 1, 'Connects two gizmos');
mysql> INSERT INTO widgets VALUES (NULL, 'gizmo', 4.00, '2009-11-30', 4, NULL);
mysql> INSERT INTO widgets VALUES (NULL, 'gadget', 99.99, '1983-08-13', 13, 'Our flagship product');
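
As an optional sanity check before quitting, we can confirm the three rows we just inserted:

mysql> SELECT * FROM widgets;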

mysql> quit;

Let's use Sqoop to import this table into HDFS.

% sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets -m 1

The Sqoop import tool runs a MapReduce job that connects to the MySQL database and reads the table. By default, it runs four map tasks in parallel, each writing its own output file to the same directory. Since we know we are importing only three rows, we specify a single mapper with -m 1 and get a single file in HDFS.
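
For comparison, rerunning the import without -m would leave four files (part-m-00000 through part-m-00003) in the widgets directory, with the rows split across them on the primary key id. This is a sketch assuming the defaults; note the previous output directory has to be removed first:

% hadoop fs -rm -r widgets
% sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
% hadoop fs -ls widgets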

% hadoop fs -cat widgets/part-m-00000
1,sprocket,0.25,2010-02-10,1,Connects two gizmos
2,gizmo,4.00,2009-11-30,4,null
3,gadget,99.99,1983-08-13,13,Our flagship product

The connect string jdbc:mysql://localhost/hadoopguide reads from a database on the local machine. Do not use localhost when running against a distributed Hadoop cluster: the map tasks run on cluster nodes, which cannot reach the database through their own localhost, so specify the full hostname of the database server instead.
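
For example, with a hypothetical database host db1.example.com in place of localhost:

% sqoop import --connect jdbc:mysql://db1.example.com/hadoopguide --table widgets -m 1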

Generate code without an import

% sqoop codegen --connect jdbc:mysql://localhost/hadoopguide --table widgets --class-name Widget


The codegen tool generates code without performing a full import; here it produces a Widget.java class. If we are working with records imported to SequenceFiles, we have to use the generated code to deserialize the data from SequenceFile storage. We can work with text-based records without the generated code, but it can still handle some of the tedious aspects of record parsing for us.
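A minimal sketch of how the generated class might be used (it assumes the table was re-imported with --as-sequencefile, that Widget.java from the codegen command above is compiled onto the classpath, and that getters follow Sqoop's get_<column>() naming convention; this is not the exact code Sqoop generates):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

public class ReadWidgets {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("widgets/part-m-00000");
        SequenceFile.Reader reader =
            new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
        LongWritable key = new LongWritable();   // Sqoop writes LongWritable keys
        Widget value = new Widget();             // class generated by codegen
        while (reader.next(key, value)) {
            // Getters follow Sqoop's get_<column>() naming convention
            System.out.println(value.get_widget_name() + "\t" + value.get_price());
        }
        reader.close();
    }
}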
