{"id":1364,"date":"2017-02-23T14:05:35","date_gmt":"2017-02-23T19:05:35","guid":{"rendered":"http:\/\/www.xavignu.com\/?p=1364"},"modified":"2017-02-23T14:05:35","modified_gmt":"2017-02-23T19:05:35","slug":"using-sqoop-to-import-a-db-table-into-hdfs","status":"publish","type":"post","link":"https:\/\/www.xavignu.com\/?p=1364","title":{"rendered":"Using sqoop to import a DB table into HDFS"},"content":{"rendered":"<p>In the world of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Big_data\" target=\"_blank\">Big Data<\/a>, to import data from a DB into HDFS you need <a href=\"http:\/\/sqoop.apache.org\/\" target=\"_blank\">Apache Sqoop<\/a>.<\/p>\n<pre id=\"terminal\">sqoop import --connect jdbc:mysql:\/\/localhost\/mysql  --username training  -P --warehouse-dir \/home\/training\/db --table user\r\n17\/02\/23 10:38:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0\r\nEnter password: \r\n17\/02\/23 10:38:24 INFO manager.SqlManager: Using default fetchSize of 1000\r\n17\/02\/23 10:38:24 INFO tool.CodeGenTool: Beginning code generation\r\n17\/02\/23 10:38:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1\r\n17\/02\/23 10:38:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1\r\n17\/02\/23 10:38:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is \/usr\/lib\/hadoop-0.20-mapreduce\r\nNote: \/tmp\/sqoop-training\/compile\/7f3a9709c50f58c2c6bb24de91922c6b\/user.java uses or overrides a deprecated API.\r\nNote: Recompile with -Xlint:deprecation for details.\r\n17\/02\/23 10:38:29 INFO orm.CompilationManager: Writing jar file: \/tmp\/sqoop-training\/compile\/7f3a9709c50f58c2c6bb24de91922c6b\/user.jar\r\n17\/02\/23 10:38:29 WARN manager.MySQLManager: It looks like you are importing from mysql.\r\n17\/02\/23 10:38:29 WARN manager.MySQLManager: This transfer can be faster! 
Use the --direct\r\n17\/02\/23 10:38:29 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.\r\n17\/02\/23 10:38:29 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)\r\n17\/02\/23 10:38:29 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.\r\n17\/02\/23 10:38:29 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.\r\n17\/02\/23 10:38:29 INFO mapreduce.ImportJobBase: Beginning import of user\r\n17\/02\/23 10:38:31 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.\r\n17\/02\/23 10:38:33 INFO db.DBInputFormat: Using read commited transaction isolation\r\n17\/02\/23 10:38:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`Host`), MAX(`Host`) FROM `user`\r\n17\/02\/23 10:38:33 WARN db.TextSplitter: Generating splits for a textual index column.\r\n17\/02\/23 10:38:33 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.\r\n17\/02\/23 10:38:33 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.\r\n17\/02\/23 10:38:33 INFO mapred.JobClient: Running job: job_201702221239_0003\r\n17\/02\/23 10:38:34 INFO mapred.JobClient:  map 0% reduce 0%\r\n17\/02\/23 10:38:55 INFO mapred.JobClient:  map 17% reduce 0%\r\n17\/02\/23 10:38:56 INFO mapred.JobClient:  map 33% reduce 0%\r\n17\/02\/23 10:39:08 INFO mapred.JobClient:  map 50% reduce 0%\r\n17\/02\/23 10:39:09 INFO mapred.JobClient:  map 67% reduce 0%\r\n17\/02\/23 10:39:21 INFO mapred.JobClient:  map 83% reduce 0%\r\n17\/02\/23 10:39:22 INFO mapred.JobClient:  map 100% reduce 0%\r\n17\/02\/23 10:39:26 INFO mapred.JobClient: Job complete: job_201702221239_0003\r\n17\/02\/23 10:39:26 INFO 
mapred.JobClient: Counters: 23\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:   File System Counters\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     FILE: Number of bytes read=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     FILE: Number of bytes written=1778658\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     FILE: Number of read operations=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     FILE: Number of large read operations=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     FILE: Number of write operations=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     HDFS: Number of bytes read=791\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     HDFS: Number of bytes written=818\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     HDFS: Number of read operations=6\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     HDFS: Number of large read operations=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     HDFS: Number of write operations=6\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:   Job Counters \r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Launched map tasks=6\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=89702\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:   Map-Reduce Framework\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Map input records=8\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Map output records=8\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Input split bytes=791\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Spilled Records=0\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     CPU time spent (ms)=5490\r\n17\/02\/23 10:39:26 INFO 
mapred.JobClient:     Physical memory (bytes) snapshot=666267648\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4423995392\r\n17\/02\/23 10:39:26 INFO mapred.JobClient:     Total committed heap usage (bytes)=191102976\r\n17\/02\/23 10:39:26 INFO mapreduce.ImportJobBase: Transferred 818 bytes in 56.7255 seconds (14.4203 bytes\/sec)\r\n17\/02\/23 10:39:26 INFO mapreduce.ImportJobBase: Retrieved 8 records.\r\n<\/pre>\n<p>The example above dumps the table <i>user<\/i> from the mysql DB into Hadoop.<br \/>\nFirst, connect to the DB using <a href=\"https:\/\/sqoop.apache.org\/docs\/1.4.2\/SqoopUserGuide.html#_connecting_to_a_database_server\" target=\"_blank\"><i>&#8211;connect<\/i><\/a>.<br \/>\n<a href=\"https:\/\/sqoop.apache.org\/docs\/1.4.2\/SqoopUserGuide.html#_syntax_2\" target=\"_blank\"><i>&#8211;username<\/i><\/a> is the authentication username, and <i>-P<\/i> prompts for the password. <i>&#8211;warehouse-dir<\/i> sets the HDFS parent directory for the table destination, and <i>&#8211;table<\/i> selects the table to import. 
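<br \/>\nThe warnings in the log above hint at two optional tweaks. A hypothetical, untested variant of the same job (assuming the same local MySQL instance) could add <i>--direct<\/i> to use the MySQL-specific fast path and <i>-m 1<\/i> to run a single mapper, which sidesteps the textual split-column warning:<br \/>\n<i>sqoop import --connect jdbc:mysql:\/\/localhost\/mysql --username training -P --direct -m 1 --warehouse-dir \/home\/training\/db --table user<\/i>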
<\/p>\n<p>The dumped content is shown below.<\/p>\n<pre id=\"terminal\">hdfs dfs -cat   \/home\/training\/db\/user\/part-m-0000*\r\n127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0\r\nlocalhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0\r\nlocalhost.localdomain,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0\r\nlocalhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0\r\nlocalhost.localdomain,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0\r\nlocalhost,training,*27CF0BD18BDADD517165824F8C1FFF667B47D04B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0\r\nlocalhost,hiveuser,*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0\r\nlocalhost,hue,*15221DE9A04689C4D312DEAC3B87DDF542AF439E,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In the world of Big Data, to import data from a DB into HDFS you need Apache Sqoop. 
sqoop import &#8211;connect jdbc:mysql:\/\/localhost\/mysql &#8211;username training -P &#8211;warehouse-dir \/home\/training\/db &#8211;table user 17\/02\/23 10:38:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0 Enter password: 17\/02\/23 10:38:24 INFO manager.SqlManager: Using default fetchSize of 1000 17\/02\/23 10:38:24 INFO tool.CodeGenTool: Beginning code generation [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[74,3],"tags":[20,23,79],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_shortlink":"https:\/\/wp.me\/pTQgt-m0","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1364"}],"collection":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1364"}],"version-history":[{"count":5,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1364\/revisions"}],"predecessor-version":[{"id":1369,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1364\/revisions\/1369"}],"wp:attachment":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1364"}
],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}