{"id":1371,"date":"2017-02-27T16:02:57","date_gmt":"2017-02-27T21:02:57","guid":{"rendered":"http:\/\/www.xavignu.com\/?p=1371"},"modified":"2017-02-28T11:18:21","modified_gmt":"2017-02-28T16:18:21","slug":"export-data-from-hdfs-to-mysql","status":"publish","type":"post","link":"https:\/\/www.xavignu.com\/?p=1371","title":{"rendered":"Export data from HDFS to MySQL"},"content":{"rendered":"<p>First create the DB and table where you want to populate.<\/p>\n<pre id=\"terminal\">echo \"create database staff2; use staff2; CREATE TABLE editorial (id INT(100) unsigned not null AUTO_INCREMENT, name VARCHAR(20), email VARCHAR(20), primary key (id));\" | mysql -u root -p\r\n<\/pre>\n<p>Once done, we have the data we want to copy in <a href=\"https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/HdfsUserGuide.html\" target=\"_blank\">HDFS<\/a>.<\/p>\n<pre id=\"terminal\">hdfs dfs -cat \/home\/training\/staff\/editorial\/part-m-*\r\n1,Peter,peter@example.com\r\n2,Jack,jack@example.com\r\n<\/pre>\n<p>Now dump into <a href=\"https:\/\/www.mysql.com\/\" target=\"_blank\">MySQL<\/a> using <a href=\"http:\/\/sqoop.apache.org\/\" target=\"_blank\">sqoop<\/a>.<\/p>\n<pre id=\"terminal\">sqoop export --connect jdbc:mysql:\/\/localhost\/staff2 --username root -P --table editorial --export-dir \/home\/training\/staff\/editorial\r\n17\/02\/27 12:51:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0\r\nEnter password: \r\n17\/02\/27 12:51:58 INFO manager.SqlManager: Using default fetchSize of 1000\r\n17\/02\/27 12:51:58 INFO tool.CodeGenTool: Beginning code generation\r\n17\/02\/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1\r\n17\/02\/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1\r\n17\/02\/27 12:51:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is \/usr\/lib\/hadoop-0.20-mapreduce\r\nNote: \/tmp\/sqoop-training\/compile\/e560499b42a9738bbc5ef127712adc7b\/editorial.java uses or overrides a deprecated API.\r\nNote: Recompile with -Xlint:deprecation for details.\r\n17\/02\/27 12:52:03 INFO orm.CompilationManager: Writing jar file: \/tmp\/sqoop-training\/compile\/e560499b42a9738bbc5ef127712adc7b\/editorial.jar\r\n17\/02\/27 12:52:03 INFO mapreduce.ExportJobBase: Beginning export of editorial\r\n17\/02\/27 12:52:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.\r\n17\/02\/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2\r\n17\/02\/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2\r\n17\/02\/27 12:52:09 INFO mapred.JobClient: Running job: job_201702221239_0006\r\n17\/02\/27 12:52:10 INFO mapred.JobClient:  map 0% reduce 0%\r\n17\/02\/27 12:52:31 INFO mapred.JobClient:  map 50% reduce 0%\r\n17\/02\/27 12:52:45 INFO mapred.JobClient:  map 100% reduce 0%\r\n17\/02\/27 12:52:49 INFO mapred.JobClient: Job complete: job_201702221239_0006\r\n17\/02\/27 12:52:49 INFO mapred.JobClient: Counters: 24\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:   File System Counters\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes read=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes written=1176756\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     FILE: Number of read operations=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     FILE: Number of large read operations=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     FILE: Number of write operations=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes read=759\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes written=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of read operations=19\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of large read operations=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of write operations=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:   Job Counters \r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Launched map tasks=4\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Data-local map tasks=4\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=64216\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:   Map-Reduce Framework\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Map input records=2\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Map output records=2\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Input split bytes=661\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Spilled Records=0\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     CPU time spent (ms)=3390\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=422584320\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2940895232\r\n17\/02\/27 12:52:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=127401984\r\n17\/02\/27 12:52:49 INFO mapreduce.ExportJobBase: Transferred 759 bytes in 42.9426 seconds (17.6748 bytes\/sec)\r\n17\/02\/27 12:52:49 INFO mapreduce.ExportJobBase: Exported 2 records.\r\n<\/pre>\n<p>Now we can see the content in MySQL DB named staff2.<\/p>\n<pre id=\"terminal\">echo \"use staff2; SELECT * FROM editorial;\" | mysql -u root -p \r\nEnter password: \r\nid\tname\temail\r\n1\tPeter\tpeter@example.com\r\n2\tJack\tjack@example.com\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>First create the DB and table where you want to populate. echo &#8220;create database staff2; use staff2; CREATE TABLE editorial (id INT(100) unsigned not null AUTO_INCREMENT, name VARCHAR(20), email VARCHAR(20), primary key (id));&#8221; | mysql -u root -p Once done, we have the data we want to copy in HDFS. hdfs dfs -cat \/home\/training\/staff\/editorial\/part-m-* 1,Peter,peter@example.com [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[74],"tags":[20,22,6,51,23,70],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_shortlink":"https:\/\/wp.me\/pTQgt-m7","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1371"}],"collection":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1371"}],"version-history":[{"count":12,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1371\/revisions"}],"predecessor-version":[{"id":1383,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1371\/revisions\/1383"}],"wp:attachment":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1371"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1371"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}