
Create a new table with Apache Hive

We are going to use Apache Hive to create a new table from an existing one, populate it, and then perform a UNION ALL of both tables. Below is the script that creates the new table.

-- Below script creates a new table
USE testdb;
-- show current tables
SHOW tables;
-- describe mytable2, table we will use to create mytable4
DESCRIBE mytable2;
-- create new table copying format from mytable2
CREATE TABLE mytable4 LIKE mytable2;

SHOW tables;
-- describe newly created table
DESCRIBE mytable4;
-- select content from newly created table
SELECT * FROM mytable4;

We execute the script with the hive CLI from a Linux shell.

hive -f create-new-table.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.898 seconds
OK
mytable
mytable2
newtable3
Time taken: 0.206 seconds, Fetched: 3 row(s)
OK
id                  	int                 	                    
lname               	string              	                    
fname               	string              	                    
Time taken: 0.263 seconds, Fetched: 3 row(s)
OK
Time taken: 0.272 seconds
OK
mytable
mytable2
mytable4
newtable3
Time taken: 0.043 seconds, Fetched: 4 row(s)
OK
id                  	int                 	                    
lname               	string              	                    
fname               	string              	                    
Time taken: 0.166 seconds, Fetched: 3 row(s)
OK
Time taken: 0.666 seconds
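
The post then populates the new table and performs a UNION ALL of both tables. A rough sketch of those steps might look like the following (column names are taken from the DESCRIBE output above; the exact statements used in the post may differ):

-- populate mytable4, here simply by copying the rows of mytable2
USE testdb;
INSERT INTO TABLE mytable4
SELECT id, lname, fname FROM mytable2;

-- UNION ALL of both tables; older Hive versions require the union
-- to be wrapped in a subquery in the FROM clause
SELECT u.id, u.lname, u.fname
FROM (
  SELECT id, lname, fname FROM mytable2
  UNION ALL
  SELECT id, lname, fname FROM mytable4
) u;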


Listing namenodes and datanodes in Hadoop

Ever wondered how to list Hadoop namenodes? It is quite easy, as seen below.

hdfs getconf -namenodes
hadoop02.mydomain.com hadoop01.mydomain.com
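
Other getconf options work the same way; for instance, assuming a secondary namenode is configured, it can be listed with:

hdfs getconf -secondaryNameNodes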

If you want to list the datanodes, you can do that with dfsadmin.

hdfs dfsadmin  -printTopology
Rack: /default
   192.168.0.15:50010 (hadoop15.mydomain.com)
   192.168.0.16:50010 (hadoop16.mydomain.com)
   192.168.0.17:50010 (hadoop17.mydomain.com)
   192.168.0.18:50010 (hadoop18.mydomain.com)
   192.168.0.19:50010 (hadoop19.mydomain.com)
   192.168.0.20:50010 (hadoop20.mydomain.com)
   192.168.0.21:50010 (hadoop21.mydomain.com)
   192.168.0.22:50010 (hadoop22.mydomain.com)
   192.168.0.23:50010 (hadoop23.mydomain.com)
   192.168.0.24:50010 (hadoop24.mydomain.com)

The above command should be executed as a user with HDFS superuser permissions.
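
On most distributions the hdfs service account is the HDFS superuser, so a typical invocation (assuming sudo access to that account) looks like this:

sudo -u hdfs hdfs dfsadmin -printTopology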

Apache Flume to write web server logs to Hadoop

In this post we will use Flume to dump Apache web server logs into HDFS. We already have a web server running and Flume installed, but we still need to configure a target (collector) agent and a source agent.

We use the following configuration file for the target agent.

## TARGET AGENT ##  
## configuration file location:  /etc/flume-ng/conf
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-trg-agent.conf -n collector

#http://flume.apache.org/FlumeUserGuide.html#avro-source
collector.sources = AvroIn  
collector.sources.AvroIn.type = avro  
collector.sources.AvroIn.bind = 0.0.0.0  
collector.sources.AvroIn.port = 4545  
collector.sources.AvroIn.channels = mc1 mc2

## Channels ##
## Source writes to 2 channels, one for each sink
collector.channels = mc1 mc2

#http://flume.apache.org/FlumeUserGuide.html#memory-channel

collector.channels.mc1.type = memory  
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory  
collector.channels.mc2.capacity = 100

## Sinks ##
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem 
#http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
collector.sinks.LocalOut.type = file_roll  
collector.sinks.LocalOut.sink.directory = /var/log/flume-ng  
collector.sinks.LocalOut.sink.rollInterval = 0  
collector.sinks.LocalOut.channel = mc1

## Write to HDFS
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs  
collector.sinks.HadoopOut.channel = mc2  
collector.sinks.HadoopOut.hdfs.path = /user/training/flume/events/%{log_type}/%y%m%d  
collector.sinks.HadoopOut.hdfs.fileType = DataStream  
collector.sinks.HadoopOut.hdfs.writeFormat = Text  
collector.sinks.HadoopOut.hdfs.rollSize = 0  
collector.sinks.HadoopOut.hdfs.rollCount = 10000  
collector.sinks.HadoopOut.hdfs.rollInterval = 600
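
The source agent is configured separately. As a rough sketch only (the agent name, log path and collector hostname below are placeholders, not the values from the original setup), it tails the Apache access log, tags the events with the log_type header referenced by the HDFS path above, and forwards them to the collector's Avro source on port 4545:

## SOURCE AGENT ## (illustrative sketch)
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-src-agent.conf -n source_agent

#http://flume.apache.org/FlumeUserGuide.html#exec-source
source_agent.sources = apache_log
source_agent.sources.apache_log.type = exec
source_agent.sources.apache_log.command = tail -F /var/log/httpd/access_log
source_agent.sources.apache_log.channels = mc1

# static interceptor adds the log_type header used in the collector's HDFS path
source_agent.sources.apache_log.interceptors = itype
source_agent.sources.apache_log.interceptors.itype.type = static
source_agent.sources.apache_log.interceptors.itype.key = log_type
source_agent.sources.apache_log.interceptors.itype.value = apache_access_combined

#http://flume.apache.org/FlumeUserGuide.html#memory-channel
source_agent.channels = mc1
source_agent.channels.mc1.type = memory
source_agent.channels.mc1.capacity = 100

## forward to the collector's Avro source (port 4545, as configured above)
#http://flume.apache.org/FlumeUserGuide.html#avro-sink
source_agent.sinks = AvroOut
source_agent.sinks.AvroOut.type = avro
source_agent.sinks.AvroOut.channel = mc1
source_agent.sinks.AvroOut.hostname = collector.mydomain.com
source_agent.sinks.AvroOut.port = 4545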
