
Run httpd with docker

Below is a small script that serves the current directory with the official httpd image:

#!/bin/bash

echo "Running httpd with docker."

docker run   --rm -v "$PWD":/usr/local/apache2/htdocs  httpd

We use the following options:
-v, --volume list   Bind mount a volume
--rm                Automatically remove the container when it exits

Quite simple, right? Execution below.

user@computer:$ bash ~/docker/run_httpd.sh
Running httpd with docker.
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.6. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.6. Set the 'ServerName' directive globally to suppress this message
[Thu Aug 17 22:22:55.249981 2017] [mpm_event:notice] [pid 1:tid 140029488904064] AH00489: Apache/2.4.27 (Unix) configured -- resuming normal operations
[Thu Aug 17 22:22:55.250079 2017] [core:notice] [pid 1:tid 140029488904064] AH00094: Command line: 'httpd -D FOREGROUND'
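
The AH00558 warning is harmless here; httpd just cannot work out a fully qualified server name. If you wanted to silence it, one option (a sketch, assuming you keep a customised httpd.conf next to the script) is to append a ServerName directive to a copy of the default configuration and mount that copy over the original:

user@computer:$ docker run --rm httpd cat /usr/local/apache2/conf/httpd.conf > my-httpd.conf
user@computer:$ echo "ServerName localhost" >> my-httpd.conf
user@computer:$ docker run --rm -v "$PWD":/usr/local/apache2/htdocs -v "$PWD/my-httpd.conf":/usr/local/apache2/conf/httpd.conf httpd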

And we proceed to test.

user@computer:$ curl http://172.17.0.6
<HTML>
<HEAD>
First page
</HEAD>
<BODY>
Testing docker httpd



We get this index.html because we are mapping /tmp/httpd (the current $PWD) to /usr/local/apache2/htdocs inside the container, and in /tmp/httpd we created the example index.html shown above.
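
For reference, the example index.html could have been created with something like this (a minimal sketch, assuming /tmp/httpd is where the script is run from; the closing tags are added here for completeness):

user@computer:$ mkdir -p /tmp/httpd && cd /tmp/httpd
user@computer:$ cat > index.html <<'EOF'
<HTML>
<HEAD>
First page
</HEAD>
<BODY>
Testing docker httpd
</BODY>
</HTML>
EOF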

Run python script with docker

So I started playing with Docker and asked myself whether it would be possible to run a Python script with it. Well, the answer is yes. An example script is below.

#!/usr/bin/python

import sys
print "Running script!!"
print sys.version_info

Execution below:

user@computer:$ docker run -it --rm --name pythonscript -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:2 python script.py
Running script!!
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)
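
If you want to try the same thing with Python 3, a minimal variation (assuming the official python:3 image, where print is a function) could look like this:

#!/usr/bin/python3

import sys
print("Running script!!")
print(sys.version_info)

user@computer:$ docker run -it --rm --name pythonscript -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:3 python script.py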


Apache Flume to write web server logs to Hadoop

In this post we will use Flume to dump Apache web server logs into HDFS. We already have a web server running and Flume installed, but we still need to configure the target (collector) agent and the source agent that feeds it.

We use the following configuration file for the target (collector) agent.

## TARGET AGENT ##  
## configuration file location:  /etc/flume-ng/conf
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-trg-agent.conf -n collector

#http://flume.apache.org/FlumeUserGuide.html#avro-source
collector.sources = AvroIn  
collector.sources.AvroIn.type = avro  
collector.sources.AvroIn.bind = 0.0.0.0  
collector.sources.AvroIn.port = 4545  
collector.sources.AvroIn.channels = mc1 mc2

## Channels ##
## Source writes to 2 channels, one for each sink
collector.channels = mc1 mc2

#http://flume.apache.org/FlumeUserGuide.html#memory-channel

collector.channels.mc1.type = memory  
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory  
collector.channels.mc2.capacity = 100

## Sinks ##
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem 
#http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
collector.sinks.LocalOut.type = file_roll  
collector.sinks.LocalOut.sink.directory = /var/log/flume-ng  
collector.sinks.LocalOut.sink.rollInterval = 0  
collector.sinks.LocalOut.channel = mc1

## Write to HDFS
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs  
collector.sinks.HadoopOut.channel = mc2  
collector.sinks.HadoopOut.hdfs.path = /user/training/flume/events/%{log_type}/%y%m%d  
collector.sinks.HadoopOut.hdfs.fileType = DataStream  
collector.sinks.HadoopOut.hdfs.writeFormat = Text  
collector.sinks.HadoopOut.hdfs.rollSize = 0  
collector.sinks.HadoopOut.hdfs.rollCount = 10000  
collector.sinks.HadoopOut.hdfs.rollInterval = 600
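
The collector above only receives events over Avro, so a source agent is needed on the web server to feed it. A minimal sketch of such an agent (assuming the access log lives at /var/log/httpd/access_log and the collector runs on the same host; the static interceptor sets the log_type header used in the HDFS path) could look like this:

## SOURCE AGENT ##
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-src-agent.conf -n source_agent

#http://flume.apache.org/FlumeUserGuide.html#exec-source
source_agent.sources = apache_log
source_agent.sources.apache_log.type = exec
source_agent.sources.apache_log.command = tail -F /var/log/httpd/access_log
source_agent.sources.apache_log.channels = mc1

#http://flume.apache.org/FlumeUserGuide.html#static-interceptor
source_agent.sources.apache_log.interceptors = int1
source_agent.sources.apache_log.interceptors.int1.type = static
source_agent.sources.apache_log.interceptors.int1.key = log_type
source_agent.sources.apache_log.interceptors.int1.value = apache_access

#http://flume.apache.org/FlumeUserGuide.html#memory-channel
source_agent.channels = mc1
source_agent.channels.mc1.type = memory
source_agent.channels.mc1.capacity = 100

## Send events to the collector over Avro
#http://flume.apache.org/FlumeUserGuide.html#avro-sink
source_agent.sinks = AvroOut
source_agent.sinks.AvroOut.type = avro
source_agent.sinks.AvroOut.channel = mc1
source_agent.sinks.AvroOut.hostname = 127.0.0.1
source_agent.sinks.AvroOut.port = 4545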


Export data from HDFS to MySQL

First, create the database and table you want to populate.

user@computer:$ echo "create database staff2; use staff2; CREATE TABLE editorial (id INT(100) unsigned not null AUTO_INCREMENT, name VARCHAR(20), email VARCHAR(20), primary key (id));" | mysql -u root -p

Once that is done, check that the data we want to export is in HDFS.

user@computer:$ hdfs dfs -cat /home/training/staff/editorial/part-m-*
1,Peter,peter@example.com
2,Jack,jack@example.com

Now export the data into MySQL using Sqoop.

user@computer:$ sqoop export --connect jdbc:mysql://localhost/staff2 --username root -P --table editorial --export-dir /home/training/staff/editorial
17/02/27 12:51:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password:
17/02/27 12:51:58 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/27 12:51:58 INFO tool.CodeGenTool: Beginning code generation
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/27 12:52:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.jar
17/02/27 12:52:03 INFO mapreduce.ExportJobBase: Beginning export of editorial
17/02/27 12:52:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:09 INFO mapred.JobClient: Running job: job_201702221239_0006
17/02/27 12:52:10 INFO mapred.JobClient: map 0% reduce 0%
17/02/27 12:52:31 INFO mapred.JobClient: map 50% reduce 0%
17/02/27 12:52:45 INFO mapred.JobClient: map 100% reduce 0%
17/02/27 12:52:49 INFO mapred.JobClient: Job complete: job_201702221239_0006
17/02/27 12:52:49 INFO mapred.JobClient: Counters: 24
17/02/27 12:52:49 INFO mapred.JobClient: File System Counters
17/02/27 12:52:49 INFO mapred.JobClient: FILE: Number of bytes read=0
17/02/27 12:52:49 INFO mapred.JobClient: FILE: Number of bytes written=1176756
17/02/27 12:52:49 INFO mapred.JobClient: FILE: Number of read operations=0
17/02/27 12:52:49 INFO mapred.JobClient: FILE: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient: FILE: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient: HDFS: Number of bytes read=759
17/02/27 12:52:49 INFO mapred.JobClient: HDFS: Number of bytes written=0
17/02/27 12:52:49 INFO mapred.JobClient: HDFS: Number of read operations=19
17/02/27 12:52:49 INFO mapred.JobClient: HDFS: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient: HDFS: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient: Job Counters
17/02/27 12:52:49 INFO mapred.JobClient: Launched map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient: Data-local map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=64216
17/02/27 12:52:49 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient: Map-Reduce Framework
17/02/27 12:52:49 INFO mapred.JobClient: Map input records=2
17/02/27 12:52:49 INFO mapred.JobClient: Map output records=2
17/02/27 12:52:49 INFO mapred.JobClient: Input split bytes=661
17/02/27 12:52:49 INFO mapred.JobClient: Spilled Records=0
17/02/27 12:52:49 INFO mapred.JobClient: CPU time spent (ms)=3390
17/02/27 12:52:49 INFO mapred.JobClient: Physical memory (bytes) snapshot=422584320
17/02/27 12:52:49 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2940895232
17/02/27 12:52:49 INFO mapred.JobClient: Total committed heap usage (bytes)=127401984
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Transferred 759 bytes in 42.9426 seconds (17.6748 bytes/sec)
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Exported 2 records.

Now we can see the content in the MySQL database staff2.

user@computer:$ echo "use staff2; SELECT * FROM editorial;" | mysql -u root -p
Enter password:
id name email
1 Peter peter@example.com
2 Jack jack@example.com
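
One thing worth noting: the export works out of the box because the files in HDFS are comma-delimited, which is what sqoop export expects by default. If the files used a different delimiter, it would have to be stated explicitly, for example (a sketch, assuming tab-separated files):

user@computer:$ sqoop export --connect jdbc:mysql://localhost/staff2 --username root -P --table editorial --export-dir /home/training/staff/editorial --input-fields-terminated-by '\t'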