Tag Archives: apache

Spamassassin logs to a different file

Lately I’ve been looking at my mail logs and noticed that SpamAssassin, Postfix and Dovecot messages, among others, all get written to the same file, which is a little inconvenient. We could use grep to filter, but perhaps a better approach is to send the SpamAssassin logs to a different file.

tail /var/log/mail.log
Oct 16 17:13:11 myserver postfix/smtpd[29869]: connect from unknown[185.234.219.254]
Oct 16 17:13:11 myserver postfix/smtpd[29869]: warning: unknown[185.234.219.254]: SASL LOGIN authentication failed: Invalid authentication mechanism
Oct 16 17:13:11 myserver postfix/smtpd[29869]: lost connection after AUTH from unknown[185.234.219.254]
Oct 16 17:13:11 myserver postfix/smtpd[29869]: disconnect from unknown[185.234.219.254]
Oct 16 17:15:53 myserver postfix/smtpd[29896]: warning: hostname 204.152.209.101.static.quadranet.com does not resolve to address 204.152.209.101
Oct 16 17:15:53 myserver postfix/smtpd[29896]: connect from unknown[204.152.209.101]
Oct 16 17:15:53 myserver postfix/smtpd[29896]: warning: unknown[204.152.209.101]: SASL LOGIN authentication failed: Invalid authentication mechanism
Oct 16 17:15:54 myserver postfix/smtpd[29896]: disconnect from unknown[204.152.209.101]
Oct 16 17:16:41 myserver postfix/smtpd[29896]: warning: hostname unassigned.quadranet.com does not resolve to address 192.161.170.229
Oct 16 17:16:41 myserver postfix/smtpd[29896]: connect from unknown[192.161.170.229]
Oct 16 17:16:42 myserver postfix/smtpd[29896]: Anonymous TLS connection established from unknown[192.161.170.229]: TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)
Oct 16 17:17:03 myserver postfix/smtpd[29896]: warning: eotzqh.kylieslowcase.fun.blackhole.securitysage.com: RBL lookup error: Host or domain name not found. Name service error for name=eotzqh.kylieslowcase.fun.blackhole.securitysage.com type=A: Host not found, try again
Oct 16 17:17:05 myserver postfix/smtpd[29896]: DA88C2607C5: client=unknown[192.161.170.229]
Oct 16 17:17:06 myserver postfix/cleanup[29903]: DA88C2607C5: message-id=
Oct 16 17:17:06 myserver postfix/qmgr[1238]: DA88C2607C5: from=, size=32258, nrcpt=1 (queue active)
Oct 16 17:17:06 myserver postfix/smtpd[29896]: disconnect from unknown[192.161.170.229]
Oct 16 17:17:08 myserver postfix/pickup[29368]: 3D6B4260878: uid=5001 from=
Oct 16 17:17:08 myserver postfix/pipe[29904]: DA88C2607C5: to=, relay=spamassassin, delay=26, delays=24/0.01/0/1.9, dsn=2.0.0, status=sent (delivered via spamassassin service)
Oct 16 17:17:08 myserver postfix/qmgr[1238]: DA88C2607C5: removed
Oct 16 17:17:08 myserver postfix/cleanup[29903]: 3D6B4260878: message-id=
Oct 16 17:17:08 myserver postfix/qmgr[1238]: 3D6B4260878: from=, size=32672, nrcpt=1 (queue active)
Oct 16 17:17:08 myserver postfix/pipe[29908]: 3D6B4260878: to=, relay=dovecot, delay=0.22, delays=0.07/0.01/0/0.14, dsn=2.0.0, status=sent (delivered via dovecot service)
Oct 16 17:17:08 myserver postfix/qmgr[1238]: 3D6B4260878: removed
Oct 16 17:17:51 myserver postfix/smtpd[29896]: connect from unknown[185.36.81.87]
Oct 16 17:17:51 myserver postfix/smtpd[29896]: warning: unknown[185.36.81.87]: SASL LOGIN authentication failed: Invalid authentication mechanism
Oct 16 17:17:51 myserver postfix/smtpd[29896]: lost connection after AUTH from unknown[185.36.81.87]
Oct 16 17:17:51 myserver postfix/smtpd[29896]: disconnect from unknown[185.36.81.87]
Oct 16 17:18:49 myserver postfix/smtpd[29896]: connect from unknown[187.55.179.130]
Oct 16 17:18:50 myserver postfix/smtpd[29896]: warning: unknown[187.55.179.130]: SASL LOGIN authentication failed: Invalid authentication mechanism
Oct 16 17:18:50 myserver postfix/smtpd[29896]: lost connection after AUTH from unknown[187.55.179.130]
Oct 16 17:18:50 myserver postfix/smtpd[29896]: disconnect from unknown[187.55.179.130]
Oct 16 17:19:50 myserver postfix/smtpd[29896]: warning: hostname 204.152.209.101.static.quadranet.com does not resolve to address 204.152.209.101
Oct 16 17:19:50 myserver postfix/smtpd[29896]: connect from unknown[204.152.209.101]
Oct 16 17:19:51 myserver postfix/smtpd[29896]: warning: unknown[204.152.209.101]: SASL LOGIN authentication failed: Invalid authentication mechanism
Oct 16 17:19:51 myserver postfix/smtpd[29896]: disconnect from unknown[204.152.209.101]
Oct 16 17:21:21 myserver postfix/anvil[29719]: statistics: max connection rate 1/60s for (smtp:177.143.199.94) at Oct 16 17:12:24
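
One way to do that, assuming rsyslog handles the syslog stream and that spamd is what writes the SpamAssassin lines, is to match on the program name and stop further processing so those lines no longer reach mail.log. A minimal sketch:

# /etc/rsyslog.d/30-spamassassin.conf (file name is just an example)
:programname, isequal, "spamd"    /var/log/spamd.log
& stop

After restarting rsyslog, the spamd messages should land in /var/log/spamd.log instead of mail.log.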

Continue reading

Primitive way with Folium

So I discovered Folium about two months ago and decided to map the Primitive Way with it. The coordinate data is extracted from Strava GPX files and cleaned up, leaving only latitude and longitude, as shown below.

head Camin_prim_stage1.csv
lat,lon
43.3111770,-5.6941620
43.3113360,-5.6943420
43.3114370,-5.6944600
43.3115000,-5.6945420
43.3116970,-5.6948090
43.3119110,-5.6950900
43.3122360,-5.6956830
43.3123220,-5.6958090
43.3126840,-5.6963740
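
The GPX-to-CSV cleanup step is not shown here; a minimal sketch using only the Python standard library (the file names are just examples) could look like this:

import csv
import xml.etree.ElementTree as ET

# GPX 1.1 namespace used by Strava exports
NS = '{http://www.topografix.com/GPX/1/1}'

def gpx_to_csv(gpx_path, csv_path):
    root = ET.parse(gpx_path).getroot()
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['lat', 'lon'])
        # Every track point carries lat/lon as XML attributes
        for trkpt in root.iter(NS + 'trkpt'):
            writer.writerow([trkpt.get('lat'), trkpt.get('lon')])

gpx_to_csv('Camin_prim_stage1.gpx', 'Camin_prim_stage1.csv')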

Below is the Python script we will use to load the data and create the map with the routes.

import folium
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()

# Change Spark log level
spark.sparkContext.setLogLevel('FATAL')

# Load the stage coordinates from the local filesystem instead of HDFS
position1 = spark.read.load("/home/user/Camin_prim_stage1.csv", format="csv", sep=",", inferSchema="true", header="true")
position2 = spark.read.load("/home/user/Camin_prim_stage2.csv", format="csv", sep=",", inferSchema="true", header="true")
position3 = spark.read.load("/home/user/Camin_prim_stage3.csv", format="csv", sep=",", inferSchema="true", header="true")

positions = [position1, position2, position3]

# Make a Folium map and draw each stage in its own colour
m = folium.Map()
colours = ['red', 'blue', 'green']

for colour, stage in zip(colours, positions):
    # Check the file was correctly loaded
    # stage.printSchema()
    # stage.show(2)

    # Collect the stage coordinates as [lat, lon] pairs
    coordinates = [[float(i.lat), float(i.lon)] for i in stage.collect()]

    # Fit the map to the stage and draw the route with start/end markers
    m.fit_bounds(coordinates, padding=(25, 25))
    folium.PolyLine(locations=coordinates, weight=5, color=colour).add_to(m)
    folium.Marker(coordinates[0], popup="Origin").add_to(m)
    folium.Marker(coordinates[-1], popup="Destination").add_to(m)

# Save to an html file
m.save('chamin_prim.html')

# Cleanup
spark.stop()
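
Assuming the script is saved as camin_folium.py (a made-up name), it can be run with spark-submit:

spark-submit camin_folium.py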

Continue reading

Playing with functions in hive

Apache Hive has built-in functions, which can be listed with

SHOW FUNCTIONS;

To play with concat, we will run the following script.

-- Use testdb
use testdb;
-- describe concat function
DESC FUNCTION concat;
-- describe table mytable2
DESC mytable2;
-- Perform select query uniting fname and lname
SELECT CONCAT(fname,' ',lname) FROM mytable2;

We can execute it with either beeline or the hive CLI. We will use beeline.

beeline -u jdbc:hive2://localhost:10000 -f Documents/concat.hql --verbose=false --showWarnings=false
scan complete in 8ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> use testdb;
No rows affected (0.104 seconds)
0: jdbc:hive2://localhost:10000> -- Describe concat function
0: jdbc:hive2://localhost:10000> DESC FUNCTION concat;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|                                                                                          tab_name                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data  bin1, bin2, ... binN  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
1 row selected (0.162 seconds)
0: jdbc:hive2://localhost:10000> 
0: jdbc:hive2://localhost:10000> DESC mytable2;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| id        | int        |          |
| lname     | string     |          |
| fname     | string     |          |
+-----------+------------+----------+--+
3 rows selected (0.133 seconds)
0: jdbc:hive2://localhost:10000> 
0: jdbc:hive2://localhost:10000> SELECT CONCAT(fname,' ',lname) FROM mytable2;
+--------------------+--+
|        _c0         |
+--------------------+--+
| John Doe           |
| William Lancaster  |
| Burp Gentoo        |
+--------------------+--+
3 rows selected (18.848 seconds)
0: jdbc:hive2://localhost:10000> 
Closing: 0: jdbc:hive2://localhost:10000
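
For reference, the same script file could also have been executed non-interactively with the hive CLI instead of beeline:

hive -f Documents/concat.hql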

We can also play with functions from inside the hive CLI, as shown below with the sqrt function.

hive> DESC function sqrt;
OK
sqrt(x) - returns the square root of x
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> SELECT SQRT(64);        
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201710131004_0373, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201710131004_0373
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201710131004_0373
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-12-06 12:33:16,450 Stage-1 map = 0%,  reduce = 0%
2017-12-06 12:33:23,476 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.58 sec
2017-12-06 12:33:28,497 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.58 sec
MapReduce Total cumulative CPU time: 1 seconds 580 msec
Ended Job = job_201710131004_0373
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.58 sec   HDFS Read: 273 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 580 msec
OK
8.0
Time taken: 18.894 seconds, Fetched: 1 row(s)
hive> 
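
Hive can also print extended help for a function, which typically includes a usage example (this was not run in the session above):

DESCRIBE FUNCTION EXTENDED concat;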

More info about Hive functions can be found in the Hive documentation.

Apache Flume to write web server logs to Hadoop

In this post we will use Flume to dump Apache web server logs into HDFS. We already have a web server running and Flume installed, but we still need to configure a target agent and a source agent.

We use the following configuration file for the target agent.

## TARGET AGENT ##  
## configuration file location:  /etc/flume-ng/conf
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-trg-agent.conf -n collector

#http://flume.apache.org/FlumeUserGuide.html#avro-source
collector.sources = AvroIn  
collector.sources.AvroIn.type = avro  
collector.sources.AvroIn.bind = 0.0.0.0  
collector.sources.AvroIn.port = 4545  
collector.sources.AvroIn.channels = mc1 mc2

## Channels ##
## Source writes to 2 channels, one for each sink
collector.channels = mc1 mc2

#http://flume.apache.org/FlumeUserGuide.html#memory-channel

collector.channels.mc1.type = memory  
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory  
collector.channels.mc2.capacity = 100

## Sinks ##
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem 
#http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
collector.sinks.LocalOut.type = file_roll  
collector.sinks.LocalOut.sink.directory = /var/log/flume-ng  
collector.sinks.LocalOut.sink.rollInterval = 0  
collector.sinks.LocalOut.channel = mc1

## Write to HDFS
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
collector.sinks.HadoopOut.type = hdfs  
collector.sinks.HadoopOut.channel = mc2  
collector.sinks.HadoopOut.hdfs.path = /user/training/flume/events/%{log_type}/%y%m%d  
collector.sinks.HadoopOut.hdfs.fileType = DataStream  
collector.sinks.HadoopOut.hdfs.writeFormat = Text  
collector.sinks.HadoopOut.hdfs.rollSize = 0  
collector.sinks.HadoopOut.hdfs.rollCount = 10000  
collector.sinks.HadoopOut.hdfs.rollInterval = 600
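
The source agent runs on the web server itself. A minimal sketch of what it could look like, tailing the access log and forwarding events to the collector's Avro port (the access log path and collector hostname are assumptions):

## SOURCE AGENT ##
## START Agent: flume-ng agent -c conf -f /etc/flume-ng/conf/flume-src-agent.conf -n source_agent

#http://flume.apache.org/FlumeUserGuide.html#exec-source
source_agent.sources = apache_log
source_agent.sources.apache_log.type = exec
source_agent.sources.apache_log.command = tail -F /var/log/apache2/access.log
source_agent.sources.apache_log.channels = mc1

# Static interceptor so the collector's %{log_type} path resolves
source_agent.sources.apache_log.interceptors = itype
source_agent.sources.apache_log.interceptors.itype.type = static
source_agent.sources.apache_log.interceptors.itype.key = log_type
source_agent.sources.apache_log.interceptors.itype.value = apache_access

source_agent.channels = mc1
source_agent.channels.mc1.type = memory
source_agent.channels.mc1.capacity = 100

#http://flume.apache.org/FlumeUserGuide.html#avro-sink
source_agent.sinks = avro_forward
source_agent.sinks.avro_forward.type = avro
source_agent.sinks.avro_forward.channel = mc1
source_agent.sinks.avro_forward.hostname = collector.example.com
source_agent.sinks.avro_forward.port = 4545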

Continue reading

Export data from HDFS to MySQL

First, create the database and the table you want to populate.

echo "create database staff2; use staff2; CREATE TABLE editorial (id INT(100) unsigned not null AUTO_INCREMENT, name VARCHAR(20), email VARCHAR(20), primary key (id));" | mysql -u root -p

Once that is done, here is the data we want to export, already sitting in HDFS.

hdfs dfs -cat /home/training/staff/editorial/part-m-*
1,Peter,peter@example.com
2,Jack,jack@example.com
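
The files above are comma-delimited, which is exactly what sqoop export expects by default, so no delimiter option is needed. If the HDFS data used another separator, it would have to be spelled out, for example (illustration only):

sqoop export --connect jdbc:mysql://localhost/staff2 --username root -P --table editorial --export-dir /home/training/staff/editorial --input-fields-terminated-by '\t'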

Now dump it into MySQL using sqoop.

sqoop export --connect jdbc:mysql://localhost/staff2 --username root -P --table editorial --export-dir /home/training/staff/editorial
17/02/27 12:51:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password: 
17/02/27 12:51:58 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/27 12:51:58 INFO tool.CodeGenTool: Beginning code generation
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/27 12:52:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.jar
17/02/27 12:52:03 INFO mapreduce.ExportJobBase: Beginning export of editorial
17/02/27 12:52:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:09 INFO mapred.JobClient: Running job: job_201702221239_0006
17/02/27 12:52:10 INFO mapred.JobClient:  map 0% reduce 0%
17/02/27 12:52:31 INFO mapred.JobClient:  map 50% reduce 0%
17/02/27 12:52:45 INFO mapred.JobClient:  map 100% reduce 0%
17/02/27 12:52:49 INFO mapred.JobClient: Job complete: job_201702221239_0006
17/02/27 12:52:49 INFO mapred.JobClient: Counters: 24
17/02/27 12:52:49 INFO mapred.JobClient:   File System Counters
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes read=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes written=1176756
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes read=759
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes written=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of read operations=19
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient:   Job Counters 
17/02/27 12:52:49 INFO mapred.JobClient:     Launched map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient:     Data-local map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=64216
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:   Map-Reduce Framework
17/02/27 12:52:49 INFO mapred.JobClient:     Map input records=2
17/02/27 12:52:49 INFO mapred.JobClient:     Map output records=2
17/02/27 12:52:49 INFO mapred.JobClient:     Input split bytes=661
17/02/27 12:52:49 INFO mapred.JobClient:     Spilled Records=0
17/02/27 12:52:49 INFO mapred.JobClient:     CPU time spent (ms)=3390
17/02/27 12:52:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=422584320
17/02/27 12:52:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2940895232
17/02/27 12:52:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=127401984
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Transferred 759 bytes in 42.9426 seconds (17.6748 bytes/sec)
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Exported 2 records.

Now we can see the content in the MySQL database named staff2.

echo "use staff2; SELECT * FROM editorial;" | mysql -u root -p 
Enter password: 
id	name	email
1	Peter	peter@example.com
2	Jack	jack@example.com