Perl Script to Parse Log File
Problem
You have a log file on a UNIX system. Each line in the log file contains an IP address in the 10th column, where columns (or fields) are separated by spaces. Write a Perl program that parses the log file and prints the unique IP addresses it contains.
Solution
Read the file line by line, split the current line into an array of tokens using space as the separator, and push the 10th token onto a result array. To print the unique IP addresses in the result array, I suggest two methods. The first is to sort the array, an O(n log n) operation, so that identical IPs end up next to each other. Then walk through the sorted array element by element, printing the current element only if it differs from the previously printed one.
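The splitting step is where column counting can go wrong, so here is a minimal sketch of it. The log line is made up for illustration (it is not from the original log); it has ten fields, the last being an IP address, plus a double space after "GET". Perl's special single-space form `split ' '` (also written `split(" ", ...)`) treats any run of whitespace as one separator, while a regex such as `split / /` does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up log line for illustration: ten fields, the 10th being an IP.
# Note the double space after "GET".
my $line = "2024-01-01 12:00:00 GET  /index.html 200 512 - - Mozilla 192.168.1.10\n";

# split ' ' is the special awk-style form: leading whitespace is ignored
# and any run of whitespace (including the trailing newline) separates fields.
my @fields = split ' ', $line;
print scalar(@fields), "\n";   # 10
print "$fields[9]\n";          # 192.168.1.10 (10th column, index 9)

# split / / splits on every single space, so the double space produces an
# empty field and shifts every later column one position to the right.
my @literal = split / /, $line;
print scalar(@literal), "\n";  # 11
print "$literal[9]\n";         # Mozilla
```

Because Perl arrays are zero-based, the 10th column is always `$fields[9]`.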
Code
Here is the code in Perl
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open log file
open(my $in, '<', 'mylog.txt') or die "Cannot open mylog.txt: $!";

# Array to collect the IP address from every line
my @ips = ();

# Read log file line by line
while (my $line = <$in>) {
    # Split the current line into whitespace-separated tokens
    my @fields = split(' ', $line);
    # Skip lines that do not have a 10th field
    next unless defined $fields[9];
    # The 10th token (index 9) is the IP address; push it to the array
    push(@ips, $fields[9]);
}
close($in);

# Nothing to print for an empty log
exit 0 unless @ips;

# Sort the array so that identical IPs end up next to each other
@ips = sort @ips;

# Print the first IP address
my $ip = $ips[0];
print $ip . "\n";

# Go through the rest of the list, compare the current IP to the
# previously printed one and print it if different.
# $#ips is the last index, so use <= to include the final element.
for (my $i = 1; $i <= $#ips; $i++) {
    if ($ips[$i] ne $ip) {
        print $ips[$i] . "\n";
        $ip = $ips[$i];
    }
}
```
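The same sort-then-compare logic can be packaged as a subroutine that works on any list, which makes it easy to try without a log file. This is a sketch only: `unique_sorted` and the sample IPs are hypothetical, chosen for illustration. It returns an empty list for empty input instead of touching a nonexistent first element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the distinct elements of a list in sorted
# (string) order, using the sort-then-compare-with-previous technique.
sub unique_sorted {
    my @sorted = sort @_;
    return () unless @sorted;          # empty input: nothing to report
    my @result = ($sorted[0]);         # the first element is always new
    # 1 .. $#sorted runs up to and including the last index
    for my $i (1 .. $#sorted) {
        push @result, $sorted[$i] if $sorted[$i] ne $result[-1];
    }
    return @result;
}

# Hypothetical sample data with duplicates
my @ips = qw(10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.1);
my @unique = unique_sorted(@ips);
print "$_\n" for @unique;   # prints 10.0.0.1, 10.0.0.2 and 10.0.0.3, one per line
```

Note that `sort` compares strings, so the output is in string order rather than numeric IP order; that does not affect uniqueness, because equal strings still end up adjacent.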
Solution
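The other way is to hash the IP addresses: the key is the IP itself and the value is a count of how many times it appears. Because a key can occur only once in a hash, repeated IPs collapse into a single entry. Once the hash table is fully populated, go through it and print the keys. This takes O(n) expected time instead of O(n log n), but the keys come out in no particular order.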
Code
Here is the code in Perl
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Open log file
open(my $in, '<', 'mylog.txt') or die "Cannot open mylog.txt: $!";

# Hash keyed by IP address; the value counts how often each IP appears
my %uniqueIPs = ();

# Read log file line by line
while (my $line = <$in>) {
    # Split the current line into whitespace-separated tokens
    my @fields = split(' ', $line);
    # Skip lines that do not have a 10th field
    next unless defined $fields[9];
    # The 10th token (index 9) is the IP address. Use it as the hash key;
    # a repeated IP just increments the count of the existing key.
    $uniqueIPs{$fields[9]}++;
}
close($in);

# Go through the hash table and print the keys, which are the unique IPs.
# keys returns them in no particular order.
for my $ip (keys %uniqueIPs) {
    print $ip . "\n";
}
```
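Since the hash value is a count, it can be reported too, and the keys can be sorted for stable output. A minimal sketch on a hypothetical in-memory list of IPs that have already been extracted from the 10th column. The last part shows a common Perl idiom, `grep { !$seen{$_}++ }`, which uses the same hash trick but keeps the unique IPs in the order they first appear.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: IPs already extracted from the 10th column
my @ips = qw(10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.2);

# Count occurrences: the hash key is the IP, the value is its count
my %count;
$count{$_}++ for @ips;

# keys has no defined order, so sort them before printing
for my $ip (sort keys %count) {
    print "$ip $count{$ip}\n";   # e.g. "10.0.0.2 3"
}

# Keep unique IPs in first-seen order: $seen{$_}++ returns the old value,
# which is undefined (false) the first time an IP appears, so grep keeps it.
my %seen;
my @first_seen = grep { !$seen{$_}++ } @ips;
print "@first_seen\n";   # 10.0.0.2 10.0.0.1 10.0.0.3
```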