October 30, 2010
Perl Sort Hash by Value
Problem
Given a log file in a format similar to Linux syslog, as in the following example:
Jan 4 03:15:01 pla kernel:pla: Some error message
Write a Perl script to print the top 5 most repeated error messages.
Solution
Read the file line by line and split each line using “:” as the separator. With the format above, the fifth token is the error message. Use that token as the key in a hash table and increment the corresponding value, so that after the last line the hash holds every distinct error message together with its count. Then sort the hash keys by their values (not by the keys themselves) in decreasing order. The final step is to loop over the sorted keys and print the first 5 along with their counts.
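Sorting by value is the central step, so here is a minimal sketch of it in isolation. The sample counts and the helper name top_n are made up for illustration. Breaking ties alphabetically by key is an extra not described above, added only to make the output order deterministic.

```perl
use strict;
use warnings;

# Hypothetical sample counts, not taken from a real log
my %count = (
    'disk full'    => 7,
    'link down'    => 3,
    'timeout'      => 12,
    'bad checksum' => 3,
);

# Return the $n keys with the highest values.
# Ties are broken alphabetically by key so the order is deterministic.
sub top_n {
    my ($n, $hash) = @_;
    my @sorted = sort { $hash->{$b} <=> $hash->{$a} or $a cmp $b } keys %$hash;
    $n = @sorted if $n > @sorted;    # do not run past the end of the list
    return @sorted[0 .. $n - 1];
}

print "$_ : $count{$_}\n" for top_n(2, \%count);
```

The comparison block swaps $a and $b around <=> to get decreasing order. Writing $a first would sort in increasing order.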
Code
Here is the code in Perl:
#!/perl/bin/perl
use strict;
use warnings;

# Command line arguments
my $Args = "@ARGV";

# Log file name
my $log;
if ($Args =~ /--log (\S+)/) {
    $log = $1;
}

# Hardcoded for testing purposes only; uncomment to bypass --log
# $log = "/perl/test/Log.txt";

# Check tool usage
if (!defined($log)) {
    print "\n";
    print "Usage: question_two.pl [--log <file>]\n";
    print "\n";
    exit;
}

print "Please wait...\n\n";

# Open file for reading
open(my $fh, '<', $log) or die "can not open file: $!";

# Hash table:
# key: log message
# value: count
my %MessageCount = ();

# While there are lines to read
while (<$fh>) {
    # Remove EOL
    chomp($_);

    # Split current line into tokens
    # using : as separator
    # Message is token 5
    my @tokens = split(/:/, $_);

    # Skip lines that have no message token
    next unless defined $tokens[4];

    # Hash message while incrementing
    # its counter
    $MessageCount{$tokens[4]}++;
}

# Close file
close($fh);

# Counter to 5
my $i = 0;

# Sort hash by value (recall: not by key)
foreach my $key (sort { $MessageCount{$b} <=> $MessageCount{$a} } keys %MessageCount) {
    $i++;
    print "$key : $MessageCount{$key}\n";

    # Exit after printing the 5th most repeated message
    if ($i == 5) {
        last;
    }
}
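One limitation of splitting on every ":" is that a message which itself contains a colon is cut off at that colon. A possible variation, sketched here on in-memory sample lines rather than a file, passes a limit of 5 to split so that the fifth field keeps the rest of the line. The sample lines and the helper name count_messages are hypothetical, and trimming the surrounding spaces is an extra step not in the script above.

```perl
use strict;
use warnings;

# Count messages in a list of syslog-style lines.
# split's third argument caps the number of fields at 5, so any
# colons inside the message stay in the last field.
sub count_messages {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my @tokens = split /:/, $line, 5;
        next unless defined $tokens[4];              # skip lines with too few fields
        (my $msg = $tokens[4]) =~ s/^\s+|\s+$//g;    # trim surrounding spaces
        $count{$msg}++;
    }
    return %count;
}

# Hypothetical sample lines in the format shown in the problem
my @sample = (
    'Jan 4 03:15:01 pla kernel:pla: disk error: sector 12',
    'Jan 4 03:15:02 pla kernel:pla: disk error: sector 12',
    'Jan 4 03:15:03 pla kernel:pla: link down',
);

my %count = count_messages(@sample);
print "$_ : $count{$_}\n"
    for sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
```

Without the limit, both "disk error: sector 12" lines would be counted under the truncated key " disk error".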