For logs like these:
192.168.1.100 - - [29/Apr/2011:06:02:54 +0400] "GET / HTTP/1.1" 500 612 "http://192.168.1.10/login" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:03:32 +0400] "GET / HTTP/1.1" 200 4463 "http://192.168.1.10/login" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:03:34 +0400] "GET /static/simple.css HTTP/1.1" 304 - "http://192.168.1.10/" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:03:37 +0400] "GET /static/arrow.png HTTP/1.1" 404 19 "http://192.168.1.10/static/simple.css" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:06:45 +0400] "GET / HTTP/1.1" 500 612 "http://192.168.1.10/login" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:09:11 +0400] "GET / HTTP/1.1" 200 4463 "http://192.168.1.10/login" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:09:14 +0400] "GET /static/simple.css HTTP/1.1" 304 - "http://192.168.1.10/" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
192.168.1.100 - - [29/Apr/2011:06:09:16 +0400] "GET /static/arrow.png HTTP/1.1" 404 19 "http://192.168.1.10/static/simple.css" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0"
Something like this should do it:
#!/usr/bin/env python3
# encoding: utf-8
import re
from collections import Counter

# Group 1: first token inside the User-Agent parentheses (the OS).
# Group 2: last whitespace-free token before the closing quote (the browser).
rf = re.compile(r'"[^"]+?\(([^\);]+)[^"]+?\)[^"]+?([^\s]+)"$')
browser = Counter()
os = Counter()
error = 0  # number of lines the regex could not parse

with open('logs.log', 'r') as f:
    for line in f:
        r = rf.search(line)
        if not r:
            error += 1
            continue
        cur_os = r.group(1)
        cur_browser = r.group(2)
        # Counter treats missing keys as 0, so the first hit counts as 1
        os[cur_os] += 1
        browser[cur_browser] += 1

# most_common() yields (value, count) pairs, highest count first
for cur_os, cur_count in os.most_common():
    print("%s\t%s" % (cur_count, cur_os))
print("\n\n")
for cur_browser, cur_count in browser.most_common():
    print("%s\t%s" % (cur_count, cur_browser))
Try it on a small slice of your own log first, and tweak the regex if needed.
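One way to do that check, as a rough sketch: run the same pattern over a couple of lines and see what comes back before pointing it at the whole file. The helper name `extract_os_browser` and the `SAMPLE` list are just for illustration, and the second sample line is made up (not from the log above) to show what a miss looks like.

```python
import re

# Same pattern as in the script above.
UA_RE = re.compile(r'"[^"]+?\(([^\);]+)[^"]+?\)[^"]+?([^\s]+)"$')

def extract_os_browser(line):
    """Return (os, browser) for one log line, or None if the pattern misses."""
    m = UA_RE.search(line)
    return (m.group(1), m.group(2)) if m else None

SAMPLE = [
    # First line of the log from the post.
    '192.168.1.100 - - [29/Apr/2011:06:02:54 +0400] "GET / HTTP/1.1" 500 612 '
    '"http://192.168.1.10/login" "Mozilla/5.0 (Windows NT 5.1; rv:2.0) '
    'Gecko/20100101 Firefox/4.0"',
    # Hypothetical line: a User-Agent without parentheses, which the
    # pattern cannot match, so it would land in the error counter.
    '192.168.1.100 - - [29/Apr/2011:06:03:32 +0400] "GET / HTTP/1.1" 200 4463 '
    '"-" "curl/7.21.0"',
]

for line in SAMPLE:
    print(extract_os_browser(line))
```

The first line should come back as ('Windows NT 5.1', 'Firefox/4.0') and the second as None. If lines from your real log that you care about return None, that is the part of the regex to adjust.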