Script needed, paid - Forum of Successful Webmasters - GoFuckBiz.com

Pakotorn · 31.12.2012, 12:58

I have some text:

Code:

...
9|"http://domain.com"
0|http://domain.com/
1|http://twitter.com/index.html
2|http://yahoo.com.com/query.php?query=yahoo
3|http://domain.com/index.php?get=someget
4|http://www.google.com/index.htm
5|http://www.facebook.com/index.php
6|http://domain.com/w21d32/index.html?get=23
7|http://facebook.com/permalink.php?story_fbid=273
8|http://domaincrawler.com/domain.com/
countend
10|"http://domain2.com"
0|http://twitter.com/index.html
1|http://domain2.com/
2|http://www.facebook.com/index.php
3|http://domain2.com/index.htm
4|http://domain2.com/index.php?get=someget
5|http://domaincatalog.com/domain2.com/
6|http://yahoo.com.com/query.php?query=yahoo
7|http://domain2.com/w21d32/index.html?get=23
8|http://facebook.com/permalink.php?story_fbid=273
9|http://www.google.com/index.htm
countend
..

There are many ranges of lines, each running from a quoted URL to countend. Within each range I need to remove the lines that contain domain.com (that is, the quoted URL without http://), while keeping the line with the quoted URL itself.
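Read literally, the request can be sketched in a few lines of Python. This is only an illustration under assumptions: the name `strip_own_domain` and the shortened sample are made up here, "contains" is taken to mean a plain substring test, and lines are assumed to end in `\n`.

```python
import re

# One range: the quoted header line, the body lines, then the closing "countend".
BLOCK = re.compile(
    r'^(\d+\|"http://([^"\n]+)"\n)(.*?)(^countend$)',
    re.DOTALL | re.MULTILINE,
)


def strip_own_domain(text):
    """Inside every range, drop the body lines that contain the range's own domain."""
    def clean(match):
        header, domain, body, end = match.groups()
        # Substring test, as in the request: "lines containing domain.com".
        kept = [line for line in body.splitlines(True) if domain not in line]
        return header + "".join(kept) + end

    return BLOCK.sub(clean, text)


# Shortened sample in the same layout as the listing above.
sample = """\
9|"http://domain.com"
0|http://domain.com/
1|http://twitter.com/index.html
8|http://domaincrawler.com/domain.com/
countend
10|"http://domain2.com"
0|http://twitter.com/index.html
1|http://domain2.com/
countend"""

print(strip_own_domain(sample))
```

With this reading, `8|http://domaincrawler.com/domain.com/` is removed as well, because it contains domain.com as a substring.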

Price: 10 WMZ.

Happy New Year, everyone 8)!

DoctorFake · 02.01.2013, 20:19

Holy crap, what drama...

Barbados · 02.01.2013, 20:31

Please try this code:

Code:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
import re

in_file = 'text.txt'
out_file = 'text2.txt'

# One block: the quoted header URL, the body lines, then "countend".
block_re = re.compile(r'(?is)(\d*?)\|"http://(.*?)"\n(.*?)countend')

f = open(in_file)
fp = open(out_file, "w")

while True:
	# Accumulate lines up to and including the next "countend".
	acc = ''
	for line in f:
		acc += line
		if line.strip() == 'countend':
			break
	if acc.strip() == "":
		break
	for cn in block_re.findall(acc):
		lout = ''
		# Walk the body in its original order, first line included,
		# and keep only the lines that do not mention the block's domain.
		for ln in cn[2].strip().split("\n"):
			if cn[1] not in ln:
				lout += ln + "\n"
		fp.write(cn[0] + '|"http://' + cn[1] + '"\n' + lout + "countend\n")

fp.close()
f.close()

~~Алёша~~ · 02.01.2013, 23:15

Quote:

Originally posted by Pakotorn

I need it to remove many domains, not just one; that much I could have done with grep myself

In my script it removes not one domain but three: the ones that are domain.com and have no quotes.

Quote:

Within each range I need to remove the lines that contain domain.com

You got exactly what you asked for.
If you also need domain2.com, write the spec properly.
For a computer that is a huge difference, damn it.

Here, have it in grep:

grep -vE "\|http:\/\/domain([1-9]+)?\.com" sourse_file.txt
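To see what that one-liner keeps and drops, the same expression can be tried in Python. This is a rough sketch: it assumes Python's `re` treats this particular pattern the same way `grep -E` does, and the sample lines are borrowed from the first post.

```python
import re

# The expression from the grep command above, with the optional "\/" escapes dropped.
GREP_PATTERN = re.compile(r'\|http://domain([1-9]+)?\.com')

lines = [
    '9|"http://domain.com"',
    '0|http://domain.com/',
    '8|http://domaincrawler.com/domain.com/',
    '1|http://domain2.com/',
    '1|http://twitter.com/index.html',
]

# grep -v prints only the lines that do NOT match the pattern.
survivors = [line for line in lines if not GREP_PATTERN.search(line)]

for line in survivors:
    print(line)
```

The quoted header survives because of the `"` between `|` and `http://`. The domaincrawler.com line also survives although it contains domain.com, and the domain2.com line is dropped no matter which range it sits in, so the filter works on the whole file rather than per range.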

masolit · 03.01.2013, 01:15

Threw together some quick-and-dirty code:

Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

in_file = 'my_file.txt'
out_file = 'my_out_file.txt'

pattern = re.compile(r'\d+\|"http://(.*)"')

# Domain of the current block; None until the first quoted URL is seen.
search_domain = None

with open(in_file) as file_name, open(out_file, 'w') as output_file:
    for line in file_name:
        if '"' in line:
            # A quoted URL starts a new block: remember its domain.
            match = pattern.search(line)
            search_domain = match.group(1) if match else None
        # Keep quoted lines and lines that do not mention the current domain.
        if '"' in line or (search_domain and search_domain not in line):
            output_file.write(line)

masolit · 03.01.2013, 13:39

Tidied it up a little:

Python code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

in_file = 'my_file.txt'
out_file = 'my_out_file.txt'

with open(in_file) as input_file, open(out_file, 'w') as output_file:
    pattern = re.compile(r'\d+\|"http://(.*)"')
    search_domain = None
    for line in input_file:
        if '"' in line:
            # A quoted URL starts a new block: remember its domain.
            match = pattern.search(line)
            if match:
                search_domain = match.group(1)
        # Nothing is written until the first quoted URL has been seen.
        if search_domain is not None:
            if search_domain not in line or '"' in line:
                output_file.write(line)

DimaX · 07.01.2013, 16:24

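PHP code:

<?php

set_time_limit(0);

// Split the file into blocks; each block ends at "countend".
$chunks = explode('countend', file_get_contents('file.txt'));

$new = array();

foreach ($chunks as $chunk)
    {
        $strings = array_map('trim', explode("\n", trim($chunk)));

        // Look for the quoted URL anywhere in the block. Blocks without one
        // (the leading "..." or the text after the last countend) are left as they are.
        if (preg_match('#"http://(.+?)"#i', $chunk, $domain))
            {
                foreach ($strings as $i => $string)
                    {
                        // Drop lines that mention the domain, except the quoted URL itself.
                        if (strpos($string, $domain[1]) !== FALSE && strpos($string, '"http://'.$domain[1].'"') === FALSE)
                            {
                                unset($strings[$i]);
                            }
                    }
            }

        $new[] = implode("\n", $strings);
    }

$f = fopen('done.txt', 'w');
fwrite($f, implode("\n".'countend'."\n", $new));
fclose($f);

?>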

masolit · 08.01.2013, 00:31

And what if the file is 2 GB?
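One way to keep memory use flat regardless of file size, sketched under assumptions (the names `filter_lines` and `filter_file` are hypothetical and do not come from any post in this thread): read the file line by line and remember only the domain of the current range.

```python
import re

# Header line of a range: <number>|"http://<domain>"
HEADER = re.compile(r'^\d+\|"http://([^"]+)"\s*$')


def filter_lines(lines):
    """Yield the lines to keep; only the current line and domain are held in memory."""
    domain = None  # domain of the range being read, None outside a range
    for line in lines:
        header = HEADER.match(line)
        if header:
            domain = header.group(1)  # a new range starts, keep its header
            yield line
        elif line.strip() == "countend":
            domain = None             # the range is over
            yield line
        elif domain is None or domain not in line:
            yield line                # outside a range, or unrelated to the domain


def filter_file(src_path, dst_path):
    """Stream src_path into dst_path without loading the whole file."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_lines(src))
```

A call such as `filter_file('file.txt', 'done.txt')` would then process the file in a single pass, since a file object yields its lines lazily.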

DimaX · 08.01.2013, 08:02

Quote:

Originally posted by masolit

And what if the file is 2 GB?

Or 222 GB, sure, anything can happen.

In 99% of cases these files are not that huge. When they really are very large, that is usually stated in the spec.