Parsing the domain from a URL in PHP

I need to write a function that parses the domain from a URL.

So, given

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.com,

and given

http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html

it should return google.co.uk.

Solutions

==============================
1. Check out parse_url().
```
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'
```
parse_url() doesn't handle badly mangled URLs very well, but it is fine if you generally expect decent URLs.
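
To make the "mangled URL" caveat concrete, here is a minimal sketch of how parse_url() reacts to input it cannot handle. The helper name `host_or_null()` is made up for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper (not part of PHP): returns the host, or null when parse_url() cannot find one.
function host_or_null(string $url): ?string
{
    $host = parse_url(trim($url), PHP_URL_HOST);
    // parse_url() yields false for seriously malformed URLs and null when the component is absent.
    return is_string($host) && $host !== '' ? $host : null;
}

var_dump(host_or_null('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html')); // string(10) "google.com"
var_dump(host_or_null('google.com/dhasjkdas')); // NULL: without a scheme the whole string is treated as a path
var_dump(host_or_null('http:///google.com'));   // NULL: parse_url() returns false for this input
```

Checking for a string result before using it avoids passing false or null further down the line.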
==============================
2.
```
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
```
This returns google.com for both http://google.com/... and http://www.google.com/....
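
One thing to watch with the one-liner above: str_ireplace() removes 'www.' wherever it appears in the host, not only at the start. A possible variant, assuming you only want to drop a leading www label, anchors the match with a regular expression. `host_without_www()` is a hypothetical name used for illustration.

```php
<?php
// Hypothetical helper: strips only a leading "www." label from the host.
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host found (malformed or scheme-less input)
    }
    // ^ anchors the match to the start, so "www." inside the name is left alone.
    return preg_replace('/^www\./i', '', $host);
}

echo host_without_www('http://www.google.com/a'), "\n";   // google.com
echo host_without_www('http://awww.example.com/a'), "\n"; // awww.example.com
echo str_ireplace('www.', '', 'awww.example.com'), "\n";  // aexample.com (the unanchored version)
```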

==============================

3. From http://us3.php.net/manual/en/function.parse-url.php#93983:

function getHost($Address) { 
   $parseUrl = parse_url(trim($Address)); 
   return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2))); 
} 

getHost("example.com"); // Gives example.com 
getHost("http://example.com"); // Gives example.com 
getHost("www.example.com"); // Gives www.example.com 
getHost("http://example.com/xyz"); // Gives example.com

==============================

4. The code that was meant to work 100% didn't quite cut it for me. I patched the example a little, but still found unhelpful code and problems with it, so I split it into a couple of functions (to avoid requesting the list from Mozilla every time, and to remove the cache system). This has been tested against a set of 1000 URLs and seems to work.

function domain($url)
{
    global $subtlds;
    $slds = "";
    $url = strtolower($url);

    $host = parse_url('http://'.$url,PHP_URL_HOST);

    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub){
        if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        }
    }

    return @$matches[0];
}

function get_tlds() {
    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    $content = file($address);
    foreach ($content as $num => $line) {
        $line = trim($line);
        if($line == '') continue;
        if(@substr($line[0], 0, 2) == '/') continue;
        $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
        if($line == '') continue;  //$line = '.'.$line;
        if(@$line[0] == '.') $line = substr($line, 1);
        if(!strstr($line, '.')) continue;
        $subtlds[] = $line;
        //echo "{$num}: '{$line}'"; echo "<br>";
    }

    $subtlds = array_merge(array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
        ), $subtlds);

    $subtlds = array_unique($subtlds);

    return $subtlds;    
}

Use it like this:

$subtlds = get_tlds();
echo domain('www.example.com'); //outputs: example.com
echo domain('www.example.uk.com'); //outputs: example.uk.com
echo domain('www.example.fr'); //outputs: example.fr

I know I should have turned this into a class, but I didn't have time.

==============================

5.

function get_domain($url = SITE_URL)
{
    preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
    return $_domain_tld[0];
}

get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr

==============================

6. I recommend using parse_url() to extract the host from the string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html.

However, if you need to extract the domain or its parts, use a package that relies on the Public Suffix List. Yes, you can wrap string functions around parse_url(), but they will sometimes produce wrong results.

I recommend TLDExtract for domain parsing. Here is sample code that shows the difference:

$extract = new LayerShifter\TLDExtract\Extract();

# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return google.com

$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'

# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'

$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';

parse_url($url, PHP_URL_HOST); // will return 'search.google.com'

$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'

==============================

7. Here is code that is meant to find just the domain name 100% of the time, since it takes the Mozilla sub-TLDs into account. The one thing you have to work out is how to cache that file, so that you don't query Mozilla every time.

For some strange reason, domains like co.uk are not in the list, so you have to add them manually with a small hack. It's not the cleanest solution, but I hope it helps someone.

//=====================================================
static function domain($url)
{
    $slds = "";
    $url = strtolower($url);

    $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
    if(!$subtlds = @kohana::cache('subtlds', null, 60)) 
    {
        $content = file($address);
        foreach($content as $num => $line)
        {
            $line = trim($line);
            if($line == '') continue;
            if(@substr($line[0], 0, 2) == '/') continue;
            $line = @preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
            if($line == '') continue;  //$line = '.'.$line;
            if(@$line[0] == '.') $line = substr($line, 1);
            if(!strstr($line, '.')) continue;
            $subtlds[] = $line;
            //echo "{$num}: '{$line}'"; echo "<br>";
        }
        $subtlds = array_merge(Array(
            'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
            'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
            'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
            ),$subtlds);

        $subtlds = array_unique($subtlds);
        //echo var_dump($subtlds);
        @kohana::cache('subtlds', $subtlds);
    }


    // Accept both http:// and https:// prefixes before the host.
    preg_match('/^(https?:[\/]{2,})?([^\/]+)/i', $url, $matches);
    //preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
    $host = @$matches[2];
    //echo var_dump($matches);

    preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    foreach($subtlds as $sub) 
    {
        // Escape the suffix and require a leading dot so that "co.uk" matches only as a whole suffix.
        if (preg_match('/\.' . preg_quote($sub, '/') . '$/', $host, $xyz))
            preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
    }

    return @$matches[0];
}
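
The caching step this answer leaves to the reader could be sketched without Kohana roughly as follows. This is only an illustration under assumptions: `cached_suffix_list()`, its `$cacheFile` and `$maxAge` parameters, and the stand-in fetcher are all hypothetical names, and a real fetcher would download the list instead of returning a fixed string.

```php
<?php
// Hypothetical helper: returns the cached list, refetching only when the cache file is missing or stale.
// $fetch is any callable that returns the raw list text; $cacheFile and $maxAge are illustrative parameters.
function cached_suffix_list(callable $fetch, string $cacheFile, int $maxAge = 86400): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }
    $raw = $fetch();
    file_put_contents($cacheFile, $raw, LOCK_EX);
    return $raw;
}

// Usage with a stand-in fetcher that counts how often it is called.
$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;
    return "co.uk\ncom.au\n";
};
$cacheFile = sys_get_temp_dir() . '/subtlds_' . uniqid() . '.txt';
cached_suffix_list($fetch, $cacheFile);
cached_suffix_list($fetch, $cacheFile);
echo $calls, "\n"; // 1: the second call is served from the cache file
unlink($cacheFile);
```

Passing the fetcher in as a callable keeps the download logic separate from the cache logic, which also makes the cache easy to try out offline.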

==============================
8. You can pass PHP_URL_HOST as the second parameter to parse_url().
```
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'
```

==============================

9.

$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2));

==============================

10. I've found that @philfreo's solution (referenced from php.net) gives pretty good results, but in some cases it triggers PHP "notice" and "Strict Standards" messages. Here is a fixed version of that code.

function getHost($url) { 
   $parseUrl = parse_url(trim($url)); 
   if(isset($parseUrl['host']))
   {
       $host = $parseUrl['host'];
   }
   else
   {
        $path = explode('/', $parseUrl['path']);
        $host = $path[0];
   }
   return trim($host); 
} 

echo getHost("http://example.com/anything.html");           // example.com
echo getHost("http://www.example.net/directory/post.php");  // www.example.net
echo getHost("https://example.co.uk");                      // example.co.uk
echo getHost("www.example.net");                            // www.example.net
echo getHost("subdomain.example.net/anything");             // subdomain.example.net
echo getHost("example.net");                                // example.net

==============================
11. parse_url() didn't work for me; it only returned the path. I switched to basics, using PHP 5.3+:
```
$url  = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/'))  $url = strstr($url, '/', true);
```
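
A likely reason parse_url() "only returned the path" is that the stored URL had no scheme, in which case the whole string is treated as a path. As an alternative sketch, assuming that is the situation, you can prepend a default scheme before parsing. `host_from_loose_url()` is a hypothetical name used for illustration.

```php
<?php
// Hypothetical helper: adds a default scheme when none is present so parse_url() can see the host.
function host_from_loose_url(string $url): ?string
{
    $url = trim($url);
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

var_dump(parse_url('www.example.com/page', PHP_URL_HOST)); // NULL
var_dump(parse_url('www.example.com/page', PHP_URL_PATH)); // string(20) "www.example.com/page"
echo host_from_loose_url('www.example.com/page'), "\n";    // www.example.com
echo host_from_loose_url('HTTPS://Example.com/x'), "\n";   // example.com
```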

==============================

12. I have edited it for you:

function getHost($Address) { 
    $parseUrl = parse_url(trim($Address));
    // Fall back to the first path segment when the input has no scheme (e.g. "www.example.com/x").
    if (isset($parseUrl['host'])) {
        $host = $parseUrl['host'];
    } else {
        $pathParts = explode('/', $parseUrl['path'], 2);
        $host = $pathParts[0];
    }
    $host = trim($host);

    $parts = explode( '.', $host );

    // Drop a leading "www" label, if present; all other labels are kept.
    if ($parts[0] == "www") {
        array_shift($parts);
    }
    return implode('.', $parts);
}

This strips only a leading www label: www.domain.ltd becomes domain.ltd, while a host such as sub1.subn.domain.ltd is returned unchanged.

==============================
13. Check out parse_url().

==============================

14. Here is my crawler, based on the answers above.

Crawler class code

class crawler
{
    protected $_url;
    protected $_depth;
    protected $_host;

    public function __construct($url, $depth = 5)
    {
        $this->_url = $url;
        $this->_depth = $depth;
        $parse = parse_url($url);
        $this->_host = $parse['host'];
    }

    public function run()
    {
        $this->crawl_page($this->_url, $this->_depth);
    }

    public function crawl_page($url, $depth = 5)
    {
        static $seen = array();
        if (isset($seen[$url]) || $depth === 0) {
            return;
        }
        $seen[$url] = true;
        list($content, $httpcode) = $this->getContent($url);

        $dom = new DOMDocument('1.0');
        @$dom->loadHTML($content);
        $this->processAnchors($dom, $url, $depth);

        ob_end_flush();
        echo "CODE::$httpcode, URL::$url <br>";
        ob_start();
        flush();
        // echo "URL:", $url, PHP_EOL, "CONTENT:", PHP_EOL, $dom->saveHTML(), PHP_EOL, PHP_EOL;
    }

    public function processAnchors($dom, $url, $depth)
    {
        $anchors = $dom->getElementsByTagName('a');
        foreach ($anchors as $element) {
            $href = $element->getAttribute('href');
            if (0 !== strpos($href, 'http')) {
                $path = '/' . ltrim($href, '/');
                if (extension_loaded('http')) {
                    $href = http_build_url($url, array('path' => $path));
                } else {
                    $parts = parse_url($url);
                    $href = $parts['scheme'] . '://';
                    if (isset($parts['user']) && isset($parts['pass'])) {
                        $href .= $parts['user'] . ':' . $parts['pass'] . '@';
                    }
                    $href .= $parts['host'];
                    if (isset($parts['port'])) {
                        $href .= ':' . $parts['port'];
                    }
                    $href .= $path;
                }
            }
            // Crawl only link that belongs to the start domain
            if (strpos($href, $this->_host) !== false)
                $this->crawl_page($href, $depth - 1);
        }
    }

    public function getContent($url)
    {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

        /* Get the HTML or whatever is linked in $url. */
        $response = curl_exec($handle);

        /* Check for 404 (file not found). */
        $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
        if ($httpCode == 404) {
            /* Handle 404 here. */
        }

        curl_close($handle);
        return array($response, $httpCode);
    }
}

// USAGE
$startURL = 'http://YOUR_START_URL';
$depth = 2;
$crawler = new crawler($startURL, $depth);
$crawler->run();

==============================

15. I'm adding this answer because this is the result that pops up most on Google...

You can use PHP to...

$url = "http://www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"

to grab the host, but not the private domain to which the host refers. (For example, www.google.co.uk is the host, but google.co.uk is the private domain.)

To grab the private domain, you need to know the list of public suffixes under which a private domain can be registered. This list is curated by Mozilla at https://publicsuffix.org/.

The code below works once an array of public suffixes has already been created. Simply call

$domain = get_private_domain("www.google.co.uk");

with the rest of the code...

// find some way to parse the above list of public suffix
// then add them to a PHP array, keyed by suffix (is_public_suffix() checks with isset())
$suffix = [... all valid public suffix ...];

function get_public_suffix($host) {
  // explode() replaces split(), which was removed in PHP 7
  $parts = explode(".", $host);
  while (count($parts) > 0) {
    if (is_public_suffix(join(".", $parts)))
      return join(".", $parts);

    array_shift($parts);
  }

  return false;
}

function is_public_suffix($host) {
  global $suffix;
  return isset($suffix[$host]);
}

function get_private_domain($host) {
  $public = get_public_suffix($host);
  $public_parts = explode(".", $public);
  $all_parts = explode(".", $host);

  $private = [];

  // take as many trailing labels as the public suffix has...
  for ($x = 0; $x < count($public_parts); ++$x) 
    $private[] = array_pop($all_parts);

  // ...plus one more label for the registered name
  if (count($all_parts) > 0)
    $private[] = array_pop($all_parts);

  return join(".", array_reverse($private));
}
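
For a runnable illustration of the same idea, here is a compact sketch that uses a tiny hand-picked suffix array instead of the full list. `$demoSuffixes` and `registrable_domain()` are hypothetical names; the real Public Suffix List is far larger and also contains wildcard and exception rules, which this sketch ignores.

```php
<?php
// Illustrative subset only, keyed by suffix so that isset() can be used for lookups.
$demoSuffixes = array_flip(['com', 'uk', 'co.uk', 'org.uk', 'au', 'com.au']);

// Hypothetical helper: returns the public suffix plus one more label, or null if that is not possible.
function registrable_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower($host));
    // Try the longest candidate suffix first by dropping labels from the left.
    for ($i = 0; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            // If the whole host is itself a suffix, there is no registered name in front of it.
            return $i === 0 ? null : $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null;
}

echo registrable_domain('www.google.co.uk', $demoSuffixes), "\n"; // google.co.uk
echo registrable_domain('www.google.com', $demoSuffixes), "\n";   // google.com
```

Passing the suffix array as a parameter rather than reading a global makes the function easier to test with different lists.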

==============================
16. This generally works very well as long as the input URL is not total junk. It removes the subdomain.
```
$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];
```
Example

Input: http://www2.website.com:8080/some/file/structure?some=parameters

Output: website.com
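
A slightly more defensive sketch of the same "keep the last two labels" approach is shown below. It guards against missing hosts and single-label hosts such as localhost, and the last call shows where the approach falls short. `last_two_labels()` is a hypothetical name used for illustration.

```php
<?php
// Hypothetical helper: keeps the last two labels of the host, guarding against short or missing hosts.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $parts = explode('.', $host);
    // array_slice() with a negative offset copes with single-label hosts such as "localhost".
    return implode('.', array_slice($parts, -2));
}

echo last_two_labels('http://www2.website.com:8080/some/file/structure?some=parameters'), "\n"; // website.com
echo last_two_labels('http://localhost/index.php'), "\n";                                      // localhost
echo last_two_labels('http://google.co.uk/dhasjkdas'), "\n";                                   // co.uk
```

The co.uk result is the reason other answers in this thread reach for the Public Suffix List.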
==============================
17. This combines the answers of worldofjr and Alix Axel into one small function that handles most use cases.
```
function get_url_hostname($url) {

    $parse = parse_url($url);
    return str_ireplace('www.', '', $parse['host']);

}

get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
```
==============================
18. Use the following (note that this gives the name of the server running the script, not the host of an arbitrary URL):
```
<?php
   echo $_SERVER['SERVER_NAME'];
?>
```

from https://stackoverflow.com/questions/276516/parsing-domain-from-url-in-php by cc-by-sa and MIT license
