Python 파일 구문 분석 : 텍스트 파일에서 트리 만들기

나무를 만들기 위해 들여 쓰기가 된 들여 쓰기 된 텍스트 파일이 있습니다. 각 줄은 노드를 나타내며 들여 쓰기는 현재 노드가 자식 노드 인 깊이뿐만 아니라 노드를 나타냅니다.

예를 들어, 파일이 다음과 같이 보일 수 있습니다.

ROOT
   Node1
      Node2
         Node3
            Node4
   Node5
   Node6

ROOT에 1, 5, 6 세 개의 자식이 있고 Node1에는 하나의 자식이 있습니다 : 2, Node2에는 하나의 자식 : 3이 있음을 나타냅니다.

나는 재귀 적 알고리즘을 생각해 냈다. 그리고 그것을 프로그램했다. 그러나 작동한다. 그러나 그것은보기 흉하게 들린다. 그리고 특히 매우 조심스럽게 (노드 4에서 노드 5로 갈 때) 예제를 다룰 것이다.

"indent count"를 재귀의 기초로 사용하므로 indents = current depth + 1이면 한 단계 더 깊숙이 간다. 그러나 이것은 들여 쓰기가 적은 라인을 읽을 때마다 한 번에 한 레벨 위로 돌아와 매번 깊이를 확인해야한다는 것을 의미합니다.

여기에 내가 가진 것이있다.

def _recurse_tree(node, parent, depth):
    tabs = 0

    while node:
        tabs = node.count("\t")
        if tabs == depth:
            print "%s: %s" %(parent.strip(), node.strip())
        elif tabs == depth + 1:
            node = _recurse_tree(node, prev, depth+1)
            tabs = node.count("\t")

            #check if we have to surface some more
            if tabs == depth:
                print "%s: %s" %(parent.strip(), node.strip())
            else:
                return node
        else:
            return node

        prev = node
        node = inFile.readline().rstrip()

inFile = open("test.txt")
root = inFile.readline().rstrip()
node = inFile.readline().rstrip()
_recurse_tree(node, root, 1)

지금은 부모 노드가 각 행에 대해 올바른지 확인하기 위해 노드를 인쇄하고 있지만 더 깨끗한 방법이있을 수 있습니까? 특히 재귀 호출에서 돌아올 때 elif 블록의 경우.

해결법

==============================
1.큰 이슈는 내가 생각하기에 문제가되는 추악함을 일으킨 "선견자"입니다. 약간 단축 될 수 있습니다.

큰 이슈는 내가 생각하기에 문제가되는 추악함을 일으킨 "선견자"입니다. 약간 단축 될 수 있습니다.
```
def _recurse_tree(parent, depth, source):
    last_line = source.readline().rstrip()
    while last_line:
        tabs = last_line.count('\t')
        if tabs < depth:
            break
        node = last_line.strip()
        if tabs >= depth:
            if parent is not None:
                print "%s: %s" %(parent, node)
            last_line = _recurse_tree(node, tabs+1, source)
    return last_line

inFile = open("test.txt")
_recurse_tree(None, 0, inFile)
```
우리가 재귀를 말하고 있기 때문에 나는 전역 변수 (source와 last_line)를 피하기 위해 노력했다. 그것들을 일부 파서 객체에 멤버로 만드는 것은 훨씬 더 파이썬 적입니다.

==============================

2.당신이 재귀를 주장하지 않는다면, 이것 역시 작동합니다 :

당신이 재귀를 주장하지 않는다면, 이것 역시 작동합니다 :

from itertools import takewhile

is_tab = '\t'.__eq__

def build_tree(lines):
    lines = iter(lines)
    stack = []
    for line in lines:
        indent = len(list(takewhile(is_tab, line)))
        stack[indent:] = [line.lstrip()]
        print stack

source = '''ROOT
\tNode1
\t\tNode2
\t\t\tNode3
\t\t\t\tNode4
\tNode5
\tNode6'''

build_tree(source.split('\n'))

결과:

['ROOT']
['ROOT', 'Node1']
['ROOT', 'Node1', 'Node2']
['ROOT', 'Node1', 'Node2', 'Node3']
['ROOT', 'Node1', 'Node2', 'Node3', 'Node4']
['ROOT', 'Node5']
['ROOT', 'Node6']

==============================
3.이런 식으로 재귀를 사용하지 않을 것입니다. (좋습니다. Scheme과 같은 언어로 이것을 코딩한다면 어쩌면 제가 파이썬이 될 것입니다.) 재귀는 트리 모양의 데이터를 반복하는 데 유용하며 이러한 경우 일반 루프와 비교할 때 디자인을 크게 단순화합니다.

이런 식으로 재귀를 사용하지 않을 것입니다. (좋습니다. Scheme과 같은 언어로 이것을 코딩한다면 어쩌면 제가 파이썬이 될 것입니다.) 재귀는 트리 모양의 데이터를 반복하는 데 유용하며 이러한 경우 일반 루프와 비교할 때 디자인을 크게 단순화합니다.

그러나 여기에는 해당되지 않습니다. 데이터는 반드시 트리를 나타내지 만 순차적으로 형식이 지정됩니다. 즉, 간단한 일련의 선입니다. 이러한 데이터는 간단한 루프로 처리하는 것이 가장 쉽지만 원하는 경우 디자인을보다 일반적으로 만들 수 있습니다. 순차 판독기 (탭을 깊이 수준의 스펙으로 구문 분석 함), tree inserter (트리에 삽입 된 마지막 노드를 추적하여 노드를 특정 깊이 수준의 트리에 삽입 함) 및 트리 자체 :
```
import re

# *** Tree representation ***
class Node(object):
    def __init__(self, title):
        self.title = title
        self.parent = None
        self.children = []

    def add(self, child):
        self.children.append(child)
        child.parent = self

# *** Node insertion logic ***
class Inserter(object):
    def __init__(self, node, depth = 0):
        self.node = node
        self.depth = depth

    def __call__(self, title, depth):
        newNode = Node(title)
        if (depth > self.depth):
            self.node.add(newNode)
            self.depth = depth
        elif (depth == self.depth):
            self.node.parent.add(newNode)
        else:
            parent = self.node.parent
            for i in xrange(0, self.depth - depth):
                parent = parent.parent
            parent.add(newNode)
            self.depth = depth

        self.node = newNode

# *** File iteration logic ***
with open(r'tree.txt', 'r') as f:    
    tree = Node(f.readline().rstrip('\n'))
    inserter = Inserter(tree)

    for line in f:
        line = line.rstrip('\n')
        # note there's a bug with your original tab parsing code:
        # it would count all tabs in the string, not just the ones
        # at the beginning
        tabs = re.match('\t*', line).group(0).count('\t')
        title = line[tabs:]
        inserter(title, tabs)
```
여기에 붙여 넣기 전에이 코드를 테스트해야 할 때, 나는 메모리에 읽은 트리를 예쁜 프린트하는 아주 간단한 함수를 작성했습니다. 이 함수의 경우 가장 자연스러운 것은 당연히 트리가 트리 데이터로 표시되기 때문에 재귀를 사용하는 것이 었습니다.
```
def print_tree(node, depth = 0):
    print '%s%s' % ('  ' * depth, node.title)
    for child in node.children:
        rec(child, depth + 1)

print_tree(tree)
```

from https://stackoverflow.com/questions/6075974/python-file-parsing-build-tree-from-text-file by cc-by-sa and MIT license

'PYTHON' 카테고리의 다른 글

[PYTHON] PyEphem을 사용하여 새벽과 일몰 시간 계산하기 (0)	2018.11.08
[PYTHON] RegEx를 사용하여 Python에서 IP 주소 일치시키기 (0)	2018.11.08
[PYTHON] 동일한 메소드 이름을 가진 다른 경로의 Python 다중 상속 (0)	2018.11.08
[PYTHON] 파이썬에서 너무 많은 파일 열기 (0)	2018.11.08
[PYTHON] SSL3_GET_SERVER_CERTIFICATE 인증서 확인이 () .google.com을 요청할 때 Python에서 실패했습니다. (0)	2018.11.08

복붙노트

[PYTHON] Python 파일 구문 분석 : 텍스트 파일에서 트리 만들기

Python 파일 구문 분석 : 텍스트 파일에서 트리 만들기

해결법

1.큰 이슈는 내가 생각하기에 문제가되는 추악함을 일으킨 "선견자"입니다. 약간 단축 될 수 있습니다.

2.당신이 재귀를 주장하지 않는다면, 이것 역시 작동합니다 :

'PYTHON' 카테고리의 다른 글

티스토리툴바