XML Parsing with Python and minidom

See this guide: http://wiki.eclipse.org/index.php/Mylyn_Extensions

Eclipse menu: Help -> Software Updates... -> Add Site...

http://download.eclipse.org/tools/mylyn/update/incubator

asked by hWorks, 20 October 2009 at 19:36

3 answers

getElementsByTagName is recursive: you'll get all descendants with a matching tagName. Because your Topics contain other Topics that also have Titles, the call will return the lower-down Titles many times.

If you want only the matching direct children, and you don't have XPath available, you can write a simple filter, e.g.:

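import xml.dom.minidom

def getChildrenByTagName(node, tagName):
    # Yield only the direct element children of node whose tag matches ('*' matches any tag)
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and (tagName == '*' or child.tagName == tagName):
            yield child

document = xml.dom.minidom.parse("docmap.xml")  # e.g. the docmap.xml file from the question
for topic in document.getElementsByTagName('Topic'):
    title = list(getChildrenByTagName(topic, 'Title'))[0]  # or just next(getChildrenByTagName(topic, 'Title'))
    print(title.firstChild.data)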
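To make the difference concrete, here is a small self-contained sketch. The nested Topic/Title sample document and the helper names get_children_by_tag_name and text_of are hypothetical. It compares calling getElementsByTagName('Title') on every Topic, which repeats the deeper titles, with the direct-children filter, which yields one title per Topic.

```python
import xml.dom.minidom

# Hypothetical sample document: Topics nested inside Topics, each with its own Title
XML = """<DocMap>
<Topic><Title>Overview</Title>
  <Topic><Title>Basic Features</Title></Topic>
  <Topic><Title>About This Software</Title>
    <Topic><Title>Platforms Supported</Title></Topic>
  </Topic>
</Topic>
</DocMap>"""

def get_children_by_tag_name(node, tag_name):
    """Yield the direct element children of node whose tag matches ('*' matches any tag)."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE and tag_name in ('*', child.tagName):
            yield child

def text_of(element):
    """Return the concatenated data of the element's immediate text children."""
    return "".join(n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE)

doc = xml.dom.minidom.parseString(XML)
topics = doc.getElementsByTagName('Topic')

# Recursive search from every Topic: deeper Titles are collected once per ancestor Topic
naive = [text_of(t) for topic in topics for t in topic.getElementsByTagName('Title')]

# Direct children only: exactly one Title per Topic
fixed = [text_of(next(get_children_by_tag_name(topic, 'Title'))) for topic in topics]

print(len(naive), naive)  # 8 entries; 'Platforms Supported' appears three times
print(len(fixed), fixed)  # 4 entries, one per Topic
```

Both lists are in document order because minidom's getElementsByTagName walks the tree depth-first.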
answered 30 November 2019 at 21:36

Let me put this comment here...

Thanks for trying. It didn't work, but it gave me some ideas. The following works (same general idea; FWIW, nodeType is ELEMENT_NODE):

import xml.dom.minidom

dom = xml.dom.minidom.parse("docmap.xml")

def getChildrenByTitle(node):
    # Yield only the direct children of node whose local name is 'Title'
    for child in node.childNodes:
        if child.localName == 'Title':
            yield child

topics = dom.getElementsByTagName('Topic')
for node in topics:
    for a in getChildrenByTitle(node):
        # Same result as a.firstChild.data for a Title holding plain text
        title = a.childNodes[0].nodeValue
        print(title)
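Both a.childNodes[0].nodeValue and a.firstChild.data read only the first child node, so they assume every Title contains exactly one text node. A minimal sketch of a more tolerant helper, assuming you want all the text inside a Title (the get_text name and the sample markup are hypothetical):

```python
import xml.dom.minidom

def get_text(node):
    """Concatenate all text (and CDATA) below node, in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(get_text(child))
    return "".join(parts)

# Hypothetical Topic: one Title with mixed content, one empty Title
dom = xml.dom.minidom.parseString(
    "<Topic><Title>About <b>This</b> Software</Title><Title/></Topic>")
titles = dom.getElementsByTagName('Title')

first_child_only = titles[0].childNodes[0].nodeValue  # 'About ': text after <b> is lost
full_text = get_text(titles[0])                       # 'About This Software'
empty_text = get_text(titles[1])                      # '' (childNodes[0] would raise IndexError)
print(repr(first_child_only), repr(full_text), repr(empty_text))
```

For documents where every Title is plain text, the one-liner above is enough; the helper only matters for mixed or empty content.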
answered 30 November 2019 at 21:36

You could use the following generator to walk the document tree recursively and collect each title together with its nesting level:

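def f(elem, level=-1):
    # Yield (title_text, level) for every Title element below elem, in document order
    if elem.nodeName == "Title":
        yield elem.childNodes[0].nodeValue, level
    # Also descend from the Document node itself (DOCUMENT_NODE, not ELEMENT_NODE), so f(doc) works
    elif elem.nodeType in (elem.ELEMENT_NODE, elem.DOCUMENT_NODE):
        for child in elem.childNodes:
            yield from f(child, level + 1)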

If you test it with your file:

import xml.dom.minidom as minidom
doc = minidom.parse("test.xml")
list(f(doc))

you will get a list with the following tuples:

(u'My Document', 1), 
(u'Overview', 1), 
(u'Basic Features', 2), 
(u'About This Software', 2), 
(u'Platforms Supported', 3)

This is only a basic idea that needs fine-tuning, of course. If you just want leading spaces, you could produce them directly in the generator, but returning the level gives you more flexibility. You could also detect the starting level automatically (here it is only crudely handled by initializing the level to -1).
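As an illustration of using the level, here is a minimal sketch that turns the (title, level) pairs into an indented outline. It assumes a hypothetical nested Topic/Title document; the names titles_with_levels and outline are hypothetical too. The generator also descends from the Document node (whose nodeType is DOCUMENT_NODE, not ELEMENT_NODE), and the smallest level found is subtracted so the outline starts at column 0, which is one simple way of detecting the first level automatically.

```python
import xml.dom.minidom

def titles_with_levels(node, level=-1):
    """Yield (title_text, depth) for every Title element below node, in document order."""
    if node.nodeName == "Title":
        yield node.childNodes[0].nodeValue, level
    elif node.nodeType in (node.ELEMENT_NODE, node.DOCUMENT_NODE):
        for child in node.childNodes:
            yield from titles_with_levels(child, level + 1)

def outline(doc, indent="  "):
    """Render the titles as text, indenting each by its depth relative to the shallowest one."""
    pairs = list(titles_with_levels(doc))
    if not pairs:
        return ""
    base = min(level for _, level in pairs)
    return "\n".join(indent * (level - base) + title for title, level in pairs)

# Hypothetical sample document
doc = xml.dom.minidom.parseString(
    "<DocMap><Topic><Title>Overview</Title>"
    "<Topic><Title>Basic Features</Title></Topic>"
    "<Topic><Title>About This Software</Title>"
    "<Topic><Title>Platforms Supported</Title></Topic></Topic>"
    "</Topic></DocMap>")

print(outline(doc))
```

For this sample it prints "Overview" flush left, "Basic Features" and "About This Software" indented by two spaces, and "Platforms Supported" by four.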

answered 30 November 2019 at 21:36