[kwlug-disc] BASH compare items in two files
Raul Suarez
rarsa at yahoo.com
Wed Nov 3 14:55:08 EDT 2010
--- On Wed, 11/3/10, John Van Ostrand <john at netdirect.ca> wrote:
> From: John Van Ostrand <john at netdirect.ca>
> Subject: Re: [kwlug-disc] BASH compare items in two files
> To: "KWLUG discussion" <kwlug-disc at kwlug.org>
> Received: Wednesday, November 3, 2010, 2:17 PM
> ----- Original Message -----
> > ----- Original Message -----
> > > A higher level language (python, perl, C, etc) program would be your
> > > best bet as they already have the XML parsing and data handling
> > > libraries that will make this task a cinch.
> > >
> > > 1. Load the list of IDs into a searchable list
> > > 2. Using a SAX parser compare the ID of every node against the list
> > > 3. Done!
> >
> > Oops I should read more carefully. He asked what a real programmer
> > would do.
>
> So my take on it is this. Use the document object model in
> some language. I presume that's what sax does.
No, SAX and DOM are two different ways of handling XML.
In SAX the processing is sequential: the parser reads the document top down and reports each node as it reaches it.
In DOM you load the whole document into memory and then query (or traverse) it.
DOM gives more flexibility.
SAX is simpler to implement, needs fewer resources and is usually faster.
Usually with a SAX parser you just pass the file name and an event handler (callback) that is called for each node (event).
Here is a simple example:
http://www.devarticles.com/c/a/XML/Parsing-XML-with-SAX-and-Python/5/
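To make the SAX/DOM difference concrete, here is a minimal sketch in Python 3 using only the standard library. The sample XML, the NameLister class and the two helper functions are made up for illustration; they are not from the article above.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, only for illustration.
XML = b'<items><item name="a1"/><item name="b2"/><other name="a1"/></items>'


class NameLister(xml.sax.ContentHandler):
    """SAX style: the parser calls startElement once per opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # We only see one element at a time; nothing else is kept in memory.
        self.seen.append(name)


def sax_names(data):
    """Return the element names in the order the SAX parser reports them."""
    handler = NameLister()
    xml.sax.parseString(data, handler)
    return handler.seen


def dom_item_names(data):
    """DOM style: build the whole tree first, then query it."""
    doc = minidom.parseString(data)
    return [el.getAttribute("name") for el in doc.getElementsByTagName("item")]
```

With SAX you react to each node as it goes by; with DOM you can ask for all the "item" elements after the fact, at the cost of holding the whole document in memory.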
I think that for Richard's purpose the code would be even simpler than in that article, since he is just counting, not storing the data, so there is no need for the "characters" or "endElement" methods.
The __init__ and startElement methods would look something like this:
    def __init__(self, searchTerm):
        self.nodesDict = {}     # element name -> number of matching nodes
        self.setOfIDs = set()   # IDs we are looking for
        # Here put the code to load the file with IDs into self.setOfIDs.

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag.
        if name not in self.nodesDict:
            self.nodesDict[name] = 0
        id = attrs.get('name', "")
        if id in self.setOfIDs:
            self.nodesDict[name] += 1
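
For completeness, here is how those two methods might fit into a whole program. This is only a sketch in Python 3 under a few assumptions of mine: the IDs file has one ID per line, the ID sits in the 'name' attribute as in the snippet above, and the names IDCounter, load_ids and count_matches are hypothetical.

```python
import io
import xml.sax


class IDCounter(xml.sax.ContentHandler):
    """Counts, per element name, how many nodes carry an ID from the given set."""

    def __init__(self, ids):
        super().__init__()
        self.setOfIDs = set(ids)
        self.nodesDict = {}

    def startElement(self, name, attrs):
        # Make sure every element name seen gets an entry, even with zero matches.
        self.nodesDict.setdefault(name, 0)
        # Assumption: the ID is stored in the 'name' attribute.
        if attrs.get('name', "") in self.setOfIDs:
            self.nodesDict[name] += 1


def load_ids(lines):
    """Assumed format: one ID per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def count_matches(xml_source, id_lines):
    """xml_source may be a file name or an open file-like object."""
    handler = IDCounter(load_ids(id_lines))
    xml.sax.parse(xml_source, handler)
    return handler.nodesDict
```

You would call it with something like count_matches("data.xml", open("ids.txt")), where both file names are placeholders. Since the set lookup is constant time and SAX never holds the whole document, this should cope with large files.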
Raul Suarez
Technology consultant
Software, Hardware and Practices
_________________
Twitter: rarsamx
http://rarsa.blogspot.com/
An eclectic collection of random thoughts