[kwlug-disc] wget and variable assignment
Khalid Baheyeldin
kb at 2bits.com
Thu Jun 3 13:39:05 EDT 2010
On Thu, Jun 3, 2010 at 1:24 PM, Richard Weait <richard at weait.com> wrote:
> On Thu, Jun 3, 2010 at 1:09 PM, Khalid Baheyeldin <kb at 2bits.com> wrote:
> > On Thu, Jun 3, 2010 at 12:58 PM, Richard Weait <richard at weait.com> wrote:
> >>
> >> I have a simple screen-scrape to do.
> >>
> >> From the command line it works fine
> >>
> >> wget -q -O - http://www.openstreetmap.org/stats/data_stats.html | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'
> >>
> >> it returns the plain number
> >>
> >> 262086
> >>
> >> Cool, now to add it to a script
> >>
> >> This works fine
> >> GETTEE=`wget -q -O - http://www.openstreetmap.org/stats/data_stats.html | grep "<td>Number of users" | sed -e 's/[:a-zA-Z <>/:]//g'`
> >> echo "GETTEE = $GETTEE"
> >>
> >> gives:
> >> GETTEE = 262086
> >>
> >> But. I want to grab some other data from the same page, so I want to
> >> wget once, then grep / sed a couple of times. And I'm breaking it.
> >> The page appears to have been stripped of its \n and so grepping the
> >> line I want is failing.
> >>
> >> GETTEE=`wget -q -O - http://www.openstreetmap.org/stats/data_stats.html`
> >> echo "GETTEE = $GETTEE"
> >>
> >> This returns a mess.
> >>
> >> The quick and dirty is to wget four times for four numbers, but I
> >> don't want to do that. How do I assign the wget to a variable and
> >> keep \n ?
> >
> > This is not fair for whoever is hosting the server.
>
> Right. That's why I'm here.
>
> > Do the wget once, using
> >
> > wget -q -O /tmp/osm.html http://www.openstreetmap.org/stats/data_stats.html
> >
> > Then parse that file as many times as you want for whatever you want.
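Here is a minimal sketch of that approach as a script. It assumes each figure sits on a single line of the page in the form <td>LABEL</td><td>NUMBER</td>; the function names fetch_stats and get_stat are made up for illustration, and any label other than "Number of users" would have to be checked against the real page source.

```shell
#!/bin/bash
# Sketch of "fetch once, parse many times".
# Assumption: each figure is on one line of the page, e.g.
#   <tr><td>Number of users</td><td>262086</td></tr>

STATS_URL=http://www.openstreetmap.org/stats/data_stats.html

# Download the page exactly once into the file named by $1.
fetch_stats() {
    wget -q -O "$1" "$STATS_URL"
}

# Print the number from the first line of file $2 containing "<td>$1".
# grep -m 1 stops at the first match. The sed uses | as its delimiter,
# so the / inside the bracket expression needs no escaping.
get_stat() {
    grep -m 1 "<td>$1" "$2" | sed -e 's|[:a-zA-Z <>/]||g'
}

# Example run (one download, as many lookups as you like):
#   fetch_stats /tmp/osm.html
#   USERS=$(get_stat "Number of users" /tmp/osm.html)
#   echo "USERS = $USERS"
```

Each extra figure is one more get_stat call against the same local file, so the server is hit only once per run.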
>
> But no way to avoid all those disk calls? The assignment to the
> variable kills the \n or something?
>
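Not quite. The variable keeps the newlines; command substitution only strips
the trailing ones. They turn into spaces when the variable is expanded
without quotes, because the shell then splits the value into words on
whitespace. Quote it, as in echo "$GETTEE" | grep "<td>Number of users",
and the lines survive.
That splitting is what makes things like this useful (at least in some use
cases, not yours):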
cat > myfile
first
second
third
^D
for WORD in $(cat myfile)
do
  # $WORD holds first, then second, then third
  echo "$WORD"
done
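A quick way to see where the newlines go: this sketch captures three lines into a variable (VAR is just an illustrative name) and prints it unquoted and quoted. It shows that the assignment keeps the newlines and that only the unquoted expansion flattens them.

```shell
#!/bin/bash
# Command substitution keeps inner newlines; only trailing ones are removed.
VAR=$(printf 'first\nsecond\nthird\n')

# Unquoted: the shell splits $VAR on whitespace, so echo receives three
# separate arguments and prints them on one line, separated by spaces.
echo $VAR        # first second third

# Quoted: echo receives one argument with the newlines intact.
echo "$VAR"      # prints first, second and third on separate lines
```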
But looking at the HTML page, there is only one occurrence of the string
you are looking for, so code-wise it makes no difference whether you grep
it from a variable or from a file.
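If you would rather skip the temporary file, a minimal sketch of the in-memory version: fetch once into a shell variable and always expand it quoted when feeding it to grep. stat_from_text is a hypothetical helper name, and it assumes each figure sits on one line of the page as <td>LABEL</td><td>NUMBER</td>.

```shell
#!/bin/bash
# Print the number from the first line of the text in $2 containing "<td>$1".
# Quoting "$2" keeps its newlines, so grep sees separate lines; printf adds
# back the final newline that command substitution stripped.
stat_from_text() {
    printf '%s\n' "$2" | grep -m 1 "<td>$1" | sed -e 's|[:a-zA-Z <>/]||g'
}

# Example run (one download, several lookups, no file on disk):
#   PAGE=$(wget -q -O - http://www.openstreetmap.org/stats/data_stats.html)
#   USERS=$(stat_from_text "Number of users" "$PAGE")
#   echo "USERS = $USERS"
```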
--
Khalid M. Baheyeldin
2bits.com, Inc.
http://2bits.com
Drupal optimization, development, customization and consulting.
Simplicity is prerequisite for reliability. -- Edsger W. Dijkstra
Simplicity is the ultimate sophistication. -- Leonardo da Vinci