[kwlug-disc] Utility to parse HTTP response

Khalid Baheyeldin kb at 2bits.com
Fri Jan 31 09:15:01 EST 2025


On Fri, Jan 31, 2025 at 1:16 AM William Park via kwlug-disc <
kwlug-disc at kwlug.org> wrote:

> Hi all, (also posted to GTALUG)
>
> To build HTTP request, you use 'curl'.
>
> After you get the HTTP response, what utility do you use to extract the
> stuff you want?  JSON format has 'jq', and XML format has 'xmlstarlet'.
> I'm looking for something like that but for HTTP format.
>
> I know it's simple (HTTP response has 3 parts: first line, headers, and
> body), but I just want to reduce chance of typos.
>

Do you want to do this from bash, in shell scripts?
Or something else?

From shell scripts, I do this:

curl -s -k -L -i -o "$TMP" -H "User-Agent: $UA" "$URL" 2>&1

# Print the headers
sed -ne '1,/^^M$/p' "$TMP"

That ^M is a literal Ctrl-M (a carriage return, typed in the shell as
Ctrl-V then Ctrl-M). Because HTTP lines end in CRLF, the blank line that
ends the headers shows up as a line containing only a carriage return,
and that is what separates the header portion from the returned content.

For kwlug.org, for example, it returns this:

HTTP/1.1 301 Moved Permanently
Date: Fri, 31 Jan 2025 14:04:01 GMT
Server: Apache
X-Content-Type-Options: nosniff
Location: http://kwlug.org/
Cache-Control: max-age=31536000
Expires: Sat, 31 Jan 2026 14:04:01 GMT
Content-Length: 225
Content-Type: text/html; charset=iso-8859-1

That of course means the headers and the content are merged in
one output file, and you have to extract what you need with lots
of code.
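If you would rather do that splitting in Python than in sed, here is a
minimal sketch. The parse_response function and the sample response text
are illustrations, not part of any library; it just splits on the blank
CRLF line the same way the sed expression above does:

```python
# Split a raw HTTP response (status line + headers + body) into parts.
# The headers end at the first blank line, i.e. at CRLF CRLF -- the same
# carriage return the ^M in the sed example matches.

def parse_response(raw: str):
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# Example using a response like the one kwlug.org returns:
raw = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://kwlug.org/\r\n"
    "Content-Type: text/html; charset=iso-8859-1\r\n"
    "\r\n"
    "<html>moved</html>"
)
status, headers, body = parse_response(raw)
print(status)               # HTTP/1.1 301 Moved Permanently
print(headers["location"])  # http://kwlug.org/
print(body)                 # <html>moved</html>
```

Lower-casing the header names makes lookups predictable, since HTTP
header names are case-insensitive.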

If it is something more complex, then I use Python's requests library.
You need to use exceptions to handle things like timeouts, too many
redirects, etc.
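A minimal sketch of that exception handling, assuming the requests
package is installed (the fetch helper, URL, and timeout value here are
just placeholders for illustration):

```python
import requests

def fetch(url, timeout=10):
    """Fetch a URL, returning the Response object or None on failure."""
    try:
        resp = requests.get(url, timeout=timeout, allow_redirects=True)
        resp.raise_for_status()  # raises HTTPError on 4xx/5xx statuses
        return resp
    except requests.exceptions.Timeout:
        print("request timed out")
    except requests.exceptions.TooManyRedirects:
        print("too many redirects")
    except requests.exceptions.RequestException as e:
        # Catch-all base class for everything else requests can raise
        print("request failed:", e)
    return None

# Typical use:
# resp = fetch("https://kwlug.org/")
# if resp is not None:
#     print(resp.status_code, resp.headers.get("Content-Type"))
```

With requests, the status line, headers, and body are already separated
for you (resp.status_code, resp.headers, resp.text), which is exactly
the split the shell approach has to do by hand.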

A brief tutorial can be found here (and elsewhere)

https://www.geeksforgeeks.org/python-requests-tutorial/

-- 
Khalid M. Baheyeldin

