How to Get Contents of an AJAX Request in a Script
31 Mar 2023 | web-scraping, playwright, scripting, bashWhile I was performing web scraping, I discovered that I needed a specific string of characters to make an HTTP request to retrieve the information I was after. Unfortunately, I couldn’t find the string of characters anywhere except in the header of a specific AJAX request. In this post, I will share my experience in attempting to extract this string programmatically.
Introducing HAR

If you’ve ever explored the network tab of your browser’s devtools, you’ve probably come across the term HAR. HAR stands for HTTP Archive and is used to record all HTTP traffic exchanged between a web browser and a web server. Essentially, everything that you can access in the network tab is encompassed within an HAR file.
Generating HAR programmatically
Unfortunately, there isn’t a command-line interface tool available that can produce an HAR (HTTP Archive) for a particular URL. In fact, to capture the AJAX request, you must allow Javascript to execute and complete its tasks, which can only be accomplished through the use of a browser. Although there are numerous headless browsers available for programmatic use, I utilized Playwright on this occasion.
First, initialize a new NodeJS project:
$ npm init -y
Then install Playwright as a dependency:
$ npm i playwright
Then, create a new file named index.js
:
const playwright = require('playwright');
(async () => {
const browser = await playwright.chromium.launch({});
const context = await browser.newContext({
recordHar: { path: 'result.har' }
});
const page = await context.newPage();
await page.goto('https://example.org', { waitUntil: 'networkidle' });
await context.close();
await browser.close();
})();
index.js
Then in package.json
, add a script command named start
:
{
...
"scripts": {
"start": "node index.js",
...
}
}
If everything is in order, a result.har
file will be generated when you run npm start
.
Parsing the contents of an HAR
Turns out, HAR is just a JSON file, so you can use jq
to parse the JSON contents and extract whatever you need.
{
"log": {
"version": "1.2",
"creator": {
"name": "Playwright",
"version": "1.31.2"
},
"browser": {
"name": "chromium",
"version": "111.0.5563.19"
},
"pages": [...],
"entries": [
{
"startedDateTime": "2023-03-31T05:33:58.720Z",
"time": 203.736,
"request": {
"method": "GET",
"url": "https://api.example.org",
"httpVersion": "HTTP/2.0",
"cookies": [],
"headers": [
{ "name": "foo", "value": "bar" },
...
],
"queryString": [],
"headersSize": 583,
"bodySize": 0
},
"response": {
"status": 200,
"statusText": "",
"httpVersion": "HTTP/2.0",
"cookies": [],
"headers": [],
"content": {
"size": 6261,
"mimeType": "text/html",
"compression": 0,
"text": "..."
},
"headersSize": 0,
"bodySize": 6957,
"redirectURL": "",
"_transferSize": 6957
},
...
},
Assuming you want to extract the header foo
from GET api.example.org
:
$ jq '.log.entries[] | select(.request.url | contains("api.example.org")) .request.headers[] | select(.name == "foo") .value')
Which will return bar
from the example above.
Bringing it altogether in a script
Under the same directory, create a bash script with the following contents:
#! /bin/bash
# Change current directory to where the script is
cd $(dirname $0)
# Generate HAR
npm start --silent
# Extract foo header value
cat result.har | jq '.log.entries[] | select(.request.url | contains("api.example.org")) .request.headers[] | select(.name == "foo") .value'
get_foo.sh
Then use chown
to make it executable:
$ chown +x get_foo.sh

Self-taught full-stack web dev based in Tokyo.
Occasionally wrecks servers through
self-hosting
and
homelab-ing.