While I was performing web scraping, I discovered that I needed a specific string of characters to make an HTTP request to retrieve the information I was after. Unfortunately, I couldn’t find the string of characters anywhere except in the header of a specific AJAX request. In this post, I will share my experience in attempting to extract this string programmatically.
Introducing HAR
If you’ve ever explored the network tab of your browser’s devtools, you’ve probably come across the term HAR. HAR stands for HTTP Archive and is used to record all HTTP traffic exchanged between a web browser and a web server. Essentially, everything that you can access in the network tab is encompassed within an HAR file.
Generating HAR programmatically
Unfortunately, there isn’t a command-line interface tool available that can produce an HAR (HTTP Archive) for a particular URL. In fact, to capture the AJAX request, you must allow Javascript to execute and complete its tasks, which can only be accomplished through the use of a browser. Although there are numerous headless browsers available for programmatic use, I utilized Playwright on this occasion.
First, initialize a new NodeJS project:
$ npm init -y
Then install Playwright as a dependency:
$ npm i playwright
Then, create a new file named index.js:
Then in package.json, add a script command named start: