Web Scraping with Puppeteer and ExpressJS

RYMS
Published in Level Up Coding
5 min read · Jan 17, 2021


My goal here is to build a simple web scraping service that helps me extract specific information from a website.

Photo by Evan Fitzer on Unsplash

Puppeteer

If you are proficient in JavaScript, one of the easiest libraries to use for web scraping is Puppeteer.

https://pptr.dev

Puppeteer launches a Chromium browser (headless by default) and drives it programmatically, so it can browse a website the way a user would. It can then inspect the elements on the page and ‘scrape’ whatever information you are looking for.

For example, if you are looking for a ‘name’ section, it would be logical for the website's developer to name the field or div ‘name’. It could be called something else entirely, though, so inspect the target website's source beforehand to find the identifier you need. Puppeteer can then find the ‘name’ section and return its contents to you.
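
To make the idea concrete, here is a minimal sketch, assuming Puppeteer is installed (`npm install puppeteer`) and that the target page marks the field with a class called `name`. The `scrapeText` helper, the `.name` selector and the example URL are hypothetical; check the real page source for the selector that actually applies.

```javascript
// Minimal sketch: open a page and read the text of one element.
// `scrapeText` is a hypothetical helper, not part of Puppeteer.
// The third argument defaults to the real library; it exists only so that
// a stand-in object can be passed in (for example, in tests).
async function scrapeText(url, selector, puppeteer = require('puppeteer')) {
  const browser = await puppeteer.launch(); // headless Chromium by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // $eval runs the callback inside the page against the first matching
    // element and throws if nothing matches the selector.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Always close the browser, even if navigation or lookup fails.
    await browser.close();
  }
}

// Possible usage (URL and selector are placeholders):
// scrapeText('https://example.com', '.name').then(console.log);
```

Closing the browser in a `finally` block is a deliberate choice here: a failed lookup would otherwise leave a Chromium process running.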

In addition to scraping (the verb, with a single P), Puppeteer can take screenshots and generate PDFs of a website, and it can automate form submissions, which is key in web scraping operations.
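
A rough sketch of those three features follows, assuming you already have a Puppeteer `page` object that has navigated to the target site. The helper names, the output file names, and the `#search` and `button[type="submit"]` selectors are hypothetical and would need to match the real page.

```javascript
// Sketch: save the current page as a PNG screenshot and as a PDF.
// `capturePage` and `baseName` are hypothetical names for this example.
async function capturePage(page, baseName) {
  await page.screenshot({ path: `${baseName}.png`, fullPage: true });
  await page.pdf({ path: `${baseName}.pdf`, format: 'A4' });
}

// Sketch: type a query into a form field and submit the form.
// The selectors below are placeholders for whatever the target page uses.
async function submitSearch(page, query) {
  await page.type('#search', query);
  // Start waiting for the navigation before clicking, so the page load
  // triggered by the submit is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);
}
```

Starting `waitForNavigation()` together with the click, rather than after it, avoids a race where the new page finishes loading before the wait begins.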

So, one might ask, how do we run Puppeteer? In this case, I will run it behind an ExpressJS server, which should be quite simple to set up and use.
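
As a rough idea of what such a server could look like, here is a minimal sketch, assuming Express is installed (`npm install express`). The `/scrape` path, the `url` and `selector` query parameters, and the `makeScrapeHandler` and `startServer` helpers are hypothetical choices for illustration. `scrape` stands for any async function that takes a URL and a selector and resolves to the scraped text.

```javascript
// Sketch: wrap a scraping function in an Express-style request handler.
// All names here are hypothetical, chosen for this example only.
function makeScrapeHandler(scrape) {
  return async (req, res) => {
    const { url, selector } = req.query;
    if (!url || !selector) {
      return res.status(400).json({ error: 'url and selector are required' });
    }
    try {
      const text = await scrape(url, selector);
      res.json({ url, selector, text });
    } catch (err) {
      // Report scraping failures (bad selector, timeout, ...) as a 500.
      res.status(500).json({ error: err.message });
    }
  };
}

// Wire the handler into an Express app. Express is required lazily so the
// handler above can be exercised without the dependency.
function startServer(scrape, port = 3000) {
  const express = require('express');
  const app = express();
  app.get('/scrape', makeScrapeHandler(scrape));
  return app.listen(port, () => console.log(`Listening on port ${port}`));
}
```

With the server started via `startServer(myScrapeFunction)`, a request such as `GET /scrape?url=https://example.com&selector=.name` would return the scraped text as JSON. A service like this should not be exposed publicly as-is, since it fetches whatever URL the caller supplies.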


Husband, son, father & multi-award winning app developer. 😊❤️ TypeScript and JavaScript. Ionic Developer Expert. Proud to be Malaysian. I tweet too @razmans