Big Data & AI Accelerator
Author: Danger Crew
Last update: August 9, 2023

Meet the Web Scraping and Automation Tool – Selenium

Web scraping is one of my favorite Python programming tasks. Let's take a closer look at how powerful it is in this 5-minute article.

If you would like to learn Data Science or Programming, or change your career to the tech industry, visit the DANGER Education website and enroll in the courses that interest you now!

Assume you work in the marketing department of your company, and one of your routine jobs is monitoring the daily performance of marketing campaigns, including YouTube videos and Instagram posts. How many ‘Likes’ have they received so far? How many ‘Views’ have they achieved? Every day you open each video or post and jot down the numbers.

What a boring task! But it doesn't have to be, if you know web scraping.


Web Scraping

Luckily, if you know Python programming, you can automate this boring task with web scraping. The advantage of using web scraping to pull data from YouTube or Instagram is that the figures on these social media platforms change constantly, and a scraper collects the live numbers at the moment it runs: a real-time data pull! We can schedule the program to run every day (e.g. at 10 am) and store the data for future reference.
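A minimal sketch of the daily schedule, using only the standard library. The names `seconds_until` and `run_daily` are hypothetical, and `job` stands in for whatever scraping function you write; times are naive local times, so daylight-saving changes are not handled.

```python
import time
from datetime import datetime, timedelta


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next hour:00 (later today, or tomorrow)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's run time has already passed, so aim for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour: int = 10) -> None:
    """Call `job()` every day at hour:00 local time. Loops forever; stop with Ctrl+C."""
    while True:
        time.sleep(seconds_until(hour, datetime.now()))
        job()
```

A sleeping loop is the simplest option. On a server, an OS scheduler is usually more robust because it survives reboots, e.g. a cron entry such as `0 10 * * * python3 scrape.py` (script name hypothetical) or Windows Task Scheduler.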


Why Selenium

There are many web scraping tools, such as BeautifulSoup and Scrapy, that extract website information by locating HTML elements with XPath or CSS selectors, and they can handle most tasks. However, some websites rely on JavaScript or other techniques that make web scraping difficult. One common difficulty is that the required web elements are not available until the user clicks a button on the webpage.
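BeautifulSoup and Scrapy (without extra plugins) work on the HTML the server sends back and do not run the page's JavaScript. A minimal sketch using Python's built-in `html.parser` as a stand-in for those libraries shows the consequence on a hypothetical page where the like counter is an empty placeholder filled in by a script. The page content and the `text_by_id` helper are illustrative assumptions, not YouTube's real markup.

```python
from html.parser import HTMLParser

# Hypothetical raw HTML: the like counter is empty until a browser runs the script.
RAW_HTML = """
<html><body>
  <h1 id="title">Example video</h1>
  <span id="like-count"></span>
  <script>document.getElementById("like-count").textContent = "12,345";</script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text of the element with a given id. No JavaScript is executed.

    Toy parser: assumes the target element contains no nested tags.
    """

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def text_by_id(html: str, element_id: str) -> str:
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return parser.text.strip()


print(repr(text_by_id(RAW_HTML, "title")))       # 'Example video'
print(repr(text_by_id(RAW_HTML, "like-count")))  # '' (the number only exists after JavaScript runs)
```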

Here comes the solution: Selenium! It mimics user behavior and automates interaction with web browsers such as Google Chrome or Internet Explorer. With Selenium, we can overcome the limitation above and automate the web scraping task.
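A minimal sketch of the "click first, then read" pattern with the Selenium 4 API, assuming you already have a running `driver` (e.g. `webdriver.Chrome()`). The helper name `click_then_read` is hypothetical. `WebDriverWait` with `expected_conditions` pauses until the button is clickable and until the JavaScript-rendered element becomes visible, which is exactly what a static parser cannot do.

```python
def click_then_read(driver, button_xpath: str, target_xpath: str, timeout: float = 10.0) -> str:
    """Click a button like a user would, then wait for a new element and return its text."""
    # Selenium 4 imports, kept local so this helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # element_to_be_clickable returns the element once it is visible and enabled.
    wait.until(EC.element_to_be_clickable((By.XPATH, button_xpath))).click()
    # The target only appears after the click, so wait until it is visible.
    target = wait.until(EC.visibility_of_element_located((By.XPATH, target_xpath)))
    return target.text
```

Usage would look like `click_then_read(driver, "//button[@id='show-more']", "//div[@id='details']")`, where both XPaths are hypothetical placeholders for the ones on your target page. If an element never appears, `wait.until` raises `TimeoutException` after `timeout` seconds.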


A quick example – Pull data from YouTube video

Let’s say you are monitoring the performance of a YouTube music video by Keung To. You are interested in its Likes, Dislikes and Views.

First, we import the Selenium and Chromium Python packages into the program.

Next, we launch the Chrome browser, point it to the YouTube video link, locate each element by its XPath and store its value in a variable (e.g. Likes, Dislikes, Views, Title).
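A minimal sketch of this step with the Selenium 4 API. It is not the article's original code, which is not reproduced here. `VIDEO_URL` and every XPath are hypothetical placeholders: YouTube's markup changes often, so copy the real ones from your browser. Selenium 4 locates elements with `driver.find_element(By.XPATH, ...)` or, as here, through `WebDriverWait`; the older `find_element_by_xpath` helpers seen in many tutorials were removed in later 4.x releases. Dislikes are left out because YouTube stopped showing public dislike counts in late 2021.

```python
# Hypothetical placeholders: replace with the real video link and XPaths.
VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"
XPATHS = {
    "title": "//h1[@class='title']",
    "likes": "//span[@id='like-count']",
    "views": "//span[@id='view-count']",
}


def scrape_video_stats(url: str, xpaths: dict, timeout: float = 15.0) -> dict:
    """Open `url` in Chrome and return the visible text found at each XPath."""
    # Selenium imports, kept local so this file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # recent Selenium 4 releases can locate ChromeDriver themselves
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        stats = {}
        for name, xpath in xpaths.items():
            # The page is built by JavaScript, so wait until each element is visible.
            element = wait.until(EC.visibility_of_element_located((By.XPATH, xpath)))
            stats[name] = element.text
        return stats
    finally:
        driver.quit()  # close the browser even if an element is never found
```

Calling `scrape_video_stats(VIDEO_URL, XPATHS)` returns a dict mapping each name (`"title"`, `"likes"`, `"views"`) to the text displayed on the page.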

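Inspect the webpage elements to locate the XPath for each variable. In Chrome, right-click the number and choose Inspect, then right-click the highlighted node in the Elements panel and choose Copy > Copy XPath.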
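To see what an XPath expression selects, here is a toy check using the standard library's `xml.etree.ElementTree`, which understands a subset of XPath (paths must start with `.//` rather than `//`). The snippet and the `select_text` helper are hypothetical. Real YouTube pages are not well-formed XML, so this only illustrates the syntax; in practice, Selenium evaluates the full XPath in the browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a fragment of the page.
SNIPPET = """
<div id="info">
  <h1 class="title">Example video title</h1>
  <span id="like-count">12,345</span>
  <span id="view-count">678,901 views</span>
</div>
"""


def select_text(markup: str, path: str) -> list:
    """Return the text of every element matching an ElementTree-style XPath."""
    root = ET.fromstring(markup.strip())
    return [element.text or "" for element in root.findall(path)]


print(select_text(SNIPPET, ".//span[@id='like-count']"))  # ['12,345']
print(select_text(SNIPPET, ".//h1[@class='title']"))      # ['Example video title']
```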

Finally, we save the variables that hold the Likes, Dislikes and Views figures.
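The scraped values arrive as display text such as `12,345` or `678,901 views`, and large counts may be abbreviated (e.g. `12K`). Converting them to integers before saving makes later comparisons easy. A minimal sketch, assuming those display formats; the helper name `parse_count` is hypothetical. `Decimal` is used so that `1.2K` becomes exactly 1200 without floating-point rounding.

```python
import re
from decimal import Decimal

# Multipliers for abbreviated counts such as "12K" or "1.2M".
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Turn display text like '678,901 views' or '1.2K' into an integer."""
    # A number (with optional thousands commas and decimals), then an optional K/M/B suffix.
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([KMB]?)\b", text.upper())
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    return int(number * _SUFFIX[match.group(2)])


print(parse_count("12,345"))         # 12345
print(parse_count("678,901 views"))  # 678901
print(parse_count("1.2K"))           # 1200
```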

Dear My Friend, here you go! You can now automate this task by scheduling the program to run every day and adding each day's numbers as a new row in the table.
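One way to keep that table is a CSV file that gains one row per run. A minimal sketch using the standard `csv` module; the file layout, column names and the `append_daily_row` helper are hypothetical choices. The header is written only when the file is new or empty, so running it daily builds up a history you can open in Excel.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional

FIELDS = ["date", "title", "likes", "views"]


def append_daily_row(path: str, title: str, likes: int, views: int,
                     day: Optional[date] = None) -> None:
    """Append one dated row of stats to a CSV file, writing the header on first use."""
    file = Path(path)
    write_header = not file.exists() or file.stat().st_size == 0
    # newline="" is what the csv module expects, to avoid blank lines on Windows.
    with file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "date": (day or date.today()).isoformat(),
            "title": title,
            "likes": likes,
            "views": views,
        })
```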

Summary

With Python programming, we can easily pull data from the web and automate the task with web scraping, which can save us a tremendous amount of time. Despite its power, web scraping is just one of many applications of Python programming. Start your Python programming journey today and explore more!