Extract all links from a webpage using Python Selenium WebDriver

This Python script extracts the href links from a webpage using Selenium WebDriver. If a firewall or antivirus program blocks the request, whitelist the webpage you want to fetch links from, or temporarily turn off the antivirus software.

Note: Replace PATH TO WEBDRIVER EXE FILE (on line 4 of the code) with the path to chromedriver.exe on your PC.

If you haven’t installed ChromeDriver yet, download it from:

https://chromedriver.chromium.org/downloads

or read Install right version of chrome driver on Windows 10 PC
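Before launching the browser, it can help to confirm that the path you plan to use really points to a file. The following is a minimal sketch using only the Python standard library. The function name resolve_chromedriver is hypothetical, and the fallback to a chromedriver found on the system PATH is an assumption about how your machine is set up, not something the original script does.

```python
import shutil
from pathlib import Path


def resolve_chromedriver(path_hint):
    """Return a usable chromedriver path as a string, or None if none is found."""
    candidate = Path(path_hint).expanduser()
    if candidate.is_file():
        return str(candidate)
    # Assumption: fall back to whatever "chromedriver" resolves to on PATH, if anything
    return shutil.which("chromedriver")
```

If the function returns None, the driver is most likely not installed or the path is mistyped. Otherwise, the returned string can be used wherever the script expects PATH TO WEBDRIVER EXE FILE.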

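import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=webdriver.ChromeService("PATH TO WEBDRIVER EXE FILE"))  # Selenium 4 style

# Defined in the original script but never used below: Selenium drives a real browser, which sends its own headers.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}

URL = 'https://gethowstuff.com/android/'
driver.get(URL)
time.sleep(7)  # crude fixed wait so the page can finish loading
# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...) instead
elems = driver.find_elements(By.XPATH, "//a[@href]")
for elem in elems:
    print(elem.get_attribute("href"))
driver.quit()  # close the browser and stop the chromedriver process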
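The href values the script prints can contain duplicates, in-page anchors, and non-web links such as mailto: addresses. As a minimal sketch, assuming you collect the values into a list instead of printing them directly, a helper like the hypothetical clean_links below could tidy them using only the standard library. The function name and the example.com sample URLs are illustrative and not part of the original script.

```python
from urllib.parse import urldefrag, urlparse


def clean_links(hrefs):
    """Return unique http(s) URLs in first-seen order, ignoring #fragments."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # skip None and empty strings
            continue
        url, _fragment = urldefrag(href.strip())
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Hypothetical sample values standing in for elem.get_attribute("href") results
sample = [
    "https://example.com/android/",
    "https://example.com/android/#comments",
    "mailto:someone@example.com",
    None,
    "https://example.com/about/",
]
print(clean_links(sample))
# prints ['https://example.com/android/', 'https://example.com/about/']
```

In the script, this would mean building a list such as [elem.get_attribute("href") for elem in elems] and passing it to clean_links before printing.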

Sample Output: the script prints the href value of every link found on the page, one URL per line.

Also see: Python: Extract file name from URL path [with and without extension]

About the Author

SRINI S

A passionate blogger who loves to share solutions and best practices on WordPress hosting, issues and fixes, Excel VBA macros, and other apps.
