The Python code below extracts all href links from a web page using Selenium WebDriver. If a firewall or antivirus program blocks the request, whitelist the page you want to fetch links from, or temporarily turn that software off.
Notes: Replace PATH TO WEBDRIVER EXE FILE in the code below with the path to chromedriver.exe on your PC.
If you haven’t installed ChromeDriver yet, download it from the link below:
https://chromedriver.chromium.org/downloads
or read Install right version of chrome driver on Windows 10 PC
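Before running the script, it can help to confirm that the path really points at a driver file. The following is a minimal sketch, not part of Selenium: `resolve_driver_path` and `fallback_name` are hypothetical names chosen for this illustration, and the fallback assumes the driver may also be reachable through the PATH environment variable.

```python
import shutil
from pathlib import Path


def resolve_driver_path(candidate, fallback_name="chromedriver"):
    """Return a usable driver path, or raise FileNotFoundError.

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    path = Path(candidate).expanduser()
    if path.is_file():
        return str(path)
    # Fall back to searching the directories listed in PATH.
    found = shutil.which(fallback_name)
    if found:
        return found
    raise FileNotFoundError(
        f"No driver at {candidate!r} and no {fallback_name!r} on PATH"
    )
```

The returned string can be used in place of the placeholder path when the driver is created, so a wrong path fails early with a clear message.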
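import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 syntax: the driver path is passed through a Service object.
driver = webdriver.Chrome(service=Service("PATH TO WEBDRIVER EXE FILE"))

# Note: this dictionary is never passed to the driver, so it does not affect the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1", "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate",
}

URL = "https://gethowstuff.com/android/"
driver.get(URL)
time.sleep(7)  # give the page time to finish loading

# Find every <a> element that has an href attribute and print its target.
elems = driver.find_elements(By.XPATH, "//a[@href]")
for elem in elems:
    print(elem.get_attribute("href"))

driver.quit()  # close the browser and end the WebDriver session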
Sample Output: the script prints one URL per line, one for each link found on the page.
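The links printed by the script may include repeats, links that differ only by a #fragment, and links to other sites. The following is a minimal sketch of one way to tidy the list using only the standard library; `clean_links` and `same_host_as` are hypothetical names for this illustration and are not part of Selenium.

```python
from urllib.parse import urldefrag, urlparse


def clean_links(hrefs, same_host_as=None):
    """Drop empty values and duplicates while keeping first-seen order.

    Hypothetical helper for illustration only. If `same_host_as` is given,
    only links whose host matches that URL's host are kept.
    """
    wanted_host = urlparse(same_host_as).netloc if same_host_as else None
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip None or empty values
        url, _fragment = urldefrag(href)  # treat page#a and page#b as one link
        if wanted_host and urlparse(url).netloc != wanted_host:
            continue
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

To use it with the script, collect the values first, for example hrefs = [elem.get_attribute("href") for elem in elems], and then print each item of clean_links(hrefs, same_host_as=URL).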
Also see: Python: Extract file name from URL path [with and without extension]