How to download files using selenium and python

Using selenium with python is a quick and relatively straightforward way to browse websites, pull data, and fill out forms on web pages. A few weeks ago, I was developing an app to download data from my electricity supplier. However, the data was only available as a PDF file, and I didn’t know how to download it.

After some quick Googling I managed to pull together a way to download the PDF files from my electricity supplier.


I’m not going to go into the details of how to set up selenium and start a browser object here. I’ll just assume you know how to do that at this point.

The trick to downloading files is to set the correct option parameters for the browser object. I use the Chrome webdriver for most of my tasks, so this code is for that setup.

The key is to set the option for the default download directory, and set what happens when a downloadable file is clicked by selenium. See the option code below:

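from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
        "download.default_directory": r"E:\your\download\folder", # Change default directory for downloads (raw string keeps the backslashes literal)
        "download.prompt_for_download": False, # Auto download the file
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True # Do not show PDF directly in chrome
        })
# DRIVER_PATH is the path to your chromedriver executable.
# Selenium 4 takes it via a Service object; Selenium 3 accepted it as the first positional argument.
browser = webdriver.Chrome(service=Service(DRIVER_PATH), options=options)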

First, we set up the options object to hold all our webdriver configuration options.

Second, we call the add_experimental_option() method on the options object. Note the term “experimental”: this may not function correctly in 100% of cases.

The code itself is pretty self-explanatory: we set the default download directory, turn off the download prompt, and tell Chrome not to display the PDF in the browser window. I’m not sure what exactly the directory_upgrade option does, but it seems to be necessary.
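If you want to reuse these settings, here is a minimal sketch that wraps them in a helper function. It also creates the download folder up front and turns it into an absolute path, since a relative path may not be honoured by Chrome. The name build_download_prefs is my own placeholder, not something selenium provides.

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Return a Chrome 'prefs' dict that saves downloads into download_dir.

    Hypothetical helper for illustration; not part of selenium.
    """
    # Resolve to an absolute path and make sure the folder exists.
    folder = Path(download_dir).resolve()
    folder.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(folder),   # where files are saved
        "download.prompt_for_download": False,       # no "Save as" dialog
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,  # download PDFs instead of viewing them
    }
```

The returned dict can then be passed as the second argument of options.add_experimental_option('prefs', ...), in place of the literal dict.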

Now that you have the necessary options, find the link to the file you want to download and click it:

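from selenium.webdriver.common.by import By
browser.find_element(By.ID, 'cphContentMain_ctl00_lnkCantSeeBill').click()  # find_element_by_id() is gone in recent Selenium 4 releases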

The file, in this case a PDF, should then be downloaded and saved into your default directory for future use or further processing.
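One thing to watch out for: click() returns straight away, so your script can race ahead before the file has finished downloading. Below is a rough sketch of one way to wait. It assumes the download folder is empty before you click, and that Chrome gives in-progress downloads a .crdownload suffix, which is its usual behaviour. The function name wait_for_download and the default timeout are my own choices.

```python
import time
from pathlib import Path


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the finished files in download_dir, or raise TimeoutError.

    Hypothetical helper: assumes the folder was empty before the download
    started and that partial downloads end in '.crdownload'.
    """
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        files = [p for p in folder.iterdir() if p.is_file()]
        partial = [p for p in files if p.suffix == ".crdownload"]
        # Done when at least one file exists and none are still in progress.
        if files and not partial:
            return sorted(files)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"download not finished after {timeout} s")
        time.sleep(poll)
```

Call it right after the click() with the same folder you set as download.default_directory, then carry on processing the returned files.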


I hope you found this information useful for downloading files using selenium with python! Let me know in the comments below if you have suggestions for improvement.

Credit for a large chunk of the code goes to this post on stackoverflow:

https://stackoverflow.com/questions/40654358/how-to-control-the-download-of-files-with-selenium-python-bindings-in-chrome

John
