Google Search Scraping With Python

Python lets you do a great deal with very little code, and it has a large set of powerful libraries and packages. I hope to illustrate this by showing how you can scrape the results of a Google search with a short, simple Python script. Older scripts of this kind depended on the Google AJAX Search API, which no longer works; this is an alternative approach.


The script is built around two modules, ‘urllib’ and ‘requests’. The ‘get’ function of the ‘requests’ module fetches the specified URL, and the ‘urllib’ module is used to read the URLs found on the page so that they can be stored or output.
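As a rough illustration of how the two modules divide the work, here is a minimal sketch in Python 3. It is not the original script: the search endpoint, the assumption that result links are wrapped as ‘/url?q=...’ redirects, and the names ‘extract_target’ and ‘scrape’ are all assumptions made for this example, and Google's markup changes often enough that the filtering may need adjusting.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed search endpoint; not taken from the original script.
SEARCH_URL = "https://www.google.com/search"


def extract_target(href):
    """Return the destination of a '/url?q=...' redirect link, else None."""
    parts = urlsplit(href)
    if parts.path != "/url":
        return None
    targets = parse_qs(parts.query).get("q")
    return targets[0] if targets else None


def scrape(query, timeout=10):
    """Fetch one results page and return the outbound URLs found on it."""
    # Third-party imports live here so extract_target works without them.
    import lxml.html
    import requests

    response = requests.get(SEARCH_URL, params={"q": query}, timeout=timeout)
    response.raise_for_status()
    page = lxml.html.fromstring(response.text)
    # cssselect() only works when the cssselect package is installed.
    hrefs = (a.get("href") for a in page.cssselect("a"))
    targets = (extract_target(h) for h in hrefs if h)
    return [t for t in targets if t]
```

Calling scrape("python web scraping") would then return a list of URLs, provided the page still uses the link format assumed above. Keeping the URL handling in its own small function makes that assumption easy to change in one place.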

For this code to work, you will also need the lxml library and the CSSselect Python package. These are needed to process the markup of the results page. lxml is widely used in Python scripts, but it is a third-party library rather than part of the standard library, so it has to be installed if it is not already on your system. You can download the package and read its documentation here: http://lxml.de/
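Before running the script, it can help to check which of these packages Python can actually find. The following is a small sketch using only the standard library; the helper name ‘missing_packages’ is made up for this example.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the entries of `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# The three third-party packages mentioned in this post.
missing = missing_packages(["requests", "lxml", "cssselect"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages are available.")
```

This only reports whether each package can be located; it does not install anything.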

Now for CSSselect, you might get this error if the package is not installed on your system:

To fix this you might want to download the CSSselect package, which you can do from here: https://pypi.python.org/pypi/cssselect

To install this package run this command from the directory where the downloaded .whl file is located:

After doing so, you can run the script and/or use it in your own programs to scrape Google search results. Have fun!
