Python Web Scraper Docker Image

Frankly, I couldn't come up with a cooler title than the one you just read 😛

I was thinking of writing a long post with step-by-step instructions for installing everything you need to scrape the web with Python. Then I got a bit smarter about it and made a Docker image you can download and run anywhere Docker is installed.

Click this link to download the image.

Click here to find out more about Docker and how you can use it to quickly deploy entire infrastructures (hint: it's freaking awesome and the already-here future of software deployment).

What's In The Box

Everything you need to scrape the web with Python!

The Docker image is based on Ubuntu 14.04 (my fav) and has Python 2.7.6 installed. Here's the package list by category:

Web Scraping

  • Scrapy 0.24.5
  • BeautifulSoup4 4.3.2
  • Requests 2.2.1
  • Fake UserAgent 0.0.7
  • wget 2.2

Data Wrangling & Analysis

  • Pandas 0.13.1
  • Matplotlib 1.3.1
  • Scipy 0.13.3
  • FuzzyWuzzy 0.5.0
  • PyParsing 2.0.3
  • SimpleJSON 3.6.5

Output

  • XlsxWriter 0.6.7
  • Python Logstash 0.4.2
  • Redis 2.10.3
  • PyMySQL 0.6.6

… and more assorted goodness.
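
Here's a minimal sketch of what a first scrape with the bundled BeautifulSoup4 could look like. The sample HTML and the `extract_links` helper are made up for illustration; they are not something that ships in the image.

```python
from __future__ import print_function  # keeps print() working on the image's Python 2.7

from bs4 import BeautifulSoup

# Hypothetical sample page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Scraping resources</h1>
    <ul>
      <li><a href="https://scrapy.org/">Scrapy</a></li>
      <li><a href="https://www.docker.com/">Docker</a></li>
      <li><a>Broken entry without a link</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(html):
    """Return a list of (link text, href) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=True)]


for text, href in extract_links(SAMPLE_HTML):
    print(text, "->", href)
```

To point it at a live page, fetch the HTML with Requests first (for example `requests.get(url).text`) and pass that string to `extract_links`.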

Updates Are On The Way!

I'm working on an update to the image with Python 3.x and updates to all the libraries. Keep a lookout on Docker Hub for that.

Until then, download the image, deploy it, and get scraping!

Comments

  1. Hey, can you give one example on your website? 🙂
