Web scraping and data mining shouldn't require another degree to perform. However, presentation after presentation gives the impression that both demand a huge amount of technical prowess to do correctly. This couldn't be further from the truth. [Read more…]
There are many ways to create a web scraper in Python. One of the simplest is a combination of the requests library (to obtain web pages) and the Beautiful Soup library (to parse the pages and extract data). Both are third-party packages that need to be installed first. With my book, Python Business Intelligence Cookbook, being published soon, I was curious whether, and how, the pricing my publisher sets changes over time. In order to track it, I created a simple web scraper. Code below… [Read more…]
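To give a flavor of the requests plus Beautiful Soup combination, here is a minimal sketch rather than the scraper from the full post. The `span.price` selector, the sample HTML, and the function names `fetch_html` and `extract_price` are all assumptions made up for illustration; a real product page will need its own selector.

```python
import re

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text."""
    # Imported here so the parsing half can be tried offline without requests.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_price(html, selector="span.price"):
    """Return the first price found under `selector` as a float, or None.

    The default selector is a hypothetical example, not a real site's markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(selector)
    if tag is None:
        return None
    # Drop thousands separators, then pull out the first number.
    match = re.search(r"\d+(?:\.\d+)?", tag.get_text().replace(",", ""))
    return float(match.group()) if match else None


# Made-up HTML standing in for a product page.
sample_html = """
<html><body>
  <div class="book"><span class="price">$35.99</span></div>
</body></html>
"""

price = extract_price(sample_html)
print(price)
```

Against a live page you would call `extract_price(fetch_html(url))` on a schedule and log each result with a timestamp to see how the price moves.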
Frankly, I couldn't come up with a cooler title than what you just read 😛
I was planning to write a long post giving you step-by-step instructions on how to install everything you need to scrape the web with Python. Then I got a bit smarter about it and made a Docker image you can download and run literally anywhere. [Read more…]
Eighty percent or more of the time spent on data science projects is spent acquiring data, cleaning it, and preparing it for analysis. That data can come from a variety of sources, including APIs and individual web pages. However, not all data is created equal. Even once we have automated its acquisition, much of it requires lengthy cleaning and formatting before it can be used. In this course you will learn how to obtain, clean, and mash up data in preparation for analysis. [Read more…]
That's what I'll be talking about at DC Python on April 7th.
In this talk you’ll learn how to use pyparsing, a free Python module, to create and execute simple grammars for web scraping, application control and data wrangling. Dump the nested if statements and get parsing. Oh yes, there will be code, and lots of it, to get you started!
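As a taste of what a simple pyparsing grammar looks like, here is a small sketch that is not taken from the talk itself. The `name: $amount` line format and the names `field_name`, `number`, and `price_line` are assumptions invented for this example. It uses pyparsing's camelCase method names, which work in both the 2.x and 3.x releases; pyparsing 3 also offers snake_case equivalents such as `parse_string`.

```python
from pyparsing import Combine, Optional, Suppress, Word, alphanums, alphas, nums

# A field name: a letter followed by letters, digits, or underscores.
field_name = Word(alphas, alphanums + "_")("name")

# A number with an optional decimal part, converted to a float on match.
number = Combine(Word(nums) + Optional("." + Word(nums)))
number.setParseAction(lambda tokens: float(tokens[0]))

# The whole line: name ':' '$' number. Suppress drops punctuation from the result.
price_line = field_name + Suppress(":") + Suppress("$") + number("price")

# Hypothetical input line, as it might appear in scraped text.
result = price_line.parseString("ebook_price: $35.99")
print(result["name"], result["price"])
```

Each piece of the grammar reads like the thing it matches, which is the part that replaces the nested if statements.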