Web scraping and data mining shouldn't require another degree to perform. Yet presentation after presentation gives the impression that doing them, and doing them correctly, demands serious technical prowess. That couldn't be further from the truth. [Read more…]
Ever been frustrated that delivering on a project seemed to take forever? Do requirements change out from under you on a frequent basis? Are you tired of having a team member stomp on a day's worth of work by overwriting your file on the file server? Have you ever lost hours of work because your computer unexpectedly took a hiatus?
If you answered yes to any of the above, you and your data team could benefit from adopting software development practices.
Over the past few months I started hearing the term “data pipeline” more and more at the local data meetups. Curious as to just what that meant, I looked it up. In this post I'm going to tell you what I found, and more importantly provide real-world examples of data pipelines you can use for your data projects. [Read more…]
Join me on November 16th, 2015 at 1:00 PM Eastern (10:00 AM Pacific) for a free webinar – How to Choose a Data Science Tool.
In this webinar you’ll discover:
- The four phases of selecting the tool that’s right for you and your team
- 10 key points to consider before you start your evaluation
- Tips on how to perform your research so you don’t waste your time during the evaluation phase
- How best to structure your time during the evaluation to keep productivity high and have the time you need to really test the tools
By attending the webinar you’ll receive:
- A recording of the webinar
- A one-page checklist to use during your evaluation
- A presentation template you can use to help “sell” your tool of choice to management
There are many ways to build a web scraper in Python. One of the simplest is to combine the requests library (to fetch web pages) with the Beautiful Soup library (to parse the pages and extract data). With my book – Python Business Intelligence Cookbook – being published soon, I was curious how, or if, the pricing my publisher sets changes over time. In order to track it, I created a simple web scraper. Code below… [Read more…]
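The requests-plus-Beautiful-Soup pattern looks roughly like this. This is a minimal sketch, not the scraper from the post: the sample HTML and the `price` CSS class are invented for illustration, and the publisher's real page markup will differ.

```python
from bs4 import BeautifulSoup

# Stand-in for a fetched page. In practice you would fetch it with:
#   html = requests.get("https://example.com/book-page").text
SAMPLE_HTML = """
<html><body>
  <div class="book-info">
    <h1>Python Business Intelligence Cookbook</h1>
    <span class="price">$39.99</span>
  </div>
</body></html>
"""

def extract_price(html):
    """Parse the page and pull out the text of the first price element."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("span", class_="price")
    return tag.get_text(strip=True) if tag else None

price = extract_price(SAMPLE_HTML)
print(price)  # $39.99
```

To track price changes over time, you'd run something like this on a schedule (cron, for example) and append each result, with a timestamp, to a CSV file.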
I know how hard it can be to take what you read in a book, especially a data science book, and apply it to the data you have to work with. I want to save you that experience.
If you pre-order my upcoming book – Python Business Intelligence Cookbook – you'll receive a metric ton of bonus materials and resources that will help you take what you learn in the book, and apply it to your data. Specifically, when you pre-order, you'll receive:
- Pre-webinar videos
- A seat at the two-hour live webinar
- Business intelligence project checklist
- List of additional resources
- 30-days of post-release email Q&A
Learn more about the book and this offer at pythonbicookbook.com.
Thank you for your support!
Frankly, I couldn't come up with a cooler title than what you just read 😛
I was planning to write a long post with step-by-step instructions for installing everything you need to scrape the web with Python. Then I got a bit smarter about it and made a Docker image you can download and run literally anywhere. [Read more…]
Last week after my talk at Data Science DC on how to create your first predictive model in Python, a fellow meetup member asked me about using Pandas for some data engineering work he was doing. In short, his DataFrame didn't seem to be applying the changes he was attempting to make. After a bit of conversation I found out he was missing three key pieces of functionality in Pandas:
- Using inplace=True to make changes stick
- Applying a function to a single column of a DataFrame
- Applying a function that takes arguments to a DataFrame
While the Pandas documentation is very good, it isn't 100% clear on how to use this functionality. So to help him – and to help you – I've created an IPython Notebook that shows you how to do all three!
Get the code below, and leave any questions you have in the comments section. [Read more…]
If you’ve been reading books and blog posts on machine learning and predictive analytics and are still left wondering how to create a predictive model and apply it to your own data, this presentation will give you the steps you need to take to do just that.
Eighty percent or more of the time on data science projects goes to acquiring data, cleaning it, and preparing it for analysis. That data can come from a variety of sources, including APIs and individual web pages. However, not all data is created equal: even after acquisition is automated, much of it requires lengthy cleaning and formatting before it can be used. In this course you will learn how to obtain, clean, and mash up data in preparation for analysis. [Read more…]