Python Web Scraper Docker Image

I was going to write a long post with step-by-step instructions for installing everything you need to scrape the web with Python. Then I got a bit smarter about it and made a Docker image you can download and run anywhere Docker runs.


Three Pandas Tips for Pandas Noobs

Last week, after my talk at Data Science DC on how to create your first predictive model in Python, a fellow meetup member asked me about using Pandas for some data engineering work he was doing. In short, the changes he was making to his DataFrame didn’t seem to stick. After a bit of conversation I found out he was missing three key pieces of functionality in Pandas:

  1. Using inplace=True to make changes stick
  2. Applying a function to a single column of a DataFrame
  3. Applying a function that takes arguments to a DataFrame

While the Pandas documentation is very good, it isn’t entirely clear on how to use this functionality. So to help him, and to help you, I’ve created an IPython Notebook which shows you how to do all of this!
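As a quick illustration of the three items listed above, here is a minimal sketch. It is not the notebook from the post; the DataFrame, its column names, and the `price_with_tax` helper are made up for the example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"name": ["b", "a", "c"], "price": [2.0, 1.0, 3.0]})

# 1. Without inplace=True, sort_values returns a new DataFrame
#    and leaves df untouched, so the change appears not to stick.
df.sort_values("price")                # result is thrown away
df.sort_values("price", inplace=True)  # df itself is now sorted
# Reassigning works too: df = df.sort_values("price")

# 2. Apply a function to a single column.
df["name"] = df["name"].apply(str.upper)

# 3. Apply a function that takes extra arguments to the DataFrame.
#    axis=1 passes each row to the function; args supplies the rest.
def price_with_tax(row, rate):
    return round(row["price"] * (1 + rate), 2)

df["price_with_tax"] = df.apply(price_with_tax, axis=1, args=(0.10,))
```

Assigning the result of `apply` back to a column is what makes items 2 and 3 stick; calling `apply` on its own only returns a new object.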

Get the code below, and leave any questions you have in the comments section.


How to Create Your First Predictive Model in Python

If you’ve been reading books and blog posts on machine learning and predictive analytics and are still left wondering how to create a predictive model and apply it to your own data, this presentation will give you the steps you need to take to do just that.

Tonight at Data Science DC I will be presenting on the steps you need to take in order to go from raw data to a trained predictive model you can implement in a production system.


Data Acquisition and Wrangling with Python Workshop

Eighty percent or more of the time on data science projects goes to acquiring data, cleaning it, and preparing it for analysis. That data can come from a variety of sources, including APIs or individual web pages. However, not all data is created equal. Even once we have automated its acquisition, much of it requires lengthy cleaning and formatting before it can be used. In this course you will learn how to obtain, clean, and mash up data in preparation for analysis.


Using PyParsing For Web Scraping, Application Control and Data Wrangling

That’s what I’ll be talking about at DC Python on April 7th.

In this talk you’ll learn how to use pyparsing, a free Python module, to create and execute simple grammars for web scraping, application control and data wrangling. Dump the nested if statements and get parsing. Oh yes, there will be code, and lots of it, to get you started!
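To give a flavor of what a simple grammar looks like, here is a minimal sketch. It is not taken from the talk; the `key=value` input format and the variable names are assumptions chosen for the example. It uses the `parseString` spelling, which pyparsing 3 also offers as `parse_string`.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphas, nums

# Grammar: one or more key=value pairs, where the key is letters
# and the value is digits. Whitespace between tokens is skipped.
key = Word(alphas)
value = Word(nums)
pair = Group(key + Suppress("=") + value)
settings = OneOrMore(pair)

# parseAll=True makes the parse fail if any input is left over.
parsed = settings.parseString("width=640 height=480", parseAll=True)

# Each group holds [key, value]; convert the values to integers.
config = {name: int(number) for name, number in parsed}
```

The grammar reads close to the format it describes, which is the appeal over a chain of nested if statements and string splits.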

Also On The Agenda

On July 1st I’ll be speaking at my monthly meetup – Data Wranglers DC – about writing grammars in pyparsing for web scraping, data wrangling, and app control. This will be a longer and more hands-on version of my DC Python talk.

In addition, I’m talking with both Data Society and District Data Labs about teaching data wrangling classes. Stay tuned for more on those!


Add RSS Feeds To Your Dashing Dashboard

Dashing RSS Feeds Widget

You can imagine how excited I was when I came across Dashing, a Sinatra-based framework that lets you build beautiful dashboards. However, my excitement was short-lived: while working on my home dashboard (weather, RSS feeds, the daily Dilbert cartoon), I discovered the RSS widget I was using wouldn't pull in feeds from StackOverflow.


Join Me Wednesday for Growth Hacking and Knowledge Sharing

Growth Hacking

The world henceforth will be run by synthesizers, people able to put together the right information at the right time, think critically about it, and make important choices wisely.

– Shawn DuBravac, Chief Economist and Sr. Director of Research, Consumer Electronics Association

Growth hacking is a marketing technique that blends creativity, analytical thinking, and social metrics to gain exposure and sell products. Put another way, growth hacking is concerned with using data and the insights gained from data to get more exposure (marketing) and sell more products (sales) while keeping costs low and revenues high.

On Wednesday night I'll be talking about growth hacking at Nightowls Coworking @ Techshop in Arlington, VA, specifically:


Control F.lux With Alfred

What It Is

F.lux is an Alfred workflow for controlling aspects of F.lux, an app that makes the color of your computer’s display adapt to the time of day – warm at night and like sunlight during the day.

Activating Color Effects and Disabling Flux

Activate the Workflow

The keyword trigger flux will bring up the list of available workflow options.

Activate Darkroom

The darkroom color effect makes your screen a crazy red, with everything else in white and gray. Try it for yourself by selecting “Activate F.lux darkroom”.



Use Python 3 To Send Email With Attachments Using Gmail

Part of my job as a data analyst is to create reports and send them to various people. I had already used Python 2.7 to automate the creation and sending of the reports using Gmail; however, all the examples I found for Python 3 were either more complicated or simpler than what I needed. Below is the code that will allow you to easily create an email with attachments and send it using Gmail, all with Python 3.
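What follows is a minimal sketch of that approach using only the standard library, not the post's original code. The function names `build_message` and `send_via_gmail` are made up for the example, and it assumes the Gmail account allows SMTP sign-in (these days that usually means an app password rather than the regular account password).

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, subject, body, attachments=()):
    """Create an email with a plain-text body and file attachments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    for path in map(Path, attachments):
        # Guess the MIME type from the file name; fall back to a
        # generic binary type when it cannot be determined.
        ctype, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
        msg.add_attachment(
            path.read_bytes(),
            maintype=maintype,
            subtype=subtype,
            filename=path.name,
        )
    return msg


def send_via_gmail(msg, username, password):
    """Send a prepared message through Gmail's SMTP server over SSL."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, password)
        server.send_message(msg)
```

Building the message and sending it are kept separate so the message can be inspected or tested without touching the network. A call would look like `send_via_gmail(build_message(me, you, "Weekly report", "Report attached.", ["report.csv"]), me, password)`, where every argument is a placeholder.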


What You Must Do And Stop Doing To Be Healthy [Infographic]

We’ve all seen the men and women in fashion magazines who seem to make looking good easy. We’ve also seen how media companies manipulate photographs and cover models use steroids, giving us, the unknowing folk, a very incorrect perception of what it takes to “be healthy.”

One of the companies I respect and follow for their no-BS approach to health is Precision Nutrition. Recently they created an infographic showing just what you have to do, and stop doing, in order to get to different levels of body fat. For instance, the leaner you want to get, the more you have to strategize about everything you eat. That can lead to trouble when you go out with friends. If you get to go out, that is, given the amount of time you spend in the gym.

Check out the infographic below for the tradeoffs…