Web Scraping and Data Mining Course
Want to learn how to scrape and mine insights from data on the web but not sure if you need a computer science degree to do it?
Too often we see highly technical presentations that make web scraping and data mining seem like an esoteric science requiring another degree to perform. Modern programming languages provide the flexibility we need to create custom software, but in many instances this is simply unnecessary.
If you've been looking to gather data from the web for research, marketing, intelligence gathering or other reasons, this course is for you.
You will learn:
- How to use a set of freely available tools to gather and mine data from the web
- How to use Python to create custom web scrapers and analyze the data you gather
- How to obtain data from web APIs, including Twitter and LinkedIn
Unlike many courses and tutorials on the subject, this is a highly interactive course that provides individualized attention and instruction to students.
Who This Course Is For
This course is for people who would identify themselves as data analysts, data scientists, programmers or software developers. No technical experience is necessary, but basic knowledge of Python can help with the second half of the course.
This course is packed with content, but you'll be guided each step of the way to obtain data from the web and mine it for insights.
We will provide you with all the tools you need plus skeleton web scraping code, which you will build upon in the course.
The course will span 8 weeks and be held online. You will meet twice a week for 60 minutes. Each class will consist of a 50-minute lecture and 10 minutes of class Q&A.
Recordings will be available in the event that you have to miss a class, but we encourage students to attend as many classes as possible. Students will have a chance to interact with peers during the course as well, and receive feedback.
Course Format & Syllabus
The course will run from January 12 through March 3, 2016.
There will be weekly homework assignments that should take about 2-3 hours to complete. You are welcome to spend more time if you'd like, but we aim to keep it to this amount.
Robert will provide feedback on the assignments through reviews of your results, and if you need additional help, he will be available for office hours by appointment.
During the course we will cover the following topics:
Week One: Know your tools: basic HTML, CSS Selectors, web APIs, Google Chrome Developer Tools, Scraper, import.io, Data Science Studio, Python (Anaconda), Python libraries including Jupyter, BeautifulSoup4 and Pandas.
Week Two: Web scraping using desktop tools
Week Three: Web scraping using online tools
Week Four: Obtain data from Web APIs with existing tools
Week Five: Mining your data for insights using Data Science Studio
Week Six: Web scraping using Python
Week Seven: Obtaining data from Web APIs using Python
Week Eight: Mining your data for insights using Python
At the end of this course you'll be able to apply your knowledge to other websites and web APIs.
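To give a feel for what the Python weeks involve, here is a minimal sketch of the scrape-then-analyze workflow. It is not the course's skeleton code. It assumes a small made-up HTML snippet (the item names, prices and class names are hypothetical), and it uses only Python's standard library in place of BeautifulSoup4 and Pandas so that it runs without any extra installs.

```python
from html.parser import HTMLParser
from statistics import mean

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget</span> <span class="price">4.50</span></li>
  <li class="item"><span class="name">Gadget</span> <span class="price">12.00</span></li>
  <li class="item"><span class="name">Gizmo</span> <span class="price">7.50</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished records, one dict per <li>
        self._field = None    # which field the current <span> holds, if any
        self._current = {}    # record being built for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append(self._current)
            self._current = {}


def scrape_items(html):
    """Return a list of {"name": str, "price": float} records found in html."""
    parser = ItemParser()
    parser.feed(html)
    return [{"name": r["name"], "price": float(r["price"])} for r in parser.rows]


# "Scrape" the sample page, then "mine" it with a simple summary statistic.
items = scrape_items(SAMPLE_HTML)
average_price = mean(item["price"] for item in items)
print(items)
print(f"average price: {average_price:.2f}")
```

With BeautifulSoup4 the parsing step would likely shrink to a few CSS selector calls, and Pandas would take over the summary step. The overall shape (fetch, parse into records, summarize) stays the same.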
Your Instructor: Robert Dempsey
Robert Dempsey is a skill hacker specializing in data engineering, business intelligence, Python development and self-improvement. He has founded and built three startups in tech and marketing, developed and sold two SaaS applications, consulted for Fortune 500 and Inc. 500 companies, and spoken nationally and internationally on software development and agile project management.
He currently heads up the data team at ARPC, organizes Data Wranglers DC, teaches data engineering at District Data Labs, and is the author of Python Business Intelligence Cookbook, from Packt Publishing.
The price of the course is $645. Most students choose to pursue a 3-part payment plan.*
— The Application Period is Now Over —
If you wish to cancel your registration before the start of the course, we’ll refund the tuition you’ve paid minus a $215 non-refundable deposit. We do not refund tuition after the course has started.
*Most students choose to pursue a 3-part payment plan with a deposit of $215 and then 2 additional installments of $215. After you submit a down payment to confirm your spot we will work with you to confirm payment dates for the remainder of your tuition.
Q. What days are the classes being held?
A. Tuesdays and Thursdays from 7-8pm.
Q. How do I apply for the course?
A. Just fill out the application.
Q. What is the time period for applying for the course?
A. The application period is open until midnight on December 31, 2015.
Q. Why do you have an application?
A. I want to be sure that anyone who signs up for the course will be successful. The application allows me to speak with people beforehand to ensure they will get the most out of the class.
Q. When does the course start?
A. The first class will be held January 12, 2016.
Q. Will I be able to speak with you during the course or is this all lecture and homework?
A. You will have unlimited email communication with me throughout the course, and I am setting aside office hours for GoToMeeting sessions with students on an as-needed basis. In short, I want to ensure this course is interactive and you can apply what you learn to other scraping projects.
Q. Will you show ways to do scraping and not get banned by the site?
A. Yes, yes I will.
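The techniques taught in the course are not listed here, but two common courtesies are honoring a site's robots.txt and pausing between requests. Below is a minimal sketch of both using Python's standard library. The robots.txt content, the bot name and the URLs are all hypothetical, and the fetch function is a stand-in so the example runs without touching the network.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch it from the site.
ROBOTS_TXT = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "course-example-bot"  # hypothetical bot name


def build_rules(robots_text):
    """Parse robots.txt text into a rule set we can query."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def allowed_urls(rules, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [u for u in urls if rules.can_fetch(user_agent, u)]


def polite_delay(rules, user_agent=USER_AGENT, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = rules.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(urls, fetch, rules, sleep=time.sleep):
    """Fetch each permitted URL, pausing between consecutive requests."""
    delay = polite_delay(rules)
    results = {}
    for i, url in enumerate(allowed_urls(rules, urls)):
        if i > 0:
            sleep(delay)
        results[url] = fetch(url)
    return results


rules = build_rules(ROBOTS_TXT)
urls = [
    "https://example.com/products",
    "https://example.com/private/data",
    "https://example.com/about",
]

# Demo only: a fake fetch, and pauses recorded instead of actually slept.
pauses = []
pages = crawl(urls, fetch=lambda u: f"<html>{u}</html>", rules=rules, sleep=pauses.append)
print(list(pages))
print(pauses)
```

In real use you would drop the sleep argument so the default time.sleep applies, and pass a fetch function that actually downloads each page.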
Q. Can you recommend an alternative to Data Science Studio for 100M records?
A. With that amount of data you need to use “big data” technologies like Hadoop and Spark. If you don't have infrastructure in-house, you can deploy onto Amazon Web Services. However, if you aren't looking for purely “free” solutions, the enterprise version of Data Science Studio can work with Spark.
Q. If I can write web spiders and scrape pages with Scrapy, do I need this course?
A. If you are successfully scraping both simple and complex web pages and creating spiders, then no. I'd suggest taking an analysis-centric course instead.