Implementing Software Development Practices on a Data Team

Ever been frustrated that delivering on a project seemed to take forever? Do requirements change out from under you on a frequent basis? Are you tired of having a team member stomp on a day's worth of work by overwriting your file on the file server? Have you ever lost hours of work because your computer unexpectedly took a hiatus?

If you answered yes to any of the above questions, you and your data team could benefit from implementing software development practices.

Implement Agile Development Practices

Agile Development is a group of software development methods in which requirements and solutions evolve through collaboration between self-organizing, cross-functional teams.

No data project is static. Your team and processes shouldn't be either. There are a number of agile practices that can greatly benefit you.

Product Owner: This is the person who defines the requirements. It could be the client herself, or someone on the team who is representing the client. If your company develops a product, you probably already have this role filled by a product manager.

Cross-Functional Team: Back in the day, and to this day in some organizations, people are separated by function. If you want to rapidly develop solutions, that won't fly. What you need is a team composed of everyone necessary to create the product. On a data team this might be a data engineer, a data scientist, a web developer, and a designer.

Iterative Development: Agile teams work in cycles, typically 2 weeks in length. At the end of each cycle the next iteration of the product is delivered to the product owner for review.

Pair Programming: Long dismissed as a waste of “development resources” by managers unfamiliar with it, pair programming (two team members working together to develop code) can produce superior and highly stable solutions. It also helps bring junior team members up to speed faster.

Daily Standup: There are few meetings in agile projects, so this one is crucial. A standup is a 15-minute daily meeting where each person answers three questions:

  1. What did you do yesterday?
  2. What are you going to do today?
  3. What, if anything, is getting in your way?

Write Tests for Your Code

It's totally possible to click around an application and believe it's working because nothing goes wrong. When you have more than one person on a project, or a project of any significance or size, that approach quickly falls down. This is where automated testing comes into play.

Automated testing:

  • Ensures requirements are met – the tests describe what the code is supposed to do, and therefore what the application is supposed to do
  • Ensures that as you change the code, or when someone else does, it continues to work as expected
  • Makes refactoring your code easier
  • Allows other people on your team to trust your code

Every programming language worth its salt has packages for testing.
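
As one illustration, here is a minimal sketch that assumes Python and its built-in unittest module. The helper normalize_state and the sample values are hypothetical, made up for this example; the point is only that each test spells out one requirement the code has to keep meeting.

```python
import unittest


def normalize_state(value):
    """Hypothetical helper: tidy a free-text state code."""
    if value is None:
        return None
    cleaned = value.strip().upper()
    # Treat an empty string as missing rather than as a real value.
    return cleaned or None


class NormalizeStateTest(unittest.TestCase):
    # Each test states one requirement the helper must keep meeting.
    def test_strips_whitespace_and_uppercases(self):
        self.assertEqual(normalize_state("  ny "), "NY")

    def test_blank_becomes_none(self):
        self.assertIsNone(normalize_state("   "))

    def test_none_passes_through(self):
        self.assertIsNone(normalize_state(None))
```

Saved in a file, tests like these can be run with `python -m unittest`. If a teammate later changes normalize_state in a way that breaks one of these requirements, the failing test says so before anyone else relies on the change.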

Use Version Control

Version control is the management of changes to documents, computer programs, large web sites, and other collections of information.

Yes, there's software for that.

Back in the day, developers played fast and loose by writing code and copying it to a file server. That worked well until someone else on the team overwrote a day's worth of code. Geek anger.

Enter version control. Version control software:

  • Allows you to revert to previous versions of code
  • Makes handling file conflicts a lot easier
  • Reduces the chance of two (or more) people stomping on each other's code
  • Provides a full audit history of all changes to code or any other assets

Suggested Tool: Git, regardless of what programming language you're using. At a minimum learn how to branch, pull and push.

Implement Continuous Integration

Continuous integration is the practice, in software engineering, of merging all developer working copies to a shared mainline several times a day.

Continuous integration (CI) systems will merge and test your code on a daily or more frequent basis. The main benefit of this is that you can ensure that new code, provided it has tests written for it, isn't “breaking the build”.

Ever deployed something to a production server only to find out your team deployed a bug? CI can help with that.

One bit of advice from experience: test in a clone of production. Code can work in a pristine test environment and then break once it encounters production data. That's a no-go.
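
A small sketch of why this matters, with everything in it invented for illustration (the average_amount function, the field name, and both sets of rows): a pristine fixture only proves the happy path, while a production-like sample brings in the nulls, blanks, and stray text that real data tends to contain.

```python
def average_amount(rows):
    """Average the 'amount' field, skipping values that cannot be parsed."""
    amounts = []
    for row in rows:
        try:
            amounts.append(float(row.get("amount")))
        except (TypeError, ValueError):
            # Missing or malformed values show up in real data; skip them.
            continue
    return sum(amounts) / len(amounts) if amounts else None


# Pristine fixture: every row is well formed.
clean_rows = [{"amount": "10.0"}, {"amount": "20.0"}]

# Production-like sample: nulls, blanks, stray text, and a missing key.
messy_rows = [
    {"amount": "10.0"},
    {"amount": None},
    {"amount": ""},
    {"amount": "N/A"},
    {},
    {"amount": "30"},
]

assert average_amount(clean_rows) == 15.0
assert average_amount(messy_rows) == 20.0
```

A version of average_amount without the try/except would pass the first assertion and raise an error on the second, which is exactly the kind of failure a clone of production is meant to surface before deployment.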

Suggested Tools: Travis CI or Jenkins. I've used both and they work well.

Reuse Code, Yours or Others'

If someone already built it, don't build it again. I know you might have a burning desire to, but don't. Look for open source versions of what you need on GitHub, or start searching Google. More than likely someone's already built a package for your programming language of choice. And if no one has, create one and put it out there. It helps the community and shows you know what you're doing.

Your Turn

What software development practices are implemented on your data team and how are they working for you?


Image courtesy of Charis Tsevis
