Data Science for Social Good is now four years old. If we were a toddler, we’d be learning our alphabet and arithmetic. If we were a high school student, we’d be getting ready to graduate and go to college. If we were a grad student…well, we’d still be in the thick of it.
But in four years of running the fellowship, we’ve learned a lot about what skills, tools, and methods are important for doing data-driven projects with real-world organizations. We’ve fine-tuned our curriculum, from the deep dive of the first two weeks through the overall week-to-week structure of the summer. This summer, we’re going to open up this process to the public, sharing our teaching materials through GitHub and narrating our thought process and results here and at the DSSG website.
From the start, one of our core missions for the fellowship was to build a community of data scientists passionate about using their talents to create positive change. While we can only bring 40-some fellows to Chicago each summer, we want to include as many people as possible in our journey, in both learning and helping us build a curriculum. While it’s not as fancy and interactive as an online course (maybe someday), we hope this “Hitchhiker’s Guide” will provide a valuable reference and roadmap for people interested in doing data science for social good.
=====
“Data science” is an incredibly broad term, and adding on “social good” only makes matters worse. The DSSG advance team spent the first five months of 2016 brainstorming all the skills a data science for social good master/wizard/ninja/witch/etc. should know. While there are already countless different guides to becoming a data scientist, our version places more weight on the ability to understand social context and to work with partners from nonprofits and government agencies. Eventually, we came up with nine general areas:
- Programming, because you’ll need to tell your computer what to do, usually by writing code.
- Computer science, because you’ll need to understand how your data is structured as well as the algorithms you use to manipulate and interpret it.
- Math and stats, because everything else in life is just applied math.
- Machine learning, because you’ll want to build predictive or descriptive models that are able to incorporate new data and adapt over time.
- Social science, because you’ll need to understand the methods sociologists, economists, political scientists, and other domain experts use to study people, societies, and systems.
- Scoping and project management, because you’ll need to turn a real-world problem into something you can model, and then work with a partner organization and an interdisciplinary team to complete the project.
- Privacy, ethics, and security, because data is people.
- Communications, because if your partners don’t understand your work it won’t be implemented, and if more of the public doesn’t hear about your work it won’t be replicated.
- Social issues, because you care about people and need to think through the context you’re working in.
That’s all well and good, but what do you teach first? Our fellows come in with a wide spectrum of tech skills and experience, from experienced hackers to rookie coders. For the first week, our theme was “Make New Friends (Both Human and Technical).” We want to get everyone on common ground and started on the most essential skills for their summer projects, without overloading their brains and while still allowing ample time for everyone to get to know each other.
On the technical end, we need to make sure everyone has the same basic toolbox to work with. Our software setup session might not be the most glamorous activity of the first week, but it’s essential for avoiding bigger problems down the road. Before the fellowship starts, we send an e-mail to fellows about what they should install in advance, but hands-on help is still important, since our participants come from a variety of different technical backgrounds. You can peruse our full stack of software, including Python, R, Jupyter, PSQL, and more.
The second day of orientation covers a couple of critical skills for running a successful project: learning the version control tools Git and GitHub, and building a project pipeline. A pipeline serves as a “code design philosophy” for the summer, a way to both structure programming for the project and create a reusable process for future data projects beyond the fellowship. Day 3 covers three more tools that will be used often this summer: the command line, Jupyter (formerly the IPython notebook), and Pandas, the popular data exploration package for Python.
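To make the pipeline idea a little more concrete, here is a minimal sketch in plain Python. It is not the pipeline we teach in the session; the stage names (`drop_missing`, `add_rate`), the `run_pipeline` helper, and the toy rows are all made up for illustration. The only point is the shape: each step is a small function that takes rows in and hands rows back, so steps can be reordered, swapped, or reused on a later project.

```python
from typing import Callable

Rows = list[dict]
Stage = Callable[[Rows], Rows]


def drop_missing(rows: Rows) -> Rows:
    """Cleaning stage: discard rows that contain any missing (None) value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def add_rate(rows: Rows) -> Rows:
    """Feature stage: derive a per-person rate from two raw counts."""
    return [{**row, "rate": row["events"] / row["population"]} for row in rows]


def run_pipeline(rows: Rows, stages: list[Stage]) -> Rows:
    """Apply each stage in order, feeding its output into the next stage."""
    for stage in stages:
        rows = stage(rows)
    return rows


# Toy data, invented for this example only.
raw = [
    {"area": "A", "events": 5, "population": 100},
    {"area": "B", "events": None, "population": 250},
    {"area": "C", "events": 30, "population": 200},
]

result = run_pipeline(raw, [drop_missing, add_rate])
print(result)
```

In a real project the stages would more likely read from a database and work on Pandas data frames, but the same structure (a list of small, single-purpose steps) carries over.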
To balance out the technical tutorials, we also spend significant time the first week giving fellows a peek behind the curtain of DSSG. We provide a history of the program, talk about how we chose this year’s projects and scoped them to make sure they were a good fit, explain our application process, and talk about the skills we want everyone to leave with at the end of the summer. Those skills include project management, which we start teaching with sessions on how to communicate with project partners and use tools such as Trello and Slack to organize tasks and stay on schedule.
Finally, it’s important that we reserve plenty of time for everyone to get to know who they’ll be working with all summer. We do icebreakers such as “finding common ground through set theory,” where people are divided into small groups and have to find something that each subset of the group (and only that subset) has in common (shoutout to Jonah Ostroff). Another DSSG icebreaker tradition is the “spectrum game,” where everyone stands along a line according to their opinion on a divisive statement such as “data science is just statistics” or a matchup such as “Katy Perry vs. Taylor Swift.” The week’s climax is a scavenger hunt, which this year sent ten teams out to ten different Chicago neighborhoods to visit landmarks, learn local history, and (most importantly) bring back food. (We plan to cover our “social orientation” in a separate post.)
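For the curious, here is a quick sketch of why the set theory icebreaker gets hard fast. It assumes that “each subset” means every subset of two or more people (whether a group also counts single-person subsets is up to the facilitator), and the names are hypothetical.

```python
from itertools import combinations


def subsets_needing_common_ground(members):
    """Return every subset of two or more people, smallest subsets first."""
    return [
        set(combo)
        for size in range(2, len(members) + 1)
        for combo in combinations(members, size)
    ]


group = ["Ada", "Grace", "Alan", "Katherine"]  # hypothetical names
subsets = subsets_needing_common_ground(group)
print(len(subsets))  # 6 pairs + 4 triples + 1 full group = 11
```

Under that assumption, a group of n people has 2^n - n - 1 subsets to cover, so four people need 11 distinct things in common and five people need 26.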
One major difference in this year’s orientation is that we decided not to reveal the project teams until Week 2, primarily to facilitate bonding across the entire fellowship instead of immediately slicing everyone up into small groups. While that builds anticipation for the specifics of what each fellow will be doing this summer, the response from this year’s class was positive, and we think it sets the stage for valuable cross-team collaboration and cohort community throughout the summer.
By the end of the first week, we’re out of the starting gate on some of our main priorities for the summer: technical training, project management, and community building. With strong foundations in each of these areas, everyone’s prepared to start digging into projects in Week 2.
Next Week: Project assignments, partner visits, and the second wave of essential training.