(Header: A typical piece of R code analyzing a table. Yes, an operation is repeated three times. No, it doesn’t work any other way.)

What project did you work on this summer for GOV/LAB?

I worked for the DATA/GOV Initiative under Professor Hidalgo, which seeks to empower local and national activists, non-governmental organizations (NGOs), and the government itself by making data on the transparency of municipal-level governments more readily accessible. The full research description is online here.

What would be the real world applications of this work?

By creating a user-friendly and accessible database of transparency data (such as agendas and notes from public meetings or municipal-level budgets), we hope that NGOs, politically minded citizens, and activists can easily see which local governments are transparent with their data and which ones need to improve. Greater transparency will allow citizens to see how government decisions affect them in their everyday lives and will empower them to be a more active part of their communities. In that regard, we believe that this project will improve government transparency across the board by making compiled data open, free, and easy to use and interpret.

What was the biggest challenge in your summer research?

Many local governments in the U.S. have a fair deal of autonomy, which is great for allowing them to operate unconstrained by higher levels of authority. It is, however, challenging from a data perspective, because it means there is a wide variety of websites, layouts, and presentations of data, making it rather difficult to write one program that will analyze them all. To combat this, we use a variety of different models to predict and analyze our data, but not all of them are appropriate for a given town, since some methods are more human-intensive than others and strain limited resources. For instance, while Amazon provides a wonderful “Mechanical Turk” (which utilizes human power to cross-check our tools), we don’t always have the time or budget to use it.

Furthermore, R (an otherwise excellent data analysis language that works beautifully with lists and tables) has some interesting interactions with null values. Basically, “null” is used whenever a variable needs an entry but has no value, and a dummy value (like zero) can’t stand in for it without changing what the data means. However, R really doesn’t like null at all, which causes very fun, Java-like null pointer issues if the code is written badly. A single malformed line of code has never been more cause for concern (and for a few cups of tea over hours of debugging).
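To make the null problem concrete, here is a minimal sketch in R. It is not code from the project: the town names, the amounts, and the helper value_or_na are all hypothetical and chosen only for illustration. It shows how NULL silently disappears from a vector, how a comparison against NULL fails, and one possible way to keep a "no value" placeholder (R's NA) without substituting a misleading zero.

```r
# Hypothetical budget figures; town_b has no value at all.
budget <- list(town_a = 1200, town_b = NULL, town_c = 0)

# NULL vanishes when values are combined, so the result has 2 entries, not 3.
dropped <- c(budget$town_a, budget$town_b, budget$town_c)

# Comparing NULL yields a zero-length logical, which `if` rejects with an error.
failed <- tryCatch(
  if (budget$town_b > 0) "funded" else "unfunded",
  error = function(e) "error"
)

# Hypothetical helper: swap NULL for NA so the missing entry keeps its slot.
value_or_na <- function(x) if (is.null(x) || length(x) == 0) NA_real_ else x

# vapply keeps one numeric value per town, including the missing one.
amounts <- vapply(budget, value_or_na, numeric(1))

# NA can then be skipped explicitly instead of being counted as zero.
average <- mean(amounts, na.rm = TRUE)
```

Using NA here is a design choice for this sketch: it keeps the table the same shape while still marking the entry as missing, which a zero would not.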

What is the best untold secret at MIT? Or, what is the most important advice on how to survive your first year at MIT?

There are no stupid questions. Literally. At some point everyone thinks of something they need to ask, but you’re at MIT, so ask away!

Ming Liu is a sophomore at MIT majoring in Finance. He has political experience, having worked with his local representative’s office for over a year and having helped with voter-motivation efforts during the last election. Outside of academia, Ming enjoys art, photography, and literature. Contact Ming at mingl(at)mit.edu.