RDAP 2018 Blog Post: Open Data in Chicago

Below is a blog post I wrote for the New England chapter of ASIS&T about my experience at the 2018 Research Data Access & Preservation summit in Chicago.

Attending the RDAP summit in Chicago was a great experience for me. I appreciated the diversity of speakers and viewpoints. As a new data management outreach librarian, it was valuable for me to be able to speak with my fellow librarians who have similar positions at other institutions.

Above: Obligatory selfie in front of the reflective “Bean” sculpture in Millennium Park; officially it’s called Cloud Gate, by sculptor Sir Anish Kapoor.

Having worked previously as an intern with the City of Boston’s Department of Innovation and Technology on their open data website redesign and communication, I was very interested to hear from Tom Schenk, Chief Data Officer from the City of Chicago. His talk was very engaging and he told many interesting data stories that stem from the development of a vibrant and engaged civic technology community in Chicago.

One goal of collecting large amounts of municipal data is to use data analytics to address infrastructure problems in the city and to improve the lives and health outcomes of Chicagoans. Much of the data analysis aims to predict future problems more quickly and with greater accuracy. Another goal is to prevent problems from occurring in the first place. For example, Tom Schenk said that underground city infrastructure is hit on average every 60 minutes. A 3D model of that infrastructure helps to decrease and prevent contact damage to things like pipes and wiring.

The city has also created a heatmap of rodent complaints. Using 31 different factors that correlate with rodent complaints over a seven-day period, the city can predict where the next increase in rodent complaints will occur. In a similar way, the city can use data analytics to find the food establishments with the highest risk of food poisoning. Using data analytics, the city can predict food violations 7 days earlier, which is important in preventing food poisoning among customers. Schenk also mentioned that the computer code for this model is open source and available on GitHub. Other projects tackled by the city’s data analytics team include predicting where West Nile virus may occur, predicting where E. coli may occur on city beaches, and the Lead Safe project, which aims to reduce children’s exposure to residential lead paint.


The Clear Water project was created thanks to about 1000 hours of volunteered time from Chicagoans involved in the civic tech community. According to Schenk, the project used open science that is fully reproducible and available on bioRxiv.

In addition, I enjoyed many of the talks from university data management librarians. Andrew Johnson from the University of Colorado talked about defining the role of the library in an institution’s research data management. He referenced the ARL SPEC Kit on data curation. He asked the question, “Are we doing things because we can or because we have a good reason to be doing them?” He cautioned against preservation for preservation’s sake. Finally, Andrew thought the library plays a unique role in the university, because it is the only place that understands the big picture of scholarly communication.

There were also many talks about FAIR data; FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable. In talking about big data, Ayoung Yoon mentioned the 3 Vs that characterize big data sets: volume, variety, and velocity. Wendy Kozlowski from Cornell University’s ITS talked about the development of a usable and interactive data storage finder. I thought this website was very impressive and well thought-out.

On my last day at RDAP, I particularly enjoyed the workshop titled “Building with the Carpentries.” It was an overview of how to get involved with The Carpentries at your local institution. I also had the opportunity to meet and talk with Tess Grynoch and Julie Goldman about the New England Library Carpentry community.

In conclusion, I really enjoyed my trip to the RDAP summit in Chicago. I particularly enjoyed speaking with fellow research data librarians from other universities. It was interesting to observe and ask about how the roles vary at each institution depending upon its needs, priorities, and organizational structure.

Some sources I mentioned:

https://github.com/Chicago/clear-water

http://publications.arl.org/Data-Curation-SPEC-Kit-354/

https://finder.research.cornell.edu/storage