Monday, February 04, 2008

Interview on U.S. environmental data and Web 2.0

Joab Jackson, EPA the Web 2.0 way, Government Computer News, February 4, 2008. An interview with Molly O'Neill, Chief Information Officer of the U.S. Environmental Protection Agency. (Thanks to FreeGovInfo.)

GCN: What is unique about the type of data EPA works with?

MOLLY O'NEILL: We have a lot of scientific data, so for us, data standards are really important. We also have a lot of regulatory data — that is, the data that industry gives to government. But we delegate much of the implementation of our regulations to the states. So the data goes from industry to the states to EPA. We have to ensure the data quality from the time the samples were pulled.

This is a role that the National Environmental Information Exchange Network plays. One of the most important things about the network is that it facilitates the exchange of the data among all the parties. The idea is that we don't touch it. It is all done in [Extensible Markup Language] and Web services. So we're not trying to reformat. We don't break interfaces or do double data entries, which may compromise the data quality or our decisions when we use this data for analysis. ...

GCN: Why do you think federal agencies have such a hard time disseminating information on the Web?

O'NEILL: For us, there are three reasons. One is that there is such a huge demand. We have so many stakeholders who want information in different ways. People can't get enough environmental information. And if they can't find it, they get upset. Sometimes the taxonomy is confusing.

Another issue is that some of the data is stored in older databases, which makes it harder to disseminate to people. As we update our old data systems, we are architecting them in a way that makes it easier to get the data in and out.

But the third reason is that we tend to organize data in a way that makes sense to us. Although this is changing a little bit now, at EPA we still primarily organize our data by how we are organized as an agency. People outside the agency don't think of things that way. They get frustrated because they want all the information about a subject, like climate change or environmental indicators. So where do they go? We're doing a lot to improve search on our site. When you do a search on the main page, it will give you folder options. When you type in "waste water," it will organize the results by folder topics like stormwater or industrial effluent.
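[Editor's illustration, not part of the interview: the folder-style grouping of search results described above can be sketched in a few lines of Python. The hit titles, topic labels, and the function name `group_by_folder` are all hypothetical; this is not EPA's implementation or data.]

```python
from collections import defaultdict

# Hypothetical search hits as (title, folder topic) pairs; not real EPA data.
HITS = [
    ("Stormwater permit basics", "stormwater"),
    ("Effluent guidelines overview", "industrial effluent"),
    ("Municipal stormwater programs", "stormwater"),
]

def group_by_folder(hits):
    """Group result titles under their folder topic, preserving hit order."""
    folders = defaultdict(list)
    for title, topic in hits:
        folders[topic].append(title)
    return dict(folders)

if __name__ == "__main__":
    for topic, titles in group_by_folder(HITS).items():
        print(f"{topic}: {len(titles)} result(s)")
```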

Also, because we're science-based, we get spelling issues or problems with chemical names. Someone might search for "trichloroetheline" instead of "trichloroethylene" and get totally different results. Therefore, we're adding a "Did you mean?" feature, similar to what Google provides.
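[Editor's illustration, not part of the interview: one simple way to approximate a "Did you mean?" suggestion is fuzzy string matching against a list of known terms. The sketch below uses Python's standard `difflib` module. The vocabulary, the 0.8 similarity cutoff, and the function name `did_you_mean` are assumptions made for the example; the interview does not say how EPA's feature works.]

```python
import difflib

# Hypothetical vocabulary of correctly spelled chemical names (illustrative only).
KNOWN_CHEMICALS = [
    "trichloroethylene",
    "tetrachloroethylene",
    "toluene",
    "benzene",
    "vinyl chloride",
]

def did_you_mean(query, vocabulary=KNOWN_CHEMICALS, cutoff=0.8):
    """Return the closest known term, or None when the query is already
    a known term or nothing in the vocabulary is similar enough."""
    normalized = query.strip().lower()
    if normalized in vocabulary:
        return None  # spelled correctly, no suggestion needed
    matches = difflib.get_close_matches(normalized, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

if __name__ == "__main__":
    print(did_you_mean("trichloroetheline"))  # prints: trichloroethylene
```

A production search engine would more likely draw suggestions from query logs or an indexed term dictionary; the cutoff here only controls how aggressive the suggestions are.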

But we also need to think about how we organize, present and disseminate data. One of the things we are doing here in my office is to start a national dialogue where we'll go out and ask people how they want us to disseminate our information.

We know we're going to get different ideas from different focus groups, but we need to hear them. And this will help us write a plan for addressing the issue of better information access. It's not just how the Web site is designed but how we service or disseminate the data. Do we want more e-mail lists? RSS feeds? We need to ask those kinds of questions. The hope is that we'll make some helpful changes along the way.