A few months after I was hired to work at MDL, I started taking undergraduate classes in geographic information systems with the fantastic Don Boyes. My work on the Don Valley Project taught me about one particular subset of tools in ArcGIS, and through Prof. Boyes’ classes, I learned so much about representation, cartography, analysis, and decision-making using GIS. I also learned much more about why geospatial metadata is really, really important to those working with GIS, and these lessons resonate daily in the reference work I do in the library. Geospatial datasets are frequently packaged with their accompanying metadata, and for very good reasons.
What does geospatial metadata tell us? As in many of the other metadata schemas that we reviewed in class, the core elements typically tell us who produced the data, when, and for what reason. These identifying elements help us assess the reliability of the data, and allow us to pursue any questions we may have with the authoring party. They also describe how the data were created or collected, which allows users to evaluate their accuracy and precision, whether for the purposes of analysis or for choosing between multiple datasets. For example, a raster elevation dataset compiled from information collected by satellite would typically be more accurate and precise than the historic factory outlines I digitized from georeferenced maps in my previous post. However, given that no other digital maps of industry in the Lower Don Watershed exist, our datasets remain valuable, as long as the means of their production are understood.
Some additional metadata elements describe particular spatial parameters of datasets. The first of these properties, the dataset’s spatial extent, is typically autogenerated by GIS software – it is used to generate a bounding box that visually describes the distribution and limits of the information contained within the dataset, and can be used to help assess whether a dataset is useful given one’s area of interest. The other two important parameters, the projection and the datum, are critical for the proper use of a dataset: if this spatial reference information is missing, it can be quite a challenge to get a layer in alignment with others on a map, as one can be forced to guess. Without definitively knowing the projection and datum of a geospatial dataset, one cannot be confident that the data observed on the map are actually associated with a given place on the ground.
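To make the idea of a spatial extent concrete, here is a minimal Python sketch of what GIS software does, roughly, when it generates a bounding box and compares it with an area of interest. The coordinates and the function names (`bounding_box`, `boxes_overlap`) are my own illustrations and do not come from any particular GIS package, and the sketch assumes both boxes use the same coordinate system.

```python
from typing import Iterable, Tuple

# A bounding box as (west, south, east, north), all in the same coordinate system.
BBox = Tuple[float, float, float, float]


def bounding_box(points: Iterable[Tuple[float, float]]) -> BBox:
    """Return the smallest box that contains every (x, y) point."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share any area (or touch along an edge)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical (longitude, latitude) points standing in for digitized features.
factory_points = [(-79.36, 43.65), (-79.35, 43.67), (-79.34, 43.66)]
extent = bounding_box(factory_points)

# Hypothetical area of interest, also as (west, south, east, north).
area_of_interest = (-79.40, 43.60, -79.30, 43.70)

print(extent)
print(boxes_overlap(extent, area_of_interest))
```

The overlap test only means something if both boxes share a projection and datum, which is exactly why the spatial reference information matters so much.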
Geospatial datasets also frequently contain attribute information, linking locations on the ground to quantitative or qualitative data about these locations. A good data dictionary describes these attributes and the values that can be expected within them, and is very useful for understanding the contents of a dataset and what can be done with it. This attribute information can be contained within the metadata file itself or supplied as an accompanying text document.
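As a rough illustration of why a data dictionary is so useful, the Python sketch below checks one attribute record against a small dictionary of expected fields and values. Everything here is hypothetical: the field names, the allowed values, and the `check_record` helper are invented for the example and do not describe any real dataset.

```python
# A hypothetical data dictionary: each attribute has an expected type and,
# optionally, a set of allowed values.
data_dictionary = {
    "name": {"type": str},
    "industry": {"type": str, "allowed": {"brewery", "tannery", "brickworks"}},
    "year_built": {"type": int},
}


def check_record(record: dict, dictionary: dict) -> list:
    """Return a list of human-readable problems found in one attribute record."""
    problems = []
    for field, value in record.items():
        rule = dictionary.get(field)
        if rule is None:
            problems.append(f"{field}: not described in the data dictionary")
        elif not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
        elif "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems


# A hypothetical record with one undocumented field and one unexpected value.
record = {"name": "Example Works", "industry": "foundry", "year_built": 1890, "notes": "n/a"}
for problem in check_record(record, data_dictionary):
    print(problem)
```

Without the dictionary, there would be no way to tell whether "foundry" is a legitimate category or a data entry slip.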
There are many, many other characteristics that are described within complete geospatial metadata files, including licensing (very important when distributing data within an academic library!), update history and frequency, and spatial resolution, to name just a few. Metadata is typically stored according to the FGDC or ISO 19115/19139 standards, and many GIS programs contain XML metadata parsers and editors. Knowing how to create, evaluate, and maintain metadata is very useful for anyone working in an environment where geospatial data are being used – it helps develop a culture of trust between people and towards the information they work with, and when data can be effectively searched, discovered, and evaluated, the quality of research and analysis work increases (and the duplication of effort decreases). It’s also good for us to think critically about the data we produce, peruse, and consume anyway – metadata is of great use to such reflection.
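Because these metadata files are XML, even a few lines of code can pull out the core elements discussed above. The sketch below uses Python's standard `xml.etree.ElementTree` module on a tiny, made-up record that borrows FGDC-style short element names (`origin`, `pubdate`, `title`, and the `westbc`/`eastbc`/`northbc`/`southbc` bounding coordinates). The sample values and the `summarize` function are assumptions for illustration only; real records are far longer, and this is not how any particular GIS program does it.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up record using FGDC-style element names. All values are hypothetical.
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo>
      <origin>Example Project Team</origin>
      <pubdate>2012</pubdate>
      <title>Example factory outlines</title>
    </citeinfo></citation>
    <spdom><bounding>
      <westbc>-79.40</westbc><eastbc>-79.30</eastbc>
      <northbc>43.70</northbc><southbc>43.60</southbc>
    </bounding></spdom>
  </idinfo>
</metadata>"""


def summarize(xml_text: str) -> dict:
    """Extract who, when, what, and the bounding box from an FGDC-style record."""
    root = ET.fromstring(xml_text)
    cite = "idinfo/citation/citeinfo/"
    bounds = "idinfo/spdom/bounding/"
    return {
        "originator": root.findtext(cite + "origin"),
        "published": root.findtext(cite + "pubdate"),
        "title": root.findtext(cite + "title"),
        # Bounding box as (west, south, east, north) in decimal degrees.
        "bbox": tuple(
            float(root.findtext(bounds + tag))
            for tag in ("westbc", "southbc", "eastbc", "northbc")
        ),
    }


summary = summarize(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

A record that is missing any of the four bounding elements would make `float` raise an error here, which is a blunt but honest reminder that incomplete metadata causes problems downstream.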
In my subsequent posts, I will discuss what this course helped me learn about metadata discovery and interoperability, as there is much I didn’t know about the systems I work with on a daily basis.