INF2186: Interoperability

I came into INF2186 knowing a fair bit about metadata and its importance, but one concept I hadn’t really considered, despite being around it in action for years, was metadata standards and interoperability. Data interoperability is something we talk about all the time in GIS (the Data Interoperability extension, which allows legacy formats and files created in other programs to be opened in ArcGIS, is a must-have!), but I didn’t realize that metadata interoperability is crucial to the catalogues through which we access most of our data. It turns out I had been contributing to that interoperability over the years I’ve worked at MDL: first by creating metadata simply by filling out the fields in our data inventory (not realizing at first that it was ISO 19139-compliant!), and then by writing documentation for our staff members clarifying what should be entered into each field, and how.

Here’s an example of a metadata record in our data inventory. This is a historical climate dataset for Canada, stored as annual, national-scale raster files for use in GIS software. These are the metadata properties we display to users:

[Screenshot: metadata record for a historical climate dataset in the MDL data inventory]

These provide information about the producer and nature of the dataset, its spatial reference parameters, and its licensing details, and include keywords and a description for discoverability. These properties are set with freeform text fields, date fields compliant with W3C-DTF, and picklists drawn from our own internal taxonomy vocabularies. There are a few more metadata properties that aren’t visible here, including one YES/NO property that allows our metadata to be harvested by Scholars GeoPortal. Here’s what the same dataset looks like over there (alas, I can’t permalink it):

[Screenshot: the same dataset as displayed in Scholars GeoPortal]

Check out that bounding box created with the tool I mentioned in my previous post!

If we click on the “Details” button, we get to see the formatted metadata that was harvested from the MDL inventory.

[Screenshot: the formatted metadata harvested from the MDL inventory, shown in the GeoPortal “Details” view]

Some of these fields, including contact information, are populated based on the fact that the metadata for this dataset was harvested from the MDL record. But hey, this is interoperability at work! Before I took this course, I didn’t really understand how this harvesting worked; I just knew that it did. That’s one more thing at work I now understand better thanks to INF2186.
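Those W3C-DTF date fields in the inventory are a small but concrete piece of this interoperability: any harvester can parse them without guesswork. As a sketch of my own (not our inventory’s actual validation code), here’s a minimal check for the date-only levels of the W3C-DTF profile:

```python
import re

# Date-only levels of the W3C-DTF profile: YYYY, YYYY-MM, YYYY-MM-DD.
# (The full profile also allows times with timezone offsets, omitted here.)
W3CDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

def is_w3cdtf_date(value):
    """Return True if value matches a date-only W3C-DTF form."""
    return bool(W3CDTF_DATE.match(value))
```

A record dated `2015-03-27` or simply `2015` passes; a locale-specific form like `27/03/2015` does not, which is exactly the point of agreeing on a scheme.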

INF2186: My favourite metadata tool

My favourite metadata tool remains perpetually open in a browser tab at work: Klokan Technologies’ Bounding Box Tool, an easy-to-use utility that generates bounding box coordinates (in other words, the latitude and longitude values that enclose a given space) for any area on the fly. This metadata is important for capturing the spatial extent of items and describing them consistently. For every paper map I catalogue into original MARC records, I record the extent in the 034 and 255 fields, and for every geospatial dataset in the MDL data inventory, the bounding box is entered into its own field in the metadata record. Given the different storage requirements of these two databases, it is very convenient that the tool offers a choice of 12 (!) different syntax encoding schemes for capturing coordinates – I personally use MARC VTLS (which pops the coordinates into the appropriate subfields for quick copying and pasting) and CSV for these respective applications.
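To give a sense of what two such encoding schemes look like for the same box, here’s a sketch that renders a decimal-degree bounding box as MARC-034-style hdddmmss coordinate strings and as a CSV line. This is my own illustration, not the tool’s code: the subfield layout reflects my understanding of the 034 field, and the CSV column order is an assumption.

```python
def dd_to_marc(value, is_lat):
    """Convert a decimal-degree value to MARC's hdddmmss form (e.g. W0793824)."""
    hemi = ("N" if value >= 0 else "S") if is_lat else ("E" if value >= 0 else "W")
    v = abs(value)
    deg = int(v)
    mins_f = (v - deg) * 60
    mins = int(mins_f)
    secs = round((mins_f - mins) * 60)
    if secs == 60:  # handle rounding spillover into the next minute/degree
        secs, mins = 0, mins + 1
    if mins == 60:
        mins, deg = 0, deg + 1
    return f"{hemi}{deg:03d}{mins:02d}{secs:02d}"

def bbox_encodings(west, south, east, north):
    """Render one bounding box in a MARC-034-like layout and as CSV."""
    marc = (f"$d{dd_to_marc(west, False)}$e{dd_to_marc(east, False)}"
            f"$f{dd_to_marc(north, True)}$g{dd_to_marc(south, True)}")
    csv = f"{west},{south},{east},{north}"  # column order is an assumption
    return marc, csv
```

For a rough box around Toronto, `bbox_encodings(-79.64, 43.58, -79.12, 43.86)` produces `$dW0793824$eW0790712$fN0435136$gN0433448` alongside `-79.64,43.58,-79.12,43.86` – the same extent, ready for two very different databases.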

While I’ve been using the Bounding Box Tool for several years, it was only in this course that I learned the term “syntax encoding scheme”. The flexibility that Klokan continues to build into the tool makes it fantastic for anyone working with geospatial resources and catalogues.

INF2186: Geospatially Speaking

A few months after I was hired to work at MDL, I started taking undergraduate classes in geographic information systems with the fantastic Don Boyes. My work on the Don Valley Project taught me about one particular subset of tools in ArcGIS, and through Prof. Boyes’ classes I learned a great deal about representation, cartography, analysis, and decision-making using GIS. I also learned much more about why geospatial metadata is really, really important to those working with GIS, and these lessons resonate daily in the reference work I do in the library. Geospatial datasets are frequently packaged with their accompanying metadata, and for very good reasons.

What does geospatial metadata tell us? Like many other metadata schemas that we reviewed in class, the core metadata elements typically tell us who produced the data, when, and for what reason. These identifying elements help us assess the reliability of the data contained therein, and allow us to pursue any questions we may have with the authoring party. They also describe how the data were created or collected, which allows users to evaluate their accuracy and precision for the purposes of analysis or selecting between multiple datasets. For example, a raster elevation dataset compiled from information collected by satellite would be more accurate and precise than the historic factory outlines I digitized from georeferenced maps in my previous post. However, given that no other digital maps of industry in the Lower Don Watershed exist, our datasets remain valuable, as long as the means of their production are understood.

Some additional metadata elements describe particular spatial parameters of datasets. The first of these, the dataset’s spatial extent, is typically autogenerated by GIS software: it defines a bounding box that visually describes the distribution and limits of the information in the dataset, and it can help a user assess whether the dataset covers their area of interest. Two other parameters, the projection and the datum, are critical for the proper use of a dataset: if this spatial reference information is missing, it can be quite a challenge to bring a layer into alignment with others on a map, as one can be forced to guess. Without definitively knowing the projection and datum of a geospatial dataset, one cannot be confident that the data observed on the map is actually associated with a given place on the ground.
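As a toy illustration of how software derives that extent (my own sketch, not any particular GIS’s implementation), the bounding box is simply the minimum and maximum of the coordinates in a layer:

```python
def bounding_box(points):
    """Compute the spatial extent (west, south, east, north) of (lon, lat) points."""
    lons, lats = zip(*points)
    return (min(lons), min(lats), max(lons), max(lats))

# Hypothetical points roughly along the lower Don River in Toronto:
extent = bounding_box([(-79.36, 43.65), (-79.33, 43.70), (-79.40, 43.68)])
```

Here `extent` comes out as `(-79.40, 43.65, -79.33, 43.70)` – the smallest rectangle enclosing every feature in the layer.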

Geospatial datasets also frequently contain attribute information, linking locations on the ground to quantitative or qualitative data about those locations. A good data dictionary contains information about these attributes and the different values that can be expected within them, and is very useful for understanding the contents of a dataset and what can be done with it. This attribute information can be contained within the metadata file itself, or in an accompanying text document.

There are many, many other characteristics described within complete geospatial metadata files, including licensing (very important when distributing data within an academic library!), update history and frequency, and spatial resolution, to name just a few. Metadata is typically stored according to the FGDC or ISO 19115/19139 standards, and many GIS programs contain XML metadata parsers and editors. Knowing how to create, evaluate, and maintain metadata is very useful for anyone working in an environment where geospatial data is used: it helps develop a culture of trust between people and towards the information they work with, and it increases the quality of research and analysis (while reducing duplicated effort) when data can be effectively searched, discovered, and evaluated. It’s also good for us to think critically about the data we produce, peruse, and consume anyway – metadata is of great use to such reflection.
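Because these standards are XML-based, pieces like the geographic extent are straightforward to pull out programmatically. Here’s a sketch that reads the bounding box from a minimal, hand-made fragment in the style of an ISO 19139 record; the element names follow the gmd/gco namespaces as I understand them, and a real record is of course far larger:

```python
import xml.etree.ElementTree as ET

# A minimal, hand-made fragment in the style of ISO 19139 (illustration only).
SAMPLE = """<gmd:EX_GeographicBoundingBox
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:westBoundLongitude><gco:Decimal>-141.0</gco:Decimal></gmd:westBoundLongitude>
  <gmd:eastBoundLongitude><gco:Decimal>-52.6</gco:Decimal></gmd:eastBoundLongitude>
  <gmd:southBoundLatitude><gco:Decimal>41.7</gco:Decimal></gmd:southBoundLatitude>
  <gmd:northBoundLatitude><gco:Decimal>83.1</gco:Decimal></gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>"""

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

def read_bbox(xml_text):
    """Extract the four bounding-box values from an EX_GeographicBoundingBox."""
    root = ET.fromstring(xml_text)
    def val(tag):
        return float(root.find(f"gmd:{tag}/gco:Decimal", NS).text)
    return {"west": val("westBoundLongitude"), "east": val("eastBoundLongitude"),
            "south": val("southBoundLatitude"), "north": val("northBoundLatitude")}
```

Running `read_bbox(SAMPLE)` recovers the extent as plain numbers, which is roughly what a harvester or catalogue does when it ingests one of our records.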

In my subsequent posts, I will discuss what this course helped me learn about metadata discovery and interoperability, as there is much I didn’t know about the systems I work with on a daily basis.

INF2186: My First Metadata

I first learned the word “metadata” in 2008, when I was hired at what is now known as the Map & Data Library at the University of Toronto as a research assistant on the Don Valley Historical Mapping Project. After mastering the basics of georeferencing and digitizing in ArcGIS, I dutifully warped approximately one hundred historic maps of Toronto into place on the digital street grid, traced the course of the Don River and the wharves of the changing waterfront, and labelled six hundred points with the names and addresses of businesses and factories operating in the late 19th and early 20th centuries. As the project wrapped up one year after I started, we published several thematic digital map layers in two different formats, allowing viewers to overlay the environmental and industrial history of the lower Don Valley on the contemporary city in ArcGIS or Google Earth. However, how and why should a reader of these maps trust the data in them, considering a) very few of the locations mapped still exist in the present, and b) there is sometimes a mismatch between the location of features mapped at different times? (For a greater explanation and visualization of this problem, please see pages 50-52 of Marcel Fortin and Jennifer Bonnell’s chapter “Reinventing the Map Library”, part of their excellent edited volume Historical GIS Research in Canada, available as a free ebook from the University of Calgary Press.)

As it was up to me to communicate the contingencies of this research process and the individual sources themselves to users of these geospatial datasets, I learned that the answer to both of these questions lay in the metadata that would accompany them. Through the metadata creation tool included as part of our GeoNetwork installation, I created ISO 19139-compliant metadata for all of the layers we created, containing information about the rationale for each dataset within the broader scope of the project, credits for those who worked on it, bibliographic details about the accompanying sources, and explanations of what given database entries mean in a historical context. As we worked from cartographic and textual sources to assemble our datasets (some of which conflicted with each other), we could not state that a business definitively existed at certain points in time, only that it appeared on the map at certain points in time. These important definitions and their implications were contained in the metadata.

Since the project wrapped up in 2009, I have worked on several more historical GIS projects at UofT, including a complete mapping of the Los Angeles streetcar network in the 1920s and a growing collection of historical hydrography datasets showing the changing shoreline and lost rivers of Toronto. This first project taught me the importance of geospatial metadata in the context of using GIS for historical research. In my next post, I will discuss how frustrating it can be when working in a GIS without metadata.


Works cited

Bonnell, Jennifer, and Marcel Fortin. 2014. Historical GIS Research in Canada. Calgary: University of Calgary Press.

INF2186: Metadata and print design

I started design school in 2004, and while I did quite well across my courses, the work that made me happiest was some of the least glamorous: I preferred the “boring” but highly detailed work of print production over art direction. (Hey, good production designers are hard to find.) Many assignments were submitted to instructors as oversize colour laser prints, which (as in the previous post) required communicating with print shops in standard languages and formats. One of the assignments I unexpectedly enjoyed most involved the art direction and production of a 64-page annual report for a corporation or nonprofit organization, transcribing the previous year’s content into a coherent publication. I realized I had a knack for formatting the names of staff members and donors, which took up pages and pages towards the end of the publication.

In retrospect, coming up with a scheme for the visual hierarchy on these pages was similar to developing a metadata schema: one must decide which details must appear on the page (for example, name, title, and location) and differentiate them typographically before embarking on lots of data entry, applying the different type styles to each “field” consistently. One must also consider the limits of such fields and how the data entered into them affects the broader design – for example, when designing business cards for members of an organization, it is always smart to start with the individual with the longest name or job title, and to ensure that the design accommodates their details before moving on to those with less information to convey in the same amount of space.

It’s amusing to reflect on these similarities now that I have the extra-long job title of “Original Cataloguer & Reference Specialist”. The library’s time-tracking software truncates this to “Original Cat”. Mew!