INF2186: This class got me thinking…

Although I’ve been working with geospatial metadata for many years (both in creating and maintaining metadata for geospatial datasets and in the MARC fields containing spatial information that I add to original catalogue records for maps), this course got me thinking about all of the other ways that geographic information can be held in metadata.

Over the course of INF2186, we saw geographic information stored using ISO 3166 as a vocabulary encoding scheme, which allows objects to be mapped at the country level by joining them to a spatial dataset of countries on a field containing the same country codes. If a street address is stored as a metadata property (perhaps describing a resource of the Dublin Core class Location) using an appropriate syntax encoding scheme, it can be mapped using any one of a number of geocoding APIs. As I illustrated in my earlier post about the Bounding Box Tool, using a syntax encoding scheme to store bounding coordinates in CSV format allows for interoperability with dcterms:spatial, making it possible to map spatial resources much as Scholars GeoPortal does. These ways of connecting fields in a database or repository to locations on a map are not new to me, but thinking about them through the concept of encoding schemes puts the work I do every day in a different context. Encoding schemes were omnipresent in the application profiles we examined in this course, and I now see the examples above as yet another demonstration of interoperability.
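To make the CSV-to-dcterms:spatial connection a little more concrete, here's a minimal Python sketch that turns a one-line CSV bounding box into a value following the DCMI Box encoding scheme, one of the syntax encoding schemes that can be used with dcterms:spatial. It assumes the CSV lists coordinates in west,south,east,north order in signed decimal degrees (check the order your tool actually emits). The function name and sample coordinates are hypothetical, and this is just one way the connection could work, not how Scholars GeoPortal does it.

```python
import csv
import io


def csv_bbox_to_dcmi_box(csv_text: str) -> str:
    """Turn a one-line CSV bounding box into a DCMI Box value.

    Assumes the CSV lists west,south,east,north in signed decimal degrees.
    """
    # Parse the single CSV row into four numbers (fails if there aren't four)
    row = next(csv.reader(io.StringIO(csv_text.strip())))
    west, south, east, north = (float(value) for value in row)
    if south > north:
        raise ValueError("south limit is north of the north limit")
    # DCMI Box components are name=value pairs separated by semicolons
    return "; ".join([
        f"northlimit={north}",
        f"eastlimit={east}",
        f"southlimit={south}",
        f"westlimit={west}",
        "units=signed decimal degrees",
    ])


# A hypothetical box roughly covering Toronto
print(csv_bbox_to_dcmi_box("-79.5,43.5,-79.25,43.75"))
```

Keeping the coordinates in a plain, predictable syntax is what makes this kind of mechanical conversion possible in the first place.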

In the closing weeks of this course, I learned a lot of new things about geographic data storage, such as the geo: namespace for storing locations using qnames, and the expression of spatial relationships through RDF triples (Allemang and Hendler 2011, 36-37). I’m currently working on another paper that, among other things, asks whether maps of unceded Indigenous territories can be housed in map libraries using Library of Congress cartographic classification, which assumes that most locations fit neatly within a hierarchy of recognized administrative and political boundaries. I wonder whether semantic data modelling such as RDF could be used to express processes like colonial dispossession within larger systems of knowledge organization, and whether such a model could be used in map libraries.
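As a small illustration of locations and spatial relationships expressed as triples, here's a minimal sketch that writes a few RDF statements in N-Triples syntax using only the Python standard library. geo:SpatialThing, geo:lat and geo:long come from the W3C Basic Geo (WGS84 lat/long) vocabulary. The example.org URIs, the approximate coordinates, and the ex:depicts predicate standing in for a spatial relationship are all hypothetical placeholders, not part of any published vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def uri(value: str) -> str:
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def decimal_literal(lexical: str) -> str:
    """Write a number's lexical form as an xsd:decimal typed literal."""
    return f'"{lexical}"^^{uri(XSD_DECIMAL)}'


def triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, then ' .'"""
    return f"{uri(subject)} {uri(predicate)} {obj} ."


place = EX + "place/lower-don-valley"  # hypothetical place URI
map_sheet = EX + "map/sheet-1"         # hypothetical map URI

triples = [
    triple(place, RDF_TYPE, uri(GEO + "SpatialThing")),
    triple(place, GEO + "lat", decimal_literal("43.66")),    # approximate
    triple(place, GEO + "long", decimal_literal("-79.36")),  # approximate
    # A spatial relationship as a triple; ex:depicts is a made-up predicate
    triple(map_sheet, EX + "depicts", uri(place)),
]
print("\n".join(triples))
```

A real project would reuse an established vocabulary for the relationship rather than inventing one, which is exactly where the knowledge-organization questions above come in.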

On a very different note, I was interested to read Christian Becker and Christian Bizer’s article (2009) on the infrastructure behind DBpedia Mobile, its possibilities, and its consolidation of many different kinds of spatial data resources into one interface. Of course, it never hurts to be mindful of the political ramifications of such structures for spatial data storage, particularly when locative media are overlaid with myriad data by mere virtue of proximity. In week nine, we read Ann Cavoukian’s primer on metadata and surveillance (2013), which lays out the concerns we should all take seriously regarding geolocative devices and media, metadata, and privacy. From a database perspective, the seemingly infinitely flexible and extensible systems used to house such metadata boggle my mind, especially given the general mistrust of “big data” analysis that I share.

The skill from this class that I’d most like to develop is abstract modelling and diagramming, which I guess I’ll get to do in other courses, as this is only the end of my first of 5.5 years in this program. While the many entity-relationship diagrams we encountered in this course demonstrate the value of communicating the core essence of a project to the stakeholders in its development and use, one line in a table in Willis, Greenberg and White’s review of metadata schemas for scientific data management got me thinking about the temporal importance of such models. In a thought-provoking claim, they note that “[a] well-defined metadata scheme will likely outlive its initial rendering. Abstraction allows needs [to] be captured [in] a way that supports multiple renderings over time” (2012, 1515). Given the forward-thinking-for-backwards-compatibility chatter that goes on all around me in libraries (hello Windows!), using such diagrams as signposts to keep us focused on the core functionality of systems is really smart.

On that note, I’m also interested in diving further into linked data in the bibliographic world: as a cataloguer, I’ve heard so much about the transition from MARC to FRBR, RDA and BIBFRAME, but I’m not sure how or when that infrastructure will meet the work that I do. In any case, this course has been great in helping me see metadata structures where I hadn’t thought to look for them in my life, reconceptualize the work that I do every day, and better understand the future of library and information systems. Thanks for reading!

Works cited

Allemang, D., and J. Hendler. 2011. Semantic Web for the working ontologist: effective modeling in RDF and OWL. Waltham, MA: Morgan Kaufmann/Elsevier.

Becker, C., and C. Bizer. 2009. Exploring the geospatial semantic web with DBpedia Mobile. Web Semantics 7: 278-286.

Cavoukian, A. 2013. A primer on metadata: separating fact from fiction. Toronto: Privacy by Design.

Willis, C., J. Greenberg, and H. White. 2012. Analysis and synthesis of metadata goals for scientific data. JASIST 63: 1505-1520.

 

INF2186: Interoperability

I came into INF2186 knowing a fair bit about metadata and its importance, but one concept I hadn’t really considered, despite seeing it in action for years, was metadata standards and interoperability. Data interoperability is something we talk about all the time in GIS (the Data Interoperability extension, which allows legacy formats and files created in other programs to be opened in ArcGIS, is a must-have!), but I didn’t realize that metadata interoperability is crucial to the catalogues through which we access most of our data. It turns out that over the years I’ve worked at MDL, I’ve been contributing to that interoperability, both by creating metadata through the fields in our data inventory (not realizing at first that it was ISO 19139-compliant!) and by writing documentation for our staff members clarifying what should be entered into each field, and how.

Here’s an example of a metadata record in our data inventory. This is a historical climate dataset for Canada, stored as annual, national-scale raster files for use in GIS software. These are the metadata properties we display to users:

[Screenshot: the metadata record for this dataset in the MDL data inventory]

These properties provide information about the producer and nature of the dataset, its spatial reference parameters, and its licensing details, along with keywords and a description for discoverability. They are set using free-text fields, date fields compliant with W3C-DTF, and picklists drawn from our own internal taxonomy vocabularies. A few more metadata properties aren’t visible here, including a YES/NO property that allows our metadata to be harvested by Scholars GeoPortal. Here’s what the same dataset looks like over there (alas, I can’t permalink it):

[Screenshot: the same dataset in Scholars GeoPortal, with its bounding box drawn on the map]

Check out that bounding box created with the tool I mentioned in my previous post!

If we click on the “Details” button, we get to see the formatted metadata that was harvested from the MDL inventory.

[Screenshot: the formatted metadata shown under “Details” in Scholars GeoPortal]

Some of these fields, including the contact information, are populated because this dataset’s metadata was harvested from the MDL record. But hey, this is interoperability at work! Before I took this course, I didn’t really understand how this harvesting worked, only that it did, so that’s one more thing at work I understand better thanks to INF2186.

INF2186: My favourite metadata tool

My favourite metadata tool remains perpetually open in a browser tab at work: Klokan Technologies’ Bounding Box Tool, an easy-to-use utility that generates bounding box coordinates (in other words, the latitude and longitude values that enclose a given area) on the fly. This metadata is important for capturing the spatial extent of items and describing them consistently. For every paper map I catalogue in an original MARC record, I record the extent in the 034 and 255 fields, and for every geospatial dataset in the MDL data inventory, the bounding box is entered into its own field in the metadata record. Given the different storage requirements of these two databases, it is very convenient that the tool offers a choice of 12 (!) different syntax encoding schemes for capturing coordinates: I personally use MARC VTLS (which puts the coordinates into the appropriate subfields for quick copying and pasting) and CSV for these respective applications.

While I’ve been using the Bounding Box Tool for several years, it was only in this course that I learned the term “syntax encoding scheme”, and the flexibility that Klokan continues to build into the tool makes it fantastic for anyone working with geospatial resources and catalogues.
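To get a feel for what the MARC output involves, here's a minimal Python sketch that converts decimal-degree bounding coordinates into the hemisphere-degrees-minutes-seconds form (hdddmmss), one of the coordinate forms MARC 034 accepts, for subfields $d (west), $e (east), $f (north) and $g (south). The function names are hypothetical; it assumes decimal-degree input and leaves out the indicators and scale subfields ($a, $b) a real 034 field would also carry. It is not how Klokan's tool is implemented.

```python
def to_hdddmmss(value: float, is_latitude: bool) -> str:
    """Format signed decimal degrees as MARC 034 hdddmmss, e.g. W0793000."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Round to whole seconds first so rounding carries into minutes/degrees
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"


def bbox_to_034_subfields(west: float, south: float,
                          east: float, north: float) -> str:
    """Build the coordinate subfields of a MARC 034 field from a bounding box."""
    return " ".join([
        "$d " + to_hdddmmss(west, is_latitude=False),
        "$e " + to_hdddmmss(east, is_latitude=False),
        "$f " + to_hdddmmss(north, is_latitude=True),
        "$g " + to_hdddmmss(south, is_latitude=True),
    ])


# A hypothetical box roughly covering Toronto
print(bbox_to_034_subfields(-79.5, 43.5, -79.25, 43.75))
```

Rounding the total number of seconds before splitting it up avoids producing an invalid value like 60 seconds when a coordinate sits just below a whole minute.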

INF2186: Geospatially Speaking

A few months after I was hired at MDL, I started taking undergraduate classes in geographic information systems with the fantastic Don Boyes. My work on the Don Valley Project taught me one particular subset of tools in ArcGIS, and through Prof. Boyes’ classes I learned so much about representation, cartography, analysis, and decision-making using GIS. I also learned much more about why geospatial metadata is really, really important to those working with GIS, and these lessons resonate daily in the reference work I do in the library. Geospatial datasets are frequently packaged with their accompanying metadata, and for very good reasons.

What does geospatial metadata tell us? As in many other metadata schemas we reviewed in class, the core elements typically tell us who produced the data, when, and for what reason. These identifying elements help us assess the reliability of the data and let us follow up with the authoring party on any questions we may have. They also describe how the data were created or collected, which allows users to evaluate the data’s accuracy and precision for analysis or when choosing between multiple datasets. For example, a raster elevation dataset compiled from satellite observations would be more accurate and precise than the historical factory outlines I digitized from georeferenced maps, as described in my previous post. However, since no other digital maps of industry in the Lower Don Watershed exist, our datasets remain valuable, as long as the means of their production are understood.

Some additional metadata elements describe particular spatial parameters of datasets. The first of these, the dataset’s spatial extent, is typically autogenerated by GIS software: it is used to generate a bounding box that visually describes the distribution and limits of the information within the dataset, and can help one assess whether a dataset is useful for one’s area of interest. The other two, the dataset’s projection and datum, are critical for its proper use: if this spatial reference information is missing, it can be quite a challenge to bring a layer into alignment with others on a map, as one may be forced to guess. Without definitively knowing the projection and datum of a geospatial dataset, one cannot be confident that the data observed on the map is actually associated with a given place on the ground.
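Under the hood, an extent calculation and an area-of-interest check come down to taking minimums and maximums and then comparing intervals. Here's a minimal Python sketch, assuming the features are points given as (longitude, latitude) pairs in a single geographic coordinate system and that no box crosses the antimeridian. The function names and coordinates are hypothetical, and real GIS software also handles lines, polygons, and projected coordinates.

```python
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def extent(points: Iterable[Tuple[float, float]]) -> BBox:
    """Compute the bounding box enclosing every (lon, lat) point."""
    lons, lats = zip(*points)  # raises ValueError if there are no points
    return (min(lons), min(lats), max(lons), max(lats))


def intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (touching edges count as overlapping)."""
    west_a, south_a, east_a, north_a = a
    west_b, south_b, east_b, north_b = b
    return (west_a <= east_b and west_b <= east_a
            and south_a <= north_b and south_b <= north_a)


# Hypothetical factory points and a hypothetical area of interest
factories = [(-79.36, 43.65), (-79.35, 43.67), (-79.37, 43.66)]
dataset_box = extent(factories)
area_of_interest = (-79.40, 43.64, -79.34, 43.70)
print(dataset_box, intersects(dataset_box, area_of_interest))
```

Note that the projection and datum never appear here: the comparison only makes sense if both boxes are already in the same coordinate system, which is exactly why that metadata matters.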

Geospatial datasets also frequently contain attribute information, linking locations on the ground to quantitative or qualitative data about those locations. A good data dictionary describes these attributes and the values that can be expected within them, and is very useful for understanding the contents of a dataset and what can be done with it. This attribute information can be contained within the metadata file itself or in an accompanying text document.

There are many, many other characteristics described within complete geospatial metadata files, including licensing (very important when distributing data within an academic library!), update history and frequency, and spatial resolution, to name just a few. Metadata is typically stored according to the FGDC or ISO 19115/19139 standards, and many GIS programs contain XML metadata parsers and editors. Knowing how to create, evaluate, and maintain metadata is very useful for anyone working in an environment where geospatial data is used: it helps develop a culture of trust between people and towards the information they work with, and if data can be effectively searched, discovered, and evaluated, it increases the quality of research and analysis work while reducing duplicated effort. It’s also good for us to think critically about the data we produce, peruse, and consume anyway, and metadata is of great use in such reflection.
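Because ISO 19139 is an XML encoding, the spatial extent in such a record can be read with a generic XML parser. Here's a minimal sketch using Python's built-in xml.etree.ElementTree. It assumes the bounding box sits in a gmd:EX_GeographicBoundingBox element with gco:Decimal values, as in typical ISO 19139 records; the embedded record is a trimmed, hypothetical fragment (with rough values for Canada), not a complete or valid metadata document, and read_bbox is a made-up helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed, hypothetical fragment of an ISO 19139 record (rough extent of Canada)
RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:extent><gmd:EX_Extent>
    <gmd:geographicElement><gmd:EX_GeographicBoundingBox>
      <gmd:westBoundLongitude><gco:Decimal>-141.0</gco:Decimal></gmd:westBoundLongitude>
      <gmd:eastBoundLongitude><gco:Decimal>-52.6</gco:Decimal></gmd:eastBoundLongitude>
      <gmd:southBoundLatitude><gco:Decimal>41.7</gco:Decimal></gmd:southBoundLatitude>
      <gmd:northBoundLatitude><gco:Decimal>83.1</gco:Decimal></gmd:northBoundLatitude>
    </gmd:EX_GeographicBoundingBox></gmd:geographicElement>
  </gmd:EX_Extent></gmd:extent></gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""


def read_bbox(xml_text: str) -> dict:
    """Return the first geographic bounding box in an ISO 19139 record."""
    root = ET.fromstring(xml_text)
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)
    if box is None:
        raise ValueError("no EX_GeographicBoundingBox found")
    result = {}
    for side in ("westBoundLongitude", "eastBoundLongitude",
                 "southBoundLatitude", "northBoundLatitude"):
        value = box.find(f"gmd:{side}/gco:Decimal", NS)
        if value is None or value.text is None:
            raise ValueError(f"missing {side}")
        result[side] = float(value.text)
    return result


print(read_bbox(RECORD))
```

This is roughly the kind of extraction a harvester or catalogue has to perform before it can draw a record's bounding box on a map.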

In my subsequent posts, I will discuss what this course helped me learn about metadata discovery and interoperability, as there is much I didn’t know about the systems I work with every day.

INF2186: My First Metadata

I first learned the word “metadata” in 2008, when I was hired at what is now known as the Map & Data Library at the University of Toronto as a research assistant on the Don Valley Historical Mapping Project. After mastering the basics of georeferencing and digitizing in ArcGIS, I dutifully warped approximately one hundred historical maps of Toronto into place on the digital street grid, traced the course of the Don River and the wharves of the changing waterfront, and labelled six hundred points with the names and addresses of businesses and factories operating in the late 19th and early 20th centuries. As the project wrapped up one year after I started, we published several thematic digital map layers in two different formats, allowing viewers to overlay the environmental and industrial history of the lower Don Valley on the contemporary city in ArcGIS or Google Earth. However, how and why should a reader of these maps trust the data in them, considering that a) very few of the locations mapped still exist today, and b) the locations of features mapped at different times sometimes do not match? (For a fuller explanation and visualization of this problem, please see pages 50-52 of Marcel Fortin and Jennifer Bonnell’s chapter “Reinventing the Map Library”, part of their excellent edited volume Historical GIS Research in Canada, available as a free ebook from the University of Calgary Press.)

As it was up to me to communicate the contingencies of this research process and of the individual sources to users of these geospatial datasets, I learned that the answer to both of these questions lay in the metadata that would accompany them. Using the metadata creation tool included in our GeoNetwork installation, I created ISO 19139-compliant metadata for every layer we produced, containing the rationale for the dataset within the broader scope of the project, credits for those who worked on it, bibliographic details about the sources, and explanations of what given database entries mean in a historical context. As we worked from cartographic and textual sources to assemble our datasets (some of which conflicted with each other), we could not state that a business definitively existed at certain points in time, only that it appeared on the map at those times. These important definitions and their implications were recorded in the metadata.

Since the project wrapped up in 2009, I have worked on several more historical GIS projects at UofT, including a complete mapping of the Los Angeles streetcar network in the 1920s and a growing collection of historical hydrography datasets showing the changing shoreline and lost rivers of Toronto. This first project taught me the importance of geospatial metadata when using GIS for historical research. In my next post, I will discuss how frustrating it can be to work in a GIS without metadata.

 

Works cited

Bonnell, J., and M. Fortin, eds. 2014. Historical GIS research in Canada. Calgary: University of Calgary Press.