Staring Into the COM Abyss

If you're aware of how software is developed on Windows, chances are you're eventually going to run into COM. While Windows exposes some simple C APIs (and these APIs were much better than its contemporaries like Toolbox, X, or Intuition), pretty much anything more complex than USER32 is exposed through COM interfaces, from Internet Explorer to DirectX. COM is also used to extend applications too: Office, Visual Studio, even the Windows shell provide COM interfaces to applications to hook into. Microsoft loves using COM for everything in Windows, even if third-parties don't like it as much (Usually, out of portability/complexity reasons.). Using COM to its full extent can make your application 34% sparklier, but it's a lot of work to properly use those interfaces (and there's a lot of them!).

Unfortunately not many were able to take advantage of COM, let alone in more obscure scenarios such as extending the shell. The people who could do so were quite rare, as Spolsky pointed out. Naturally, I was nerd sniped by a friend to write a shell namespace extension, one of the more obscure (little documentation, few did it, few know they exist) categories of COM extension. A shell namespace extension adds a "namespace" (basically virtual folder) to the shell, which has further objects represented in it. Common use cases for them include MTP for phones, inline ZIP file viewing, etc.

My extension was simple enough I thought I could implement it (and I did... with difficulty, as you'll see) myself. It would enumerate all the open Windows Explorer windows, and put links to them in a namespace. The point of this is that this was accessible from the stock system file dialogs, which is useful if you have a bunch of Explorer windows open, but want to save to one of them quickly. This was inspired by an OS/2 Workplace Shell feature (the one good feature of OS/2!). The challenge was going from a bare minimum knowledge of COM to knowing just enough to be dangerous able to make a shell namespace extension that works.

The OS/2 feature

A Brief Primer on COM

For something with a near-legendary reputation of complexity, it turns out COM is actually based on some very simple primitives. COM is based on interfaces (in the C++ manner) that implement vtables (a structure full of functions). These interfaces have a fixed shape, so they can be used from C. Every COM class implements the interface IUnknown, which implements three functions:

This isn't so bad. Of course, objects can implement a lot of interfaces, and because interfaces have a fixed shape, to extend an interface later, you have to create another interface. Microsoft ends up numbering them, so you get into situations where you have an IShellFolder2. And as Microsoft implements more features that require more interfaces, a class can get unwieldy if you're not careful. And then you have to assume the interfaces are well documented! And for extensibility, debugging isn't (as far as I'm aware) very great beyond printf macros.

COM classes are registered (that's what REGSVR32 is for), where they become known by other applications. A type library provides metadata, and is usually generated by an IDL file.

While you can use COM from C, it can get a bit unwieldy, because COM benefits from an environment of RAII and scoped destructors. ATL provides a template-based wrapper around COM for C++, and is what Microsoft recommends for COM development. Visual Studio greatly assists in terms of generating the boilerplate for categories of COM classes.

Do You Eat Your Burgers With or Without the Shell?

The beautiful part of Windows is how extensible it is. The ugly part of Windows is how no one can extend it properly. Trying to write a shell namespace extension from scratch is a Sisyphean endeavour. I don't think anyone's done it. Instead, you have to do things like your average Windows programmer in 2002 would have done - read someone much smarter than you's article on CodeProject, the site people copied and pasted from before Stack Overflow. His examples are helpful, but they have a critical flaw - they implement the list view on their own, instead of delegating out to the interface which handles using the stock one and its default behaviours for you. (For example, the example will crash on XP and newer because it doesn't handle the new ListView views.) Insightful, but back to the drawing board.

Round two. The example by Pascal Hurni is while slightly rougher, closer to what we want. His example uses the system ShellView, which gives you default behaviours and the ability to use it from a stock file dialog, which is what we want. The example enumerates through the registry (specifically, favourites for the file manager Directory Opus) and represents real filesystem entities, which is close to what we want - just swap out the enumerator.

I took some code I wrote for experimenting with actually listing Explorer windows. First I thought I'd have to enumerate all visible windows and filter on CabinetWClass, then figure out what messages to send to it in order to get useful information out, but it turns out an easier way was possible through COM. You create an instance of IShellWindows, then call get_Count and Item, which returns an IDispatch representing your window. IDispatch is essentially IUnknown with reflection, intended for situations like VBA where you want to enumerate methods and properties. We can cast it to an IWebBrowser (a remnant of when Internet Explorer and Windows Explorer were welded together), and get the location from there. Grafting it onto Pascal's example (and ripping out what code I didn't need for clarity), I had a working MVP (with bugs, of course)

Windows XP, opened Explorer windows

PIDLy Details

I cargo-culted (first step of learning) learned some things about the shell, and there's still a lot more I don't know about. Tip of the iceberg as follows:

The real sad part is for things like this, because the documentation is so lacking/missing, is that you may run into issues where not even Stack Overflow can help you. Experimentation or blind trust in ancient Usenet posts may be required. Or maybe you can be lucky enough to know someone who was there, remembers, and still cares. Remember, not many at the time knew how to use these effectively, and the number of people who do dwindles, to a point of extinction, another point to the cultural defeat of Windows.

I also found someone who implemented a wrapper class library and a bunch of samples around them, which probably would be handy if I didn't discover it after actually managing to make it. It might have been useful, but it does a lot for you, so perhaps it was for the best to understand how the shell/COM works at a lower level.


I managed to actually write the extension. It's available on GitHub, and I hope it provides a clearer example of a shell namespace extension (since you're likely not going to find many, let alone many who know it) as well as be useful to Explorer freaks. I had also contacted Pascal about the licensing ambiguities (since people just did open source in Windows circles by the edge of their seats back then) - it's MIT licensed for sure. Now you know how the ISausage is made, and it is delicious - if only people could figure out the best way to eat it.

Windows Vista, save dialog