If you're aware of how software is developed on Windows, chances are you're eventually going to run into COM. While Windows exposes some simple C APIs (and these APIs were much better than contemporaries like Toolbox, X, or Intuition), pretty much anything more complex than USER32 is exposed through COM interfaces, from Internet Explorer to DirectX. COM is also used to extend applications: Office, Visual Studio, even the Windows shell provide COM interfaces for applications to hook into. Microsoft loves using COM for everything in Windows, even if third parties don't like it as much (usually for portability or complexity reasons). Using COM to its full extent can make your application 34% sparklier, but it's a lot of work to properly use those interfaces (and there are a lot of them!).
Unfortunately, not many were able to take advantage of COM, let alone in more obscure scenarios such as extending the shell. The people who could do so were quite rare, as Spolsky pointed out. Naturally, I was nerd sniped by a friend into writing a shell namespace extension, one of the more obscure (little documentation, few did it, few know they exist) categories of COM extension. A shell namespace extension adds a "namespace" (basically a virtual folder) to the shell, which has further objects represented in it. Common use cases include MTP for phones, inline ZIP file viewing, etc.
My extension was simple enough that I thought I could implement it myself (and I did... with difficulty, as you'll see). It would enumerate all the open Windows Explorer windows and put links to them in a namespace. The point is that the namespace is accessible from the stock system file dialogs, which is useful if you have a bunch of Explorer windows open but want to save to one of them quickly. This was inspired by an OS/2 Workplace Shell feature (the one good feature of OS/2!). The challenge was going from a bare minimum knowledge of COM to knowing just enough to be dangerous (or rather, able to make a shell namespace extension that works).
For something with a near-legendary reputation for complexity, it turns out COM is actually based on some very simple primitives. COM is based on interfaces (in the C++ manner), each backed by a vtable (a structure full of function pointers). These interfaces have a fixed shape, so they can be used from C. Every COM class implements the interface IUnknown, which declares three functions:
AddRef, which increments the reference count. Every COM object is reference counted, so you'll call this when you make a copy of a reference you're holding on to.
Release, which decrements the reference count. The object frees itself when the count hits zero.
QueryInterface, which gives you a pointer to another interface (and so its vtable) if the object implements it (casting). The interfaces are identified by a GUID.
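To make those three functions concrete, here's a minimal portable sketch that mimics the shape of IUnknown without touching the Windows headers. Everything in it is made up for illustration: IUnknownLike, IGreeter, Greeter, and demo_iunknown are hypothetical names, interface IDs are plain strings rather than real GUIDs, and QueryInterface returns a bool instead of an HRESULT.

```cpp
#include <cassert>
#include <cstring>

// Assumption: string IDs stand in for the 128-bit GUIDs real COM uses.
using InterfaceId = const char*;
constexpr InterfaceId IID_Unknown = "IUnknownLike";
constexpr InterfaceId IID_Greeter = "IGreeter";

// Hypothetical stand-in for IUnknown: three pure virtual functions.
struct IUnknownLike {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
protected:
    ~IUnknownLike() = default; // objects are destroyed only via Release
};

// A hypothetical interface layered on top of the base one.
struct IGreeter : IUnknownLike {
    virtual int Greet() = 0;
};

static int g_liveObjects = 0; // lets us observe that the object frees itself

class Greeter final : public IGreeter {
    unsigned long refs_ = 1; // the creator holds the first reference
public:
    Greeter() { ++g_liveObjects; }
    ~Greeter() { --g_liveObjects; }

    unsigned long AddRef() override { return ++refs_; }

    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this; // the object frees itself at zero
        return left;
    }

    bool QueryInterface(InterfaceId iid, void** out) override {
        if (std::strcmp(iid, IID_Unknown) == 0 ||
            std::strcmp(iid, IID_Greeter) == 0) {
            *out = static_cast<IGreeter*>(this);
            AddRef(); // handing out a new reference, so count it
            return true;
        }
        *out = nullptr;
        return false;
    }

    int Greet() override { return 42; }
};

// Walks through the reference counting by hand.
int demo_iunknown() {
    IUnknownLike* unk = new Greeter();                // count: 1
    void* raw = nullptr;
    bool ok = unk->QueryInterface(IID_Greeter, &raw); // count: 2
    assert(ok);
    IGreeter* greeter = static_cast<IGreeter*>(raw);
    int value = greeter->Greet();
    greeter->Release();                               // count: 1
    unk->Release();                                   // count: 0, object deleted
    return value;
}
```

Calling demo_iunknown() from your own main exercises the whole lifecycle. The thing to notice is that every pointer QueryInterface hands out has to be matched with a Release, which is exactly the bookkeeping that gets tedious by hand.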
This isn't so bad. Of course, objects can implement a lot of interfaces, and because interfaces have a fixed shape, to extend an interface later, you have to create another interface. Microsoft ends up numbering them, so you get into situations where you have an IShellFolder2. And as Microsoft implements more features that require more interfaces, a class can get unwieldy if you're not careful. And then you have to assume the interfaces are well documented! And when you're writing extensions, debugging isn't (as far as I'm aware) very great beyond printf macros.
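The numbering pattern is easier to see in a toy example. This is a simplified sketch, not the real shell interfaces: IBase, IFolder, IFolder2, OldFolder, NewFolder, and describe are all hypothetical, IDs are strings instead of GUIDs, and reference counting is left out to keep the focus on versioning.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified base: returns nullptr for unsupported interfaces.
struct IBase {
    virtual void* QueryInterface(const std::string& iid) = 0;
    virtual ~IBase() = default;
};

// Version 1 of the interface. Its shape is frozen once it ships.
struct IFolder : IBase {
    virtual std::string Name() = 0;
};

// Version 2 adds a function in a *new* interface, leaving IFolder untouched.
struct IFolder2 : IFolder {
    virtual std::string Details() = 0;
};

// An older object that only knows about the original interface.
class OldFolder : public IFolder {
public:
    void* QueryInterface(const std::string& iid) override {
        if (iid == "IFolder") return static_cast<IFolder*>(this);
        return nullptr;
    }
    std::string Name() override { return "old"; }
};

// A newer object that answers to both versions.
class NewFolder : public IFolder2 {
public:
    void* QueryInterface(const std::string& iid) override {
        if (iid == "IFolder2") return static_cast<IFolder2*>(this);
        if (iid == "IFolder") return static_cast<IFolder*>(this);
        return nullptr;
    }
    std::string Name() override { return "new"; }
    std::string Details() override { return "new (with details)"; }
};

// A client asks for the newest interface first, then falls back.
std::string describe(IBase& obj) {
    if (void* p = obj.QueryInterface("IFolder2"))
        return static_cast<IFolder2*>(p)->Details();
    if (void* p = obj.QueryInterface("IFolder"))
        return static_cast<IFolder*>(p)->Name();
    return "unknown";
}
```

Old clients keep working because IFolder never changes, and new clients probe for IFolder2 before falling back. The cost is visible even here: every new version adds another branch to both the object and its callers.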
COM classes are registered (that's what REGSVR32 is for), which is how they become known to other applications. A type library provides metadata, and is usually generated from an IDL file.
While you can use COM from C, it can get a bit unwieldy, because COM benefits from an environment of RAII and scoped destructors. ATL provides a template-based wrapper around COM for C++, and is what Microsoft recommends for COM development. Visual Studio greatly assists in terms of generating the boilerplate for categories of COM classes.
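To show why RAII helps, here's a rough sketch of the kind of smart pointer ATL gives you. It is not ATL's actual CComPtr: ComPtrLike, RefCounted, Widget, and demo_raii are hypothetical names, and unlike the real thing this constructor adopts an existing reference instead of calling AddRef, which is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the AddRef/Release half of IUnknown.
struct RefCounted {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~RefCounted() = default;
};

static int g_alive = 0; // number of Widget objects currently alive

class Widget final : public RefCounted {
    unsigned long refs_ = 1; // the creator holds the first reference
public:
    Widget() { ++g_alive; }
    ~Widget() { --g_alive; }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
};

// Minimal smart pointer: copies call AddRef, destruction calls Release.
template <typename T>
class ComPtrLike {
    T* p_ = nullptr;
public:
    ComPtrLike() = default;
    // Assumption: adopts a reference the caller already owns (no AddRef).
    explicit ComPtrLike(T* adopted) : p_(adopted) {}
    ComPtrLike(const ComPtrLike& other) : p_(other.p_) {
        if (p_) p_->AddRef();
    }
    ComPtrLike(ComPtrLike&& other) noexcept
        : p_(std::exchange(other.p_, nullptr)) {}
    // Copy-and-swap covers both copy and move assignment.
    ComPtrLike& operator=(ComPtrLike other) noexcept {
        std::swap(p_, other.p_);
        return *this;
    }
    ~ComPtrLike() {
        if (p_) p_->Release();
    }
    T* get() const { return p_; }
    T* operator->() const { return p_; }
};

// Returns the number of live objects after the scope ends.
int demo_raii() {
    {
        ComPtrLike<Widget> a(new Widget()); // count: 1
        ComPtrLike<Widget> b = a;           // count: 2 (copy calls AddRef)
        assert(a.get() == b.get());
        assert(g_alive == 1);
    } // both destructors run, Release is called twice, the Widget is freed
    return g_alive;
}
```

There isn't a single explicit Release call in demo_raii, and the object still goes away when the scope ends. In C you'd have to pair every AddRef and Release by hand on every exit path.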
The beautiful part of Windows is how extensible it is. The ugly part of Windows is how no one can extend it properly. Trying to write a shell namespace extension from scratch is a Sisyphean endeavour. I don't think anyone's done it. Instead, you have to do things like your average Windows programmer in 2002 would have done - read an article by someone much smarter than you on CodeProject, the site people copied and pasted from before Stack Overflow. His examples are helpful, but they have a critical flaw - they implement the list view on their own, instead of delegating to the interface that handles the stock one and its default behaviours for you. (For example, the example will crash on XP and newer because it doesn't handle the new ListView views.) Insightful, but back to the drawing board.
Round two. The example by Pascal Hurni is, while slightly rougher, closer to what we want. His example uses the system ShellView, which gives you default behaviours and the ability to use it from a stock file dialog, which is what we want. The example enumerates through the registry (specifically, favourites for the file manager Directory Opus) and represents real filesystem entities, which is close to what we want - just swap out the enumerator.
I took some code I wrote for experimenting with actually listing Explorer windows. First I thought I'd have to enumerate all visible windows and filter on CabinetWClass, then figure out what messages to send to it in order to get useful information out, but it turns out an easier way was possible through COM. You create an instance of IShellWindows, then call Item, which returns an IDispatch representing your window. IDispatch is essentially IUnknown with reflection, intended for situations like VBA where you want to enumerate methods and properties. We can cast it to an IWebBrowser (a remnant of when Internet Explorer and Windows Explorer were welded together), and get the location from there. Grafting it onto Pascal's example (and ripping out what code I didn't need for clarity), I had a working MVP (with bugs, of course).
I cargo-culted (first step of learning), er, learned some things about the shell, and there's still a lot more I don't know about. Tip of the iceberg as follows:
IDataObject, one of those can-contain-anything OLE structures usually used for the clipboard.
IPropertyStore and an XML file representing custom metadata columns, but I was seemingly lucky enough that I could just use built-in property keys, and (I believe) the real filesystem entities have their metadata filled in. I just had to map the column IDs I was using (most of them; I didn't notice anything bad from not mapping everything) to the system-included property keys, and handle the property keys that resolve to other property keys for what to display on tiles and such.
The real sad part with things like this is that, because the documentation is so lacking or missing, you may run into issues where not even Stack Overflow can help you. Experimentation or blind trust in ancient Usenet posts may be required. Or maybe you can be lucky enough to know someone who was there, remembers, and still cares. Remember, not many at the time knew how to use these effectively, and the number of people who do is dwindling to the point of extinction, another point towards the cultural defeat of Windows.
I also found someone who implemented a wrapper class library and a bunch of samples around it, which probably would have been handy if I hadn't discovered it after actually managing to make the extension. It might have been useful, but it does a lot for you, so perhaps it was for the best to understand how the shell/COM works at a lower level.
I managed to actually write the extension. It's available on GitHub, and I hope it provides a clearer example of a shell namespace extension (since you're likely not going to find many, let alone many who know it) as well as being useful to Explorer freaks. I had also contacted Pascal about the licensing ambiguities (since people just did open source in Windows circles by the seat of their pants back then) - it's MIT licensed for sure. Now you know how the ISausage is made, and it is delicious - if only people could figure out the best way to eat it.