Tuesday, November 14, 2017

Generic Factories within a Framework

Some third party libraries provide useful additional hooks to make the Factory patterns work even more smoothly.

Structured Initialization

The example that I'll use here is YAML, because that's what I have some experience with. Specifically, I've used the yaml-cpp library.

While Maps provided us nice, flexible initialization data, they're somewhat limited to being pretty flat structures. C++ is particularly troublesome here because it doesn't natively include the kinds of recursive maps found in other languages. So we can leverage those other recursive map objects like XML, JSON, YAML, or others, and use some third party library (or roll our own data format and parsing library, but... why?).

This provides an additional utility to your factories, namely the ability to change how the program behaves without recompiling, and without a user knowing or understanding your source code or C++. Along with your program, you can provide some configuration file, or capacity to read in a configuration file, which contains the appropriate information to spin up specific classes in specific ways, and configure those classes appropriately. 

A colleague of mine has taken this factory style and really fleshed it out, so I will simply link to it here. At a code discussion level, there's really not much particularly interesting in terms of our factory examples aside from the fact that we'll now pass in a YAML node as our initializer.

QObject Factory

Qt is a really popular C++ framework for cross-platform software. I'll save the discussion of its merits and deficits for some other time. What you need to know is that Qt has a class called "QObject" that forms the base for many of its classes. Qt creates another class of objects, 'meta objects' alongside classes that derive from QObject, which handle a variety of useful features, but chiefly callback registration (signals and slots) and object ownership.

Prior to C++11, one of C++'s biggest pain points was the handling of "new-ed" objects (to avoid discussion of stacks and heaps and stuff that is unpleasant if you aren't a CS background person). When you create an object through 'new' and hold a pointer to that object, traditionally you had to decide which part of the code was responsible for deleting the object, lest you leak memory. It has always surprised me as I've learned the language just how often it is useful to create these objects all the same. Anyway, before the standard smart pointers were there to manage that memory for us, Qt handled it with what it calls "parent-child" relationships. A QObject can have another QObject for a 'parent', and when the parent is destroyed it deletes its children as well.
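To make the parent-child idea concrete, here is a toy version of it in plain C++. This is not Qt code and makes no claim about how QObject is actually implemented; ToyObject, liveCount and objectsLeftAfterDeletingRoot are names made up for this sketch. It only shows the ownership rule: deleting a parent deletes everything beneath it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for parent-child ownership (hypothetical, not Qt's QObject).
class ToyObject {
public:
    explicit ToyObject(ToyObject* parent = nullptr) : parent_{parent} {
        if (parent_) {
            parent_->children_.push_back(this);  // the parent now owns us
        }
        ++liveCount();
    }

    ToyObject(const ToyObject&) = delete;
    ToyObject& operator=(const ToyObject&) = delete;

    virtual ~ToyObject() {
        // A dying parent deletes every child it still owns.
        for (ToyObject* child : children_) {
            child->parent_ = nullptr;  // the child must not edit the list we are walking
            delete child;
        }
        // A child deleted on its own removes itself from its parent's list.
        if (parent_) {
            auto& siblings = parent_->children_;
            siblings.erase(std::remove(siblings.begin(), siblings.end(), this),
                           siblings.end());
        }
        --liveCount();
    }

    // Number of ToyObjects currently alive, so we can check for leaks.
    static int& liveCount() {
        static int count = 0;
        return count;
    }

private:
    ToyObject* parent_;
    std::vector<ToyObject*> children_;
};

// Builds a small tree with 'new', deletes only the root, and returns how
// many of the objects it created are still alive afterwards.
inline int objectsLeftAfterDeletingRoot() {
    const int before = ToyObject::liveCount();
    auto* root = new ToyObject;
    auto* child = new ToyObject{root};
    new ToyObject{root};   // second child, pointer deliberately not kept
    new ToyObject{child};  // grandchild
    assert(ToyObject::liveCount() == before + 4);
    delete root;           // takes the whole tree with it
    return ToyObject::liveCount() - before;
}
```

Calling objectsLeftAfterDeletingRoot() returns 0: nothing created with a parent has to be deleted by hand. The flip side is that every object in the tree must have been created with 'new', since the parent will call delete on it.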

This allows us to adjust our factory in a way that makes it moderately safer. We can feel comfortable handing raw pointers out of our factory because we can simultaneously set ownership to a parent object. On the other hand, once an object has a parent managing its memory, we really shouldn't also hold it in an owning smart pointer such as std::unique_ptr or std::shared_ptr (since that too would try to delete the object, possibly at the wrong time, and the object would end up deleted twice).

class Base : public QObject{
  Q_OBJECT
public:
  Base(QObject* parent = nullptr)
    : QObject{parent}
  {}
  //...
};

template<class Base, class ClassIDKey = std::string, typename = std::enable_if_t<std::is_base_of<QObject, Base>::value> >
class GenericQFactory{
public:
  using BaseCreateFn = std::function<Base*(QObject*)>;
  //...
  template<class Derived, typename = std::enable_if_t<std::is_base_of<Base, Derived>::value> >
  class RegisterInFactory{
  public:
    static Base* CreateInstance(QObject* parent){
      return new Derived(parent);
    }
    // ...
  };
};

SFINAE in Factories

Let's pause a moment here because I've been somewhat inconsistent in using this without wanting to discuss it further. There's a thing in C++ called "SFINAE," "Substitution Failure Is Not An Error." Now, that phrase is worse than useless to me. What it means is that when the compiler plugs types into a template and the result is invalid, it doesn't immediately report an error; it quietly discards that particular template from consideration and looks for another candidate that does work. Only if nothing is left do you get a compilation error. All of this happens at compile time, never at runtime. Which is a roundabout way of saying that you can put restrictions on a template so that it simply doesn't exist for the wrong types. In our factories there is no alternative candidate to fall back on, so the practical effect is a compilation error that prohibits unsuitable classes from being used in the template. Someone with the actual CS background can correct me or provide nuance if they wish.

The useful bit for us will be the double hit of the 'enable_if' and 'is_base_of' class templates (both from <type_traits>). Working from the inside out: is_base_of<Base, Derived> is a template class that defines a static member constant "value" inside of it. If Derived is a subclass of Base (or the same class), "value" is true; otherwise it is false. (In C++17, there is also a simpler helper variable template, 'std::is_base_of_v<Base, Derived>', so you don't have to type 'std::is_base_of<Base,Derived>::value' all the time.)

std::enable_if<X> is a template class where if X is true, then the class defines a typedef 'type'. If false, such a typedef does not occur. So std::enable_if<X>::type (or the simpler helper definition: std::enable_if_t<X>) will only be meaningful when the expression in the template evaluates to true. So we've embedded our enable_if expression to evaluate the inheritance relationship, and that result defines this enable_if type (or not).

Finally 'typename = ' is kind of our general template parameter. Except that we're never actually going to use the parameter by name, so we can omit the name (it could have been something like <typename test = std::enable_if_t... ). We're using = to assign it a default type, which is the type from enable_if. And enable_if may not have 'type' if its template evaluated false. Which means we can't assign this default type (assigned to nothing, mind you), which means we can't resolve the whole template and our compiler will yell at us and we'll know for sure that we've tried to make our factory produce Bars that don't inherit from Foos and would be completely meaningless in terms of the factory. It's not necessary to do, but it's kind of nice to have that extra check in place (particularly if you have a lot of these factories for closely related bases floating around and you accidentally forget which is which).
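Here is a minimal sketch of that mechanism doing what its name says. Foo, DerivedFromFoo, Bar, canRegister and Registrar are hypothetical names for illustration. The first canRegister overload carries the same 'typename = std::enable_if_t<...>' parameter as our factories; when the inheritance check fails, that overload is silently dropped and the fallback is chosen instead, with no error. The Registrar class has no fallback, so there the failure does become a compiler error.

```cpp
#include <type_traits>

struct Foo { virtual ~Foo() = default; };
struct DerivedFromFoo : Foo {};
struct Bar {};  // unrelated to Foo

// Preferred overload (an int argument matches exactly). It only exists
// when Derived really inherits from Base.
template <class Base, class Derived,
          typename = std::enable_if_t<std::is_base_of<Base, Derived>::value>>
constexpr bool canRegister(int) { return true; }

// Fallback overload (needs an int -> long conversion, so it is only picked
// when the overload above has been discarded).
template <class Base, class Derived>
constexpr bool canRegister(long) { return false; }

static_assert(canRegister<Foo, DerivedFromFoo>(0),
              "substitution succeeds, first overload is used");
static_assert(!canRegister<Foo, Bar>(0),
              "substitution fails, first overload is dropped without an error");

// The same guard on a class template, as in the factories. With no fallback
// to turn to, a failed substitution is a hard compile error.
template <class Base, class Derived,
          typename = std::enable_if_t<std::is_base_of<Base, Derived>::value>>
class Registrar {};

Registrar<Foo, DerivedFromFoo> fine;  // compiles
// Registrar<Foo, Bar> broken;        // would not compile: enable_if<false> has no 'type'
```

If all you want is the compile error, a static_assert inside the class gives a friendlier message; the enable_if form earns its keep when there is an alternative to fall back to.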

Variants (and in particular, QVariant) as initializers

It may of course be useful for us to really stretch what we allow in as an initializer. Maybe the initialization is so specialized for each derived class that our above strategies of maps may not be sufficient. If you're lucky enough that the data may simply be one or more simple data types, perhaps the boost or (included in c++17) STL 'variant' objects may be useful for you. Perhaps even vectors of variants would give you flexibility. I haven't experimented with this too much, but you could use, perhaps:
using InitializerElement = std::variant<int, double, std::string>;
using Initializer = std::vector<InitializerElement>;

to pass in an ensemble of simple data types to the constructor.

This is where we run into the catch of using variants. We've now opened our initializers up to be 'anything' that we can pack into our variant. But it also means that entirely irrelevant data may come in, or be missing relevant data. It's like the problem with maps earlier, but expanded further, since now type information is variable.
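As a rough sketch of what dealing with that catch could look like (C++17 is needed for std::variant), the constructor below checks both the number of elements and the type held in each slot before trusting anything. Widget, its fields and checkWidgetInitializers are hypothetical; the two aliases are the ones suggested above.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

using InitializerElement = std::variant<int, double, std::string>;
using Initializer = std::vector<InitializerElement>;

// Hypothetical class that expects exactly {int count, std::string label}.
struct Widget {
    int count = 0;
    std::string label;
    bool valid = false;  // stays false if the initializer has the wrong shape

    explicit Widget(const Initializer& init) {
        if (init.size() != 2) {
            return;  // missing or extra data
        }
        // std::get_if returns nullptr when the variant holds another type
        const int* c = std::get_if<int>(&init[0]);
        const std::string* l = std::get_if<std::string>(&init[1]);
        if (!c || !l) {
            return;  // right number of elements, wrong types
        }
        count = *c;
        label = *l;
        valid = true;
    }
};

// Usage check: a well-formed initializer is accepted, a mistyped one is not.
inline void checkWidgetInitializers() {
    const Initializer good{3, std::string{"knob"}};
    const Initializer mistyped{2.5, std::string{"knob"}};  // double where an int belongs

    const Widget fromGood{good};
    const Widget fromMistyped{mistyped};

    assert(fromGood.valid && fromGood.count == 3 && fromGood.label == "knob");
    assert(!fromMistyped.valid);
}
```

Whether a bad initializer should set a flag, throw, or make the factory return nullptr is a design choice; the flag is used here only to keep the sketch short.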

On top of all of this, c++17 also has an 'any' class that seems like it could really hold... anything. This could be our ultimate generic object as initializer. I have had no experience at all using these, and I'll touch back on them in the next installment where we discuss variadic initialization.

For even more complication, in our Qt framework, we can pass in quite complex objects as 'QVariant's for initializers. QVariant relies on Qt's "meta" system again. So it requires that if you want to pack an object of your own class into a QVariant, you must register that type using a special macro (Q_DECLARE_METATYPE). But it becomes very useful as a generic 'anything' within their framework: Signal/slot connections can be inspected via QVariants, for example; a QList<QVariant> can itself be stored in a QVariant, as can a QMap<QString, QVariant>, allowing us the recursive lists and maps available from structured initializations.

So, to update our GenericQFactory from above:

class Base : public QObject{
  Q_OBJECT
public:
  Base(QObject* parent = nullptr)
    : QObject{parent}
  {}
  //...
};

class Derived : public Base{
  Q_OBJECT
public:
  Derived(const QVariant& initializer, QObject* parent = nullptr)
    : Base{parent}
  {
    // do something using initializer to set some variables or whatever
  }
  //...
};

template<class Base, class ClassIDKey = std::string, typename = std::enable_if_t<std::is_base_of<QObject, Base>::value> >
class GenericQFactory{
public:
  using BaseCreateFn = std::function<Base*(const QVariant&, QObject*)>;
  //...
  template<class Derived, typename = std::enable_if_t<std::is_base_of<Base, Derived>::value> >
  class RegisterInFactory{
  public:
    static Base* CreateInstance(const QVariant& initializer, QObject* parent){
      return new Derived(initializer, parent);
    }
    // ...
  };
};

A final note about QVariant, something that definitely limits its usefulness. QVariants copy construct when they're created, and copy construct when they're 'cast.' Ignoring the memory annoyance of copy construction for a moment, it does make passing in references to common objects more frustrating. I usually wrap them in a smart pointer of some kind (either STL or Qt's shared pointers, depending on situation) within whatever class is being packed into a QVariant, that way the copy construction of that smart pointer allows the reference to pass through the system.

Coming up next: Enable dark magic mode: can we come up with the ultimate factory initialization that allows us the most insanely generic factory? Variadic Factory Initialization and the Quest for the Holy Constructor

Wednesday, November 8, 2017

Factory with Subclass Initialization

Subclass Initialization


It's pretty rare for me to have a class that doesn't require some kind of initialization passed through the constructor. With my first post on Factories our factory creation function didn't really give us much room to do this. So let's discuss some options for initialization.

Trivial Initialization

Suppose all of your derived classes can be initialized using one type of object, perhaps some struct with data common to all the subclasses. But from base class to base class, this initialization object may be of a different class, so we might like to continue our Generic factory pattern.

So let's start with extending our factory's template

template<class Base, class Initializer, class ClassIDKey = std::string>
class GenericFactory{
public:
  using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;
  // the rest of the definition largely holds because the change 
  // is absorbed by this alias for the base create function
};

and with our registration class:

template<class Base, class Derived, class Initializer, class ClassIDKey = std::string>
class RegisterInFactory{
public:
  static std::unique_ptr<Base> CreateInstance(const Initializer& initializer){
    // construct the Derived class; it converts to a unique_ptr<Base> on return
    return std::make_unique<Derived>(initializer);
  }
  
  RegisterInFactory(const ClassIDKey& key){
    GenericFactory<Base, Initializer, ClassIDKey>::instance().RegCreateFn(key, &CreateInstance);
  }
};

Simpler Registration class

It occurs to me around now that we're relying on good development practices where, when we create our RegisterInFactory object, we create it like RegisterInFactory<Base, Derived> registerMe{"derived"}; . That is, we're trusting ourselves to retype the Base and Derived classes correctly every time. And since we usually use an alias to give a good name to our factory, it seems silly to not be able to reuse that alias.

Furthermore, with our initializers, we want the registry objects to have matching initializers with the factory, requiring still more template matching.

So, let's simplify:

template<class Base, class Initializer, class ClassIDKey = std::string>
class GenericFactory{
public:
  using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;
  // ... all the usual stuff
  
  template<class Derived>
  class RegisterInFactory{
    //.. moving all of our Registry functions in here, we get Base, Initializer, and ClassIDKey types for free from GenericFactory
  };
};  

And then in use somewhere, suppose we've defined using BaseFactory = GenericFactory<Base, Initializer>; (taking the default key type to be string), when we wish to register a derived class we can now use BaseFactory::RegisterInFactory<Derived> registerMe{"derived"};

So I'll continue to reuse this pattern going forward so I don't have to keep retyping all those template parameters.
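Putting the pieces of this section together, a compact single-file sketch might look like the following. The factory follows the outline above; ShapeSettings, Shape, Square, ShapeFactory and checkShapeFactory are hypothetical names invented for the example, and registering by plain assignment into the map is just one simple choice.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

template <class Base, class Initializer, class ClassIDKey = std::string>
class GenericFactory {
public:
    using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;

    static GenericFactory& instance() {
        static GenericFactory factory;
        return factory;
    }

    void RegCreateFn(const ClassIDKey& key, BaseCreateFn createFunction) {
        registry[key] = std::move(createFunction);  // insert or overwrite
    }

    std::unique_ptr<Base> Create(const ClassIDKey& key,
                                 const Initializer& initializer) const {
        auto it = registry.find(key);
        if (it != registry.cend()) {
            return it->second(initializer);
        }
        return nullptr;  // nothing registered under this key
    }

    // Nested helper: Base, Initializer and ClassIDKey come from the factory.
    template <class Derived,
              typename = std::enable_if_t<std::is_base_of<Base, Derived>::value>>
    class RegisterInFactory {
    public:
        static std::unique_ptr<Base> CreateInstance(const Initializer& initializer) {
            return std::make_unique<Derived>(initializer);
        }
        explicit RegisterInFactory(const ClassIDKey& key) {
            GenericFactory::instance().RegCreateFn(key, &CreateInstance);
        }
    };

private:
    GenericFactory() = default;
    std::map<ClassIDKey, BaseCreateFn> registry;
};

// --- hypothetical user code ---
struct ShapeSettings { double size; };

class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(const ShapeSettings& settings) : side_{settings.size} {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};

using ShapeFactory = GenericFactory<Shape, ShapeSettings>;

namespace {
ShapeFactory::RegisterInFactory<Square> registerSquare{"square"};
}  // anonymous namespace

// Usage check: a registered key yields an initialized object, an unknown one nullptr.
inline void checkShapeFactory() {
    auto square = ShapeFactory::instance().Create("square", ShapeSettings{3.0});
    assert(square && square->area() == 9.0);
    assert(!ShapeFactory::instance().Create("circle", ShapeSettings{1.0}));
}
```

Only the alias and the derived class name appear at the registration site, which is the whole point of nesting the helper.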

String Map Initialization

Being even more generic, Subclasses can have different initialization needs from one class to another. When we considered our Generic Factory with Initializer, since the Factory itself is templated upon the Initializer class, we are kind of stuck with one class of object to serve initialization. Suppose that class is a struct. Now you've got to list all the possible kinds of initialization information in the struct, whether one subclass uses it or not; and you've lost the ability to dynamically add new stuff to your factory if it introduces new initialization needs.

One relatively simple way to resolve this conflict is with a map. Maps come in a variety of flavors. Let's restrict ourselves for now to what's available in the STL. (Framework factories are next time). The most generic map we could use is probably a string-string map. You can use std::to_string to pack numbers into strings, and stoi and related functions to get them back out. You can write or use various serializers to store binary data in the string*, and so on. (*: Although a string-std::vector<unsigned char> map might be better if you're doing binary data, so that you don't run into the problems of string operations being designed around human-readable strings)

We can create an Initializer class, like the trivial example above, where the map is a part of the Initialization class; supposing that some variables are always present in all subclasses, but some variables are only present in some subclasses, one could put the common ones directly in the struct and then tack on a map at the end for further customization.

Or, if the only initialization you need is just the map itself, we can define it to an alias for simplicity.

using Initializer = std::map<std::string, std::string>;

Pretty straightforward, right? I can't see why you couldn't use an unordered_map, but the maps aren't likely to be too large, they're initializers. (Classes with many initializer values usually strike me as a code smell, and probably could stand some refactoring.) Suffice to say, I often like to use the more familiar STL classes unless there's a good reason to choose another.
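A small sketch of how a derived class might read such a map: each class looks up only the keys it cares about, ignores the rest, and falls back to a default when a key is missing or unreadable. The helper intOr, the Counter class, checkCounter and the key names are all hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>

using Initializer = std::map<std::string, std::string>;

// Hypothetical helper: the integer stored under 'key', or 'fallback' when
// the key is absent or its text cannot be converted.
inline int intOr(const Initializer& init, const std::string& key, int fallback) {
    const auto it = init.find(key);
    if (it == init.end()) {
        return fallback;
    }
    try {
        return std::stoi(it->second);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return fallback;
    }
}

// Hypothetical class that only cares about "start" and "step".
struct Counter {
    int start;
    int step;

    explicit Counter(const Initializer& init)
        : start{intOr(init, "start", 0)}, step{intOr(init, "step", 1)} {}
};

// Usage check: numbers survive the round trip, bad text falls back to the default.
inline void checkCounter() {
    const Initializer init{
        {"start", std::to_string(10)},  // number packed into a string
        {"step", "oops"},               // not a number
        {"unused", "ignored"}           // meant for some other subclass
    };
    const Counter counter{init};
    assert(counter.start == 10);
    assert(counter.step == 1);
}
```

Silently falling back to a default is convenient but can hide typos in key names or values; throwing or logging on a bad value is an equally reasonable choice.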

Coming up next: When I've used some specific frameworks, I've used some variations on this generic discussion to leverage the frameworks. Examples will be Qt's QObjects and QVariant; using a formatted file (e.g. yaml) to allow users to make program settings without having to edit or recompile code. Generic Factories within a Framework!

Wednesday, June 28, 2017

Generic Factory

One of my most common problems when I write C++ is how you actually go about building an object. We'll get into greater detail about how this is important for Google Mock and unit testing later. Other times, you may have a wide range of classes derived from the same base class and don't know which one you will need until you're already in runtime. Or perhaps you have a bunch of these classes, but dynamically loaded libraries/modules/plugins may provide even more subclasses that would remain unknown to the core program.

With one project, they had assembled a factory system with all sorts of Macros. Macros were already a bit passé a while ago, but with IDEs and their various tools, I have even less desire to see them in our code. While researching some alternatives, I came across this post. It's a little dated so I adapted it some for the modern era. What follows is built around C++14 and some modifications may be needed to adapt to 11, and improvements still exist going up to C++17.

Let's start with the big picture and then break it down some.

Factory

#include <algorithm>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <type_traits>

template <class Base, typename ClassIDKey=std::string>
class GenericFactory
{
    using BaseCreateFn = std::function<std::unique_ptr<Base>()>;
    using FnRegistry = std::map<ClassIDKey, BaseCreateFn>;

public:
    static GenericFactory& instance(){
        static GenericFactory factory;
        return factory;
    }

    void RegCreateFn(const ClassIDKey& key, const BaseCreateFn& createFunction){
        // operator[] inserts the key if it is new and overwrites the
        // creator if the key was already registered
        registry[key] = createFunction;
    }

    std::unique_ptr<Base> Create(const ClassIDKey& key) const{
        auto registryIt = registry.find(key);
        if(registry.cend() != registryIt){
            return registryIt->second();
        }
        return nullptr;
    }
private:
    GenericFactory(){}
    
    FnRegistry registry;
};

template <typename Base, 
          typename Derived, 
          typename ClassIDKey=std::string, 
          typename = std::enable_if_t<std::is_base_of<Base, Derived>::value> >
class RegisterInFactory
{
public:
    static std::unique_ptr<Base> CreateInstance()
    {
        // in C++11 (no make_unique) this would be
        // std::unique_ptr<Base>(new Derived())
        return std::make_unique<Derived>();
    }

    RegisterInFactory(const ClassIDKey &id)
    {
        GenericFactory<Base, ClassIDKey>::instance().RegCreateFn(id, CreateInstance);
    }
};

Factory structure

The core of the Factory is a map from some key to some function that creates a new derived object and returns it as a smart pointer to the base class. We've set up some aliases for readability, but they aren't strictly necessary. Compared to the post on Dobbs' you can see I've updated from the atrocious old function pointers to the new std::function objects. The main technical gain is that a std::function can hold anything callable with the right signature (a lambda with captures, say), not just a plain function. Mostly, though, I did it because the function objects are just so much easier to read.
Skipping down to GenericFactory::Create, this would be the function that actually gets called in a program. A key is passed in, Create searches through the registry of pseudo-constructors for the matching one, and then returns the result of calling it (or nullptr if nothing is registered under that key).
Finally, we want our factory to be a singleton. So we require the use of the 'instance' function to get a hold of the factory, and hide our constructor in private.
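Because the registry stores std::function objects, the factory can be fed anything callable with the right signature, including a lambda that captures state, which a plain function pointer could not carry. The sketch below repeats a compact copy of the factory so that it compiles on its own (with registration written as a plain assignment into the map); Greeter, FixedGreeter and checkLambdaRegistration are hypothetical names used only here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Compact copy of the factory, repeated so this sketch stands alone.
template <class Base, typename ClassIDKey = std::string>
class GenericFactory {
    using BaseCreateFn = std::function<std::unique_ptr<Base>()>;

public:
    static GenericFactory& instance() {
        static GenericFactory factory;
        return factory;
    }

    void RegCreateFn(const ClassIDKey& key, BaseCreateFn createFunction) {
        registry[key] = std::move(createFunction);  // insert or overwrite
    }

    std::unique_ptr<Base> Create(const ClassIDKey& key) const {
        auto it = registry.find(key);
        if (it != registry.cend()) {
            return it->second();
        }
        return nullptr;
    }

private:
    GenericFactory() = default;
    std::map<ClassIDKey, BaseCreateFn> registry;
};

// --- hypothetical classes for the example ---
struct Greeter {
    virtual ~Greeter() = default;
    virtual std::string greet() const = 0;
};

struct FixedGreeter : Greeter {
    explicit FixedGreeter(std::string text) : text_{std::move(text)} {}
    std::string greet() const override { return text_; }
    std::string text_;
};

inline void checkLambdaRegistration() {
    auto& factory = GenericFactory<Greeter>::instance();

    // The lambda captures 'text' by value and carries it into the registry.
    const std::string text = "hello";
    factory.RegCreateFn("fixed", [text]() -> std::unique_ptr<Greeter> {
        return std::make_unique<FixedGreeter>(text);
    });
    auto greeter = factory.Create("fixed");
    assert(greeter && greeter->greet() == "hello");

    // Registering the same key again replaces the earlier creator.
    factory.RegCreateFn("fixed", []() -> std::unique_ptr<Greeter> {
        return std::make_unique<FixedGreeter>("replaced");
    });
    assert(factory.Create("fixed")->greet() == "replaced");

    // Unknown keys give nullptr, so callers should check before using the result.
    assert(factory.Create("missing") == nullptr);
}
```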

Factory Registration

The GenericFactory could stand on its own if we wanted, and in some of the advanced techniques we may take advantage of that fact. But, just like the post in Dobbs', we can easily simplify how this factory works by adding this helper class "RegisterInFactory."
The actual construction doesn't particularly need us to go out and define some functor that returns the appropriately wrapped construction of the Derived class; a static member function (the 'CreateInstance' function) will still fulfill the signature for our create function. As for the class itself, construction will register the function to our factory. We'll see how that happens in...

Using the factory

Base Class

It's not just enough to say what the factory looks like, it helps to actually see it in use. We'll start with some trivial base class
(BaseClass.h)

#include "GenericFactory.h"

class BaseClass
{
public:
    virtual ~BaseClass() = default;
    virtual void print() = 0;
};

using BaseFactory = GenericFactory<BaseClass>;

Now, the base class could be a concrete class and not just an abstract one full of pure virtual functions, but I rather like this pattern as a means to make interfaces a bit more usable. If perhaps you'd like to have default implementations of the functions, but still disallow construction of the Base class, you can do the usual and declare the constructor protected.

We don't technically need to do pretty much anything here. However, I find that having the factory included here is simpler than including it through all the subclasses. Especially since we're hoping that users of this will use our factory to construct them rather than doing so directly. Including it here also allows us to throw in an alias to this Generic Factory as opposed to some other one. (The point of defining it Generically is that you could use that one factory template to produce many kinds of factories for many different base classes).

Nothing at all needs to be done in the source file (.cpp), but if your Base class is a useful one, you may want to have your factory produce a base pointer to the base class. In which case you'll follow the same pattern as the...

Derived Class

While BaseClass had its action in the header, Derived only needs a modification to the source file
(Derived_1.cpp)
#include "Derived_1.h"

#include <iostream>

namespace  {
RegisterInFactory<BaseClass, Derived_1> registerMe{"Derived 1"};
} // anonymous namespace

Derived_1::Derived_1() : BaseClass()
{

}

void Derived_1::print()
{
    std::cout << "Printing from Derived class 1" << std::endl;
}

Here all we need to note is the use of the 'RegisterInFactory' class. The template signature following RegisterInFactory tells us which type to register to which factory. While we never use the global object created by it, the act of making that object registers the class to the factory. You could do this with static objects, but I like anonymous namespaces here, especially since we're explicitly not using this object, the object created is constrained to this source file.

User of factory

(UserOfBases.cpp)

#include "UserOfBases.h"

#include "BaseClass.h"

void UserOfBases::printFromDerived(const std::string& derivedType)
{
    std::unique_ptr<BaseClass> object = BaseFactory::instance().Create(derivedType);
    if (!object) {
        // Create returns nullptr when nothing is registered under this key
        return;
    }
    object->print();
}

Here we see some example of how all of this comes together. It's so simple I can't really think of what else to say about it. Certainly outside of a little toy program there would be a lot more going on with our created object, but this illustrates the use nicely.

Example code here

Coming up next: What do you do when your class needs to be initialized as part of the constructor? Generic Factories with Initializers!

Tuesday, June 27, 2017

Introduction

When I was choosing an advisor for my doctoral work in physics, he asked whether I'd ever programmed before. I'd had one C++ class my first semester of undergrad, about 5 years prior. To him, this was good enough: you either knew how to program or not. That philosophy is just as bad as it sounds.

As scientists, we maybe wrote code like scientists. We would write up some analysis suite; run it through a simulation; see if the results matched what we predicted. The code was unorganized and full of random hacks here and there. The whole team's software probably reinvented the wheel several times.

I didn't finish my research work. What work I had done had been circling around not very good results, and some (at that time) undiagnosed mental health issues had left me particularly unmotivated to do more. I accepted a master's, and in my last semester took a few courses in software design and development (which, as my motivation flagged, I eventually switched to auditing).

I give my little bit of history here because with my first position out of grad school, I learned a ton of actually useful development techniques. Stuff that I've come to find isn't often taught to CS majors even. How to organize libraries. How to write unit tests. Why to write unit tests. How to use good version control software. How to handle continuous integration within a large team. Agile Development techniques and so on.

My blog here is to recount some of these useful lessons. Some will be aimed at these broader lessons, some will be specific patterns and techniques I've adopted in my unconventional path to development.

Finally, a note about the name of the blog. C++ was originally named "C with Classes." I feel like with the contemporary wordplay/pun app naming scheme, one might now call it "classC," pronounced "classy." Or the other read of 'classic' perhaps might be used if you hadn't heard it pronounced before. My main language is C++, generally in the 11/14 standard, sometimes with Qt (5, but 4 maybe), built using CMake mostly, and tested with Google Mock/Test. Usually I develop in Linux and cross compile to other platforms. I like CLion and Qt Creator as IDEs, but whatever IDE works for you is better than no IDE in my book.