Tuesday, November 14, 2017

Generic Factories within a Framework

Some third party libraries provide useful additional hooks to make the Factory patterns work even more smoothly.

Structured Initialization

The example that I'll use here is YAML, because that's what I have some experience with. Specifically, I've used the yaml-cpp library.

While Maps provided us nice, flexible initialization data, they're somewhat limited to being pretty flat structures. C++ is particularly troublesome here because it doesn't natively include the kinds of recursive maps found in other languages. So we can leverage those other recursive map objects like XML, JSON, YAML, or others, and use some third party library (or roll our own data format and parsing library, but... why?).

This provides an additional utility to your factories, namely the ability to change how the program behaves without recompiling, and without a user knowing or understanding your source code or C++. Along with your program, you can provide some configuration file, or capacity to read in a configuration file, which contains the appropriate information to spin up specific classes in specific ways, and configure those classes appropriately. 

A colleague of mine has taken this factory style and really fleshed it out, so I will simply link to it here. At a code discussion level, there's really not much particularly interesting in terms of our factory examples aside from the fact that we'll now pass in a YAML node as our initializer.
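To make the "pass a structured node in as the initializer" idea concrete without pulling in a library, here is a minimal sketch. Everything in it is an assumption for illustration: ConfigNode is a hypothetical stand-in for whatever node type your parser hands you (it is not the yaml-cpp API), and Shape, Rectangle, and createShape are made-up names.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed YAML/JSON node; NOT the yaml-cpp API.
struct ConfigNode {
    std::string key;                   // name of this entry within its parent
    std::string value;                 // scalar text, empty for containers
    std::vector<ConfigNode> children;  // nested entries (recursion via vector, C++17)

    // Returns the named child, or nullptr when it is absent.
    const ConfigNode* find(const std::string& name) const {
        for (const auto& child : children) {
            if (child.key == name) return &child;
        }
        return nullptr;
    }
};

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Rectangle : Shape {
    double width = 1.0;
    double height = 1.0;

    explicit Rectangle(const ConfigNode& node) {
        // The nested "size" map is the part a flat string map cannot express.
        if (const ConfigNode* size = node.find("size")) {
            if (const ConfigNode* w = size->find("width")) width = std::stod(w->value);
            if (const ConfigNode* h = size->find("height")) height = std::stod(h->value);
        }
    }
    double area() const override { return width * height; }
};

using ShapeCreateFn = std::function<std::unique_ptr<Shape>(const ConfigNode&)>;

// Reads the "type" entry and dispatches to the matching create function.
// Returns nullptr when the type is missing or unknown.
inline std::unique_ptr<Shape> createShape(const ConfigNode& node) {
    static const std::map<std::string, ShapeCreateFn> registry = {
        {"rectangle", [](const ConfigNode& n) { return std::make_unique<Rectangle>(n); }},
    };
    const ConfigNode* type = node.find("type");
    if (type == nullptr) return nullptr;
    const auto it = registry.find(type->value);
    if (it == registry.end()) return nullptr;
    return it->second(node);
}
```

With a real parser, the only part that should change is how the node gets built and queried; the factory still just receives "a node" and lets each class pull out what it needs.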

QObject Factory

Qt is a really popular C++ framework for cross-platform software. I'll save the discussion of its merits and deficits for some other time. What you need to know is that Qt has a class called "QObject" that forms the base for many of its classes. Qt creates another class of objects, 'meta objects' alongside classes that derive from QObject, which handle a variety of useful features, but chiefly callback registration (signals and slots) and object ownership.

Prior to C++11, one of C++'s biggest pain points was the ownership of "new-ed" objects (to avoid discussion of stacks and heaps and stuff that is unpleasant if you aren't a CS background person). When you create an object through 'new' and hold a pointer to that object, you traditionally had to decide which piece of code was responsible for deleting it, lest you leak memory. It has always surprised me as I've learned the language just how often it is useful to create these objects all the same. Anyway, before the standard smart pointers were around to manage that memory for us, Qt solved the problem with what it calls "parent-child" relationships. A QObject can have another QObject as its 'parent', and the parent deletes its children when the parent itself is destroyed.

This allows us to adjust our factory in a way that makes it moderately safer. We can feel comfortable handing raw pointers out of our factory because we can simultaneously assign ownership to a parent object. On the other hand, once these parent-child relationships manage the memory, we really can't wrap the same objects in owning smart pointers like std::unique_ptr or std::shared_ptr (since they too would try to delete the object, possibly at the wrong time or a second time).

class Base : public QObject{
  Q_OBJECT
public:
  Base(QObject* parent = nullptr)
    : QObject{parent}
  {}
  //...
};

template<class Base, class ClassIDKey = std::string, typename = std::enable_if_t<std::is_base_of<QObject, Base>::value> >
class GenericQFactory{
public:
  using BaseCreateFn = std::function<Base*(QObject*)>;
  //...
  template<class Derived, typename = std::enable_if_t<std::is_base_of<Base, Derived>::value> >
  class RegisterInFactory{
  public:
    static Base* CreateInstance(QObject* parent){
      return new Derived(parent);
    }
    // ...
  };
};

SFINAE in Factories

Let's pause a moment here, because I've been using this trick somewhat inconsistently without stopping to discuss it. There's a thing in C++ called "SFINAE," "Substitution Failure Is Not An Error." Now, that phrase is worse than useless to me. What it means is that when the compiler substitutes concrete types into a template's parameters and the result is invalid, that alone is not a compile error; the compiler quietly drops that template from consideration and looks for another candidate. Only if nothing else matches do you get an error. All of this happens at compile time, never at runtime. For our purposes, it's a roundabout way of putting restrictions on a template: our factory has no fallback candidate, so trying to use it with a class that doesn't satisfy the restriction produces a compilation error instead of a stealthy bug. Someone with the actual CS background can correct me or provide nuance if they wish.

The useful bit for us will be the double hit of the 'enable_if' and 'is_base_of' class templates from <type_traits>. Working from the inside out: is_base_of<Base, Derived> is a template class that defines a static member constant "value". If Derived is a subclass of Base (or the same class as Base), "value" is true; otherwise it is false. (In C++17, there is also a helper variable template, 'std::is_base_of_v<Base, Derived>', so you don't have to type 'std::is_base_of<Base, Derived>::value' all the time.)

std::enable_if<X> is a template class where if X is true, then the class defines a typedef 'type'. If false, such a typedef does not occur. So std::enable_if<X>::type (or the simpler helper definition: std::enable_if_t<X>) will only be meaningful when the expression in the template evaluates to true. So we've embedded our enable_if expression to evaluate the inheritance relationship, and that result defines this enable_if type (or not).

Finally 'typename = ' is kind of our general template parameter. Except that we're never actually going to use the parameter by name, so we can omit the name (it could have been something like <typename test = std::enable_if_t... ). We're using = to assign it a default type, which is the type from enable_if. And enable_if may not have 'type' if its template evaluated false. Which means we can't assign this default type (assigned to nothing, mind you), which means we can't resolve the whole template and our compiler will yell at us and we'll know for sure that we've tried to make our factory produce Bars that don't inherit from Foos and would be completely meaningless in terms of the factory. It's not necessary to do, but it's kind of nice to have that extra check in place (particularly if you have a lot of these factories for closely related bases floating around and you accidentally forget which is which).
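Here is a small sketch of that guard outside of any factory, so you can see both behaviors. Foo, DerivedFoo, Bar, Registrar, and canRegister are all made-up names for illustration. The Registrar template has no fallback, so misusing it is a hard compile error (which is what we want in the factory). The canRegister pair shows the "not an error" half: when the guarded overload fails substitution, the compiler silently picks the other one.

```cpp
#include <memory>
#include <type_traits>

struct Foo { virtual ~Foo() = default; };
struct DerivedFoo : Foo {};
struct Bar {};  // deliberately unrelated to Foo

// Same guard as in the factory: the unnamed third parameter only has a
// default type when Derived inherits from Base.
template <class Base, class Derived,
          typename = std::enable_if_t<std::is_base_of<Base, Derived>::value>>
struct Registrar {
    static std::unique_ptr<Base> create() { return std::make_unique<Derived>(); }
};
// Writing Registrar<Foo, Bar> anywhere would fail to compile, because there
// is no other Registrar for the compiler to fall back on.

// Guarded overload: only a candidate when the inheritance check passes.
template <class Base, class Derived,
          typename = std::enable_if_t<std::is_base_of<Base, Derived>::value>>
constexpr bool canRegister(int) { return true; }

// Fallback overload: chosen when the guarded one is discarded.
template <class Base, class Derived>
constexpr bool canRegister(long) { return false; }

static_assert(canRegister<Foo, DerivedFoo>(0), "DerivedFoo inherits from Foo");
static_assert(!canRegister<Foo, Bar>(0), "Bar does not inherit from Foo");
```

Calling with the literal 0 (an int) makes the guarded overload the better match whenever it survives substitution, which is why the fallback takes a long.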

Variants (and in particular, QVariant) as initializers

It may of course be useful for us to really stretch what we allow in as an initializer. Maybe the initialization is so specialized for each derived class that our above strategies of maps may not be sufficient. If you're lucky enough that the data may simply be one or more simple data types, perhaps the boost or (included in c++17) STL 'variant' objects may be useful for you. Perhaps even vectors of variants would give you flexibility. I haven't experimented with this too much, but you could use, perhaps:
using InitializerElement = std::variant<int, double, std::string>;
using Initializer = std::vector<InitializerElement>;

to pass in an ensemble of simple data types to the constructor.

This is where we run into the catch of using variants. We've now opened our initializers up to be 'anything' that we can pack into our variant. But it also means that entirely irrelevant data may come in, or that relevant data may be missing. It's like the problem with maps earlier, but expanded further, since now the type information is variable too.
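One way to cope with that catch is to validate the vector before trusting it. The following is only a sketch under my own assumptions: Sensor and makeSensor are hypothetical names, and the "string, then int, then double" layout is an arbitrary convention picked for the example. It requires C++17.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

using InitializerElement = std::variant<int, double, std::string>;
using Initializer = std::vector<InitializerElement>;

// Hypothetical class that wants a name, a channel number, and a gain.
struct Sensor {
    std::string name;
    int channel;
    double gain;
};

// Returns a Sensor only when the initializer holds exactly the expected
// types in the expected order: std::string, int, double.
inline std::optional<Sensor> makeSensor(const Initializer& init) {
    if (init.size() != 3) return std::nullopt;  // missing or extra data
    const auto* name = std::get_if<std::string>(&init[0]);
    const auto* channel = std::get_if<int>(&init[1]);
    const auto* gain = std::get_if<double>(&init[2]);
    if (!name || !channel || !gain) return std::nullopt;  // wrong type somewhere
    return Sensor{*name, *channel, *gain};
}
```

std::get_if returns a null pointer instead of throwing when the variant holds a different type, which keeps the checking code flat.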

On top of all of this, c++17 also has an 'any' class that seems like it could really hold... anything. This could be our ultimate generic object as initializer. I have had no experience at all using these, and I'll touch back on them in the next installment where we discuss variadic initialization.

For even more complication, in our Qt framework, we can pass in quite complex objects as 'QVariant's for initializers. QVariant relies on Qt's "meta" system again. So if you want to pack an object of your own class into a QVariant, you must first register that type using a special macro (Q_DECLARE_METATYPE). But it becomes very useful as a generic 'anything' within their framework: Signal/slot connections can be inspected via QVariants, for example; a QList<QVariant> can itself be stored in a QVariant, and so can a QMap<QString, QVariant>, giving us the recursive lists and maps available from structured initializations.

So, to update our GenericQFactory from above:

class Base : public QObject{
  Q_OBJECT
public:
  Base(QObject* parent = nullptr)
    : QObject{parent}
  {}
  //...
};

class Derived : public Base{
  Q_OBJECT
public:
  Derived(const QVariant& initializer, QObject* parent = nullptr)
    : Base{parent}
  {
    // do something using initializer to set some variables or whatever
  }
  //...
};

template<class Base, class ClassIDKey = std::string, typename = std::enable_if_t<std::is_base_of<QObject, Base>::value> >
class GenericQFactory{
public:
  using BaseCreateFn = std::function<Base*(const QVariant&, QObject*)>;
  //...
  template<class Derived, typename = std::enable_if_t<std::is_base_of<Base, Derived>::value> >
  class RegisterInFactory{
  public:
    static Base* CreateInstance(const QVariant& initializer, QObject* parent){
      return new Derived(initializer, parent);
    }
    // ...
  };
};

A final note about QVariant, something that definitely limits its usefulness. QVariants copy construct when they're created, and copy construct when they're 'cast.' Ignoring the memory annoyance of copy construction for a moment, it does make passing in references to common objects more frustrating. I usually wrap them in a smart pointer of some kind (either STL or Qt's shared pointers, depending on situation) within whatever class is being packed into a QVariant, that way the copy construction of that smart pointer allows the reference to pass through the system.

Coming up next: Enable dark magic mode: can we come up with the ultimate factory initialization that allows us the most insanely generic factory? Variadic Factory Initialization and the Quest for the Holy Constructor

Wednesday, November 8, 2017

Factory with Subclass Initialization

Subclass Initialization


It's pretty rare for me to have a class that doesn't require some kind of initialization passed through the constructor. With my first post on Factories our factory creation function didn't really give us much room to do this. So let's discuss some options for initialization.

Trivial Initialization

Suppose all of your derived classes can be initialized using one type of object, perhaps some struct with data common to all the subclasses. But from base class to base class, this initialization object may be of a different class, so we might like to continue our Generic factory pattern.

So let's start with extending our factory's template

template<class Base, class Initializer, class ClassIDKey = std::string>

class GenericFactory{
public:
  using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;
  // the rest of the definition largely holds because the change 
  // is absorbed by this alias for the base create function
};

and with our Registration function:

template<class Base, class Derived, class Initializer, class ClassIDKey = std::string>
class RegisterInFactory{
public:
  static std::unique_ptr<Base> CreateInstance(const Initializer& initializer){
    return std::make_unique<Derived>(initializer);
  }
  
  RegisterInFactory(const ClassIDKey& key){
    GenericFactory<Base, Initializer, ClassIDKey>::instance().RegCreateFn(key, &CreateInstance);
  }
};

Simpler Registration class

It occurs to me around now that we're relying on good development practices: when we create our RegisterInFactory object, we create it like RegisterInFactory<Base, Derived> registerMe{"derived"}; and trust ourselves to retype the Base and Derived classes correctly every time. And since we usually use an alias to give a good name to our factory, it seems silly not to be able to reuse that alias.

Furthermore, with our initializers, we want the registry objects to have matching initializers with the factory, requiring still more template matching.

So, let's simplify:

template<class Base, class Initializer, class ClassIDKey = std::string>
class GenericFactory{
public:
  using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;
  // ... all the usual stuff
  
  template<class Derived>
  class RegisterInFactory{
    //.. moving all of our Registry functions in here, we get Base, Initializer, and ClassIDKey types for free from GenericFactory
  };
};  

And then in use somewhere, suppose we've defined using BaseFactory = GenericFactory<Base, Initializer>; (taking the default key type to be string), when we wish to register a derived class we can now use BaseFactory::RegisterInFactory<Derived> registerMe{"derived"};

So I'll continue to reuse this pattern going forward so I don't have to keep retyping all those template parameters.
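Putting the pieces together, a complete version of this nested-registrar factory might look roughly like the sketch below. The names instance(), RegCreateFn, and CreateInstance come from the snippets above; the Create lookup function, the registry_ member, and the Settings/Amplifier/LinearAmplifier types are my own placeholders for illustration, so adjust them to whatever your factory from the first post actually uses.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

template <class Base, class Initializer, class ClassIDKey = std::string>
class GenericFactory {
public:
    using BaseCreateFn = std::function<std::unique_ptr<Base>(const Initializer&)>;

    static GenericFactory& instance() {
        static GenericFactory factory;  // one registry per template instantiation
        return factory;
    }

    void RegCreateFn(const ClassIDKey& key, BaseCreateFn fn) {
        registry_[key] = std::move(fn);
    }

    // Hypothetical lookup name; returns nullptr for an unregistered key.
    std::unique_ptr<Base> Create(const ClassIDKey& key, const Initializer& init) const {
        const auto it = registry_.find(key);
        if (it == registry_.end()) return nullptr;
        return it->second(init);
    }

    // Nested registrar: Base, Initializer, and ClassIDKey come from the factory.
    template <class Derived>
    class RegisterInFactory {
    public:
        static std::unique_ptr<Base> CreateInstance(const Initializer& init) {
            return std::make_unique<Derived>(init);
        }
        explicit RegisterInFactory(const ClassIDKey& key) {
            GenericFactory::instance().RegCreateFn(key, &CreateInstance);
        }
    };

private:
    GenericFactory() = default;
    std::map<ClassIDKey, BaseCreateFn> registry_;
};

// Placeholder types for the example.
struct Settings { int gain = 1; };

struct Amplifier {
    virtual ~Amplifier() = default;
    virtual int apply(int x) const = 0;
};

struct LinearAmplifier : Amplifier {
    int gain;
    explicit LinearAmplifier(const Settings& s) : gain(s.gain) {}
    int apply(int x) const override { return gain * x; }
};

using AmplifierFactory = GenericFactory<Amplifier, Settings>;

// Registers the class during static initialization, before main() runs.
static AmplifierFactory::RegisterInFactory<LinearAmplifier> registerLinear{"linear"};
```

After that, AmplifierFactory::instance().Create("linear", settings) hands back a std::unique_ptr<Amplifier>, and an unknown key hands back nullptr in this sketch (throwing would be an equally reasonable choice).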

String Map Initialization

Being even more generic, Subclasses can have different initialization needs from one class to another. When we considered our Generic Factory with Initializer, since the Factory itself is templated upon the Initializer class, we are kind of stuck with one class of object to serve initialization. Suppose that class is a struct. Now you've got to list all the possible kinds of initialization information in the struct, whether one subclass uses it or not; and you've lost the ability to dynamically add new stuff to your factory if it introduces new initialization needs.

One relatively simple way to resolve this conflict is with a map. Maps come in a variety of flavors. Let's restrict ourselves for now to what's available in the STL. (Framework factories are next time.) The most generic map we could use is probably a string-string map. You can use std::to_string to pack numbers into strings, and std::stoi and related functions to get them back out. You can write or use various serializers to store binary data in the string*, and so on. (*: Although a string-to-std::vector<unsigned char> map might be better if you're storing binary data, so that you don't run into the problems of string operations being designed around human-readable strings.)
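As a quick sketch of that packing and unpacking (the helper names packCounts, readInt, and readDouble, and the "retries"/"timeout" keys, are invented for the example):

```cpp
#include <map>
#include <string>
#include <vector>

using Initializer = std::map<std::string, std::string>;
// Alternative for binary payloads, as suggested in the footnote above.
using BinaryInitializer = std::map<std::string, std::vector<unsigned char>>;

// Packs numbers into a string map. Note that std::to_string formats doubles
// with six digits after the decimal point, so very precise values lose digits.
inline Initializer packCounts(int retries, double timeoutSeconds) {
    return {{"retries", std::to_string(retries)},
            {"timeout", std::to_string(timeoutSeconds)}};
}

// Unpacks an int. map::at throws std::out_of_range for a missing key, and
// std::stoi throws std::invalid_argument for non-numeric text.
inline int readInt(const Initializer& init, const std::string& key) {
    return std::stoi(init.at(key));
}

// Same idea for floating point values, using std::stod.
inline double readDouble(const Initializer& init, const std::string& key) {
    return std::stod(init.at(key));
}
```

Letting the exceptions propagate is a deliberate simplification here; a factory used with user-edited input would probably want to catch them and report which key was bad.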

We can create an Initializer class, like the trivial example above, where the map is a part of the Initialization class; supposing that some variables are always present in all subclasses, but some variables are only present in some subclasses, one could put the common ones directly in the struct and then tack on a map at the end for further customization.

Or, if the only initialization you need is just the map itself, we can define it to an alias for simplicity.

using Initializer = std::map<std::string, std::string>;

Pretty straightforward, right? I can't see why you couldn't use an unordered_map, but the maps aren't likely to be too large, they're initializers. (Classes with many initializer values usually strike me as a code smell, and probably could stand some refactoring.) Suffice to say, I often like to use the more familiar STL classes unless there's a good reason to choose another.
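For a feel of what the receiving end looks like, here is a sketch of a derived class that consumes the map, uses the keys it cares about, falls back to defaults for missing ones, and ignores the rest. Logger, PrefixLogger, valueOr, and the "prefix"/"indent" keys are all hypothetical names chosen for the example.

```cpp
#include <map>
#include <memory>
#include <string>

using Initializer = std::map<std::string, std::string>;

struct Logger {
    virtual ~Logger() = default;
    virtual std::string format(const std::string& message) const = 0;
};

// Returns the value stored under key, or fallback when the key is absent.
inline std::string valueOr(const Initializer& init, const std::string& key,
                           const std::string& fallback) {
    const auto it = init.find(key);
    return it == init.end() ? fallback : it->second;
}

struct PrefixLogger : Logger {
    std::string prefix;
    int indent;

    explicit PrefixLogger(const Initializer& init)
        : prefix(valueOr(init, "prefix", "[log]")),
          indent(std::stoi(valueOr(init, "indent", "0"))) {
        if (indent < 0) indent = 0;  // treat negative indents as no indent
    }

    std::string format(const std::string& message) const override {
        return std::string(static_cast<std::size_t>(indent), ' ') + prefix + " " + message;
    }
};

// Matches the BaseCreateFn shape used by the factory above.
inline std::unique_ptr<Logger> CreateInstance(const Initializer& init) {
    return std::make_unique<PrefixLogger>(init);
}
```

Unknown keys are silently ignored in this sketch, which is exactly the flexibility (and the risk) of a map initializer: a typo in a key name just means you get the default.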

Coming up next: When I've used some specific frameworks, I've used some variations on this generic discussion to leverage the frameworks. Examples will be Qt's QObjects and QVariant; using a formatted file (e.g. yaml) to allow users to make program settings without having to edit or recompile code. Generic Factories within a Framework!