Product units enable a neural network to form higher-order combinations of its inputs, offering increased information capacity and smaller network architectures. Training product unit networks with gradient descent, or any other local optimization algorithm, is difficult because of an increased number of local minima and a greater chance of network paralysis. This paper illustrates the shortcomings of gradient descent optimization when applied to product units, and presents a comparative investigation of global optimization algorithms for training product unit neural networks. A comparison of results obtained from particle swarm optimization, genetic algorithms, LeapFrog and random search shows that these global optimization algorithms successfully train product unit neural networks. Results for product unit neural networks are also compared with those obtained using gradient descent with summation units.
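To make the contrast concrete: a summation unit computes a weighted sum of its inputs, while a product unit raises each input to a learnable power and multiplies the results. A minimal sketch (the function names and example values below are illustrative, not taken from the paper):

```python
import math

def summation_unit(x, w):
    # Conventional unit: weighted sum of inputs, sum_i w_i * x_i.
    return sum(wi * xi for wi, xi in zip(w, x))

def product_unit(x, w):
    # Product unit: prod_i x_i ** w_i — equivalently
    # exp(sum_i w_i * ln(x_i)) for positive inputs, which is
    # how it captures higher-order input combinations.
    return math.prod(xi ** wi for wi, xi in zip(w, x))

x = [2.0, 3.0]
w = [1.0, 2.0]
print(summation_unit(x, w))  # 1*2 + 2*3 = 8.0
print(product_unit(x, w))    # 2**1 * 3**2 = 18.0
```

Because the exponents are trainable, a single product unit can represent multiplicative interactions that would otherwise require several summation units and an extra hidden layer.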