Abstract and 1. Introduction
Preliminaries and Related Work
Key Bottlenecks in PC Parallelization
Harnessing Block-Based PC Parallelization
4.1. Fully Connected Sum Layers
4.2. Generalizing To Practical Sum Layers
4.3. Efficient Implementations by Compiling PC Layers
4.4. Analysis: IO and Computation Overhead
Optimizing Backpropagation with PC Flows
Experiments
6.1. Faster Models with PyJuice
6.2. Better PCs At Scale
6.3. Benchmarking Existing PCs
Conclusion, Acknowledgements, Impact Statement, and References
A. Algorithm Details
B. Additional Technical Details
C. Experimental Details
D. Additional Experiments
We adopt two PD structures (PD-mid with 107M edges and PD-large with 405M edges) as well as two HCLT structures (HCLT-mid with 40M edges and HCLT-large with 174M edges); details of the adopted models are given in Appendix C.4. We experiment with different optimization strategies and adopt full-batch EM, as it yields consistently better performance across models and datasets. Specifically, the computed PC flows are accumulated over all samples in the training set before performing a single EM step.
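To make the full-batch EM procedure concrete, the sketch below accumulates the PC flows of a single sum unit over an entire (synthetic) training set and only then applies one EM update. This is a minimal illustration under assumed toy inputs (the random `child_ll` stand-ins for child log-likelihoods and the pseudocount value are ours); it is not the PyJuice API, which fuses these steps into compiled GPU kernels.

```python
import numpy as np

# Minimal sketch of full-batch EM for one sum unit with K children
# (a mixture over K child distributions). Toy setup, not the PyJuice API.
rng = np.random.default_rng(0)
K, N = 4, 10_000                          # children of the sum unit; training samples
weights = np.full(K, 1.0 / K)             # sum-unit edge parameters (theta)
child_ll = rng.normal(-3.0, 1.0, (N, K))  # stand-in child log-likelihoods log p_k(x_n)

flow_acc = np.zeros(K)                    # accumulated PC flows, one per edge
for batch in np.array_split(np.arange(N), 10):       # stream over the full training set
    log_joint = np.log(weights) + child_ll[batch]    # log (theta_k * p_k(x_n))
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    flows = np.exp(log_joint - log_norm)  # for a root sum unit, edge flows equal
                                          # the posterior responsibilities
    flow_acc += flows.sum(axis=0)         # accumulate; no update until the full pass ends

pseudocount = 0.1                         # Laplace smoothing, commonly used when learning PCs
weights = (flow_acc + pseudocount) / (flow_acc.sum() + K * pseudocount)  # one EM step
```

The key property of the full-batch variant is visible in the loop: flows from every batch are summed into `flow_acc`, and the parameters are touched exactly once per pass over the data.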
Results are shown in Table 3. Notably, we achieve better results than those reported in previous papers: for example, Liu et al. (2023a) report 4.82 bits-per-dimension (bpd) for HCLT on ImageNet32, while we achieve 4.33 bpd. These improvements stem from training for more epochs and from the broader hyperparameter search made feasible by the speedup. We highlight that the goal of this section is not to set new records for tractable deep generative models, but to establish a set of baselines that can be easily reproduced to track progress in PC modeling and learning. In Appendix C.4, we include additional benchmark results on the WikiText dataset (Merity et al., 2016).
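For reference, bits-per-dimension is the standard conversion of an average log-likelihood (in nats) to bits, normalized by the data dimensionality. The snippet below shows this conversion; the log-likelihood value is illustrative only (chosen to reproduce roughly 4.33 bpd), not a number taken from Table 3.

```python
import math

def bits_per_dim(avg_log_likelihood: float, num_dims: int) -> float:
    """Convert an average log-likelihood (in nats) to bits-per-dimension."""
    return -avg_log_likelihood / (num_dims * math.log(2))

# ImageNet32 has 32 * 32 * 3 = 3072 dimensions per sample.
print(bits_per_dim(avg_log_likelihood=-9221.0, num_dims=32 * 32 * 3))  # ~4.33 bpd
```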
We proposed PyJuice, a novel system that supports training and inference of probabilistic circuits. PyJuice is orders of magnitude faster and much more memory efficient than even very recent baselines. We hope PyJuice can boost future research on tractable deep generative models by allowing for efficient training of large-scale architectures.
This work was funded in part by the DARPA PTG Program under award HR00112220005, the DARPA ANSR program under award FA8750-23-2-0004, and the NSF grant #IIS1943641. We thank Honghua Zhang, Pasha Khosravi, and Poorva Garg for providing valuable feedback during the development of PyJuice.
This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.
Ahmed, K., Teso, S., Chang, K.-W., Van den Broeck, G., and Vergari, A. Semantic probabilistic layers for neuro-symbolic learning. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022a.

Ahmed, K., Wang, E., Chang, K.-W., and Van den Broeck, G. Neuro-symbolic entropy regularization. In Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence (UAI), 2022b.

Ahmed, K., Chang, K.-W., and Van den Broeck, G. A pseudo-semantic loss for deep autoregressive models with logical constraints. In Advances in Neural Information Processing Systems 36 (NeurIPS), 2023a.

Ahmed, K., Zeng, Z., Niepert, M., and Van den Broeck, G. SIMPLE: A gradient estimator for k-subset sampling. In Proceedings of the International Conference on Learning Representations (ICLR), 2023b.

Choi, Y., Vergari, A., and Van den Broeck, G. Probabilistic circuits: A unifying framework for tractable probabilistic models. Technical report, 2020. URL http://starai.cs.ucla.edu/papers/ProbCirc20.pdf.

Choi, Y., Dang, M., and Van den Broeck, G. Group fairness by probabilistic modeling with latent fair decisions. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021.

Correia, A., Peharz, R., and de Campos, C. P. Joints in random forests. Advances in Neural Information Processing Systems, 33:11404–11415, 2020.

Correia, A. H., Gala, G., Quaeghebeur, E., de Campos, C., and Peharz, R. Continuous mixtures of tractable probabilistic models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp. 7244–7252, 2023.

Dadu, V., Weng, J., Liu, S., and Nowatzki, T. Towards general purpose acceleration by exploiting common data-dependence forms. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 924–939, 2019.

Dang, M., Vergari, A., and Van den Broeck, G. Strudel: Learning structured-decomposable probabilistic circuits. In International Conference on Probabilistic Graphical Models, pp. 137–148. PMLR, 2020.

Dang, M., Khosravi, P., Liang, Y., Vergari, A., and Van den Broeck, G. Juice: A Julia package for logic and probabilistic circuits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 16020–16023, 2021.

Dang, M., Liu, A., and Van den Broeck, G. Sparse probabilistic circuits via pruning and growing. Advances in Neural Information Processing Systems, 35:28374–28385, 2022.

Darwiche, A. A logical approach to factoring belief networks. KR, 2:409–420, 2002.

Darwiche, A. A differential approach to inference in Bayesian networks. Journal of the ACM (JACM), 50(3):280–305, 2003.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.

Gala, G., de Campos, C., Peharz, R., Vergari, A., and Quaeghebeur, E. Probabilistic integral circuits. In International Conference on Artificial Intelligence and Statistics, pp. 2143–2151. PMLR, 2024.

Gens, R. and Domingos, P. Learning the structure of sum-product networks. In International Conference on Machine Learning, pp. 873–880. PMLR, 2013.

Lin, B. Y., Zhou, W., Shen, M., Zhou, P., Bhagavatula, C., Choi, Y., and Ren, X. CommonGen: A constrained text generation challenge for generative commonsense reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1823–1840, 2020.

Liu, A. and Van den Broeck, G. Tractable regularization of probabilistic circuits. Advances in Neural Information Processing Systems, 34:3558–3570, 2021.

Liu, A., Mandt, S., and Van den Broeck, G. Lossless compression with probabilistic circuits. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.

Liu, A., Zhang, H., and Van den Broeck, G. Scaling up probabilistic circuits by latent variable distillation. In Proceedings of the International Conference on Learning Representations (ICLR), 2023a.
Liu, A., Niepert, M., and Van den Broeck, G. Image inpainting via tractable steering of diffusion models. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.
Liu, X., Liu, A., Van den Broeck, G., and Liang, Y. Expressive modeling is insufficient for offline RL: A tractable inference perspective. arXiv preprint arXiv:2311.00094, 2023b.

Liu, X., Liu, A., Van den Broeck, G., and Liang, Y. Understanding the distillation process from deep generative models to tractable probabilistic circuits. In International Conference on Machine Learning, pp. 21825–21838. PMLR, 2023c.

Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), 2015.

Loconte, L., Di Mauro, N., Peharz, R., and Vergari, A. How to turn your knowledge graph embeddings into generative models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Loconte, L., Sladek, A. M., Mengel, S., Trapp, M., Solin, A., Gillis, N., and Vergari, A. Subtractive mixture models via squaring: Representation and learning. In Proceedings of the International Conference on Learning Representations (ICLR), 2024.

Lowd, D. and Rooshenas, A. The Libra toolkit for probabilistic models. Journal of Machine Learning Research, 16:2459–2463, 2015.
Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., and De Raedt, L. DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems 31 (NeurIPS), 2018.
Mari, A., Vessio, G., and Vergari, A. Unifying and understanding overparameterized circuit representations via low-rank tensor decompositions. In The 6th Workshop on Tractable Probabilistic Modeling, 2023.

Mathur, S., Gogate, V., and Natarajan, S. Knowledge intensive learning of cutset networks. In Uncertainty in Artificial Intelligence, pp. 1380–1389. PMLR, 2023.

Merity, S., Xiong, C., Bradbury, J., and Socher, R. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.

Molina, A., Vergari, A., Stelzner, K., Peharz, R., Subramani, P., Di Mauro, N., Poupart, P., and Kersting, K. SPFlow: An easy and extensible library for deep probabilistic learning using sum-product networks. arXiv preprint arXiv:1901.03704, 2019.

Murphy, K., Linderman, S., Chang, P. G., Li, X., Kara, A., Harper-Donnelly, G., and Duran-Martin, G. Dynamax, 2023. URL https://github.com/probml/dynamax.

Peharz, R., Lang, S., Vergari, A., Stelzner, K., Molina, A., Trapp, M., Van den Broeck, G., Kersting, K., and Ghahramani, Z. Einsum networks: Fast and scalable learning of tractable probabilistic circuits. In International Conference on Machine Learning, pp. 7563–7574. PMLR, 2020a.

Peharz, R., Vergari, A., Stelzner, K., Molina, A., Shao, X., Trapp, M., Kersting, K., and Ghahramani, Z. Random sum-product networks: A simple and effective approach to probabilistic deep learning. In Uncertainty in Artificial Intelligence, pp. 334–344. PMLR, 2020b.

Poon, H. and Domingos, P. Sum-product networks: A new deep architecture. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 689–690. IEEE, 2011.

Pronobis, A., Ranganath, A., and Rao, R. P. LibSPN: A library for learning and inference with sum-product networks and TensorFlow. In Principled Approaches to Deep Learning Workshop, 2017.

Qian, C., Manolache, A., Ahmed, K., Zeng, Z., Van den Broeck, G., Niepert, M., and Morris, C. Probabilistic task-adaptive graph rewiring. In Proceedings of the International Conference on Learning Representations (ICLR), 2023.

Rabiner, L. and Juang, B. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16, 1986.

Rahman, T., Kothalkar, P., and Gogate, V. Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15–19, 2014, Proceedings, Part II 14, pp. 630–645. Springer, 2014.

Shah, N., Olascoaga, L. I. G., Zhao, S., Meert, W., and Verhelst, M. DPU: DAG processing unit for irregular graphs with precision-scalable posit arithmetic in 28 nm. IEEE Journal of Solid-State Circuits, 57(8):2586–2596, 2021.

Vergari, A., Choi, Y., Peharz, R., and Van den Broeck, G. Probabilistic circuits: Representations, inference, learning and applications. AAAI Tutorial, 2020.

Vergari, A., Choi, Y., Liu, A., Teso, S., and Van den Broeck, G. A compositional atlas of tractable circuit operations for probabilistic inference. Advances in Neural Information Processing Systems, 34:13189–13201, 2021.

Wang, B. and Kwiatkowska, M. Compositional probabilistic and causal inference using tractable circuit models. In International Conference on Artificial Intelligence and Statistics, pp. 9488–9498. PMLR, 2023.

Xu, J., Zhang, Z., Friedman, T., Liang, Y., and Van den Broeck, G. A semantic loss function for deep learning with symbolic knowledge. In Proceedings of the 35th International Conference on Machine Learning, 2018.

Yang, Y., Gala, G., and Peharz, R. Bayesian structure scores for probabilistic circuits. In International Conference on Artificial Intelligence and Statistics, pp. 563–575. PMLR, 2023.

Yao, L., Trapp, M., Periasamy, K., Leslin, J., Singh, G., and Andraud, M. Logarithm-approximate floating-point multiplier for hardware-efficient inference in probabilistic circuits. In The 6th Workshop on Tractable Probabilistic Modeling, 2023.

Zhang, H., Dang, M., Peng, N., and Van den Broeck, G. Tractable control for autoregressive language generation. In International Conference on Machine Learning, pp. 40932–40945. PMLR, 2023.
:::info Authors:
(1) Anji Liu, Department of Computer Science, University of California, Los Angeles, USA (liuanji@cs.ucla.edu);
(2) Kareem Ahmed, Department of Computer Science, University of California, Los Angeles, USA;
(3) Guy Van den Broeck, Department of Computer Science, University of California, Los Angeles, USA;
:::
:::info This paper is available on arxiv under CC BY 4.0 DEED license.
:::