Nutritional Content Detection Using Vision Transformers- An Intelligent Approach

Authors

  • Saikat Banerjee, State Aided College Teacher, Department of Computer Applications, Vivekananda Mahavidyalaya, Haripal, Hooghly, West Bengal, India https://orcid.org/0000-0002-7361-1553
  • Debasmita Palsani, State Aided College Teacher, Department of Nutrition, Vivekananda Mahavidyalaya, Haripal, Hooghly, West Bengal, India
  • Abhoy Chand Mondal, Professor, Department of Computer Science, The University of Burdwan, Golapbag, West Bengal, India https://orcid.org/0000-0002-2206-0245

Keywords:

Machine Learning, Vision Transformer (ViT), Convolutional Neural Networks, Food, Nutrition.

Abstract

The nutritional composition of food supports energy production, growth, and overall health, and helps prevent disease and strengthen immunity. A balanced diet improves physical and mental health and supports a longer, healthier life. Accurate estimation of nutritional value from food photographs is therefore important for dietary monitoring, personalized nutrition, and health management. Conventional methods based on convolutional neural networks struggle to generalize across diverse food types, complex presentations, and overlapping items. Vision Transformers offer a strong alternative because their self-attention mechanism can model global dependencies across an image. This research introduces a pipeline that uses Vision Transformers to estimate calories, macronutrients such as protein and fat, and micronutrients directly from food photos. The model uses pre-trained Vision Transformers fine-tuned on several food datasets and incorporates supplementary inputs, such as recipe details, through multimodal fusion.
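
The data flow described in the abstract (image patches, self-attention over all patches, fusion with recipe information, regression of nutrient values) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the function names, the single attention head, the mean pooling, the concatenation-based fusion, the linear regression head, and the tiny grayscale input are all assumptions chosen for brevity, and the weights are random rather than pre-trained, so the printed numbers carry no nutritional meaning.

```python
import math
import random

# Hypothetical output targets, taken from the nutrients named in the abstract.
NUTRIENTS = ("calories", "protein", "fat")

def split_into_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: each patch attends to every patch."""
    queries = [matvec(wq, t) for t in tokens]
    keys = [matvec(wk, t) for t in tokens]
    values = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        attn = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        weights.append(attn)
        outputs.append(
            [sum(a * v[d] for a, v in zip(attn, values)) for d in range(len(values[0]))]
        )
    return outputs, weights

def init_params(patch, dim, recipe_dim, seed=0):
    """Random stand-in weights; a real system would load pre-trained ones instead."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": rand(dim, patch * patch),
        "wq": rand(dim, dim),
        "wk": rand(dim, dim),
        "wv": rand(dim, dim),
        "head": rand(len(NUTRIENTS), dim + recipe_dim),
    }

def estimate_nutrients(image, recipe_features, params, patch=2):
    """Patches -> embeddings -> attention -> mean pool -> fuse with recipe -> regress."""
    tokens = [matvec(params["embed"], p) for p in split_into_patches(image, patch)]
    attended, weights = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    dim = len(attended[0])
    pooled = [sum(tok[d] for tok in attended) / len(attended) for d in range(dim)]
    fused = pooled + list(recipe_features)  # fusion by simple concatenation (assumed)
    return dict(zip(NUTRIENTS, matvec(params["head"], fused))), weights

# Toy inputs: a 4x4 "image" and a 2-dimensional made-up recipe encoding.
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
recipe = [1.0, 0.0]
params = init_params(patch=2, dim=4, recipe_dim=2)
prediction, attention = estimate_nutrients(image, recipe, params)
print(prediction)
```

Each row of `attention` is a probability distribution over all patches, which is the sense in which self-attention captures global dependencies: every patch can draw on every other patch regardless of distance. Concatenation is only one way to fuse image and recipe features and is used here for simplicity.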

Published

2024-12-06

How to Cite

Saikat Banerjee, Debasmita Palsani, & Abhoy Chand Mondal. (2024). Nutritional Content Detection Using Vision Transformers- An Intelligent Approach. International Journal of Innovative Research in Engineering and Management, 11(6), 21–27. Retrieved from http://ijirem.irpublications.org/index.php/ijirem/article/view/90
