Ensemble-based Machine Learning Model for Online Fake Reviews Detection
DOI: https://doi.org/10.57041/1k2hwy61
Keywords: Natural language processing, Data Science, Fake and Real Review, Machine Learning, Textual information
Abstract
Customers of online shopping and e-commerce sites usually do not know the seller or the goods and services on offer, so they rarely make a purchase decision without consulting the reviews written by other customers. Whether or not those reviews are genuine, they can strongly affect a company's bottom line. Some e-commerce sites let buyers confirm the authenticity of the seller, but most buyers would rather read reviews written by real people who have bought and used the product. Because a single product may attract hundreds or even thousands of reviews, it is hard to tell which ones are authentic. In recent years, machine learning (ML) has enabled machines to perform difficult tasks with close to human-level expertise. Conventional means of detecting fake reviews are time-consuming and usually unproductive given the sheer volume of reviews produced, and they lack accuracy and robustness. We therefore need a powerful ML-based solution that can automatically evaluate reviews, distinguish authentic ones from fake ones, and quickly surface the most valuable comments. To this end, we propose a modern ML-based fake review detection model. This study combines baseline, deep learning, and ensemble learning algorithms for fake review detection: Naive Bayes, Random Forest, Decision Tree, SVM, and K-Nearest Neighbours are combined to train and test the proposed model. The proposed voting model includes a strict pre-processing procedure and feature extraction. The pre-processing steps are tokenization, removal of stop words and punctuation, and deletion of rare words.
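The pre-processing steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the small stop-word list, punctuation stripping via `string.punctuation`, and the corpus-frequency threshold `min_count` used to define "rare words" are all assumptions, since the abstract does not specify them.

```python
import string
from collections import Counter

# Assumption: a tiny illustrative stop-word list; the paper does not name the one it used.
STOP_WORDS = {"a", "an", "the", "is", "it", "and", "or", "this", "to",
              "of", "i", "was", "for", "in", "on"}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Lowercase, remove punctuation, and split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()


def preprocess(reviews: list[str], min_count: int = 2) -> list[list[str]]:
    """Tokenize each review, drop stop words, then drop rare words.

    Assumption: a word is "rare" if it occurs fewer than `min_count`
    times across the whole corpus (after stop-word removal).
    """
    tokenized = [[t for t in tokenize(r) if t not in STOP_WORDS] for r in reviews]
    counts = Counter(t for tokens in tokenized for t in tokens)
    return [[t for t in tokens if counts[t] >= min_count] for tokens in tokenized]
```

For example, `preprocess(["This product is great!", "Great value, great product.", "Terrible. Broke in a day."])` keeps only words that occur at least twice in the corpus, so the third review ends up empty.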
In the feature engineering step, which enriches the data before classification, we extract N-gram (unigram and bigram) TF-IDF features. We conducted several experiments comparing the proposed model with state-of-the-art models. The results show that, with unigram-bigram TF-IDF features, our proposed model outperforms the compared models and effectively classifies reviews into two classes, real and fake, with 93% precision.
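The overall design (unigram-bigram TF-IDF features feeding a voting ensemble of Naive Bayes, Random Forest, Decision Tree, SVM, and K-Nearest Neighbours) can be sketched as a scikit-learn pipeline. This is an assumption-laden illustration rather than the authors' implementation: the choice of scikit-learn, hard (majority) voting, a linear-kernel SVM, sklearn's built-in English stop-word list, and the hyperparameters shown are all placeholders, since the abstract states none of them.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_model(n_neighbors: int = 5) -> Pipeline:
    """Unigram+bigram TF-IDF features feeding a hard-voting ensemble.

    Assumption: hyperparameters are illustrative defaults, not the paper's settings.
    """
    ensemble = VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("dt", DecisionTreeClassifier(random_state=0)),
            ("svm", SVC(kernel="linear")),
            ("knn", KNeighborsClassifier(n_neighbors=n_neighbors)),
        ],
        voting="hard",  # each model casts one vote; the majority label wins
    )
    return Pipeline([
        # ngram_range=(1, 2) extracts unigrams and bigrams; stop_words="english"
        # removes sklearn's built-in stop words before n-grams are formed.
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
        ("vote", ensemble),
    ])
```

To use it, call `model = build_model()`, then `model.fit(reviews, labels)` with a list of review strings and `"real"`/`"fake"` labels, and `model.predict(new_reviews)`. Precision on the fake class can then be measured with `sklearn.metrics.precision_score(y_true, y_pred, pos_label="fake")`. Hard voting is used here because soft voting requires `predict_proba` from every member, which `SVC` only provides when constructed with `probability=True`.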
License
Copyright (c) 2025 https://paas-pk.org/index.php/pjosr/cr

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
