Commit dd505959 by 202205008011

Add readme.md

The goal of this project is to automatically determine, from a comment provided by a user, whether its sentiment is positive or negative. For example, given these user comments:
Comment 1: "I really like this appliance. I've used it for 3 months without a single problem!"
Comment 2: "The item I bought from this Taobao store started breaking down in less than a week. I strongly recommend not buying it; it's a real waste of money."
Of these two comments, the first is clearly positive and the second is negative. We want to build an AI algorithm that can automatically identify whether a comment is positive or negative.
Sentiment analysis is a classic problem in text processing. A complete system generally consists of several modules:
1. Data collection: use a web crawler to gather relevant text data from the web.
2. Data cleaning/preprocessing: for text data, this generally means removing useless information such as markup (HTML tags), punctuation marks, and stop words.
3. Converting text into vectors: this is also known as feature engineering. Raw text cannot be used as model input; only numbers (such as vectors) can. So before entering the model, any signal must be converted into a numeric form the model can work with (numbers, vectors, matrices, tensors, ...).
4. Choosing a suitable model and evaluation method: sentiment analysis is a binary classification problem (or a three-class problem: positive, negative, neutral), so we need a classification algorithm such as logistic regression, naive Bayes, a neural network, or an SVM. We also need to choose a suitable evaluation metric: for a given application, do we care more about accuracy or about recall?
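
As a rough illustration of steps 2 to 4 above, here is a minimal sketch in plain Python. It is not this project's implementation: the stop-word list, the toy training data, and all function names are hypothetical, and naive Bayes is used only because it is one of the classifiers listed above. Step 1 (crawling) is left out, and the examples are in English to keep tokenization trivial.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and toy training data, for illustration only.
STOP_WORDS = {"i", "a", "an", "the", "this", "it", "is", "and", "of", "for", "to"}
TRAIN = [
    ("I <b>love</b> this appliance, no problems at all!", "pos"),
    ("Great product, works well and I like it", "pos"),
    ("It broke within a week, a real waste of money", "neg"),
    ("Terrible quality, do not buy, it broke fast", "neg"),
]

def preprocess(text):
    """Step 2: strip HTML tags, punctuation, and stop words; return tokens."""
    text = re.sub(r"<[^>]+>", " ", text)              # drop HTML tags
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())  # drop punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def vectorize(tokens, vocab):
    """Step 3: bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def train_naive_bayes(examples, vocab):
    """Step 4: multinomial naive Bayes with add-one (Laplace) smoothing."""
    model = {}
    for label in sorted({lab for _, lab in examples}):
        docs = [vectorize(preprocess(t), vocab) for t, lab in examples if lab == label]
        word_totals = [sum(col) for col in zip(*docs)]  # per-word counts in this class
        denom = sum(word_totals) + len(vocab)
        model[label] = {
            "log_prior": math.log(len(docs) / len(examples)),
            "log_likelihood": [math.log((c + 1) / denom) for c in word_totals],
        }
    return model

def predict(model, vector):
    """Return the label with the highest log-posterior score."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            x * ll for x, ll in zip(vector, params["log_likelihood"])
        )
    return max(model, key=score)

def accuracy_and_recall(model, examples, vocab, positive="pos"):
    """Step 4 (evaluation): accuracy, and recall for the positive class."""
    preds = [predict(model, vectorize(preprocess(t), vocab)) for t, _ in examples]
    gold = [label for _, label in examples]
    correct = sum(p == g for p, g in zip(preds, gold))
    true_pos = sum(p == g == positive for p, g in zip(preds, gold))
    actual_pos = sum(g == positive for g in gold)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(gold), recall

VOCAB = sorted({tok for text, _ in TRAIN for tok in preprocess(text)})
MODEL = train_naive_bayes(TRAIN, VOCAB)

def classify(text):
    """Convenience wrapper: raw comment in, predicted label out."""
    return predict(MODEL, vectorize(preprocess(text), VOCAB))
```

Calling `classify("It broke, waste of money")` runs a new comment through the same preprocessing and vectorization as the training data. In a real system the metrics would be computed on held-out data rather than on the training set, and whether accuracy or recall matters more depends on the application.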