Commit 868f994a by 202205008011

Add readme.md

The main goal of this project is to predict whether an ad will be clicked, given information about the ad and the user.
If the predicted click probability is high, the ad is shown; if it is low, the ad is not shown.
An ad that is shown but not clicked benefits neither the advertiser nor the platform, so predicting this probability is very important, and it is the goal of this project.
The project involves the following tasks:
1. Data loading and understanding: read the given .csv file into memory, then use pandas to compute statistics and produce visualizations for a deeper understanding of the data.
2. Feature construction: derive new features from the original ones, which is an important part of machine learning work.
3. Feature transformation: features are generally either continuous or categorical, and each type needs its own processing.
4. Feature selection: choose suitable features from those available, which is an essential part of many projects.
5. Model training and evaluation: train the model using cross-validation, which involves techniques such as grid search.
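
The five tasks above could be sketched roughly as follows with pandas and scikit-learn. This is only an illustration under stated assumptions: the column names (`hour`, `ad_position`, `device`, `user_age`, `click`), the synthetic data standing in for the real .csv file, the derived `is_evening` feature, and the choice of `f_classif`, logistic regression, and ROC AUC are all hypothetical and not taken from the project itself.

```python
import io

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for the project's .csv file.
rng = np.random.default_rng(0)
n = 300
raw = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "ad_position": rng.choice(["top", "side", "bottom"], n),
    "device": rng.choice(["phone", "tablet", "desktop"], n),
    "user_age": rng.integers(18, 70, n),
})
# Assumed labelling rule: ads in the "top" position are clicked more often.
click_rate = np.where(raw["ad_position"] == "top", 0.6, 0.2)
raw["click"] = (rng.random(n) < click_rate).astype(int)
csv_text = raw.to_csv(index=False)

# 1. Data loading and understanding.
df = pd.read_csv(io.StringIO(csv_text))
summary = df.describe(include="all")   # per-column statistics
overall_ctr = df["click"].mean()       # overall click-through rate

# 2. Feature construction: a hypothetical derived feature.
df["is_evening"] = df["hour"].between(18, 23).astype(int)

# 3. Feature transformation: scale numeric columns, one-hot encode categorical ones.
numeric_cols = ["hour", "user_age", "is_evening"]
categorical_cols = ["ad_position", "device"]
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# 4. Feature selection and 5. model training, combined in one pipeline.
pipeline = Pipeline([
    ("prep", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search with 3-fold cross-validation over a small, assumed parameter grid.
param_grid = {"select__k": [3, 5], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")

X = df.drop(columns="click")
y = df["click"]
search.fit(X, y)

# Predicted click probability for each row (refit on all data with the best parameters).
click_prob = search.predict_proba(X)[:, 1]
print("best parameters:", search.best_params_)
print("cross-validated ROC AUC:", round(search.best_score_, 3))
```

Putting the transformation and selection steps inside the pipeline means they are refit on each training fold during cross-validation, which avoids leaking information from the validation fold. The predicted probabilities in `click_prob` are what a display rule would compare against a threshold to decide whether to show an ad.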
\ No newline at end of file