
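Happy Paper Sharing, Episode 3: Yes, These Are My Paper Notes


Featured post

I originally wanted to post my notes to Notion, but on second thought, let's all suffer together.

A paper on NLP preprocessing: content extraction based on the DOM tree.

I plan to post one thread a day for the next few days. If I miss a day, please scold me right away, thanks XD

Let's encourage each other. Good luck with your research~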

 

Today's paper: Content Extraction Using Diverse Feature Sets (2013)

 

Why I recommend it: it applies machine learning to features such as the tags in a web page in order to extract the page's main content.

 

Highlights:

We use the method in [4] to compute the F1-scores, where each word in the document is distinct even if two words are lexically the same. To demonstrate the versatility of the learning approach, we train only on the 2012 Train set and make predictions on the rest of the data. In general, combining features does improve model performance, even if the individual model performance is poor. Model performance decreases on the newer 2012 data when compared to the older data sets. Individually, the IC features give a small performance improvement over the baseline, and not surprisingly perform poorly on the older data when CSS was less popular. The low individual performance of the IC features may be attributable to the fact that we accumulate tokens in each block, but meaningful tokens may appear outside the block at higher levels in the DOM. The small train/test differences suggest we may be slightly overfitting.
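
To make the "each word is distinct" part concrete, here is a rough sketch of a position-based F1. This is only my reading of the sentence above, not the actual method from [4]: the function name `token_f1`, the toy token list, and the gold/predicted positions are all made up for illustration.

```python
def token_f1(gold_items, pred_items):
    """F1 between two collections, compared as sets.

    Passing token *positions* makes every word occurrence distinct;
    passing the word strings themselves merges repeated words.
    """
    gold, pred = set(gold_items), set(pred_items)
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical page: "the" shows up in the menu, the article, and the footer.
tokens = ["the", "menu", "the", "article", "text", "the", "footer"]
gold_positions = [2, 3, 4]     # assumed gold-standard main content
pred_positions = [0, 2, 3, 4]  # assumed extractor output (wrongly includes the menu "the")

# Position-based: the stray "the" at index 0 counts as an error.
positional_score = token_f1(gold_positions, pred_positions)

# Lexical: the stray "the" is hidden because it matches a gold word.
lexical_score = token_f1(
    [tokens[i] for i in gold_positions],
    [tokens[i] for i in pred_positions],
)
```

In this toy case the lexical comparison hides the mistake and the positional one does not, which is presumably why the evaluation treats every word as distinct.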

 


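This reminds me of the MBTI Personality Types 500 Dataset and similar data I've been wanting to work with lately. Take a classification scheme that mainstream psychology doesn't recognize, add "neural algorithms" that professors outside the field find impressive while insiders just snicker, and who knows how many more believers in 'science religion' that mix will produce :goutou: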


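9 hours ago, qazdr0a said:

This reminds me of the MBTI Personality Types 500 Dataset and similar data I've been wanting to work with lately. Take a classification scheme that mainstream psychology doesn't recognize, add "neural algorithms" that professors outside the field find impressive while insiders just snicker, and who knows how many more believers in 'science religion' that mix will produce :goutou:

Wait, is MBTI really not recognized? I took the MBTI test just yesterday and found out I've become even more withdrawn, lol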

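  • 4 weeks later...
On 2022/2/7 at 8:27 PM, AlGoRiThM said:

Wait, is MBTI really not recognized? I took the MBTI test just yesterday and found out I've become even more withdrawn, lol

It's mostly people in interdisciplinary fields who like to use it. There was a recent master's thesis that took clothing-preference data grouped by MBTI type, processed it with neural algorithms, and fit a model at the end. Personally, I don't think it has much commercial value :goutou: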
