
AlGoRiThM

Member group: 【净土】SS自购团
  • Content count: 1,069
  • Days won: 1

All content posted by AlGoRiThM

  1. So could we call that Lightning-Struck Wood Mark II?
  2. Seems like it, after all it's the live wire
  3. It wards off evil~ it can drive away ghosts
  4. I asked purely out of curiosity. What I really want to know is: are the daily check-in messages all kept? If they are, don't they take up a lot of space~~~ Also, does the server compress and back up old records~ and cache frequently visited threads, which would save even more space? In short, roughly how much space does this site's server data take up~~ Just pure curiosity hhh, I've been wanting to ask for ages, because lately I've been thinking about starting a blog hhhhh. SSTM really is a learning site, no doubt about it
  5. How would I look at it... would I stare at the buffet? At a BBQ buffet I'd watch the grill; at a Chinese buffet I'd watch my plate
  6. Original thread: https://sstm.moe/topic/299712-在隔离期间遇到了鬼该怎么办/ This question struck me as really interesting, so I opened a thread just to discuss it hhhh. Isn't lightning-struck wood just wood that got struck by lightning? So if you zapped wood with high-voltage electricity and then sold it, would that count as selling genuine lightning-struck wood?
  7. The way I see it, meeting a ghost during quarantine would at least make quarantine life a bit more interesting~ Too bad you can't actually see one~
  8. Bought some Chinese sausage and plan to try making claypot rice. Haven't actually started yet hhhh
  9. For me neither of those matters; freedom matters more hhh
  10. The big shot who swept the whole board at 35, magnificent hhhh. Came here for the legend
  11. Go exercise! That's where the secret to sleep is hidden!
  12. I've never played adult games (黄油) together with classmates, but I have shared dirty books with one XD
  13. Actually it doesn't matter to me; as long as there's nothing like "manga blah blah blah" that spoils the ending, I'm fine with anything
  14. Paper-reading day two: six papers in two days, so you may well not see me tomorrow hhh. Today's paper recommendation: "Boilerplate Detection using Shallow Text Features". Authors: Christian Kohlschütter, Peter Fankhauser, Wolfgang Nejdl. Summary: by analyzing shallow text features, the paper builds a model that pulls the topical main content out of a page. It contains a lot of analysis of web page structure and is well worth reading. The boilerplate detector itself also performs well, reaching a high level of both accuracy and precision. Key passages:
      1. "In the field of Quantitative Linguistics, it is generally assumed that the text creation process can be modeled as urn trials at the level of various linguistic units such as phoneme, word, sentence, text segment, etc. and for several shallow features such as frequency, length, repeat rate, polysemy, and polytextuality."
      2. "Through our systematical analysis, we found that removing the words from the short text class alone already is a good strategy for cleaning boilerplate and that using a combination of multiple shallow text features achieves an almost perfect accuracy. To a large extent the detection of boilerplate text does not require any inter-document knowledge (frequency of text blocks, common page layout, etc.) nor any training at the token level."
      3. "The textual content on the Web can apparently be grouped into two classes, long text (most likely the actual content) and short text (most likely navigational boilerplate text) respectively."
  15. Is MBTI not recognized, by the way? I took an MBTI test just yesterday and found I'm even more of a shut-in 233
  16. That's right, we are a learning site
  17. Originally I wanted to post these notes to Notion, but on second thought, come suffer along with me here instead. An NLP preprocessing paper: DOM-tree-based content extraction. I plan to post once a day over the next few days; if a day goes by without me showing up, please scold me right away, thanks XD. Let's keep each other going, and good luck with your research~ Today's paper: Content Extraction Using Diverse Feature Sets (2013). Why recommended: it applies machine learning to tags and other signals in a web page to extract the main body of the page's content. Highlights:
      • "We use the method in [4] to compute the F1-scores, where each word in the document is distinct even if two words are lexically the same."
      • "To demonstrate the versatility of the learning approach, we train only on the 2012 Train set and make predictions on the rest of the data."
      • "In general, combining features does improve model performance, even if the individual model performance is poor. Model performance decreases on the newer 2012 data when compared to the older data sets."
      • "Individually, the IC features give a small performance improvement over the baseline, and not surprisingly perform poorly on the older data when CSS was less popular. The low individual performance of the IC features may be attributable to the fact that we accumulate tokens in each block, but meaningful tokens may appear outside the block at higher levels in the DOM."
      • "The small train/test differences suggest we may be slightly overfitting."
  18. School washing machines charging money is completely normal…
  19. Made an outline; finished it. Ugh, so boring
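The Kohlschütter et al. paper recommended in post 14 finds that a couple of shallow features, chiefly word count and link density, already separate long "content" blocks from short navigational boilerplate. A minimal sketch of that idea in Python (the cutoff values below are illustrative assumptions, not the parameters trained in the paper):

```python
def classify_block(text: str, num_link_words: int = 0) -> str:
    """Label a text block as content or boilerplate using two shallow
    features: word count and link density (fraction of words in links)."""
    words = text.split()
    link_density = num_link_words / len(words) if words else 1.0
    # Short, link-heavy blocks are most likely navigational boilerplate;
    # long runs of prose are most likely the actual article content.
    if len(words) < 10 or link_density > 0.33:
        return "boilerplate"
    return "content"
```

A real implementation, such as the authors' boilerpipe library, trains a classifier over many such features per block instead of using hand-picked thresholds.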
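Post 17's paper learns over diverse DOM-level feature sets. As a much cruder illustration of DOM-based extraction, the sketch below just segments the page text at common block-level tags and keeps the longest block; the tag set and the "longest block wins" rule are my simplifying assumptions, not the paper's trained model:

```python
from html.parser import HTMLParser


class MainContentExtractor(HTMLParser):
    """Accumulate text per block-level element and keep the longest block,
    a crude stand-in for learned feature-based content extraction."""

    BLOCK_TAGS = {"p", "div", "article", "section", "li", "td"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def _flush(self):
        # Close the current block, normalizing internal whitespace.
        tokens = " ".join(self._buf).split()
        if tokens:
            self.blocks.append(" ".join(tokens))
        self._buf = []

    def extract(self, html: str) -> str:
        self.feed(html)
        self._flush()
        # The block with the most words is most likely the article body.
        return max(self.blocks, key=lambda b: len(b.split()), default="")
```

Note the quoted caveat from the paper, that "meaningful tokens may appear outside the block at higher levels in the DOM": this naive per-block accumulation has exactly that limitation.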