Blog Post Extraction Using Title Finding
Linhai Song, Xueqi Cheng, Yan Guo, Bo Wu, Yu Wang
Accepted by CCIR 2009: Proceedings of the Chinese Conference on Information Retrieval
Abstract: With the development of Web 2.0, web mining applications are paying more attention to blog pages. To keep noise in blog pages from reducing the precision of web mining algorithms, posts must be extracted from blog pages correctly. In this paper, we propose a blog post extraction algorithm based on title finding. The algorithm has two stages. In the first stage, the text nodes that indicate the title of the post are found and used as the beginning of the post. We take a machine learning approach to this stage and employ an SVM as the classification model. In the second stage, we find the end of the post. Two methods are introduced for this stage: one uses VIPS segmentation results, and the other is based on hand-coded rules. Experiments evaluate both title finding and post extraction. The experimental results show that the algorithm performs well.
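The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not list the features, the trained SVM, or the rules, so everything below is an assumption. In particular, the fixed linear score in `title_score` is a stand-in for the SVM classifier, and the features (overlap with the page's `<title>`, heading tag, short length), the `END_MARKERS` patterns, and all function names are hypothetical. The VIPS-based end-finding method is not shown.

```python
import re
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
# Hypothetical hand-coded rules: text that typically follows a post body.
END_MARKERS = re.compile(
    r"^(posted by|leave a (comment|reply)|\d*\s*comments?\b|tags?:)", re.I
)


class TextNodeCollector(HTMLParser):
    """Collect visible text nodes with the tag that directly encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []  # list of (enclosing_tag, text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        parent = self.stack[-1] if self.stack else ""
        if text and parent not in {"script", "style"}:
            self.nodes.append((parent, text))


def title_score(tag, text, page_title):
    """Fixed linear score standing in for the SVM (assumed features)."""
    words = set(text.lower().split())
    title_words = set(page_title.lower().split())
    overlap = len(words & title_words) / len(words) if words else 0.0
    is_heading = 1.0 if tag in {"h1", "h2", "h3"} else 0.0
    is_short = 1.0 if len(text) <= 100 else 0.0
    return 2.0 * overlap + 1.0 * is_heading + 0.5 * is_short


def extract_post(html):
    """Return (post_title, body_text_nodes) for one blog page."""
    collector = TextNodeCollector()
    collector.feed(html)
    page_title = " ".join(t for tag, t in collector.nodes if tag == "title")
    candidates = [(tag, t) for tag, t in collector.nodes if tag != "title"]
    if not candidates:
        return None, []

    # Stage 1: the best-scoring text node marks the beginning of the post.
    start = max(
        range(len(candidates)),
        key=lambda i: title_score(*candidates[i], page_title),
    )

    # Stage 2: scan forward until a rule signals the end of the post.
    body = []
    for _, text in candidates[start + 1:]:
        if END_MARKERS.match(text):
            break
        body.append(text)
    return candidates[start][1], body
```

Calling `extract_post` on a page's HTML string returns the text chosen as the post title and the list of text nodes between that title and the first end marker. A trained classifier would replace `title_score` with a learned decision function over whatever features the paper actually uses.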