Donghai's Blog


Notes on Using Libsvm

2009-05-14 08:40:35 | Category: Graduation project


Original: http://freehello.blogspot.com/2009/04/libsvm.html

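I have recently been working on an SVM-based SMS classification project and spent some time studying how to use libsvm. Drawing on the many libsvm tutorials already circulating online, here is my own brief summary.
libsvm is a convenient, widely used open-source SVM toolkit (SVMlight is another one, which I have not used). It was written by Chih-Chung Chang and Chih-Jen Lin of National Taiwan University and supports SVM-based classification and regression.
Since my own grasp of SVM theory is only superficial, this post covers just the basic use of libsvm on the win32 platform. If you know nothing about SVMs, I strongly recommend first reading the introductory article at http://ntu.csie.org/~piaip/svm/svm_tutorial.html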

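First, an overview of the workflow.
Prepare the data set (the SMS corpus) and convert it to the format libsvm accepts, train on it with svm-train to obtain a model, then test the model. Training involves trying many parameter values in search of the best classification result, so libsvm provides grid.py (a Python script) to help select the best parameters automatically.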

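1. Resources
Download Libsvm, Python and Gnuplot:
libsvm: obviously required. The latest version at the time of writing is 2.89, available from the home page http://www.csie.ntu.edu.tw/~cjlin/libsvm/. I recommend also downloading a beginner's guide to libsvm.
Python: needed mainly to run grid.py. The latest version at the time of writing is 2.5, available from the Python home page http://www.python.org/
Gnuplot: also needed for parameter selection, where it draws the plots. Search for it yourself; the win32 package is gp423win32.zip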

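2. General procedure

The usual steps for using LIBSVM are:
1) Prepare the data set and convert it to the data format LIBSVM supports:
[label] [index1]:[value1] [index2]:[value2] ...
that is, [class label] [feature 1 index]:[feature value] [feature 2 index]:[feature value] ...
2) Scale the data (why scaling is needed is not explained here);
3) Choose a kernel function (usually the RBF, i.e. radial basis function, kernel, which is the program default);
4) Use cross-validation (typically 5-fold) to select the best parameters C and g;
5) Train on the whole training set with the best C and g to obtain the SVM model;
6) Test with the resulting SVM model.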
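
The command-line side of steps 2, 5 and 6 can be sketched as a small Python helper that only builds the command lines, without running anything. This is a rough illustration, not part of libsvm: the function name, the range file name and the values 8 and 0.5 are placeholders I made up, and the -s and -r options of svm-scale (save and restore the scaling ranges) are used here so that the test set is scaled with the ranges of the training set.

```python
def build_pipeline(train, test, best_c, best_g):
    """Return (argv, stdout_file) pairs for scaling, training and testing.

    svm-scale writes to stdout, so the scaling entries also name the file
    the output should be redirected to; the other entries use None.
    """
    range_file = train + ".range"    # hypothetical name for the saved ranges
    train_scaled = train + ".scale"
    test_scaled = test + ".scale"
    model = train_scaled + ".model"  # the name svm-train picks by default
    return [
        # Step 2: scale the training set and save the ranges with -s ...
        (["svm-scale", "-l", "0", "-u", "1", "-s", range_file, train], train_scaled),
        # ... then apply the same ranges to the test set with -r.
        (["svm-scale", "-r", range_file, test], test_scaled),
        # Step 5: train with the parameters chosen in step 4.
        (["svm-train", "-c", str(best_c), "-g", str(best_g), train_scaled, model], None),
        # Step 6: predict; "result" is the output file.
        (["svm-predict", test_scaled, model, "result"], None),
    ]


# Placeholder values for C and g; real values come from the grid search.
for argv, stdout_file in build_pipeline("trainset", "testset", 8, 0.5):
    line = " ".join(argv)
    if stdout_file:
        line += " > " + stdout_file
    print(line)
```

Printing the commands instead of executing them keeps the sketch independent of where the libsvm binaries are installed.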

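3. Hands-on walkthrough:

First, installation and configuration.
Create a folder named libsvm on drive C.
Extract libsvm-2.89.rar and copy the entire windows directory from the extracted folder to C:\libsvm
Python is installed on drive C in the folder Python25; copy its python.exe to C:\libsvm\windows
Gnuplot is also installed on drive C.
Edit grid.py: find the gnuplot path setting (the default is gnuplot_exe=r"c:\tmp\gnuplot\bin\pgnuplot.exe"), change it to the actual path on your machine, and save.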

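1) For simple practice you can download a suitable data set from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. For a real project of your own, you have to write a program to convert your data. Assume the data set we use is called trainset.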
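
As a rough idea of what such a conversion program might look like, here is a minimal Python sketch that writes samples in the [label] [index]:[value] format. The function names and the toy data are my own assumptions; how you extract features from your SMS corpus is up to you.

```python
def to_libsvm_line(label, features):
    """Format one sample as '[label] [index]:[value] ...'.

    `features` maps 1-based feature indices to values. Zero values are
    skipped (the format is sparse) and indices are written in ascending order.
    """
    parts = [str(label)]
    for index in sorted(features):
        value = features[index]
        if value != 0:
            parts.append("%d:%g" % (index, value))
    return " ".join(parts)


def write_libsvm_file(path, samples):
    """Write (label, features) pairs to `path`, one sample per line."""
    with open(path, "w") as out:
        for label, features in samples:
            out.write(to_libsvm_line(label, features) + "\n")


# Hypothetical toy data: label 1 = spam, -1 = normal message.
samples = [(1, {1: 0.5, 3: 2.0}), (-1, {2: 1.0, 3: 0.0})]
for label, features in samples:
    print(to_libsvm_line(label, features))
```

Calling write_libsvm_file("trainset", samples) would then produce a file that svm-scale and svm-train can read.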

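2) Scale the data. Usage: svm-scale [options] data_filename
Start -> cmd -> change to C:\libsvm\windows
Then enter: svm-scale -l 0 -u 1 -s range trainset > trainset.scale
This gives the scaled file trainset.scale and saves the scaling ranges in the file range.

Scaling the data with svm-scale is necessary, because appropriate scaling helps parameter selection and speeds up building the SVM model. svm-scale rescales each feature value to the range given by -l and -u, usually [0,1] or [-1,1] (for text classification [0,1] is the usual choice). The output goes to stdout, so it has to be redirected to a file. Note also that the testing data and the training data must be scaled with the same ranges. svm-scale cannot take both files in a single call, but -s saves the ranges computed from the training data and -r applies them again to another file.
So, for the test later on, scale the test set with the saved ranges.
Enter: svm-scale -r range testset > testset.scale
This gives the scaled file testset.scale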
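
To show why the two files need the same ranges, here is a simplified Python sketch of min-max scaling. It is not svm-scale's actual code: the function names are made up, and it assumes every feature appears in every row. The minimum and maximum of each feature are taken from the training rows only and then reused for the test rows.

```python
def fit_ranges(rows):
    """Return {index: (min, max)} computed from the training rows."""
    ranges = {}
    for features in rows:
        for index, value in features.items():
            lo, hi = ranges.get(index, (value, value))
            ranges[index] = (min(lo, value), max(hi, value))
    return ranges


def scale_row(features, ranges, lower=0.0, upper=1.0):
    """Map each value into [lower, upper] using the given ranges."""
    scaled = {}
    for index, value in features.items():
        if index not in ranges:
            continue  # feature never seen in training
        lo, hi = ranges[index]
        if hi == lo:
            continue  # constant feature carries no information
        scaled[index] = lower + (upper - lower) * (value - lo) / (hi - lo)
    return scaled


train = [{1: 2.0, 2: 10.0}, {1: 4.0, 2: 30.0}]
test = [{1: 3.0, 2: 40.0}]

ranges = fit_ranges(train)  # computed from the training data only
print([scale_row(row, ranges) for row in train])
print([scale_row(row, ranges) for row in test])
```

A test value larger than anything seen in training ends up above the upper bound. That is expected: the point is that a given raw value maps to the same scaled value in both files.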

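3) Choose the RBF kernel
Usage of svm-train: svm-train [options] training_set_file [model_file]
training_set_file is the training data prepared above; if model_file is not given, [training_set_file].model is generated automatically. The options can be left out at first. The commonly used options and their meanings are:
-s svm_type: type of SVM (default 0)
0 -- C-SVC
1 -- nu-SVC
2 -- one-class SVM
3 -- epsilon-SVR
4 -- nu-SVR
-t kernel_type: type of kernel function (default 2)
0 -- linear: u'*v
1 -- polynomial: (gamma*u'*v + coef0)^degree
2 -- RBF: exp(-gamma*|u-v|^2)
-g gamma: gamma in the kernel function (default 1/k, where k is the number of features)
-c cost: parameter C of C-SVC, epsilon-SVR and nu-SVR (default 1)
-v n: n-fold cross-validation mode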
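
To make the role of g concrete, here is a small Python sketch of the RBF kernel formula exp(-gamma*|u-v|^2). It is for illustration only and is not how libsvm computes it internally; the sparse {index: value} representation and the function name are my own choices.

```python
import math


def rbf_kernel(u, v, gamma):
    """exp(-gamma * |u - v|^2) for two sparse vectors given as {index: value}."""
    squared_distance = 0.0
    for index in set(u) | set(v):
        diff = u.get(index, 0.0) - v.get(index, 0.0)
        squared_distance += diff * diff
    return math.exp(-gamma * squared_distance)


u = {1: 1.0}
v = {2: 1.0}
print(rbf_kernel(u, u, 0.5))  # identical vectors: the kernel value is 1
print(rbf_kernel(u, v, 0.5))  # |u-v|^2 = 2, so this is exp(-1)
print(rbf_kernel(u, v, 5.0))  # a larger gamma makes the value fall off faster
```

A larger gamma makes the kernel value drop more quickly with distance, which is why g has to be tuned together with C.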

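4) Select the best parameters C and g by cross-validation
Note: different values of the parameters listed in 3) (most commonly g and c) train different SVMs. So how do we pick the parameters that give the best SVM? By trying them, one combination after another! That is exactly what grid.py does.
So first use grid.py to pick suitable values of C and g.
Usage: grid.py [-log2c begin,end,step] [-log2g begin,end,step] [-v fold] [-svmtrain pathname] [-gnuplot pathname] [-out pathname] [-png pathname] [additional parameters for svm-train] dataset
Typical settings are -log2c -10,10,1, -log2g 10,-10,-1 and -v 5.
Enter:
python grid.py -log2c -10,10,1 -log2g 10,-10,-1 trainset.scale
The output reports the best c and the best g, which we then use for the actual training, along with the cross-validation accuracy they achieve.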
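
The idea behind the search can be sketched in a few lines of Python. This is not grid.py's actual code: the function names are made up, and the scoring function is a toy stand-in for the cross-validation accuracy that grid.py obtains by running svm-train -v for each pair.

```python
import math


def log2_range(begin, end, step):
    """Exponents from begin to end inclusive, as in '-log2c begin,end,step'."""
    values = []
    current = begin
    while (step > 0 and current <= end) or (step < 0 and current >= end):
        values.append(current)
        current += step
    return values


def parameter_grid(log2c, log2g):
    """All (C, g) pairs, with C = 2^c and g = 2^g."""
    return [(2.0 ** c, 2.0 ** g)
            for c in log2_range(*log2c)
            for g in log2_range(*log2g)]


def best_parameters(grid, cv_accuracy):
    """Return the (C, g) pair with the highest score."""
    return max(grid, key=lambda pair: cv_accuracy(*pair))


def toy_accuracy(c, g):
    """Hypothetical score that peaks at C = 8, g = 0.5 (not real data)."""
    return -abs(math.log2(c) - 3) - abs(math.log2(g) + 1)


grid = parameter_grid((-10, 10, 1), (10, -10, -1))
print(len(grid))  # 21 exponents for C times 21 for g
print(best_parameters(grid, toy_accuracy))
```

With the typical ranges above, that is 441 cross-validation runs, which is why the search can take a while on a large data set.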

5) Training
Enter: svm-train -c <best c> -g <best g> trainset.scale
Output: trainset.scale.model

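6) Testing
Usage of svm-predict: svm-predict [options] test_file model_file output_file
model_file is the model file produced by svm-train. test_file is the data to predict, in the same format as training_set_file, the input to svm-train. The label at the start of each line is the very thing being predicted, so if the true labels are unknown the column can be filled with any number. If test_file does contain the real labels, svm-predict compares its predictions against them after predicting and reports the accuracy, which tells us how many predictions were right. output_file is the file svm-predict writes its predictions to. Apart from -b (probability estimates), svm-predict has no other options.
Enter: svm-predict testset.scale trainset.scale.model result
OK, this returns the test results. Enjoy it!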
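
If you want to check the reported accuracy yourself, a sketch like the following compares the labels in the test file with the predictions in the output file, which holds one predicted label per line. The function names and the sample lines are made up for illustration; in practice you would read the lines from testset.scale and result.

```python
def read_labels(lines):
    """Take the first whitespace-separated token of each non-empty line as a label."""
    return [float(line.split()[0]) for line in lines if line.strip()]


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists differ in length")
    if not true_labels:
        raise ValueError("no labels given")
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical contents of the test file and of the output file.
test_lines = ["1 1:0.5 3:1", "-1 2:1", "1 1:0.2"]
result_lines = ["1", "-1", "-1"]

print(accuracy(read_labels(test_lines), read_labels(result_lines)))
```

Here two of the three predictions match, so the printed accuracy is about 0.67.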
