{"id":469,"date":"2019-07-27T20:17:00","date_gmt":"2019-07-27T12:17:00","guid":{"rendered":"http:\/\/note.systw.net\/note\/?p=469"},"modified":"2023-11-02T20:19:50","modified_gmt":"2023-11-02T12:19:50","slug":"sklearn-decision-tree","status":"publish","type":"post","link":"https:\/\/systw.net\/note\/archives\/469","title":{"rendered":"SKLearn Decision Tree"},"content":{"rendered":"\n<p>Decision Tree<\/p>\n\n\n\n<p><br><strong>Load the module<\/strong><br>from sklearn import tree<\/p>\n\n\n\n<p><strong>Initialize the model<\/strong><br>&lt;model&gt; = tree.DecisionTreeClassifier()<br>Parameter list (from an older scikit-learn release; presort has since been removed)<br>DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, class_weight=None, presort=False)<br>Parameter notes<br>criterion: (default is &quot;gini&quot;), &quot;gini&quot; for the Gini impurity and &quot;entropy&quot; for the information gain.<br>gini: fast to compute<br>entropy: can split better than gini, but takes longer to compute<\/p>\n\n\n\n<p><strong>Train the model<\/strong><br>&lt;model&gt;.fit(input, output)<\/p>\n\n\n\n<p><strong>Check how well the model learned (mean accuracy)<\/strong><br>&lt;model&gt;.score(input, output)<\/p>\n\n\n\n<p><strong>Predict output from input<\/strong><br>&lt;model&gt;.predict(input)<br><br><strong>Show feature importances; a higher value means the feature contributes more to the splits<\/strong><br>print(&lt;model&gt;.feature_importances_)<\/p>\n\n\n\n<p>Reference<br>scikit-learn uses an optimised version of the CART 
algorithm.<br>http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeClassifier.html#.<\/p>\n\n\n\n<p>&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;..<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Visualizing the Decision Tree's decisions&nbsp;<\/strong><\/h2>\n\n\n\n<p><br>Display the decision tree's decision process as a graph.<br>This needs scikit-learn's export_graphviz function and the Graphviz dot tool.<\/p>\n\n\n\n<p><strong>Install<\/strong><br>on ubuntu<br>#sudo apt-get install graphviz python-pydot<\/p>\n\n\n\n<p><strong>export_graphviz<\/strong><br>Usage:<br>tree.export_graphviz(&lt;model&gt;, out_file=&lt;outpath&gt;, feature_names=&lt;list of features&gt;)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># ex:<br>from sklearn.datasets import load_iris<br>from sklearn import tree<br>iris = load_iris()<br>clf = tree.DecisionTreeClassifier()<br>model = clf.fit(iris.data, iris.target)<br># with out_file set, export_graphviz writes the file and returns None<br>tree.export_graphviz(model, out_file='tree.dot')<\/code><\/pre>\n\n\n\n<p>Reference<br>http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.export_graphviz.html#sklearn.tree.export_graphviz<\/p>\n\n\n\n<p><strong>pydot<\/strong><br>View the dot file, or render it as an image (the dot command ships with Graphviz)<br>ex:<br>dot -Tpng tree.dot -o tree.png<\/p>\n\n\n\n<p><br>A brief look at the tree.dot format:<br>digraph Tree {<br>0 [label=&quot;X[13] &lt;= 416.0000\\ngini = 0.000270672616505\\nsamples = 100&quot;, shape=&quot;box&quot;] ;<br>1 [label=&quot;gini = 0.0000\\nsamples = 7387\\nvalue = [ 99. 0.]&quot;, shape=&quot;box&quot;] ;<br>0 -&gt; 1 ;<br>2 [label=&quot;gini = 0.0000\\nsamples = 1\\nvalue = [ 0. 
1.]&quot;, shape=&quot;box&quot;] ;<br>0 -&gt; 2 ;<br>}<br>value = [class 0, class 1]<br>When X[13] &lt;= 416, 99 samples fall in class 0; otherwise 1 sample falls in class 1.<\/p>\n\n\n\n<p>&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;&#8230;..<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Example with a simple dataset<\/strong><\/h2>\n\n\n\n<p><strong>#vi dataset<\/strong><br>class,packet,traffic<br>0,1,4<br>0,2,3<br>1,3,2<br>1,4,1<\/p>\n\n\n\n<p><strong>#vi train.py<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import sys\n\nimport numpy as np\nfrom sklearn import tree\n\n# Path to the CSV file: first column is the class label, the rest are features\nfilepath = sys.argv&#91;1]\nwith open(filepath) as f:\n    # Numeric rows, skipping the header line\n    dataset = np.loadtxt(f, delimiter=',', skiprows=1)\n    # Re-read the header to get the feature names (drop the class column)\n    f.seek(0)\n    listhead = f.readline().strip().split(',')&#91;1:]\ntarget = dataset&#91;:, 0]\ndata = dataset&#91;:, 1:]\n\nclf = tree.DecisionTreeClassifier()\nresult = clf.fit(data, target)\ntree.export_graphviz(result, out_file=filepath + '_tree.dot', feature_names=listhead)\n\n# List feature importances, highest first\n# (results can differ slightly between runs unless random_state is set)\nimportfeature = clf.feature_importances_\ndicsorted = sorted(zip(listhead, importfeature), key=lambda d: d&#91;1], reverse=True)\nfor line in dicsorted:\n    print(line)<\/code><\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Decision Tree Load the module: from sklear 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13],"tags":[],"class_list":["post-469","post","type-post","status-publish","format-standard","hentry","category-dataanalysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/469","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/comments?post=469"}],"version-history":[{"count":0,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/469\/revisions"}],"wp:attachment":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/media?parent=469"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/categories?post=469"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/tags?post=469"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}