{"id":453,"date":"2012-12-15T20:01:00","date_gmt":"2012-12-15T12:01:00","guid":{"rendered":"http:\/\/note.systw.net\/note\/?p=453"},"modified":"2023-11-02T20:02:02","modified_gmt":"2023-11-02T12:02:02","slug":"data-description","status":"publish","type":"post","link":"https:\/\/systw.net\/note\/archives\/453","title":{"rendered":"Data Description"},"content":{"rendered":"\n<p><strong>description data summarization<\/strong><br>Provides the foundation for analysis and supports data preprocessing<\/p>\n\n\n\n<p><strong>Commonly used methods<\/strong><br>\u3000measuring the central tendency (measures of location)<br>\u3000measuring the dispersion of data (measures of dispersion)<br>\u3000measuring the position of data<br>\u3000graphic displays of basic statistical descriptions<\/p>\n\n\n\n<p>&#8230;&nbsp;<\/p>\n\n\n\n<p><strong>measuring the central tendency (measures of location)<\/strong><br>The main measures are:<br><strong>mean<\/strong>, the average<br>\u3000weighted arithmetic mean<br>\u3000trimmed mean: the mean after removing extreme values<br><strong>median<\/strong>, the middle value<br>\u3000sort: found by sorting, which is relatively time-consuming, ex: if n=11, the median is the 6th value; 
if n=10, the median is (5th+6th)\/2<br>\u3000interpolation: gives an approximate value, used when the amount of data is too large<br><strong>mode<\/strong>, the value that occurs most frequently; data with one mode is called unimodal, two bimodal, three trimodal, and several multimodal<br><strong>midrange<\/strong>, (min+max)\/2<br>ps:<br>Empirical relation for skewed data: <strong>mean-mode=3(mean-median)<\/strong><br>Used to judge whether the data is severely skewed<\/p>\n\n\n\n<p>ps:<br><strong>interpolation<\/strong><br>n=number of data values<br>nl=total number of data values in all bins before the median bin (the median bin itself is excluded)<br>nm=number of data values in the median bin<br>l=bin width, ex: for bins 1-10, 11-20 the width is 10<br>m=first value of the median bin<br>formula: (median-m)\/l=((n\/2)-(nl)) \/ nm<br>ex:<br>traffic:frequency<br>1-15:7<br>16-30:10<br>31-45:8<br>46-60:7<br>61-75:3<br>76-90:4<br>91-105:1<br>so: n=40 and n\/2=20 falls in bin 31-45 (cumulative counts 7, 17, 25), so nl=7+10=17, nm=8, l=15, m=31<br>(median-31)\/15=((40\/2)-(17)) \/ 8<br>median=36.625<\/p>\n\n\n\n<p>&#8230;&nbsp;<\/p>\n\n\n\n<p><br><strong>measuring the dispersion of data (measures of dispersion)<\/strong><br>The main measures are:<br><strong>range<\/strong>: max-min, the simplest measure of dispersion<br><strong>IQR (inter-quartile range)<\/strong>: Q3-Q1, robust to extreme values ,&nbsp;ex: if n=10, the IQR spans 5 positions (8th-3rd)<br><strong>Five number summary<\/strong>: min, Q1, median, Q3, max<br><strong>variance<\/strong>: the larger the value, the more the data values deviate from the mean<br><strong>standard deviation<\/strong>: sqrt(variance), describes how far data points lie from the mean<br><strong>coefficient of 
variation<\/strong>: standard deviation\/mean*100, the standard deviation as a percentage of the mean<\/p>\n\n\n\n<p>ps:<br>Quartiles (measures of position)<br><strong>Q1(quartiles 1)<\/strong>: 25th percentile, ex: if n=10, Q1 is the 3rd value; if n=11, Q1 is the 3rd value<br><strong>Q2(quartiles 2)\/median<\/strong>: 50th percentile<br><strong>Q3(quartiles 3)<\/strong>: 75th percentile, ex: if n=10, Q3 is the 8th value; if n=11, Q3 is the 9th value<\/p>\n\n\n\n<p>&#8230;&nbsp;<\/p>\n\n\n\n<p><strong>measuring the position of data<\/strong><br>The main measures are:<br><strong>skewness<\/strong>: how the distribution compares with the standard normal distribution<br><strong>z-score<\/strong>: gives the relative position of an observation<br>formula: zi=(xi-mean(x))\/s<br>\u3000xi=the i-th observation<br>\u3000s=standard deviation<br>\u3000mean(x)=mean of x<br><strong>chebyshev&#8217;s theorem<\/strong><br><strong>empirical rule<\/strong><br>ps:<br>outlier: an observation with z-score &gt; 3 or &lt; -3 is treated as abnormal<\/p>\n\n\n\n<p>ps:<br>outlier<br>An extremely large or extremely small observation in the data set<br>Can be used for anomaly detection<\/p>\n\n\n\n<p>&#8230;<\/p>\n\n\n\n<p><strong>graphic displays of basic statistical descriptions<\/strong><br>Common ones are:<br><strong>boxplot analysis<\/strong><br>\u3000by Five number summary<br>\u3000outlier, if Xi &gt; (1.5*IQR)+Q3 or Xi &lt; Q1-(1.5*IQR)<br>\u3000ex: Q1=60, Q3=100, so IQR=40; Xi is abnormal if it is greater than 100+40*1.5 or less than 60-40*1.5<br><strong>histogram 
analysis<br><\/strong>Shows the data&#8217;s:<br>\u3000location (center)<br>\u3000variation<br>\u3000skewness<br>\u3000outlier (whether there are outlying values)<br>\u3000distribution<br><strong>quantile plot<\/strong>: shows the distribution of a single variable<br>\u3000method: x-axis is the actual data, y-axis is the f-value<br>\u3000f-value=(i-0.5)\/n, i=index of the i-th value (data sorted in ascending order), n=total number of values<br><strong>Q-Q(quantile-quantile) plot<\/strong>: mainly used to compare the relationship between two variables<br>\u3000method: sort the data of both variables in ascending order, then plot one variable on the x-axis and the other on the y-axis; the plot shows whether one variable is shifted upward or downward relative to the other<br><strong>scatter plot<\/strong>: simply plots the data points of 2 dimensions<br><strong>loess curve<\/strong><\/p>\n\n\n\n<p>&#8230;&#8230;&#8230;&#8230;..<\/p>\n\n\n\n<p><strong>outlier<\/strong><\/p>\n\n\n\n<p><strong>Causes of outliers<\/strong><br>There are two possible causes:<br>1. Execution error&nbsp;ex: an extremely high value is used as a code for other information, which makes calculations incorrect<br>2. Information inherent in the data&nbsp;ex: the boss&#8217;s salary is an outlier compared with that of ordinary employees<\/p>\n\n\n\n<p><strong>outlier detection<\/strong><br>Define what kind of data counts as inconsistent, then find it with a suitable method<\/p>\n\n\n\n<p>.<\/p>\n\n\n\n<p>Common methods are:<\/p>\n\n\n\n<p><strong>statistical 
distribution-based (statistical approach)<\/strong><br>The distribution that the data normally follows must be identified first (in some cases it cannot be found)<br>1. Distributions<br>\u3000inherent alternative distribution<br>\u3000mixture alternative distribution<br>\u3000slippage alternative distribution<br>2. Detection<br>\u3000block procedures<br>\u3000consecutive procedures<\/p>\n\n\n\n<p><strong>distance-based<\/strong><br>Suitable parameters must be found first (they can only be obtained through testing)<br>Common methods are:<br>\u3000index-based: objects within radius d of an object are its neighbors; if object o has fewer than m neighbors, it is an outlier<br>\u3000nested-loop algorithm:<br>\u3000cell-based algorithm:<\/p>\n\n\n\n<p><strong>density-based<\/strong><br>Can handle distributions whose densities differ considerably<\/p>\n\n\n\n<p><strong>deviation-based<\/strong><br>Uses the characteristics of a group of objects; objects that deviate from these characteristics are treated as outliers<br>Common methods are:<br>\u30001.sequential exception technique, an NP-hard problem<br>\u30002.OLAP data cube technique<br><br>ps:<br>statistical-based and distance-based analysis depend on the global distribution of the data<\/p>\n","protected":false},"excerpt":{"rendered":"<p>description data summarization 
&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"","fifu_image_alt":"","_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[13],"tags":[],"class_list":["post-453","post","type-post","status-publish","format-standard","hentry","category-dataanalysis"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/comments?post=453"}],"version-history":[{"count":0,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/posts\/453\/revisions"}],"wp:attachment":[{"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/media?parent=453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/categories?post=453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/systw.net\/note\/wp-json\/wp\/v2\/tags?post=453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}