python pandas access

控制dataframe

 

..........................................
外部資料讀取與寫入

 

讀檔
pd.read_csv()
ex:
#cat data.csv
date,value
2016,100
2015,90
2014,50
>>>data=pd.read_csv('data.csv')
>>>data.date
0 2016
1 2015
2 2014
ex:
pd.read_csv('ch09/stock_px.csv', parse_dates=True, index_col=0)
ex:
temp_df = pd.read_csv('file.csv', header=None, names=['FileID', 'CustomerID', 'QueryTS', 'ProductID'])

 


儲存
pd.to_csv()
ex:
pd.to_csv("analysis_result.csv")

 

 

..........................................
資料轉換

 

轉換
pandas to series
.series()
ex:
ts = pd.Series(df['Value'].values, index=df['Date'])

pandas to numpy
.values
ex:
npdata=df.values

pandas to list
.to_list()
ex:
listdata=df.to_list()

dict to pandas
ex:
groups = ["Web", "Dev", "Cloud", "Data", "Security", "group_p"]
groupvalue = [5, 9, 1, 4, 6, 7]
data_dict = {"groups": groups,"value": groupvalue}
data_df = pd.DataFrame(data_dict)

list to pandas
ex:
listfieldlen = [59, 9, 19, 14, 6, 77]
df_fieldcount = pd.DataFrame()
df_fieldcount['fieldcount']=listfieldlen
ex:
listfieldlen = [59, 9, 19, 14, 6, 77]
df_listfieldlen = pd.DataFrame(listfieldlen)

 

refer
https://ithelp.ithome.com.tw/articles/10185922

............................

資料操作


設定欄位型別
.astype( < type> )
ex:
df["Department"] =df["Department"].astype("category")

 

 

刪除指定的欄位
.drop( < field name > )
ex:
data=data.drop(data.columns[0],axis=1)
data=data.drop("column name",axis=1)
ex:
#刪除指定欄位
> datatrain.drop(['agenttime'],axis=1,inplace=True)


分割欄位資料
.str.split( < char> )
ex:
data3['URL'].str.split('?', expand=True)

合併欄位
#透過合併欄位建立新欄位
ex:
> datatrain[‘ip'] = datatrain[‘ipv4'] + datatrain[‘ipv6']

 

以累積方式合併資料
.append()
ex:
datatrain=train1.append(train2, ignore_index=True)


水平合併
.concat
ex:
datanew = pd.concat([data1,data2],axis=1, join_axes=[data1.index])

... 

 

將空值(NaN)補0
.fillna()
ex:
df.fillna(0)
ex:
# 將空值補滿
> datatrain[‘account'].fillna("null", inplace=True)

取代
.replace()
此方法需先透過dictionary將字串變成代號
ex:
> dict_content={'Tcp':0,'TCP':0,'udp':1,'UDP':1}
> datatrain[‘protocol'].replace(dict_content,regex=True)

將資料前後空白刪除
df.str.strip()

轉換時間格式
to_datetime( < time>, format=" < time format>")
ex:
pd.to_datetime("6 Mar 2019 00:54:40",format='%d %b %Y %H:%M:%S')
>2019-03-06 00:54:40

...

 

 

 

2019-07-06 15:58:23發表 0000-00-00 00:00:00修改   

金融科技
數據分析
程式開發

計算機組織與結構
資料結構與演算法
Database and MySql
manage tool
windows
unix-like
linux service
network
network layer3
network layer2
network WAN
network service
作業系統
數位鑑識
資訊安全解決方案
資訊安全威脅
Cisco security
Cisco network
Cisco layer3
Cisco layer2



  登入      [牛的大腦] | [單字我朋友] Powered by systw.net