This article walks through the basic usage of pandas. These are the operations people most often get stuck on in real-world work, so let's go through how to handle each situation. Read carefully and you should come away with something useful.
import pandas as pd

country1 = pd.Series({'Name': '中國(guó)',
                      'Language': 'Chinese',
                      'Area': '9.597M km2',
                      'Happiness Rank': 79})
country2 = pd.Series({'Name': '美國(guó)',
                      'Language': 'English (US)',
                      'Area': '9.834M km2',
                      'Happiness Rank': 14})
country3 = pd.Series({'Name': '澳大利亞',
                      'Language': 'English (AU)',
                      'Area': '7.692M km2',
                      'Happiness Rank': 9})
df = pd.DataFrame([country1, country2, country3], index=['CH', 'US', 'AU'])
df = pd.DataFrame(columns=["epoch", "train_loss", "train_auc", "test_loss", "test_auc"])

# DataFrame.append was removed in pandas 2.0; append rows with pd.concat instead
log_dic = {"epoch": 1, "train_loss": 0.2, "train_auc": 1., "test_loss": 0, "test_auc": 0}
df = pd.concat([df, pd.DataFrame([log_dic])])
log_dic = {"epoch": 2, "train_loss": 0.2, "train_auc": 1., "test_loss": 0, "test_auc": 0}
df = pd.concat([df, pd.DataFrame([log_dic])])

# Renumber the index
# inplace=True modifies the DataFrame in place
# drop=True discards the old index instead of keeping it as a column
df.reset_index(inplace=True, drop=True)
df1 = pd.DataFrame(columns=["epoch", "train_loss", "train_auc", "test_loss", "test_auc"])
log_dic = {"epoch": 1, "train_loss": 0.2, "train_auc": 1., "test_loss": 0, "test_auc": 0}
df1 = pd.concat([df1, pd.DataFrame([log_dic])])

df2 = pd.DataFrame(columns=["epoch", "train_loss", "train_auc", "test_loss", "test_auc"])
log_dic = {"epoch": 2, "train_loss": 0.1, "train_auc": 1., "test_loss": 0, "test_auc": 1}
df2 = pd.concat([df2, pd.DataFrame([log_dic])])

# ignore_index=True renumbers the index of the concatenated result
df_new = pd.concat([df1, df2], axis=0, ignore_index=True)
columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
df_new[columns].to_csv('text.txt', index=False, header=columns, sep='\t')
# Write the same columns without a header row
df_new[columns].to_csv('text.txt', index=False, header=None, sep='\t')
# header=None: the file has no header row; nrows=100: read at most the first 100 rows
df = pd.read_csv('text.txt', sep='\t', header=None, nrows=100)
df.columns = ["epoch", "train_loss", "train_auc", "test_loss", "test_auc"]
# Use the header parameter to specify which row holds the column names, usually row 0
df = pd.read_csv('text.txt', sep='\t', header=[0])

# Read only specific columns, using 'Country' as the index
report_2016_df = pd.read_csv('2016.csv',
                             index_col='Country',
                             usecols=['Country', 'Happiness Rank', 'Happiness Score', 'Region'])
df = pd.DataFrame(columns=["epoch", "train_loss", "train_auc", "test_loss", "test_auc"])
log_dic = {"epoch": 2, "train_loss": 0.1, "train_auc": 1., "test_loss": 23, "test_auc": 1}
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
df = pd.concat([df, pd.DataFrame([log_dic])])
df.to_pickle('df_log.pickle')
6. Loading a pickle file
df = pd.read_pickle('df_log.pickle')
The following indexing examples use the country DataFrame df built at the beginning (rows indexed 'CH', 'US', 'AU').
df.loc['CH']  # returns a Series
df.loc['CH'].index       # Index(['Name', 'Language', 'Area', 'Happiness Rank'], dtype='object')
df.loc['CH']['Name']     # '中國(guó)'
df.loc['CH'].to_numpy()  # array(['中國(guó)', 'Chinese', '9.597M km2', 79], dtype=object)
df.iloc[1]  # select the second row by position
# Select multiple rows, by label or by position
df.loc[['CH', 'US']]
df.iloc[[0, 1]]
df['Area']            # type: Series
df[['Name', 'Area']]  # type: DataFrame
print('Column first, then row:')
print(df['Area']['CH'])
print(df['Area'].loc['CH'])
print(df['Area'].iloc[0])

print('Row first, then column:')
print(df.loc['CH']['Area'])
print(df.iloc[0]['Area'])
print(df.at['CH', 'Area'])
df.drop(['CH'], inplace=True)            # drop a row; inplace=True modifies df in place
df.drop(['Area'], axis=1, inplace=True)  # drop a column; axis=1 is required
Use the following data:
import numpy as np

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
""" axis: 0: 行操作(默認(rèn)) 1: 列操作 how: any: 只要有空值就刪除(默認(rèn)) all:全部為空值才刪除 inplace: False: 返回新的數(shù)據(jù)集(默認(rèn)) True: 在愿數(shù)據(jù)集上操作 """ df.dropna(axis=0, how='any', inplace=True)
df.dropna(axis=0, how='any', subset=['toy'], inplace=False)  # subset restricts the NaN check to specific columns
Use the following data:
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
df.fillna(0, inplace=True)
# "橫向用缺失值前面的值替換缺失值" df.fillna(axis=1, method='ffill', inplace=False)
# "縱向用缺失值上面的值替換缺失值" df.fillna(axis=0, method='bfill', inplace=False)
# Fill a specific column; the original used df['A'].fillna(0, inplace=True),
# but assigning back avoids chained-assignment issues
df['A'] = df['A'].fillna(0)
df.isnull()     # boolean mask of missing values for the whole DataFrame
df['A'].isna()  # boolean mask for a single column (isna is an alias of isnull)
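A small follow-up not spelled out in the original text: because isnull/isna return boolean masks, summing them is a common way to count missing values. A minimal sketch using the ABCD DataFrame defined above:

df.isnull().sum()        # number of missing values in each column
df.isnull().sum().sum()  # total number of missing values in the DataFrame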
import pandas as pd

staff_df = pd.DataFrame([{'姓名': '張三', '部門': '研發(fā)部'},
                         {'姓名': '李四', '部門': '財(cái)務(wù)部'},
                         {'姓名': '趙六', '部門': '市場(chǎng)部'}])
student_df = pd.DataFrame([{'姓名': '張三', '專業(yè)': '計(jì)算機(jī)'},
                           {'姓名': '李四', '專業(yè)': '會(huì)計(jì)'},
                           {'姓名': '王五', '專業(yè)': '市場(chǎng)營(yíng)銷'}])
The how parameter of pd.merge accepts: inner (intersection), outer (union), left, and right.
pd.merge(staff_df, student_df, how='inner', on='姓名')
pd.merge(staff_df, student_df, how='outer', on='姓名')
# Set 姓名 as the index of both DataFrames, then merge on the index
staff_df.set_index('姓名', inplace=True)
student_df.set_index('姓名', inplace=True)
pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
# Reset the index back to a RangeIndex
staff_df.reset_index(inplace=True)
student_df.reset_index(inplace=True)

# Rename the key columns, then merge on differently named keys
staff_df.rename(columns={'姓名': '員工姓名'}, inplace=True)
student_df.rename(columns={'姓名': '學(xué)生姓名'}, inplace=True)
pd.merge(staff_df, student_df, how='left', left_on='員工姓名', right_on='學(xué)生姓名')
# Merge on multiple keys (this assumes both DataFrames also contain a 地址 column,
# which the example data above does not include)
pd.merge(staff_df, student_df, how='inner',
         left_on=['員工姓名', '地址'], right_on=['學(xué)生姓名', '地址'])
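The staff_df/student_df built earlier have no 地址 column, so the multi-key merge above will not run on them as-is. A minimal illustrative sketch with made-up 地址 values (the data here is hypothetical, not from the original article):

staff_df2 = pd.DataFrame([{'員工姓名': '張三', '部門': '研發(fā)部', '地址': '北京'},
                          {'員工姓名': '李四', '部門': '財(cái)務(wù)部', '地址': '上海'}])
student_df2 = pd.DataFrame([{'學(xué)生姓名': '張三', '專業(yè)': '計(jì)算機(jī)', '地址': '北京'},
                            {'學(xué)生姓名': '李四', '專業(yè)': '會(huì)計(jì)', '地址': '深圳'}])
# Rows are kept only when both the name and the 地址 match: here only 張三
pd.merge(staff_df2, student_df2, how='inner',
         left_on=['員工姓名', '地址'], right_on=['學(xué)生姓名', '地址'])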
report_data = pd.read_csv('./2015.csv')
report_data.head()
# Quick overview of the dataset
report_data.head()
report_data.info()
report_data.describe()
report_data.columns
report_data.index
# Rename columns to Chinese labels
df.rename(columns={'Region': '地區(qū)',
                   'Happiness Rank': '排名',
                   'Happiness Score': '幸福指數(shù)'},
          inplace=True)
# Replace NaN with 0 (inplace=False returns a new DataFrame)
df.fillna(0, inplace=False)
# Drop rows containing NaN
df.dropna()
# Forward fill
df.ffill()
# Backward fill
df.bfill(inplace=True)
# Using apply
# Extract the family name (first character)
staff_df['員工姓名'].apply(lambda x: x[0])
# Extract the given name (remaining characters)
staff_df['員工姓名'].apply(lambda x: x[1:])
# Store the results as new columns
staff_df.loc[:, '姓'] = staff_df['員工姓名'].apply(lambda x: x[0])
staff_df.loc[:, '名'] = staff_df['員工姓名'].apply(lambda x: x[1:])
Grouping by a column
grouped = report_data.groupby('Region')
grouped['Happiness Score'].mean()
grouped.size()

# Iterate over the GroupBy object
for group, frame in grouped:
    mean_score = frame['Happiness Score'].mean()
    max_score = frame['Happiness Score'].max()
    min_score = frame['Happiness Score'].min()
    print('Region {}: mean happiness score {}, max {}, min {}'.format(
        group, mean_score, max_score, min_score))
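The per-region statistics computed in the loop above can also be obtained in a single call with GroupBy.agg; a brief sketch reusing the grouped object from the previous snippet:

# One row per region, with mean/max/min of the happiness score as columns
grouped['Happiness Score'].agg(['mean', 'max', 'min'])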
Grouping with a custom function
report_data2 = report_data.set_index('Happiness Rank')

def get_rank_group(rank):
    if rank <= 10:
        rank_group = '0 -- 10'
    elif rank <= 20:
        rank_group = '10 -- 20'
    else:
        rank_group = '> 20'
    return rank_group

grouped = report_data2.groupby(get_rank_group)
for group, frame in grouped:
    print('Group {}: {} rows'.format(group, len(frame)))
# In real projects, it is often easier to construct a grouping column first and then call groupby
# Here we group by the integer part of the happiness score
# (The previous example grouped ranks into 1-10, 10-20, >20;
#  note that a custom grouping function is applied to the index)
report_data['score group'] = report_data['Happiness Score'].apply(lambda score: int(score))
grouped = report_data.groupby('score group')
for group, frame in grouped:
    print('Integer score {}: {} rows'.format(group, len(frame)))
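As an alternative to the hand-written get_rank_group function (not used in the original article), pd.cut can build the same rank bins directly; a minimal sketch reusing report_data:

# Bin Happiness Rank into (0, 10], (10, 20] and (20, inf), matching get_rank_group above
rank_bins = pd.cut(report_data['Happiness Rank'],
                   bins=[0, 10, 20, float('inf')],
                   labels=['0 -- 10', '10 -- 20', '> 20'])
grouped = report_data.groupby(rank_bins)
grouped.size()  # number of countries in each rank bin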
Use a bar chart to plot how many samples each label has (train_df below is assumed to be a DataFrame with a label column):
train_df.label.value_counts().plot(kind='bar')
That wraps up the basic usage of pandas. Thanks for reading.