>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22], 'Gender': ['Male', 'Male', 'Female']})
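Note: the format of this sample data is taken from https://www.python100.com/html/116332.html
Example 1: extract the row whose index label is 1; the result is a row Series.
>>> row = df.loc[1]  # select by index label; with the auto-generated index, labels and positions coincide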
>>> row
Name       Jim
Age         18
Gender    Male
Name: 1, dtype: object
>>> type(row)
<class 'pandas.core.series.Series'>
Note:
df.iloc[:]  # select by integer position instead of by label
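To make the label/position distinction visible, here is a minimal sketch that gives the same data a non-default index. The labels 10, 20, 30 and the name df2 are assumptions chosen only for illustration; they do not appear in the examples above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Hypothetical labels, chosen so that labels and positions no longer coincide.
df2 = df.copy()
df2.index = [10, 20, 30]

by_label = df2.loc[20]     # the row whose index label is 20
by_position = df2.iloc[1]  # the second row, whatever its label is

print(by_label['Name'])     # Jim
print(by_position['Name'])  # Jim
```

With this index, df2.loc[1] would raise a KeyError, because no row carries the label 1, while df2.iloc[1] still works.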
Example 2:
>>> row = df.loc[0:1]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
>>> row = df.iloc[0:1]
>>> row
   Name  Age  Gender
0   Tom   20    Male
>>> row = df.loc[0:0]
>>> row
   Name  Age  Gender
0   Tom   20    Male
>>> type(row)
<class 'pandas.core.frame.DataFrame'>
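Note: the two kinds of slicing cover different intervals. Slicing by integer position (iloc) uses the half-open interval [0, 1), whereas slicing by label (loc) includes both endpoints, [0, 1].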
>>> row = df.iloc[0:0]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
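The difference between the two intervals can be checked side by side. This is only a small sketch on the same sample data; the variable names label_slice and position_slice are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

label_slice = df.loc[0:1]      # label-based: both endpoints are included
position_slice = df.iloc[0:1]  # position-based: the end is excluded, as in Python slicing

print(len(label_slice))     # 2
print(len(position_slice))  # 1
```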
>>> row = df.loc[df['Age'] > 20].iloc[0]['Name']
>>> row
'Lily'
>>>
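The statement above takes the DataFrame df.loc[df['Age'] > 20], extracts the row Series at position 0 (the first matching row, whose label here is 2), and reads its 'Name' value.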
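The chained expression can be unpacked into separate steps, which also makes it easier to guard against an empty result: calling .iloc[0] on an empty DataFrame raises an IndexError. The sketch below is one possible way to write it; the intermediate names (older, first, none_match, first_name) are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

older = df.loc[df['Age'] > 20]  # DataFrame with the matching rows (only label 2 here)
first = older.iloc[0]           # first matching row by position, returned as a Series
name = first['Name']
print(name)                     # Lily

# Guard for the case where nothing matches: check .empty before indexing.
none_match = df.loc[df['Age'] > 23]
first_name = none_match['Name'].iloc[0] if not none_match.empty else None
print(first_name)               # None
```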
Example 1:
>>> row = df.loc[df['Age'] > 18]
>>> row
   Name  Age  Gender
0   Tom   20    Male
2  Lily   22  Female
>>>
Note: when no row satisfies the condition, no error is raised; an empty DataFrame is returned:
>>> row = df.loc[df['Age'] > 23]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
Example 2:
>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'Lily')]
>>> row
   Name  Age  Gender
2  Lily   22  Female
>>>
If the combined condition is False for every row, the returned DataFrame is empty:
>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'tongzhi')]  # 'tongzhi' does not occur in the original DataFrame
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
>>>
The '|' (or) operator can be used in the same way:
>>> row = df.loc[(df['Age'] >= 18)|(df['Name'] == 'Jim')]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
2  Lily   22  Female
>>>
Note: the '~' operator (logical NOT) is also available.
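As a brief illustration of the NOT operator, the sketch below negates a condition on the same sample data. The parentheses around the comparison matter, because ~ binds more tightly than ==. The variable name not_male is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Keep every row for which the condition is NOT true.
not_male = df.loc[~(df['Gender'] == 'Male')]
print(list(not_male['Name']))  # ['Lily']
```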
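A lambda or a custom function that returns a boolean Series can also be used to select the rows that satisfy a condition. The function receives the DataFrame itself as its argument, for example:
>>> x = 'Jim'
>>> row = df.loc[lambda d: d['Name'] == x]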
>>> row
   Name  Age  Gender
1   Jim   18    Male
>>>
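The same works with a named function instead of a lambda. In the sketch below, is_adult_male is a hypothetical helper written for this illustration; loc calls it with the DataFrame and uses the boolean Series it returns as the row mask.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})


def is_adult_male(frame):
    """Hypothetical helper: return one boolean per row of the given DataFrame."""
    return (frame['Age'] >= 18) & (frame['Gender'] == 'Male')


row = df.loc[is_adult_male]  # pass the function itself, not its result
print(list(row['Name']))     # ['Tom', 'Jim']
```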
df[df["column_name"].isin(li)]  # li = [20, 25, 27] or li = np.arange(20, 30)
Keeps the records whose value in that column equals one of the numbers or strings in the list (li) passed to isin; the usage is similar to "IN" in SQL.
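Applied to the sample data from the beginning, isin might look like the sketch below. The list values are assumptions picked so that two of the three rows match.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

li = [20, 22]                     # illustrative list of accepted ages
row = df[df['Age'].isin(li)]
print(list(row['Name']))          # ['Tom', 'Lily']

# An array works as well; np.arange(20, 30) covers the ages 20 to 29.
row_range = df[df['Age'].isin(np.arange(20, 30))]
print(list(row_range['Name']))    # ['Tom', 'Lily']
```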
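df.query("(column_name1 == 'str1') & (column_name2 == 'str2')")
Keeps the records that satisfy all conditions in the query string at once; here column_name1 and column_name2 are the columns being tested and 'str1', 'str2' the values they are compared with.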
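On the sample data, a query could be written as in the sketch below. The thresholds and the variable min_age are assumptions for illustration; the @ prefix is how a query string refers to a local Python variable.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Column names are written bare inside the query string.
row = df.query("(Gender == 'Male') & (Age >= 20)")
print(list(row['Name']))    # ['Tom']

# Local variables are referenced with an @ prefix.
min_age = 20
older = df.query("Age >= @min_age")
print(list(older['Name']))  # ['Tom', 'Lily']
```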
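df[df["column_name"].str.contains("str")]
Keeps all records whose value in that column contains the substring "str" (the pattern is interpreted as a regular expression by default); the usage is similar to a substring match such as LIKE '%str%' in SQL.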
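A short sketch of str.contains on the sample data follows. The search strings are arbitrary choices for the illustration. Matching is case-sensitive unless case=False is passed, and regex=False treats the pattern as a literal string.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Names containing a lowercase 'i' (case-sensitive by default).
row = df[df['Name'].str.contains('i')]
print(list(row['Name']))       # ['Jim', 'Lily']

# Case-insensitive, literal match on the letter 'l'.
row_l = df[df['Name'].str.contains('l', case=False, regex=False)]
print(list(row_l['Name']))     # ['Lily']
```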
The three idioms above draw on: https://blog.csdn.net/weixin_45914452/article/details/120585861
>>> row = df.loc[(18<=df['Age'] <= 22)]
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    row = df.loc[(18<=df['Age'] <= 22)]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>>
>>> row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
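Both tracebacks above have the same cause. A chained comparison such as 18 <= s <= 22 is evaluated by Python as (18 <= s) and (s <= 22), and "and" needs a single truth value for a whole Series. In the second attempt, & binds more tightly than >= and <=, so without parentheses the expression again turns into a chained comparison. A sketch of two forms that should work on the sample data, either parenthesised conditions joined with & or Series.between (inclusive on both ends by default):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Parenthesise each comparison, then combine element-wise with &.
in_range = df.loc[(df['Age'] >= 18) & (df['Age'] <= 22)]

# Equivalent range test with between().
same = df.loc[df['Age'].between(18, 22)]

print(in_range.equals(same))   # True
print(list(in_range['Name']))  # ['Tom', 'Jim', 'Lily']

# A narrower range, to show that rows are actually filtered out.
narrower = df.loc[df['Age'].between(19, 22)]
print(list(narrower['Name']))  # ['Tom', 'Lily']
```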