Selecting Rows from a Python DataFrame

Published: January 14, 2024


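This post focuses on selecting rows according to a value range in a column (Series).
Selecting rows by index range is the more basic case, since it requires knowing the index range in advance.
Setup:
Environment: Python 3.7.1, IDLE Shell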

>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22], 'Gender': ['Male', 'Male', 'Female']})

Note: the format of this example data comes from: https://www.python100.com/html/116332.html

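Selecting rows by index range

Example 1: extract the row whose index label is 1; the result is a row Series

>>> row = df.loc[1]  # select by index label; with an auto-generated index, the label equals the position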
>>> row
Name       Jim
Age         18
Gender    Male
Name: 1, dtype: object
>>> type(row)
<class 'pandas.core.series.Series'>

Note:

df.iloc[:]  # select by integer position

Example 2:

>>> row = df.loc[0:1]
>>> row
  Name  Age Gender
0  Tom   20   Male
1  Jim   18   Male
>>> row = df.iloc[0:1]
>>> row
  Name  Age Gender
0  Tom   20   Male
>>> row = df.loc[0:0]
>>> row
  Name  Age Gender
0  Tom   20   Male
>>> type(row)
<class 'pandas.core.frame.DataFrame'>

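Note: the two kinds of slicing cover different ranges. Position-based slicing with iloc uses the half-open interval [0, 1), while label-based slicing with loc includes both endpoints.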

>>> row = df.iloc[0:0]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
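The difference between label slicing and position slicing is easier to see when the labels are not integers. The following is a minimal sketch; the string index ['a', 'b', 'c'] and the name df2 are assumptions made for illustration and are not part of the original data.

```python
import pandas as pd

# Same kind of data as the setup, but with string labels (assumed for this sketch)
df2 = pd.DataFrame(
    {'Name': ['Tom', 'Jim', 'Lily'], 'Age': [20, 18, 22]},
    index=['a', 'b', 'c'],
)

by_label = df2.loc['a':'b']    # label slice: both endpoints are included
by_position = df2.iloc[0:1]    # position slice: half-open interval [0, 1)

print(by_label['Name'].tolist())      # ['Tom', 'Jim']
print(by_position['Name'].tolist())   # ['Tom']
```

With the default integer index the labels happen to equal the positions, which is why df.loc[0:1] and df.iloc[0:1] look so similar yet return a different number of rows.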

Selecting rows by a column value condition

>>> row = df.loc[df['Age'] > 20].iloc[0]['Name']
>>> row
'Lily'
>>> 

The statement above takes the DataFrame df.loc[df['Age'] > 20], extracts the row Series at position 0 (via iloc), and reads its 'Name' value.
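Splitting the chained expression into steps shows that iloc[0] refers to the first row by position, not to the row labelled 0. This is a small sketch using the same data as the setup; the variable names filtered and first are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

filtered = df.loc[df['Age'] > 20]   # the boolean mask keeps only matching rows
first = filtered.iloc[0]            # first row by position, not by label

print(first.name)        # 2, the row's original index label
print(first['Name'])     # Lily

# The filtered frame keeps the original labels, so label 0 no longer exists
try:
    filtered.loc[0]
except KeyError:
    print('label 0 is not in the filtered frame')
```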

(1) Basic column value conditions

Example 1:


>>> row = df.loc[df['Age'] > 18]
>>> row
   Name  Age  Gender
0   Tom   20    Male
2  Lily   22  Female
>>> 

Note: if no row satisfies the condition, no error is raised; an empty DataFrame is returned:

>>> row = df.loc[df['Age'] > 23]
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []

(2) Combining multiple conditions

Example 2:

>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'Lily')]
>>> row
   Name  Age  Gender
2  Lily   22  Female
>>> 

If the combined condition is False for every row, the returned DataFrame is empty:

>>> row = df.loc[(df['Age'] >= 18)&(df['Name'] == 'tongzhi')]  # 'tongzhi' does not appear in the original DataFrame
>>> row
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
>>> 

You can also use the | (or) operator:

>>> row = df.loc[(df['Age'] >= 18)|(df['Name'] == 'Jim')]
>>> row
   Name  Age  Gender
0   Tom   20    Male
1   Jim   18    Male
2  Lily   22  Female
>>> 

Note: the ~ (not) operator is also available.
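A short sketch of the ~ operator on the same data; it inverts a boolean mask element by element. The variable name not_male is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Parentheses are needed so that ~ applies to the whole comparison
not_male = df.loc[~(df['Gender'] == 'Male')]

print(not_male['Name'].tolist())   # ['Lily']
```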

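(3) Function-based conditions

A lambda or a user-defined function can also be passed to df.loc; it receives the DataFrame and must return a boolean mask that marks the rows to keep. For example:

>>> name = 'Jim'
>>> row = df.loc[lambda d: d['Name'] == name]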
>>> row
  Name  Age Gender
1  Jim   18   Male
>>> 
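The same idea works with a named function instead of a lambda. In this sketch the helper is_adult_male is hypothetical and exists only for illustration; df.loc calls it with the DataFrame and uses the boolean Series it returns as the mask.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

def is_adult_male(frame):
    """Hypothetical helper: return a boolean mask, one value per row."""
    return (frame['Age'] >= 18) & (frame['Gender'] == 'Male')

rows = df.loc[is_adult_male]   # the function itself is passed, not its result

print(rows['Name'].tolist())   # ['Tom', 'Jim']
```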

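Several filtering functions that work with a DataFrame

(1) isin:

df[df["column_name"].isin(li)]  # li = [20, 25, 27] or li = np.arange(20, 30)
Keeps the records whose value in the column equals one of the numbers or strings in the list li passed to isin; the usage is similar to "in" in SQL.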
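Applied to the example data from the setup, isin might look like the sketch below; the list [18, 22] is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

li = [18, 22]                    # values to match (chosen for this sketch)
rows = df[df['Age'].isin(li)]    # keep rows whose Age is in the list

print(rows['Name'].tolist())     # ['Jim', 'Lily']
```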

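(2) query:

df.query("(column_name1 == 'str1') & (column_name2 == 'str2')")
Keeps the records that satisfy all of the conditions written in the query string, here on the columns column_name1 and column_name2 and the values str1 and str2.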
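A minimal sketch of query on the example data from the setup. Column names are written bare inside the query string and string values are quoted; the particular conditions are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Both conditions must hold for a row to be kept
rows = df.query("(Gender == 'Male') & (Age >= 20)")

print(rows['Name'].tolist())   # ['Tom']
```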

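(3) str.contains:

df[df["column_name"].str.contains("str")]
Keeps all records whose column value contains the substring "str"; the usage is similar to "contains" in SQL. By default the pattern is interpreted as a regular expression.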
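A small sketch of str.contains on the example data from the setup. Passing regex=False, an optional choice made here, makes the pattern match as a literal substring rather than as a regular expression.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Keep rows whose Name contains the letter 'i' (case-sensitive, literal match)
rows = df[df['Name'].str.contains('i', regex=False)]

print(rows['Name'].tolist())   # ['Jim', 'Lily']
```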

The above draws on: https://blog.csdn.net/weixin_45914452/article/details/120585861

Incorrect condition formats:

Example 1:

>>> row = df.loc[(18<=df['Age'] <= 22)]
Traceback (most recent call last):
  File "<pyshell#56>", line 1, in <module>
    row = df.loc[(18<=df['Age'] <= 22)]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> 

Example 2:

>>> row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    row = df.loc[df['Age'] >= 18 & df['Age'] <= 22]
  File "D:\Program Files\Python371\lib\site-packages\pandas\core\generic.py", line 1538, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
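Both errors come from Python asking for the truth value of a whole Series. A chained comparison such as 18 <= s <= 22 is evaluated as (18 <= s) and (s <= 22), and `and` needs a single bool. In the second example, & binds more tightly than >= and <=, so the expression is parsed as df['Age'] >= (18 & df['Age']) <= 22, which is again a chained comparison. A sketch of two working forms on the same data follows; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jim', 'Lily'],
                   'Age': [20, 18, 22],
                   'Gender': ['Male', 'Male', 'Female']})

# Parenthesize each comparison, then combine the two masks with &
in_range = df.loc[(df['Age'] >= 18) & (df['Age'] <= 22)]

# Series.between expresses the same range; both bounds are included by default
also_in_range = df.loc[df['Age'].between(18, 22)]

print(in_range['Name'].tolist())        # ['Tom', 'Jim', 'Lily']
print(in_range.equals(also_in_range))   # True
```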
Source: https://blog.csdn.net/www_djh/article/details/134940698