Polars is a Python data-processing library. For an introduction, see the official site, or the CSDN blog post "Pandas有了平替Polars".
Basic Polars operations
1. Series and DataFrame
import polars as pl

# Create a Polars DataFrame
data = {
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [True, False, True, False, True],
    "D": [1.1, 2.2, 3.3, 4.4, 5.5],
}
df = pl.DataFrame(data)
# Create a Polars Series
series = pl.Series("E", [6, 7, 8, 9, 10])
# Show the first few rows of the DataFrame
print(df.head())
# Add a new column (with_columns replaces the older with_column)
df = df.with_columns((pl.col("A") * 2).alias("A_doubled"))
# Select specific columns
selected_cols = df.select(["A", "B"])
print(selected_cols)
# Filter rows where C is True
filtered_df = df.filter(pl.col("C"))
print(filtered_df)
# Sort by D in descending order (older releases called this parameter reverse)
sorted_df = df.sort("D", descending=True)
print(sorted_df)
# Arithmetic between a Series and a DataFrame column
new_series = series + df["A"]
print(new_series)
# Add the Series to the DataFrame as column F
df_with_series = df.with_columns(new_series.rename("F"))
print(df_with_series)
For comparison, here is the Pandas code that does roughly the same thing:
import pandas as pd

# Create a Pandas DataFrame
data = {
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [True, False, True, False, True],
    "D": [1.1, 2.2, 3.3, 4.4, 5.5],
}
df = pd.DataFrame(data)
# Create a Pandas Series
series = pd.Series([6, 7, 8, 9, 10], name="E")
# Show the first few rows of the DataFrame
print(df.head())
# Add a new column
df["A_doubled"] = df["A"] * 2
# Select specific columns
selected_cols = df[["A", "B"]]
print(selected_cols)
# Filter rows where C is True
filtered_df = df[df["C"]]
print(filtered_df)
# Sort by D in descending order
sorted_df = df.sort_values("D", ascending=False)
print(sorted_df)
# Arithmetic between a Series and a DataFrame column
new_series = series + df["A"]
print(new_series)
# Add the Series to the DataFrame as column F
df_with_series = df.assign(F=new_series)
print(df_with_series)
2. Expressions
Expressions in Polars let you run more complex calculations and transformations on DataFrame columns. Take the following example:
import polars as pl

# Create a Polars DataFrame
data = {
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [True, False, True, False, True],
    "D": [1.1, 2.2, 3.3, 4.4, 5.5],
}
df = pl.DataFrame(data)
# Build a new column with an expression
expr = (
    pl.when(pl.col("C"))
    .then(pl.col("A") * 2)
    .otherwise(pl.col("A") / 2)
)
df = df.with_columns(expr.alias("New_Column"))
# Show the new DataFrame
print(df)
# Filter on the new column
filtered_df = df.filter(pl.col("New_Column") > 3)
print(filtered_df)
# Aggregate the new column (group_by replaces the older groupby)
aggregated_df = df.group_by("B").agg(pl.col("New_Column").sum().alias("Sum"))
print(aggregated_df)
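The example above uses a when expression to create a new column based on column C: where C is True, the value of A is multiplied by 2; otherwise it is divided by 2. The new column is then filtered and aggregated.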
The same thing is less direct in pandas on its own; paired with numpy, as far as my limited understanding goes, it can be written like this:
import pandas as pd
import numpy as np

# Create a Pandas DataFrame
data = {
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": [True, False, True, False, True],
    "D": [1.1, 2.2, 3.3, 4.4, 5.5],
}
df = pd.DataFrame(data)
# Build the new column with numpy's conditional function np.where
df["New_Column"] = np.where(df["C"], df["A"] * 2, df["A"] / 2)
# Show the new DataFrame
print(df)
# Filter on the new column
filtered_df = df[df["New_Column"] > 3]
print(filtered_df)
# Aggregate the new column
aggregated_df = df.groupby("B")["New_Column"].sum().reset_index(name="Sum")
print(aggregated_df)
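For what it's worth, the conditional column can also be built without numpy, using pandas' own Series.where. A minimal sketch, reduced to columns A and C (new_values is a name introduced here only to show the result):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "C": [True, False, True, False, True],
})

# Series.where keeps the caller's values where the condition is True
# and takes the values from `other` where it is False.
df["New_Column"] = (df["A"] * 2).where(df["C"], df["A"] / 2)

new_values = df["New_Column"].tolist()
print(new_values)  # [2.0, 1.0, 6.0, 2.0, 10.0]
```

Whether this reads better than np.where is a matter of taste; the Polars when/then/otherwise chain states the condition first, which is closer to how the rule is described in words.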
3. Joins (join)
In Polars, DataFrames are combined with the join method.
import polars as pl

# Create two Polars DataFrames
data1 = {
    "A": [1, 2, 3],
    "B": ["a", "b", "c"],
}
df1 = pl.DataFrame(data1)
# The key column C shares the value 3 with A, so the inner join is not empty
data2 = {
    "C": [3, 4, 5],
    "D": ["c", "d", "e"],
}
df2 = pl.DataFrame(data2)
# Inner join on A (left) and C (right)
joined_df = df1.join(df2, left_on="A", right_on="C", how="inner")
# Show the joined DataFrame
print(joined_df)
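Pandas does have a join method, but it matches on the index by default; for joining on columns, merge is the closer counterpart, and it looks much the same: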
import pandas as pd

# Create two Pandas DataFrames
data1 = {
    "A": [1, 2, 3],
    "B": ["a", "b", "c"],
}
df1 = pd.DataFrame(data1)
data2 = {
    "C": [3, 4, 5],
    "D": ["c", "d", "e"],
}
df2 = pd.DataFrame(data2)
# Inner join on A (left) and C (right) using merge
joined_df = df1.merge(df2, left_on="A", right_on="C", how="inner")
# Show the joined DataFrame
print(joined_df)
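As a side note, pandas also exposes DataFrame.join, which matches against the other frame's index rather than against one of its columns. A minimal sketch of the same inner join written that way, assuming the key column C is first moved into the index with set_index (rows is a name used here only to inspect the result):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df2 = pd.DataFrame({"C": [3, 4, 5], "D": ["c", "d", "e"]})

# DataFrame.join compares the caller's `on` column with the other
# frame's index, so C has to become the index of df2 first.
joined = df1.join(df2.set_index("C"), on="A", how="inner")
print(joined)

rows = joined.to_dict("records")
print(rows)  # [{'A': 3, 'B': 'c', 'D': 'c'}]
```

Because C is consumed as the index, it does not appear as a column in the result, whereas merge with left_on/right_on keeps both A and C.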