Pydantic: a powerful data validation tool, 12x faster than DRF's validators
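Pydantic is a module that uses Python type annotations for data validation and settings management. Installation is simple; open a terminal and run:

pip install pydantic

The examples in this article use the Pydantic 1.x API (the benchmark in section 4 was run against 1.7.3); several of the methods shown here are deprecated or renamed in Pydantic 2.x.

It is similar to the data validation offered by Django DRF serializers. The difference is that the Fields available to Django serializers are limited: if you want to use your own Field, you have to subclass its base class and override it: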
# A Django example (a model whose fields a DRF serializer would mirror);
# compare it with the Pydantic usage below
from django.db import models

class Book(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=32)
    price = models.DecimalField(max_digits=5, decimal_places=2)
    author = models.CharField(max_length=32)
    publish = models.CharField(max_length=32)

Pydantic, by contrast, builds on the type annotation features of Python 3.7 and later, so it can validate data for any class:
# Pydantic data validation
from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'  # type inferred as str from the default (Pydantic 1.x)
    signup_ts: Optional[datetime] = None
    friends: List[int] = []

external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:22',
    'friends': [1, 2, '3'],
}
user = User(**external_data)
print(user.id)
#> 123
print(type(user.id))
#> <class 'int'>
print(repr(user.signup_ts))
#> datetime.datetime(2019, 6, 1, 12, 22)
print(user.friends)
#> [1, 2, 3]
print(user.dict())
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""

As this basic usage shows, Pydantic even converts data types for you automatically. For example, user.id is a string in the dictionary, but after passing through the Pydantic validator it becomes an int, because the annotation in the User class is int.
When the data does not match the annotated types, you get an error like this:

from datetime import datetime
from typing import List, Optional

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: Optional[datetime] = None
    friends: List[int] = []

external_data = {
    'id': '123',
    'signup_ts': '2019-06-01 12:222',  # invalid: one digit too many
    'friends': [1, 2, '3'],
}
user = User(**external_data)
"""
Traceback (most recent call last):
  File "1.py", line 18, in <module>
    user = User(**external_data)
  File "pydantic\main.py", line 331, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for User
signup_ts
  invalid datetime format (type=value_error.datetime)
"""

The message is "invalid datetime format" because the signup_ts we passed in is not a valid datetime string (it has an extra 2).
1. Exporting Pydantic model data
With the json method built into Pydantic models, validated data can be converted to a JSON string in a single call. Using the user object from earlier:
print(user.dict())  # convert to a dict
"""
{
    'id': 123,
    'signup_ts': datetime.datetime(2019, 6, 1, 12, 22),
    'friends': [1, 2, 3],
    'name': 'John Doe',
}
"""
print(user.json())  # convert to JSON
"""
{"id": 123, "signup_ts": "2019-06-01T12:22:00", "friends": [1, 2, 3], "name": "John Doe"}
"""

Very convenient. It can also export the whole data structure as a JSON schema, which fully describes the structure and types of the object:
print(user.schema_json(indent=2))
"""
{
  "title": "User",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "integer"
    },
    "signup_ts": {
      "title": "Signup Ts",
      "type": "string",
      "format": "date-time"
    },
    "friends": {
      "title": "Friends",
      "default": [],
      "type": "array",
      "items": {
        "type": "integer"
      }
    },
    "name": {
      "title": "Name",
      "default": "John Doe",
      "type": "string"
    }
  },
  "required": [
    "id"
  ]
}
"""

2. Importing data
Besides constructing a validation model directly, you can also load data into one from an ORM object, a string, or a file.
For example, from a string (raw):
from datetime import datetime

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

m = User.parse_raw('{"id": 123, "name": "James"}')
print(m)
#> id=123 signup_ts=None name='James'

In addition, it can take an ORM object directly and convert it into a Pydantic object, for example from the SQLAlchemy ORM:
from typing import List

from sqlalchemy import Column, Integer, String
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.declarative import declarative_base
from pydantic import BaseModel, constr

Base = declarative_base()

# The SQLAlchemy ORM class
class CompanyOrm(Base):
    __tablename__ = 'companies'
    id = Column(Integer, primary_key=True, nullable=False)
    public_key = Column(String(20), index=True, nullable=False, unique=True)
    name = Column(String(63), unique=True)
    domains = Column(ARRAY(String(255)))

# The matching Pydantic model; orm_mode lets it read attributes from ORM objects
class CompanyModel(BaseModel):
    id: int
    public_key: constr(max_length=20)
    name: constr(max_length=63)
    domains: List[constr(max_length=255)]

    class Config:
        orm_mode = True

co_orm = CompanyOrm(
    id=123,
    public_key='foobar',
    name='Testing',
    domains=['example.com', 'foobar.com'],
)
print(co_orm)
#> <__main__.CompanyOrm object at 0x...>  (default object repr; the address varies)
co_model = CompanyModel.from_orm(co_orm)
print(co_model)
#> id=123 public_key='foobar' name='Testing' domains=['example.com',
#> 'foobar.com']

Importing from a JSON file:
from datetime import datetime
from pathlib import Path

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

path = Path('data.json')
path.write_text('{"id": 123, "name": "James"}')
m = User.parse_file(path)
print(m)

Importing from pickle:
import pickle
from datetime import datetime

from pydantic import BaseModel

# Same User model as in the previous examples
class User(BaseModel):
    id: int
    name = 'John Doe'
    signup_ts: datetime = None

pickle_data = pickle.dumps({
    'id': 123,
    'name': 'James',
    'signup_ts': datetime(2017, 7, 14)
})
# allow_pickle must be enabled explicitly; only unpickle data you trust
m = User.parse_raw(
    pickle_data, content_type='application/pickle', allow_pickle=True
)
print(m)
#> id=123 signup_ts=datetime.datetime(2017, 7, 14, 0, 0) name='James'

3. Custom data validation
You can also use the validator decorator to add whatever validation logic you need:
# UserModel with the three validators described below
# (follows the validator example in the Pydantic 1.x documentation)
from pydantic import BaseModel, ValidationError, validator

class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @validator('password2')
    def passwords_match(cls, v, values, **kwargs):
        # values holds the fields that have already been validated
        if 'password1' in values and v != values['password1']:
            raise ValueError('passwords do not match')
        return v

    @validator('username')
    def username_alphanumeric(cls, v):
        assert v.isalnum(), 'must be alphanumeric'
        return v

Above, we added three custom validation rules:
1. name must contain a space
2. password2 must be the same as password1
3. username must be alphanumeric (letters and digits only)
Let's check whether these three validators work:
from pydantic import ValidationError

user = UserModel(
    name='samuel colvin',
    username='scolvin',
    password1='zxcvbn',
    password2='zxcvbn',
)
print(user)
#> name='Samuel Colvin' username='scolvin' password1='zxcvbn' password2='zxcvbn'
try:
    UserModel(
        name='samuel',
        username='scolvin',
        password1='zxcvbn',
        password2='zxcvbn2',
    )
except ValidationError as e:
    print(e)
    """
    2 validation errors for UserModel
    name
      must contain a space (type=value_error)
    password2
      passwords do not match (type=value_error)
    """

As you can see, the data in the first UserModel has no problems and passes validation.
In the second UserModel, name contains no space and password2 does not match password1, so validation fails. Our custom validators therefore work as intended.
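If you are on Pydantic 2.x, the same three rules can be written with `field_validator`, which replaces the 1.x `validator` decorator. This is a sketch that assumes Pydantic 2.x is installed; it is a port written for illustration, not part of the original article's code.

```python
from pydantic import BaseModel, ValidationError, field_validator


class UserModel(BaseModel):
    name: str
    username: str
    password1: str
    password2: str

    @field_validator('name')
    @classmethod
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v):
        if not v.isalnum():
            raise ValueError('must be alphanumeric')
        return v

    @field_validator('password2')
    @classmethod
    def passwords_match(cls, v, info):
        # info.data holds the fields validated before this one
        if 'password1' in info.data and v != info.data['password1']:
            raise ValueError('passwords do not match')
        return v


user = UserModel(
    name='samuel colvin',
    username='scolvin',
    password1='zxcvbn',
    password2='zxcvbn',
)
print(user.name)

try:
    UserModel(
        name='samuel',
        username='scolvin',
        password1='zxcvbn',
        password2='zxcvbn2',
    )
except ValidationError as exc:
    error_fields = [err['loc'] for err in exc.errors()]
    print(error_fields)
```

The main difference from the 1.x version is that previously validated fields arrive through `info.data` instead of a `values` argument.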
4. Performance
This was the part that surprised me most: in the published benchmark, Pydantic is 12.3 times faster than the Django-rest-framework validators:
| Package | Version | Relative performance | Mean time |
|---|---|---|---|
| pydantic | 1.7.3 | | 93.7μs |
| attrs + cattrs | 20.3 | 1.5x slower | 143.6μs |
| valideer | 0.4.2 | 1.9x slower | 175.9μs |
| marshmallow | 3.10 | 2.4x slower | 227.6μs |
| voluptuous | 0.12 | 2.7x slower | 257.5μs |
| trafaret | 2.1.0 | 3.2x slower | 296.7μs |
| schematics | 2.1.0 | 10.2x slower | 955.5μs |
| django-rest-framework | 3.12 | 12.3x slower | 1148.4μs |
| cerberus | 1.3.2 | 25.9x slower | 2427.6μs |
All of their benchmark code is open source; you can find it at the following GitHub link:
https://github.com/samuelcolvin/pydantic/tree/master/benchmarks
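If you want a rough number for your own machine, a quick timing loop is enough. The sketch below is not the official benchmark; it assumes Pydantic is installed, and the model, the `payload` dict, and the run count are arbitrary choices made for illustration.

```python
import timeit
from typing import List

from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = 'John Doe'
    friends: List[int] = []


payload = {'id': '123', 'friends': [1, 2, '3']}


def validate_once():
    # One full validation pass, including the str -> int conversions
    return User(**payload)


runs = 10_000
total_seconds = timeit.timeit(validate_once, number=runs)
mean_microseconds = total_seconds / runs * 1e6
print(f"mean validation time: {mean_microseconds:.1f} μs over {runs} runs")
```

The absolute figure depends on hardware, Python version, and model complexity, so compare libraries only on the same machine with the same payload.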
Python實用寶典 (pythondict.com)
