Your First Machine Learning REST API with Python/FastAPI

What you'll learn

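The basic workflow for creating a machine learning service: framing the problem space, cleaning the data, choosing and training a model, and finally deploying it to the web.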

Requirements

* Basic bash
* Basic git/GitHub usage
* Python 3.7 fundamentals

Data Science

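You've probably seen this diagram before: data science, a hot topic these days, sits at the intersection of math, computer science, and business. Sounds great, but how do you actually go about doing data science?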

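Just as traditional science has the scientific method, data science has a set of methodologies that pave the way for a project. One of the most widely used is CRISP-DM (Cross-Industry Standard Process for Data Mining).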

The process is described below:

Today we'll walk through each of these steps at a high level.

Business Understanding

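What are you trying to achieve? Think about which keywords could help you find the data you need to solve the problem. Maybe you want to tackle global warming, so you look for flood data for a particular region. Or maybe you want to speed up mushroom identification in a biology lab, so you look for data describing the characteristics of different mushroom species. For now, don't think about the data itself; think about the problem you want to solve.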

For our use case, the goal is to reduce the dropout rate with a tool that predicts a student's grade from a short survey.

Data Understanding

Before moving on, we first need to understand what type of data we should be looking for.
There are three types of data:

Note: unstructured data can also be images, video, and audio.

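For classical machine learning (the focus of this tutorial), the easiest type of data to work with is structured data, so that's what we'll use. Kaggle is one of the best platforms for finding data, and you can create Jupyter Notebooks directly on Kaggle, which makes everything more convenient.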

Kaggle is a social network for data scientists where you can find datasets, competitions, courses, and other users' work.

Once you're on Kaggle, go to the Datasets section.

Here you can enter the keywords you came up with in the Business Understanding step.

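I found this dataset using the keyword "education". One of the most important things to check is the dataset's description: make sure it accurately describes every column. For example: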

Now click the **New Notebook** button, select Python and Notebook in the options that follow, and click Create.

You'll be redirected to a notebook, where we can start understanding our data.

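A notebook combines code cells with cells that render Markdown, which lets us experiment with code easily while also documenting our thought process. On Kaggle, these notebooks are also called kernels.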

More information about kernels here

First we import the libraries we'll use later on; don't worry too much about them for now.
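The import cell itself isn't reproduced in this post. Based on the tools named later in the walkthrough, it plausibly looks something like this; the exact list (in particular `mean_absolute_error`) is an assumption, not the author's notebook.

```python
# Assumed notebook imports, inferred from the tools used later in this walkthrough
import numpy as np                                     # numerical arrays
import pandas as pd                                    # DataFrame, read_csv
from sklearn.model_selection import train_test_split   # 70/30 train/test split
from sklearn.ensemble import RandomForestRegressor     # the model
from sklearn.metrics import mean_absolute_error        # evaluation metric (MAE)
```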

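We load the DataFrame (a Pandas data type that represents a table) and call a few methods on it. The `sample` method draws some random rows from our data, and the `columns` attribute returns the column names.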
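A minimal sketch of these calls, assuming a few made-up rows that borrow some of the dataset's column names in place of the real Kaggle CSV (whose path isn't given here):

```python
import io

import pandas as pd

# A few made-up rows shaped like the student dataset (NOT real data).
# On Kaggle you would pass the dataset's CSV path to pd.read_csv instead.
csv_text = """school,sex,age,Medu,studytime,failures,absences,G1,G2,G3
GP,F,18,4,2,0,4,0,11,11
GP,F,17,1,2,0,2,9,11,11
MS,M,16,1,2,0,0,12,13,12
GP,M,15,4,2,0,6,14,14,14
"""

df = pd.read_csv(io.StringIO(csv_text))

# sample() returns random rows; random_state makes the draw reproducible
print(df.sample(2, random_state=0))

# columns is an attribute (no parentheses) holding the column labels
print(list(df.columns))
```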

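More usefully, the `info` method shows that 16 of the columns are numeric. That's a good sign: since we need to move quickly, we can avoid dealing with the categorical data.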
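To pick out the numeric columns programmatically instead of reading through the `info` output, `select_dtypes` splits a DataFrame by dtype. The tiny table below is hypothetical:

```python
import pandas as pd

# Hypothetical mini-table mixing numeric and categorical (string) columns
df = pd.DataFrame({
    "school": ["GP", "MS", "GP"],
    "age": [18, 17, 16],
    "address": ["U", "R", "U"],
    "studytime": [2, 1, 3],
    "G3": [11, 9, 14],
})

df.info()  # prints each column's dtype: int64 columns are numeric, string columns are not

numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")
print("numeric:", list(numeric.columns))
print("categorical:", list(categorical.columns))
```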

Ideally, we would encode the categorical data as numbers using label encoding or one-hot encoding, which could improve our model.
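For reference, here is roughly what the two encodings look like in pandas, on a made-up column shaped like the dataset's `Mjob` (mother's job). This sketches the technique only; the tutorial itself skips this step.

```python
import pandas as pd

df = pd.DataFrame({"Mjob": ["teacher", "health", "other", "teacher"]})

# Label encoding: each category becomes an integer code (categories are sorted alphabetically)
df["Mjob_label"] = df["Mjob"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["Mjob"], prefix="Mjob", dtype=int)

print(df)
print(one_hot)
```

Label encoding is compact but implies an order between categories (here `health` < `other` < `teacher`), which a model may treat as meaningful; one-hot encoding avoids that at the cost of extra columns.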

The main data types found in tabular data are listed below.

The last basic data-understanding tool we'll use is the `describe` method.

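It gives us some descriptive statistics. For example, the mean final grade (G3) is 10.4, which may sound a little odd given that grades range from 0 to 20. A bit of digging shows that in the Portuguese grading system, 10 is a passing grade.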
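The same idea in code, on a handful of made-up G3 values rather than the real dataset: `describe` produces the summary statistics, and comparing against 10 (the passing mark) turns the grades into a pass rate.

```python
import pandas as pd

# Made-up final grades on the Portuguese 0-20 scale (NOT the real dataset)
grades = pd.DataFrame({"G3": [11, 9, 14, 0, 10, 17, 8, 12]})

stats = grades["G3"].describe()   # count, mean, std, min, quartiles, max
print(stats)

# With 10 as the passing mark, the pass rate is the share of grades >= 10
pass_rate = (grades["G3"] >= 10).mean()
print("pass rate:", pass_rate)
```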

We're one step closer to understanding our data!

Data Preparation

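This is where we would normally clean the data, apply some normalization, and encode the categorical data. For now, though, we'll simply drop all the categorical data, which is the simplest strategy.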

First, we select our target variable, G3 (the final grade), which is a numeric column. The table without G3 gives us our input columns.
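Separating the target from the inputs takes one line each in pandas. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; in the tutorial this is the full student DataFrame
df = pd.DataFrame({
    "age": [18, 17, 16],
    "studytime": [2, 1, 3],
    "G3": [11, 9, 14],
})

y = df["G3"]                  # target: the final grade
X = df.drop(columns=["G3"])   # inputs: every other column (df itself is unchanged)
print(list(X.columns), y.tolist())
```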

Then, using the `train_test_split` function we imported earlier, we split the table into two parts: a training set (70%) and a test set (30%).

This is necessary to evaluate how well our model has learned from the data, using rows it never saw during training.
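A sketch of the 70/30 split on synthetic data. The notebook's exact arguments aren't shown, so `random_state` here is an assumption added for reproducibility:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student table: 100 rows, 2 inputs and a target
rng = np.random.RandomState(42)
X = pd.DataFrame({
    "studytime": rng.randint(1, 5, size=100),
    "absences": rng.randint(0, 30, size=100),
})
y = pd.Series(rng.randint(0, 21, size=100), name="G3")

# test_size=0.3 holds 30% of the rows aside; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(X_train), len(X_test))
```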

Next, we drop all categorical columns from both the training and test data, along with G1 and G2 (we don't want to use past grades to predict future ones).

The `columns` attribute shows us which columns remain.
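One way to do this cleanup, sketched with a hypothetical helper `keep_numeric_inputs` and a made-up two-row split:

```python
import pandas as pd

# Hypothetical training split with categorical columns and the earlier grades
X_train = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "age": [18, 16],
    "studytime": [2, 3],
    "G1": [10, 12],
    "G2": [11, 13],
})

def keep_numeric_inputs(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only numeric columns, then drop the period grades G1 and G2."""
    numeric = frame.select_dtypes(include="number")
    return numeric.drop(columns=["G1", "G2"])

X_train = keep_numeric_inputs(X_train)
print(list(X_train.columns))
```

Applying the same function to the test split keeps both splits with identical columns, which the model needs at prediction time.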

Modeling and Evaluation

Now for the fun part!
We instantiate a `RandomForestRegressor` and fit it to our training data with the `fit` method.

After that, we make predictions on the test data, which gives us a predicted grade for each test row.
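A self-contained sketch of the fit/predict step. The data is synthetic, with an invented relationship between the columns and the grade, so its output says nothing about the real dataset:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the numeric student columns (NOT the real dataset)
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "studytime": rng.randint(1, 5, size=200),
    "failures": rng.randint(0, 4, size=200),
    "absences": rng.randint(0, 30, size=200),
})
# Invented relationship so the model has something to learn, clipped to the 0-20 scale
y = (10 + 1.5 * X["studytime"] - 2 * X["failures"] - 0.1 * X["absences"]
     + rng.normal(0, 1, size=200)).clip(0, 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)          # learn from the training rows
predictions = model.predict(X_test)  # one predicted grade per test row
print(predictions[:5])
```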

To evaluate our model, we compare the actual grades with the predicted grades using the mean absolute error (MAE), which is calculated as follows:
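MAE averages the absolute differences between actual and predicted values: MAE = (1/n) * sum(|y_i - y_hat_i|). A plain-Python version with made-up grades; scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|y_i - y_hat_i|): the average size of the errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up grades: actual values vs. model predictions
actual = [12, 8, 15, 10]
predicted = [10, 9, 15, 13]
print(mean_absolute_error(actual, predicted))  # errors 2, 1, 0, 3 -> 6 / 4
```

On the 0-20 scale, an MAE of 3.15 means the predictions are off by a little over 3 points on average.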

All of the steps above are combined in the following lines of code.

We get an MAE of 3.15, which is good enough considering how few steps we took to process the data.

Back to Business Understanding

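Since we want to build a simple form for students, we'll reduce the number of input variables needed for a prediction.
Using code copied from an obscure Stack Overflow post, we can get an ordered list of the most important features used by the model.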
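The Stack Overflow snippet isn't reproduced here. A common way to get such a ranking from a fitted scikit-learn forest, assuming that's roughly what it did, is to pair `feature_importances_` with the column names. Synthetic data again, built so that `failures` matters most:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic inputs: the target depends strongly on "failures", weakly on "age", not on "noise"
rng = np.random.RandomState(1)
X = pd.DataFrame({
    "failures": rng.randint(0, 4, size=300),
    "age": rng.randint(15, 23, size=300),
    "noise": rng.rand(300),
})
y = 15 - 3 * X["failures"] + 0.2 * X["age"]

model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

# Pair each importance with its column name and sort from most to least important
importances = pd.Series(model.feature_importances_, index=X.columns)
ranked = importances.sort_values(ascending=False)
print(ranked)
```

These are impurity-based importances: they sum to 1 and can overstate columns with many distinct values, so the ranking is a guide rather than a verdict.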

We'll keep the top ten features, drop the remaining columns, and train our model again to see how well it fits the reduced data.
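Sketched on synthetic data with twelve hypothetical columns `f0` to `f11`: rank the columns, keep the top ten, retrain, and compare MAE on the same test split. Whether the smaller model wins depends on the data, so the code just prints both numbers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic table with 12 numeric columns (hypothetical names f0..f11)
rng = np.random.RandomState(2)
X = pd.DataFrame(rng.rand(300, 12), columns=[f"f{i}" for i in range(12)])
y = 20 * X["f0"] + 5 * X["f1"] + rng.normal(0, 1, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2
)

full = RandomForestRegressor(random_state=2).fit(X_train, y_train)

# Keep the ten most important columns and retrain on just those
ranked = pd.Series(full.feature_importances_, index=X.columns).sort_values(ascending=False)
top10 = list(ranked.index[:10])
small = RandomForestRegressor(random_state=2).fit(X_train[top10], y_train)

mae_full = mean_absolute_error(y_test, full.predict(X_test))
mae_small = mean_absolute_error(y_test, small.predict(X_test[top10]))
print(top10)
print(f"MAE with 12 columns: {mae_full:.2f}, with 10: {mae_small:.2f}")
```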

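Apparently, this model is better than the previous one, and it uses less data.
We accept the result and finish the process by saving the model to a pickle file, `model.mo`. We'll download this file and keep it for later.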
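Saving and reloading with `pickle` looks roughly like this. The file name `model.mo` matches what `main.py` loads later; writing to a temporary directory just keeps the sketch from littering the current folder (in the notebook you would write to the working directory and download from there).

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny synthetic model, just to have something to save
rng = np.random.RandomState(3)
X = rng.rand(50, 3)
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=3).fit(X, y)

# Write the fitted model to disk in binary mode
path = os.path.join(tempfile.mkdtemp(), "model.mo")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as the API does at startup
with open(path, "rb") as f:
    restored = pickle.load(f)

print(np.allclose(model.predict(X), restored.predict(X)))
```

Only unpickle files you trust, and load the model with the same scikit-learn version that saved it; pickled estimators are not guaranteed to work across versions.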

Deployment

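We now have a working model, but a model is useless if nobody uses it. Deploying a model is a challenge in its own right. In this workshop we'll do a MacGyver-style deployment using FastAPI and Deta.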

* Create a free account with Deta
* Create a directory for your new project

```
mkdir grading_prediction_service
cd grading_prediction_service
```

* Install virtualenv and create a Python virtual environment

```
python3 -m pip install virtualenv
python3 -m virtualenv .venv
source .venv/bin/activate
```

* Create a requirements.txt file listing our dependencies, with the following contents:

```
scikit-learn
pandas
numpy
fastapi
uvicorn
joblib
```


* Install these dependencies in our virtual environment


```
pip install -r requirements.txt
```


* Create a main.py file and add the following code



```
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}
```


* Start the development server, then use curl in another terminal window to test the endpoint


```
uvicorn main:app --reload
```




```
curl http://127.0.0.1:8000
```


We now have our hello world endpoint!
Now let's see how to take it where we need it to go.

* Update the code to add a Pydantic model (a `BaseModel` subclass)



```
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Answer(BaseModel):
    age: int
    Medu: int
    studytime: int
    failures: int
    famrel: int
    freetime: int
    goout: int
    Walc: int
    health: int
    absences: int


@app.get("/")
async def root():
    return {"message": "Hello World"}
```


Pydantic models validate the types of incoming data; this class describes exactly what we'll receive from the client.
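FastAPI runs this validation for you and answers invalid requests with a 422 error. To see the behavior in isolation, the same `Answer` class can be exercised directly; `is_valid` is a hypothetical helper for illustration.

```python
from pydantic import BaseModel, ValidationError

class Answer(BaseModel):
    age: int
    Medu: int
    studytime: int
    failures: int
    famrel: int
    freetime: int
    goout: int
    Walc: int
    health: int
    absences: int

def is_valid(payload: dict) -> bool:
    """Return True if the payload satisfies the Answer schema (hypothetical helper)."""
    try:
        Answer(**payload)
        return True
    except ValidationError:
        return False

valid = {"age": 19, "Medu": 1, "studytime": 2, "failures": 1, "famrel": 4,
         "freetime": 2, "goout": 4, "Walc": 2, "health": 3, "absences": 0}

print(is_valid(valid))                           # all ten integer fields present
print(is_valid({**valid, "age": "nineteen"}))    # not convertible to int
```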

* Move your `model.mo` file (the pickled model you downloaded from Kaggle) into this directory

* Let's create a new POST endpoint that will receive our survey data



```
from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

app = FastAPI()

class Answer(BaseModel):
    age: int
    Medu: int
    studytime: int
    failures: int
    famrel: int
    freetime: int
    goout: int
    Walc: int
    health: int
    absences: int


@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.post("/grade_predict")
async def predict_student_grade(answer: Answer):
    answer_dict = jsonable_encoder(answer)
    for key, value in answer_dict.items():
        answer_dict[key] = [value]
    # answer_dict = {k:[v] for (k,v) in jsonable_encoder(answer).items()}
    return answer_dict
```


This receives the request body, converts it into a dictionary, and then wraps each value in a list. The comment is a one-line version of the same thing using a dictionary comprehension.
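Both versions produce the same dictionary. The one-item lists matter because `pd.DataFrame.from_dict` treats each key as a column and each list as that column's values, so the result is a single-row DataFrame (with bare scalar values, pandas would instead ask for an index). A stdlib-only sketch comparing the two; the function names are illustrative.

```python
def wrap_values_loop(answer_dict: dict) -> dict:
    """Loop version, as in main.py: replace each value with a one-item list."""
    wrapped = dict(answer_dict)  # copy so the caller's dict is left untouched
    for key, value in wrapped.items():
        wrapped[key] = [value]   # reassigning existing keys while iterating is safe
    return wrapped

def wrap_values_comprehension(answer_dict: dict) -> dict:
    """One-liner version from the commented-out line."""
    return {k: [v] for (k, v) in answer_dict.items()}

sample = {"age": 19, "Medu": 1, "absences": 0}
print(wrap_values_loop(sample))
print(wrap_values_comprehension(sample))
```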

* Create a file named test_prediction.json in the same directory with these values


```
{
    "age": 19,
    "Medu": 1,
    "studytime": 2,
    "failures": 1,
    "famrel": 4,
    "freetime": 2,
    "goout": 4,
    "Walc": 2,
    "health": 3,
    "absences": 0
}
```


* Test your new POST endpoint with curl


```
curl --request POST \
        --header "Content-Type: application/json" \
        --data @test_prediction.json \
        http://127.0.0.1:8000/grade_predict
```
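If you'd rather test from Python than with curl, the standard library can build the same request. This sketch only constructs it; the commented-out lines would send it once `uvicorn` is running. The explicit `Content-Type: application/json` header matters because recent FastAPI versions check it before parsing a JSON body.

```python
import json
import urllib.request

# Build (but do not send) the same POST request as the curl command
payload = {"age": 19, "Medu": 1, "studytime": 2, "failures": 1, "famrel": 4,
           "freetime": 2, "goout": 4, "Walc": 2, "health": 3, "absences": 0}

request = urllib.request.Request(
    "http://127.0.0.1:8000/grade_predict",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the server running, this would send the request and print the reply:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
print(request.get_method(), request.full_url)
```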


* Finally we'll load our model, convert our dictionary into a Pandas DataFrame, feed our input to the model and return the predicted result.



```
import pickle

import pandas as pd

from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel

app = FastAPI()
with open("model.mo", "rb") as f:
    model = pickle.load(f)

class Answer(BaseModel):
    age: int
    Medu: int
    studytime: int
    failures: int
    famrel: int
    freetime: int
    goout: int
    Walc: int
    health: int
    absences: int


@app.get("/")
async def root():
    return {"message": "Hello World"}

@app.post("/grade_predict")
async def predict_student_grade(answer: Answer):
    answer_dict = jsonable_encoder(answer)
    for key, value in answer_dict.items():
        answer_dict[key] = [value]
    # answer_dict = {k:[v] for (k,v) in jsonable_encoder(answer).items()}
    single_instance = pd.DataFrame.from_dict(answer_dict)
    prediction = model.predict(single_instance)
    return prediction[0]
```


* Play with your brand-spanking-new ML API by changing the values in test_prediction.json and calling it with curl


```
curl --request POST \
        --header "Content-Type: application/json" \
        --data @test_prediction.json \
        http://127.0.0.1:8000/grade_predict
```


This is great and all, but how about we deploy this thing to the internet?
* Install the Deta CLI


```
curl -fsSL https://get.deta.dev/cli.sh | sh
```


* Log in to your account


```
deta login
```


* Create a new Python Micro


```
deta new --python ml_grading_service
```


This returns the URL of your new Micro, which is where you'll test your deployed endpoint.

* Deploy your app to your micro


```
deta deploy
```


* Test the deployed endpoint using the URL returned in the previous step


```
curl --request POST \
        --header "Content-Type: application/json" \
        --data @test_prediction.json \
        https://XXXXX.deta.dev/grade_predict
```



That is the whole CRISP-DM process, from start to finish. I hope you learned something new, or at least came away with a general view of what it takes to make a data product. There are a ton of areas where this can be improved, and I would love to see comments proposing improvements.

You can check out the [repo](https://github.com/GaboGomezT/grade_prediction_service); I also added a very badly written front end using React. The demo can be found [here](https://grade-ml.web.app/), and my Kaggle kernel can be found [here](https://www.kaggle.com/gabogabe/student-grade-prediction).


Source: https://dev.to/gabogomezt/your-first-machine-learning-rest-api-with-python-fastapi-18jm
