Published 2026-01-06

How to Scrape Glassdoor Reviews

Introduction

In this article, we will show how to scrape company reviews from Glassdoor using Page2API.

Glassdoor.com is a US website where current and former employees review companies anonymously.

Disclaimer:

We strongly recommend scraping information from Glassdoor for personal use only.

For example: you are looking for a new job and want to quickly analyze the reviews of the companies you are interested in.

Prerequisites

To start scraping Glassdoor reviews, we need the following:

  • A Page2API account
  • A company we are interested in. Specifically, the company we are interested in is... Glassdoor. (Its own website also has user reviews.)

How to scrape Glassdoor reviews

First, open glassdoor.com and type "Glassdoor reviews" into the search box.

This will change the URL in the browser's address bar to something like:

https://www.glassdoor.com/Reviews/Glassdoor-Reviews-E100431.htm

We will use this URL as the first parameter needed to start the scraping process.
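Since every later step depends on this URL, it can help to sanity-check it before sending a request. The sketch below is a hypothetical helper (`parse_reviews_url` is our own name, not part of Page2API) that assumes the `/Reviews/<Company>-Reviews-E<id>.htm` shape seen in the example URL above; Glassdoor may use other URL formats.

```python
import re
from urllib.parse import urlparse

# Pattern inferred from the single example URL in this article (an assumption):
#   https://www.glassdoor.com/Reviews/<Company>-Reviews-E<employer id>.htm
REVIEWS_PATH = re.compile(r"^/Reviews/(?P<slug>.+)-Reviews-E(?P<employer_id>\d+)\.htm$")


def parse_reviews_url(url: str) -> tuple:
    """Return (company slug, numeric employer id) for a Glassdoor reviews URL."""
    parsed = urlparse(url)
    if parsed.netloc != "www.glassdoor.com":
        raise ValueError(f"not a www.glassdoor.com URL: {url}")
    match = REVIEWS_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"unexpected reviews URL path: {parsed.path}")
    return match.group("slug"), int(match.group("employer_id"))


print(parse_reviews_url("https://www.glassdoor.com/Reviews/Glassdoor-Reviews-E100431.htm"))
# ('Glassdoor', 100431)
```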

The page you see should look similar to the image below:

glassdoor-reviews-page.png

If you inspect the page's HTML, you will see that a single review looks like this:

glassdoor-single-review.png

We will scrape the following attributes of each review on the Glassdoor reviews page:

  • Title
  • Author info
  • Rating
  • Pros
  • Cons
  • Helpful
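For later analysis, these six attributes can be held in a small record type. This is an illustrative sketch: the `Review` class is hypothetical, its field names mirror the keys used in the request payload later in this article, and all values stay as raw page text.

```python
from dataclasses import dataclass, fields


@dataclass
class Review:
    """One scraped Glassdoor review; every value is the raw text from the page."""
    title: str
    author_info: str
    rating: str
    pros: str
    cons: str
    helpful: str

    @classmethod
    def from_dict(cls, raw: dict) -> "Review":
        # Unknown keys are ignored; missing or null values become "" (an assumption
        # about how absent elements, e.g. no "helpful" text, come back from the scraper).
        return cls(**{f.name: raw.get(f.name) or "" for f in fields(cls)})
```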

Now let's define the selectors for each attribute.

/* Parent: */
div.gdReview

/* Title */
a.reviewLink

/* Author Info */
.authorInfo

/* Rating */
span.ratingNumber

/* Pros */
span[data-test=pros]

/* Cons */
span[data-test=cons]

/* Helpful */
div.common__EiReviewDetailsStyle__socialHelpfulcontainer
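Because every field follows the same `selector >> text` pattern, the `parse` section of the request can be generated from the selectors above instead of being typed by hand. `SELECTORS` and `build_parse_section` are hypothetical names; the output is intended to match the `parse` object used in this article's payload.

```python
import json

# Selectors from the list above, keyed by the field names used in the payload.
SELECTORS = {
    "title": "a.reviewLink",
    "author_info": ".authorInfo",
    "rating": "span.ratingNumber",
    "pros": "span[data-test=pros]",
    "cons": "span[data-test=cons]",
    "helpful": "div.common__EiReviewDetailsStyle__socialHelpfulcontainer",
}


def build_parse_section(parent: str, selectors: dict) -> dict:
    """Build a "parse" object that extracts the text of each field under `parent`."""
    item = {"_parent": parent}
    for field, selector in selectors.items():
        item[field] = f"{selector} >> text"
    return {"reviews": [item]}


print(json.dumps(build_parse_section("div.gdReview", SELECTORS), indent=2))
```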

Now let's handle pagination.

glassdoor-pagination-component.png

To go to the next page, we must click the "Next" button if it is present on the page:

document.querySelector(".nextButton").click()

The scraping continues as long as the "Next" button is present on the page and stops once it disappears.

The scraper's stop condition is the following JavaScript snippet:

document.querySelector(".nextButton") === null

// but to avoid timeouts, we will scrape a fixed amount of pages (see the payload below)
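The interplay between the stop condition and a fixed page count can be sketched as a plain loop. This is a local simulation, not how Page2API executes the scenario internally: `paginate` and the in-memory `pages` list are hypothetical, and the three callbacks stand in for parsing the page, checking for `.nextButton`, and clicking it.

```python
from typing import Callable, Iterator, List


def paginate(
    parse_page: Callable[[], List[str]],
    has_next: Callable[[], bool],
    click_next: Callable[[], None],
    max_pages: int,
) -> Iterator[List[str]]:
    """Yield one page of parsed results at a time.

    Stops when the "Next" button is gone (the stop condition) or after
    max_pages pages (the fixed page count used to avoid timeouts).
    """
    for _ in range(max_pages):
        yield parse_page()
        if not has_next():
            break
        click_next()


# Hypothetical three-page "site" kept in memory, for illustration only.
pages = [["r1", "r2"], ["r3"], ["r4"]]
state = {"page": 0}

results = list(
    paginate(
        parse_page=lambda: pages[state["page"]],
        has_next=lambda: state["page"] < len(pages) - 1,
        click_next=lambda: state.update(page=state["page"] + 1),
        max_pages=2,
    )
)
print(results)  # [['r1', 'r2'], ['r3']]
```

With `max_pages=2` the loop stops on the page count even though a third page exists; with a larger limit it stops as soon as `has_next()` returns False.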

Now it's time to build the request that will scrape the Glassdoor reviews.

The payload of our scraping request will be:

{
  "url": "https://www.glassdoor.com/Reviews/Glassdoor-Reviews-E100431.htm",
  "real_browser": true,
  "merge_loops": true,
  "premium_proxy": "us",
  "scenario": [
    {
      "loop": [
        { "wait_for": "div.gdReview" },
        { "execute": "parse" },
        { "execute_js": "document.querySelector(\".nextButton\").click()" }
      ],
      "iterations": 2
    }
  ],
  "parse": {
    "reviews": [
      {
        "_parent": "div.gdReview",
        "title": "a.reviewLink >> text",
        "author_info": ".authorInfo >> text",
        "rating": "span.ratingNumber >> text",
        "pros": "span[data-test=pros] >> text",
        "cons": "span[data-test=cons] >> text",
        "helpful": "div.common__EiReviewDetailsStyle__socialHelpfulcontainer >> text"
      }
    ]
  }
}
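To scrape more or fewer pages, it can be convenient to generate this payload with the page count as a parameter. A minimal sketch follows; `build_payload` is a hypothetical helper that reproduces the JSON above, and whether larger `iterations` values stay within Page2API's time limits is not something this article establishes.

```python
import json

NEXT_CLICK_JS = 'document.querySelector(".nextButton").click()'


def build_payload(url: str, pages: int = 2) -> dict:
    """Return the scraping payload shown above with a configurable page count.

    `pages` becomes the loop's "iterations": each iteration waits for the
    reviews, parses them, and clicks "Next".
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return {
        "url": url,
        "real_browser": True,
        "merge_loops": True,
        "premium_proxy": "us",
        "scenario": [
            {
                "loop": [
                    {"wait_for": "div.gdReview"},
                    {"execute": "parse"},
                    {"execute_js": NEXT_CLICK_JS},
                ],
                "iterations": pages,
            }
        ],
        "parse": {
            "reviews": [
                {
                    "_parent": "div.gdReview",
                    "title": "a.reviewLink >> text",
                    "author_info": ".authorInfo >> text",
                    "rating": "span.ratingNumber >> text",
                    "pros": "span[data-test=pros] >> text",
                    "cons": "span[data-test=cons] >> text",
                    "helpful": "div.common__EiReviewDetailsStyle__socialHelpfulcontainer >> text",
                }
            ]
        },
    }


payload = build_payload("https://www.glassdoor.com/Reviews/Glassdoor-Reviews-E100431.htm", pages=5)
print(json.dumps(payload, indent=2))
```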

Set your Page2API api_key as an environment variable:

export API_KEY=YOUR_PAGE2API_KEY
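When calling the API from a script rather than the shell, the same variable can be read from the environment. A minimal Python sketch, assuming the variable is named `API_KEY` as in the export above; `get_api_key` is a hypothetical helper that fails early instead of sending a request with an empty key.

```python
import os


def get_api_key(var: str = "API_KEY") -> str:
    """Return the Page2API key from the environment, raising if it is unset or blank."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; run: export {var}=YOUR_PAGE2API_KEY")
    return key
```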

Run the scraping request with cURL:

curl -v -XPOST -H "Content-type: application/json" -d '{
  "api_key": "'"$API_KEY"'",
  "url": "https://www.glassdoor.com/Reviews/Glassdoor-Reviews-E100431.htm",
  "real_browser": true,
  "merge_loops": true,
  "premium_proxy": "us",
  "scenario": [
    {
      "loop": [
        { "wait_for": "div.gdReview" },
        { "execute": "parse" },
        { "execute_js": "document.querySelector(\".nextButton\").click()" }
      ],
      "iterations": 2
    }
  ],
  "parse": {
    "reviews": [
      {
        "_parent": "div.gdReview",
        "title": "a.reviewLink >> text",
        "author_info": ".authorInfo >> text",
        "rating": "span.ratingNumber >> text",
        "pros": "span[data-test=pros] >> text",
        "cons": "span[data-test=cons] >> text",
        "helpful": "div.common__EiReviewDetailsStyle__socialHelpfulcontainer >> text"
      }
    ]
  }
}' 'https://www.page2api.com/api/v1/scrape' | python -mjson.tool
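The same request can be sent from Python using only the standard library. This sketch assumes the endpoint, header, and JSON body of the cURL command above, with the API key passed in the body as `api_key`; `build_request` and `scrape` are hypothetical helpers, the 180-second timeout is an arbitrary choice, and there is no retry or error handling.

```python
import json
import urllib.request

API_URL = "https://www.page2api.com/api/v1/scrape"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the same POST as the cURL command: a JSON body with api_key merged in."""
    body = json.dumps({"api_key": api_key, **payload}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def scrape(payload: dict, api_key: str, timeout: float = 180.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `scrape(payload, os.environ["API_KEY"])`, with the payload above written as a Python dict (without `api_key`), returns the decoded response.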

The result:

{
  "result": {
    "reviews": [
      {
        "title": "Glassdoor Walks the Walk",
        "author_info": "Jan 7, 2022 - Senior Manager",
        "rating": "5.0",
        "pros": "Glassdoor creates a positive environment for employees to learn and grow. ...",
        "cons": "At any organization, there is always room for improvement. ...",
        "helpful": "1 person found this review helpful"
      },
      {
        "title": "Great Company To Work For",
        "author_info": "Jan 5, 2022 - Customer Success Manager",
        "rating": "4.0",
        "pros": "I absolutely love working at Glassdoor. ...",
        "cons": "While we do have more of an extensive career growth plan, ...",
        "helpful": "2 people found this review helpful"
      }, ...
    ]
  }, ...
}
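To support the quick analysis mentioned in the disclaimer, the raw strings can be turned into numbers. This sketch assumes the response shape shown above (`result`, then `reviews`) and that the helpful text follows the "N person/people found this review helpful" wording seen in the two sample reviews; `helpful_count` and `summarize` are hypothetical names, and a non-numeric rating string would make `float()` raise.

```python
import re
from statistics import mean

# Wording taken from the two sample reviews above; other phrasings count as 0.
HELPFUL_COUNT = re.compile(r"(\d+)\s+(?:person|people)\s+found this review helpful")


def helpful_count(text: str) -> int:
    """Extract N from 'N people found this review helpful'; 0 if no match."""
    match = HELPFUL_COUNT.search(text or "")
    return int(match.group(1)) if match else 0


def summarize(result: dict) -> dict:
    """Compute simple statistics over the 'reviews' list of a scrape result."""
    reviews = result["result"]["reviews"]
    ratings = [float(r["rating"]) for r in reviews if r.get("rating")]
    return {
        "count": len(reviews),
        "average_rating": round(mean(ratings), 2) if ratings else None,
        "total_helpful_votes": sum(helpful_count(r.get("helpful", "")) for r in reviews),
    }


# Fields copied from the result above (pros/cons omitted for brevity).
sample = {
    "result": {
        "reviews": [
            {"title": "Glassdoor Walks the Walk", "rating": "5.0",
             "helpful": "1 person found this review helpful"},
            {"title": "Great Company To Work For", "rating": "4.0",
             "helpful": "2 people found this review helpful"},
        ]
    }
}
print(summarize(sample))  # {'count': 2, 'average_rating': 4.5, 'total_helpful_votes': 3}
```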

Conclusion

Done!

We have just scraped reviews from Glassdoor, and as it turns out, the job is both easy and fun when you have the right scraping tool.

The original article is available at:

page2api.com/blog/how-to-scrape-glassdoor-reviews/

Source: https://dev.to/nrotaru/how-to-scrape-glassdoor-reviews-362m