5 NLP Tasks Using the Hugging Face Pipeline
Hugging Face is a company that provides implementations of many Transformer-based natural language processing (NLP) models. It has changed the way NLP research has been done in recent years by making language model architectures easy to understand and run. Its GitHub repository, named Transformers, contains implementations of all of these models and has more than 30,000 stars. You can learn more on the official website, which now also provides extensive documentation on the different models, languages, and datasets on offer.
A Hugging Face pipeline is a simple and very easy way to perform a variety of NLP tasks, such as:
- Sentiment analysis
- Question answering
- Named entity recognition
- Text generation
- Masked language modeling (mask filling)
- Summarization
- Machine translation
Here, I try to show how the 5 most popular NLP tasks can be solved using Hugging Face pipelines.
I have run this code in a Kaggle notebook, linked here. You can also run it on Google Colaboratory.
First, we import pipeline from the transformers library.
from transformers import pipeline
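By default a pipeline picks a standard pretrained model for each task, but it can also be pinned to a specific checkpoint or moved onto a GPU. The sketch below is only illustrative; the checkpoint name and device value are my assumptions, not something specified in this article.
# A pipeline optionally accepts an explicit model checkpoint and a device
# (device=-1 runs on CPU, device=0 on the first GPU).
# The checkpoint name here is an illustrative assumption.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,
)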
1) Sentiment Analysis
Sentiment analysis here means classifying a given text as carrying a "positive" or "negative" label, together with a probability score.
We will pass in two sentences and extract their labels along with the probability scores, rounded to 4 decimal places.
nlp = pipeline("sentiment-analysis")
#First Sentence
result = nlp("I love trekking and yoga.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
#Second sentence
result = nlp("Racial discrimination should be outright boycotted.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
2) Question Answering
Question answering means answering a question based on information in a passage supplied to the model, called the context. The answer is a short span extracted from the context.
Here a passage about prime numbers is provided as the context, and two questions are then asked about it. The model answers them from the context. The passage is taken from the SQuAD dataset.
nlp = pipeline("question-answering")
context = r"""
The property of being prime (or not) is called primality.
A simple but slow method of verifying the primality of a given number n is known as trial division.
It consists of testing whether n is a multiple of any integer between 2 and itself.
Algorithms much more efficient than trial division have been devised to test the primality of large numbers.
These include the Miller–Rabin primality test, which is fast but has a small probability of error, and the AKS primality test, which always produces the correct answer in polynomial time but is too slow to be practical.
Particularly fast methods are available for numbers of special forms, such as Mersenne numbers.
As of January 2016, the largest known prime number has 22,338,618 decimal digits.
"""
#Question 1
result = nlp(question="What is a simple method to verify primality?", context=context)
print(f"Answer: '{result['answer']}'")
#Question 2
result = nlp(question="As of January 2016 how many digits does the largest known prime consist of?", context=context)
print(f"Answer: '{result['answer']}'")
The answers to both questions are extracted as short spans directly from the context.
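Besides the answer text, the question-answering pipeline also returns a confidence score and the character offsets of the answer span within the context, which can be printed as in this small sketch.
# The result dictionary also contains a confidence score and the
# start/end character offsets of the answer span inside the context.
result = nlp(question="What is a simple method to verify primality?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, "
      f"span: [{result['start']}, {result['end']}]")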
3) Text Generation
Text generation is one of the hottest tasks in NLP. GPT-3 is a text generation model that produces text from a given prompt.
In this part, we will try to generate some text from the prompt "A person must always work hard and". The model then produces a short paragraph; the output is grammatically correct but not very coherent, because the default pipeline model (GPT-2) has far fewer parameters than GPT-3.
text_generator = pipeline("text-generation")
text = text_generator("A person must always work hard and", max_length=50, do_sample=False)[0]
print(text['generated_text'])
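With do_sample=False the pipeline decodes greedily, so the output is deterministic. For more varied text, generation arguments such as do_sample, temperature, and num_return_sequences can be passed through the pipeline; the values below are just an illustrative sketch.
# Sampling instead of greedy decoding gives more varied (but less predictable) text.
samples = text_generator(
    "A person must always work hard and",
    max_length=50,
    do_sample=True,
    temperature=0.9,
    num_return_sequences=2,
)
for sample in samples:
    print(sample['generated_text'])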
4) Summarization
Text summarization means understanding a large amount of text and then writing a short summary of it.
We will try to extract a summary of a long passage about the Apollo program.
summarizer = pipeline("summarization")
ARTICLE = """The Apollo program, also known as Project Apollo, was the third United States human spaceflight program carried out by the National Aeronautics and Space Administration (NASA), which accomplished landing the first humans on the Moon from 1969 to 1972.
First conceived during Dwight D. Eisenhower's administration as a three-man spacecraft to follow the one-man Project Mercury which put the first Americans in space,
Apollo was later dedicated to President John F. Kennedy's national goal of "landing a man on the Moon and returning him safely to the Earth" by the end of the 1960s, which he proposed in a May 25, 1961, address to Congress.
Project Mercury was followed by the two-man Project Gemini (1962–66).
The first manned flight of Apollo was in 1968.
Apollo ran from 1961 to 1972, and was supported by the two-man Gemini program which ran concurrently with it from 1962 to 1966.
Gemini missions developed some of the space travel techniques that were necessary for the success of the Apollo missions.
Apollo used Saturn family rockets as launch vehicles.
Apollo/Saturn vehicles were also used for an Apollo Applications Program, which consisted of Skylab, a space station that supported three manned missions in 1973–74, and the Apollo–Soyuz Test Project, a joint Earth orbit mission with the Soviet Union in 1975.
"""
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]
print(summary['summary_text'])
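The summarization pipeline loads a default model behind the scenes; you can also name a checkpoint explicitly. The checkpoint below (facebook/bart-large-cnn) is a common choice for news-style summarization, but treat the exact choice as an assumption rather than something this article prescribes.
# Optionally pin the summarizer to an explicit checkpoint
# (facebook/bart-large-cnn is a common choice for news-style summarization).
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]
print(summary['summary_text'])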
The summarizer then prints a short summary of the passage above.
5) Translation
Everyone has surely used Google Translate; translation is simply converting text from one language into another.
This code translates a proverb from English into German.
translator = pipeline("translation_en_to_de")
print(translator("A great obstacle to happiness is to expect too much happiness.", max_length=40)[0]['translation_text'])
The translated sentence is printed to the console.
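The default translation pipelines are backed by a T5 model, which also covers a few other language pairs out of the box; the task name below is one of those registered in transformers, shown here only as a sketch.
# The same pattern works for other language pairs supported out of the box.
translator_fr = pipeline("translation_en_to_fr")
print(translator_fr("A great obstacle to happiness is to expect too much happiness.",
                    max_length=40)[0]['translation_text'])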
I hope you enjoyed this short tutorial on using Hugging Face pipelines to perform different NLP tasks.
Article source: https://dev.to/amananandrai/5-nlp-tasks-using-hugging-face-pipeline-5b98