Pipeline Nodes

Nodes are the core components that process and route incoming text. Some Nodes perform steps such as preprocessing, retrieving, or summarizing text, while others route queries through different branches of a Pipeline. A Pipeline chains Nodes together, and Nodes work like building blocks that you can easily swap for one another. Each Node takes the output of the previous Node (or Nodes) as its input.

Usage

All Nodes are designed to be usable within a Pipeline. When you add a Node to a Pipeline and call Pipeline.run(), the Pipeline calls each Node's run() method in the predefined sequence. The same applies to Pipeline.run_batch(), which you can use to run multiple queries at once: it calls each Node's run_batch() method.
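
The calling sequence described above can be illustrated without installing Haystack. The following is a minimal sketch, assuming a purely linear chain of Nodes; ToyNode and ToyPipeline are hypothetical stand-ins written for this illustration and are not Haystack classes.

```python
from typing import List


class ToyNode:
    """Hypothetical stand-in for a Node; not a Haystack class."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, text: str) -> str:
        # Append the Node name so the call order is visible in the result.
        return f"{text} -> {self.name}"

    def run_batch(self, texts: List[str]) -> List[str]:
        # Process several inputs in one call.
        return [self.run(t) for t in texts]


class ToyPipeline:
    """Hypothetical linear pipeline that calls Nodes in the order they were added."""

    def __init__(self) -> None:
        self.nodes: List[ToyNode] = []

    def add_node(self, node: ToyNode) -> None:
        self.nodes.append(node)

    def run(self, query: str) -> str:
        output = query
        for node in self.nodes:
            # Each Node receives the output of the previous Node.
            output = node.run(output)
        return output

    def run_batch(self, queries: List[str]) -> List[str]:
        outputs = queries
        for node in self.nodes:
            # The batch variant calls run_batch() on each Node instead of run().
            outputs = node.run_batch(outputs)
        return outputs


toy = ToyPipeline()
toy.add_node(ToyNode("Retriever"))
toy.add_node(ToyNode("Reader"))

single = toy.run("Who wrote Faust?")
batch = toy.run_batch(["q1", "q2"])
print(single)
print(batch)
```

A real Pipeline is a graph rather than a list, so this sketch only shows the ordering of the run() and run_batch() calls, not branching or joining.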

For more information, see the Pipelines page.

You can also call Nodes outside of a Pipeline. See each Node's documentation page to learn more about its available methods.
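
As a rough illustration of standalone use, the sketch below calls a Node-like object directly, with no Pipeline involved. WhitespaceCleaner and its clean() method are hypothetical names invented for this example; real Nodes expose their own methods, which are listed on their documentation pages.

```python
from typing import Dict, List, Tuple


class WhitespaceCleaner:
    """Hypothetical Node-like class; it is not part of Haystack."""

    outgoing_edges = 1

    def clean(self, text: str) -> str:
        # Collapse runs of whitespace into single spaces and trim the ends.
        return " ".join(text.split())

    def run(self, documents: List[str]) -> Tuple[Dict[str, List[str]], str]:
        # Return an output dict plus the name of the outgoing edge,
        # the same shape the decision node example on this page returns.
        return {"documents": [self.clean(d) for d in documents]}, "output_1"


cleaner = WhitespaceCleaner()

# Call a helper method directly.
cleaned = cleaner.clean("  Nodes   are\nbuilding blocks ")

# Call run() directly, the method a Pipeline would call.
output, edge = cleaner.run(["a  b", " c "])
print(cleaned, output, edge)
```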

Available Nodes

| Node | Classes | Description |
| --- | --- | --- |
| FileClassifier | FileTypeClassifier | Distinguishes between text, PDF, Markdown, Docx, and HTML files |
| FileConverters | PDFToTextConverter, DocxToTextConverter, AzureConverter, ImageToTextConverter, MarkdownConverter | Converts files of different formats into text Documents |
| Crawler | Crawler | Scrapes websites and returns text |
| PreProcessor | PreProcessor | Performs cleaning and splitting on Documents |
| Retriever | BM25Retriever, ElasticsearchRetriever, DensePassageRetriever, TableTextRetriever, EmbeddingRetriever, TfidfRetriever, ElasticsearchFilterOnlyRetriever | Looks into a coupled Document Store and fetches Documents that are relevant to a given Query |
| Reader | FARMReader, TransformersReader | Finds an answer to a question by selecting a text span in the provided Documents |
| Answer Generator | RAGenerator, Seq2SeqGenerator, OpenAIAnswerGenerator | Generates an answer to a question by reading through the provided Documents and composing an answer word by word |
| Summarizer | TransformersSummarizer | Creates a shorter overview of a given Document |
| Translator | TransformersTranslator | Translates text from one language into another |
| Ranker | SentenceTransformersRanker | Reorders a set of Documents based on their relevance to the Query |
| Query Classifier | TransformersQueryClassifier, SklearnQueryClassifier | Distinguishes between queries that are keywords, questions, or statements and routes them accordingly |
| Question Generator | QuestionGenerator | Takes a Document as input and generates questions that it believes the Document can answer |
| Document Classifier | TransformersDocumentClassifier | Classifies Documents and attaches the result as metadata |
| Entity Extractor | EntityExtractor | Extracts predefined entities from a piece of text |
| Route Documents | RouteDocuments | Routes Documents based on their content type or a metadata field |
| Join Documents | JoinDocuments | Takes Documents from multiple Nodes and joins them into one list of Documents |
| Join Answers | JoinAnswers | Takes Answers from two or more Reader or Generator Nodes and joins them into a single list of Answers |
| Docs2Answers | Docs2Answers | Converts retrieved Documents into the predicted Answers format |

Decision Nodes

You can add decision nodes where only one "branch" is executed afterwards. You can use decision nodes to classify an incoming query and, depending on the result, route it to different modules. To find a ready-made example of a decision node, have a look at QueryClassifier.

You can also create a custom decision node. To do this, create a class that looks like this:

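    # Import paths as in Haystack 1.x; adjust them if your version differs.
    from haystack import Pipeline
    from haystack.nodes import BaseComponent, JoinDocuments

    class QueryClassifier(BaseComponent):
        # A decision node with two outgoing edges: "output_1" and "output_2".
        outgoing_edges = 2

        def run(self, query):
            # Queries containing a question mark go to output_1, all others to output_2.
            if "?" in query:
                return {}, "output_1"
            else:
                return {}, "output_2"

    # es_retriever, dpr_retriever, and reader are assumed to be initialized beforehand.
    pipe = Pipeline()
    pipe.add_node(component=QueryClassifier(), name="QueryClassifier", inputs=["Query"])
    pipe.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_1"])
    pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_2"])
    # Only one retriever branch runs per query; JoinDocuments passes on whichever results arrive.
    pipe.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults",
                  inputs=["ESRetriever", "DPRRetriever"])
    pipe.add_node(component=reader, name="QAReader", inputs=["JoinResults"])
    res = pipe.run(query="What did Einstein work on?", params={"ESRetriever": {"top_k": 1}, "DPRRetriever": {"top_k": 3}})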
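
To try the routing rule of the example above without a Document Store or any models, here is a dependency-free sketch. It reuses the same "?" rule and the same edge names, but the two branch functions are hypothetical stubs that only record which branch ran; they do not retrieve anything and are not Haystack retrievers.

```python
from typing import Callable, Dict, List, Tuple


def classify(query: str) -> Tuple[Dict, str]:
    """Same rule as the custom decision node: a question mark selects output_1."""
    if "?" in query:
        return {}, "output_1"
    return {}, "output_2"


calls: List[str] = []  # records which branch actually ran


def es_branch(query: str) -> List[str]:
    # Stub standing in for the node connected to output_1.
    calls.append("ESRetriever")
    return [f"es-doc: {query}"]


def dpr_branch(query: str) -> List[str]:
    # Stub standing in for the node connected to output_2.
    calls.append("DPRRetriever")
    return [f"dpr-doc: {query}"]


BRANCHES: Dict[str, Callable[[str], List[str]]] = {
    "output_1": es_branch,
    "output_2": dpr_branch,
}


def route(query: str) -> List[str]:
    # Run only the branch attached to the edge the classifier selected.
    _, edge = classify(query)
    return BRANCHES[edge](query)


docs_question = route("What did Einstein work on?")
docs_keywords = route("Einstein relativity")
print(calls)
```

Each call to route() runs exactly one branch, which is the defining property of a decision node: the other branch is never executed for that query.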