Module file_type


class FileTypeClassifier(BaseComponent)

Route files in an Indexing Pipeline to corresponding file converters.


def __init__(supported_types: List[str] = DEFAULT_TYPES)

Node that sends out files on a different output edge depending on their extension.


  • supported_types: the file types that this node can distinguish. The node is limited to a maximum of 10 outgoing edges, each of which corresponds to one file extension. By default, the extensions are txt, pdf, md, docx, and html. Lists with more than 10 elements are not allowed, and lists with duplicate elements are also rejected.
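
The constraints on supported_types can be sketched in plain Python as follows. This is only an illustration of the rules described above, not the node's actual implementation: the helper name validate_supported_types, the MAX_EDGES constant, and the choice of ValueError are assumptions made for this example.

```python
from typing import List

# Default extensions, as listed in the parameter description.
DEFAULT_TYPES = ["txt", "pdf", "md", "docx", "html"]

# Maximum number of outgoing edges, as stated in the parameter description.
MAX_EDGES = 10


def validate_supported_types(supported_types: List[str] = DEFAULT_TYPES) -> List[str]:
    """Hypothetical helper: enforce the two documented constraints."""
    # Rule 1: at most one outgoing edge per extension, capped at MAX_EDGES.
    if len(supported_types) > MAX_EDGES:
        raise ValueError(
            f"At most {MAX_EDGES} file types are supported, got {len(supported_types)}."
        )
    # Rule 2: duplicates would make the extension-to-edge mapping ambiguous.
    if len(set(supported_types)) != len(supported_types):
        raise ValueError("supported_types must not contain duplicates.")
    # Return a copy so later changes to the caller's list have no effect.
    return list(supported_types)
```

Under these assumptions, a list such as ["csv", "json"] passes, while ["txt", "txt"] or any list of 11 or more extensions is rejected.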


def run(file_paths: Union[Path, List[Path], str, List[str], List[Union[Path, str]]])

Sends out files on a different output edge depending on their extension.


  • file_paths: a single path or a list of paths (Path objects or strings) to route to the output edges.
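
The idea of routing by extension can be illustrated with a small standalone sketch. It does not use the Haystack classes; the function name group_by_edge, the 1-based edge numbering, the case-insensitive matching, and the error raised for unknown extensions are all assumptions for illustration. How the node itself handles mixed batches or unsupported extensions is not stated in this section.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[Path, str]

# Default extensions, as listed for supported_types.
SUPPORTED = ["txt", "pdf", "md", "docx", "html"]


def group_by_edge(
    file_paths: Union[PathLike, List[PathLike]],
    supported_types: List[str] = SUPPORTED,
) -> Dict[int, List[Path]]:
    """Hypothetical helper: map each path to an edge number by its extension."""
    # Accept a single path as well as a list, mirroring the run() signature.
    if isinstance(file_paths, (str, Path)):
        file_paths = [file_paths]

    edges: Dict[int, List[Path]] = defaultdict(list)
    for raw in file_paths:
        path = Path(raw)
        # Drop the leading dot; lowercasing is an assumption of this sketch.
        extension = path.suffix.lstrip(".").lower()
        if extension not in supported_types:
            raise ValueError(f"Unsupported file type: {path}")
        # Edge numbers follow the position of the extension in supported_types.
        edges[supported_types.index(extension) + 1].append(path)
    return dict(edges)
```

With the default list, "report.pdf" would land on edge 2 and "notes.md" on edge 3 in this sketch, because pdf and md are the second and third entries.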