Class TikaUtils

java.lang.Object
uno.anahata.ai.internal.TikaUtils

public final class TikaUtils extends Object
Utility class for file type detection and content extraction using Apache Tika.

This class provides methods to identify the MIME type of a file and to extract its text content. It supports a wide range of formats, such as PDF and DOCX.

  • Constructor Details

    • TikaUtils

      public TikaUtils()
  • Method Details

    • detectMimeType

      public static String detectMimeType(File file) throws Exception
      Detects the MIME type of a given file.
      Parameters:
      file - The file to inspect.
      Returns:
      The detected MIME type (e.g., "image/png", "application/pdf").
      Throws:
      Exception - if an error occurs during detection.
    • detectAndParse

      public static String detectAndParse(File file) throws Exception
      Detects the file type and extracts the text content of the given file.

      This method uses Tika's auto-detection to choose the appropriate parser for the file format.

      Parameters:
      file - The file to parse.
      Returns:
      The extracted text content.
      Throws:
      Exception - if an error occurs during parsing.
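
Both methods above declare the broad checked type Exception, so every caller has to handle or propagate it. The following is a minimal caller-side sketch, not part of TikaUtils itself: the class SafeExtraction, the interface FileOperation, and the method callOrDefault are hypothetical names introduced only for illustration. The lambdas in main are stand-ins so that the sketch runs without Tika on the classpath.

```java
import java.io.File;
import java.io.IOException;

public class SafeExtraction {

    /** Hypothetical shape of a File-to-String call that may throw any checked exception. */
    @FunctionalInterface
    public interface FileOperation {
        String apply(File file) throws Exception;
    }

    /**
     * Runs the operation and returns its result, or the fallback value
     * if the operation throws or returns null.
     */
    public static String callOrDefault(FileOperation op, File file, String fallback) {
        try {
            String result = op.apply(file);
            return result != null ? result : fallback;
        } catch (Exception e) {
            // The documented methods declare "throws Exception",
            // so a broad catch is needed at the call site.
            System.err.println("Operation failed for " + file + ": " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        File file = new File("report.pdf"); // hypothetical path, never opened here

        // Stand-in for a successful detection call.
        String mime = callOrDefault(f -> "application/pdf", file, "application/octet-stream");

        // Stand-in for a parse call that fails.
        String text = callOrDefault(f -> { throw new IOException("unreadable"); }, file, "");

        System.out.println(mime);
        System.out.println(text.isEmpty());
    }
}
```

With the real class on the classpath, the stand-in lambdas could be replaced by the method references TikaUtils::detectMimeType and TikaUtils::detectAndParse, since both match the File-to-String shape documented above. Whether a silent fallback is appropriate depends on the caller. Code that must distinguish an empty document from a failed parse should propagate the exception instead.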