### Install Snappy-Go

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Command to download and install the standard Snappy-Go library.

```bash
$ go get code.google.com/p/snappy-go/snappy
```

--------------------------------

### Install docconv and dependencies

Source: https://pkg.go.dev/github.com/sajari/docconv

Installs the docconv package and its command-line tool `docd`. Make sure `GOPATH` is set and that `$GOPATH/bin` is in your `PATH`, so the installed `docd` binary can be found.

```bash
$ go get code.sajari.com/docconv/...
```

--------------------------------

### Install system dependencies for docconv

Source: https://pkg.go.dev/github.com/sajari/docconv

Installs the common system dependencies docconv needs to function. This example uses apt-get on Debian-based systems.

```bash
$ sudo apt-get install poppler-utils wv unrtf tidy
$ go get github.com/JalfResi/justext
```

--------------------------------

### Install docconv with OCR support

Source: https://pkg.go.dev/github.com/sajari/docconv

Fetches and builds the docconv package with optional OCR support for image conversion. Requires Tesseract to be installed.

```bash
$ go get -tags ocr code.sajari.com/docconv/...
```

--------------------------------

### Start docd service with custom logging

Source: https://pkg.go.dev/github.com/sajari/docconv

Starts the docd service with a custom log level. Level 0 logs only errors, while level 1 also logs each request.

```bash
docd -log-level 0 # will only log errors & critical info
docd -addr :8000 -log-level 1 # will run on port 8000 and log each request as well
```

--------------------------------

### Install Tesseract OCR on macOS

Source: https://pkg.go.dev/github.com/sajari/docconv

Installs the Tesseract OCR engine on macOS using Homebrew, a prerequisite for image conversion support in docconv.
```bash
$ brew install tesseract
```

--------------------------------

### Convert document locally using docconv

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts a local document file to plain text using the docconv library. Ensure the document file exists and the dependencies are installed.

```go
package main

import (
	"fmt"
	"log"

	"code.sajari.com/docconv"
)

func main() {
	res, err := docconv.ConvertPath("your-file.pdf")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
}
```

--------------------------------

### Client Convert Method

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Converts the file content read from `r`, named `filename`, using the HTTP client.

```go
func (c *Client) Convert(r io.Reader, filename string) (*Response, error)
```

--------------------------------

### Client Option Type

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Opt is an option used in New to create Clients.

```go
type Opt func(*Client)
```

--------------------------------

### New Client Function

Source: https://pkg.go.dev/github.com/sajari/docconv/client

New creates a new docconv client for interacting with a docconv HTTP server.

```go
func New(opts ...Opt) *Client
```

--------------------------------

### Create New LocalFile

Source: https://pkg.go.dev/github.com/sajari/docconv

Creates a new LocalFile. If the provided io.Reader is an *os.File, it is used directly; otherwise, a temporary file is created and the data is copied into it. Call Done() to clean up resources once the LocalFile is no longer needed.

```go
func NewLocalFile(r io.Reader, dir, prefix string) (*LocalFile, error)
```

--------------------------------

### Client Struct Definition

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Client is a docconv HTTP client. Use New to make new Clients.

```go
type Client struct {
	// contains filtered or unexported fields
}
```

--------------------------------

### Client Configuration

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Functions to configure the docconv client instance.
```APIDOC
## Client Configuration

### Description
Options used to initialize a new docconv client.

### Functions
- **New(opts ...Opt) *Client** - Creates a new docconv client.
- **WithEndpoint(endpoint string) Opt** - Sets the endpoint address (host:port).
- **WithHTTPClient(client *http.Client) Opt** - Sets the custom HTTP client.
- **WithProtocol(protocol string) Opt** - Sets the protocol (http:// or https://).
```

--------------------------------

### ComponentInfo Structure

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Definition and methods for ComponentInfo.

```go
type ComponentInfo struct {
	Identifier *uint64 `protobuf:"varint,1,req,name=identifier" json:"identifier,omitempty"`
	PreferredLocator *string `protobuf:"bytes,2,req,name=preferred_locator" json:"preferred_locator,omitempty"`
	Locator *string `protobuf:"bytes,3,opt,name=locator" json:"locator,omitempty"`
	ReadVersion []uint32 `protobuf:"varint,4,rep,packed,name=read_version" json:"read_version,omitempty"`
	WriteVersion []uint32 `protobuf:"varint,5,rep,packed,name=write_version" json:"write_version,omitempty"`
	ExternalReferences []*ComponentExternalReference `protobuf:"bytes,6,rep,name=external_references" json:"external_references,omitempty"`
	DataReferences []*ComponentDataReference `protobuf:"bytes,7,rep,name=data_references" json:"data_references,omitempty"`
	AllowsDuplicatesOutsideOfDocumentPackage *bool `` /* 143-byte string literal not displayed */
	DirtiesDocumentPackage *bool `protobuf:"varint,9,opt,name=dirties_document_package,def=1" json:"dirties_document_package,omitempty"`
	IsStoredOutsideObjectArchive *bool `protobuf:"varint,10,opt,name=is_stored_outside_object_archive,def=0" json:"is_stored_outside_object_archive,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *ComponentInfo) GetAllowsDuplicatesOutsideOfDocumentPackage() bool
```

```go
func (m *ComponentInfo) GetDataReferences() []*ComponentDataReference
```

```go
func (m *ComponentInfo) GetDirtiesDocumentPackage() bool
```

--------------------------------

### Size Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for interacting with the Size struct, including retrieving the width and a string representation.

```go
func (m *Size) GetWidth() float32
```

```go
func (*Size) ProtoMessage()
```

```go
func (m *Size) Reset()
```

```go
func (m *Size) String() string
```

--------------------------------

### Size Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Structure for defining dimensions.

```go
type Size struct {
	Width *float32 `protobuf:"fixed32,1,req,name=width" json:"width,omitempty"`
	Height *float32 `protobuf:"fixed32,2,req,name=height" json:"height,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *Size) GetHeight() float32
```

--------------------------------

### Snappy Reader and Writer Interfaces

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Implements io.Reader and io.Writer interfaces for handling Snappy-compressed data streams. These allow for seamless compression and decompression of data read from or written to various sources.

```APIDOC
## Snappy Reader and Writer Interfaces

### Description
Implements io.Reader and io.Writer interfaces for handling Snappy-compressed data streams. These allow for seamless compression and decompression of data read from or written to various sources.

### Types

#### type Reader
Reader is an io.Reader that can read Snappy-compressed bytes.

##### func NewReader(r io.Reader) *Reader
NewReader returns a new Reader that decompresses from r, using the framing format described at https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt

##### func (*Reader) Read(p []byte) (int, error)
Read satisfies the io.Reader interface.

##### func (*Reader) Reset(reader io.Reader)
Reset discards any buffered data, resets all state, and switches the Snappy reader to read from `reader`.
This permits reusing a Reader rather than allocating a new one.

#### type Writer
Writer is an io.Writer that can write Snappy-compressed bytes.

##### func NewWriter(w io.Writer) *Writer
NewWriter returns a new Writer that compresses to w, using the framing format described at https://code.google.com/p/snappy/source/browse/trunk/framing_format.txt

##### func (*Writer) Reset(writer io.Writer)
Reset discards the writer's state and switches the Snappy writer to write to `writer`. This permits reusing a Writer rather than allocating a new one.

##### func (*Writer) Write(p []byte) (n int, errRet error)
Write satisfies the io.Writer interface.
```

--------------------------------

### Path_Element Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for interacting with Path_Element structures.

```go
func (m *Path_Element) GetType() Path_ElementType
```

```go
func (*Path_Element) ProtoMessage()
```

```go
func (m *Path_Element) Reset()
```

```go
func (m *Path_Element) String() string
```

--------------------------------

### Reference Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Reference structure for identifiers and type metadata.
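Every pointer field in these generated structs is read through a `Get…` method. Assuming the package follows the usual golang/protobuf proto2 code generation (the method bodies are not shown here), those getters are safe to call on a nil message and return the zero value for unset fields. The dependency-free sketch below mirrors that behaviour on a local copy of `Reference` with the protobuf tags omitted; it is an illustration, not the package's generated source.

```go
package main

import "fmt"

// Reference mirrors the field layout of the generated Reference type
// (protobuf tags omitted so the sketch has no dependencies).
type Reference struct {
	Identifier           *uint64
	DeprecatedType       *int32
	DeprecatedIsExternal *bool
}

// GetIdentifier follows the usual proto2 getter shape: it tolerates a nil
// receiver and returns 0 when the field is unset.
func (m *Reference) GetIdentifier() uint64 {
	if m != nil && m.Identifier != nil {
		return *m.Identifier
	}
	return 0
}

// GetDeprecatedIsExternal behaves the same way for the optional bool.
func (m *Reference) GetDeprecatedIsExternal() bool {
	if m != nil && m.DeprecatedIsExternal != nil {
		return *m.DeprecatedIsExternal
	}
	return false
}

func main() {
	id := uint64(42)
	set := &Reference{Identifier: &id}
	var missing *Reference // e.g. an absent sub-message

	fmt.Println(set.GetIdentifier())           // 42
	fmt.Println(missing.GetIdentifier())       // 0, no nil-pointer panic
	fmt.Println(set.GetDeprecatedIsExternal()) // false, field unset
}
```

This is why chained calls such as `entry.GetKey().GetIdentifier()` are safe even when an intermediate message is missing.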
```go
type Reference struct {
	Identifier *uint64 `protobuf:"varint,1,req,name=identifier" json:"identifier,omitempty"`
	DeprecatedType *int32 `protobuf:"varint,2,opt,name=deprecated_type" json:"deprecated_type,omitempty"`
	DeprecatedIsExternal *bool `protobuf:"varint,3,opt,name=deprecated_is_external" json:"deprecated_is_external,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *Reference) GetDeprecatedIsExternal() bool
```

```go
func (m *Reference) GetDeprecatedType() int32
```

```go
func (m *Reference) GetIdentifier() uint64
```

```go
func (*Reference) ProtoMessage()
```

```go
func (m *Reference) Reset()
```

```go
func (m *Reference) String() string
```

--------------------------------

### Point Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Geometric point structure and its getter methods.

```go
type Point struct {
	X *float32 `protobuf:"fixed32,1,req,name=x" json:"x,omitempty"`
	Y *float32 `protobuf:"fixed32,2,req,name=y" json:"y,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *Point) GetX() float32
```

```go
func (m *Point) GetY() float32
```

```go
func (*Point) ProtoMessage()
```

```go
func (m *Point) Reset()
```

```go
func (m *Point) String() string
```

--------------------------------

### Define HTMLReadabilityOptions

Source: https://pkg.go.dev/github.com/sajari/docconv

Defines the parameters passed to the justext package for HTML readability processing. The values live in the package-level variable HTMLReadabilityOptionsValues, and this global state is noted as a candidate for future improvement.

```go
type HTMLReadabilityOptions struct {
	LengthLow             int
	LengthHigh            int
	StopwordsLow          float64
	StopwordsHigh         float64
	MaxLinkDensity        float64
	MaxHeadingDistance    int
	ReadabilityUseClasses string
}
```

```go
var HTMLReadabilityOptionsValues HTMLReadabilityOptions
```

--------------------------------

### NewWriter Function

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Creates a new Writer for compression.
```go
func NewWriter(w io.Writer) *Writer
```

--------------------------------

### String for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Returns a string representation of the ComponentInfo. Useful for debugging and logging.

```go
func (m *ComponentInfo) String() string
```

--------------------------------

### ReferenceDictionary Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Dictionary structure for mapping references.

```go
type ReferenceDictionary struct {
	Entries []*ReferenceDictionary_Entry `protobuf:"bytes,1,rep,name=entries" json:"entries,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *ReferenceDictionary) GetEntries() []*ReferenceDictionary_Entry
```

```go
func (*ReferenceDictionary) ProtoMessage()
```

```go
func (m *ReferenceDictionary) Reset()
```

```go
func (m *ReferenceDictionary) String() string
```

--------------------------------

### Color Model Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for accessing Color model properties and resetting a Color.

```go
func (m *Color) GetModel() Color_ColorModel
```

```go
func (m *Color) GetR() float32
```

```go
func (m *Color) GetW() float32
```

```go
func (m *Color) GetY() float32
```

```go
func (*Color) ProtoMessage()
```

```go
func (m *Color) Reset()
```

```go
func (m *Color) String() string
```

--------------------------------

### WithHTTPClient Option Function

Source: https://pkg.go.dev/github.com/sajari/docconv/client

WithHTTPClient sets the *http.Client used for all underlying calls.

```go
func WithHTTPClient(client *http.Client) Opt
```

--------------------------------

### Response Struct Definition

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Response is a copy of docconv.Response, duplicated here to avoid a dependency on the docconv package.
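Because `Response` carries JSON tags, a reply from a docd server can be decoded with `encoding/json` alone. The sketch below assumes the server answers with a JSON object shaped like those tags. `decodeResponse` and the sample payload are hypothetical, and treating a non-empty `Error` field as a Go error is a design choice for the sketch, not documented client behaviour.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Response copies the documented client.Response fields and JSON tags.
type Response struct {
	Body  string            `json:"body"`
	Meta  map[string]string `json:"meta"`
	MSecs uint32            `json:"msecs"`
	Error string            `json:"error"`
}

// decodeResponse is a hypothetical helper: it parses a JSON reply and
// turns a non-empty Error field into a Go error.
func decodeResponse(raw string) (*Response, error) {
	var res Response
	if err := json.NewDecoder(strings.NewReader(raw)).Decode(&res); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	if res.Error != "" {
		return &res, fmt.Errorf("conversion failed: %s", res.Error)
	}
	return &res, nil
}

func main() {
	// Invented sample payload; real metadata keys depend on the input file.
	raw := `{"body":"Hello, world","meta":{"Author":"Jane"},"msecs":12,"error":""}`
	res, err := decodeResponse(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res.Body, res.MSecs) // Hello, world 12
}
```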
```go
type Response struct {
	Body  string            `json:"body"`
	Meta  map[string]string `json:"meta"`
	MSecs uint32            `json:"msecs"`
	Error string            `json:"error"`
}
```

--------------------------------

### WithProtocol Option Function

Source: https://pkg.go.dev/github.com/sajari/docconv/client

WithProtocol sets the protocol used in HTTP requests. Currently this must be either http:// or https://.

```go
func WithProtocol(protocol string) Opt
```

--------------------------------

### WithEndpoint Option Function

Source: https://pkg.go.dev/github.com/sajari/docconv/client

WithEndpoint sets the endpoint on a Client.

```go
func WithEndpoint(endpoint string) Opt
```

--------------------------------

### String for DataReference

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Returns a string representation of the DataReference. Aids in debugging and logging.

```go
func (m *DataReference) String() string
```

--------------------------------

### ConvertPath Function

Source: https://pkg.go.dev/github.com/sajari/docconv/client

ConvertPath uses the docconv Client to convert the local file found at path.

```go
func ConvertPath(c *Client, path string) (*Response, error)
```

--------------------------------

### String for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Provides a string representation of the DataInfo object. Helpful for debugging and logging purposes.

```go
func (m *DataInfo) String() string
```

--------------------------------

### ReferenceDictionary_Entry Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Entry structure for ReferenceDictionary.
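A `ReferenceDictionary` is a list of key/value `Reference` pairs. If a plain Go map is more convenient, the entries can be flattened by identifier. The sketch below does this on minimal local copies of the generated types (tags and unused fields omitted, getters written in the assumed nil-safe proto2 style). `toMap` and `ref` are hypothetical helpers, and letting later duplicate keys overwrite earlier ones is an assumption of the sketch.

```go
package main

import "fmt"

// Minimal local copies of the generated types (protobuf tags omitted).
type Reference struct{ Identifier *uint64 }

type ReferenceDictionary_Entry struct {
	Key   *Reference
	Value *Reference
}

type ReferenceDictionary struct {
	Entries []*ReferenceDictionary_Entry
}

// Nil-safe getters in the proto2 style.
func (m *Reference) GetIdentifier() uint64 {
	if m != nil && m.Identifier != nil {
		return *m.Identifier
	}
	return 0
}

func (m *ReferenceDictionary_Entry) GetKey() *Reference {
	if m != nil {
		return m.Key
	}
	return nil
}

func (m *ReferenceDictionary_Entry) GetValue() *Reference {
	if m != nil {
		return m.Value
	}
	return nil
}

func (m *ReferenceDictionary) GetEntries() []*ReferenceDictionary_Entry {
	if m != nil {
		return m.Entries
	}
	return nil
}

// toMap flattens the dictionary into key identifier -> value identifier.
// A later entry with the same key overwrites an earlier one.
func toMap(d *ReferenceDictionary) map[uint64]uint64 {
	out := make(map[uint64]uint64, len(d.GetEntries()))
	for _, e := range d.GetEntries() {
		out[e.GetKey().GetIdentifier()] = e.GetValue().GetIdentifier()
	}
	return out
}

// ref builds a Reference with the given identifier.
func ref(id uint64) *Reference { return &Reference{Identifier: &id} }

func main() {
	d := &ReferenceDictionary{Entries: []*ReferenceDictionary_Entry{
		{Key: ref(1), Value: ref(100)},
		{Key: ref(2), Value: ref(200)},
	}}
	fmt.Println(toMap(d)) // map[1:100 2:200]
}
```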
```go
type ReferenceDictionary_Entry struct {
	Key *Reference `protobuf:"bytes,1,req,name=key" json:"key,omitempty"`
	Value *Reference `protobuf:"bytes,2,req,name=value" json:"value,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *ReferenceDictionary_Entry) GetKey() *Reference
```

```go
func (m *ReferenceDictionary_Entry) GetValue() *Reference
```

```go
func (*ReferenceDictionary_Entry) ProtoMessage()
```

```go
func (m *ReferenceDictionary_Entry) Reset()
```

```go
func (m *ReferenceDictionary_Entry) String() string
```

--------------------------------

### Convert document over network using docconv client

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts a document by sending a request to a running docd service. Uses the docconv client library, which defaults to localhost:8888.

```go
package main

import (
	"fmt"
	"log"

	"code.sajari.com/docconv/client"
)

func main() {
	// Create a new client, using the default endpoint (localhost:8888)
	c := client.New()

	res, err := client.ConvertPath(c, "your-file.pdf")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(res)
}
```

--------------------------------

### Path Structure and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Defines the Path structure and provides methods for accessing its elements.

```APIDOC
## Path

### Structure
```go
type Path struct {
	Elements []*Path_Element `protobuf:"bytes,1,rep,name=elements" json:"elements,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

### GetElements
Retrieves the elements that constitute the path.

#### Returns
- **[]*Path_Element** - A list of Path_Element objects.
```

--------------------------------

### ProtoMessage for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Placeholder method for the Protocol Buffers message definition. It does nothing, but it is required for protobuf compatibility.

```go
func (*ComponentInfo) ProtoMessage()
```

--------------------------------

### ViewStateMetadata Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for accessing fields and managing the state of ViewStateMetadata.

```go
func (m *ViewStateMetadata) GetComponent() *ComponentInfo
```

```go
func (m *ViewStateMetadata) GetDocumentVersionUuid() string
```

```go
func (m *ViewStateMetadata) GetVersion() []uint32
```

```go
func (*ViewStateMetadata) ProtoMessage()
```

```go
func (m *ViewStateMetadata) Reset()
```

```go
func (m *ViewStateMetadata) String() string
```

--------------------------------

### GetLocator for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the locator string for a ComponentInfo. This is typically a path or URL to the component's resource.

```go
func (m *ComponentInfo) GetLocator() string
```

--------------------------------

### Convert Local File Path to Text

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts a local file specified by its path to plain text. Returns a Response object or an error.

```go
func ConvertPath(path string) (*Response, error)
```

--------------------------------

### String for DatabaseData

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Returns a string representation of the DatabaseData object. Useful for debugging.

```go
func (m *DatabaseData) String() string
```

--------------------------------

### Define LocalFile Type

Source: https://pkg.go.dev/github.com/sajari/docconv

Represents a local file, wrapping an *os.File.
It ensures data is available in a file, either by using an existing *os.File or by creating a temporary one.

```go
type LocalFile struct {
	*os.File
	// contains filtered or unexported fields
}
```

--------------------------------

### GetWriteVersion for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the write version information for a ComponentInfo. Use this to manage updates and ensure data integrity.

```go
func (m *ComponentInfo) GetWriteVersion() []uint32
```

--------------------------------

### NewReader Function

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Creates a new Reader for decompression.

```go
func NewReader(r io.Reader) *Reader
```

--------------------------------

### Default Protocol Constant

Source: https://pkg.go.dev/github.com/sajari/docconv/client

DefaultProtocol is the default protocol used to construct paths when making docconv requests.

```go
const DefaultProtocol = "http://"
```

--------------------------------

### Snappy Compression and Decompression Functions

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Provides functions for encoding and decoding byte slices using the Snappy compression algorithm. It also includes utilities to determine the maximum encoded length and the decoded length of a given byte slice.

```APIDOC
## Snappy Compression and Decompression Functions

### Description
Provides functions for encoding and decoding byte slices using the Snappy compression algorithm. It also includes utilities to determine the maximum encoded length and the decoded length of a given byte slice.

### Functions

#### func Decode(dst, src []byte) ([]byte, error)
Decode returns the decoded form of src. The returned slice may be a subslice of dst if dst was large enough to hold the entire decoded block. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil dst.

#### func DecodedLen(src []byte) (int, error)
DecodedLen returns the length of the decoded block.
#### func Encode(dst, src []byte) ([]byte, error)
Encode returns the encoded form of src. The returned slice may be a subslice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil dst.

#### func MaxEncodedLen(srcLen int) int
MaxEncodedLen returns the maximum length of a snappy block, given its uncompressed length.
```

--------------------------------

### Color Model Enum Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for interacting with the Color_ColorModel enum type.

```go
func (x Color_ColorModel) Enum() *Color_ColorModel
```

```go
func (x Color_ColorModel) String() string
```

```go
func (x *Color_ColorModel) UnmarshalJSON(data []byte) error
```

--------------------------------

### ProtoMessage for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Placeholder method for Protocol Buffers message definition. Required for protobuf compatibility.

```go
func (*DataInfo) ProtoMessage()
```

--------------------------------

### Writer Write Method

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Writes compressed data to the underlying writer.

```go
func (w *Writer) Write(p []byte) (n int, errRet error)
```

--------------------------------

### Encode Function

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Encodes a byte slice using the snappy format.

```go
func Encode(dst, src []byte) ([]byte, error)
```

--------------------------------

### Range Struct and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Range structure for defining locations and lengths.
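A `Range` pairs a start `Location` with a `Length`. The sketch below assumes, purely for illustration, that a range addresses the half-open window [Location, Location+Length) over the characters of some text. Whether iWork counts runes, UTF-16 code units, or bytes is not stated in these docs; the hypothetical `sliceRunes` helper uses runes and bounds-checks the result. The struct is a local copy without protobuf tags.

```go
package main

import (
	"errors"
	"fmt"
)

// Range mirrors the generated Range fields (tags omitted).
type Range struct {
	Location *uint32
	Length   *uint32
}

func (m *Range) GetLocation() uint32 {
	if m != nil && m.Location != nil {
		return *m.Location
	}
	return 0
}

func (m *Range) GetLength() uint32 {
	if m != nil && m.Length != nil {
		return *m.Length
	}
	return 0
}

// sliceRunes is a hypothetical helper that treats r as the window
// [Location, Location+Length) over the runes of text. Counting runes
// (rather than UTF-16 units or bytes) is an assumption.
func sliceRunes(text string, r *Range) (string, error) {
	runes := []rune(text)
	start := uint64(r.GetLocation())
	end := start + uint64(r.GetLength()) // uint64 avoids uint32 overflow
	if end > uint64(len(runes)) {
		return "", errors.New("range out of bounds")
	}
	return string(runes[start:end]), nil
}

// u32 returns a pointer to v, for filling optional fields.
func u32(v uint32) *uint32 { return &v }

func main() {
	s, err := sliceRunes("Hello, world", &Range{Location: u32(7), Length: u32(5)})
	fmt.Println(s, err) // world <nil>
}
```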
```go
type Range struct {
	Location *uint32 `protobuf:"varint,1,req,name=location" json:"location,omitempty"`
	Length *uint32 `protobuf:"varint,2,req,name=length" json:"length,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

```go
func (m *Range) GetLength() uint32
```

```go
func (m *Range) GetLocation() uint32
```

```go
func (*Range) ProtoMessage()
```

```go
func (m *Range) Reset()
```

```go
func (m *Range) String() string
```

--------------------------------

### ViewStateMetadata Struct Definition

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Definition of the ViewStateMetadata struct with protobuf tags.

```go
type ViewStateMetadata struct {
	Version []uint32 `protobuf:"varint,1,rep,packed,name=version" json:"version,omitempty"`
	DocumentVersionUuid *string `protobuf:"bytes,2,req,name=document_version_uuid" json:"document_version_uuid,omitempty"`
	Component *ComponentInfo `protobuf:"bytes,3,req,name=component" json:"component,omitempty"`
	XXX_unrecognized []byte `json:"-"`
}
```

--------------------------------

### Snappy Error Variables

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Common error variables used to report corrupt or unsupported input.

```go
var (
	// ErrCorrupt reports that the input is invalid.
	ErrCorrupt = errors.New("snappy: corrupt input")
	// ErrUnsupported reports that the input isn't supported.
	ErrUnsupported = errors.New("snappy: unsupported input")
)
```

--------------------------------

### POST /convert

Source: https://pkg.go.dev/github.com/sajari/docconv/client

Converts a file provided via an io.Reader to a structured response containing body text and metadata.

```APIDOC
## POST /convert

### Description
Converts a file's content, read from an io.Reader (or from a local path via ConvertPath), using the docconv HTTP client.

### Method
POST

### Parameters

#### Client.Convert arguments
- **r** (io.Reader) - Required - The file content to be converted.
- **filename** (string) - Required - The name of the file being converted.
### Response

#### Success Response (200)
- **Body** (string) - The extracted text content of the file.
- **Meta** (map[string]string) - Metadata associated with the file.
- **MSecs** (uint32) - Processing time in milliseconds.
- **Error** (string) - Error message if the conversion failed.
```

--------------------------------

### Reset for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Resets the ComponentInfo to its default state. Use this to clear existing data before repopulating it.

```go
func (m *ComponentInfo) Reset()
```

--------------------------------

### GetReadVersion for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the read version information for a ComponentInfo. This is useful for version control and compatibility checks.

```go
func (m *ComponentInfo) GetReadVersion() []uint32
```

--------------------------------

### ConvertPath Function

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts a local file path to plain text.

```APIDOC
## func ConvertPath

### Description
ConvertPath converts a local path to text.

### Signature
func ConvertPath(path string) (*Response, error)

### Parameters
- **path** (string) - Required - The local file path to convert.

### Returns
- **Response** (*Response) - The conversion result, containing body, meta, milliseconds, and error fields.
- **error** (error) - Non-nil if the conversion failed.

#### Response Example
```json
{
  "body": "string",
  "meta": {
    "key": "value"
  },
  "msecs": 100,
  "error": ""
}
```
```

--------------------------------

### Size Structure

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Represents a size with width and height.

```APIDOC
## Size

### Description
Represents a two-dimensional size with width and height.

### Fields
- **Width** (*float32) - The width of the size.
- **Height** (*float32) - The height of the size.
### Methods

#### GetHeight()
- **Description**: Returns the height of the size.
- **Returns**: float32

#### GetWidth()
- **Description**: Returns the width of the size.
- **Returns**: float32

#### ProtoMessage()
- **Description**: Marks the type as a protobuf message.

#### Reset()
- **Description**: Resets the size to its default state; both fields become unset, so GetWidth and GetHeight return 0.

#### String()
- **Description**: Returns a string representation of the size.
- **Returns**: string
```

--------------------------------

### TSP Package Overview

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Overview of the TSP package, including its source files and top-level messages.

```APIDOC
## TSP Package

### Overview
Package TSP is a generated protocol buffer package.

It is generated from the following files:
- TSPArchiveMessages.proto
- TSPDatabaseMessages.proto
- TSPMessages.proto

It has the following top-level messages:
- ArchiveInfo
- MessageInfo
- FieldInfo
- FieldPath
- ComponentInfo
- ComponentExternalReference
- ComponentDataReference
- PackageMetadata
- PasteboardMetadata
- DataInfo
- ViewStateMetadata
```

--------------------------------

### Utility Functions

Source: https://pkg.go.dev/github.com/sajari/docconv

Helper functions for MIME type detection and XML processing.

```APIDOC
## MimeTypeByExtension

### Description
Returns a MIME type for the given file extension.

### Parameters
- **filename** (string) - Required - The filename or extension to check.

### Returns
- **string** - The determined MIME type, or 'application/octet-stream'.
```

--------------------------------

### Path_Element Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods available for the Path_Element type.

```APIDOC
## Path_Element

### Description
Represents an element within a path, used for drawing operations.

### Methods

#### GetType()
- **Description**: Returns the type of the path element.
- **Returns**: Path_ElementType

#### ProtoMessage()
- **Description**: Marks the type as a protobuf message.

#### Reset()
- **Description**: Resets the path element to its default state.

#### String()
- **Description**: Returns a string representation of the path element.
- **Returns**: string
```

--------------------------------

### GetFileName for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the file name associated with the DataInfo. This is the actual name of the file.

```go
func (m *DataInfo) GetFileName() string
```

--------------------------------

### GetSourceBookmarkData for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves bookmark data associated with the source of the DataInfo. Useful for tracking origins or specific points within the data.

```go
func (m *DataInfo) GetSourceBookmarkData() []byte
```

--------------------------------

### Reader Reset Method

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Resets the Reader to read from a new source.

```go
func (r *Reader) Reset(reader io.Reader)
```

--------------------------------

### Snappy Package Variables

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Defines package-level variables, including error types for corrupt or unsupported input data.

```APIDOC
## Snappy Package Variables

### Description
Defines package-level variables, including error types for corrupt or unsupported input data.

### Variables
```go
var (
	// ErrCorrupt reports that the input is invalid.
	ErrCorrupt = errors.New("snappy: corrupt input")
	// ErrUnsupported reports that the input isn't supported.
	ErrUnsupported = errors.New("snappy: unsupported input")
)
```
```

--------------------------------

### PackageMetadata Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods associated with the PackageMetadata type for retrieving read and write versions.
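GetReadVersion and GetWriteVersion return versions as `[]uint32` slices. If two such versions need to be ordered, one option is a component-wise comparison. The sketch below treats missing trailing components as 0, so 1.2 and 1.2.0 compare equal. `compareVersions` is a hypothetical helper, and the usage reads the read version as "the minimum reader version needed", which is an assumption the docs do not state.

```go
package main

import "fmt"

// compareVersions is a hypothetical helper for []uint32 versions such as
// those returned by GetReadVersion and GetWriteVersion. It compares
// component by component, treating missing components as 0, and returns
// -1, 0, or 1.
func compareVersions(a, b []uint32) int {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		var x, y uint32
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		switch {
		case x < y:
			return -1
		case x > y:
			return 1
		}
	}
	return 0
}

func main() {
	// Made-up values. Assumption: the read version is the minimum reader
	// version a document requires, and this code supports up to 2.0.
	readVersion := []uint32{1, 0, 5}
	supported := []uint32{2, 0}
	if compareVersions(supported, readVersion) >= 0 {
		fmt.Println("document is readable") // printed for these values
	} else {
		fmt.Println("document needs a newer reader")
	}
}
```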
```APIDOC
## PackageMetadata

### GetReadVersion
Retrieves the read version of the package metadata.

#### Returns
- **[]uint32** - The read version.

## PackageMetadata

### GetWriteVersion
Retrieves the write version of the package metadata.

#### Returns
- **[]uint32** - The write version.
```

--------------------------------

### LocalFile Type and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv

Defines the LocalFile type for handling local files and its associated Done method for resource cleanup.

```APIDOC
## type LocalFile

### Description
LocalFile is a type which wraps an *os.File. See NewLocalFile for more details.

## func NewLocalFile

### Description
NewLocalFile ensures that there is a file which contains the data provided by r. If r is actually an instance of *os.File then this file is used, otherwise a temporary file is created (using dir and prefix) and the data from r copied into it. Callers must call Done() when the LocalFile is no longer needed to ensure all resources are cleaned up.

### Signature
func NewLocalFile(r io.Reader, dir, prefix string) (*LocalFile, error)

### Parameters
- **r** (io.Reader) - The data source; used directly if it is an *os.File.
- **dir**, **prefix** (string) - Directory and name prefix for the temporary file, if one is created.

### Returns
- **LocalFile** (*LocalFile) - A pointer to the created LocalFile.
- **error** (error) - An error if the file creation fails.

## func (*LocalFile) Done

### Description
Done cleans up all resources.
### Signature
func (l *LocalFile) Done()

### Parameters
None

### Returns
None
```

--------------------------------

### Reset for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Resets the DataInfo fields to their default values. Call this to clear and reuse a DataInfo instance.

```go
func (m *DataInfo) Reset()
```

--------------------------------

### GetPreferredLocator for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the preferred locator string for a ComponentInfo. This may differ from the general locator based on system configuration or usage.

```go
func (m *ComponentInfo) GetPreferredLocator() string
```

--------------------------------

### GetDocumentResourceLocator for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the resource locator for the document data. This points to where the data can be found.

```go
func (m *DataInfo) GetDocumentResourceLocator() string
```

--------------------------------

### Decode Function

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Decodes a snappy-compressed byte slice.

```go
func Decode(dst, src []byte) ([]byte, error)
```

--------------------------------

### ProtoMessage for DatabaseData

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Placeholder method for Protocol Buffers message definition. Ensures compatibility with protobuf serialization.

```go
func (*DatabaseData) ProtoMessage()
```

--------------------------------

### Writer Reset Method

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Resets the Writer to write to a new destination.

```go
func (w *Writer) Reset(writer io.Writer)
```

--------------------------------

### GetPreferredFileName for DataInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Retrieves the preferred file name for the data. This might be a user-friendly name or a standardized name.
```go
func (m *DataInfo) GetPreferredFileName() string
```

--------------------------------

### GetData for DatabaseDataArchive

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Returns the Reference that points to the archived data content, or nil if the field is unset.

```go
func (m *DatabaseDataArchive) GetData() *Reference
```

--------------------------------

### ProtoMessage for DataReference

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Marker method that makes DataReference satisfy the protobuf Message interface; it has no behavior of its own.

```go
func (*DataReference) ProtoMessage()
```

--------------------------------

### Convert io.Reader to Text

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts data read from an io.Reader with the given MIME type to plain text. Set readability to true to apply HTML readability processing. Returns a Response or an error.

```go
func Convert(r io.Reader, mimeType string, readability bool) (*Response, error)
```

--------------------------------

### Convert Function

Source: https://pkg.go.dev/github.com/sajari/docconv

Converts a file from an io.Reader to plain text. The caller specifies the MIME type and whether to use HTML readability heuristics.

```APIDOC
## func Convert

### Description
Convert a file to plain text.

### Method
func Convert(r io.Reader, mimeType string, readability bool) (*Response, error)

### Parameters
- **r** (io.Reader) - Required - The document data.
- **mimeType** (string) - Required - The MIME type of the data.
- **readability** (bool) - Required - Whether to apply HTML readability heuristics.

### Returns
- **Response** (*Response) - The conversion result, containing body, meta, msecs (elapsed milliseconds), and error fields.
- **error** (error) - An error if the conversion fails.
```

--------------------------------

### PasteboardMetadata Structure and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Defines the PasteboardMetadata structure and provides methods for accessing its fields.
```go
type PasteboardMetadata struct {
	Version            []uint32    `protobuf:"varint,1,rep,packed,name=version" json:"version,omitempty"`
	AppName            *string     `protobuf:"bytes,2,req,name=app_name" json:"app_name,omitempty"`
	Datas              []*DataInfo `protobuf:"bytes,3,rep,name=datas" json:"datas,omitempty"`
	SourceDocumentUuid *string     `protobuf:"bytes,4,opt,name=source_document_uuid" json:"source_document_uuid,omitempty"`
	XXX_unrecognized   []byte      `json:"-"`
}
```

```APIDOC
## PasteboardMetadata Methods

### Description
Getters for the PasteboardMetadata fields. Each getter is nil-safe: it returns the zero value when the receiver or the field is nil.

### Methods
- GetAppName() string: Returns the name of the application associated with the pasteboard metadata.
- GetDatas() []*DataInfo: Returns the DataInfo entries associated with the pasteboard metadata.
- GetSourceDocumentUuid() string: Returns the UUID of the source document.
- GetVersion() []uint32: Returns the version as a slice of unsigned integers.
```

--------------------------------

### Clean Up LocalFile Resources

Source: https://pkg.go.dev/github.com/sajari/docconv

Cleans up all resources associated with a LocalFile. This method must be called when the LocalFile is no longer needed.

```go
func (l *LocalFile) Done()
```

--------------------------------

### HTML and Web Conversion

Source: https://pkg.go.dev/github.com/sajari/docconv

Functions for processing HTML content and URLs.

```APIDOC
## ConvertHTML

### Description
Converts HTML content into plain text.

### Parameters
- **r** (io.Reader) - Required - The reader containing the HTML content.
- **readability** (bool) - Required - Whether to apply readability processing.

### Returns
- **string** - The extracted text.
- **map[string]string** - Metadata extracted from the HTML.
- **error** - An error if the conversion fails.
```

--------------------------------

### Path_ElementType Definition and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Enum type for the kinds of path elements, with its constants and helper methods: Enum returns a pointer to a copy of the value, String returns its name, and UnmarshalJSON decodes it from JSON.

```go
type Path_ElementType int32
```

```go
const (
	Path_moveTo       Path_ElementType = 1
	Path_lineTo       Path_ElementType = 2
	Path_quadCurveTo  Path_ElementType = 3
	Path_curveTo      Path_ElementType = 4
	Path_closeSubpath Path_ElementType = 5
)
```

```go
func (x Path_ElementType) Enum() *Path_ElementType
```

```go
func (x Path_ElementType) String() string
```

```go
func (x *Path_ElementType) UnmarshalJSON(data []byte) error
```

--------------------------------

### ComponentInfo Data Structure

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Represents component information, with methods to retrieve external references, identifiers, and versioning information.

```APIDOC
## ComponentInfo Methods

### Description
Provides accessors for ComponentInfo fields, including identifiers, locators, and versioning.

### Methods
- GetExternalReferences() []*ComponentExternalReference
- GetIdentifier() uint64
- GetIsStoredOutsideObjectArchive() bool
- GetLocator() string
- GetPreferredLocator() string
- GetReadVersion() []uint32
- GetWriteVersion() []uint32
```

--------------------------------

### DatabaseDataArchive Structure Definition

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Defines the message for archived database data: a reference to the data, its app-relative path, display name, length, hash, and whether it is sharable.

```go
type DatabaseDataArchive struct {
	Data             *Reference `protobuf:"bytes,1,opt,name=data" json:"data,omitempty"`
	AppRelativePath  *string    `protobuf:"bytes,2,opt,name=app_relative_path" json:"app_relative_path,omitempty"`
	DisplayName      *string    `protobuf:"bytes,3,req,name=display_name" json:"display_name,omitempty"`
	Length           *uint64    `protobuf:"varint,4,opt,name=length" json:"length,omitempty"`
	Hash             *uint32    `protobuf:"varint,5,opt,name=hash" json:"hash,omitempty"`
	Sharable         *bool      `protobuf:"varint,6,req,name=sharable,def=1" json:"sharable,omitempty"`
	XXX_unrecognized []byte     `json:"-"`
}
```

--------------------------------

### Default HTTP Client Variable

Source: https://pkg.go.dev/github.com/sajari/docconv/client

DefaultHTTPClient is the default HTTP client used to make requests to docconv HTTP servers.

```go
var DefaultHTTPClient = http.DefaultClient
```

--------------------------------

### PasteboardObject Structure and Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Defines the PasteboardObject structure and provides methods for accessing its reference-based fields.
```go
type PasteboardObject struct {
	Stylesheet       *Reference   `protobuf:"bytes,1,opt,name=stylesheet" json:"stylesheet,omitempty"`
	Drawables        []*Reference `protobuf:"bytes,2,rep,name=drawables" json:"drawables,omitempty"`
	Styles           []*Reference `protobuf:"bytes,3,rep,name=styles" json:"styles,omitempty"`
	Theme            *Reference   `protobuf:"bytes,4,opt,name=theme" json:"theme,omitempty"`
	WpStorage        *Reference   `protobuf:"bytes,5,opt,name=wp_storage" json:"wp_storage,omitempty"`
	GuideStorage     *Reference   `protobuf:"bytes,9,opt,name=guide_storage" json:"guide_storage,omitempty"`
	AppNativeObject  *Reference   `protobuf:"bytes,6,opt,name=app_native_object" json:"app_native_object,omitempty"`
	IsTextPrimary    *bool        `protobuf:"varint,7,opt,name=is_text_primary,def=0" json:"is_text_primary,omitempty"`
	IsSmart          *bool        `protobuf:"varint,8,opt,name=is_smart,def=0" json:"is_smart,omitempty"`
	XXX_unrecognized []byte       `json:"-"`
}
```

```APIDOC
## PasteboardObject Methods

### Description
Getters for the PasteboardObject fields. Reference getters return nil when the field is unset; GetIsSmart and GetIsTextPrimary return false, their declared default.

### Methods
- GetAppNativeObject() *Reference: Returns the reference to the application's native object.
- GetDrawables() []*Reference: Returns references to the drawables within the pasteboard object.
- GetGuideStorage() *Reference: Returns the reference to the guide storage.
- GetIsSmart() bool: Reports whether the pasteboard object is marked as smart.
- GetIsTextPrimary() bool: Reports whether text is the primary content of the pasteboard object.
- GetStyles() []*Reference: Returns references to the styles applied to the pasteboard object.
- GetStylesheet() *Reference: Returns the reference to the stylesheet.
- GetTheme() *Reference: Returns the reference to the theme.
- GetWpStorage() *Reference: Returns the reference to the WP storage.
```

--------------------------------

### DatabaseDataArchive Methods

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Methods for accessing properties of a DatabaseDataArchive object.

```APIDOC
## DatabaseDataArchive Methods

### Description
Provides accessors for retrieving metadata from a DatabaseDataArchive instance.

### Methods
- GetDisplayName() string: Returns the display name.
- GetHash() uint32: Returns the hash value.
- GetLength() uint64: Returns the length.
- GetSharable() bool: Returns the sharable status (true when the field is unset, per its def=1 default).
```

--------------------------------

### HTMLReadabilityOptions Type

Source: https://pkg.go.dev/github.com/sajari/docconv

Defines parameters for HTML readability processing, used by the justext package.

```APIDOC
## type HTMLReadabilityOptions

### Description
HTMLReadabilityOptions defines the parameters that are passed to the justext package.

### Fields
- **LengthLow** (int)
- **LengthHigh** (int)
- **StopwordsLow** (float64)
- **StopwordsHigh** (float64)
- **MaxLinkDensity** (float64)
- **MaxHeadingDistance** (int)
- **ReadabilityUseClasses** (string)
```

--------------------------------

### Reader Read Method

Source: https://pkg.go.dev/github.com/sajari/docconv/snappy

Reads decompressed data into the provided buffer.

```go
func (r *Reader) Read(p []byte) (int, error)
```

--------------------------------

### GetAppRelativePath for DatabaseDataArchive

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Returns the application-relative path of the archived data, or the empty string if the field is unset. Use this to locate the data within the application's file structure.

```go
func (m *DatabaseDataArchive) GetAppRelativePath() string
```

--------------------------------

### Color Model Type Definitions

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Definitions for the Color_ColorModel enum and its associated constants.
```go
type Color_ColorModel int32
```

```go
const (
	Color_rgb   Color_ColorModel = 1
	Color_cmyk  Color_ColorModel = 2
	Color_white Color_ColorModel = 3
)
```

--------------------------------

### PackageMetadata Type

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Contains metadata for a package, including components, data, and version information.

```APIDOC
## PackageMetadata Type

### Description
Metadata associated with a package, including information about its components, data, and versioning.

### Methods
- `GetComponents() []*ComponentInfo`: Returns a list of component information.
- `GetDatas() []*DataInfo`: Returns a list of data information.
- `GetLastObjectIdentifier() uint64`: Returns the identifier of the last object.
- `GetReadVersion() []uint32`: Returns the read version of the package.
- `GetWriteVersion() []uint32`: Returns the write version of the package.
- `ProtoMessage()`: Marks the type as a Protocol Buffers message.
- `Reset()`: Resets the package metadata to its zero value.
- `String() string`: Returns a string representation of the package metadata.
```

--------------------------------

### GetIsStoredOutsideObjectArchive for ComponentInfo

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Reports whether a ComponentInfo is stored outside the object archive, which determines where its data must be read from.

```go
func (m *ComponentInfo) GetIsStoredOutsideObjectArchive() bool
```

--------------------------------

### ComponentExternalReference Structure

Source: https://pkg.go.dev/github.com/sajari/docconv/iWork

Definition and methods for ComponentExternalReference.
```go
type ComponentExternalReference struct {
	ComponentIdentifier *uint64 `protobuf:"varint,1,req,name=component_identifier" json:"component_identifier,omitempty"`
	ObjectIdentifier    *uint64 `protobuf:"varint,2,opt,name=object_identifier" json:"object_identifier,omitempty"`
	IsWeak              *bool   `protobuf:"varint,3,opt,name=is_weak" json:"is_weak,omitempty"`
	XXX_unrecognized    []byte  `json:"-"`
}
```

```go
func (m *ComponentExternalReference) GetComponentIdentifier() uint64
```

```go
func (m *ComponentExternalReference) GetIsWeak() bool
```

```go
func (m *ComponentExternalReference) GetObjectIdentifier() uint64
```

```go
func (*ComponentExternalReference) ProtoMessage()
```

```go
func (m *ComponentExternalReference) Reset()
```

```go
func (m *ComponentExternalReference) String() string
```
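--------------------------------

### Sketch: nil-safe getters with proto2 defaults

The getters in this section (for example `GetSharable`, `GetIsSmart`, `GetComponentIdentifier`) are safe to call on nil messages and unset fields. The following is a minimal sketch of that proto2 getter pattern, assuming the usual protoc-gen-go behavior. `archive` and `defaultArchiveSharable` are hypothetical names, and the struct is a trimmed stand-in for `DatabaseDataArchive`, not the generated code.

```go
package main

import "fmt"

// archive is a hypothetical, trimmed-down stand-in for the generated
// DatabaseDataArchive message; protobuf struct tags are omitted.
type archive struct {
	DisplayName *string
	Length      *uint64
	Sharable    *bool // proto2 field declared with def=1
}

// defaultArchiveSharable stands in for the default constant that generated
// code consults for a field declared with def=1.
const defaultArchiveSharable = true

// GetDisplayName returns the display name, or "" if the receiver or field is nil.
func (m *archive) GetDisplayName() string {
	if m != nil && m.DisplayName != nil {
		return *m.DisplayName
	}
	return ""
}

// GetLength returns the length, or 0 if the receiver or field is nil.
func (m *archive) GetLength() uint64 {
	if m != nil && m.Length != nil {
		return *m.Length
	}
	return 0
}

// GetSharable returns the explicit value if set, otherwise the declared default.
func (m *archive) GetSharable() bool {
	if m != nil && m.Sharable != nil {
		return *m.Sharable
	}
	return defaultArchiveSharable
}

func main() {
	var missing *archive // calling getters on a nil message does not panic
	fmt.Println(missing.GetDisplayName() == "", missing.GetLength(), missing.GetSharable())

	name, no := "notes.db", false
	set := &archive{DisplayName: &name, Sharable: &no}
	fmt.Println(set.GetDisplayName(), set.GetLength(), set.GetSharable())
}
```

Because the sketch's `Sharable` mirrors a `def=1` field, an unset value reads as `true` through the getter; only inspecting the pointer itself distinguishes "unset" from "explicitly true".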