### Create WarcFileReader from File

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

NewWarcFileReader creates a WarcFileReader from a filename. It can start reading from a specified offset and be configured with options. See WarcRecordOption.

```go
func NewWarcFileReader(filename string, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)
```

--------------------------------

### Get Hostname

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

Retrieves the hostname reported by the kernel. Returns 'unknown' if resolution fails.

```go
func GetHostName() string
```

--------------------------------

### Create WarcFileReader from Stream

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

NewWarcFileReaderFromStream creates a WarcFileReader from an io.Reader. It can start reading from a specified offset and be configured with options. The caller is responsible for closing the io.Reader.

```go
func NewWarcFileReaderFromStream(r io.Reader, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)
```

--------------------------------

### WarcFileReader.NewWarcFileReader

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Creates a new WarcFileReader to read WARC files from a specified filename. It can start reading from a given offset and be configured with options.

```APIDOC
## NewWarcFileReader

### Description
NewWarcFileReader creates a new WarcFileReader from the supplied filename. If offset is > 0, the reader will start reading from that offset. The WarcFileReader can be configured with options. See WarcRecordOption.

### Method
func NewWarcFileReader(filename string, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)

### Example
```go
// Example usage (assuming WarcRecordOption is defined elsewhere)
// reader, err := NewWarcFileReader("example.warc", 0, WithSyntaxErrorPolicy(ErrWarn))
```
```

--------------------------------

### Example WARC Record Unmarshalling Output

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Illustrates the typical output format when unmarshalling a WARC record, including offset, record details, and potential validation errors.

```text
Offset: 2, WARC record: version: WARC/1.1, type: warcinfo, id: urn:uuid:e9a0cecc-0221-11e7-adb1-0242ac120008
Validation errors:
  1: gowarc: record was found 2 bytes after expected offset
  2: block: wrong digest: expected sha1:af4d582b4ffc017d07a947d841e392a821f754f3, computed: sha1:8a936f9fd60d664cf95b1ffb40f1c4093e65bb40
  3: too few bytes in end of record marker. Expected "\r\n\r\n", was ""
```

--------------------------------

### Get Major WARC Version

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Returns the major version number of the WARC specification.

```go
func (v *WarcVersion) Major() uint8
```

--------------------------------

### Get Minor WARC Version

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Returns the minor version number of the WARC specification.

```go
func (v *WarcVersion) Minor() uint8
```

--------------------------------

### Get the number of bytes read

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/countingreader

Retrieves the total number of bytes that have been read through the Reader so far.

```go
func (r *Reader) N() int64
```

--------------------------------

### Get Hostname or IP

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

Retrieves the hostname reported by the kernel, falling back to the outbound IP if hostname resolution fails. Returns 'unknown' if both fail.
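
The fallback order described here (hostname, then outbound IP, then 'unknown') can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `hostNameOrIP` is a hypothetical name, and the UDP target address `8.8.8.8:80` is an assumption chosen only so the OS picks an outbound interface.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// hostNameOrIP is a hypothetical stand-in that mirrors the documented
// fallback order: kernel hostname, then outbound IP, then "unknown".
func hostNameOrIP() string {
	// First choice: the hostname reported by the kernel.
	if name, err := os.Hostname(); err == nil && name != "" {
		return name
	}
	// Second choice: the local address the OS would use for outbound traffic.
	// Dialing UDP does not send any packets; it only selects a route.
	conn, err := net.Dial("udp", "8.8.8.8:80") // assumed target address
	if err != nil {
		return "unknown"
	}
	defer conn.Close()
	if addr, ok := conn.LocalAddr().(*net.UDPAddr); ok {
		return addr.IP.String()
	}
	return "unknown"
}

func main() {
	fmt.Println(hostNameOrIP())
}
```

The sketch always returns a non-empty string, which matches the documented contract that callers never need to handle an error.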
```go
func GetHostNameOrIP() string
```

--------------------------------

### Get Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Retrieves the first value associated with a key from WarcFields. Returns an empty string if the key is not found. Key comparison is case-insensitive.

```go
func (wf *WarcFields) Get(key string) string
```

--------------------------------

### Convert Time to UTC

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/timestamp

Returns the supplied time converted to UTC. The signature takes a time.Time argument, so the function operates on the value it is given rather than reading the current time.

```go
func UTC(t time.Time) time.Time
```

--------------------------------

### WarcRecordBuilder Interface Definition

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

The WarcRecordBuilder interface allows for the construction of WarcRecord instances. It embeds io.Writer, io.StringWriter, io.ReaderFrom, and io.Closer, and provides methods to add headers, set the record type, get the current size, and build the final record with validation.

```go
type WarcRecordBuilder interface {
	io.Writer
	io.StringWriter
	io.ReaderFrom
	io.Closer
	AddWarcHeader(name string, value string)
	AddWarcHeaderInt(name string, value int)
	AddWarcHeaderInt64(name string, value int64)
	AddWarcHeaderTime(name string, value time.Time)
	Build() (record WarcRecord, validation []error, err error)
	Size() int64
	SetRecordType(recordType RecordType)
}
```

--------------------------------

### Buffer Configuration Options

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Provides functions to create options for configuring a Buffer. Use these to set memory limits, temporary directory, and read-only status.
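
Option functions like these usually follow Go's functional options pattern. The sketch below shows that pattern in isolation using only the standard library. Every name in it (`bufferConfig`, `option`, `withMaxMemBytes`, `withTmpDir`, `withReadOnly`, `newBufferConfig`) and the 1 MiB default are hypothetical, and the internals of the diskbuffer package may differ.

```go
package main

import "fmt"

// bufferConfig is a hypothetical settings struct; the real package keeps
// its options type unexported and its fields are not documented here.
type bufferConfig struct {
	maxMemBytes int
	tmpDir      string
	readOnly    bool
}

// option mutates a bufferConfig, in the same style as `type Option func(*options)`.
type option func(*bufferConfig)

func withMaxMemBytes(size int) option {
	return func(c *bufferConfig) { c.maxMemBytes = size }
}

func withTmpDir(dir string) option {
	return func(c *bufferConfig) { c.tmpDir = dir }
}

func withReadOnly(readOnly bool) option {
	return func(c *bufferConfig) { c.readOnly = readOnly }
}

// newBufferConfig starts from assumed defaults and applies each option in order,
// so later options override earlier ones.
func newBufferConfig(opts ...option) bufferConfig {
	c := bufferConfig{maxMemBytes: 1 << 20} // assumed default of 1 MiB
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newBufferConfig(withMaxMemBytes(4096), withTmpDir("/var/tmp"), withReadOnly(true))
	fmt.Printf("%+v\n", c)
}
```

The pattern keeps the constructor signature stable as new settings are added, which is why a variadic `opts ...Option` parameter appears throughout this API.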
```go
func WithMaxMemBytes(size int) Option
```

```go
func WithMaxTotalBytes(size int64) Option
```

```go
func WithMemBufferSizeHint(size int) Option
```

```go
func WithReadOnly(readOnly bool) Option
```

```go
func WithTmpDir(dir string) Option
```

--------------------------------

### New Buffer Creation

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Creates and initializes a new Buffer. The signature has no sizeHint parameter; the initial size of the memory buffer is presumably supplied through the WithMemBufferSizeHint option.

```go
func New(opts ...Option) Buffer
```

--------------------------------

### Initialize WARC File Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Creates a new WarcFileWriter with optional configurations. Writes WARC records using a pool of independent file writers.

```go
func NewWarcFileWriter(opts ...WarcFileWriterOption) *WarcFileWriter
```

--------------------------------

### Configure Compression

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets whether the writer writes gzip-compressed WARC files. Defaults to true.

```go
func WithCompression(compress bool) WarcFileWriterOption
```

--------------------------------

### Get Outbound IP

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

Retrieves the preferred outbound IP address of the current node. Returns 'unknown' if resolution fails.

```go
func GetOutboundIP() string
```

--------------------------------

### Configure URL Parser Options

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Apply specific options to the URL parser used within the WARC library.

```go
func WithUrlParserOptions(opts ...url.ParserOption) WarcRecordOption
```

--------------------------------

### WithRecordOptions

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the options to use for creating WarcInfo records. See WithWarcInfoFunc.

```APIDOC
## func WithRecordOptions(opts ...WarcRecordOption) WarcFileWriterOption

### Description
Sets the options to use for creating WarcInfo records. See WithWarcInfoFunc.
```

--------------------------------

### Buffer Options

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

Configuration options for creating and customizing Buffer instances.

```APIDOC
## Buffer Options

### Description
The `Option` interface is used to configure the behavior of a `Buffer` when it is created using the `New` function. Several predefined option functions are available.

### Available Options
- `WithMaxMemBytes(size int64) Option`: Sets the maximum size of the in-memory portion of the buffer.
- `WithMaxTotalBytes(size int64) Option`: Sets the maximum total size of the buffer, including any disk-based storage.
- `WithMemBufferSizeHint(size int64) Option`: Provides a hint for the initial size of the memory buffer.
- `WithTmpDir(dir string) Option`: Specifies the directory to be used for temporary disk storage.
```

--------------------------------

### New Buffer Creation

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

Creates a new Buffer instance with optional configuration settings.

```APIDOC
## New Buffer Creation

### Description
The `New` function creates and initializes a new `Buffer`. It accepts a variable number of `Option` functions to configure the buffer's behavior, such as memory limits and temporary directory usage.

### Function Signature
```go
func New(opts ...Option) Buffer
```

### Parameters
- `opts ...Option`: A variadic list of `Option` functions to configure the buffer.
```

--------------------------------

### Rotate WARC File

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Closes the current file of each worker, ordered after all previously queued requests. This is useful for starting a new WARC file.

```go
func (w *WarcFileWriter) Rotate() error
```

--------------------------------

### Set Record Options for WARC Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Use WithRecordOptions to configure options for creating WarcInfo records.
Refer to WithWarcInfoFunc for more details.

```go
func WithRecordOptions(opts ...WarcRecordOption) WarcFileWriterOption
```

--------------------------------

### Initialize WarcRecordBuilder

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Initializes a `WarcRecordBuilder` for creating new WARC records. The builder implements `io.Writer`, which is used to write the content block. If `recordType` is initially zero, call `SetRecordType` or `AddWarcHeader` to set the record type before calling `Build`. Call `Build` once headers and content are complete.

```go
func NewRecordBuilder(recordType RecordType, opts ...WarcRecordOption) WarcRecordBuilder
```

--------------------------------

### String Representation of WARC Version

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Returns a string representation of the WARC version, formatted as 'WARC/1.0' or 'WARC/1.1'.

```go
func (v *WarcVersion) String() string
```

--------------------------------

### Set Before File Creation Hook

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets a function to be called before a new file is created. The function receives the file name of the new file.

```go
func WithBeforeFileCreationHook(f func(fileName string) error) WarcFileWriterOption
```

--------------------------------

### Configure WARC Record Options

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Provides options to configure WARC record validation, marshaling, and unmarshaling behavior.

```go
type WarcRecordOption func(*warcRecordOptions)
```

--------------------------------

### WithNoValidation

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Configures the parser to perform minimal validation for maximum speed and leniency.

```APIDOC
## func WithNoValidation

### Description
Sets the parser to do as little validation as possible. This option is for parsing as fast as possible and being as lenient as possible.
Settings implied by this option are:
* `SyntaxErrorPolicy = ErrIgnore`
* `SpecViolationPolicy = ErrIgnore`
* `UnknownRecordPolicy = ErrIgnore`
* `SkipParseBlock = true`
```

--------------------------------

### Option for Buffer Temporary Directory

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the directory to use for temporary files. If not set or empty, the default directory for temporary files (os.TempDir) is used.

```go
func WithBufferTmpDir(dir string) WarcRecordOption
```

--------------------------------

### Option Interface Definition

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

The Option interface is used for configuring Buffer creation. It contains unexported methods, indicating it's intended for use with specific option functions.

```go
type Option interface {
	// contains filtered or unexported methods
}
```

--------------------------------

### Set After File Creation Hook

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets a function to be called after a new file is created. The function receives the file name, size, and WARC-Warcinfo-ID of the new file.

```go
func WithAfterFileCreationHook(f func(fileName string, size int64, warcInfoId string) error) WarcFileWriterOption
```

--------------------------------

### WithVersion

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the WARC version for new records. Defaults to WARC/1.1.

```APIDOC
## func WithVersion

### Description
Sets the WARC version to use for new records.

### Parameters
* `version` (*WarcVersion) - The WARC version to use.
```

--------------------------------

### Sprintt with Named Parameters

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

A string formatting function similar to fmt.Sprintf, but it accepts named parameters from a map. Useful for templating strings with dynamic values.

```go
params := map[string]any{
	"hello": "world",
	"num":   42,
}
result := internal.Sprintt("Hello %{hello}s. The answer is %{num}d", params)
```

--------------------------------

### NewWarcfileName Method Signature

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Method signature for generating a new WARC filename. It returns the directory path and the generated filename.

```go
func (g *PatternNameGenerator) NewWarcfileName() (string, string)
```

--------------------------------

### NewWarcFileWriter

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Creates a new WarcFileWriter with optional configurations.

```APIDOC
## NewWarcFileWriter

### Description
Initializes a new WarcFileWriter, which writes WARC records using a pool of independent file writers. Options can be provided to configure its behavior.

### Method
func NewWarcFileWriter(opts ...WarcFileWriterOption) *WarcFileWriter

### Options
- `WithAddWarcConcurrentToHeader(addConcurrentHeader bool)`: Configures if records written in the same call to Write should have WARC-Concurrent-To headers added for cross-reference. Defaults to false.
- `WithAfterFileCreationHook(f func(fileName string, size int64, warcInfoId string) error)`: Sets a function to be called after a new file is created. The function receives the file name, size, and WARC-Warcinfo-ID.
- `WithBeforeFileCreationHook(f func(fileName string) error)`: Sets a function to be called before a new file is created. The function receives the file name.
- `WithCompressedFileSuffix(suffix string)`: Sets a suffix to be added after the name generated by the WarcFileNameGenerator if compression is on. Defaults to ".gz".
- `WithCompression(compress bool)`: Sets if the writer should write gzip compressed WARC files. Defaults to true.
- `WithCompressionLevel(gzipLevel int)`: Sets the gzip level (1-9) to use for compression. Defaults to 5.
- `WithExpectedCompressionRatio(ratio float64)`: Sets the expected reduction in size when using compression. Defaults to 0.5.
- `WithFileNameGenerator(generator WarcFileNameGenerator)`: Sets the WarcFileNameGenerator to use for generating new Warc file names. Defaults to a PatternNameGenerator.
- `WithFlush(flush bool)`: Sets if the writer should commit each record to stable storage. Defaults to false.
- `WithMarshaler(marshaler Marshaler)`: Sets the Warc record marshaler to use. Defaults to defaultMarshaler.
- `WithMaxConcurrentWriters(count int)`: Sets the maximum number of Warc files that can be written simultaneously. Defaults to one.
```

--------------------------------

### Set Compression Level

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the gzip compression level (1-9) to use. Defaults to 5.

```go
func WithCompressionLevel(gzipLevel int) WarcFileWriterOption
```

--------------------------------

### WithMemBufferSizeHint Option Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

WithMemBufferSizeHint creates an Option to provide an initial size hint for the in-memory portion of the buffer.

```go
func WithMemBufferSizeHint(size int64) Option
```

--------------------------------

### Buffer Options

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Options for configuring a Buffer created by the New function. These options allow customization of memory usage, read-only status, and temporary directory.

```APIDOC
## type Option

```go
type Option func(*options)
```

Option configures a Buffer created by New.

### func WithMaxMemBytes

```go
func WithMaxMemBytes(size int) Option
```

### func WithMaxTotalBytes

```go
func WithMaxTotalBytes(size int64) Option
```

### func WithMemBufferSizeHint

```go
func WithMemBufferSizeHint(size int) Option
```

### func WithReadOnly

```go
func WithReadOnly(readOnly bool) Option
```

### func WithTmpDir

```go
func WithTmpDir(dir string) Option
```
```

--------------------------------

### Set Compressed File Suffix

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets a suffix to be added after the name generated by the WarcFileNameGenerator if compression is on. Defaults to ".gz".

```go
func WithCompressedFileSuffix(suffix string) WarcFileWriterOption
```

--------------------------------

### WarcFileReader Struct

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

WarcFileReader is used to read WARC files. Use NewWarcFileReader to create an instance.

```go
type WarcFileReader struct {
	// contains filtered or unexported fields
}
```

--------------------------------

### WithUrlParserOptions

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Applies specific options to the URL parser used within GoWarc.

```APIDOC
## func WithUrlParserOptions

### Description
Applies specific options to the URL parser.

### Parameters
* `opts` (...url.ParserOption) - Options to configure the URL parser.
```

--------------------------------

### GetAll Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Retrieves all values associated with a given key from WarcFields. Key comparison is case-insensitive.

```go
func (wf *WarcFields) GetAll(name string) []string
```

--------------------------------

### WithMaxFileSize

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the maximum size of a WARC file before a new one is created. Defaults to 1 GiB.

```APIDOC
## func WithMaxFileSize(size int64) WarcFileWriterOption

### Description
Sets the max size of the Warc file before creating a new one. Defaults to 1 GiB.
```

--------------------------------

### Write WARC Fields to io.Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Write serializes the WarcFields to the supplied io.Writer and returns the number of bytes written. Note that this signature differs from the Write method of the io.Writer interface: it takes the destination writer as its argument.

```go
func (wf *WarcFields) Write(w io.Writer) (n int64, err error)
```

--------------------------------

### Set Max File Size for WARC Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Use WithMaxFileSize to specify the maximum size of a WARC file before a new one is created. Defaults to 1 GiB.

```go
func WithMaxFileSize(size int64) WarcFileWriterOption
```

--------------------------------

### Configure Flush

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets whether the writer commits each record to stable storage. Defaults to false.

```go
func WithFlush(flush bool) WarcFileWriterOption
```

--------------------------------

### WarcFileWriterOption

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Configuration options for WarcFileWriter.

```APIDOC
## WarcFileWriterOption

### Description
Options to configure the behavior of `WarcFileWriter`.

### Options

#### WithAddWarcConcurrentToHeader
```go
func WithAddWarcConcurrentToHeader(addConcurrentHeader bool) WarcFileWriterOption
```
Enables or disables adding the WARC-Concurrent-To header.

#### WithAfterFileCreationHook
```go
func WithAfterFileCreationHook(f func(fileName string, size int64, warcInfoId string) error) WarcFileWriterOption
```
Registers a hook to be called after a WARC file is created.

#### WithBeforeFileCreationHook
```go
func WithBeforeFileCreationHook(f func(fileName string) error) WarcFileWriterOption
```
Registers a hook to be called before a WARC file is created.

#### WithCompressedFileSuffix
```go
func WithCompressedFileSuffix(suffix string) WarcFileWriterOption
```
Sets the suffix for compressed WARC files.

#### WithCompression
```go
func WithCompression(compress bool) WarcFileWriterOption
```
Enables or disables compression for WARC files.

#### WithCompressionLevel
```go
func WithCompressionLevel(gzipLevel int) WarcFileWriterOption
```
Sets the gzip compression level.

#### WithExpectedCompressionRatio
```go
func WithExpectedCompressionRatio(ratio float64) WarcFileWriterOption
```
Sets the expected compression ratio.

#### WithFileNameGenerator
```go
func WithFileNameGenerator(generator WarcFileNameGenerator) WarcFileWriterOption
```
Provides a custom generator for WARC file names.

#### WithFlush
```go
func WithFlush(flush bool) WarcFileWriterOption
```
Enables or disables committing each record to stable storage.

#### WithMarshaler
```go
func WithMarshaler(marshaler Marshaler) WarcFileWriterOption
```
Sets a custom marshaler for WARC records.

#### WithMaxConcurrentWriters
```go
func WithMaxConcurrentWriters(count int) WarcFileWriterOption
```
Sets the maximum number of concurrent writers.

#### WithMaxFileSize
```go
func WithMaxFileSize(size int64) WarcFileWriterOption
```
Sets the maximum size for a WARC file before rotation.

#### WithOpenFileSuffix
```go
func WithOpenFileSuffix(suffix string) WarcFileWriterOption
```
Sets the suffix for open WARC files.

#### WithRecordOptions
```go
func WithRecordOptions(opts ...WarcRecordOption) WarcFileWriterOption
```
Sets the record options used when creating WarcInfo records.

#### WithSegmentation
```go
func WithSegmentation() WarcFileWriterOption
```
Enables segmentation of large WARC records.

#### WithWarcInfoFunc
```go
func WithWarcInfoFunc(f func(recordBuilder WarcRecordBuilder) error) WarcFileWriterOption
```
Provides a function to customize the warcinfo record.
```

--------------------------------

### Configure WARC Concurrent Header

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Configures whether records written in the same call to Write get WARC-Concurrent-To headers added for cross-reference. Defaults to false.
```go
func WithAddWarcConcurrentToHeader(addConcurrentHeader bool) WarcFileWriterOption
```

--------------------------------

### Enable Segmentation for WARC Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Use WithSegmentation to enable segmentation for large WARC records. Defaults to false.

```go
func WithSegmentation() WarcFileWriterOption
```

--------------------------------

### New Buffer Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Creates and initializes a new Buffer. Options can be provided to configure its behavior, such as memory limits and read-only mode.

```APIDOC
## func New

```go
func New(opts ...Option) Buffer
```

New creates and initializes a new Buffer using sizeHint as the initial size of the memory buffer.
```

--------------------------------

### GetInt64 Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Retrieves the first value associated with a key and converts it to an int64. Returns an error if conversion fails. Key comparison is case-insensitive.

```go
func (wf *WarcFields) GetInt64(name string) (int64, error)
```

--------------------------------

### Buffer Interface Definition

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Defines the Buffer interface, which combines multiple standard Go io interfaces and adds specific methods for buffer manipulation.

```go
type Buffer interface {
	io.Reader
	io.ReaderAt
	io.Writer
	io.ByteReader
	io.ByteWriter
	io.Closer
	io.ReaderFrom
	io.StringWriter
	io.WriterTo
	io.Seeker
	ReadBytes(delim byte) (line []byte, err error)
	Peek(n int) (p []byte, err error)
	Size() int64
}
```

--------------------------------

### Sprintt

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

A string formatting function similar to fmt.Sprintf, but it accepts named parameters from a map.

```APIDOC
## Sprintt

### Description
Sprintt is like fmt.Sprintf, but accepts named parameters from a map.

### Signature
```go
func Sprintt(format string, params map[string]any) string
```

### Example
```go
params := map[string]any{
	"hello": "world",
	"num":   42,
}
result := internal.Sprintt("Hello %{hello}s. The answer is %{num}d", params)
// result will be: 'Hello world. The answer is 42'
```
```

--------------------------------

### Buffer Interface

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

The Buffer interface defines the methods available for interacting with the disk buffer, including standard I/O operations and buffer-specific methods like Peek and Size.

```APIDOC
## type Buffer

```go
type Buffer interface {
	io.Reader
	io.ReaderAt
	io.Writer
	io.ByteReader
	io.ByteWriter
	io.Closer
	io.ReaderFrom
	io.StringWriter
	io.WriterTo
	io.Seeker
	ReadBytes(delim byte) (line []byte, err error)
	Peek(n int) (p []byte, err error)
	Size() int64
}
```
```

--------------------------------

### WarcFileReader.NewWarcFileReaderFromStream

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Creates a new WarcFileReader to read WARC data from an io.Reader. The reader can be configured with options, and the caller is responsible for closing the io.Reader.

```APIDOC
## NewWarcFileReaderFromStream

### Description
NewWarcFileReaderFromStream creates a new WarcFileReader from the supplied io.Reader. The WarcFileReader can be configured with options. See WarcRecordOption. It is the responsibility of the caller to close the io.Reader.

### Method
func NewWarcFileReaderFromStream(r io.Reader, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)
```

--------------------------------

### String representation of WARC Fields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Provides a string representation of the WarcFields.
```go
func (wf *WarcFields) String() string
```

--------------------------------

### Create a new countingreader.Reader

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/countingreader

Creates a new Reader that wraps an io.Reader and counts the bytes read through it.

```go
func New(r io.Reader) *Reader
```

--------------------------------

### WithMaxTotalBytes Option Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

WithMaxTotalBytes creates an Option to set the maximum total size of the buffer, including both in-memory and on-disk data.

```go
func WithMaxTotalBytes(size int64) Option
```

--------------------------------

### WithOpenFileSuffix

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets a suffix to be added to the file name while the file is open for writing. The suffix is automatically removed when the file is closed. Defaults to ".open".

```APIDOC
## func WithOpenFileSuffix(suffix string) WarcFileWriterOption

### Description
Sets a suffix to be added to the file name while the file is open for writing. The suffix is automatically removed when the file is closed. Defaults to ".open".
```

--------------------------------

### Option Type Definition

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/diskbuffer

Defines the Option type, which is a function used to configure a Buffer created by the New function.

```go
type Option func(*options)
```

--------------------------------

### WARC Version Definitions

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Defines package-level variables for WARC versions 1.0 and 1.1. Used to specify or check the WARC version.
```go
var (
	// WARC versions
	V1_0 = &WarcVersion{id: 1, txt: "1.0", major: 1, minor: 0} // WARC 1.0
	V1_1 = &WarcVersion{id: 2, txt: "1.1", major: 1, minor: 1} // WARC 1.1
)
```

--------------------------------

### WithWarcInfoFunc

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets a warcinfo-record generator function to be called for every new WARC file created. The function receives a WarcRecordBuilder which is prepopulated with WARC-Record-ID, WARC-Type, WARC-Date and Content-Type. After the submitted function returns, Content-Length and WARC-Block-Digest fields are calculated. When this option is set, records written to the WARC file will have the WARC-Warcinfo-ID automatically set to point to the generated warcinfo record. Use WithRecordOptions to modify the options used to create the WarcInfo record. Defaults to nil (no generation of warcinfo record).

```APIDOC
## func WithWarcInfoFunc(f func(recordBuilder WarcRecordBuilder) error) WarcFileWriterOption

### Description
Sets a warcinfo-record generator function to be called for every new WARC file created. The function receives a WarcRecordBuilder which is prepopulated with WARC-Record-ID, WARC-Type, WARC-Date and Content-Type. After the submitted function returns, Content-Length and WARC-Block-Digest fields are calculated. When this option is set, records written to the WARC file will have the WARC-Warcinfo-ID automatically set to point to the generated warcinfo record. Use WithRecordOptions to modify the options used to create the WarcInfo record. Defaults to nil (no generation of warcinfo record).
```

--------------------------------

### WithMaxMemBytes Option Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/diskbuffer

WithMaxMemBytes creates an Option to set the maximum amount of memory the buffer can use before spilling to disk.
```go
func WithMaxMemBytes(size int64) Option
```

--------------------------------

### New Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/countingreader

Creates a new Reader that wraps an existing io.Reader and counts the bytes read.

```APIDOC
## Function: New

```go
func New(r io.Reader) *Reader
```

New makes a new Reader that counts the bytes read through it.
```

--------------------------------

### Timestamp Layout Constant

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/timestamp

Defines the layout string for 14-digit timestamps.

```go
const Layout14 = "20060102150405"
```

--------------------------------

### WarcVersion Structure

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Represents a WARC specification version. Supports WARC 1.0 and 1.1 for record creation. During parsing, it reflects the version found in the record.

```go
type WarcVersion struct {
	// contains filtered or unexported fields
}
```

--------------------------------

### NewRecordBuilder

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Initializes a WarcRecordBuilder used for creating a new WARC record. The builder implements io.Writer for adding the content block. Record type and headers must be set before building.

```APIDOC
## func NewRecordBuilder

### Description
Initializes a WarcRecordBuilder used for creating a new record. WarcRecordBuilder implements io.Writer for adding the content block. recordType might be 0, but then SetRecordType or AddWarcHeader(WarcType, "myRecordType") must be called before Build is called. When finished with adding headers and writing content, call Build on the WarcRecordBuilder to create a WarcRecord.

### Signature
```go
func NewRecordBuilder(recordType RecordType, opts ...WarcRecordOption) WarcRecordBuilder
```

### Example
```
WARC record: version: WARC/1.1, type: response, id: urn:uuid:e9a0cecc-0221-11e7-adb1-0242ac120008
```
```

--------------------------------

### Enable Strict WARC Validation

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Configure the parser to fail on the first error or specification violation. This sets SyntaxErrorPolicy, SpecViolationPolicy, and UnknownRecordPolicy to ErrFail, and SkipParseBlock to false.

```go
func WithStrictValidation() WarcRecordOption
```

--------------------------------

### Option for Default Digest Algorithm

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the algorithm to use for digest generation. Valid values are 'md5', 'sha1', 'sha256', and 'sha512'. Defaults to 'sha256'.

```go
func WithDefaultDigestAlgorithm(defaultDigestAlgorithm string) WarcRecordOption
```

--------------------------------

### Set WARC Version for New Records

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Specify the WARC version to be used when creating new records. Defaults to WARC/1.1.

```go
func WithVersion(version *WarcVersion) WarcRecordOption
```

--------------------------------

### Set Custom WARC Info Generator Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Use WithWarcInfoFunc to provide a function that generates a warcinfo record for each new WARC file. The function receives a WarcRecordBuilder that is prepopulated with WARC-Record-ID, WARC-Type, WARC-Date, and Content-Type; Content-Length and WARC-Block-Digest are calculated after the function returns. Use WithRecordOptions to adjust the WarcInfo record creation. Defaults to nil (no generation).
```go
func WithWarcInfoFunc(f func(recordBuilder WarcRecordBuilder) error) WarcFileWriterOption
```

--------------------------------

### GetTime Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Retrieves the first value associated with a key and converts it to a time.Time object, expecting RFC 3339 format. Returns an error if conversion fails. Key comparison is case-insensitive.

```go
func (wf *WarcFields) GetTime(name string) (time.Time, error)
```

--------------------------------

### WarcRecordOption

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Options for configuring WARC records during creation or reading.

```APIDOC
## WarcRecordOption

### Description
Options that can be applied when creating or processing WARC records.

### Options

#### WithAddMissingContentLength
```go
func WithAddMissingContentLength(addMissingContentLength bool) WarcRecordOption
```
Enables or disables the automatic addition of the Content-Length header if missing.

#### WithAddMissingDigest
```go
func WithAddMissingDigest(addMissingDigest bool) WarcRecordOption
```
Enables or disables the automatic addition of digest headers if missing.

#### WithAddMissingRecordId
```go
func WithAddMissingRecordId(addMissingRecordId bool) WarcRecordOption
```
Enables or disables the automatic addition of a WARC-Record-ID header if missing.

#### WithBlockErrorPolicy
```go
func WithBlockErrorPolicy(policy ErrorPolicy) WarcRecordOption
```
Sets the policy for handling errors encountered within record blocks.

#### WithBufferMaxMemBytes
```go
func WithBufferMaxMemBytes(size int64) WarcRecordOption
```
Sets the maximum memory size in bytes for in-memory buffering.

#### WithBufferTmpDir
```go
func WithBufferTmpDir(dir string) WarcRecordOption
```
Sets the temporary directory to use for buffering when memory limits are exceeded.
```

--------------------------------

### Option for Buffer Max Memory Bytes

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Sets the maximum amount of memory a buffer is allowed to use before overflowing to disk.

```go
func WithBufferMaxMemBytes(size int64) WarcRecordOption
```

--------------------------------

### N Method

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/countingreader

Returns the total number of bytes that have been read through the Reader so far.

```APIDOC
## Method: (*Reader) N

```go
func (r *Reader) N() int64
```

N gets the number of bytes that have been read so far.
```

--------------------------------

### GetHostName

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal%40v3.1.0

Retrieves the hostname reported by the kernel. Returns 'unknown' if resolution fails.

```APIDOC
## GetHostName

### Description
GetHostName returns the hostname reported by the kernel. If resolution fails, 'unknown' is returned.

### Signature
```go
func GetHostName() string
```
```

--------------------------------

### WarcFileWriter

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Provides methods for writing WARC files, including options for compression, segmentation, and custom file naming.

```APIDOC
## WarcFileWriter

### Description
Represents a writer for WARC files, enabling the creation and management of WARC archives.

### Functions

#### NewWarcFileWriter
```go
func NewWarcFileWriter(opts ...WarcFileWriterOption) *WarcFileWriter
```
Creates a new `WarcFileWriter` with optional configuration.

### Methods

#### Close
```go
func (w *WarcFileWriter) Close() error
```
Closes the WARC file writer, flushing any remaining data.

#### Rotate
```go
func (w *WarcFileWriter) Rotate() error
```
Rotates the current WARC file, starting a new one.

#### String
```go
func (w *WarcFileWriter) String() string
```
Returns a string representation of the writer.
#### Write
```go
func (w *WarcFileWriter) Write(records ...WarcRecord) []WriteResponse
```
Writes one or more WARC records to the file.
```

--------------------------------

### NewMarshaler Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Creates a new Marshaler instance.

```APIDOC
## NewMarshaler Function

### Description
`NewMarshaler` is a constructor function that creates and returns a new instance of the `Marshaler` interface. This allows for the serialization of WARC records.

### Returns
- `Marshaler`: An implementation of the `Marshaler` interface.
```

--------------------------------

### WarcFields.String

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Returns a string representation of the WARC fields.

```APIDOC
## WarcFields.String

### Description
Returns a string representation of the WARC fields.

### Method
func (wf *WarcFields) String() string
```

--------------------------------

### GetInt Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Retrieves the first value associated with a key and converts it to an int. Returns an error if conversion fails. Key comparison is case-insensitive.

```go
func (wf *WarcFields) GetInt(key string) (int, error)
```

--------------------------------

### WarcRecordBuilder Interface

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

The WarcRecordBuilder interface provides methods for constructing WARC records, including writing content, adding headers, and building the final record.

```APIDOC
## type WarcRecordBuilder interface

### Description
Provides methods for constructing WARC records.

### Methods
- **Write(p []byte)** (n int, err error): Writes data to the record builder (implements io.Writer).
- **WriteString(s string)** (n int, err error): Writes a string to the record builder (implements io.StringWriter).
- **ReadFrom(r io.Reader)** (n int64, err error): Reads data from an io.Reader into the record builder (implements io.ReaderFrom).
- **Close()** (err error): Closes the record builder and releases resources (implements io.Closer).
- **AddWarcHeader(name string, value string)**: Adds a WARC header field.
- **AddWarcHeaderInt(name string, value int)**: Adds a WARC header field with an integer value.
- **AddWarcHeaderInt64(name string, value int64)**: Adds a WARC header field with a 64-bit integer value.
- **AddWarcHeaderTime(name string, value time.Time)**: Adds a WARC header field with a time value.
- **Build()** (record WarcRecord, validation []error, err error): Builds and returns the final WarcRecord, along with any validation errors and a potential build error.
- **Size()** (int64): Returns the current size of the record being built.
```

--------------------------------

### Create a new limited countingreader.Reader

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3/internal/countingreader

Creates a new Reader that wraps an io.Reader and counts bytes. Reading stops after maxBytes, returning io.EOF.

```go
func NewLimited(r io.Reader, maxBytes int64) *Reader
```

--------------------------------

### WarcFileReader

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Provides methods for reading WARC files sequentially or from a specific offset.

```APIDOC
## WarcFileReader

### Description
Represents a reader for WARC files, allowing iteration over records.

### Functions

#### NewWarcFileReader
```go
func NewWarcFileReader(filename string, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)
```
Creates a new `WarcFileReader` from a file path.

#### NewWarcFileReaderFromStream
```go
func NewWarcFileReaderFromStream(r io.Reader, offset int64, opts ...WarcRecordOption) (*WarcFileReader, error)
```
Creates a new `WarcFileReader` from an `io.Reader`.

### Methods

#### Close
```go
func (wf *WarcFileReader) Close() error
```
Closes the WARC file reader.

#### Next
```go
func (wf *WarcFileReader) Next() (Record, error)
```
Reads the next WARC record from the file.
#### Records
```go
func (wf *WarcFileReader) Records() iter.Seq2[Record, error]
```
Returns a sequence of WARC records and errors.
```

--------------------------------

### PatternNameGenerator.NewWarcfileName

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Generates a new WARC filename based on the configured pattern. It returns the directory path and the generated filename.

```APIDOC
## PatternNameGenerator.NewWarcfileName

### Description
Generates a new WARC filename based on the configured pattern. It returns the directory path and the generated filename.

### Method
func (*PatternNameGenerator) NewWarcfileName() (string, string)

### Parameters
This method does not accept any parameters.

### Response
- **string**: The directory path for the WARC file (can be empty).
- **string**: The generated WARC filename.
```

--------------------------------

### WarcFileReader.Next

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Reads the next WARC record from the file. It returns the parsed record, its offset and size, and any validation findings. Error handling depends on the configured ErrorPolicy.

```APIDOC
## WarcFileReader.Next

### Description
Next reads the next Record from the WarcFileReader.

The returned Record contains the parsed WarcRecord, its byte offset and size within the file, and any non-fatal validation findings.

The returned values depend on the ErrorPolicy options set on the WarcFileReader:

* ErrIgnore: errors are suppressed. A Record is returned without any validation. An error is only returned if the file is so badly formatted that nothing meaningful can be parsed.
* ErrWarn: a Record is returned. Non-fatal validation findings are collected in the Record.Validation slice, which should be inspected by the caller.
* ErrFail: the first validation failure is returned as err, and Record.WarcRecord may be nil.
* Mixed Policies: different ErrorPolicy values may be set per error category with WithSyntaxErrorPolicy, WithSpecViolationPolicy and WithUnknownRecordTypePolicy. The return values of Next are a mix of the above based on the configured policies.

When at end of file, Record.WarcRecord is nil and err is io.EOF.

### Method
func (wf *WarcFileReader) Next() (Record, error)
```

--------------------------------

### New Function

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/internal/countingreader

Creates a new CountingReader that wraps an existing io.Reader and counts the bytes read.

```APIDOC
## func New

```go
func New(r io.Reader) *Reader
```

New makes a new Reader that counts the bytes read through it.
```

--------------------------------

### WarcRecordBuilder

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Helper for constructing WARC records.

```APIDOC
## WarcRecordBuilder

### Description
A builder pattern for creating `WarcRecord` instances.

### Functions

#### NewRecordBuilder
```go
func NewRecordBuilder(recordType RecordType, opts ...WarcRecordOption) WarcRecordBuilder
```
Creates a new `WarcRecordBuilder` for a given record type with optional configurations.
```

--------------------------------

### WarcRecordOption Functions

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

These functions provide options to configure the WarcRecordBuilder, affecting aspects like header calculation, digest generation, and error handling.

```APIDOC
## WarcRecordOption

### Description
WarcRecordOption configures validation, marshaling and unmarshaling of WARC records.

### Signature
```go
type WarcRecordOption func(*warcRecordOptions)
```
```

```APIDOC
## func WithAddMissingContentLength

### Description
Sets whether a missing Content-Length header should be calculated. When creating records with NewRecordBuilder, a missing Content-Length is always set. This option primarily affects parsing/unmarshalling behavior.
### Signature
```go
func WithAddMissingContentLength(addMissingContentLength bool) WarcRecordOption
```

### Defaults
`false`
```

```APIDOC
## func WithAddMissingDigest

### Description
Sets whether missing Block digest and, possibly, Payload digest header fields should be calculated. Only digest fields are controlled by this option. Record ID and Content-Length are always set for records created with NewRecordBuilder when missing.

### Signature
```go
func WithAddMissingDigest(addMissingDigest bool) WarcRecordOption
```

### Defaults
`false`
```

```APIDOC
## func WithAddMissingRecordId

### Description
Sets whether a missing WARC-Record-ID header should be generated. When creating records with NewRecordBuilder, a missing WARC-Record-ID is always generated. This option primarily affects parsing/unmarshalling behavior.

### Signature
```go
func WithAddMissingRecordId(addMissingRecordId bool) WarcRecordOption
```

### Defaults
`false`
```

```APIDOC
## func WithBlockErrorPolicy

### Description
Sets the policy for handling errors in block parsing. For most records this is the content fetched from the original source and errors here should be ignored.

### Signature
```go
func WithBlockErrorPolicy(policy ErrorPolicy) WarcRecordOption
```

### Defaults
`ErrIgnore`
```

```APIDOC
## func WithBufferMaxMemBytes

### Description
Sets the maximum amount of memory a buffer is allowed to use before overflowing to disk.

### Signature
```go
func WithBufferMaxMemBytes(size int64) WarcRecordOption
```

### Defaults
`1 MiB`
```

```APIDOC
## func WithBufferTmpDir

### Description
Sets the directory to use for temporary files. If not set or dir is the empty string then the default directory for temporary files is used (see os.TempDir).

### Signature
```go
func WithBufferTmpDir(dir string) WarcRecordOption
```
```

```APIDOC
## func WithDefaultDigestAlgorithm

### Description
Sets which algorithm to use for digest generation. Valid values: 'md5', 'sha1', 'sha256' and 'sha512'.
### Signature
```go
func WithDefaultDigestAlgorithm(defaultDigestAlgorithm string) WarcRecordOption
```

### Defaults
`sha256`
```

```APIDOC
## func WithDefaultDigestEncoding

### Description
Sets which encoding to use for digest generation. Valid values: Base16, Base32 and Base64.

Note: Base64 may violate strict WARC digest-value token grammar because Base64 output can contain '/' characters. Generated Base64 digest values are encoded without padding so no '=' characters will be present.

By default, the spec-recommended encoding per algorithm is used: SHA-1 uses uppercase Base32, all others use lowercase Base16.

### Signature
```go
func WithDefaultDigestEncoding(defaultDigestEncoding DigestEncoding) WarcRecordOption
```
```

```APIDOC
## func WithFixContentLength

### Description
Sets whether a Content-Length header whose value does not match the actual content length should be set to the real value. This has no effect if SpecViolationPolicy is ErrIgnore.

### Signature
```go
func WithFixContentLength(fixContentLength bool) WarcRecordOption
```

### Defaults
`false`
```

```APIDOC
## func WithFixDigest

### Description
Sets whether a BlockDigest or PayloadDigest header whose value does not match the actual content should be recalculated. This has no effect if SpecViolationPolicy is ErrIgnore.

### Signature
```go
func WithFixDigest(fixDigest bool) WarcRecordOption
```

### Defaults
`false`
```

```APIDOC
## func WithFixSyntaxErrors

### Description
Sets whether an attempt should be made to fix syntax errors when they are detected. This has no effect if SyntaxErrorPolicy is ErrIgnore.

### Signature
```go
func WithFixSyntaxErrors(fixSyntaxErrors bool) WarcRecordOption
```

### Defaults
`false`
```

--------------------------------

### AddTime Method for WarcFields

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Adds a time.Time key-value pair to WarcFields, converting the time to RFC 3339 format. Key comparison is case-insensitive.
```go
func (wf *WarcFields) AddTime(name string, value time.Time)
```

--------------------------------

### Well-Known Revisit Profile Constants

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Defines constants for standard WARC revisit profiles. Use these when creating or interpreting revisit records.

```go
const (
	// Well known revisit profiles
	ProfileIdenticalPayloadDigestV1_1 = "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest"
	ProfileServerNotModifiedV1_1      = "http://netpreserve.org/warc/1.1/revisit/server-not-modified"
	ProfileIdenticalPayloadDigestV1_0 = "http://netpreserve.org/warc/1.0/revisit/identical-payload-digest"
	ProfileServerNotModifiedV1_0      = "http://netpreserve.org/warc/1.0/revisit/server-not-modified"
)
```

--------------------------------

### Set Open File Suffix for WARC Writer

Source: https://pkg.go.dev/github.com/nlnwa/gowarc/v3

Use WithOpenFileSuffix to add a suffix to the filename while a WARC file is open. The suffix is removed upon closing. Defaults to ".open".

```go
func WithOpenFileSuffix(suffix string) WarcFileWriterOption
```
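
The open-suffix behavior described for WithOpenFileSuffix can be illustrated with the standard library alone. The following is a minimal sketch of the general pattern (write under a temporary suffix, rename when closing), not gowarc's actual implementation. The names `openSuffixFile`, `createWithOpenSuffix`, and `demo` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// openSuffixFile is a hypothetical helper (not part of gowarc) that writes
// to "<name><suffix>" and renames the file to "<name>" when it is closed.
type openSuffixFile struct {
	f         *os.File
	finalPath string
}

// createWithOpenSuffix creates dir/name+suffix for writing.
func createWithOpenSuffix(dir, name, suffix string) (*openSuffixFile, error) {
	finalPath := filepath.Join(dir, name)
	f, err := os.Create(finalPath + suffix)
	if err != nil {
		return nil, err
	}
	return &openSuffixFile{f: f, finalPath: finalPath}, nil
}

// Write forwards to the underlying file.
func (o *openSuffixFile) Write(p []byte) (int, error) { return o.f.Write(p) }

// Close closes the file, then drops the suffix by renaming it.
func (o *openSuffixFile) Close() error {
	if err := o.f.Close(); err != nil {
		return err
	}
	return os.Rename(o.f.Name(), o.finalPath)
}

// demo writes a few bytes in a temporary directory and reports whether the
// suffixed file existed while open and whether the final file exists after Close.
func demo() (openSeen, finalSeen bool, err error) {
	dir, err := os.MkdirTemp("", "warc-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	w, err := createWithOpenSuffix(dir, "example.warc", ".open")
	if err != nil {
		return false, false, err
	}
	if _, err := w.Write([]byte("WARC/1.1\r\n")); err != nil {
		w.Close()
		return false, false, err
	}

	_, statErr := os.Stat(filepath.Join(dir, "example.warc.open"))
	openSeen = statErr == nil

	if err := w.Close(); err != nil {
		return openSeen, false, err
	}
	_, statErr = os.Stat(filepath.Join(dir, "example.warc"))
	finalSeen = statErr == nil
	return openSeen, finalSeen, nil
}

func main() {
	openSeen, finalSeen, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("suffixed file present while open:", openSeen)
	fmt.Println("final file present after close:", finalSeen)
}
```

A suffix of this kind lets other processes watching the output directory tell finished files from ones that are still being written, which is presumably the motivation for the option.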