My Saved Templates

If you have several PDFs with
```

--------------------------------

### Revision and Alternate Extraction Guidance

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Provides guidance to users on how to revise their cell selections or choose an alternate extraction method if the initial data extraction is incorrect.

```HTML

Is the extracted data incorrect?

You can revise your selected cells or try an alternate extraction method.

Revise Selected Cells

Data has been extracted from the cells you selected in the previous step. You can revise your selection(s) to add or remove cells.

```

--------------------------------

### HTML Structure for About Tabula Section

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This HTML snippet defines the 'About Tabula' section, providing information about the project's purpose, target audience, technical details (text-based PDFs vs. scanned), security considerations, user base, credits, and design information. It uses standard HTML tags for content presentation.

```html

About Tabula

Tabula is a tool for liberating data tables trapped inside PDF files.

Tabula was created by journalists for journalists and anyone else working with data locked away in PDFs. Tabula will always be free and open source.

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux.

Caveat: Tabula only works on text-based PDFs, not scanned documents. If you can click-and-drag to select text in your table in a PDF viewer (even if the output is unorganized trash), then your PDF is text-based and Tabula should work.

Security Concerns? Tabula is designed with security in mind. Your PDF and the extracted data never touch the net -- when you use Tabula, as long as your browser's URL bar says "localhost" or "127.0.0.1", all processing takes place on your local machine. Tabula does download a list of Tabula versions from our server to alert you if Tabula has been updated (and we use hits to that list to count how often Tabula is being used); it also downloads a few badges and assets from the web.

Who Uses Tabula?

Tabula is used to power investigative reporting at news organizations of all sizes, including ProPublica, The Times of London, Foreign Policy, La Nación (Argentina) and the St. Paul (MN) Pioneer Press.

Grassroots organizations like SchoolCuts.org rely on Tabula to turn clunky documents into human-friendly public resources.

And researchers of all kinds use Tabula to turn PDF reports into Excel spreadsheets, CSVs, and JSON files for use in analysis and database applications.

Credits

Tabula was created by Manuel Aristarán, Mike Tigas and Jeremy B. Merrill with the support of ProPublica, La Nación DATA, Knight-Mozilla OpenNews, The New York Times, Northwestern University Knight Lab, The Knight Foundation, and The Shuttleworth Foundation. Tabula was designed by Jason Das.

```

--------------------------------

### JavaScript File Upload and Table Utilities

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This snippet loads three vendor scripts used by the interactive interface: a Bootstrap file upload helper (upload-group.js), jQuery Tablesorter for sorting tables, and Spin.js for loading indicators.

```javascript
nestedscript type="text/javascript" src="js/vendor/upload-group.js"
nestedscript type="text/javascript" src="js/vendor/jquery.tablesorter.min.js"
nestedscript type="text/javascript" src="js/vendor/spin.min.js"
```

--------------------------------

### Tabula Template Library

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Displays saved Tabula templates with options to remove and download them. Includes a form for importing new templates.

```html

Template Name	Selection Count	Page Count	Date Added	Remove	Download

Import one or more Tabula Templates

Once you save a Tabula Template, it'll appear here.

```

--------------------------------

### Run Tabula JAR

Source: https://github.com/tabulapdf/tabula/blob/master/build/dist-README.txt

Commands to run the Tabula JAR file from the terminal. Each command sets the file encoding and memory allocation and names the JAR file to execute. The second command also changes the default port, and the third sets the `tabula.openBrowser` property.

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula.jar
```

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=9999 -jar tabula.jar
```

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dtabula.openBrowser=true -jar tabula.jar
```

--------------------------------

### Build WAR file

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Command to build a WAR file, which is a prerequisite for building packaged applications on different platforms.

```bash
WEBSERVER_VERSION=9.4.31.v20200723 MAVEN_REPO=https://repo1.maven.org/maven2 rake war
```

--------------------------------

### Tabula-Java CLI Usage

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Points to the tabula-java command-line interface for automating PDF table extraction, and notes that the tabula-java JAR can be incorporated into JVM projects.

```APIDOC
tabula-java CLI:
  Usage: Incorporate tabula-java JAR into JVM projects (Java, Scala, Clojure) for table extraction.
  Refer to the tabula-java repository for detailed CLI usage and script automation instructions.
```

--------------------------------

### Run Tabula JAR (Linux/Other)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Executes the Tabula JAR file from the command line. This method is suitable for Linux and other platforms where a JAR file can be run. It includes options for encoding, memory allocation, and port configuration.

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -jar tabula.jar
```

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dtabula.openBrowser=true -jar tabula.jar
```

```shell
java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=9999 -jar tabula.jar
```

--------------------------------

### JavaScript Libraries for Tabula

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This section lists essential JavaScript libraries used by the Tabula project for various functionalities, including UI elements, file handling, and potentially data manipulation.

```HTML
```

--------------------------------

### Error Handling for Failed Uploads

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This section displays information about failed uploads, including the original filename, failure message, and any associated warnings. It provides a link to the original PDF file.

```HTML

Failed Uploads

<%= original_filename %>: <%= failure_message %>

<%= filename %>

<%= message %>

Warning: <%= warnings[warning] %>

```

--------------------------------

### Template Management and Selection

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Provides functionality for managing and applying templates to PDF extractions. This includes saving current selections as a template and loading existing templates.

```HTML
<%= original_filename %>
```

--------------------------------

### Troubleshooting Tabula on Mac (Gatekeeper)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Provides steps to resolve the 'Tabula is damaged and can't be opened' error on macOS by adjusting Gatekeeper settings. This involves right-clicking the application and selecting 'Open' to bypass security warnings for unidentified developers.

```APIDOC
Gatekeeper Troubleshooting (Mac):
  Issue: "Tabula is damaged and can't be opened" on Mac OS X 10.8 or later.
  Solution:
    1. Right-click on Tabula.app.
    2. Select 'Open' from the context menu.
    3. Confirm opening the application from an unidentified developer.
  Note: This action remembers your choice and prevents future prompts.
  Refer to OS X Gatekeeper documentation for more details.
```

--------------------------------

### Tabula Template Upload Form

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Handles the import of Tabula templates via a file upload form, specifying the target URL and accepted file types.

```javascript
```

--------------------------------

### R Bindings for Tabula-Java (tabulizer)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Provides information on the 'tabulizer' package, which offers R bindings for tabula-java. This allows R users to leverage Tabula's PDF table extraction capabilities.

```APIDOC
R Bindings (tabulizer):
  Description: Community-supported R bindings for tabula-java.
  Repository: https://github.com/leeper/tabulizer
  Usage: Integrate tabula-java functionality within R projects.
```

--------------------------------

### Extraction Method Selection

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Allows users to choose between different data extraction methods (Stream and Lattice) to improve accuracy when data is not mapped correctly.

```HTML

The current preview uses the Stream extraction method. If the data is not mapped to the correct cells, try the Lattice method instead.

> New version! Tabula <%= new_release.name %> is available (you have <%= api_version %>)

<%= notification.name %> <%= notification.body %>

```

--------------------------------

### Data Export Form

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Handles the submission of extracted data for export. It includes options for selecting the export format (CSV, TSV, JSON, etc.) and triggers the download process.

```HTML
```

--------------------------------

### Python Bindings for Tabula-Java (tabula-py)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Introduces the 'tabula-py' package, offering Python bindings for tabula-java. This allows Python developers to utilize Tabula's PDF table extraction features.

```APIDOC
Python Bindings (tabula-py):
  Description: Community-supported Python bindings for tabula-java.
  Repository: https://github.com/chezou/tabula-py
  Usage: Integrate tabula-java functionality within Python projects.
```

--------------------------------

### File Upload Form

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This HTML form is used for uploading PDF files to the Tabula application. It utilizes Bootstrap for styling and includes input fields for file selection and submission. It also handles potential error messages and progress updates.

```HTML

Import one or more PDFs

Imported PDFs

File Name	Size	Pages	Date Added	Remove	Process

If you have several PDFs with the same layout, you can select the appropriate regions once, then save the selections as a Tabula Template from the Select Tables page. If someone has shared a template with you, you can upload

```

--------------------------------

### Progress Bar Display

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This code snippet illustrates how the upload progress is visually represented using an HTML progress bar. It dynamically updates the width of the progress bar based on the completion percentage.

```HTML

```

--------------------------------

### Node.js Bindings for Tabula-Java (tabula-js)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Details the 'tabula-js' package, which provides Node.js bindings for tabula-java. This enables JavaScript developers to use Tabula for PDF table extraction in Node.js environments.

```APIDOC
Node.js Bindings (tabula-js):
  Description: Community-supported Node.js bindings for tabula-java.
  Repository: https://github.com/ezodude/tabula-js
  Usage: Integrate tabula-java functionality within Node.js projects.
```

--------------------------------

### Troubleshooting Tabula Encoding Issues (Windows)

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Details how to fix 'Encoding::CompatibilityError: incompatible character encodings' on Windows. This involves changing the Command Prompt's codepage to Unicode (65001) before running Tabula.

```APIDOC
Encoding Troubleshooting (Windows):
  Issue: org.jruby.exceptions.RaiseException: (Encoding::CompatibilityError) incompatible character encodings.
  Solution:
    1. Open Command Prompt.
    2. Navigate to Tabula's directory (e.g., `cd C:\Users\Username\Downloads`).
    3. Change codepage: `chcp 65001`.
    4. Run Tabula: `tabula.exe`.
  Note: These commands only affect the current terminal session and Tabula's execution.
```

--------------------------------

### PDF Data Display Logic

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This snippet demonstrates the client-side logic for rendering extracted tabular data from PDFs. It handles cases where data is available, loading, an error occurs, or no data is found. It uses a templating engine (likely Underscore.js) to iterate through tables and rows.

```HTML
<% if(data.length){ %>
<% _(data).each(function(table){ %>
<% _(table).each(function(row){ %>
<% _(row).each(function(cell){ %>
<% }) %>
<% }) %>

<%= cell %>

<% }) %> <% } else if (loading) { %>

<% } else if (error_message) { %>

Tabula couldn't finish processing your request.

<%= error_message %>

<% } else { %> No data. <% } %>
```

--------------------------------

### LSD Line Segment Detector (C)

Source: https://github.com/tabulapdf/tabula/blob/master/build/dist-LICENSE.txt

The Line Segment Detector (LSD) is a C implementation for detecting line segments in digital images. It is based on the publication 'LSD: a Line Segment Detector' and is distributed under the GNU Affero General Public License version 3.

```C
/*
  LSD - Line Segment Detector on digital images

  This code is part of the following publication and was subject to peer review:

    "LSD: a Line Segment Detector" by Rafael Grompone von Gioi, Jeremie Jakubowicz, Jean-Michel Morel, and Gregory Randall, Image Processing On Line, 2012. DOI:10.5201/ipol.2012.gjmr-lsd
    http://dx.doi.org/10.5201/ipol.2012.gjmr-lsd

  Copyright (c) 2007-2011 rafael grompone von gioi

  This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

  You should have received a copy of the GNU Affero General Public License along with this program. If not, see .

  Additional permission under GNU GPL version 3 section 7

  If you modify this Program, or any covered work, by linking or combining it with Tabula (or a modified version of that library), containing parts covered by the terms of "MIT License", the licensors of this Program grant you additional permission to convey the resulting work. Corresponding Source for a non-source form of such a combination shall include the source code for the parts of Tabula used as well as that of the covered work.
*/
```

--------------------------------

### PDFBox Dependency (Java)

Source: https://github.com/tabulapdf/tabula/blob/master/build/dist-LICENSE.txt

The tabula-extractor component utilizes the PDFBox library, specifically target/pdfbox-app-1.8.0.jar. PDFBox is developed at The Apache Software Foundation and is licensed under the Apache License, Version 2.0.

```Java
/*
  This product includes software (target/pdfbox-app-1.8.0.jar) developed at The Apache Software Foundation (http://www.apache.org/).
  Licensed under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0).
*/
```

--------------------------------

### Selection Control and Data Preview

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

Offers controls for managing cell selections within the PDF, including clearing all selections, restoring auto-detected tables, and previewing extracted data.

```HTML
```

--------------------------------

### HTML Table Row for Saved Template Management

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This HTML snippet defines a table row for managing saved templates. It displays the template name with an edit control, the selection and page counts, the upload time, a delete control, and a download button for the template's JSON configuration. It utilizes embedded JavaScript expressions for dynamic data.

```html
<%= name %>
data-templateid=<%= id %> class="glyphicon glyphicon-pencil edit-template-name">
<%= selection_count %> selection<%= selection_count == 1 ? '' : 's' %>
<%= page_count || '??' %>
<%= new Date(parseInt(time) * 1000).toUTCString().slice(5, -7) %>
data-templateid=<%= id %> class="glyphicon glyphicon-remove delete-template">
```

--------------------------------

### HTML Structure for Upload Error Message

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This HTML snippet provides a structure for displaying an upload error message to the user.
It includes a heading and a paragraph that displays the specific error message dynamically using an embedded JavaScript expression.

```html

Tabula

Upload Error

<%= message %>

```

--------------------------------

### HTML Table Row for PDF Data Display

Source: https://github.com/tabulapdf/tabula/blob/master/webapp/index.html

This HTML snippet defines a table row structure for displaying information about uploaded PDFs. It includes links to view the PDF, its size, page count, upload time, and options to delete or extract data. It uses embedded JavaScript expressions for dynamic content.

```html
<%= original_filename %>
<%= size ? Math.floor(size / 1024) : '??' %> kB
<%= page_count || '??' %>
<%= new Date(parseInt(time) * 1000).toUTCString().slice(5, -7) %>
data-pdfid=<%= id %> class="glyphicon glyphicon-remove delete-pdf">
```

--------------------------------

### Changing Tabula's Default Port

Source: https://github.com/tabulapdf/tabula/blob/master/README.md

Explains how to change the default port (8080) Tabula uses if it conflicts with other applications. This is done by specifying the `warbler.port` property when running Tabula from the terminal.

```APIDOC
Port Conflict Resolution:
  Issue: Another program uses port 8080, preventing Tabula from starting or causing incorrect loading in the browser.
  Solution: Run Tabula from the terminal with a custom port using the `warbler.port` property:
    `java -Dfile.encoding=utf-8 -Xms256M -Xmx1024M -Dwarbler.port=9999 -jar tabula.jar`
  Replace `9999` with your desired port number.
```
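
The port override described above can also be assembled programmatically. The following is a minimal Node.js sketch and is not part of Tabula: `isPortFree` and `tabulaCommand` are hypothetical helper names, and the only details taken from the README are the `java` flags and the `warbler.port` property.

```javascript
const net = require('net');

// Hypothetical helper: resolves to true if this process can bind the port
// on the given host, false if binding fails (for example, port in use).
function isPortFree(port, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once('error', () => resolve(false));
    server.once('listening', () => server.close(() => resolve(true)));
    server.listen(port, host);
  });
}

// Hypothetical helper: builds the argument list from the README command,
// substituting the requested port into -Dwarbler.port.
function tabulaCommand(port) {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return [
    'java', '-Dfile.encoding=utf-8', '-Xms256M', '-Xmx1024M',
    `-Dwarbler.port=${port}`, '-jar', 'tabula.jar',
  ];
}
```

For example, `isPortFree(8080).then((free) => console.log(tabulaCommand(free ? 8080 : 9999).join(' ')))` prints a command line that falls back to port 9999 when 8080 cannot be bound. The check only covers the loopback interface, so treat its result as a hint rather than a guarantee.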

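The table-row templates earlier in this section format the upload time and file size with inline expressions. As a rough illustration, the same expressions can be pulled into plain functions. The names `formatUploadTime` and `formatSizeKb` are made up for this sketch and do not appear in Tabula's source, and the explicit radix passed to `parseInt` is an addition.

```javascript
// Unix seconds -> "DD Mon YYYY HH:MM" in UTC.
// toUTCString() yields e.g. "Sun, 13 Sep 2020 12:26:40 GMT"; slice(5, -7)
// drops the leading weekday and the trailing ":SS GMT".
function formatUploadTime(time) {
  return new Date(parseInt(time, 10) * 1000).toUTCString().slice(5, -7);
}

// Whole kilobytes, or '??' when the size is missing or zero.
function formatSizeKb(size) {
  return size ? Math.floor(size / 1024) : '??';
}

console.log(formatUploadTime('1600000000')); // 13 Sep 2020 12:26
console.log(formatSizeKb(2048) + ' kB');     // 2 kB
```

Because the timestamp is rendered with `toUTCString()`, the displayed time is always UTC, not the viewer's local time.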