### Installing rpy2-arrow with All Optional Dependencies (Bash)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Installs the rpy2-arrow package along with all its optional dependencies, including those for Polars, using pip. This is an alternative way to ensure Polars support is installed.

```bash
pip install 'rpy2-arrow[all]'
```

--------------------------------

### Install rpy2-arrow development version

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/index.rst

Installs the current development version of the rpy2-arrow Python package directly from the main branch of the GitHub repository using pip.

```bash
pip install -e git+https://github.com/rpy2/rpy2-arrow.git@main#egg=rpy2_arrow
```

--------------------------------

### Installing rpy2-arrow package (bash)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/README.md

Installs the rpy2-arrow Python package from PyPI using the pip package manager. This command is typically run in a terminal or command prompt.

```bash
pip install rpy2-arrow
```

--------------------------------

### Install rpy2-arrow via pip

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/index.rst

Installs the latest released version of the rpy2-arrow Python package from PyPI using the pip package manager.

```bash
pip install rpy2-arrow
```

--------------------------------

### Creating a Sample Polars DataFrame (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Demonstrates how to create a simple `polars.DataFrame` instance in Python. This DataFrame is used as the source object for subsequent conversion examples between Python and R.
```python
podataf = polars.DataFrame({'a': [1, 2], 'b': [3, 4]})
print('Python polars.DataFrame:')
print(podataf)
```

--------------------------------

### Installing rpy2-arrow with Polars Support (Bash)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Installs the rpy2-arrow package with the optional dependencies required for Polars integration using pip. This ensures the necessary Python packages are available for Polars conversion rules.

```bash
pip install 'rpy2-arrow[polars]'
```

--------------------------------

### Loading R Package with rpy2 - R

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Loads the `dplyr` package in the R environment managed by rpy2, suppressing informational messages during the loading process. This is a common setup step before using `dplyr` functions.

```R
suppressMessages(require(dplyr))
```

--------------------------------

### Install R arrow package

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/index.rst

Installs the required R 'arrow' package from CRAN within an R environment. This package is necessary for rpy2-arrow to function correctly and avoid issues like segfaults.

```r
install.packages("arrow")
```

--------------------------------

### List Attributes of R Arrow Object in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/basic-usage.rst

This snippet shows how to access and list the attributes (methods and properties) available on the R Arrow object (`r_array`) obtained from the conversion. It treats the R object as a mapping and uses the `.keys()` method to get a view of its attributes, which is then converted to a tuple for display.
```Python
tuple(r_array.keys())
```

--------------------------------

### Creating a large pandas DataFrame

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Imports the pandas library and creates a DataFrame named `pd_dataf` with a large number of rows (`_N`) and two columns ('x' and 'y') for use in performance testing examples.

```Python
import pandas as pd

# Number of rows in the DataFrame.
_N = 500000
pd_dataf = pd.DataFrame({'x': range(_N),
                         'y': ['abc', 'def'] * (_N//2)})
```

--------------------------------

### Importing rpy2-arrow and PyArrow Libraries

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Imports the necessary Python libraries for working with Apache Arrow and the rpy2-arrow conversion module. `pyarrow` is the core Arrow library, `pyarrow.dataset` is used for working with datasets spanning multiple files, and `rpy2_arrow.pyarrow_rarrow` provides the conversion utilities.

```python
import pyarrow
import pyarrow.dataset as ds
import rpy2_arrow.pyarrow_rarrow as pyra
```

--------------------------------

### Loading rpy2 IPython Extension

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Loads the `rpy2.ipython` extension in a Jupyter/IPython environment using the `%load_ext` magic command, which enables the use of R magic commands like `%%R` for interacting with R.

```Python
%load_ext rpy2.ipython
```

--------------------------------

### Loading Parquet Files into PyArrow Dataset

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Creates a PyArrow Dataset object from the list of local file paths (`paths`) collected during the download process. This allows treating the collection of Parquet files as a single logical dataset without loading everything into memory at once.
```python
dataset = ds.dataset(paths, format='parquet')
dataset
```

--------------------------------

### Benchmarking Direct PyArrow Table to R Conversion

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Measures the time taken to convert the entire PyArrow Table (`tbl`) directly into an R object using `pyra.converter.py2rpy` (the converter from `rpy2_arrow.pyarrow_rarrow`). This uses the IPython `%%time` magic command for convenient timing.

```python
%%time
r_tbl = pyra.converter.py2rpy(tbl)
```

--------------------------------

### Loading rpy2 IPython Extension in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Loads the rpy2 extension for IPython/Jupyter notebooks, enabling the use of `%%R` magic commands to execute R code directly within Python cells.

```python
%load_ext rpy2.ipython
```

--------------------------------

### Downloading NYC Taxi Dataset

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Python script to download a subset of the NYC taxi dataset in Parquet format from a public S3 bucket. It iterates through years and months, checks whether each file already exists locally, and downloads it if necessary, creating directories as needed. `MAX_NMONTHS` can be set to limit the download size. The script expects `DATA_PATH` to be defined beforehand.

```python
# This allows downloading an incomplete dataset
# in the interest of time. Set it to None or -1
# to download the complete dataset.
MAX_NMONTHS = 10

import os
import urllib.parse
import urllib.request
import shutil

BUCKET = 'https://ursa-labs-taxi-data.s3.us-east-2.amazonaws.com'

paths = []
print(' | |')
for year in range(2009, 2020):
    if len(paths) == MAX_NMONTHS:
        print()
        break
    print(f'{year} ', end='', flush=True)
    if year == 2019:
        # We only have through June 2019 there
        months = range(1, 7)
    else:
        months = range(1, 13)
    for month in months:
        if len(paths) == MAX_NMONTHS:
            print()
            break
        month_str = f'{month:02d}'
        year_str = str(year)
        url = urllib.parse.urljoin(BUCKET, '/'.join((year_str, month_str, 'data.parquet')))
        filename = os.path.join(DATA_PATH, year_str, month_str, 'data.parquet')
        if os.path.exists(filename):
            # 's' marks a file that is already present locally (skipped).
            print('s', end='', flush=True)
            paths.append(filename)
            continue
        # 'D' marks a file that is being downloaded.
        print('D', end='', flush=True)
        # exist_ok avoids an error if the directory is left over from an earlier run.
        os.makedirs(os.path.join(DATA_PATH, year_str, month_str), exist_ok=True)
        with urllib.request.urlopen(url) as response, open(filename, 'wb') as output_file:
            shutil.copyfileobj(response, output_file)
        paths.append(filename)
    print()
```

--------------------------------

### Benchmarking PyArrow Column Access and R Conversion

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Uses the `timeit` module to measure the performance of accessing individual columns (which are ChunkedArrays) from the PyArrow Table and then converting those individual columns to their corresponding R objects using `pyra.converter.py2rpy`. It reports, for each column, the average time over `N` runs.
```python
import timeit

N = 3
for col_i in (0, 1, 5):
    print(f'Column: {tbl.schema.types[col_i]}')
    t_getitem = timeit.timeit(lambda: tbl[col_i], number=N) / N
    print(f' getitem: {t_getitem:.2e}s', end='', flush=True)
    array = tbl[col_i]
    t_convert = timeit.timeit(lambda: pyra.converter.py2rpy(array), number=N) / N
    print(f' to R: {t_convert:.2e}s')
```

--------------------------------

### Generating Plot with ggplot2 on Imported R Object

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Executes R code using the `%%R` magic command with options for plot output size and type. It uses the `r_tbl` object already available in R, loads `ggplot2` and `viridis`, and creates a hexagonal bin plot visualizing the relationship between fare amount and tip percentage using the imported Arrow data.

```r
%%R -w 800 -h 600 --type cairo-png
library(ggplot2, warn.conflicts = FALSE)
library(viridis)
options(bitmapType="cairo")
X11.options(antialias = "subpixel")
p <- ggplot(r_tbl %>% collect()) +
    geom_hex(aes(x = fare_amount, y = tip_amount/fare_amount), bins = 75) +
    scale_fill_viridis(trans="log10") +
    scale_y_continuous("tip", labels = scales::percent, trans="log10") +
    ggtitle("Tip as a percentage of the fare") +
    theme_gray(base_size=19)
print(p)
```

--------------------------------

### Loading dplyr Package in R

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Loads the `dplyr` package in the R session, demonstrating that data manipulation packages can work with the Arrow-backed data structure imported from Python.

```r
%%R
suppressMessages(require(dplyr))
```

--------------------------------

### Benchmarking Combined-Chunks PyArrow Table to R Conversion

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Measures the time taken to first combine the chunks within the PyArrow Table (`tbl`) into a new table (`cb_tbl`) and then convert this combined-chunks table to an R object.
This often results in significantly faster conversion compared to converting a table with many small chunks.

```python
%%time
cb_tbl = tbl.combine_chunks()
r_tbl = pyra.converter.py2rpy(cb_tbl)
```

--------------------------------

### Loading rpy2 IPython Extension

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Loads the rpy2 extension within an IPython environment (like Jupyter notebooks). This enables the use of magic commands such as `%%R` to execute R code directly within Python cells.

```python
%load_ext rpy2.ipython
```

--------------------------------

### Defining Data Path Constant

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Defines a string constant `DATA_PATH` which specifies the local directory name where the NYC taxi dataset files will be stored after downloading.

```python
DATA_PATH = 'nyc-taxi'
```

--------------------------------

### Importing Required Libraries for Polars Conversion (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Imports the necessary Python libraries: `polars` for DataFrame creation, `rpy2.robjects` for R interaction, and `rpy2_arrow.polars` for the specific Polars conversion functions. These imports are prerequisites for converting Polars DataFrames.

```python
import polars
import rpy2.robjects
import rpy2_arrow.polars as rpy2polars
```

--------------------------------

### Creating pandas DataFrame in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Initializes a pandas DataFrame with a large number of rows for performance testing. The size is controlled by the `_N` variable to allow for noticeable conversion time differences.

```python
import pandas as pd

# Number of rows in the DataFrame.
_N = 500000
pd_dataf = pd.DataFrame({'x': range(_N),
                         'y': ['abc', 'def'] * (_N//2)})
```

--------------------------------

### Importing pandas DataFrame into R using Default rpy2 Converter

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Demonstrates importing a pandas DataFrame (`pd_dataf`) into an R session within a Jupyter notebook using the default rpy2 conversion mechanism. It measures the time taken and prints the head of the resulting R data.frame.

```r
%%time
%%R -i pd_dataf
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Converting PyArrow Dataset to Table with Filter

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Reads the PyArrow Dataset into a single PyArrow Table (`tbl`). A filter (`ds.field('tip_amount') > 10`) is applied during conversion to reduce memory usage by only including rows where the tip amount is greater than 10. The `batch_size` parameter controls the maximum number of rows in each record batch read from the dataset; it is written here as an integer rather than the float `5E6`.

```python
tbl = dataset.to_table(filter=ds.field('tip_amount') > 10,
                       batch_size=5_000_000)
tbl.shape
```

--------------------------------

### Transferring pandas DataFrame to R (Default Conversion)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Uses the `%%R` magic command with the `-i` flag to transfer the `pd_dataf` pandas DataFrame to the R environment using the default rpy2 conversion. It then prints the head of the resulting R object and removes it, measuring the execution time with `%%time`.

```R
%%time
%%R -i pd_dataf
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Execute R Code String with R Arrow Object in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/basic-usage.rst

This snippet demonstrates executing a block of R code provided as a string from Python, using the R Arrow object.
It defines an R code string, sets the R Arrow object (`r_array`) into the R context under the name `my_array`, and then executes the R code using `rpy2.robjects.r()`.

```Python
r_code = """
## this assumes a R Arrow array my_array
res = sum(my_array > 5)
print(res)
"""

import rpy2.rinterface
import rpy2.robjects

with rpy2.rinterface.local_context() as r_context:
    r_context['my_array'] = r_array
    res = rpy2.robjects.r(r_code)
```

--------------------------------

### Converting Python Polars DataFrame to R using Helper (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Illustrates using the `rpy2_arrow.polars.pl_to_rpl()` convenience function to convert a Python `polars.DataFrame` directly into an R Polars DataFrame object. This provides a simpler syntax compared to using the context manager for direct conversion.

```python
r_podataf = rpy2polars.pl_to_rpl(podataf)
print(r_podataf)
```

--------------------------------

### Call ToString Method on R Arrow Object in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/basic-usage.rst

This snippet demonstrates calling a method (`ToString`) on the R Arrow object (`r_array`) from Python. It accesses the method like a dictionary item and calls it, then joins the resulting output (which appears to be a list of strings) into a single string for printing.

```Python
print(
    ''.join(
        r_array['ToString']()
    )
)
```

--------------------------------

### Accessing PyArrow Table Schema Types

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Accesses the schema of the PyArrow Table and retrieves the data types for each column. This shows the structure and types of the data loaded into the table.
```python
tbl.schema.types
```

--------------------------------

### Converting R Object to Python Polars DataFrame using Context (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Demonstrates converting an R object (previously created from a Python Polars DataFrame) back into a Python `polars.DataFrame`. This is achieved by accessing the R object from the global environment within the `rpy2_arrow.polars.converter.context()`. It shows the round trip conversion.

```python
with rpy2polars.converter.context() as cv_ctx:
    podataf_back = rpy2.robjects.globalenv['r_podataf']
print('Python polars_back:')
print(podataf_back)
```

--------------------------------

### Creating pyarrow Table from pandas DataFrame - Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Converts a pandas DataFrame (`pd_dataf`) into an Apache Arrow Table using the `pyarrow` library. This is a crucial step for enabling efficient data sharing between Python and R using Arrow's in-memory format.

```Python
tbl = pyarrow.lib.Table.from_pandas(pd_dataf)
```

--------------------------------

### Defining Custom rpy2 Converter for pandas to R data.frame via Arrow in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Creates a custom rpy2 converter that utilizes Apache Arrow as an intermediate step to accelerate the conversion of pandas DataFrames to R data.frames. It registers a conversion function that transforms a pandas DataFrame into a pyarrow Table and then converts the Arrow Table to an R data.frame.

```python
import pyarrow
from rpy2.robjects.packages import importr
import rpy2.robjects.conversion
import rpy2.rinterface
import rpy2_arrow.pyarrow_rarrow as pyra

base = importr('base')

# We use the converter included in rpy2-arrow as template.
conv = rpy2.robjects.conversion.Converter('Pandas to data.frame',
                                          template=pyra.converter)

@conv.py2rpy.register(pd.DataFrame)
def py2rpy_pandas(dataf):
    pa_tbl = pyarrow.Table.from_pandas(dataf)
    # pa_tbl is a pyarrow table, and this is something
    # that the converter shipping with rpy2-arrow knows how to handle.
    return base.as_data_frame(pa_tbl)

# We build a custom converter that is the default converter for
# ipython/jupyter shipping with rpy2, to which we add rules for
# Arrow + pandas we just made.
conv = rpy2.ipython.rmagic.converter + conv
```

--------------------------------

### Defining Custom rpy2 Converter (Pandas to R via Arrow)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Imports necessary libraries (pyarrow, rpy2, rpy2-arrow) and defines a custom rpy2 converter (`conv`) that converts a pandas DataFrame to an Arrow Table using pyarrow, then leverages the rpy2-arrow converter to handle the Arrow Table to R conversion, finally casting it to an R data.frame. This custom converter is added to the default rpy2 IPython converter.

```Python
import pyarrow
from rpy2.robjects.packages import importr
import rpy2.robjects.conversion
import rpy2.rinterface
import rpy2_arrow.arrow as pyra

base = importr('base')

# We use the converter included in rpy2-arrow as template.
conv = rpy2.robjects.conversion.Converter(
    'Pandas to data.frame',
    template=pyra.converter)

@conv.py2rpy.register(pd.DataFrame)
def py2rpy_pandas(dataf):
    pa_tbl = pyarrow.Table.from_pandas(dataf)
    # pa_tbl is a pyarrow table, and this is something
    # that the converter shipping with rpy2-arrow knows
    # how to handle.
    return base.as_data_frame(pa_tbl)

# We build a custom converter that is the default converter
# for ipython/jupyter shipping with rpy2, to which we add
# rules for Arrow + pandas we just made.
conv = rpy2.ipython.rmagic.converter + conv
```

--------------------------------

### Accessing and Removing Shared Arrow Table in R via rpy2 - R

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Accesses an Apache Arrow Table (`tbl`) that has been shared from Python into the R environment via rpy2-arrow's converter. It prints the first few rows of the table using `head()` and then removes the object from the R environment using `rm()`. This demonstrates interaction with the shared data structure in R.

```R
print(head(tbl))
rm(tbl)
```

--------------------------------

### Converting R Object to Python Polars DataFrame using Helper (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Shows how to use the `rpy2_arrow.polars.rpl_to_pl()` convenience function to convert an R Polars DataFrame object back into a Python `polars.DataFrame`. This offers a direct and simplified approach for the reverse conversion.

```python
podataf_back_2 = rpy2polars.rpl_to_pl(r_podataf)
print(podataf_back_2)
```

--------------------------------

### Defining Custom rpy2 Converter (Pandas to Arrow Table)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Defines a second custom rpy2 converter (`conv2`) that converts a pandas DataFrame directly into a pyarrow Table and then uses the rpy2-arrow converter to make this Arrow Table available in R. This converter is also added to the default rpy2 IPython converter.
```Python
conv2 = rpy2.robjects.conversion.Converter(
    'Pandas to pyarrow',
    template=pyra.converter)

@conv2.py2rpy.register(pd.DataFrame)
def py2rpy_pandas(dataf):
    pa_tbl = pyarrow.Table.from_pandas(dataf)
    return pyra.converter.py2rpy(pa_tbl)

conv2 = rpy2.ipython.rmagic.converter + conv2
```

--------------------------------

### Using dplyr with Arrow-backed Data in R

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Demonstrates performing a data manipulation task (grouping and summarizing) using the `dplyr` package on the data imported into R via the Arrow-based converter (`conv2`). This shows compatibility and potential performance benefits when working directly with the Arrow-backed data in R.

```r
%%time
%%R -i pd_dataf -c conv2
pd_dataf %>%
    group_by(y) %>%
    summarize(n = length(x))
```

--------------------------------

### Convert PyArrow Array to R Arrow Object in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/basic-usage.rst

This snippet demonstrates the basic conversion of a Python PyArrow array to an R Arrow object using rpy2_arrow. It imports the necessary libraries, creates a PyArrow array, and then uses `pyra.pyarrow_to_r_array` to perform the conversion, making the R object available in Python.

```Python
import pyarrow
import rpy2_arrow.arrow as pyra

# Create an array on the Python side using pyarrow
py_array = pyarrow.array(range(10))
# Pass the C/C++ pointer to an R arrow object
r_array = pyra.pyarrow_to_r_array(py_array)
```

--------------------------------

### Importing pandas DataFrame into R as Arrow Table using Custom Converter

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Imports the pandas DataFrame into R using the second custom converter (`conv2`), which converts the data into an R object wrapping an Arrow table. Measures the time taken and prints the head of the R object.
```r
%%time
%%R -i pd_dataf -c conv2
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Defining Custom rpy2 Converter for pandas to R Arrow Table in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Defines a second custom rpy2 converter that converts a pandas DataFrame to a pyarrow Table and then passes the pyarrow Table to the rpy2-arrow converter. This results in an R object that wraps an Arrow table, potentially reducing memory copies during conversion.

```python
conv2 = rpy2.robjects.conversion.Converter('Pandas to pyarrow',
                                           template=pyra.converter)

@conv2.py2rpy.register(pd.DataFrame)
def py2rpy_pandas(dataf):
    pa_tbl = pyarrow.Table.from_pandas(dataf)
    return pyra.converter.py2rpy(pa_tbl)

conv2 = rpy2.ipython.rmagic.converter + conv2
```

--------------------------------

### Using dplyr on Imported R Object

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Executes R code using the `%%R` magic command, importing the `r_tbl` object from Python. It loads the `dplyr` library and performs data manipulation on `r_tbl`, specifically creating a new column `tip_group` and counting the occurrences within each group.

```r
%%R -i r_tbl
library(dplyr, warn.conflicts = FALSE)
r_tbl %>%
    mutate(tip_group = round(tip_amount / 5) * 5) %>%
    count(tip_group)
```

--------------------------------

### Importing pandas DataFrame into R using Custom Arrow Converter

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster_rpy2_conversion.ipynb

Imports the pandas DataFrame into R using the custom converter (`conv`) that leverages Apache Arrow for faster conversion to an R data.frame. It measures the time taken and prints the class and head of the resulting R object.
```r
%%time
%%R -i pd_dataf -c conv
print(class(pd_dataf))
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Converting Python Polars DataFrame to R using Context (Python)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/polars.rst

Shows how to convert a Python `polars.DataFrame` to an R object by assigning it to an R global environment variable within the `rpy2_arrow.polars.converter.context()`. This method utilizes the active conversion rules to handle the data transfer.

```python
with rpy2polars.converter.context() as cv_ctx:
    rpy2.robjects.globalenv['r_podataf'] = podataf
print('R polars::pl$DataFrame:')
rpy2.robjects.r('print(r_podataf)')
```

--------------------------------

### Transferring pandas DataFrame to R (Custom Arrow Table Conversion)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Uses the `%%R` magic command with the `-i` flag and the second custom converter (`-c conv2`) to transfer the `pd_dataf` pandas DataFrame to R, making it available as an Arrow Table. It prints the head of the object and removes it, measuring the execution time.

```R
%%time
%%R -i pd_dataf -c conv2
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Performing dplyr Grouped Aggregation via rpy2 - R

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Executes R code using the `dplyr` package via rpy2. It performs a grouped aggregation on an input data structure (`pd_dataf`), grouping by column `y` and calculating the count, minimum, and mean of column `x`. This demonstrates using R functions on data potentially shared from Python.
```R
res <- pd_dataf %>%
    group_by(y) %>%
    summarize(n = length(x), min = min(x), avg = mean(x))
print(res)
```

--------------------------------

### Transferring pandas DataFrame to R (Custom Arrow Conversion)

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/faster-rpy2-conversion.rst

Uses the `%%R` magic command with the `-i` flag and the custom converter (`-c conv`) to transfer the `pd_dataf` pandas DataFrame to R using the Arrow-based conversion. It prints the class and head of the resulting R object (an R data.frame) and removes it, measuring the execution time.

```R
%%time
%%R -i pd_dataf -c conv
print(class(pd_dataf))
print(head(pd_dataf))
rm(pd_dataf)
```

--------------------------------

### Checking Class of Imported R Object

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/notebooks/demo.ipynb

Uses the `%%R` magic command to execute R code. The `-i r_tbl` option makes the Python variable `r_tbl` available in the R environment. The R code then prints the class of the imported object to confirm its type in R.

```r
%%R -i r_tbl
print(class(r_tbl))
```

--------------------------------

### Use R Base Function with R Arrow Object in Python

Source: https://github.com/rpy2/rpy2-arrow/blob/main/doc/basic-usage.rst

This snippet shows how to use an R base function (`sum`) with the R Arrow object (`r_array`) from Python. It imports the R base package using `rpy2.robjects.packages.importr` and then calls the `sum` function, passing the R Arrow object as an argument.

```Python
import rpy2.robjects.packages as packages
base = packages.importr('base')
print(base.sum(r_array))
```
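--------------------------------

### Timing a Conversion Outside of IPython (sketch)

Several snippets in this collection rely on the `%%time` cell magic, which is only available in IPython/Jupyter. As a rough stand-in for plain Python scripts, the sketch below wraps any callable with `time.perf_counter` from the standard library. The helper name `timed` is hypothetical (it is not part of rpy2 or rpy2-arrow), and `sorted` is used only as a placeholder workload so that the example runs without R installed.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Placeholder workload standing in for a conversion call.
result, elapsed = timed(sorted, [3, 1, 2])
print(f'elapsed: {elapsed:.2e}s')
```

With rpy2-arrow installed, the same helper could presumably be applied to a conversion, for example `timed(pyra.converter.py2rpy, tbl)`, where `pyra` and `tbl` are the names used in the benchmarking snippets.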