From: Dirk Eddelbuettel
Newsgroups: linux.debian.bugs.dist,linux.debian.maint.python,linux.debian.devel
Subject: Bug#970021: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)
Date: Sun, 31 Mar 2024 14:00:01 +0200
Reply-To: Dirk Eddelbuettel, 970021@bugs.debian.org
Cc: Diane Trout, debian-science@lists.debian.org, 970021@bugs.debian.org, debian-python@lists.debian.org, debian-devel@lists.debian.org

Julian,

Arrow is a large and complicated package. We use it at work (where there is a fair amount of Python, also through Conda etc.) and do have issues with more complex builds, especially because it is 'data infrastructure' and can come in from different parts. I would recommend against packaging an old version -- we have also seen issues with different (py)arrow versions biting.

Have you seen https://github.com/apache/arrow-nanoarrow ? It works via the Arrow C API, which interchanges data via two void* pointing to the two structs for the Arrow array and schema -- and so avoids linkage issues. (In user space the pyarrow or R arrow packages can still be used, also interfacing via these.) I have been using it for R package bindings for some time and we plan to expand that (again, at work) -- as do others. It is already used by duckdb, by the Arrow 'ADBC' interfaces (which are generic in the ODBC/JDBC sense, but for Arrow), and also by a Python interface to Snowflake.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd@debian.org
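
To make the "two void* to two structs" point above concrete, here is a minimal sketch in plain C. The two struct definitions follow the published Arrow C data interface; everything else (the function names export_int32_demo and sum_int32, and the three sample values) is hypothetical and only for illustration. It is not code from nanoarrow, and it links against no Arrow library at all, which is the property that sidesteps the linkage problem.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layouts as given in the Arrow C data interface specification. */
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* The schema below owns no heap memory, so releasing only marks it released. */
static void release_schema(struct ArrowSchema* schema) {
  schema->release = NULL;
}

/* Frees the data buffer and the buffer table allocated by the producer. */
static void release_array(struct ArrowArray* array) {
  free((void*)array->buffers[1]);
  free((void*)array->buffers);
  array->release = NULL;
}

/* Hypothetical producer: fills the two caller-provided structs with an
 * int32 array holding {1, 2, 3}. Returns 0 on success, -1 on allocation
 * failure. Both arguments are plain void*, as described in the post. */
int export_int32_demo(void* schema_ptr, void* array_ptr) {
  struct ArrowSchema* schema = (struct ArrowSchema*)schema_ptr;
  struct ArrowArray* array = (struct ArrowArray*)array_ptr;

  int32_t* values = (int32_t*)malloc(3 * sizeof *values);
  const void** buffers = (const void**)malloc(2 * sizeof *buffers);
  if (values == NULL || buffers == NULL) {
    free(values);
    free((void*)buffers);
    return -1;
  }
  values[0] = 1; values[1] = 2; values[2] = 3;
  buffers[0] = NULL;    /* validity bitmap may be absent when null_count == 0 */
  buffers[1] = values;  /* data buffer */

  memset(schema, 0, sizeof *schema);
  schema->format = "i";  /* format string for int32 */
  schema->name = "";
  schema->release = release_schema;

  memset(array, 0, sizeof *array);
  array->length = 3;
  array->n_buffers = 2;
  array->buffers = buffers;
  array->release = release_array;
  return 0;
}

/* Hypothetical consumer: sums an int32 array without nulls. */
int64_t sum_int32(void* schema_ptr, void* array_ptr) {
  const struct ArrowSchema* schema = (const struct ArrowSchema*)schema_ptr;
  const struct ArrowArray* array = (const struct ArrowArray*)array_ptr;
  assert(strcmp(schema->format, "i") == 0);
  assert(array->null_count == 0);

  const int32_t* values = (const int32_t*)array->buffers[1];
  int64_t total = 0;
  for (int64_t i = 0; i < array->length; i++) {
    total += values[array->offset + i];
  }
  return total;
}
```

A caller would declare one struct ArrowSchema and one struct ArrowArray, pass their addresses to the producer and then to the consumer, and finally call each struct's release callback. Producer and consumer only have to agree on the struct layout, so they can live in separately built libraries.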