Linux News
The world is talking about GNU/Linux and Free/Open Source Software

Using Spark DataFrames for large scale data science

Posted by bob on Mar 26, 2015 3:46 PM EDT
Opensource.com

When we first open-sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API: tasks that used to take thousands of lines of code to express could be reduced to dozens.
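To make "functional transformations on collections" concrete, here is a minimal word-count sketch. It is an illustration only: it runs on a single machine with plain Python rather than Spark, and the input `lines` and the helper `merge` are hypothetical names chosen for this example. The comments note the RDD operations (`flatMap`, `map`, `reduceByKey`) that each step loosely mirrors.

```python
from functools import reduce
from itertools import chain

# Hypothetical input; in Spark this would be a distributed collection (RDD) of lines.
lines = [
    "spark makes distributed processing simple",
    "spark exposes functional transformations",
]

# Mirrors rdd.flatMap: split every line into words and flatten the result.
words = chain.from_iterable(line.split() for line in lines)

# Mirrors rdd.map: pair each word with an initial count of 1.
pairs = map(lambda word: (word, 1), words)


def merge(counts, pair):
    """Mirrors rdd.reduceByKey: add up the counts of identical words."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


word_counts = reduce(merge, pairs, {})
print(word_counts["spark"])
```

The whole pipeline is three short transformations chained together, which gives a feel for how this style keeps programs compact. In Spark the same shape would run across a cluster instead of a local list.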

» Read more about: Groups: Python; Story Type: News Story
