How to choose between R and Python for careers in data science

Data scientists are a hot commodity in the tech world. Discovering which coding language will help you achieve the Big Data career of your dreams is the first step.
WorkingNation’s Jaimie Stevens.

Data science is a continually evolving field at the forefront of the AI revolution, in a future where data-driven decisions are becoming the norm around the world. Data scientists are, in essence, the detectives of the tech industry.

A good data scientist needs data intuition, an understanding of cause and effect, the motivation to find trends and identify variables, the skill to verify whether a method fits a model, and the ability to communicate their findings.

But behind every data scientist is a programming language, the essential tool that lets them put those capabilities to work. Usually, it comes down to one of two languages: R or Python.

Both languages can be used for machine learning, for working with large datasets, and for creating complex visualizations. Both are free and open source, which contributes to their popularity and has helped each build a large collection of libraries.

You will find numerous surveys comparing the popularity of these two languages. Recent polls that focus specifically on languages used for data analysis show R as the clear winner, even in a direct comparison with Python. However, more and more people are switching from R to Python.


While these numbers show that both languages are flourishing, it's hard to compare them side by side, mainly because R is used almost exclusively in data science and statistics, while Python is used far more broadly.

Even though you probably won't be left behind whichever one you choose, how do you decide which one to learn? Is one more cutting-edge than the other?

This is a familiar debate among data scientists, and I'll try to make the breakdown as simple to understand as possible.

Choosing between R and Python depends on what you’re looking to accomplish.

  • R is built for statistical analysis, while Python is a general-purpose programming language. In other words, R serves a more specific purpose, while Python is used to write software for a much wider range of application domains.
  • R is a good fit when a data analysis task calls for standalone computing or analysis on individual servers. Python is the better choice when your analysis needs to be integrated with web apps, or when statistics code needs to be built into a production database.
  • Python is better for data manipulation and repetitive tasks, while R is better for ad hoc analysis and exploring datasets.
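To make the web-app point concrete, here is a minimal sketch of statistics code served over HTTP using only Python's standard library (`wsgiref`, `statistics`, `json`). It rests on some assumptions. The `stats_app` name and the `?values=3,5,7` query format are hypothetical. A real deployment would more likely use a web framework and a production server.

```python
import json
import statistics
from urllib.parse import parse_qs


def stats_app(environ, start_response):
    """Minimal WSGI app returning summary statistics as JSON.

    Hypothetical interface: numbers arrive as a query string such as
    ?values=3,5,7 and are summarized with the standard library.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    raw = query.get("values", [""])[0]
    try:
        values = [float(v) for v in raw.split(",") if v.strip()]
    except ValueError:
        values = []  # treat unparseable input the same as no input

    headers = [("Content-Type", "application/json")]
    if not values:
        start_response("400 Bad Request", headers)
        return [json.dumps({"error": "pass ?values=1,2,3"}).encode("utf-8")]

    body = {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
    start_response("200 OK", headers)
    return [json.dumps(body).encode("utf-8")]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally; try http://localhost:8000/?values=3,5,7 in a browser.
    with make_server("", 8000, stats_app) as server:
        server.serve_forever()
```

Because the analysis lives in an ordinary Python function, the same code could be called from a script, a test, or a larger web application without changes.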

Are there any clear advantages of one language over the other?

Let’s start with Python.

For beginners, Python is generally considered easier to learn thanks to its simpler syntax. R has a pretty steep learning curve because it was developed by statisticians, for statisticians.

  • Because Python is a general-purpose language, learning it gives you skills that go beyond data analysis: you can build a website with Python or write command-line tools.
  • Many programmers find that Python matches the way programmers think more closely than R does, so what you learn carries over to other languages more easily. As mentioned above, R's roots lie in statistics, which gives it an unusual design. If you want to go on to learn other general-purpose languages, Python is the one to pursue.
  • A large part of data analysis is cleaning up the data beforehand. It's convenient to clean data with a full-service language like Python, because you can add new functions and layers to take your data apart. If those functions need local storage or web access, it's fairly easy to include that in Python.
  • Python keeps evolving. New versions introduce changes that sometimes break old code, which makes Python a living language and leads to more open source code and solutions. R has been less forward-thinking and has instead stayed pure.
  • Python often runs faster than R, largely because R was designed around the convenience of statisticians rather than the efficiency of the computer.
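As an illustration of the cleaning and command-line points above, here is a small sketch of a Python script that doubles as a command-line tool. It assumes the input is a CSV file with a header row. The `MISSING` set of missing-value markers, the `clean_rows` helper, and the script name used below are all hypothetical choices for this example.

```python
import argparse
import csv
import sys

# Hypothetical markers treated as missing values; adjust for your own data.
MISSING = {"", "na", "n/a", "null", "-"}


def clean_value(value):
    """Strip surrounding whitespace and map missing-value markers to None."""
    value = value.strip()
    return None if value.lower() in MISSING else value


def clean_rows(rows):
    """Clean every field, drop fully empty rows, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip the None key that csv.DictReader uses for surplus fields.
        row = {key.strip(): clean_value(val or "")
               for key, val in row.items() if key is not None}
        if all(v is None for v in row.values()):
            continue  # nothing usable left in this row
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def main(argv=None):
    parser = argparse.ArgumentParser(description="Clean a CSV file.")
    parser.add_argument("infile", help="path to the raw CSV file")
    parser.add_argument("outfile", help="path for the cleaned CSV file")
    args = parser.parse_args(argv)

    with open(args.infile, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = clean_rows(reader)
        fieldnames = [name.strip() for name in reader.fieldnames or []]

    with open(args.outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells
    print(f"Wrote {len(rows)} cleaned rows to {args.outfile}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as, say, `clean_csv.py`, it would run as `python clean_csv.py raw.csv cleaned.csv`. New cleaning steps are just more functions added to the pipeline.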

What are R’s advantages?

  • R is great for statistical analysis.
  • R is built around a command line, but many people work inside environments like RStudio or R Commander, which include a data editor, debugging support, and a window to hold graphics. Python has tried to catch up with IDEs like Eclipse or Visual Studio.
  • Visualized data is easier to understand than raw numbers, and R and visualization go hand in hand: R includes quite a few packages built for it. Python's visualizations are a little more convoluted, and there aren't as many visualization libraries to choose from.

Is there an advantage to learning both?

The two can definitely complement each other. Python can handle the first stage, data aggregation, when you need to scrape data from websites, files, or other data sources.

Then you could let R apply its optimized, built-in statistical analysis routines to the data that Python has gathered and cleaned for you. You could think of Python as the preprocessing library for R.
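Here is a rough sketch of the Python side of that hand-off, using only the standard library. It merges records from two differently formatted sources, keeps the chosen columns, coerces numeric fields, and writes a plain CSV file, which R's built-in `read.csv()` can load directly. The inputs are hypothetical in-memory strings standing in for scraped pages or downloaded files. The function names and the `tidy.csv` filename are illustrative.

```python
import csv
import io
import json


def records_from_csv(text):
    """Yield one dict per row of CSV text that has a header row."""
    yield from csv.DictReader(io.StringIO(text))


def records_from_json_lines(text):
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def to_tidy_csv(records, fieldnames, numeric=()):
    """Keep only the chosen columns, coerce numeric ones, and return CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {name: rec.get(name, "") for name in fieldnames}
        for name in numeric:
            try:
                row[name] = float(row[name])
            except (TypeError, ValueError):
                row[name] = ""  # blank cell; R's read.csv() reads it as NA
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical inputs standing in for a scraped table and a downloaded feed.
    csv_source = "city,value\nAustin,12.5\nBoston,n/a\n"
    json_source = '{"city": "Denver", "value": 7}\n{"city": "Erie", "value": 3.25}\n'
    records = list(records_from_csv(csv_source)) + list(records_from_json_lines(json_source))
    with open("tidy.csv", "w", newline="", encoding="utf-8") as f:
        f.write(to_tidy_csv(records, ["city", "value"], numeric=["value"]))
    # Next step in R, for example: df <- read.csv("tidy.csv"); summary(df$value)
```

A plain CSV file is used as the hand-off format because both languages read it without extra dependencies. Libraries that call one language from the other are an alternative.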

Before you choose between the two languages, ask yourself the following questions:

  • What kinds of problems are you looking to solve?
  • Are you looking to do statistical analysis specifically, or are you looking to do more than that?
  • How do you want your data results to be represented?
  • What tools are available for each language, and how can they help you accomplish your goals?

Why not try out R and Python yourself and see what you think? You can find tutorials and examples of R on Code School, or download Swirl and get started with R right away. You can also try both R and Python online at DataCamp.


You may well have to learn both, depending on which company you end up working for and what it uses. Job trends indicate an increasing demand for both skills, with wages well above average.

In truth, the differences between these two languages are shrinking. At this point, features that once only one language could handle are now possible in both. There are even libraries for using Python from R, and vice versa, so you can have the best of both worlds.

This article is part of WorkingNation Associate Producer Jaimie Stevens’ “Starting Out in Tech” series where she shares her insight into becoming a computer programmer. Catch up on her previous articles here.

