Tuesday, March 27, 2012

Data Languages: So Much To Fix

Computer programmer:
"Too much imagination can be disruptive"

Programming Language Researcher:
"Too little imagination can have harmful long term effects"

Data-focused programming languages include both data management languages (SQL, along with variants such as PL/SQL, and MDX) and statistical programming languages (SAS, and R, which is essentially an implementation of S). Because of data preparation work, the two categories overlap.

Since the mid-1980s, there has been very little fundamental research and development on data languages. The over-hyping of SQL and the relational model may be the most important contributing factor.
The result: serious worker-efficiency problems that existed 25 years ago remain unsolved today.

The four most important obstacles for the end user doing data-related work are:
1: Usability (learning curve and readability)
2: Worker productivity
3: Flexibility
4: Reliability (in the QA/reproducibility sense)

My name is Robert Wilkins. I am a statistical programming language researcher, and I am here to solve your problems.
For statistical table production, in terms of worker productivity, I have already designed and implemented a new language that blows the SAS programming language out of the water.
But there is also data cleaning, a more multi-faceted and difficult area of research. There are obvious things that could be done in the relatively short term that would greatly benefit scientific researchers, among others. It is, however, a difficult subject that also requires long-term research. I am well aware that SQL is the only well-known, open-standard data-handling programming language, but that was a mistake. SQL, or at least SQL alone, is not a good enough solution for handling data preparation work efficiently.
