Sunday, January 07, 2018

HPL/SQL: Procedural SQL on Hadoop, NoSQL and RDBMS

I admit it - I've a lot to learn when it comes to the internals of big data technologies. Even though I've worked on many data science projects at Goibibo, my knowledge of the principles involved is very limited. So when this project happened, where we needed to build some data pipelines for business needs, I found the perfect opportunity to explore the Hadoop and Hive ecosystem. This blog post is about a new tool that I came across, HPL/SQL.


Years back, the database world was ruled by relational databases, chief amongst them being Oracle. And it had this great language for writing queries - PL/SQL. From the Oracle website,
PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. PL/SQL program units are compiled by the Oracle Database server and stored inside the database. And at run-time, both PL/SQL and SQL run within the same server process, bringing optimal efficiency. PL/SQL automatically inherits the robustness, security, and portability of the Oracle Database.
However, the explosion in information technology requirements led to a humongous amount of data being available for analysis. This gave rise to two other classes of databases:

Big data systems
Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them.
NOSQL database
NoSQL (originally referring to "non SQL") database provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
So depending on the problem at hand, engineers now had a host of RDBMS (Oracle, MySQL, Aurora etc), NOSQL (Cassandra, MongoDB, Redis etc), and Big Data systems (Hadoop) to choose from. 

This gave rise to a complex challenge - most data analysis earlier could be powered by the same RDBMS alone - example, businesses could build intelligence using stored procedures and triggers in PL/SQL on Oracle. In the world of multiple databases, engineers now have to learn multiple tools for data collection, analysis and so on - example Spark, Mapreduce, Hive and so on.

Different ecosystems and data analysis tools started coming out with their approaches to solve this problem. For example, connecting to multiple databases, custom SQL that is compatible with a covered set of databases and so on. However, the coverage has not been big so far.

To solve this complex problem, Hive comes with a language/tool called as HPL/SQL. The promise of HPL/SQL is simple - it is a hybrid and heterogeneous language that understands syntax of most  procedural SQL dialect, and can be used with any databases, running existing Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or running Transact-SQL on Oracle, Cloudera Impala or Amazon Redshift. So, you can use it as a language to write new procedural SQL with Hive, and you can use it as a tool to execute existing PL/SQL and logic that you have written for other databases.

Compared to writing code in Python, Java or Scala, HPL/SQL is much more friendly for the BI/SQL expert to run, and allows them to hit the ground running faster. It offers features like
  • Control Flow Statements
  • Built-in Functions
  • Stored Procedures, Functions and Packages
  • Exception and Condition Handling
  • On-the-fly SQL Conversion
So, if you are exploring an efficient way to implement ETL processes in Hadoop, give HPL/SQL a try. Since Hive 2.0, HPL/SQL comes built in with Hive. All you need to do is, 

When executing commands directly:

hplsql -e "SELECT * FROM src LIMIT 1" 
When running from a file:
hplsql -f script.sql
You can go through the documentation to get more details if required.

31 comments:

  1. I want to know if it supports inserting data by partition.
    insert into tablename partition(partition_name) from select * from table;
    but it is error .
    Function:not found partition

    ReplyDelete
  2. i don`t konw why my name is unknown .
    this is my email : 10372756845aa@gmail.com

    ReplyDelete
  3. The control, stockpiling, and correspondence of advanced data require capable utilization of calculations. Software engineering encourages the use of these calculations.ExcelR Data Science Courses

    ReplyDelete
  4. Awesome blog. I enjoyed reading your articles. This is truly a great read for me. Keep up the good work!


    data analytics course

    data science course

    business analytics course

    ReplyDelete
  5. I have to search sites with relevant information on given topic and provide them to teacher our opinion and the article.

    ExcelR data analytics courses

    ReplyDelete
  6. Excellent! I love to post a comment that "The content of your post is awesome" Great work!

    best data analytics courses in mumbai

    ReplyDelete
  7. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple linear regression

    ReplyDelete
  8. I am really enjoying reading your well written articles. It looks like you spend a lot of effort and time on your blog. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work.

    Simple Linear Regression

    Correlation vs Covariance

    ReplyDelete
  9. Very interesting blog. Many blogs I see these days do not really provide anything that attracts others, but believe me the way you interact is literally awesome.You can also check my articles as well.

    Data Science In Banglore With Placements
    Data Science Course In Bangalore
    Data Science Training In Bangalore
    Best Data Science Courses In Bangalore
    Data Science Institute In Bangalore

    Thank you..

    ReplyDelete
  10. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Correlation vs Covariance
    Simple linear regression
    data science interview questions

    ReplyDelete
  11. Really nice and interesting post. I was looking for this kind of information and enjoyed reading this one. Keep posting. Thanks for sharing.
    360DigiTMG

    ReplyDelete
  12. Amazing Article ! I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Simple Linear Regression
    Correlation vs covariance
    data science interview questions
    KNN Algorithm
    Logistic Regression explained

    ReplyDelete
  13. Very nice blogs!!! i have to learning for lot of information for this sites…Sharing for wonderful information.Thanks for sharing this valuable information to our vision. You have posted a trust worthy blog keep sharing, data sciecne course in hyderabad

    ReplyDelete
  14. "Thank you for sharing wonderful information with us.Really useful for everyone data scientist courses
    "

    ReplyDelete
  15. very well explained .I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.
    Simple Linear Regression
    Correlation vs covariance
    data science interview questions
    KNN Algorithm
    Logistic Regression explained

    ReplyDelete
  16. Attend The Business Analytics Courses From ExcelR. Practical Business Analytics Courses Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Data Analytics Courses. data science course in Hyderabad

    ReplyDelete
  17. If you are interested in data science, you may want to check out its training in data science course in india

    ReplyDelete
  18. I've read this post and if I could I desire to suggest you some interesting things or suggestions. Perhaps you could write next articles referring to this article. I want to read more things about it!
    Data Science courses

    ReplyDelete
  19. I have bookmarked your website because this site contains valuable information in it. I am really happy with articles quality and presentation. Thanks a lot for keeping great stuff. I am very much thankful for this site.
    data scientist training and placement

    ReplyDelete
  20. Highly appreciable regarding the uniqueness of the content. This perhaps makes the readers feels excited to get stick to the subject. Certainly, the learners would thank the blogger to come up with the innovative content which keeps the readers to be up to date to stand by the competition. Once again nice blog keep it up and keep sharing the content as always.

    Data Science Course in Bhilai

    ReplyDelete
  21. Fantastic blog extremely good well enjoyed with the incredible informative content which surely activates the learners to gain the enough knowledge. Which in turn makes the readers to explore themselves and involve deeply in to the subject. Wish you to dispatch the similar content successively in future as well.

    data science training in bhilai

    ReplyDelete
  22. Truly mind blowing blog went amazed with the subject they have developed the content. These kind of posts really helpful to gain the knowledge of unknown things which surely triggers to motivate and learn the new innovative contents. Hope you deliver the similar successive contents forthcoming as well.

    data science in bangalore

    ReplyDelete
  23. This is a splendid website! I"m extremely content with the remarks!.
    data scientist training and placement in hyderabad

    ReplyDelete
  24. Thanks for posting the best information and the blog is very good .digital marketing institute in hyderabad

    ReplyDelete
  25. Thanks for posting the best information and the blog is very good.artificial intelligence course in hyderabad

    ReplyDelete
  26. Thanks for posting the best information and the blog is very good.data science institutes in hyderabad

    ReplyDelete
  27. Extremely overall quite fascinating post. I was searching for this sort of data and delighted in perusing this one. Continue posting. A debt of gratitude is in order for sharing. best institute for cloud computing in hyderabad

    ReplyDelete
  28. I wanted to leave a little comment to support you and wish you a good continuation. Wishing you the best of luck for all your blogging efforts.
    data analytics courses in hyderabad with placements

    ReplyDelete