Sunday, January 07, 2018

HPL/SQL: Procedural SQL on Hadoop, NoSQL and RDBMS

I admit it - I've a lot to learn when it comes to the internals of big data technologies. Even though I've worked on many data science projects at Goibibo, my knowledge of the principles involved is very limited. So when this project happened, where we needed to build some data pipelines for business needs, I found the perfect opportunity to explore the Hadoop and Hive ecosystem. This blog post is about a new tool that I came across, HPL/SQL.


Years back, the database world was ruled by relational databases, chief amongst them being Oracle. And it had this great language for writing queries - PL/SQL. From the Oracle website,
PL/SQL is a procedural language designed specifically to embrace SQL statements within its syntax. PL/SQL program units are compiled by the Oracle Database server and stored inside the database. And at run-time, both PL/SQL and SQL run within the same server process, bringing optimal efficiency. PL/SQL automatically inherits the robustness, security, and portability of the Oracle Database.
However, the explosion in information technology requirements led to a humongous amount of data being available for analysis. This gave rise to two other classes of databases:

Big data systems
Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them.
NOSQL database
NoSQL (originally referring to "non SQL") database provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
So depending on the problem at hand, engineers now had a host of RDBMS (Oracle, MySQL, Aurora etc), NOSQL (Cassandra, MongoDB, Redis etc), and Big Data systems (Hadoop) to choose from. 

This gave rise to a complex challenge - most data analysis earlier could be powered by the same RDBMS alone - example, businesses could build intelligence using stored procedures and triggers in PL/SQL on Oracle. In the world of multiple databases, engineers now have to learn multiple tools for data collection, analysis and so on - example Spark, Mapreduce, Hive and so on.

Different ecosystems and data analysis tools started coming out with their approaches to solve this problem. For example, connecting to multiple databases, custom SQL that is compatible with a covered set of databases and so on. However, the coverage has not been big so far.

To solve this complex problem, Hive comes with a language/tool called as HPL/SQL. The promise of HPL/SQL is simple - it is a hybrid and heterogeneous language that understands syntax of most  procedural SQL dialect, and can be used with any databases, running existing Oracle PL/SQL code on Apache Hive and Microsoft SQL Server, or running Transact-SQL on Oracle, Cloudera Impala or Amazon Redshift. So, you can use it as a language to write new procedural SQL with Hive, and you can use it as a tool to execute existing PL/SQL and logic that you have written for other databases.

Compared to writing code in Python, Java or Scala, HPL/SQL is much more friendly for the BI/SQL expert to run, and allows them to hit the ground running faster. It offers features like
  • Control Flow Statements
  • Built-in Functions
  • Stored Procedures, Functions and Packages
  • Exception and Condition Handling
  • On-the-fly SQL Conversion
So, if you are exploring an efficient way to implement ETL processes in Hadoop, give HPL/SQL a try. Since Hive 2.0, HPL/SQL comes built in with Hive. All you need to do is, 

When executing commands directly:

hplsql -e "SELECT * FROM src LIMIT 1" 
When running from a file:
hplsql -f script.sql
You can go through the documentation to get more details if required.

21 comments:

  1. I want to know if it supports inserting data by partition.
    insert into tablename partition(partition_name) from select * from table;
    but it is error .
    Function:not found partition

    ReplyDelete
  2. i don`t konw why my name is unknown .
    this is my email : 10372756845aa@gmail.com

    ReplyDelete
  3. The control, stockpiling, and correspondence of advanced data require capable utilization of calculations. Software engineering encourages the use of these calculations.ExcelR Data Science Courses

    ReplyDelete
  4. Awesome blog. I enjoyed reading your articles. This is truly a great read for me. Keep up the good work!


    data analytics course

    data science course

    business analytics course

    ReplyDelete
  5. Excellent! I love to post a comment that "The content of your post is awesome" Great work!

    best data analytics courses in mumbai

    ReplyDelete
  6. "Thank you for sharing wonderful information with us.Really useful for everyone data scientist courses
    "

    ReplyDelete
  7. If you are interested in data science, you may want to check out its training in data science course in india

    ReplyDelete
  8. I've read this post and if I could I desire to suggest you some interesting things or suggestions. Perhaps you could write next articles referring to this article. I want to read more things about it!
    Data Science courses

    ReplyDelete
  9. I have bookmarked your website because this site contains valuable information in it. I am really happy with articles quality and presentation. Thanks a lot for keeping great stuff. I am very much thankful for this site.
    data scientist training and placement

    ReplyDelete
  10. This is a splendid website! I"m extremely content with the remarks!.
    data scientist training and placement in hyderabad

    ReplyDelete
  11. Thanks for posting the best information and the blog is very good .digital marketing institute in hyderabad

    ReplyDelete
  12. I wanted to leave a little comment to support you and wish you a good continuation. Wishing you the best of luck for all your blogging efforts.
    data analytics courses in hyderabad with placements

    ReplyDelete
  13. In this article, the author writes about HPL/SQL, a hybrid and heterogeneous language that understands syntax of most procedural SQL dialects and can be used with any databases. I appreciate sharing such an important insights here with the readers. Thank you.
    Is iim skills fake?

    ReplyDelete
  14. Your writing is very useful and instructive. I frequently read these articles to expand my knowledge and better my comprehension. I appreciate you sharing.
    Data Analytics Courses in Agra

    ReplyDelete
  15. This is a great and intriguing article. I had been seeking this type of information and found it enjoyable to read. Please continue to publish, and thank you for sharing.
    daa Analytics courses in leeds

    ReplyDelete
  16. The blog post on HPL/SQL is very informative and insightful. Thanks for sharing valuable information.
    Digital Marketing Courses in Italy

    ReplyDelete
  17. Thank you for sharing in depth knowledge and insights on Procedural SQL on Hadoop, NoSQL and RDBMS.
    Adwords marketing

    ReplyDelete
  18. really well written article, great work on the blog post
    GST Certification Course

    ReplyDelete