Posted 3 months ago

Typos in transformation functions #167

I'm currently working through IBM's Coursera notebooks, and there appear to be errors in the .ipynb files for certain transformations. Specifically:

"claimed/component-library/transform/spark-csv-to-parquet.ipynb" : the destination path and parquet filename are stored in the variable "output_data_parquet" (third code cell). In code cell 5, data_dir + data_parquet fails to run because data_parquet is not defined. I think this should be output_data_parquet, as it appears in the eighth code cell.
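To make the proposed fix concrete, here is a minimal sketch of how I'd expect the path in cell 5 to be built. The default values and the name parquet_path are my own placeholders, not copied from the notebook; only data_dir and output_data_parquet are names that appear there.

```python
import os

# Placeholder defaults (assumptions, not the notebook's actual values)
data_dir = os.environ.get('data_dir', '../../data/')
output_data_parquet = os.environ.get('output_data_parquet', 'data.parquet')

# Cell 5 currently uses data_dir + data_parquet, which raises NameError
# because data_parquet is never defined. Using the variable from the
# third code cell instead:
parquet_path = data_dir + output_data_parquet  # parquet_path is a hypothetical name
print(parquet_path)
```

With that change, cell 5 would refer to the same variable as the eighth code cell.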

"claimed/component-library/transform/spark-sql.ipynb" : In cell 4, where the environment variables are defined, "data_dir" is defined twice. The first occurrence appears to be correct based on its comment. The second occurrence appears to be incorrect, as its comment suggests it should be a SQL query. As a result, in cell 7, the variable "sql" is not defined. I think the second occurrence of data_dir should really be something along the lines of: "sql = os.environ.get('sql_query', 'select * from df')"
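For clarity, this is roughly what I'd expect cell 4 to look like after the fix. It is only a sketch: the data_dir default is a placeholder I made up, and the 'sql_query' variable name and default query are my guess from the cell's comment, not something I've confirmed against the intended design.

```python
import os

# First definition, which looks correct (the default here is a placeholder)
data_dir = os.environ.get('data_dir', '../../data/')

# Proposed replacement for the duplicated data_dir line, so that
# "sql" exists by the time cell 7 uses it
sql = os.environ.get('sql_query', 'select * from df')
print(sql)
```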