jtcohen6

jtcohen6

data, theatre, vegetable eater

Member Since 6 years ago

@dbt-labs, Marseille

53 followers
11 following
55 stars
10 repos

1992 contributions in the last year

Pinned
⚡ dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Activity
May
20
3 days ago
issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

Threshold argument for Test Severity

Describe the feature

  • A threshold argument for test severity.
  • This adds flexibility around test severity. E.g.: if the test fails for fewer than 10 IDs, severity can be set to a warning; if it fails for more than 10 IDs, severity can be set to an error. (See the sketch below.)
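
One possible shape for this config (a sketch only: the warn_if / error_if names illustrate threshold-style severity, not a committed design):

    - name: my_model
      columns:
        - name: id
          tests:
            - relationships:
                to: ref('orders')
                field: id
                config:
                  warn_if: ">0"    # any failing IDs: warn
                  error_if: ">10"  # more than 10 failing IDs: error instead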

Describe alternatives you've considered

  • Currently we've been keeping an eye on daily test failures and switching severity around for tests as needed. This is not scalable or efficient.

Additional context

  • This is not database specific.

Who will this benefit?

  • We've found multiple use cases in relationship tests in multi-layer modeling like event modeling.
issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-668] [Bug] dbt-core capturing KeyboardInterrupt Event with significant delay after pressing ctrl-c

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When pressing ctrl-c while a dbt model is executing, dbt-core waits to capture the KeyboardInterrupt event until the current model's execution completes, and even then it captures it with a significant delay.

My output:

14:48:40 1 of 8 START view model dbt_test.stg_customers.................................. [RUN]
14:48:55 1 of 8 OK created view model dbt_test.stg_customers............................. [OK in 14.96s]
14:48:55 2 of 8 START view model dbt_test.stg_orders..................................... [RUN]
14:49:15 2 of 8 OK created view model dbt_test.stg_orders................................ [OK in 19.97s]
14:49:15 dbt-core adapter: KeyboardInterrupt Event Captured!!!
14:49:15 3 of 8 START view model dbt_test.stg_payments................................... [RUN]
14:49:15 teradata adapter: Canceling Open connections from teradata adapter
14:49:15 teradata adapter: Connection Name : list_schemas, connection state is: closed
14:49:15 teradata adapter: Connection Name : list_None_dbt_test, connection state is: closed
14:49:15 teradata adapter: Connection Name : model.jaffle_shop.stg_payments, connection state is: closed

Here the run was cancelled with ctrl+c about 1 second after it started, but the current model still ran for the full ~15 seconds, and the next model ran for another 20 seconds.

Expected Behavior

When pressing ctrl-c while a dbt model is executing, dbt-core should capture the KeyboardInterrupt event immediately; it should not wait for the current model's execution to complete.

Steps To Reproduce

  1. Create dbt project with dbt-teradata adapter
  2. Create model with a long runtime, materialised as a table (my example runs about 400 seconds)
  3. Run the model with dbt run and model selector
  4. Wait for model to start executing
  5. Cancel the run with ctrl+c in the terminal
  6. Check the console or log file

Relevant log output

$ dbt  run
14:47:54  Running with dbt=1.0.1
14:47:54  Found 8 models, 20 tests, 0 snapshots, 0 analyses, 197 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
14:47:54
14:48:40  Concurrency: 1 threads (target='dev')
14:48:40
14:48:40  1 of 8 START view model dbt_test.stg_customers.................................. [RUN]
14:48:55  1 of 8 OK created view model dbt_test.stg_customers............................. [OK in 14.96s]
14:48:55  2 of 8 START view model dbt_test.stg_orders..................................... [RUN]
14:49:15  2 of 8 OK created view model dbt_test.stg_orders................................ [OK in 19.97s]
14:49:15  dbt-core adapter: Satish: KeyboardInterrupt Event Captured!!!
14:49:15  3 of 8 START view model dbt_test.stg_payments................................... [RUN]
14:49:15  teradata adapter:  Canceling Open connections from teradata adapter
14:49:15  teradata adapter:  Connection Name :  list_schemas, connection state is: closed
14:49:15  teradata adapter:  Connection Name :  list_None_dbt_test, connection state is: closed
14:49:15  teradata adapter:  Connection Name :  model.jaffle_shop.stg_payments, connection state is: closed
14:49:26  teradata adapter:  calling teradata adapter cancel method to cancel connection : model.jaffle_shop.stg_payments as state of this connection is open
14:49:26  teradata adapter:  Closing connection name: model.jaffle_shop.stg_payments
14:49:26  teradata adapter: returning all connection names to dbt core ['list_schemas', 'list_None_dbt_test', 'model.jaffle_shop.stg_payments']
14:49:26  CANCEL query PrintCancelLine from dbt-core list_schemas......................... [CANCEL]
14:49:26  CANCEL query PrintCancelLine from dbt-core list_None_dbt_test................... [CANCEL]
14:49:26  CANCEL query PrintCancelLine from dbt-core model.jaffle_shop.stg_payments....... [CANCEL]
14:49:26  Unhandled error while executing model.jaffle_shop.stg_payments
6 is not a valid connection pool handle
14:49:26  3 of 8 ERROR creating view model dbt_test.stg_payments.......................... [ERROR in 11.10s]
14:49:26
14:49:26  Exited because of keyboard interrupt.
14:49:26
14:49:26  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
14:49:28  ctrl-c

Environment

- OS: Windows 10
- Python: 3.9.11
- dbt: 1.0.1
    Plugins:
     - teradata: 1.0.0a

What database are you using dbt with?

other (mention it in "Additional Context")

Additional Context

database: Teradata; connector: dbt-teradata (https://github.com/Teradata/dbt-teradata). Parent issue: https://github.com/Teradata/dbt-teradata/issues/35

The dbt-teradata adapter supports cancelling connections: it extends the dbt SQL adapter's functionality and inherits is_cancelable.

jtcohen6
jtcohen6

Linking Slack thread that prompted this issue, for additional context.

This exact issue is going to be quite tricky for us to reproduce ourselves, for the reasons @iknox-fa mentioned above. Based on the information provided, it does seem like the KeyboardInterrupt is blocked by the current model's execution:

14:49:15 2 of 8 OK created view model dbt_test.stg_orders................................ [OK in 19.97s]
14:49:15 dbt-core adapter: KeyboardInterrupt Event Captured!!!

The logic to catch that and cancel is all inside dbt-core:

https://github.com/dbt-labs/dbt-core/blob/e50678c91424142aa1a4995ee1d358dac6a6fdf8/core/dbt/task/runnable.py#L386-L389

@SatishChGit Could you confirm that the dbt-core adapter: KeyboardInterrupt Event Captured!!! log line is something you put in place, within dbt-core's codebase, in the spot linked just above?
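
For anyone following along, here's the failure mode in the abstract (illustrative Python only, not dbt-core's actual code): the interrupt is caught on the main thread, which then asks the adapter to cancel open connections, but nothing happens until the worker's blocking driver call returns.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_model(name: str) -> str:
        time.sleep(15)  # stand-in for a blocking database call
        return f"built {name}"

    def cancel_open_connections() -> None:
        print("adapter: canceling open connections")

    pool = ThreadPoolExecutor(max_workers=1)
    try:
        futures = [pool.submit(run_model, m) for m in ("stg_customers", "stg_orders")]
        for fut in futures:
            print(fut.result())  # Ctrl-C raises KeyboardInterrupt here, on the main thread
    except KeyboardInterrupt:
        cancel_open_connections()  # the in-flight worker call still runs to completion
    finally:
        pool.shutdown(wait=False)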

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-670] [Feature] Standard result node selectors

Is this your first time opening an issue?

Describe the Feature

Today, to use the result: node selectors for tests vs. models, you have to remember that for models you should use result:error, and for tests you can use result:fail.

While there is a clear difference between a runtime error (SQL failed during execution) and failed assumptions (data does not meet expectations), the way we have named the difference feels pedantic and requires more mental leaps than necessary. I understand that a test could fail or error out, but right now it's not like we support result:error for tests, so I think it would be better to align on one agreed-upon declaration of "something went wrong, let's rerun the things that didn't pass the initial run".
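
For concreteness, rerunning "everything that went wrong" currently takes two method arguments (a sketch; the state path is illustrative):

    dbt build --select result:error+ result:fail+ --state ./prev-artifacts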


Describe alternatives you've considered

Because the goal is to point to a failure or error, and there is a clear distinction between an error and a fail for tests, I think we should either agree on a higher-level synonym (to signal this failure) or just go with fail and call it a day.

Who will this benefit?

Anyone trying to grasp an understanding of all of the different node selectors

Are you interested in contributing this feature?

yes

Anything else?

No response

jtcohen6
jtcohen6

@amychen1776 Thanks for opening!

Goal

We should align on 5 types of statuses, and then pick good names for each one:

  1. Success: Everything ran as expected. Model/snapshot/source succeeded when building, and data in tests / source freshness meets expectations. (Do these need to be two different ones, success and pass? I don't think so personally, but I'm open to disagreement.)
  2. Warning: Data does not meet expectations, but the user has opted to be warned for this test / at this threshold.
  3. Data quality error / failure / stale: Data does not meet expectations, beyond an acceptable threshold, or for a higher-priority test.
  4. Runtime error: SQL encountered an error while running. Syntax error, the table is missing, the merge was aborted halfway through.
  5. Skipped: Didn't get run. Today, this can only be because an upstream DAG resource encountered status 3 or 4. In the future, could this also happen because the model doesn't need re-running (e.g. materialized view)?

The fuzziest distinction today is definitely between 3 + 4. In the case of models/seeds/snapshots + tests, error means 4; in the case of source freshness, error means 3, whereas runtime error means 4. Let's fix it.

Implementation

All of these statuses are defined here.

This is most closely related to Execution. The relevant stakeholders are really our metadata consumers:

  1. consumers of run_results.json
  2. consumers of structured logging
  3. result: selector method
  4. people who use the Results Jinja object, i.e. within post-hooks

I think it might be easy enough for us to offer backwards compatibility for the rename in 3+4. Doing so for 1+2 would be a bit more difficult.

I think it would be better to align on one agreed-upon declaration of "something went wrong, let's rerun the things that didn't pass the initial run".

I also think it's totally reasonable for us to provide, within the result: selector method in particular, some catch-all selection method for "did not succeed," whether because of runtime errors or failure to meet data quality expectations. (Should that include warn statuses as well?)

Open questions

  • Do those 5 statuses make sense? Are those the right conceptual categories?
  • Can we come up with clear one-word names for each one, that apply across resource types and are sufficiently distinct from one another?
issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-672] _relations_cache_for_schemas() takes 2 positional arguments but 3 were given

First time install of DBT (dbt-postgres) on Ubuntu.

dbt init myFirstProject

Changed settings in ~/.dbt/profiles.yml

dbt run returns: _relations_cache_for_schemas() takes 2 positional arguments but 3 were given

Log files don't seem to give hints on the cause of the error. How can I find the cause of this error?

jtcohen6
jtcohen6

@smithics Could you run dbt --version and see what it returns? It sounds like you may have mismatched installations of dbt-core and dbt-postgres. In v1.1 we expanded the signature of this method: the internal adapter methods set_relations_cache + _relations_cache_for_schemas each take an additional argument, for use with the experimental CACHE_SELECTED_ONLY config (#4688, #4860).
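
For reference, the kind of mismatch to look for (illustrative output; version numbers hypothetical):

    $ dbt --version
    installed version: 1.1.0
       latest version: 1.1.0

    Plugins:
      - postgres: 1.0.4

A 1.0.x dbt-postgres installed alongside a 1.1.x dbt-core would hit exactly this signature mismatch; aligning both on the same minor version should resolve it.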

May
19
4 days ago
push

jtcohen6 push dbt-labs/dbt-core

jtcohen6
jtcohen6

changie - convert changelogs to yaml files and make quality of life improvements (#4917)

  • convert changelog to changie yaml files

  • update contributor format and README instructions

  • update action to rerun when labeled/unlabled

  • remove synchronize from action

  • remove md file replaced by the yaml

  • add synchronize and comment of what's happening

  • tweak formatting

jtcohen6
jtcohen6

Updating backport workflow to use forked action (#4920)

jtcohen6
jtcohen6

update of macro for postgres/redshift use of unique_key as a list (#4858)

  • pre-commit additions

  • added changie changelog entry

  • moving integration test over

  • Pair programming

  • removing ref to mapping as seems to be unnecessary check, unique_key tests pass locally for postgres

Co-authored-by: Jeremy Cohen [email protected]

jtcohen6
jtcohen6

Add space before justification periods (#4744)

  • Update format.py

  • Update CHANGELOG.md

  • add change file

Co-authored-by: Gerda Shank [email protected]

jtcohen6
jtcohen6

Bumping version to 1.1.0b1 (#4933)

  • Bumping version to 1.1.0b1
jtcohen6
jtcohen6

Fix inconsistent timestamps snapshots (#4513)

jtcohen6
jtcohen6

add compliation and cache tracking (#4912)

jtcohen6
jtcohen6

remove capping version of typing extensions (#4934)

jtcohen6
jtcohen6

Cosmetic changelog/changie fixups (#4944)

  • Reorder kinds in changie

  • Reorder change categories for v1.1.0b1

  • Update language for breaking change

  • Contributors deserve an h3

  • Make pre-commit happy? Update language

  • Rm trailing whitespace

jtcohen6
jtcohen6

Convert source tests (#4935)

  • convert 059 to new test framework

  • remove replaced tests

  • WIP, has pre-commit errors

  • WIP, has pre-commit errors

  • one failing test, most issued resolved

  • fixed final test and cleaned up fixtures

  • remove converted tests

  • updated test to work on windows

  • remove config version

jtcohen6
jtcohen6

Custom names for generic tests (#4898)

  • Support user-supplied name for generic tests

  • Support dict-style generic test spec

  • Add changelog entry

  • Add TODO comment

  • Rework raise_duplicate_resource_name

  • Add functional tests

  • Update comments, rm TODO

  • PR feedback

jtcohen6
jtcohen6

Create a dbt.tests.adapter release when releasing dbt and postgres (#4948)

  • update black version for pre-commit
jtcohen6
jtcohen6

Remove unneeded code in default snapshot materialization (#4993)

  • Rm unneeded create_schema in snapshot mtlzn

  • Add changelog entry

jtcohen6
jtcohen6

[Snyk] Security upgrade python from 3.9.9-slim-bullseye to 3.10.3-slim-bullseye (#4963)

  • fix: docker/Dockerfile to reduce vulnerabilities

The following vulnerabilities are fixed with an upgrade:

  • add changelog entry

Co-authored-by: Nathaniel May [email protected]

jtcohen6
jtcohen6

[CT-352] catch and retry malformed json (#4982)

  • catch None and malformed json reponses

  • add json.dumps for format

  • format

  • Cache registry request results. Avoid one request per version

  • updated to be direct in type checking

  • add changelog entry

  • add back logic for none check

  • PR feedback: memoize > global

  • add checks for expected types and keys

  • consolidated cache and retry logic

  • minor cleanup for clarity/consistency

  • add pr review suggestions

  • update unit test

Co-authored-by: Jeremy Cohen [email protected]

jtcohen6
jtcohen6

include directory README (#4685)

  • start of a README for the include directory

  • minor updates

  • minor updates after comments from gerda and emily

  • trailing space issue?

  • black formatting

  • minor word change

  • typo update

  • minor fixes and changelog creation

  • remove changelog

jtcohen6
jtcohen6

add DO_NOT_TRACK environment variable support (#5000)

jtcohen6
jtcohen6

init push up of converted unique_key tests (#4958)

  • init push up of converted unique_key tests

  • testing cause of failure

  • adding changelog entry

  • moving non basic test up one directory to be more broadly part of adapter zone

  • minor changes to the bad_unique_key tests

  • removed unused fixture

  • moving tests to base class and inheriting in a simple class

  • taking in chenyu's changes to fixtures

  • remove older test_unique_key tests

  • removed commented out code

  • uncommenting seed_count

  • v2 based on feedback for base version of testing, plus small removal of leftover breakpoint

  • create incremental test directory in adapter zone

  • commenting out TableComparision and trying to implement check_relations_equal instead

  • remove unused commented out code

  • changing cast for date to fix test to work on bigquery

jtcohen6
jtcohen6

Remove TableComparison and convert existing calls to use dbt.tests.util (#4986)

commit sha: 81e6a2b0d3ca4122e08945fae97c6ededfca3490

push time: 3 days ago
delete

jtcohen6 in dbt-labs/dbt-core delete branch jerco/secret-rendering

deleted: 3 days ago
push

jtcohen6 push dbt-labs/dbt-core

jtcohen6
jtcohen6

Truncate relation names when appending a suffix (#4921)

  • Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros

  • Remove timestamp from suffix appended to backup relation

  • Add changelog entry

  • Implememt make_relation_with_suffix macro

  • Add make_intermediate_relation macro that controls _tmp relation creation in table and view materializations to delienate from database- and schema-less behavior of relation returned from make_temp_relation

  • Create backup_relation at top of materialization to use for identifier

  • cleanup

  • Add dstring arg to make_relation_with_suffix macro

  • Only reference dstring in conditional of make_relation_with_suffix macro

  • Create both a temp and intermediate relation, update preexisting_temp_relation to preexisting_intermediate_relation

  • Migrate test updates to new test location

  • Remove restored tmp.csv

  • Revert "Remove restored tmp.csv"

This reverts commit 900c9dbcad9a1e6a5a6737c84004504bfdd9926f.

  • Actually remove restored tmp.csv

commit sha: e7218d3e99837f0139fb7ecd367d3bdf1135a961

push time: 3 days ago
delete

jtcohen6 in dbt-labs/dbt-core delete branch dev/epapineau

deleted: 3 days ago
pull request

jtcohen6 pull request dbt-labs/dbt-core

jtcohen6
jtcohen6

Truncate relation names when appending a suffix

Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros

resolves #2869

Description

Suffixes are appended to temp and backup relation names. However, these may exceed the 63-character limit on Postgres, which currently raises a compiler error. This PR leverages the existing make_temp_relation macro and adds make_backup_relation and make_intermediate_relation macros to truncate the base relation name when the generated relation name exceeds the character limit. (A sketch of the idea follows.)
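
The core idea, as a minimal sketch (the macro name matches the PR; the body here is illustrative, not the PR's exact code):

    {% macro make_relation_with_suffix(base_relation, suffix) %}
        {#-- Keep identifier + suffix within Postgres's 63-character limit --#}
        {% set max_length = 63 %}
        {% set keep = max_length - (suffix | length) %}
        {% set new_identifier = base_relation.identifier[:keep] ~ suffix %}
        {% do return(base_relation.incorporate(path={"identifier": new_identifier})) %}
    {% endmacro %}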

Checklist

  • I have signed the CLA
  • I have added information about my change to be included in the CHANGELOG.
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
issue

jtcohen6 issue dbt-labs/dbt-core

jtcohen6
jtcohen6

Relation name '*__dbt_tmp' is longer than 63 characters

Describe the bug

My view name is pretty long, but it is less than 63 characters.

When dbt adds the __dbt_tmp suffix, it goes over the limit of 63 chars.

The following error is thrown by PostgreSQL:

Relation name 'foo__dbt_tmp' is longer than 63 characters

Steps To Reproduce

  1. Use a view name that is 63 characters long.
  2. dbt run

Expected behavior

To work anyways.

Screenshots and log output

See above.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Using Docker image fishtownanalytics/dbt:0.18.1.

The operating system you're using:

Using Docker image fishtownanalytics/dbt:0.18.1.

The output of python --version:

Using Docker image fishtownanalytics/dbt:0.18.1.

Additional context

Perhaps the temporary name can be truncated by the length of the suffix to make sure it fits within the limits?

pull request

jtcohen6 merge to dbt-labs/dbt-core

jtcohen6
jtcohen6

Truncate relation names when appending a suffix

jtcohen6
jtcohen6

Amazing work @epapineau! Thanks so much for contributing, and for seeing this all the way through :)

created branch

jtcohen6 in dbt-labs/dbt-core create branch jerco/secret-rendering

created: 3 days ago
push

jtcohen6 push dbt-labs/dbt-core

jtcohen6
jtcohen6

Feature: Add set/zip function to contexts (#5107)

  • add set function to contexts

  • add zip function to contexts

  • add changelog

  • add try_ equivalents

  • remove defaults

  • add tests

  • update tests

jtcohen6
jtcohen6

fix: Avoid access to profile when calling str(UnsetProfileConfig) (#5209)

  • fix: Avoid access to profile when calling str(UnsetProfileConfig)

dbt.config.UnsetProfileConfig inherits str from dbt.config.Project. Moreover, UnsetProfileConfig also raises an exception when attempting to access unset profile attributes. As Project.str ultimately calls to_project_config and accesses said profile attributes, we override to_project_config in UnsetProfileConfig to avoid accessing the attributes that raise an exception.

This allows calling str(UnsetProfileConfig) and repr(UnsetProfileConfig).

Basic unit testing is also included in commit.

  • fix: Skip repr for profile fields in UnsetProfileConfig

  • chore(changie): Add changie file

jtcohen6
jtcohen6

Add support for File Selectors and add file selectors to the default method selector list (#5241)

  • Add a new selector method for files and add it to the default method selection criteria if the given selector has a . in it but no path separators

  • Add a file: selector method to the default selector methods because it will make Pedram happy

  • changie stuff

jtcohen6
jtcohen6

Fix macro modified from previous state with pkg (#5224)

  • Fix macro modified from previous state with pkg

When iterating through nodes to check if any of its macro dependencies have been modified, the state selector will first check all upstream macro dependencies before returning a judgement.

jtcohen6
jtcohen6

Add dbt Core roadmap as of May 2022 (#5246)

jtcohen6
jtcohen6

Tweak test to avoid set ordering problem (#5272)

jtcohen6
jtcohen6

Creating ADR for versioning and branching strategy (#4998)

  • Creating ADR for versioning and branching strategy

  • Fixing image link

  • Grammar clean-up

Co-authored-by: Stu Kilgore [email protected]

  • Grammar clean-up

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Update docs/arch/adr-003-versioning-branching-strategy.md

Co-authored-by: Stu Kilgore [email protected]

  • Updating Outside Scope section

  • Changing from using type to stage

  • Adding section on getting changes into certain releases

  • Changed stages to phases

  • Some wording updates

  • New section for branching pros and cons

  • Clarifying version bump statement

  • A few minor comment fix ups

  • Adding requirement to define released

  • Updating to completed!

Co-authored-by: Stu Kilgore [email protected]

jtcohen6
jtcohen6

Truncate relation names when appending a suffix that will result in len > 63 characters using make_temp_relation and make_backup_relation macros

jtcohen6
jtcohen6

Remove timestamp from suffix appended to backup relation

jtcohen6
jtcohen6

Add changelog entry

jtcohen6
jtcohen6

Implememt make_relation_with_suffix macro

jtcohen6
jtcohen6

Add make_intermediate_relation macro that controls _tmp relation creation in table and view materializations to delienate from database- and schema-less behavior of relation returned from make_temp_relation

jtcohen6
jtcohen6

Create backup_relation at top of materialization to use for identifier

jtcohen6
jtcohen6

Add dstring arg to make_relation_with_suffix macro

jtcohen6
jtcohen6

Only reference dstring in conditional of make_relation_with_suffix macro

jtcohen6
jtcohen6

Create both a temp and intermediate relation, update preexisting_temp_relation to preexisting_intermediate_relation

jtcohen6
jtcohen6

Migrate test updates to new test location

jtcohen6
jtcohen6

Remove restored tmp.csv

jtcohen6
jtcohen6

Revert "Remove restored tmp.csv"

This reverts commit 900c9dbcad9a1e6a5a6737c84004504bfdd9926f.

commit sha: 80aa9aca21a820657ad938540a0e840faf2de0f2

push time: 4 days ago
issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

Truncate relation names when appending a suffix

jtcohen6
jtcohen6

Going to rebase this PR against main to pull in the fix we merged yesterday for the flaky failing test

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-228] [Feature] Allow tests to warn/fail based on percentage

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

Add the ability for the warn/fail threshold to be a percentage instead of just a fixed number. We will likely need to add the definition of the total count to calculate the percentage against. Original request: #4334

  - name: my_view
    columns:
      - name: id
        tests:
          - unique:
              config:
                warn_if: "> 10%"
                fail_if: "> 20%"

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

jtcohen6
jtcohen6

@vergenzt If I understand you right, you're thinking that the not_null test should evolve from:

select * from {{ model }}
where {{ column_name }} is null

To:

select
    case when {{ column_name }} is null then true else false end as dbt_test_failed
from {{ model }}

And the default fail_calc, instead of being count(*), should instead be count(dbt_test_failed).

In order to calculate percentages, the fail_calc would simply evolve to 1.0 * count(dbt_test_failed) / count(*), or even to 1.0 * sum(case when dbt_test_failed then some_value else 0 end) / sum(some_value).

That's a neat idea! I think the SQL may be a bit trickier to write for unique, but for not_null and accepted_values it's intuitive enough. We'd need to rethink --store-failures very slightly; rather than saving the whole test query result to the database (i.e. the whole table), it should probably just save:

select * from (
    {{ test_query }}
)
where dbt_test_failed = true

@jaypeedevlin @ehmartens What do you think?
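
To sketch how the existing knobs and this proposal would fit together (fail_calc, warn_if, and error_if are configs that exist today; the dbt_test_failed column is the proposal above, so this exact combination is not shipped behavior):

    - name: my_view
      columns:
        - name: id
          tests:
            - not_null:
                config:
                  # override the aggregate used to judge the test result...
                  fail_calc: "1.0 * count(dbt_test_failed) / count(*)"
                  # ...and compare it against percentage-style thresholds
                  warn_if: "> 0.1"
                  error_if: "> 0.2"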

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

Enable Jinja FileSystem access

resolves #5247

Description

Add simple support of FileSystem access for Jinja. No feature toggle yet.
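
i.e., something along these lines with stock Jinja (a sketch of the concept, not this PR's diff):

    import jinja2

    # Render templates from disk instead of from in-memory strings;
    # assumes a ./templates/greeting.sql file exists
    env = jinja2.Environment(loader=jinja2.FileSystemLoader("templates"))
    template = env.get_template("greeting.sql")
    print(template.render(name="world"))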

Checklist

jtcohen6
jtcohen6

I owe a response on #5247! I assigned myself there, and will be leaving some thoughts. Sorry for the delay!

issue

jtcohen6 issue dbt-labs/dbt-core

jtcohen6
jtcohen6

Increase the usage of augmented assignment statements

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

:eyes: Some source code analysis tools can help to find opportunities for improving software components. :thought_balloon: I propose to increase the usage of augmented assignment statements accordingly.

diff --git a/core/dbt/dataclass_schema.py b/core/dbt/dataclass_schema.py
index da95699e..309d7297 100644
--- a/core/dbt/dataclass_schema.py
+++ b/core/dbt/dataclass_schema.py
@@ -22,7 +22,7 @@ class DateTimeSerialization(SerializationStrategy):
         out = value.isoformat()
         # Assume UTC if timezone is missing
         if value.tzinfo is None:
-            out = out + "Z"
+            out += "Z"
         return out
 
     def deserialize(self, value):
diff --git a/core/dbt/graph/selector_spec.py b/core/dbt/graph/selector_spec.py
index 17c616f6..d82c2f26 100644
--- a/core/dbt/graph/selector_spec.py
+++ b/core/dbt/graph/selector_spec.py
@@ -147,7 +147,7 @@ class SelectionCriteria:
         method_name, method_arguments = cls.parse_method(dct)
         meth_name = str(method_name)
         if method_arguments:
-            meth_name = meth_name + '.' + '.'.join(method_arguments)
+            meth_name += '.' + '.'.join(method_arguments)
         dct['method'] = meth_name
         dct = {k: v for k, v in dct.items() if (v is not None and v != '')}
         if 'childrens_parents' in dct:
diff --git a/core/dbt/parser/manifest.py b/core/dbt/parser/manifest.py
index d36d4a8e..7918e39a 100644
--- a/core/dbt/parser/manifest.py
+++ b/core/dbt/parser/manifest.py
@@ -389,7 +389,7 @@ class ManifestLoader:
                     block = FileBlock(self.manifest.files[file_id])
                     parser.parse_file(block)
                     # increment parsed path count for performance tracking
-                    self._perf_info.parsed_path_count = self._perf_info.parsed_path_count + 1
+                    self._perf_info.parsed_path_count += 1
             # generic tests hisotrically lived in the macros directoy but can now be nested
             # in a /generic directory under /tests so we want to process them here as well
             if 'GenericTestParser' in parser_files:
@@ -398,7 +398,7 @@ class ManifestLoader:
                     block = FileBlock(self.manifest.files[file_id])
                     parser.parse_file(block)
                     # increment parsed path count for performance tracking
-                    self._perf_info.parsed_path_count = self._perf_info.parsed_path_count + 1
+                    self._perf_info.parsed_path_count += 1
 
         self.build_macro_resolver()
         # Look at changed macros and update the macro.depends_on.macros
@@ -441,7 +441,7 @@ class ManifestLoader:
                     parser.parse_file(block, dct=dct)
                 else:
                     parser.parse_file(block)
-                project_parsed_path_count = project_parsed_path_count + 1
+                project_parsed_path_count += 1
 
             # Save timing info
             project_loader_info.parsers.append(ParserInfo(
@@ -449,7 +449,7 @@ class ManifestLoader:
                 parsed_path_count=project_parsed_path_count,
                 elapsed=time.perf_counter() - parser_start_timer
             ))
-            total_parsed_path_count = total_parsed_path_count + project_parsed_path_count
+            total_parsed_path_count += project_parsed_path_count
 
         # HookParser doesn't run from loaded files, just dbt_project.yml,
         # so do separately
@@ -469,7 +469,7 @@ class ManifestLoader:
         project_loader_info.parsed_path_count = (
             project_loader_info.parsed_path_count + total_parsed_path_count
         )
-        project_loader_info.elapsed = project_loader_info.elapsed + elapsed
+        project_loader_info.elapsed += elapsed
         self._perf_info.parsed_path_count = (
             self._perf_info.parsed_path_count + total_parsed_path_count
         )
@@ -689,7 +689,7 @@ class ManifestLoader:
         key_list.sort()
         env_var_str = ''
         for key in key_list:
-            env_var_str = env_var_str + f'{key}:{config.project_env_vars[key]}|'
+            env_var_str += f'{key}:{config.project_env_vars[key]}|'
         project_env_vars_hash = FileHash.from_contents(env_var_str)
 
         # Create a FileHash of the env_vars in the project
@@ -697,7 +697,7 @@ class ManifestLoader:
         key_list.sort()
         env_var_str = ''
         for key in key_list:
-            env_var_str = env_var_str + f'{key}:{config.profile_env_vars[key]}|'
+            env_var_str += f'{key}:{config.profile_env_vars[key]}|'
         profile_env_vars_hash = FileHash.from_contents(env_var_str)
 
         # Create a FileHash of the profile file
diff --git a/test/integration/047_dbt_ls_test/test_ls.py b/test/integration/047_dbt_ls_test/test_ls.py
index 9ab815c7..7fcc47e7 100644
--- a/test/integration/047_dbt_ls_test/test_ls.py
+++ b/test/integration/047_dbt_ls_test/test_ls.py
@@ -42,7 +42,7 @@ class TestStrictUndefined(DBTIntegrationTest):
         log_manager.stdout_console()
         full_args = ['ls']
         if args is not None:
-            full_args = full_args + args
+            full_args += args
 
         result = self.run_dbt(args=full_args, expect_pass=expect_pass)
 
diff --git a/test/unit/test_macro_calls.py b/test/unit/test_macro_calls.py
index 4c2837be..bfa29869 100644
--- a/test/unit/test_macro_calls.py
+++ b/test/unit/test_macro_calls.py
@@ -47,6 +47,6 @@ class MacroCalls(unittest.TestCase):
         for macro_string in self.macro_strings:
             possible_macro_calls = statically_extract_macro_calls(macro_string, ctx)
             self.assertEqual(self.possible_macro_calls[index], possible_macro_calls)
-            index = index + 1
+            index += 1
 
 

Describe alternatives you've considered

No response

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

Increase the usage of augmented assignment statements


jtcohen6
jtcohen6

@elfring You're back! As is this original issue! This really threw us for a loop a few months ago, haha.

I'm going to close this issue out, since we ended up reopening and resolving the original issue.

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-665] [Feature] Allow control over the outer CTE identifier generated in ephemeral materialization

Is this your first time opening an issue?

Describe the Feature

We use a dot notation to group and organize dbt model files with a naming pattern of {group}.{model} such as ingest.source and dw.dimension etc. With specially complex models this provides a level of organization and consistency that keeps files organized, unique and faster to identify and find.

We rely on config aliases to then properly name relations (otherwise adding a schema would produce an invalid fully qualified relation name like schema.layer.table). This works for both table and view materializations; however, it causes the generated SQL to be invalid when using the ephemeral materialization, because the outer CTE identifier is generated as __dbt__cte__ingest.source. (See the sketch below.)
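
To make the failure concrete, the compiled SQL ends up looking roughly like this (model bodies invented):

    with __dbt__cte__ingest.source as (  -- invalid: '.' is not legal in an unquoted CTE identifier
        select * from raw_events
    )
    select * from __dbt__cte__ingest.source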

If a model alias is configured, it should be used everywhere as the identifier of the output instead of the file name, regardless of materialization, to give end users better control over the generated SQL and to avoid invalid code being generated when using an unconventional naming of model files.

Describe alternatives you've considered

An alternative might be some form of sanitization of the CTE name during generation, to avoid producing an invalid CTE identifier: stripping non-alphanumeric characters, or generating a name not linked to the model name (like an incrementing __dbt__cte__1), as long as it's a valid and unique CTE name that still allows stacking CTEs.

Who will this benefit?

This feature would introduce consistency in behavior compared to other materialization strategies (the alias being used as the table or view name), give users control over the naming of models vs. files, and allow end users to come up with creative naming conventions and organization of model files without having to worry that switching materialization will cause a failure because of the model file name.

Are you interested in contributing this feature?

Happy to contribute, looking for feedback and some pointers on this from the team

Anything else?

No response

jtcohen6
jtcohen6

@miro-ur Thanks for opening!

This is something that's defined in Python today, but within the "adapter" interface, to accommodate the fact that different databases support different naming conventions:

https://github.com/dbt-labs/dbt-core/blob/2c42fb436cb5024c691fcd42c3b3004b598dcc9d/core/dbt/adapters/base/relation.py#L204-L219

I don't know that we'd get much benefit from turning this into a macro, and it would require a lot of code plumbing to set up a Jinja rendering context where it isn't currently needed.

I think both of your alternative proposals are totally reasonable:

  • Use node.alias instead of node.name for creating the CTE name
  • Replace non-alphanumeric characters with _ when creating the CTE name

I can't think offhand of any risks or downstream implications. It will be worth verifying with our automated tests, and of course adding a new one. We have a few existing tests for "models with dots in their names" here and here, since this is a capability that we know some users rely on, and we want to avoid any future regressions.

I'd welcome a PR for this!
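
A rough sketch of the second bullet in Python (hypothetical helper, not the actual dbt-core code):

    import re

    def ephemeral_cte_name(alias: str) -> str:
        # Prefer the configured alias; replace anything outside [A-Za-z0-9_]
        # so the generated CTE identifier stays valid SQL
        return "__dbt__cte__" + re.sub(r"[^A-Za-z0-9_]", "_", alias)

    print(ephemeral_cte_name("ingest.source"))  # __dbt__cte__ingest_source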

May
18
5 days ago
open pull request

jtcohen6 wants to merge dbt-labs/dbt-bigquery

jtcohen6
jtcohen6

Add regression test case

#180

Description

Checks for backwards compatibility in an edge case where database/schema information can be grabbed from the default.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-bigquery next" section.
jtcohen6
jtcohen6
        run_dbt(["run"])
        run_dbt(["test"])

Or just:

        run_dbt(["build"])
open pull request

jtcohen6 wants to merge dbt-labs/dbt-bigquery

jtcohen6
jtcohen6

Add regression test case

jtcohen6
jtcohen6

This model file needs to be named "my_model.sql", so that it matches up with the tests defined on my_model in properties__model_yml:

        return { "my_model.sql": models__my_model }
open pull request

jtcohen6 wants to merge dbt-labs/dbt-bigquery

jtcohen6
jtcohen6

Add regression test case

jtcohen6
jtcohen6

Generic test definitions can go into either the macros/ directory, or the tests/generic subdirectory. It doesn't look like the framework lets us specify the file path as "generic/get_col_in.sql", so macros will do.

    def macros(self):

https://docs.getdbt.com/docs/guides/writing-custom-generic-tests
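
For reference, a generic test definition has this shape, and can live in either location (sketch; the test name and logic here are invented):

    {% test is_positive(model, column_name) %}
    select *
    from {{ model }}
    where {{ column_name }} <= 0
    {% endtest %}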

pull request

jtcohen6 merge to dbt-labs/dbt-bigquery

jtcohen6
jtcohen6

Add regression test case

jtcohen6
jtcohen6

Looks good, nearly there! Remember you'll want to include the fix as well when backporting to 1.1.latest. https://github.com/dbt-labs/dbt-bigquery/blob/8aaa6ab6da182aa8e831f631feb12af5ec76d9ce/dbt/adapters/bigquery/connections.py#L558-L560


issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-660] [Feature] Add Grant SQL to Global Project

Description

This is the second in a two ticket series. #5189 is the first ticket. They will be merged sequentially, but both are required for this feature to be exposed to users.

Today users often configure a post_hook to grant permissions on models, seeds, and snapshots:

{{ config(post_hook = 'grant select on {{ this }} to role reporter') }}

These two tickets aim to make it easier to specify grants allowing it to be configured directly both in dbt_project.yml as well as in source files:

# dbt_project.yml
models:
  export:
    +grants:
      select: ['reporter', 'bi']
-- SQL
{{ config(grants = {'select': ['other_user']}) }}

These grant configs will not necessarily look the same for each and every warehouse. The logic to generate the SQL from these configs can be overridden by adapters.

Implementation

  • create a new macro in the global project: get_grant_sql(grant_config: dict) -> str that creates the warehouse-specific SQL from the grant portion of the config. Since it will be common, the default implementation should return grant <privilege> on <object> to <recipient>. Warehouses that deviate from this can override it. Use the dispatch pattern (dbt docs) with a default__ implementation to enable this. The macro signature here is designed to accept different shapes of grant configs, so each adapter can use whatever best fits its warehouse's permissions system. (See the sketch after this list.)
  • create a new macro in the global project: apply_grants which calls get_grant_sql to apply the grants (see persist_docs for an example of similar implementation). This should be overridable by adapters so use the dispatch pattern (dbt docs) with a default__ to enable this.
  • add a call to apply_grants in all materializations
  • test that postgres projects can define grants in 1) the dbt_project.yml, 2) models, seeds, and snapshots, and 3) both the dbt_project.yml and models, seeds, and snapshots and those permissions can be read back directly from the postgres warehouse after a run.

In order for this to be usable, the adapters must each override get_grant_sql. Because that does not prevent a merge into main for core, adapter work will be tracked in separate tickets.
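
A minimal sketch of the dispatched pair described above (assumptions: the relation is passed alongside the grant config, and grantees arrive as a list; neither detail is pinned down by this ticket):

    {% macro get_grant_sql(relation, grant_config) %}
        {{ return(adapter.dispatch('get_grant_sql', 'dbt')(relation, grant_config)) }}
    {% endmacro %}

    {% macro default__get_grant_sql(relation, grant_config) %}
        {% set statements = [] %}
        {% for privilege, grantees in grant_config.items() %}
            {% do statements.append('grant ' ~ privilege ~ ' on ' ~ relation ~ ' to ' ~ (grantees | join(', '))) %}
        {% endfor %}
        {{ return(statements | join('; ')) }}
    {% endmacro %}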

jtcohen6
jtcohen6

@VersusFacit I agree that we should run grants right after the model is created/replaced, to minimize downtime for downstream queriers! Important to call out that "post hooks," as users define them, are run within the materialization here:

https://github.com/dbt-labs/dbt-core/blob/0d8e061a3d8d726bb10ff767d8f276439cc472b2/core/dbt/include/global_project/macros/materializations/models/table/table.sql#L58

https://github.com/dbt-labs/dbt-core/blob/0d8e061a3d8d726bb10ff767d8f276439cc472b2/core/dbt/include/global_project/macros/materializations/models/table/table.sql#L68

The inside_transaction=True|False distinction is only relevant on databases that support transactions. On those databases, I believe it's possible to run grant within a transaction, except on external tables (per Redshift docs), which is okay, that's not really relevant to our case.

The after_run and after_execute methods, within the RunTask / ModelRunner, do other unrelated things.

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-660] [Feature] Add Grant SQL to Global Project


jtcohen6
jtcohen6

Copying from Slack message sent a few weeks ago:

On some databases, dbt run completely replaces a model (view or table). All grants previously on that view/table are lost, and need to be reapplied from scratch. This is much easier for us to reason about, since the grants configured on the model by the user will be the exact grants that the view/table ends up with.

On some databases, dbt run replaces a view/table in such a way that the grants can be copied from the old view/table to the new view/table. This tends to be true for databases that use create or replace view|table. In these cases, there's a potential delta between the grants the user has configured (possibly none), and the grants on the resulting view/table. I believe BigQuery + Spark/Databricks work like this by default. Snowflake doesn't do this by default, but it can with the COPY GRANTS option (which is supported as a config in dbt-snowflake).

I think there are two approaches we can take:

  • Revoke all old grants, then apply all new ones. Some databases support a revoke all privileges statement. Definitely simpler.
  • Introspect the database to find current grants, calculate diffs, then run just the exact revoke + grant statements needed. More complicated + error-prone. Many databases support a show grants statement, but this needs to be an API call on BigQuery.

Why would we prefer 2 over 1? Here's what I've heard from customers:

  • Because that's the "Terraform-y" approach: minimum changes to achieve new configuration
  • Because Security/DevOps/DBAs raise their eyebrows when they see tons of revoke statements in database logs

I think it's acceptable to pursue option 1 for now if it is indeed simpler. I anticipate we'll want to implement methods/macros for show_grants and/or revoke_all_grants within the adapter interface, in addition to get_grant_sql + apply_grants.
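
In SQL terms, option 1 is roughly this (Postgres-style syntax; object and role names hypothetical):

    -- drop whatever was there before...
    revoke all privileges on analytics.my_model from reporter, bi;
    -- ...then apply exactly the configured grants
    grant select on analytics.my_model to reporter, bi;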

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-653] [Feature] Allow running unit/integration tests locally with any supported Python version

Is this your first time opening an issue?

Describe the Feature

This is more of a rough edge for a non-professional dbt-core developer: the Makefile hard-codes py38 as the preferred Python version in tox for running unit tests and integration tests; we use Python 3.9 at my day job, so when I need to do work on dbt-core I need to remember to swap out the version in the Makefile to py39 before I run dbt tests, and then switch it back before I send a PR upstream.

Describe alternatives you've considered

I would love it if the Makefile was either a) configured to work with the Python version that I had installed locally as long as it was supported (i.e., if the default in the Makefile was py and py-integration instead of py38 and py38-integration), or, if that presents a problem for e.g. CI or some such thing, if it was easy (like via an environment variable that was documented in the CONTRIBUTING.md guide) for me to override the setting in the Makefile on my local machine.

Who will this benefit?

Developers who want to create PRs for dbt-core who do not work at dbt Labs.

Are you interested in contributing this feature?

Happy to!

Anything else?

Issue originally discussed in a Slack thread in dbt-core-development: https://getdbt.slack.com/archives/C50NEBJGG/p1652466095907019

jtcohen6
jtcohen6

Using py and py-integration makes tons of sense: it will just use your own local PATH version of Python, right?

We don't use the Makefile in CI (we hit tox directly), so I don't think there should be any implications there. Let's do a nice thing for non-professional dbt-core developers :)

The only edge I could see is someone very eagerly upgrading to a new minor version of Python, before dbt-core officially supports it, e.g. Python 3.10 prior to the v1.1 release. But I'd like to see us supporting new Python minor versions more proactively going forward, especially when they promise to have some good stuff inside...

@jwills proceed at your leisure!
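
For reference, the tox behavior being leaned on here (a sketch; env names follow tox's conventions):

    tox -e py38            # today: pinned to Python 3.8
    tox -e py              # bare 'py' uses whatever interpreter tox runs under
    tox -e py-integration  # same idea for the integration suite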

issue

jtcohen6 issue comment dbt-labs/dbt-core

jtcohen6
jtcohen6

[CT-663] YAML anchor expansion failure in 1.2.0a1 / `main`

Proactively identified regression in 1.2.0a1. If I had to guess, it might be related to https://github.com/dbt-labs/dbt-core/pull/5146?

Here's a profile that uses YAML anchors to DRY up duplicated code. This is valid YAML, per "online YAML parser":

# profiles.yml
garage-postgres:
  outputs:
    dev: &garage-pg
      database: jerco
      host: localhost
      pass: nopass
      port: 5432
      schema: dbt_jcohen
      threads: 5
      type: postgres
      user: jerco
    prod:
      <<: *garage-pg
      schema: analytics
  target: dev

When running v1.1.0, dbt is perfectly happy to expand the YAML anchor:

$ dbt debug
11:02:25  Running with dbt=1.1.0
dbt version: 1.1.0
python version: 3.9.12
python path: /Users/jerco/dev/scratch/testy/env/bin/python3.9
os info: macOS-12.3.1-x86_64-i386-64bit
Using profiles.yml file at /Users/jerco/.dbt/profiles.yml
Using dbt_project.yml file at /Users/jerco/dev/scratch/testy/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  host: localhost
  port: 5432
  user: jerco
  database: jerco
  schema: dbt_jcohen
  search_path: None
  keepalives_idle: 0
  sslmode: None
  Connection test: [OK connection ok]

All checks passed!

Running dbt-core installed from main, it's not:

$ dbt debug
11:02:38  Running with dbt=1.2.0-a1
dbt version: 1.2.0-a1
python version: 3.9.12
python path: /Users/jerco/dev/product/dbt-core/env/bin/python3.9
os info: macOS-12.3.1-x86_64-i386-64bit
Using profiles.yml file at /Users/jerco/.dbt/profiles.yml
Using dbt_project.yml file at /Users/jerco/dev/scratch/testy/dbt_project.yml

11:02:39  Encountered an error:
Runtime Error

  dbt encountered an error while trying to read your profiles.yml file.

  Runtime Error
    Syntax error near line 55
    ------------------------------
    52 |       type: postgres
    53 |       user: jerco
    54 |     prod:
    55 |       <<: *garage-pg
    56 |       schema: analytics
    57 |   target: dev
    58 | sandbox-redshift:

    Raw Error:
    ------------------------------
    could not determine a constructor for the tag 'tag:yaml.org,2002:merge'
      in "<unicode string>", line 55, column 7
jtcohen6
jtcohen6

Decided this shouldn't actually be labeled regression, since it's yet to be released. But we definitely need a test case for this sort of thing, since folks do make use of yaml anchors! (e.g.)

issue

jtcohen6 issue dbt-labs/dbt-core

jtcohen6
jtcohen6

YAML anchor expansion failure in 1.2.0a1 / `main`
