python - Iterating over query results from sqlalchemy
I have a SQLAlchemy query function like this:
    def foo():
        local_session = session()
        results = local_session.query(t.x, t.y, t.z, t.a, t.b, t.c,
                                      t.d, t.e, t.f, t.g, t.h, t.i,
                                      t.j, t.k, t.l, t.m, t.n, t.o,
                                      t.p, t.q, t.r, t.s, t.t, t.u,
                                      t.v, user.gender).join(user)\
            .filter(t.language == 'en',
                    t.where_i_am_from == 'us',
                    user.some_num >= 0.9).limit(1000000)
        local_session.close()
        return results, results.count()
The query works fine. I call the function here:
    def fubar():
        raw_data, raw_data_length = mymodule.foo()
        df = pd.DataFrame()
        for each in raw_data:
            df = df.append(pd.DataFrame({
                # add each.x etc
                # df.....
            }))
        return df
The issue is that it won't iterate over the "for each in raw_data" loop when the .limit on the foo query is above 5000, when I use .all(), or when there is no limit at all. The program just hangs and does nothing (0% CPU usage). I've tested this both on a local SQL server and an Amazon one. When I run the SQL directly on the database it returns around 800,000 rows. Why is this happening?
I'm using the latest MySQL and the latest SQLAlchemy.
This may be a MySQL driver problem. I would do the following, in order:
- Run Python with the -v flag: python -v yourprogram.py. This has the potential of showing where the program got stuck.
- Get the 800,000 results and stick them in SQLite tables with an equivalent schema. That's relatively cheap to do; all you have to do afterwards is change the SQLAlchemy database string. Obviously, this will show whether the problem lies in the driver or in your code (a rough sketch of one way to do the copy follows after this list).
- You're doing a join between two classes (t, user) - do an eager load instead of the default lazy load. If you have 800,000 rows and are doing a lazy join, that may be the problem. Add a joinedload (eagerload in earlier versions of SQLAlchemy) to the query options (see the second sketch below).
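For the SQLite copy, here is a minimal sketch of one way to do it with pandas. The connection URL, the pymysql driver, the credentials, and the assumption that the underlying table names match the class names t and user are all placeholders - adjust them to your setup. The point is that only the database string your session factory is bound to has to change afterwards:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    import pandas as pd

    # Placeholder MySQL URL - swap in your real driver, credentials and host.
    mysql_engine = create_engine("mysql+pymysql://user:password@host/dbname")

    # Local SQLite file that will hold the copied tables.
    sqlite_engine = create_engine("sqlite:///local_copy.db")

    # Copy both tables involved in the join, in chunks so memory stays bounded.
    for table_name in ("t", "user"):
        for chunk in pd.read_sql_table(table_name, mysql_engine, chunksize=50000):
            chunk.to_sql(table_name, sqlite_engine, if_exists="append", index=False)

    # Rebind the session factory to SQLite and rerun foo(); if iteration now
    # works, the problem is in the MySQL driver rather than in your code.
    session = sessionmaker(bind=sqlite_engine)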
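And for the eager load, a sketch of what adding joinedload might look like. It assumes t has a relationship() to user named t.user - that name is hypothetical, so adjust it to your own mapping - and it queries the mapped class rather than individual columns so the option has something to attach to:

    from sqlalchemy.orm import joinedload

    def foo():
        local_session = session()
        results = (local_session.query(t)
                   .join(user)
                   .options(joinedload(t.user))   # eager-load the related user rows
                   .filter(t.language == 'en',
                           t.where_i_am_from == 'us',
                           user.some_num >= 0.9)
                   .limit(1000000)
                   .all())   # materialize the rows while the session is still open
        local_session.close()
        return results, len(results)

Note that joinedload emits its own aliased JOIN on top of the explicit .join(user) used for filtering; contains_eager(t.user) would reuse the explicit join instead, but joinedload is the simpler drop-in change.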