Column filing_id not found despite its manifest existence #90

@MichaelTiemannOSC


I am trying to load 18 CSV files from CorpWatch into a SQLite database using csvs-to-sqlite. I will spare you the deprecation warnings for error_bad_lines.

michael$ time /Users/michael/Library/Python/3.8/bin/csvs-to-sqlite ~/Downloads/corpwatch_api_tables_csv ~/Temp/corpwatch.db -s $'\t'

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (8,10) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (17) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.

  return pd.read_csv(
/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/utils.py:38: DtypeWarning: Columns (3,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
  return pd.read_csv(
Loaded 18 dataframes
Traceback (most recent call last):
  File "/Users/michael/Library/Python/3.8/bin/csvs-to-sqlite", line 8, in <module>
    sys.exit(cli())
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/csvs_to_sqlite/cli.py", line 251, in cli
    df.to_sql(df.table_name, conn, if_exists="append", index=False)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/core/generic.py", line 2951, in to_sql
    return sql.to_sql(
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 697, in to_sql
    return pandas_sql.to_sql(
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 2190, in to_sql
    return table.insert(chunksize, method)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 950, in insert
    num_inserted = exec_insert(conn, keys, chunk_iter)
  File "/Users/michael/Library/Python/3.8/lib/python/site-packages/pandas/io/sql.py", line 1896, in _execute_insert
    conn.executemany(self.insert_statement(num_rows=1), data_list)
sqlite3.OperationalError: table ./company_filings has no column named filing_id

real	1m30.052s
user	1m12.763s
sys	0m16.640s

So the complaint is that company_filings has no column named filing_id. Yet the CSV file clearly has one:

michael$ head ~/Downloads/corpwatch_api_tables_csv/company_filings.csv
filing_id cik year quarter period_of_report filing_date form_10k_url sec_21_url
47 1000180 2008 1 20071230 2008-02-25 http://www.sec.gov/Archives/edgar/data/1000180/0000950134-08-003259.txt http://www.sec.gov/Archives/edgar/data/1000180/000095013408003259/f38194exv21w1.htm
104 1000209 2008 1 20071231 2008-03-13 http://www.sec.gov/Archives/edgar/data/1000209/0001193125-08-055644.txt http://www.sec.gov/Archives/edgar/data/1000209/000119312508055644/dex211.htm
121 1000228 2008 1 20071229 2008-02-26 http://www.sec.gov/Archives/edgar/data/1000228/0000950123-08-002119.txt http://www.sec.gov/Archives/edgar/data/1000228/000095012308002119/y50229exv21w1.htm
152 1000229 2008 1 0 2008-02-22 http://www.sec.gov/Archives/edgar/data/1000229/0001000229-08-000005.txt NULL
174 1000230 2008 1 0 2008-01-29 http://www.sec.gov/Archives/edgar/data/1000230/0001193125-08-014289.txt NULL
186 1000232 2008 1 20071231 2008-03-31 http://www.sec.gov/Archives/edgar/data/1000232/0001000232-08-000005.txt http://www.sec.gov/Archives/edgar/data/1000232/000100023208000005/exhibit21subsidiaries.txt
213 1000234 2008 1 20070924 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000003.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000003/ex21.htm
214 1000234 2008 1 20071231 2008-03-17 http://www.sec.gov/Archives/edgar/data/1000234/0001000234-08-000004.txt http://www.sec.gov/Archives/edgar/data/1000234/000100023408000004/ex21.htm
462 1000623 2008 1 20071231 2008-03-07 http://www.sec.gov/Archives/edgar/data/1000623/0001047469-08-002365.txt http://www.sec.gov/Archives/edgar/data/1000623/000104746908002365/a2183428zex-21.htm


Why is this column not being found?
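
One way to narrow this down: the traceback shows `to_sql` being called with `if_exists="append"`, so if a table named `./company_filings` already exists in `corpwatch.db`, the INSERT is checked against that table's columns rather than the CSV header. Whether such a table exists (for example, left over from an earlier run with a different separator) is an assumption, not something the output above confirms. The following is a minimal diagnostic sketch that compares the two; the helper names `csv_header`, `table_columns`, and `missing_in_table` are hypothetical and not part of csvs-to-sqlite.

```python
import csv
import sqlite3


def csv_header(path, sep="\t"):
    """Return the column names from the first row of a delimited file."""
    with open(path, newline="") as f:
        return next(csv.reader(f, delimiter=sep))


def table_columns(conn, table):
    """Return the column names SQLite has recorded for an existing table.

    Returns an empty list if the table does not exist.
    """
    # Quote the identifier, since names like "./company_filings" contain
    # characters that are not valid in a bare SQL identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [row[1] for row in rows]  # row[1] is the column name


def missing_in_table(path, conn, table, sep="\t"):
    """List CSV header columns that the existing table does not have."""
    existing = set(table_columns(conn, table))
    return [col for col in csv_header(path, sep) if col not in existing]
```

For the case above, this would be called with the path to `company_filings.csv`, a connection from `sqlite3.connect()` on `corpwatch.db`, and the table name `./company_filings` taken from the error message. If the table turns out to hold a single column whose name is the whole header line, the database was probably first populated with the wrong separator, and writing to a fresh database file would be worth trying.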
