[PYTHON] Garbled characters when uploading CSV to Google Cloud Storage → Solved by json

Conclusion

When I changed the file uploaded to Storage from CSV to JSON, the characters were no longer garbled.

Premise

For how to upload files with the Storage API, see the official documentation: https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python
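For reference, an upload with the Python client can be sketched roughly as follows. This is a minimal sketch and not the exact script used in this article: the function name `upload_file` and the optional `client` argument are assumptions added so the function can be exercised without real credentials, and the real path requires the `google-cloud-storage` package.

```python
def upload_file(bucket_name, source_path, destination_name,
                content_type=None, client=None):
    """Upload a local file to a Cloud Storage bucket and return the blob."""
    if client is None:
        # Imported lazily; needs google-cloud-storage and valid credentials.
        from google.cloud import storage
        client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_name)
    # content_type=None leaves the choice of Content-Type to the library.
    blob.upload_from_filename(source_path, content_type=content_type)
    return blob
```

A call would look like upload_file("my-bucket", "storage_upload.json", "storage_upload.json"), where "my-bucket" is a placeholder bucket name.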

Experimental result

For CSV

raw data

Locally created CSV (UTF-8)

storage_upload.csv


supplements,tablet
...

result

CSV uploaded to Storage

storage_upload.csv


supplements,Tanka
...
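This article does not pin down why the CSV looked garbled. One way to narrow it down is to decode the same bytes with several encodings and see which one reproduces the broken text. The following is only a diagnostic sketch: the helper name `describe_bytes`, the list of encodings, and the sample string are all assumptions for illustration, not data from the experiment above.

```python
def describe_bytes(data: bytes) -> dict:
    """Decode the same bytes with several encodings to compare the results."""
    results = {}
    for encoding in ("utf-8", "cp932", "latin-1"):
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[encoding] = None
    return results


original = "サプリ,錠剤"             # hypothetical sample row
encoded = original.encode("utf-8")  # bytes as a UTF-8 CSV would store them
decoded = describe_bytes(encoded)
```

If the bytes downloaded from Storage still decode correctly as UTF-8, the file itself was stored intact and the garbling comes from whatever program displays it with a different encoding.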

For json

raw data

storage_upload.json


{{"name":"supplements","name":"tablet"},{...}}

result

Same as the original data

storage_upload.json


{{"name":"supplements","name":"tablet"},{...}}

Remarks

writeJson.py


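import json


class WriteJson:
    def __init__(self, ad_data, new_file, file_name):
        self.ad_data = ad_data      # JSON-serializable data to write
        self.new_file = new_file    # True: overwrite the file, False: append
        self.file_name = file_name

    def write(self):
        mode = "w" if self.new_file else "a"
        # The with block closes the file even if json.dump raises.
        with open(self.file_name, mode, encoding="utf-8") as f:
            # ensure_ascii=False writes non-ASCII characters as-is.
            json.dump(self.ad_data, f, ensure_ascii=False)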

→ Note that unless ensure_ascii=False is passed to json.dump in the last line, non-ASCII characters in the output are escaped in the form "\uXXXX".

        json.dump(self.ad_data, f)  → non-ASCII characters are escaped
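The difference is easy to check with json.dumps, which takes the same ensure_ascii argument as json.dump. A small sketch, where the sample record is an assumption for illustration:

```python
import json

record = {"name": "サプリ"}  # hypothetical sample data

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False: the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

# Both strings parse back to the same Python object.
same = json.loads(escaped) == json.loads(readable)
```

Both forms are valid JSON. The escaped form is just harder to read when you open the file directly.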

[python] Format JSON file and dump

Work notes: Article background

・ While scraping with Python, the following error occurred in the loop that sends CSV data to BigQuery: "10054, 'The existing connection was forcibly closed by the remote host'"
→ As an alternative, I decided to upload the file to Cloud Storage first.
→ When a UTF-8 encoded CSV was uploaded to Storage, all Japanese characters were garbled.
→ As of November 06, 2019, searching turned up nothing that matched this problem. The closest results were:
https://stackoverflow.com/questions/45394157/google-cloud-storage-not-handling-utf-8-filenames
https://groups.google.com/forum/?hl=ja#!topic/google-app-engine-japan/0NHIIqbLx9w

If you know a good approach other than JSON, I would appreciate it if you let me know in the comments on this article.
