Fix CSV import for non-UTF-8 encodings (Windows-1250, ISO-8859-2, etc.) (#617)

* Initial plan

* Add encoding detection for CSV imports to handle Windows-1250 and other non-UTF-8 encodings

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Improve encoding detection: prioritize Windows-1250 and increase confidence threshold

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Update Gemfile.lock with rchardet dependency

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Refactor: Extract common encodings to constant and deduplicate code

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix Rubocop style violations

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix linter violations and encoding detection logic

- Remove trailing whitespace from test file (9 lines)
- Fix ensure_utf8_encoding to handle binary strings properly by checking bytesize instead of blank?
- Add error handling for ArgumentError and Encoding::CompatibilityError
- Add invalid/undef replacement options to encode calls for robustness

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix encoding error in ensure_utf8_encoding method

- Use will_save_change_to_raw_file_str? instead of raw_file_str_changed? to avoid encoding errors when checking if attribute changed
- Wrap UTF-8 validation check in begin/rescue to handle ArgumentError from invalid encodings
- This fixes the test failure: "ArgumentError: invalid byte sequence in UTF-8"

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: add missing column labels and reload import before checking rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

* Fix test: ensure import is reloaded before checking rows_count and accessing rows

Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjmata <187772+jjmata@users.noreply.github.com>
This commit is contained in:
Copilot
2026-01-12 10:17:55 +01:00
committed by GitHub
parent d354ce48e1
commit 5b736bf691
5 changed files with 144 additions and 0 deletions

View File

@@ -528,6 +528,7 @@ GEM
ffi (~> 1.0)
rbs (3.9.4)
logger
rchardet (1.10.0)
rdoc (6.14.2)
erb
psych (>= 4.0.0)
@@ -802,6 +803,7 @@ DEPENDENCIES
rack-mini-profiler
rails (~> 7.2.2)
rails-settings-cached
rchardet
redcarpet
redis (~> 5.4)
rotp (~> 6.3)