Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.
Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.
Implementation notes:
- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fails with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering
Overall performance of timelines and creating new posts which contain URLs is greatly improved.
This was recently changed to solve a Dialyzer error, but the replacement logic was faulty as "retry" would only be compared to :error and not have its truthiness evaluated.
The original logic was also faulty as it returned {:error, :pool_full} even retry was true. It never retried when the pool was full.
The Rich Media Previews were not regenerated when a post was updated due to a cache invalidation issue. They are now cached by the activity id so they can be evicted with the other activity cache objects in the :scrubber_cache.
lib/pleroma/web/mastodon_api/controllers/directory_controller.ex:6:unused_fun
Function skip_auth/2 will never be called.
________________________________________________________________________________
lib/pleroma/web/mastodon_api/controllers/directory_controller.ex:6:unused_fun
Function skip_plug/2 will never be called.
________________________________________________________________________________
lib/pleroma/web/mastodon_api/controllers/directory_controller.ex:18:guard_fail
The guard clause:
when _action :: atom() == <<105, 110, 100, 101, 120>>
can never succeed.
This is for streaming media to ffmpeg thumbnailer. The existing implementation relies on undocumented behavior.
Erlang open_port/2 does not officially support passing a string of a file path for opening. The specs clearly state you are to provide one of the following for open_port/2:
{spawn, Command :: string() | binary()} |
{spawn_driver, Command :: string() | binary()} |
{spawn_executable, FileName :: file:name_all()} |
{fd, In :: integer() >= 0, Out :: integer() >= 0}
Our method technically works but is strongly discouraged as it can block the scheduler and dialyzer throws errors as it recognizes we're breaking the contract and some of the functions we wrote may never return.
This is indirectly covered by the Erlang FAQ section "9.12 Why can't I open devices (e.g. a serial port) like normal files?"
https://www.erlang.org/faq/problems#idm1127