
11 commits

Author SHA1 Message Date
Alexander Sulfrian
56994b3b5c
[ZeitBridge] Remove content from original feed (#4260)
The original feed contains a small version of the header image and
the summary or a literal "None". The header image is already added, but
the original content was kept. This removes the original content and
adds the summary if it exists.
2024-10-17 08:47:44 +02:00
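The behaviour described in 56994b3b5c can be sketched as follows. This is a minimal illustration in Python, not the bridge's actual code: the function name `build_item_content`, its parameters, and the ordering of the parts are assumptions made for the example. The original feed content is never passed in, so it cannot leak into the result.

```python
from html import escape
from typing import Optional


def build_item_content(header_image_html: str,
                       summary: Optional[str],
                       article_html: str) -> str:
    """Assemble item content without the original feed content (hypothetical helper)."""
    parts = [header_image_html]
    # The feed may carry a literal "None" instead of a summary; treat it as missing.
    if summary is not None and summary.strip() not in ("", "None"):
        parts.append("<p>" + escape(summary.strip()) + "</p>")
    parts.append(article_html)
    return "".join(parts)
```

With a summary of `"None"` or no summary at all, the result contains only the header image and the article body.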
Mynacol
7bde7a56f9 [ZeitBridge] Fix linting 2024-05-18 16:35:24 +02:00
Mynacol
4d12aa2a9e [ZeitBridge] Remove annoyances, add content
Remove navigational elements, podcast images.
Add many more header images, article content in <ul> (and for good
measure in <ol>) and quotes with their content and not only their
author.

Extreme example:
https://www.zeit.de/campus/2024-05/protest-palaestina-universitaet-europa-uebersicht
2024-05-18 16:35:24 +02:00
Mynacol
a7ed3d56f9 [ZeitBridge] Prettify author field
Strip HTML tags (leaving plain text) and trim surrounding whitespace.
2024-05-18 16:35:24 +02:00
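The author clean-up in a7ed3d56f9 amounts to tag stripping followed by trimming. A minimal Python sketch of that idea, using only the standard library, is shown below; `prettify_author` and `_TextExtractor` are hypothetical names for this example and do not come from the bridge.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes only; all tags are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def prettify_author(raw_html: str) -> str:
    """Return the author field as trimmed plain text (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()
```

For example, `prettify_author('  <a href="/autoren/x">Jane <b>Doe</b></a>\n ')` returns `"Jane Doe"`.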
Mynacol
254efc2812 [ZeitBridge] Remove doubled text
The first two paragraphs were repeated at the end of articles. The first
CSS selector filters those out (example 1).
The second CSS selector removes a "Zum Anschauen benötigen wir Ihre Zustimmung"
("We need your consent to view this") line from a poll widget. We can't load
the widget successfully, so we should remove all embeds that seem to use
JavaScript (example 2).

1: https://www.zeit.de/campus/2024-03/bundesregierung-wissenschaft-arbeitsvertrag-regeln
2: https://www.zeit.de/campus/2024-03/ausbildung-abgebrochen-gruende-azubi-aufruf
2024-03-10 22:27:32 +01:00
Dag
2880524dfc
refactor: remove parent calls to parseItem (#3747) 2023-10-13 01:59:05 +02:00
Dag
382648fc22
refactor: FeedExpander::parseItem() descendants (#3744) 2023-10-13 00:25:34 +02:00
Mynacol
c3b5b382ba [ZeitBridge] Remove broken paywall workaround
Remove the Googlebot spoofing, as this workaround no longer works.
2023-08-27 12:57:36 +02:00
Paul Prechtel
4068668de9
[ZeitBridge] Re-add paywall workaround (#3352)
In addition to the Googlebot User-Agent, a Googlebot IP address has to
be used. For now, we can use `X-Forwarded-For` for this.
2023-04-18 18:41:40 +02:00
Paul Prechtel
0718fdc829
[ZeitBridge] Revert User-Agent (#3350)
The Googlebot User-Agent is no longer sufficient to circumvent the
paywall.
2023-04-17 15:33:14 +02:00
Mynacol
9d871e8a45
[ZeitBridge] Add bridge for zeit.de (#3056)
* [ZeitBridge] Add bridge for zeit.de

New bridge expanding the feeds of zeit.de to full-text ones.
Circumvents cookie banners and Z+ premium article paywalls.

* [ZeitBridge] Formatting
2022-09-21 22:24:11 +02:00