Commit graph

1123 commits

Author SHA1 Message Date
ORelio
de8cee6a1c Catching up | [Main] Debug mode, parse utils, MIME | [Bridges] Add/Improve 20 bridges (#802)
* Debug mode improvements

 - Improve debug warning message
 - Restore error reporting in debug mode
 - Fix 'notice' messages for unset fields

* Add parsing utility functions

html.php
 - extractFromDelimiters
 - stripWithDelimiters
 - stripRecursiveHTMLSection
 - markdownToHtml (partial)

bridges
 - remove now-duplicate functions
 - call functions from html.php instead

* [Anidex] New bridge

Anime torrent tracker

* [Anime-Ultime] Restore thumbnail

* [CNET] Recreate bridge

Full rewrite as the previous one was broken

* [Dilbert] Minor URI fix

Use new self::URI property

* [EstCeQuonMetEnProd] Fix content extraction

Bridge was broken

* [Facebook] Fix "SpSonsSoriSsés" label

... which was taking space in item title

* [Futura-Sciences] Use HTTPS, More cleanup

Use HTTPS as FS now offers HTTPS
Clean additional useless HTML elements

* [GBATemp] Multiple fixes

- Fix categories: missing "break" statements
- Restore thumbnail as enclosure
- Fix date extraction
- Fix user blog post extraction
- Use getSimpleHTMLDOMCached

* [JapanExpo] Fix bridge, HTTPS, thumbnails

- Fix getSimpleHTMLDOMCached call
- Upgrade to HTTPS as JE now offers HTTPS
- Restore thumbnails as enclosures

* [LeMondeInformatique] Fix bridge, HTTPS

- Upgrade to HTTPS as LMI now offers HTTPS
- Restore thumbnails using small images
- Fix content extraction
- Fix text encoding issue

* [Nextgov] Fix content extraction

- Restore thumbnail and use small image
- Field extraction fixes

* [NextInpact] Add categories and filtering by type

- Offer all RSS feeds
- Allow filtering by article type
- Implement extraction for brief articles
- Remove article limit, as many brief articles are published all at once

* [NyaaTorrents] New bridge

Anime torrent tracker

* [Releases3DS] Cache content, restore thumbnail

- Use getSimpleHTMLDOMCached
- Restore thumbnail as enclosure

* [TheHackerNews] Fix bridge

 - Fix content extraction including article body
 - Restore thumbnail as enclosure

* [WeLiveSecurity] HTTPS, Fix content extraction

- Upgrade to HTTPS as WLS now offers HTTPS
- Fix content extraction including article body

* [WordPress] Reduce timeout, more content selectors

- Reduce timeout to use default one (1h)
- Add new content selector (articleBody)
- Find thumbnail and set as enclosure
- Fix <script> cleanup

* [YGGTorrent] Increase limit, use cache

- Increase item limit as uploads are very frequent
- Use getSimpleHTMLDOMCached

* [ZDNet] Rewrite with FeedExpander

- Upgrade to HTTPS as ZD now offers HTTPS
- Use FeedExpander for secondary fields
- Fix content extraction for article body

* [Main] Handle MIME type for enclosures

Many feed readers will ignore enclosures (e.g. thumbnails) with no MIME type. This commit adds automatic MIME type detection based on file extension (which may be inaccurate but is the only way without fetching the content).

One can force enclosure type using #.ext anchor (hacky, needs improving)

* [FeedExpander] Improve field extraction

- Add support for passing enclosures
- Improve author and uri extraction
- Fix 'notice' PHP error messages

* [Pull] Coding style fixes for #802

* [Pull] Implementing changes for #802

 - Fix coding style issues with str append
 - Remove useless CACHE_TIMEOUT
 - Use count() instead of $limit
 - Use defaultLinkTo() + handle strings
 - Use http_build_query()
 - Fix missing </em>
 - Remove error_reporting(0)
 - warning CSS (@LogMANOriginal)
 - Fix typo in FeedExpander comment

* [Main] More documentation for markdownToHtml

See #802 for more details
2018-09-09 20:20:13 +01:00
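The extension-based MIME detection described under "[Main] Handle MIME type for enclosures" in commit de8cee6a1c can be sketched roughly as follows. This is a minimal Python illustration, not the project's PHP code: the function name `guess_enclosure_type`, the small extension table, and the `application/octet-stream` fallback are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Small illustrative table (assumption); a real one would cover more extensions.
MIME_BY_EXTENSION = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".mp3": "audio/mpeg",
    ".mp4": "video/mp4",
}


def guess_enclosure_type(url, default="application/octet-stream"):
    """Guess a MIME type from the URL alone, without fetching the content."""
    parts = urlsplit(url)
    if parts.fragment.startswith("."):
        # A "#.ext" anchor forces the type, as the commit message describes.
        ext = parts.fragment
    else:
        # Otherwise use the extension of the URL path (may be inaccurate).
        ext = posixpath.splitext(parts.path)[1]
    return MIME_BY_EXTENSION.get(ext.lower(), default)
```

With this sketch, a URL such as `https://example.com/image?id=42#.jpg` is treated as a JPEG even though its path has no extension, which is the case the `#.ext` anchor is meant to cover.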
Quentin Delmas
123fce4394 [ForGifsBridge] Fix permissions of ForGifsBridge 2018-09-09 17:34:36 +01:00
Quentin Delmas
a3f99c9c3f [GOGBridge] Added bridge for GOG.com 2018-09-09 17:32:36 +01:00
Eugene Molotov
bf30ad127c [FacebookBridge] Removes query string from post links
* [FacebookBridge] Removes query string from post links
2018-09-09 16:31:15 +01:00
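As an illustration of what commit bf30ad127c describes, removing the query string from a link could look like the Python sketch below. The helper name `strip_query` is hypothetical, and keeping the fragment is an assumption; the actual bridge is written in PHP and may behave differently.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string (fragment kept, by assumption)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))
```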
logmanoriginal
37f84196b7 [GooglePlusPostBridge] Fix title is empty if content is too short
The bridge would generate empty titles if the content is longer than
50 characters, but doesn't have further spaces in it. With this commit
the title is correctly generated based on the contents, taking missing
spaces into account.

References #786
2018-09-08 17:07:57 +02:00
Corentin Garcia
44764f7182 [GrandComicsDatabaseBridge] Fix links in content (#804) 2018-09-08 11:12:27 +01:00
Antoine Cadoret
19f294d71d Add fields to leboncoin bridge (#783)
* [LeBonCoinBridge] Add fields to LeBonCoinBridge
2018-08-31 14:34:41 +01:00
Teromene
b0e33e4e01 Update LeBonCoinBridge to use the site's API (#795)
* Update LeBonCoinBridge to use the site's API
2018-08-28 14:20:02 +01:00
Quentin Delmas
059656c370 Fix phpcs. 2018-08-22 16:25:08 +01:00
Quentin Delmas
9fc1e97efe Avoid bot exclusion. 2018-08-22 16:21:39 +01:00
sysadminstory
c4cccfe0f3 [LesJoiesDuCode] Switch to HTTPS and remove author (#787)
The website now offers HTTPS, therefore the bridge was switched to it.
The post author is not displayed anymore on the homepage, so it has been
removed.
2018-08-21 17:41:56 +02:00
Piranhaplant
e7dab5d351 Fixed timestamp on Pixiv bridge (#785) 2018-08-18 16:54:24 -03:00
logmanoriginal
ad82d50bbd [CNETBridge] Remove bridge
CNET now provides public feeds at https://www.cnet.com/rss/

References #775
2018-08-12 11:02:44 +02:00
logmanoriginal
c305c1ded7 [BlaguesDeMerdeBridge] Adjust to layout changes
References #767
2018-08-10 21:08:47 +02:00
logmanoriginal
f14a5bd771 [CADBridge] Remove bridge
https://cad-comic.com/ now provides feeds at

- https://cad-comic.com/feed (rss)
- https://cad-comic.com/feed/atom (atom)

Thus multiple alternatives are available to choose from, making this
bridge obsolete:

- FilterBridge (using one of the feeds above)
- WordPressBridge (on the main site)
- One of the two available feeds

References #752
2018-08-10 19:53:32 +02:00
logmanoriginal
ee28b124e0 [DanbooruBridge] Fix bridge
This commit fixes an issue caused by self-closing tags not supported
by simplehtmldom (<source>).

Adds a monkey patch to extend simplehtmldom with the ability to detect
that particular tag. Most of the code added is copied directly from
simplehtmldom (see vendor/simplehtmldom) with adjustments to account
for RSS-Bridge formatting.

Related to: https://sourceforge.net/p/simplehtmldom/bugs/83/

Notice: The tag itself is valid according to Mozilla:

The HTML <picture> element serves as a container for zero or more
<source> elements and one <img> element to provide versions of an
image for different display device scenarios. The browser will
consider each of the child <source> elements and select one
corresponding to the best match found; if no matches are found
among the <source> elements, the file specified by the <img>
element's src attribute is selected. The selected image is then
presented in the space occupied by the <img> element.

-- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/picture

References #753
2018-08-09 21:55:43 +02:00
logmanoriginal
5fea9fc1f5 bridges: Fix bridges failing unit test 2018-08-09 17:04:16 +02:00
Eugene Molotov
df81fa62d1 [VkBridge] Video attachment fixes (#766)
* use defaultLinkTo
* remove duplicate video links
* remove line ending before "Reposted" label
* return newline before reposted string
* remove comments
* use video links that won't require login
* set title if video has no title
2018-08-09 17:02:36 +02:00
logmanoriginal
09c9d015b4 [ForGifsBridge] Add new bridge 2018-08-04 23:42:58 +02:00
logmanoriginal
3a496e3b18 [FilterBridge] Add option to build title from content
Adds a new option '&title_from_content=on' to build the title for feed
items from the feed's content. The title is generated from the content
up to the first whitespace after 50 characters, or from the entire
content if the total size is less than 50 characters.

References #587
2018-08-04 20:46:59 +02:00
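The title rule described in commit 3a496e3b18 can be sketched as below. This is a hedged Python illustration, not the bridge's PHP code: the function name `title_from_content` is hypothetical, the input is assumed to be plain text, and falling back to the whole text when no whitespace follows the threshold is an assumption borrowed from the GooglePlusPostBridge fix in commit 37f84196b7.

```python
def title_from_content(content, threshold=50):
    """Build a title from plain-text content, cutting at a word boundary."""
    text = content.strip()
    if len(text) <= threshold:
        # Short content is used as the title unchanged.
        return text
    # Cut at the first whitespace found at or after the threshold.
    for index in range(threshold, len(text)):
        if text[index].isspace():
            return text[:index]
    # Assumption: no whitespace after the threshold, so keep the whole text.
    return text
```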
sublimz
f92ac49947 [LeBonCoinBridge] Add cities support (#751) 2018-08-01 17:25:18 +02:00
Benasse
a574fa15ac [YGGTorrentBridge] Order search result by publish date (#762) 2018-07-31 21:46:10 +02:00
Nemo
8f9a385b4d [AmazonPriceTrackerBridge] Improve Amazon scraper logic (#761)
- Now works on all websites, and even with products
  with multiple prices
- Closes #750
2018-07-31 21:44:37 +02:00
logmanoriginal
53bdfa3bf0 [GooglePlusPostBridge] Skip posts without message 2018-07-31 19:15:09 +02:00
logmanoriginal
53278b2eed [GooglePlusPostBridge] Add option to include image in content
References #600
2018-07-31 19:09:12 +02:00
logmanoriginal
5f3c55b808 [GooglePlusPostBridge] General cleanup 2018-07-31 18:55:35 +02:00
logmanoriginal
fb79a67370 [GooglePlusPostBridge] Normalize static::URI usage
This commit fixes a few things related to static::URI

1) Remove trailing slash from the URI to simplify using 'defaultLinkTo'
2) Use static::URI instead of self::URI for consistency
3) Remove custom implementation of 'defaultLinkTo'
2018-07-31 18:29:14 +02:00
logmanoriginal
3c4e12ceba [GooglePlusPostBridge] Add images to enclosures
Images are collected for each post and added to enclosures. Images or
animations from lh3.googleusercontent.com are specifically handled in
order to return the animated version of the gif and the original sized
image (this is normally taken care of by JS in the browser).
2018-07-31 18:18:22 +02:00
logmanoriginal
0d1923c52f [GitHubGistBridge] Add new bridge
Adds a new bridge for https://gist.github.com

The bridge generates feeds for comments on a particular gist based on
the gist ID or full URI. For better readability the general behavior
of code sections is manually restored with the original CSS styles
from GitHub.
2018-07-29 16:31:47 +02:00
logmanoriginal
ce896b4247 [SkimfeedBridge] Add new bridge
New bridge for Skimfeed: https://skimfeed.com

Generates feeds for all features of Skimfeed:

- News (the ones displayed on the front page)
- Hot topics ("What's Hot" section on the front page)
- Tech news (preconfigured feeds in the menu bar)
- Custom feeds (using the configuration system of Skimfeed), see
https://skimfeed.com/custom.php

The number of items returned by the bridge can be limited for all
categories ('&limit=...'). This parameter is optional, all categories
are unlimited by default!

Authors are added with HTML anchors in order to allow quick navigation
to source channels.

The bridge ships with developer tools to auto-generate lists in the
future (especially useful for 'Tech news'!)

References #748
2018-07-27 23:18:32 +02:00
sysadminstory
a4b2d88dbe [DealabsBridge] Follow website change (#758) 2018-07-25 20:02:31 +02:00
logmanoriginal
afb4de318b [FlickrBridge] Fix missing scheme for image URLs
References #754
2018-07-23 20:14:46 +02:00
Eugene Molotov
43bb17f995 [VkBridge] Converting hashtags to categories (#755)
* [VkBridge] Converting hashtags to categories
2018-07-22 16:43:00 +02:00
logmanoriginal
bae7a5879f [FlickrBridge] Fixed broken bridge
Following changes in the JSON data and selecting images for the
content (320x240 or bigger) and enclosure (largest version). All of
the data is now extracted from the JSON data instead of parsing the
DOM.

References #754
2018-07-22 14:06:04 +02:00
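The image selection described in commit bae7a5879f (a content image of 320x240 or bigger, and the largest version as enclosure) might be sketched like this. The input shape, a non-empty list of dicts with `url`, `width` and `height` keys, is a hypothetical stand-in for the site's JSON data, and choosing the smallest image that meets the minimum is one possible reading of the commit message.

```python
def pick_images(sizes, min_width=320, min_height=240):
    """Return (content_url, enclosure_url) from a non-empty list of sizes."""
    # Order candidates from smallest to largest by pixel area.
    ordered = sorted(sizes, key=lambda s: s["width"] * s["height"])
    enclosure = ordered[-1]  # largest version
    # Smallest image that is at least min_width x min_height, else the largest.
    content = next(
        (s for s in ordered if s["width"] >= min_width and s["height"] >= min_height),
        enclosure,
    )
    return content["url"], enclosure["url"]
```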
logmanoriginal
15e6d77569 [FierPandaBridge] Fix bridge
This bridge now returns all articles from the front page, following
layout changes in the past.

References #679
2018-07-21 18:07:03 +02:00
logmanoriginal
f97d2ef254 [Torrent9Bridge] Remove bridge
The site moved from www.torrent9.pe to www.t9.pe and is now protected
by Cloudflare challenges, making it inaccessible to RSS-Bridge.
2018-07-21 17:45:22 +02:00
logmanoriginal
91ae2a23d7 [CpasbienBridge] Remove bridge
Removing this bridge for two reasons:

1) The service moved from www.cpasbien.cm to www.torrents9.blue,
changing the layout in the process (incompatible).

2) The new site is permanently protected by Cloudflare IUAM, making
it inaccessible by RSS-Bridge.

While it would certainly be possible to rewrite the bridge to work
with the new layout, the site is still inaccessible.

References #605
2018-07-21 17:43:29 +02:00
LogMANOriginal
4facbf32e3 [InstructableBridge] Add new bridge (#724)
This commit adds a new bridge for http://www.instructables.com. This bridge
currently supports fetching content by category (all 200+ categories are available),
using available filters (featured, recent, popular, views, contest winners).
2018-07-21 15:25:13 +02:00
logmanoriginal
6bd76af326 [YoutubeBridge] Add duration limits for all modes
Adds duration limits (minimum duration, maximum duration) for all
modes (user/id/playlist/search). Duration limits are optional, so
existing subscriptions don't break.

The limits are specified by two separate parameters, each of which
is optional:

- `&duration_min=` (minimum duration in minutes, default: -1)
- `&duration_max=` (maximum duration in minutes, default: INF)

If duration limits are specified in either user, id or playlist mode,
the bridge defaults to fetching data from HTML instead of XML feeds,
which requires more bandwidth and takes longer, because each video is
loaded individually!

References #670
2018-07-21 14:33:07 +02:00
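The duration limits from commit 6bd76af326 can be illustrated with a small Python sketch. The defaults mirror the ones listed in the commit message (-1 and INF, both in minutes); the function name `within_duration`, the seconds-based input, and the inclusive bounds are assumptions for the example rather than the bridge's actual PHP logic.

```python
import math


def within_duration(duration_seconds, duration_min=-1, duration_max=math.inf):
    """Check a video length against optional limits given in minutes."""
    minutes = duration_seconds / 60
    # With the defaults (-1 and infinity) every video passes the check.
    return duration_min <= minutes <= duration_max
```

Because both parameters have permissive defaults, a feed URL that omits them keeps returning every video, which matches the commit's note that existing subscriptions don't break.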
teromene
c4d489f018 Add URI to ElloBridge elements. 2018-07-19 17:07:54 +02:00
teromene
1f2fe25471 Fix LeBonCoinBridge, now uses getContents correctly 2018-07-17 10:50:30 +02:00
Antoine Cadoret
87fc9e9156 fix LeBonCoin bridge (#747) 2018-07-16 20:13:08 +02:00
Nemo
c7b0c9fd31 Amazon Price Tracker Bridge (#741)
* [amazonprice] Adds AmazonPriceTracker bridge
2018-07-16 14:54:52 +02:00
TheRadialActive
3f41d0593a Added RSS bridge for zenodo.org (#749)
* added RSS bridge for zenodo.org
2018-07-16 12:02:41 +02:00
sysadminstory
7126f5e838 [DealabsBridge] First version of the generic "Pepper" Bridge (#726)
* [DealabsBridge] First version of the generic "Pepper" Bridge
2018-07-13 00:35:13 +01:00
Nemo
ead7b2e8de [fb2] Switches to getContents (#742) 2018-07-10 02:29:47 +01:00
LogMANOriginal
0d80a19e84 [FacebookBridge] Add context for public Facebook groups (#739)
The previous context is now labeled 'User', while the new context is
labeled 'Group'. The existing code was not changed, instead new group*
functions were implemented to handle groups.

The general principle of capturing groups is the same as done for users
with adjustments to account for different HTML structures.

Captcha responses are currently not supported for groups! There doesn't
seem to be a way to trigger them consistently, which makes it hard to
handle them properly.

Features of the group context:

- The feed title is based on the group name
- The group URI used for capturing is returned for the feed URI
- Author names and timestamps are reproduced from the source
- Post titles are reproduced from the source if they exist, otherwise
the title is built manually from the author name and the content
- Original contents are included with the feed
- All images are attached as enclosures as well

Closes #
2018-07-08 17:16:00 +02:00
logmanoriginal
2bc8daa101 [JustETFBridge] Add new bridge
Supports latest news and profiling a given ETF in English, German
or Italian language. Cover images are attached as enclosures and not
as part of the content.

News:

Optionally loads the full article for each news item. Some articles
may include scripts to provide interactive graphs. These scripts are
removed as they would be rendered as pure text and a message is shown
instead: "[Content removed! Visit site to see full contents!]"

Profile:

Optionally includes the ETF strategy and description.
2018-06-30 10:27:05 +02:00
logmanoriginal
bca79d3f88 [KununuBridge] Fix broken page layout and sort reviews 2018-06-30 10:27:05 +02:00
teromene
71c29d4192 Fix phpcs for master. 2018-06-29 23:15:22 +01:00